Wikidata:Requests for permissions/Bot/LocodeBot
- The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved--Ymblanter (talk) 21:21, 16 March 2024 (UTC)
LocodeBot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: EvanProdromou (talk • contribs • logs)
Task/s: en:UN/LOCODE is a coding system for cities, airports, rail terminals, ports, and other commercially important locations, provided by en:UNECE and used by UN organizations and others around the world. As of this writing, there are about 116K LOCODEs assigned. Only about 10K items in Wikidata have a LOCODE value.
This bot matches LOCODEs to Wikidata entries by ISO 3166-2 region code, name, and distance (if defined). The current version, which only does exact matching of names, increases the number of matched LOCODEs to almost 33K.
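A minimal sketch of the matching described above, assuming exact name equality within the same ISO 3166-2 region and a distance check applied only when both records carry coordinates. The `Place` record, the `max_km` threshold, and the rule of accepting only a single unambiguous candidate are illustrative assumptions, not taken from the LocodeBot repository.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Place:
    ident: str                   # a LOCODE or a Wikidata QID (hypothetical record type)
    name: str
    region: str                  # ISO 3166-2 code, e.g. "DK-82"
    lat: Optional[float] = None
    lon: Optional[float] = None


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def has_coords(place):
    return place.lat is not None and place.lon is not None


def match_places(locodes, items, max_km=50.0):
    """Map LOCODE -> QID for exact (region, name) matches.

    max_km is an assumed threshold; candidates without coordinates are kept.
    """
    index = {}
    for item in items:
        index.setdefault((item.region, item.name), []).append(item)

    matches = {}
    for loc in locodes:
        candidates = index.get((loc.region, loc.name), [])
        if has_coords(loc):
            candidates = [
                c for c in candidates
                if not has_coords(c)
                or haversine_km(loc.lat, loc.lon, c.lat, c.lon) <= max_km
            ]
        if len(candidates) == 1:  # skip ambiguous or empty results
            matches[loc.ident] = candidates[0].ident
    return matches
```

Accepting only single-candidate results trades coverage for precision, which seems a reasonable default for a bot that writes identifiers.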
Having better coverage for LOCODEs in Wikidata makes the service more useful for organizations that use LOCODE as an identifier. My organization, Open Earth Foundation, uses LOCODEs to identify cities so we can determine greenhouse gas emissions. We use contextual data like population and area from Wikidata, so having good locode coverage helps us look this information up.
Code: https://github.com/Open-Earth-Foundation/LocodeBot
The code is Python, GPLv3, using pywikibot and sparqlwrapper.
The match data is in the data subdirectory, in the match1.csv, match2.csv, and match3.csv files.
Function details:
There are three steps:
- it first extracts the city data from UN/LOCODE and Wikidata
- it then runs the matching algorithm based on region, name, and location
- it uploads the results to Wikidata.
There are different scripts for each step.
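The extraction step on the Wikidata side could look roughly like the sketch below, which builds a SPARQL query for settlements in one country that do not yet have a UN/LOCODE value. The query shape, the `build_query` and `fetch_candidates` names, and the choice of properties (P1937 for UN/LOCODE, P300 for ISO 3166-2 code, P297 for the country's alpha-2 code, P625 for coordinates) are assumptions for illustration; the actual scripts in the repository may work differently, and a query this broad may need narrowing to avoid timeouts.

```python
import re

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# __CC__ is replaced with a validated two-letter country code.
QUERY_TEMPLATE = """
SELECT ?item ?itemLabel ?region ?coord WHERE {
  ?item wdt:P31/wdt:P279* wd:Q486972 .
  ?item wdt:P17 ?country .
  ?country wdt:P297 "__CC__" .
  ?item wdt:P131 ?admin .
  ?admin wdt:P300 ?region .
  OPTIONAL { ?item wdt:P625 ?coord . }
  FILTER NOT EXISTS { ?item wdt:P1937 ?locode . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
"""


def build_query(country_code):
    """Return the query text for one ISO 3166-1 alpha-2 country code."""
    cc = country_code.strip().upper()
    if not re.fullmatch(r"[A-Z]{2}", cc):
        raise ValueError(f"not a two-letter country code: {country_code!r}")
    return QUERY_TEMPLATE.replace("__CC__", cc)


def fetch_candidates(country_code, user_agent):
    """Run the query against the Wikidata Query Service (needs network access)."""
    # Imported lazily so build_query can be used without the library installed.
    from SPARQLWrapper import SPARQLWrapper, JSON

    client = SPARQLWrapper(WDQS_ENDPOINT, agent=user_agent)
    client.setQuery(build_query(country_code))
    client.setReturnFormat(JSON)
    return client.query().convert()["results"]["bindings"]
```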
--EvanProdromou (talk) 22:56, 27 February 2024 (UTC)
- is there no reference that can be added? BrokenSegue (talk) 00:14, 28 February 2024 (UTC)
- but generally looks good to me Support BrokenSegue (talk) 00:17, 28 February 2024 (UTC)
- Yes, there's a page for each country that lists all the codes for that country. So, for Denmark, all the codes are listed at https://service.unece.org/trade/locode/dk.htm . Will that work? EvanProdromou (talk) 00:48, 28 February 2024 (UTC)
- yup that sounds like a good reference. BrokenSegue (talk) 00:58, 28 February 2024 (UTC)
- OK, I added it. You can see an example here: Q32722126. EvanProdromou (talk) 03:38, 28 February 2024 (UTC)
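For readers wondering how such a reference might be attached, here is a hedged sketch using pywikibot. It derives the per-country UNECE page from the first two letters of the LOCODE, following the Denmark example above, and adds it as a reference URL on the new claim. The function names, the edit summaries, and the use of P854 (reference URL) as the reference property are assumptions; the bot's actual upload script may differ.

```python
import re

LOCODE_PROPERTY = "P1937"         # UN/LOCODE
REFERENCE_URL_PROPERTY = "P854"   # reference URL (assumed choice of reference property)


def normalize_locode(locode):
    """Return the LOCODE without spaces, upper-cased, or raise ValueError."""
    code = locode.replace(" ", "").upper()
    if not re.fullmatch(r"[A-Z]{2}[A-Z2-9]{3}", code):
        raise ValueError(f"not a LOCODE: {locode!r}")
    return code


def reference_url(locode):
    """Per-country UNECE listing page, e.g. dk.htm for Danish codes."""
    country = normalize_locode(locode)[:2].lower()
    return f"https://service.unece.org/trade/locode/{country}.htm"


def add_locode(qid, locode):
    """Add a LOCODE claim with a reference URL (needs pywikibot and a bot login)."""
    import pywikibot  # imported lazily; requires a configured user-config

    repo = pywikibot.Site("wikidata", "wikidata").data_repository()
    item = pywikibot.ItemPage(repo, qid)

    claim = pywikibot.Claim(repo, LOCODE_PROPERTY)
    claim.setTarget(normalize_locode(locode))
    item.addClaim(claim, summary="Add UN/LOCODE")

    source = pywikibot.Claim(repo, REFERENCE_URL_PROPERTY)
    source.setTarget(reference_url(locode))
    claim.addSources([source], summary="Add UNECE reference URL")
```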
- I did a sample run of 100 cities with the bot, and it seems to be doing OK. There's one problem that I can see so far, which you can see in Q30022406. It's a town in Italy, the capital of a third-level administrative division called a municipality. The municipality has already been tagged with the same code, so there's a conflict. There seem to have been a few like this. I'm going to see if I can figure out how to avoid these conflicts, although I think the intent of UN/LOCODE is for urban settlements, not for administrative divisions. --EvanProdromou (talk) 04:18, 28 February 2024 (UTC)
- I did a second sample run, taking the easy way out of just skipping cities/municipalities in Italy. --EvanProdromou (talk) 19:40, 12 March 2024 (UTC)
- I am going to approve the bot in a couple of days provided no objections have been raised.--Ymblanter (talk) 19:53, 12 March 2024 (UTC)
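One way to express this kind of skip rule, together with a more general guard against reusing a code that some other item already carries, is sketched below. The `filter_uploads` function and its parameters are hypothetical, and the set of already-assigned codes is assumed to come from an earlier extraction step.

```python
def filter_uploads(matches, existing_locodes, skip_countries=frozenset({"IT"})):
    """Drop matches that should not be uploaded.

    matches: iterable of (locode, qid) pairs from the matching step.
    existing_locodes: set of LOCODEs already present on some Wikidata item.
    skip_countries: alpha-2 codes to skip entirely (Italy, per the sample run).
    """
    kept = []
    for locode, qid in matches:
        country = locode[:2].upper()  # a LOCODE begins with the country code
        if country in skip_countries:
            continue
        if locode in existing_locodes:
            continue  # another item already holds this code; avoid a conflict
        kept.append((locode, qid))
    return kept
```

Checking against already-assigned codes would catch the town/municipality conflict in any country, not only Italy, at the cost of leaving those cases for manual review.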