Wikidata:Property proposal/numeric ID
numeric ID
Originally proposed at Wikidata:Property proposal/Generic
Description | the stable numeric identifier for an entry on a website, used as qualifier for potentially unstable human-readable values |
---|---|
Data type | External identifier |
Allowed values | Any |
Example 1 | Kadim Al Sahir (Q1362223) → SoundCloud ID (P3040) → 72830036 (used as a qualifier) |
Example 2 | Justin Bieber (Q34086) → Genius artist ID (P2373) → 357 (used as a qualifier) |
Example 3 | Glenn Greenwald (Q5568842) → TED speaker ID (P2611) → 2081 (used as a qualifier) |
Example 4 | Signal (Q19718090) → Instagram username (P2003) → 8725236333 (used as a qualifier) |
Planned use | Ensure that human-readable identifiers (e.g. usernames/slugs) are still current |
Number of IDs in source | Usually one, but can be more in rare exceptions. |
Expected completeness | eventually complete (Q21873974) |
Robot and gadget jobs | Bots should be used to initially fetch the numeric ID and update the human-readable username/slug if it changes. |
Motivation
We use human-readable values (e.g. usernames and slugs) as identifiers. These identifiers are easier to obtain and make it possible to link to the full external website experience. However, they are too often unstable and can change, resulting in obsolete statements. We have been trying to solve this issue using website-specific properties such as Genius artist numeric ID (P6351) and X numeric user ID (P6552), but this is limiting and biased towards giant digital monopolies. A better approach would be to have a generic property that can be used as a qualifier for any potentially unstable identifier. For example, if the only identifier we have for an account on SoundCloud is the human-readable username (e.g. oum), it will be rendered useless once the username changes, and it would be difficult to trace the target. As a solution, we should link both the human-readable identifier and, as a qualifier, the site-specific numeric ID (in the oum example: 553726). OsamaK (talk) 06:57, 7 July 2020 (UTC)
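The bot workflow described above could look roughly like the following. This is only a minimal sketch under stated assumptions: `IdentifierStatement`, `refresh_statement`, and the `lookup_username` hook are hypothetical names, not part of Wikibase or any existing bot framework, and the renamed username in the stand-in data is made up for illustration.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class IdentifierStatement:
    """A human-readable identifier qualified by a stable numeric ID."""
    property_id: str   # e.g. "P3040" (SoundCloud ID)
    value: str         # human-readable username or slug; may change
    numeric_id: str    # stable site-specific ID, stored as a qualifier


def refresh_statement(
    stmt: IdentifierStatement,
    lookup_username: Callable[[str], Optional[str]],
) -> IdentifierStatement:
    """Return stmt with its username updated from the stable numeric ID.

    lookup_username is a hypothetical hook: it takes a numeric ID and
    returns the current username, or None if the account cannot be found.
    """
    current = lookup_username(stmt.numeric_id)
    if current is None or current == stmt.value:
        return stmt  # nothing to change, or nothing we can verify
    return replace(stmt, value=current)


# Stand-in for a real site lookup; the new username here is invented.
fake_site = {"553726": "oum-official"}

old = IdentifierStatement("P3040", "oum", "553726")
new = refresh_statement(old, fake_site.get)
```

In this sketch the statement is left untouched when the lookup fails, so a bot would not overwrite a username it cannot verify.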
Discussion
- Comment I think Genius artist numeric ID (P6351) is more consistent with Wikidata's concept for external identifier and its formatter URL. It uses a separate property for the stable identifier. The ideal way would be to qualify that with the username. --- Jura 08:01, 7 July 2020 (UTC)
- @Jura1: I feel that if we had to generalize a rule on "human-readable identifiers linked to full-featured website" vs. "stable identifiers accessed via an API or otherwise limited website version", I would choose the former, human-friendly version to make contributing far more accessible for users regardless of their technical background. In a cooperative project, this has to bear immense value. The stabilizing work should be left to bots.--OsamaK (talk) 08:37, 7 July 2020 (UTC)
- It seems somewhat hard to get Wikibase to change for the above to work. If the numeric ones are stable, there isn't much input needed once the bot sets them correctly. --- Jura 08:41, 7 July 2020 (UTC)
- What kind of change is referred to here? When it comes to social media identifiers, for example, we are already using human-friendly identifiers wherever possible. --OsamaK (talk) 08:47, 7 July 2020 (UTC)
- We try (tried) to use stable identifiers. The above samples can only link in your proposal. --- Jura 09:16, 7 July 2020 (UTC)
- It's true that linking cannot currently be implemented with such a generic property. The workaround would be to have a separate property for each stable website identifier or to redefine the existing properties to refer to stable identifiers instead of the human-friendly ones. Neither is practical, and our decision-making mechanism in Wikidata would fail to get this task done (it would take months if not years). I would accept the downside of giving up linking in exchange for one generic property for stable identifiers.--OsamaK (talk) 10:58, 7 July 2020 (UTC)
- Oppose The logic seems backwards here to me. I don't understand why we should prioritise human readable identifiers which are unstable? This is Wikidata - machine readable, structured data. You mention in the motivation that these unstable identifiers become useless as soon as they change, but to me that implies that they are inherently useless as a standalone identifier at all times because with just the unstable identifier alone there is no way to tell if it was ever true/untrue without a stable reference.
In contrast, with a stable identifier, you know it doesn't change - so you can always access the identifier and check if the subject is correct to know that it always was or wasn't correct. I think it makes more sense to use the stable numeric identifiers (the true identifiers), and capture username data separately using website username or ID (P554). In theory, any relevant human readable string (not a true identifier) can be retrieved by machine via access using the numeric identifier. An extended discussion around this topic also took place at Wikidata:Requests_for_permissions/Bot/SilentSpikeBot. --SilentSpike (talk) 11:35, 10 July 2020 (UTC)
- @SilentSpike, ChristianKl: I see your point. If the stable identifier is easily accessible, it should be adopted. In other cases, however, the logic behind using the human-readable property is to lower the technical bar required to contribute, which is an immensely worthwhile goal. In the end, this is what Wikimedia projects have always been about. Wikidata, specifically, has always been described as "read and edited by both humans and machines." If the baseline were to require people to dig into HTML or some (private?) API, we would only be open to a small minority of contributors. We need anyone to be able to fill in a Twitter or SoundCloud username with minimal barriers. Sustainability and permanence can be achieved by bots without compromising the mission or utility.--OsamaK (talk) 12:33, 18 July 2020 (UTC)
- Oppose per SilentSpike. ChristianKl ❪✉❫ 18:22, 16 July 2020 (UTC)
- I agree we should always have both. Not sure if this is the right way to name or qualify the numeric name/id vs the human-readable id. Is there no other current property that can be used to say "additional ID used by this catalog/site/project"? Sj (talk) 15:22, 17 August 2020 (UTC)
- I propose to introduce a new datatype. See phab:T260639.--GZWDer (talk) 00:59, 18 August 2020 (UTC)
- Oppose as per SilentSpike. Even if there were no such argument against, the original proposal was flawed, since stable identifiers are just that and are not always numbers. To that end I changed the datatype of the proposal to External identifier. —Uzume (talk) 02:30, 12 December 2020 (UTC)
- Not done There is no obvious consensus to promote creation. (`・ω・´) (talk) 09:41, 8 February 2021 (UTC)