Task to collect some preliminary work on T212843: [EPIC] Access to Wikidata's lexicographical data from Wiktionaries and other WMF sites. This initial implementation will likely not feature fine-grained usage tracking yet, and parser functions are out of scope for now.
Event Timeline
Change 544205 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] Add rudimentary mw.wikibase.lexeme Lua module
https://gerrit.wikimedia.org/r/544205
Change 544206 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] Add rudimentary mw.wikibase.lexeme.entity.lexeme Lua module
https://gerrit.wikimedia.org/r/544206
Change 544207 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] Make mw.wikibase.lexeme.entity.lexeme inherit mw.wikibase.entity
https://gerrit.wikimedia.org/r/544207
Change 544208 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] Specify Lua module to be used for Lexeme entities
https://gerrit.wikimedia.org/r/544208
Change 544234 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] Add documentation for rudimentary Lua modules
https://gerrit.wikimedia.org/r/544234
The patches linked above add support for code of the following sort:
mw.wikibase.lexeme.getLanguage( 'L1' )
mw.wikibase.getEntity( 'L2' ):getLexicalCategory()
Missing features:
- Lua modules for Senses and Forms, likewise wired up with mw.wikibase.getEntity()
- getSenses() and getForms() functions/methods in the Lexeme modules, returning “instances” of the corresponding modules
Also, lots of cleanup and testing is probably still needed.
Usage tracking is also going to be interesting. Currently, it’s strictly entity-based, as far as I can see (as opposed to page-based), both on the repo (wb_changes_subscription) and on the client (wbc_entity_usage). Does this mean that a Wiktionary page for one lexeme may end up with dozens, if not hundreds, of wbc_entity_usage rows, one per form (and aspect)? Or should we say that entity usage stops at subentities, and any usage of a lexeme implies usage of all of its forms? Or do we somehow group usages together, similar to what is done for other aspects, and turn form usages into one “all forms of this lexeme” usage once they exceed a certain threshold?
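To make the third option concrete, here is a rough sketch of what threshold-based grouping could look like. It is only an illustration in plain Lua, not code from Wikibase: the function name collapseSubentityUsages, the threshold parameter, and the choice to collapse into a lexeme-wide “all” usage (L1#X) are all assumptions, and the usage strings merely follow the EntityId#Aspect notation used in this task.

```lua
-- Hypothetical sketch: collapse many subentity usages into one lexeme-wide usage.
-- Neither the function name nor the threshold exists in Wikibase; both are
-- assumptions made for illustration.
local function collapseSubentityUsages( usages, threshold )
	local result = {}
	local byLexeme = {}
	local order = {}

	for _, usage in ipairs( usages ) do
		-- 'L1-F2#X' and 'L1#X' both belong to lexeme 'L1'
		local lexemeId = usage:match( '^(L%d+)' )
		if lexemeId then
			if not byLexeme[lexemeId] then
				byLexeme[lexemeId] = {}
				order[#order + 1] = lexemeId
			end
			table.insert( byLexeme[lexemeId], usage )
		else
			-- usages of other entity types are passed through unchanged
			result[#result + 1] = usage
		end
	end

	for _, lexemeId in ipairs( order ) do
		local group = byLexeme[lexemeId]
		if #group > threshold then
			-- too many rows for this lexeme: keep a single "all" usage instead
			result[#result + 1] = lexemeId .. '#X'
		else
			for _, usage in ipairs( group ) do
				result[#result + 1] = usage
			end
		end
	end

	return result
end

local collapsed = collapseSubentityUsages(
	{ 'L1-F1#X', 'L1-F2#X', 'L1-F3#X', 'L2-F1#X' },
	2
)
```

With a threshold of 2, the three form usages of L1 collapse into one row while the single L2 form usage is kept as it is.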
Change 545377 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] Add all-usage for all subentities
https://gerrit.wikimedia.org/r/545377
Change 545378 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] Add getLemmas function to Lua modules
https://gerrit.wikimedia.org/r/545378
Change 545379 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] Add Lua module for Forms
https://gerrit.wikimedia.org/r/545379
Change 545537 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] Add Lua module for Senses
https://gerrit.wikimedia.org/r/545537
Change 544205 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Add rudimentary mw.wikibase.lexeme Lua module
https://gerrit.wikimedia.org/r/544205
Change 544206 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Add rudimentary mw.wikibase.lexeme.entity.lexeme Lua module
https://gerrit.wikimedia.org/r/544206
Change 544207 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Make mw.wikibase.lexeme.entity.lexeme inherit mw.wikibase.entity
https://gerrit.wikimedia.org/r/544207
Change 544208 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Specify Lua module to be used for Lexeme entities
https://gerrit.wikimedia.org/r/544208
Change 544234 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Add documentation for rudimentary Lua modules
https://gerrit.wikimedia.org/r/544234
Change 545377 abandoned by Lucas Werkmeister (WMDE):
Add all-usage for all subentities
Reason:
not necessary after all
https://gerrit.wikimedia.org/r/545377
Change 545378 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Add getLemmas function to Lua modules
https://gerrit.wikimedia.org/r/545378
Change 550662 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] Change function declarations to Lua style
https://gerrit.wikimedia.org/r/550662
Change 554116 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] Capitalize Lexeme more consistently
https://gerrit.wikimedia.org/r/554116
Change 554117 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] Add mw.wikibase.lexeme.splitLexemeId function
https://gerrit.wikimedia.org/r/554117
Change 554116 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Capitalize Lexeme more consistently
https://gerrit.wikimedia.org/r/554116
Change 554117 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Add mw.wikibase.lexeme.splitLexemeId function
https://gerrit.wikimedia.org/r/554117
One request: could we guard the code behind a per-project feature flag? That way we can deploy it and switch it on or off through configuration.
It already is behind a feature flag, $wgLexemeEnableDataTransclusion (after all, the first changes were already merged).
Change 545379 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Add Lua module for Forms
https://gerrit.wikimedia.org/r/545379
Change 545537 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Add Lua module for Senses
https://gerrit.wikimedia.org/r/545537
Change 550662 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Change function declarations to Lua style
https://gerrit.wikimedia.org/r/550662
The currently merged code tracks lots of ‘X’ (“all”) usages, but it still doesn’t track enough usage. Specifically, if you use mw.wikibase.getEntity( 'L1-S1' ), then the page will get a usage for L1-S1#X, but not for L1; and because we only look for pages using L1 when dispatching changes, the page won’t be notified when the lexeme is edited, and may continue to show outdated data.
I think fixing this is a hard requirement before we enable lexeme data transclusion in production. The easiest solution would be to make sure that mw.wikibase.getEntity( 'L1-S1' ) also tracks an L1#X usage; I’ll see if I can make that work.
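As a rough illustration of the proposed fix, the sketch below derives the parent lexeme ID from a sense or form ID and returns both usages to track. It is plain Lua and not the actual WikibaseLexeme implementation: the function name usagesToTrack is hypothetical, and the ID pattern (L, digits, then -F or -S and digits) is inferred from the IDs mentioned in this task.

```lua
-- Hypothetical helper: which "all" (X) usages should be recorded for an entity ID?
-- For a sense or form, the parent lexeme is included so that edits to the
-- lexeme reach pages that only loaded one of its subentities.
local function usagesToTrack( entityId )
	local lexemeId = entityId:match( '^(L%d+)%-[FS]%d+$' )
	if lexemeId then
		return { entityId .. '#X', lexemeId .. '#X' }
	end
	-- lexemes themselves and other entity types need no extra usage
	return { entityId .. '#X' }
end

local senseUsages = usagesToTrack( 'L1-S1' )
local lexemeUsages = usagesToTrack( 'L1' )
```

Here senseUsages holds both L1-S1#X and L1#X, so a dispatcher that only looks up pages using L1 would still find the page.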
Change 732998 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] Track “all” usage for whole Lexeme instead of Sense/Form
https://gerrit.wikimedia.org/r/732998
Hm, there’s another thing that I forgot wasn’t done yet: the senses (and probably forms) of a returned lexeme entity aren’t entities themselves, they’re ordinary tables. Only the custom getForms() and getSenses() methods take care of properly creating entities.
mw.wikibase.getEntity('L1').senses[1]:getGlosses() -- error: attempt to call method 'getGlosses' (a nil value)
mw.wikibase.getEntity('L1'):getSenses()[1]:getGlosses() -- works
This isn’t as serious as the other issue – by the time getEntity('L1') returns, we’ve already tracked an “all” usage on L1, so being able to get the senses/forms without proper metatables doesn’t constitute a bypass of usage tracking or anything – but it’s still kind of strange, I guess…
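The difference comes down to metatables, which the following self-contained mock tries to show. The Sense and Lexeme tables here are stand-ins written for this example, not the real mw.wikibase.lexeme.entity modules, and the gloss data is made up.

```lua
-- Stand-in for a Sense module; the real module is not reproduced here.
local Sense = {}
Sense.__index = Sense

function Sense:getGlosses()
	return self.glosses
end

-- Stand-in for a Lexeme module whose getSenses() wraps the raw sense data.
local Lexeme = {}
Lexeme.__index = Lexeme

function Lexeme:getSenses()
	local wrapped = {}
	for i, data in ipairs( self.senses ) do
		-- build a new table with the Sense metatable; the raw data stays plain
		wrapped[i] = setmetatable( { glosses = data.glosses }, Sense )
	end
	return wrapped
end

local lexeme = setmetatable( {
	senses = { { glosses = { en = 'example gloss' } } },
}, Lexeme )

-- The raw table has no metatable, so the method lookup yields nil and the call fails.
local rawCallSucceeded = pcall( function()
	return lexeme.senses[1]:getGlosses()
end )

-- Going through getSenses() returns a wrapped object, so the method exists.
local glosses = lexeme:getSenses()[1]:getGlosses()
```

In this mock, rawCallSucceeded is false while glosses contains the made-up English gloss, mirroring the two lines above.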
Change 732998 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Track “all” usage for whole Lexeme instead of Sense/Form
https://gerrit.wikimedia.org/r/732998
The senses and forms of a returned lexeme entity aren’t entities themselves, they’re ordinary tables. Only the custom getForms() and getSenses() methods take care of properly creating entities.
I think we can leave this open for feedback after the initial Beta rollout. Should getForms() and getSenses() exist at all? Or should .forms and .senses contain entity objects already? And in either case, should they be indexed numerically (1, 2, …) or by ID (L1-F1, L1-F2, … – or just F1, F2, …?)? Maybe the initial testers have some feedback on this.
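For comparison, the two indexing options could look roughly like this. The helper indexById is hypothetical, and the mock form tables only assume that each form carries its full ID in an id field.

```lua
-- Hypothetical helper: turn a numerically indexed list of subentities
-- into a table keyed by their IDs.
local function indexById( subentities )
	local byId = {}
	for _, subentity in ipairs( subentities ) do
		byId[subentity.id] = subentity
	end
	return byId
end

-- Mock data standing in for the forms of a lexeme.
local forms = { { id = 'L1-F1' }, { id = 'L1-F2' } }
local formsById = indexById( forms )

local secondByPosition = forms[2]       -- numerical indexing
local secondById = formsById['L1-F2']   -- indexing by ID
```

Numerical indexing keeps the order of the list and works with ipairs, whereas indexing by ID allows direct lookup but has no defined iteration order under pairs.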
I think we can leave .forms and .senses as they are at the moment – not documented as part of the stable interface, but not particularly hidden either. Similar to the .claims on all entities (I suppose they’re .statements on MediaInfo?), where we expect users to use :getAllStatements() and other functions instead.
Change 805771 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] Declare Lexeme Lua interface stable
https://gerrit.wikimedia.org/r/805771
Change 805771 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Declare Lexeme Lua interface stable
https://gerrit.wikimedia.org/r/805771