- Training models
- Darija Wikipedia ary
- Danish Wikipedia da
- Dinka Wikipedia din
- [x] Zazaki Wikipedia diq
- Lower Sorbian Wikipedia dsb
- [x] Divehi Wikipedia dv
- Dzongkha Wikipedia dz (see T304551#8412493)
- Ewe Wikipedia ee
- Greek Wikipedia el
- Emiliano-Romagnolo Wikipedia eml
- Esperanto Wikipedia eo
- Estonian Wikipedia et
- Basque Wikipedia eu
- Extremaduran Wikipedia ext
- Tumbuka Wikipedia tum
- Models verification
- Publish Datasets
- Populate the excluded section titles
- Deploy back-end
- Check how the model works on the wikis
- In Search, use hasrecommendation:link to find articles
- Test them on https://api.wikimedia.org/service/linkrecommendation/apidocs/#/default/get_v1_linkrecommendations__project___domain___page_title_
- Inform communities
- Deploy front-end
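The two "check how the model works" steps above can be scripted. What follows is a minimal sketch that only builds the request URLs. The service path layout (`/v1/linkrecommendations/{project}/{domain}/{page_title}`) is inferred from the API docs anchor in the checklist, and the search URL assumes the standard MediaWiki Action API `list=search` module. The function names and the sample page title are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build URLs for checking link recommendations on a wiki.
# Assumptions: the path layout is inferred from the API docs anchor, and
# titles are passed already URL-safe (ASCII, underscores); no
# percent-encoding is done here.

# URL of the link recommendation service for one page.
linkrec_url() {
  local project=$1 domain=$2 title=$3
  printf 'https://api.wikimedia.org/service/linkrecommendation/v1/linkrecommendations/%s/%s/%s\n' \
    "$project" "$domain" "$title"
}

# Action API search URL listing articles that have a link recommendation.
search_url() {
  local domain=$1 project=$2
  printf 'https://%s.%s.org/w/api.php?action=query&list=search&srsearch=hasrecommendation:link&format=json\n' \
    "$domain" "$project"
}

# Example_page is a placeholder title, not a real article.
linkrec_url wikipedia el Example_page
search_url el wikipedia
```

Either URL can then be fetched with, for example, `curl -s "$(search_url el wikipedia)"`. An empty search result for a wiki suggests that no recommendations have been generated for it yet.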
Description
Details
Status | Assigned | Task
---|---|---
Open | lbowmaker | T307881 Scaling of link suggestions service
Open | Trizek-WMF | T304110 [EPIC] Deploy "add a link" to all Wikipedias
Resolved | Sgs | T304551 Deploy "add a link" to 7th round of wikis
Event Timeline
I added Darija Wikipedia (ary), which had been left off my lists when I created the deployment rounds.
14/15 models were trained successfully in the 7th round of wikis.
The Dzongkha Wikipedia (dzwiki) returned the error in the screenshot below.
I checked the database dumps for dzwiki and they exist.
Going to investigate what the problem could be.
I asked @MGerlach whether this error means that there is not enough data to train the model and he said:
Interesting. Indeed, it seems that there is not enough data to train the model. The wiki has only around 500 articles (wikistats). Looking at the wiki, it seems that most of the articles contain few or no links (I checked a few examples of random pages https://dz.wikipedia.org/wiki/Special:Random ). This means we don't actually have any training examples. As a result, it seems that the table with the features is empty.
For now, it's better to skip dzwiki; we can train its model in the future once there is enough training data.
Model evaluation has been completed and below are the backtesting results:
Wiki | Precision | Recall
---|---|---
arywiki | 0.79 | 0.44
dawiki | 0.79 | 0.48
dinwiki | 0.94 | 0.48
diqwiki | 0.40 | 0.90
dsbwiki | 0.89 | 0.66
dvwiki | 0.67 | 0.02
eewiki | 0.93 | 0.82
elwiki | 0.79 | 0.44
emlwiki | 0.89 | 0.57
eowiki | 0.82 | 0.51
etwiki | 0.77 | 0.33
euwiki | 0.86 | 0.37
extwiki | 0.75 | 0.50
tumwiki | 0.84 | 0.62
CCing @MGerlach, in case he would like to add comments on the backtesting evaluation.
The conclusion from the backtesting results is that most of the languages look fine, but there are some red flags:
- diqwiki has very low precision (0.40).
- although dvwiki's precision (0.67) is not so far from the recommended one (0.75), it has an extremely low recall (0.02).
Talked to @MGerlach about diqwiki and dvwiki and he said:
I would agree with your observation and would recommend not deploying to diqwiki and dvwiki.
- diqwiki: precision too low, i.e. recommendations are not good
- dvwiki: recall extremely low. this indicates that we will likely not be able to generate many recommendations.
As recommended, it's best not to proceed with diqwiki and dvwiki until there is improved performance.
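The rule of thumb applied in this review can be sketched as a small filter over the backtesting table. This is an illustration, not the script used for the evaluation: the 0.75 precision threshold is the recommended value quoted above, while the 0.10 recall floor is a hypothetical value chosen only so that the example runs, since the thread does not state a recall threshold.

```shell
#!/usr/bin/env bash
# Sketch: flag wikis whose backtesting scores look too weak to deploy.
# MIN_PRECISION=0.75 is the recommended precision quoted in the thread.
# MIN_RECALL=0.10 is a hypothetical floor, assumed for illustration only.
MIN_PRECISION=0.75
MIN_RECALL=0.10

# Backtesting results from the table above: wiki, precision, recall.
BACKTEST='arywiki 0.79 0.44
dawiki 0.79 0.48
dinwiki 0.94 0.48
diqwiki 0.40 0.90
dsbwiki 0.89 0.66
dvwiki 0.67 0.02
eewiki 0.93 0.82
elwiki 0.79 0.44
emlwiki 0.89 0.57
eowiki 0.82 0.51
etwiki 0.77 0.33
euwiki 0.86 0.37
extwiki 0.75 0.50
tumwiki 0.84 0.62'

# Reads "wiki precision recall" rows on stdin and prints one line per
# flagged wiki, listing every threshold that the wiki misses.
flag_wikis() {
  awk -v p="$MIN_PRECISION" -v r="$MIN_RECALL" '{
    reason = ""
    if ($2 + 0 < p + 0) reason = "precision " $2
    if ($3 + 0 < r + 0) reason = (reason == "" ? "" : reason ", ") "recall " $3
    if (reason != "") print $1 ": " reason
  }'
}

printf '%s\n' "$BACKTEST" | flag_wikis
```

With these thresholds only diqwiki and dvwiki are flagged, which leaves 12 of the 14 trained models.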
@kostajh, we published datasets for the 12 of 15 models in this round that passed the evaluation.
I ran this script to add the link-recommendation task type and populate the excluded sections:
PHAB=T304551
for WIKI in arywiki dawiki dinwiki diqwiki dsbwiki dvwiki eewiki elwiki emlwiki eowiki etwiki euwiki extwiki tumwiki; do
    ORIGIN=`mwscript getConfiguration.php $WIKI --settings 'wgCanonicalServer' --format json | jq --raw-output '.wgCanonicalServer'`
    mwscript extensions/GrowthExperiments/maintenance/changeWikiConfig.php $WIKI \
        --page MediaWiki:NewcomerTasks.json \
        --create-only \
        --json \
        --summary "Growth features configuration boilerplate ([[phab:$PHAB]])" \
        link-recommendation \
        '{ "type": "link-recommendation", "group": "easy" }'
    jq "select(.wiki==\"$WIKI\" and .probability > 0.25) | .section" wiki_sections.jsonl \
        | jq --slurp --compact-output "unique" \
        | mwscript extensions/GrowthExperiments/maintenance/changeWikiConfig.php $WIKI \
            --page MediaWiki:NewcomerTasks.json \
            --json \
            --summary "machine-generated configuration for excluding sections from link recommendations ([[phab:$PHAB]]), feel free to improve" \
            link-recommendation.excludedSections \
            "`cat`"
    echo "$ORIGIN/wiki/MediaWiki:NewcomerTasks.json"
    echo "$ORIGIN/w/index.php?title=MediaWiki:NewcomerTasks.json&diff=next"
    echo "Press <Enter> to continue"
    read # give time for manual verification
done
I checked the configuration and it appears to have been updated correctly on all wikis. The only thing worth mentioning is tumwiki, which did not get any excluded sections in its config.
Change 892363 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):
[operations/mediawiki-config@master] GrowthExperiments: Enable link recommendation for 7th round wikis
https://gerrit.wikimedia.org/r/892363
Change 892363 merged by jenkins-bot:
[operations/mediawiki-config@master] GrowthExperiments: Enable backend of link recommendation for 7, 8, 9th round wikis
https://gerrit.wikimedia.org/r/892363
Mentioned in SAL (#wikimedia-operations) [2023-03-15T20:13:23Z] <samtar@deploy2002> Started scap: Backport for [[gerrit:899673|GrowthExperiments: enable frontend of link recommendation for 6th round wikis (T304550)]], [[gerrit:892363|GrowthExperiments: Enable backend of link recommendation for 7, 8, 9th round wikis (T304551 T308133 T308134)]]
Mentioned in SAL (#wikimedia-operations) [2023-03-15T20:14:55Z] <samtar@deploy2002> sgimeno and samtar: Backport for [[gerrit:899673|GrowthExperiments: enable frontend of link recommendation for 6th round wikis (T304550)]], [[gerrit:892363|GrowthExperiments: Enable backend of link recommendation for 7, 8, 9th round wikis (T304551 T308133 T308134)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
Mentioned in SAL (#wikimedia-operations) [2023-03-15T20:23:36Z] <samtar@deploy2002> Finished scap: Backport for [[gerrit:899673|GrowthExperiments: enable frontend of link recommendation for 6th round wikis (T304550)]], [[gerrit:892363|GrowthExperiments: Enable backend of link recommendation for 7, 8, 9th round wikis (T304551 T308133 T308134)]] (duration: 10m 12s)
Per @kevinbazira's comment above, it seems these two wikis have been red-flagged. I also missed this in the configuration step, so I will roll back the change there.
- ext.wp returns "There were no results matching the query."
The dataset seems to have been exported correctly and the wiki appears in the wikis.txt file. I'm investigating this.
Change 902131 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):
[operations/mediawiki-config@master] GrowthExperiments: disable add a link backend
https://gerrit.wikimedia.org/r/902131
Change 902131 merged by jenkins-bot:
[operations/mediawiki-config@master] GrowthExperiments: disable add a link backend
https://gerrit.wikimedia.org/r/902131
Mentioned in SAL (#wikimedia-operations) [2023-03-23T13:28:16Z] <samtar@deploy2002> Started scap: Backport for [[gerrit:902131|GrowthExperiments: disable add a link backend (T304551)]]
Mentioned in SAL (#wikimedia-operations) [2023-03-23T13:29:49Z] <samtar@deploy2002> samtar and sgimeno: Backport for [[gerrit:902131|GrowthExperiments: disable add a link backend (T304551)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
Mentioned in SAL (#wikimedia-operations) [2023-03-23T13:36:22Z] <samtar@deploy2002> Finished scap: Backport for [[gerrit:902131|GrowthExperiments: disable add a link backend (T304551)]] (duration: 08m 05s)
Sure. I've scheduled the deploy for today at 15h UTC+2. Are communities already informed?
Change 905950 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):
[operations/mediawiki-config@master] GrowthExperiments: enable add link frontend and backend
https://gerrit.wikimedia.org/r/905950
Communities haven't been informed yet, as I was waiting for your reply. One week has passed since I suggested the date. :)
We have to reschedule it for next week. Is Wed April 12 possible?
Change 907899 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):
[operations/mediawiki-config@master] GrowthExperiments: enable add link frontend in 7th round wikis
https://gerrit.wikimedia.org/r/907899
Change 907899 merged by jenkins-bot:
[operations/mediawiki-config@master] GrowthExperiments: enable add link frontend in 7,8th round wikis
https://gerrit.wikimedia.org/r/907899
Mentioned in SAL (#wikimedia-operations) [2023-04-12T13:07:11Z] <lucaswerkmeister-wmde@deploy2002> Started scap: Backport for [[gerrit:907899|GrowthExperiments: enable add link frontend in 7,8th round wikis (T304551 T308133)]]
Mentioned in SAL (#wikimedia-operations) [2023-04-12T13:08:33Z] <lucaswerkmeister-wmde@deploy2002> sgimeno and lucaswerkmeister-wmde: Backport for [[gerrit:907899|GrowthExperiments: enable add link frontend in 7,8th round wikis (T304551 T308133)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
Mentioned in SAL (#wikimedia-operations) [2023-04-12T13:20:42Z] <lucaswerkmeister-wmde@deploy2002> Finished scap: Backport for [[gerrit:907899|GrowthExperiments: enable add link frontend in 7,8th round wikis (T304551 T308133)]] (duration: 13m 30s)
Checked tumwiki, elwiki, and dawiki: the "Add a link" feature seems to be working as expected; no issues found.