User:Polygnotus/Todoes
Typotool
Combine
- User:Polygnotus/typo.js knows where NOT to fix typos.
- User:Polygnotus/Scripts/GetAPIBatch.js gets typos, FUWs and article names from the API.
- https://en.wikipedia.org/wiki/User:Cacycle/diff
- User:Polygnotus/Scripts/DiffCompare.js contains a diff comparison proof of concept based on https://github.com/google/diff-match-patch/tree/master/javascript; see also https://neil.fraser.name/software/diff_match_patch/demos/diff.html
- User:Polygnotus/Scripts/WikiTypoInterface.js cleans up the interface (a bit like User:Polygnotus/Scripts/NoDistractions.js) and adds duplicates of the Publish/preview/changes buttons to the top of the wikiEditor
- User:Polygnotus/Scripts/GetContext.js retrieves the context of a typo (Input: article name, word, nth occurrence. Output: the 50 words before and after that word)
- But what if there are multiple occurrences? How does Wikipedia:Correct_typos_in_one_click handle that?
- I could use something like User:Polygnotus/Scripts/WikiPageToArray.js to turn a wikipage into a JavaScript array. Perhaps use that for the lists of templates and parameters? (testpage)
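A minimal sketch of the GetContext idea that also deals with multiple occurrences by counting matches explicitly. It assumes whole-word matching on whitespace-separated tokens with surrounding punctuation stripped; `getContext` and its return shape are hypothetical, not the interface of the existing script.

```javascript
// Hypothetical helper: `radius` words of context on each side of the
// nth (1-based) whole-word occurrence of `word` in `text`.
// Returns null when there are fewer than n occurrences.
function getContext(text, word, n, radius = 50) {
  const tokens = text.split(/\s+/).filter(Boolean);
  // Strip leading/trailing punctuation so "teh," still counts as "teh".
  const bare = (t) => t.replace(/^[^\p{L}\p{N}]+|[^\p{L}\p{N}]+$/gu, '');
  let seen = 0;
  for (let i = 0; i < tokens.length; i++) {
    if (bare(tokens[i]) !== word) continue;
    seen += 1;
    if (seen === n) {
      return {
        before: tokens.slice(Math.max(0, i - radius), i).join(' '),
        match: tokens[i],
        after: tokens.slice(i + 1, i + 1 + radius).join(' '),
      };
    }
  }
  return null;
}

// The second "teh", with 1 word of context on each side.
console.log(getContext('one teh two teh, three teh four', 'teh', 2, 1));
// { before: 'two', match: 'teh,', after: 'three' }
```

Because "nth occurrence" depends on what is searched, the count in wikitext and in rendered HTML can differ; whatever produces the list and whatever applies the fix would need to count on the same text.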
Interface
- The tool should have buttons for [<<previous fuw] [<previous typo] [next typo>] [next fuw>>].
- It should show stats: how many typos were skipped and how many were fixed.
- It should have an [add to blacklist] button. What should it blacklist: the typo, the article, or the FUW?
- It should be possible to add templates such as {{as written}} and {{quote}}. (TODO: make a list of these.)
- It could have a form and a button for reporting that something should be excluded but has no regex written for it yet.
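A rough sketch of the state behind those buttons and stats. It assumes each queue item is a `{ group, typo }` object where `group` stands in for the FUW the item belongs to, and that items of one group are adjacent. `ReviewQueue` and its methods are hypothetical.

```javascript
// Hypothetical review queue behind the navigation buttons.
// `items` is an array of { group, typo } objects; `group` stands in for the
// FUW an item belongs to, and items of one group are assumed to be adjacent.
class ReviewQueue {
  constructor(items) {
    if (items.length === 0) throw new Error('nothing to review');
    this.items = items;
    this.pos = 0;
    this.decisions = new Map(); // position -> 'fixed' | 'skipped'
  }

  current() {
    return this.items[this.pos];
  }

  // [next typo>] and [<previous typo]
  nextTypo() {
    this.pos = Math.min(this.pos + 1, this.items.length - 1);
    return this.current();
  }

  prevTypo() {
    this.pos = Math.max(this.pos - 1, 0);
    return this.current();
  }

  // Index of the first item in the group containing position i.
  groupStart(i) {
    const g = this.items[i].group;
    while (i > 0 && this.items[i - 1].group === g) i--;
    return i;
  }

  // [next fuw>>]: first item of the following group (stays put on the last group).
  nextGroup() {
    const g = this.items[this.pos].group;
    let i = this.pos;
    while (i < this.items.length && this.items[i].group === g) i++;
    if (i < this.items.length) this.pos = i;
    return this.current();
  }

  // [<<previous fuw]: first item of the preceding group.
  prevGroup() {
    const start = this.groupStart(this.pos);
    if (start > 0) this.pos = this.groupStart(start - 1);
    return this.current();
  }

  // Record a decision for the current item and move on; revisiting an item
  // overwrites its earlier decision instead of counting it twice.
  decide(kind) {
    this.decisions.set(this.pos, kind);
    return this.nextTypo();
  }

  stats() {
    let fixed = 0;
    let skipped = 0;
    for (const d of this.decisions.values()) {
      if (d === 'fixed') fixed++;
      else skipped++;
    }
    return { fixed, skipped };
  }
}
```

Storing decisions per position, rather than incrementing counters, keeps the "how many skipped, how many fixed" numbers correct when someone goes back and changes their mind.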
"Inspiration"
- In the AWB source code, /WikiFunctions/WikiRegexes.cs contains the regexes that exclude matches for the chkIgnoreLinks and chkIgnoreMore options (top left of the "Find & Replace" window).
  - chkIgnoreLinks: Ignore external/interwiki links, images, nowiki, math and <!-- -->
  - chkIgnoreMore: Ignore templates, refs, link targets and headings
- Read the Wikipedia:Typo_Team/moss source code and documentation.
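A rough approximation of those two options, to run before searching for typos. The regexes are simplified stand-ins written for this note, not the ones in WikiRegexes.cs, and `maskIgnored` is a hypothetical name. Templates are matched innermost-first and the loop repeats until nothing changes, so nested templates get masked too; image captions that contain links are only partly covered.

```javascript
// Illustrative stand-ins for AWB's two options. These are simplified
// assumptions, NOT the regexes from WikiFunctions/WikiRegexes.cs.
const IGNORE_LINKS = [
  /<!--[\s\S]*?-->/g,                   // HTML comments
  /<nowiki>[\s\S]*?<\/nowiki>/gi,       // nowiki
  /<math[^>]*>[\s\S]*?<\/math>/gi,      // math
  /\[https?:\/\/[^\]]*\]/g,             // bracketed external links
  /\[\[[a-z][a-z-]*:[^\]]*\]\]/g,       // interwiki links (lowercase prefix)
  /\[\[(?:File|Image):[^\]]*\]\]/gi,    // images (captions with links only partly)
];
const IGNORE_MORE = [
  /\{\{[^{}]*\}\}/g,                    // innermost templates; the loop handles nesting
  /<ref\s[^>]*\/>/gi,                   // self-closing refs (must run before paired refs)
  /<ref(?:\s[^>]*)?>[\s\S]*?<\/ref>/gi, // paired refs
  /\[\[[^\]|]*\|/g,                     // link targets of piped links
  /^=+.*=+[ \t]*$/gm,                   // headings
];

// Replaces every ignored region with spaces (newlines kept), so offsets and
// line numbers in the result still line up with the original wikitext.
function maskIgnored(wikitext, { ignoreLinks = true, ignoreMore = true } = {}) {
  const patterns = [
    ...(ignoreLinks ? IGNORE_LINKS : []),
    ...(ignoreMore ? IGNORE_MORE : []),
  ];
  let out = wikitext;
  let prev;
  do {
    prev = out;
    for (const re of patterns) {
      out = out.replace(re, (m) => m.replace(/[^\n]/g, ' '));
    }
  } while (out !== prev); // repeat so outer templates are masked after inner ones
  return out;
}
```

Masking with spaces instead of deleting means a typo found at offset i in the masked text is at offset i in the real wikitext too, so the fix can be applied directly.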
Blacklisting/Whitelisting
Moss is pretty interesting.
Perhaps ask here what the best approach is. Parse a dump, run a query, whatever.
Wikipedia:Typo_Team/moss#How_the_lists_are_made says:
- Words that appear in titles in the English Wiktionary (which has definitions of all words in all languages, excluding proper nouns and systematic words like chemical names and large numbers)
- Words that appear in titles in the English Wikipedia (which explains some things that don't appear in the dictionary)
- Words that appear in titles in the Wikispecies (which has many technical words that don't appear in the dictionary or encyclopedia)
But I think I should also check out:
Wikidata:
- given name https://www.wikidata.org/wiki/Q202444
- family name https://www.wikidata.org/wiki/Q101352
- place names are complicated; see [1] and [2]
Wiktionary:
- https://en.wiktionary.org/wiki/Category:English_lemmas
- https://en.wiktionary.org/wiki/Category:English_nouns (contains places and things)
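A sketch of how those lists could be requested, as URL builders only. `labelQueryUrl` and `categoryMembersUrl` are hypothetical helpers; the Wikidata query uses P31 (instance of) with the two classes above, and the Wiktionary one uses the Action API's list=categorymembers module. Assumptions worth checking: many name items may store their label under a language code other than en (such as mul), and a query over all family names may hit the query service's time limit and need splitting.

```javascript
// Hypothetical URL builders; fetching and paging are left to the caller.
const WDQS_ENDPOINT = 'https://query.wikidata.org/sparql';

// SPARQL for the English labels of every item that is an instance of (P31)
// `classQid`, e.g. Q202444 (given name) or Q101352 (family name).
function labelQueryUrl(classQid) {
  const sparql = `SELECT ?label WHERE {
  ?item wdt:P31 wd:${classQid} ;
        rdfs:label ?label .
  FILTER(LANG(?label) = "en")
}`;
  return `${WDQS_ENDPOINT}?format=json&query=${encodeURIComponent(sparql)}`;
}

// One page of a category's members via the MediaWiki Action API
// (list=categorymembers). Pass the previous response's
// continue.cmcontinue value as `cont` to get the next page.
function categoryMembersUrl(apiBase, category, cont) {
  const params = new URLSearchParams({
    action: 'query',
    list: 'categorymembers',
    cmtitle: category,
    cmnamespace: '0',
    cmlimit: 'max',
    format: 'json',
    formatversion: '2',
  });
  if (cont) params.set('cmcontinue', cont);
  return `${apiBase}?${params}`;
}

console.log(labelQueryUrl('Q202444'));
console.log(categoryMembersUrl('https://en.wiktionary.org/w/api.php', 'Category:English_lemmas'));
```

To page through a category, fetch the URL, read `data.query.categorymembers`, and repeat with `data.continue.cmcontinue` until the response has no `continue` field. Called from a script running on Wikipedia, the Wiktionary request would probably also need `origin=*` for CORS.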
Titles
- https://dumps.wikimedia.org/enwiktionary/20240901/enwiktionary-20240901-all-titles-in-ns0.gz
- https://dumps.wikimedia.org/enwiki/20240901/enwiki-20240901-all-titles-in-ns0.gz
- https://dumps.wikimedia.org/specieswiki/20240901/specieswiki-20240901-all-titles-in-ns0.gz
- https://dumps.wikimedia.org/wikidatawiki/20240901/wikidatawiki-20240901-all-titles-in-ns0.gz
Let's see how many of these are in the list. Probably not many.
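One way to do that count without loading a whole title dump into memory: keep only the candidate words in a Map and stream the gzipped file line by line. It assumes one title per line with underscores for spaces; `findInTitleDump` is a hypothetical helper. Everything is lowercased, which is a deliberate simplification: Wiktionary titles are case-sensitive, so a stricter check would compare exact case as well.

```javascript
const fs = require('fs');
const readline = require('readline');
const zlib = require('zlib');

// Case-insensitive key: underscores become spaces, everything lowercased.
function normalizeTitle(s) {
  return s.replace(/_/g, ' ').trim().toLowerCase();
}

// Hypothetical helper: streams a gzipped all-titles-in-ns0 dump (assumed to
// be one title per line) and reports which of `words` occur as titles.
// Only the candidate words are held in memory, not the dump.
async function findInTitleDump(path, words) {
  const wanted = new Map(words.map((w) => [normalizeTitle(w), w]));
  const found = new Set();
  const rl = readline.createInterface({
    input: fs.createReadStream(path).pipe(zlib.createGunzip()),
    crlfDelay: Infinity,
  });
  for await (const line of rl) {
    const hit = wanted.get(normalizeTitle(line));
    if (hit !== undefined) found.add(hit);
  }
  return {
    found: words.filter((w) => found.has(w)),
    missing: words.filter((w) => !found.has(w)),
  };
}

// Example (needs the dump downloaded locally):
// findInTitleDump('enwiktionary-20240901-all-titles-in-ns0.gz', ['teh', 'recieve']).then(console.log);
```

Wikidata's ns0 titles are item IDs such as Q202444, so the wikidatawiki dump will not match words at all; its labels need a query instead.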
Allow people to easily judge the reliability of sources
- Make an API with 5 endpoints:
- voteup
- votedown
- trustedvoteup (which would count as, let's say, +5 votes)
- trustedvotedown (ditto, but -5)
- list
- Make a JavaScript user script that:
- adds up and down arrows to each source. Click the up arrow to vote that a source is reliable, the down arrow to vote that it is unreliable.
- colors the source a shade of green or red depending on the number of upvotes and downvotes, if it has more than x upvotes or downvotes.
- shows how many ratings this source has.
- Give the trusted people the ability to authenticate to the API and then rate sources.
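A small sketch of the scoring side, using the weights from the plan (+1/-1, and +5/-5 for trusted votes). `tally`, `colorFor` and the color scale are assumptions. The threshold `minVotes` plays the role of x and is compared against the number of votes rather than the weighted score, which is only one possible reading; the shade gets stronger the more one-sided the weighted votes are.

```javascript
// Weights from the plan: ordinary votes count 1, trusted votes count 5.
const WEIGHTS = { voteup: 1, votedown: -1, trustedvoteup: 5, trustedvotedown: -5 };

// Hypothetical tally for one source. `votes` is a list of vote types as
// named by the endpoints above.
function tally(votes) {
  let score = 0;  // weighted sum: positive leans reliable, negative unreliable
  let weight = 0; // sum of absolute weights, used to normalize the score
  for (const v of votes) {
    if (!Object.prototype.hasOwnProperty.call(WEIGHTS, v)) {
      throw new Error(`unknown vote type: ${v}`);
    }
    score += WEIGHTS[v];
    weight += Math.abs(WEIGHTS[v]);
  }
  return { score, weight, count: votes.length };
}

// CSS color for a source, or null while it has no more than `minVotes`
// votes (the "x" threshold) or the votes cancel out exactly.
function colorFor({ score, weight, count }, minVotes) {
  if (count <= minVotes || score === 0) return null;
  const ratio = Math.abs(score) / weight;         // 0..1: how one-sided the votes are
  const lightness = Math.round(95 - 35 * ratio); // pale (95%) to strong (60%)
  const hue = score > 0 ? 120 : 0;               // green = reliable, red = unreliable
  return `hsl(${hue}, 60%, ${lightness}%)`;
}
```

For example, two ordinary upvotes and one trusted downvote give a score of -3 out of a total weight of 7, so the source is shown in a pale red once more than x votes are in.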
I already made a list of the 10,000 most frequently referenced domains. I could use it to make a table, sorted by number of occurrences, where people can easily rate them.
Duplicate References
- User:Polygnotus/Scripts/ReferenceHighlighter.js highlights the [1]s when you click one of them, but not the a b c backlinks in the reflist. Fix that, then add it to Duplicate References. If I click one duplicate reference, I want all the others to be highlighted so I can easily see where a source is reused.
- Improve where the template is added, per MOS:ORDER; perhaps use Wikipedia:Morebits.
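A sketch of the highlighting, assuming the usual Cite output: each inline citation is a `sup.reference` whose id looks like `cite_ref-Smith_1-0` and whose link points at `#cite_note-Smith-1`, and the a b c backlinks in the reflist link back to those `cite_ref-` ids. Citations that share a footnote are then the same source, and their ids also identify the backlinks. The function names and the `dupref-highlight` class are hypothetical.

```javascript
// Hypothetical helper: inline citation ids grouped by the footnote they point to.
// refs: [{ id: 'cite_ref-Smith_1-0', href: '#cite_note-Smith-1' }, ...]
function groupByNote(refs) {
  const groups = new Map();
  for (const { id, href } of refs) {
    if (!groups.has(href)) groups.set(href, []);
    groups.get(href).push(id);
  }
  return groups;
}

// What to highlight when the citation `clickedId` is clicked: every inline
// citation sharing its footnote, and the reflist backlinks (a, b, c, ...)
// whose href points back at one of those citations.
function highlightTargets(refs, clickedId) {
  const clicked = refs.find((r) => r.id === clickedId);
  if (!clicked) return { inline: [], backlinks: [] };
  const inline = groupByNote(refs).get(clicked.href);
  return { inline, backlinks: inline.map((id) => `#${id}`) };
}

// Browser wiring; does nothing outside a page.
if (typeof document !== 'undefined') {
  const CLS = 'dupref-highlight'; // hypothetical class name, styled elsewhere
  const refs = [...document.querySelectorAll('sup.reference[id]')]
    .map((sup) => ({ id: sup.id, href: sup.querySelector('a')?.getAttribute('href') }))
    .filter((r) => r.href);
  document.addEventListener('click', (event) => {
    const sup = event.target.closest('sup.reference[id]');
    if (!sup) return;
    document.querySelectorAll(`.${CLS}`).forEach((el) => el.classList.remove(CLS));
    const { inline, backlinks } = highlightTargets(refs, sup.id);
    inline.forEach((id) => document.getElementById(id)?.classList.add(CLS));
    const wanted = new Set(backlinks);
    document.querySelectorAll('.references a[href^="#cite_ref-"]').forEach((a) => {
      if (wanted.has(a.getAttribute('href'))) a.classList.add(CLS);
    });
  });
}
```

Keeping the grouping logic in plain functions separates it from the DOM code, so it can be checked without a browser.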
Toolhub
- Check which of the tools return a 404, like https://toolhub.wikimedia.org/tools/toolforge-missingpedia
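A sketch of the check, assuming a list of URLs is already at hand (how to collect them from Toolhub is left open). `findBroken` is hypothetical. It reports every non-OK status, not just 404, since a dead tool might answer with something else, and it runs a few workers in parallel so the list is not fetched one URL at a time. The fetch function is a parameter so it can be faked in tests; by default it uses the global fetch available in browsers and Node 18+.

```javascript
// Hypothetical checker: returns { url, status } for every URL that did not
// answer with a 2xx status. status is null for network-level failures.
async function findBroken(urls, fetchFn = globalThis.fetch, concurrency = 5) {
  const results = new Map();
  let next = 0;
  async function worker() {
    while (next < urls.length) {
      const url = urls[next++];
      try {
        const res = await fetchFn(url, { redirect: 'follow' });
        if (!res.ok) results.set(url, res.status);
      } catch {
        results.set(url, null); // DNS failure, refused connection, TLS error, ...
      }
    }
  }
  const workers = Array.from({ length: Math.min(concurrency, urls.length) }, () => worker());
  await Promise.all(workers);
  // Report in input order, regardless of which request finished first.
  return urls.filter((u) => results.has(u)).map((url) => ({ url, status: results.get(url) }));
}

// Example (real network access):
// findBroken(['https://toolhub.wikimedia.org/']).then(console.table);
```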
Identical references
- Wikipedia:Village_pump_(technical)/Archive_213#Duplicated_citations
- Wikipedia:Bot_requests#Bot_that_condenses_identical_references
Also: detect CS1 and CS2 errors by bot. I think the REST API was the most viable solution. Maybe transforms?
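A first step toward condensing identical references: find `<ref>` bodies that appear more than once. This sketch compares bodies after collapsing whitespace, skips self-closing reuses like `<ref name="x" />`, and does not exclude refs inside comments or nowiki; `findIdenticalRefs` is hypothetical. A bot would then give the first copy a name (if it has none) and turn the others into self-closing references to it.

```javascript
// Hypothetical helper: <ref>...</ref> bodies that occur more than once.
// Bodies are compared after collapsing whitespace. Self-closing reuses such as
// <ref name="x" /> have no body and are skipped by the (?<!\/) lookbehind.
// Refs inside comments or <nowiki> are not excluded here.
function findIdenticalRefs(wikitext) {
  const re = /<ref(\s[^>]*?)?(?<!\/)>([\s\S]*?)<\/ref\s*>/gi;
  const groups = new Map(); // normalized body -> offsets of each opening <ref
  for (const m of wikitext.matchAll(re)) {
    const body = m[2].replace(/\s+/g, ' ').trim();
    if (!body) continue;
    if (!groups.has(body)) groups.set(body, []);
    groups.get(body).push(m.index);
  }
  return [...groups]
    .filter(([, offsets]) => offsets.length > 1)
    .map(([body, offsets]) => ({ body, offsets }));
}
```

Exact matching after whitespace normalization is deliberately conservative: citations that differ only in parameter order or access date would not be grouped.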
Google Books
Diff CSS
- I had a trick to ensure the Diff CSS got loaded, IIRC, but then I forgot it. It wasn't in chunk0, nor in the links to the chunks. Adding ?diff=0 does work, but I had a more elegant solution, IIRC. User:Polygnotus/hmm?diff=0
- Or was it just that I added the CSS in the HTML version? If so, I should inline it in the wikicode version.
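If the goal is only to get the diff styling onto a non-diff page for myself, a user script could ask ResourceLoader for it. This is a hedged sketch, not the forgotten trick: it assumes the core style module is still called `mediawiki.diff.styles`, and `ensureDiffStyles` is a hypothetical name. It only helps people who run the script, so readers of the page would still need the CSS inlined in the wikicode.

```javascript
// Asks ResourceLoader for the core diff styles on a page that is not itself
// a diff (e.g. a user page that contains copied diff tables).
// Assumes the module is still named 'mediawiki.diff.styles'.
function ensureDiffStyles(mwObj = globalThis.mw) {
  if (!mwObj || !mwObj.loader) return false; // not running inside MediaWiki
  mwObj.loader.load('mediawiki.diff.styles');
  return true;
}

ensureDiffStyles();
```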