Spike: How to obtain articles that have images with missing alt text
Open, Low, Public

Authored By

	Dbrant
	Aug 16 2023, 7:00 PM

Description

Here are some thoughts on alt text:

Question 1: If we have the wikitext of an article, how do we tell if it has images with missing alt text?

In principle this can be done with a regex. Supposing that we have the full wikitext of an article, we can search it for the following regex:
\[\[File:((?!\|\s*alt\s*=)[^]])*\]\]

If this regex produces matches, then each of those matches will be an instance of a [[File:...]] link that doesn’t have an alt= parameter.
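As a rough illustration of the check described above, here is a minimal Python sketch that applies the same regex to a wikitext string. The function name is made up for illustration. It inherits the regex's limits: because [^]] stops at the first ], a caption containing a nested [[link]] ends the match early, so a link whose alt= comes after such a nested link would still be reported as missing alt text.

```python
import re

# The regex from the description: a [[File:...]] link with no "|alt=" inside it.
MISSING_ALT = re.compile(r"\[\[File:((?!\|\s*alt\s*=)[^]])*\]\]")


def find_file_links_missing_alt(wikitext):
    """Return (start, end, link_text) for every [[File:...]] link without alt=.

    Hypothetical helper for illustration; English-only (File: / alt=).
    """
    return [(m.start(), m.end(), m.group(0)) for m in MISSING_ALT.finditer(wikitext)]
```

The start offset of each match is what a later editing step would need in order to modify the link in place.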

Keep in mind, though, that the names of the File: namespace and the alt= parameter are specific to English Wikipedia and would need to be localized for other language wikis. (I'm pretty sure we hard-code a list of File namespace localizations, and the alt parameter is a magic word whose localizations can be fetched via the API.)
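One possible way to handle localization is to build the regex from lists of namespace names and alt-parameter aliases instead of hard-coding "File" and "alt". The sketch below assumes those lists have already been obtained somehow (for example from the wiki's site info); the function name, the "alt=$1" alias format, and the German sample values are assumptions for illustration, not confirmed details of any client.

```python
import re


def build_missing_alt_regex(file_ns_names, alt_aliases):
    """Build a localized variant of the missing-alt regex.

    file_ns_names: names/aliases of the File namespace, e.g. ["File", "Datei"].
    alt_aliases:   alt magic-word aliases, assumed to look like "alt=$1".
    """
    ns = "|".join(re.escape(name) for name in file_ns_names)
    # Keep only the parameter name in front of "=" (e.g. "alt" from "alt=$1").
    keys = "|".join(re.escape(alias.split("=")[0]) for alias in alt_aliases)
    # Namespace names are matched case-insensitively; parameter names are not.
    pattern = (
        r"\[\[(?i:" + ns + r"):"
        r"((?!\|\s*(?:" + keys + r")\s*=)[^]])*\]\]"
    )
    return re.compile(pattern)
```

For example, build_missing_alt_regex(["File", "Datei"], ["alt=$1", "alternativtext=$1"]) would treat both |alt= and |alternativtext= as existing alt text.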

Question 2: How do we insert alt text into an existing File link?

For each match of the regex above, insert alt text as follows:

1. Go to the location of the regex match.
2. Parse until the end of the "File:..." name, i.e. until you reach the first pipe | character or the closing ]] brackets.
3. Insert |alt=<alt text> at that location.
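The steps above can be sketched in Python as follows. insert_alt is a hypothetical helper, not existing code; it assumes link_start is the offset of a [[File:...]] match and that the alt text contains no | or ]] (which would need escaping in real wikitext).

```python
def insert_alt(wikitext, link_start, alt_text):
    """Insert "|alt=<alt text>" right after the file name of one File link.

    link_start is the offset where the [[File:...]] link begins.
    """
    close = wikitext.find("]]", link_start)
    if close == -1:
        raise ValueError("no closing ]] found after link_start")
    pipe = wikitext.find("|", link_start)
    # The file name ends at the first pipe inside the link, or at the closing ]].
    cut = pipe if pipe != -1 and pipe < close else close
    return wikitext[:cut] + "|alt=" + alt_text + wikitext[cut:]
```

Placing the parameter directly after the file name means any existing options and the caption keep their relative order.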

Question 3: How do we get a list and/or queue of articles that have images with missing alt text?

This is a little more tricky. Theoretically it’s possible to feed the same regex directly into CirrusSearch, and search for insource:/\[\[File:((?!\|\s*alt\s*=)[^]])*\]\]/
This searches the contents of all articles with that regex and returns the matches. However, regex searches in CirrusSearch are very expensive and will likely cause timeouts and other issues, so this approach is not recommended.

This means we have to use a more efficient method to search articles (imperfectly), and then perform further searching ourselves within those results.

Idea 1:

https://en.wikipedia.org/w/api.php?action=query&prop=pageprops&generator=pageswithprop&gpwppropname=page_image_free&gpwpprop=ids|title|value

This uses generator=pageswithprop which gives us pages that have a specific property, and the property we look for is page_image_free. This ensures that all the returned pages have at least one image in them. You can then fetch the wikitext of each of these articles, and perform the regex search on them (from above). The downside of this generator is that it's not randomized, and returns the names of articles in alphabetical order.
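A minimal sketch of how a client might build this query and read the titles out of the result. It assumes the standard English Wikipedia api.php endpoint and adds format=json and a gpwplimit parameter, neither of which appears in the URL above; the helper names are hypothetical, and no network request is made here.

```python
from urllib.parse import urlencode

# Assumed endpoint: the standard English Wikipedia action API.
API = "https://en.wikipedia.org/w/api.php"


def pageswithprop_url(limit=50):
    """Build the Idea 1 query URL (format and limit are added assumptions)."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "pageprops",
        "generator": "pageswithprop",
        "gpwppropname": "page_image_free",
        "gpwpprop": "ids|title|value",
        "gpwplimit": limit,
    }
    return API + "?" + urlencode(params)


def titles_from_response(data):
    """Extract page titles from a decoded JSON response keyed by page ID."""
    pages = data.get("query", {}).get("pages", {})
    return [page["title"] for page in pages.values()]
```

Each returned title would then have its wikitext fetched and scanned with the regex from Question 1.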

Idea 2:

https://en.wikipedia.org/w/api.php?action=query&prop=pageprops&generator=random&grnnamespace=0&grnfilterredir=nonredirects&grnlimit=50

This uses generator=random which gives us literally random articles (within the main namespace), and we'll look for articles that have a page_image_free property, which ensures that the article has at least one image. You can then fetch the wikitext of each of these articles, and perform the regex search on them (from above). The downside of this is that random will produce a lot of misses. However, if you take a large enough random sample (the query above gives 50), it's highly likely that a few of them will have images. And then, out of those articles that have images, it's ~90% likely that they're missing alt text.
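The filtering step for the random sample could look roughly like this. The sketch assumes the response has been decoded from JSON into a dict and that pages without any page properties simply lack a "pageprops" key; the function name is hypothetical.

```python
def pages_with_lead_image(data):
    """Keep only the random pages whose page props include page_image_free."""
    pages = data.get("query", {}).get("pages", {})
    return [
        page["title"]
        for page in pages.values()
        if "page_image_free" in page.get("pageprops", {})
    ]
```

Only the surviving titles need their wikitext fetched, which keeps the cost of the many random "misses" low.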

Idea 3:

Set up a new backend service that pre-populates a list of articles (with a db query that runs periodically), and serve up that list to clients. (similar to GrowthExperiments or recommendation-api)

Details

Subject	Repo	Branch	Lines +/-
Bump wikimedia/parsoid to 0.20.0-a8	mediawiki/vendor	master	+4 K -1 K
Suppress missing-image-alt-text lint on aria-hidden or role=presentation	mediawiki/services/parsoid	master	+57 -0
Bump wikimedia/parsoid to 0.20.0-a7	mediawiki/vendor	master	+162 -70
Add missing-image-alt-text lint	mediawiki/services/parsoid	master	+139 -0
Descriptions for new linter error missing-image-alt-text	mediawiki/core	master	+3 -0
Add hidden lint missing-image-alt-text	mediawiki/extensions/Linter	master	+16 -0

Related Objects

Status	Assigned	Task
Open	None	T357437 [Epic] Alt-Text Suggested Edit Experiment on iOS
Open	None	T371333 [Sub-epic] Alt Text Suggested Edit: Improvements for permanent feature
Open	bvibber	T344378 Spike: How to obtain articles that have images with missing alt text

Event Timeline

Restricted Application added a subscriber: Aklapper. Aug 16 2023, 7:00 PM

Dbrant triaged this task as Low priority.Aug 16 2023, 8:19 PM

Dbrant added a project: Wikipedia-iOS-App-Backlog.

Dbrant updated the task description.

Thanks for doing this @Dbrant

@Dbrant any thoughts/recommendations around efficiently pulling a meaningful block of text?

Seddon reassigned this task from Dbrant to • brooke.Sep 27 2023, 11:53 AM

In T344378#9192543, @Dmantena wrote:

@Dbrant any thoughts/recommendations around efficiently pulling a meaningful block of text?

It's an interesting question - how to get a block of text in the vicinity of an image to contextualize it, so that the user can write meaningful alt text. I don't think there are specific APIs for this, and doing it with parsing or regexes could get very tricky.
I would just envision an interface that loads the entire article, and pre-scrolls it to the position of the image, so that it's in the center of the screen. And then the dialog for adding alt-text can be a floating (non-modal) component that sits at the bottom, while allowing the user to keep interacting with the article underneath.

Looking more into the linter internals: the actual lint checks are done inside Parsoid, in the Linter class, and there's no extension/hook interface for adding arbitrary lints (there does appear to be a way for tag extensions to register custom handling, but it won't help us here). Adding an additional check for missing alt text straight into main should be pretty straightforward, I think, and then we can draw from the recorded lints on newly edited pages (and do an offline batch run to prefill them)...

I'll try whipping up a spike test with a directly hacked-in test, then see what folks think about just adding it to Parsoid vs adding an extensible hook (which might have to expose more internal DOM bits)

Note that linter enthusiasts may have strong opinions about linter data, but there doesn't appear to be a central place to communicate with them. Before deploying any additional lints that go into this API we should ensure we do some proper community discussion to a) warn people it's coming :D b) consider alternatives or mitigations to a possibly very large influx of one type of lint error, and c) improve these processes for the future, so the next time someone wants to add a regular markup check we can do this more easily :D

See for starters:

https://en.wikipedia.org/wiki/Wikipedia_talk:Linter
https://www.mediawiki.org/wiki/Extension_talk:Linter
https://de.wikipedia.org/wiki/Benutzer_Diskussion:PerfektesChaos/js/lintHint (German)
Tech News
wikitech-ambassadors-l

• brooke mentioned this in T330726: Allow extensions to add lint errors.Jan 24 2024, 11:08 PM

Change 1002672 had a related patch set uploaded (by Brion VIBBER; author: Brion VIBBER):

[mediawiki/services/parsoid@master] Add missing-image-alt-text lint

https://gerrit.wikimedia.org/r/1002672

Change 1002673 had a related patch set uploaded (by Brion VIBBER; author: Brion VIBBER):

[mediawiki/extensions/Linter@master] Add low-priority lint missing-image-alt-text

https://gerrit.wikimedia.org/r/1002673

Proposed patches to parsoid & extension:linter above add the "missing-image-alt-text" lint, which checks for missing alt attributes on <img>s with an attached file resource. This is categorized with "low priority" lints, and the data for each match lists a link to the image file in a machine-readable way as well as the positional info in the markup for editing.
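To make the shape of that check concrete, here is a small Python analogy (the real check lives in Parsoid, which is PHP, and this is not that code). It assumes rendered images carry a resource attribute pointing at the file page, and it records HTML positions, whereas the actual lint records positions in the wikitext. The class name, function name, and record keys are invented for illustration.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect <img> elements that have a file resource but no alt attribute."""

    def __init__(self):
        super().__init__()
        self.lints = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # An empty alt="" counts as present; only a fully absent alt is flagged.
        if tag == "img" and "resource" in attr_map and "alt" not in attr_map:
            self.lints.append({
                "type": "missing-image-alt-text",
                "file": attr_map["resource"],
                "pos": self.getpos(),  # (line, column) in the HTML, not wikitext
            })


def find_missing_alt(html):
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.lints
```

Each record pairs the target file with a location, which is the contextual information a plain list of page titles would lack.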

I expect this could create a *very large* number of matches, so it's worth confirming that that is not expected to be a technical/performance problem in production if the resulting queue grows to hundreds of thousands of pages or more on large wikis.

(If it would, then let's either see about fixing that perf problem or find another way to store the same linting data with a different usage profile.)

JTannerWMF edited parent tasks, added: T357437: [Epic] Alt-Text Suggested Edit Experiment on iOS ; removed: T344268: [Sub Epic] Alt-Text Suggested Edits Proof of Concept .Feb 13 2024, 4:29 PM

JTannerWMF added a project: iOS Release FY2023-24 (Archive).Feb 13 2024, 4:33 PM

JTannerWMF moved this task from Tasks from Product Backlog to Doing on the iOS Release FY2023-24 (Archive) board.

bvibber mentioned this in T358272: [Spike] missing-image-alt-text lint randomized queue prototype.Feb 22 2024, 8:02 PM

Seddon reassigned this task from • brooke to bvibber.Feb 23 2024, 10:04 PM

Seddon added a subscriber: • brooke.

Seddon removed a subscriber: • brooke.

UOzurumba added a project: User-notice.Feb 29 2024, 6:02 PM

Hello @Seddon,
When should this be included in Tech News? Is this wording:

Image Recommendations will be added to Wikimedia mobile apps. This will include adding a new lint error type to identify images without alt text, facilitating correction workflows and microtasks within the Wikipedia app. Please give your feedback in this phab ticket or here

okay? Thank you!

UOzurumba moved this task from To Triage to Not ready to announce on the User-notice board.Feb 29 2024, 6:36 PM

See the comments including mine at: https://en.wikipedia.org/wiki/Wikipedia_talk:Linter#Wikipedia_Mobile_App:_Image_Recommendations_and_New_Lint_Error

Making the default alt text "Refer to caption" or something for thumbnail images (so most images will have alt text) would be a million times more preferable and simpler than this.

Using Linter to track missing alt text is a Bad Idea. Missing alt text is not a syntax error. Please learn from your big mistake in the "wide table" Linter tracking fiasco. Linter tracking is for syntax errors that can and should be fixed in all cases; missing alt text, like "wide tables", does not meet any of those criteria.

Missing alt text is not always an error; purely decorative images do not need alt text.

If this condition is to be tracked and the communities do not get a say about whether it should be done at all, it should be done with a normal MediaWiki tracking category at Special:TrackingCategories. Tracking categories can be documented locally by editors from each community, hidden if desired, and ignored if necessary. That makes them a much better tool for this purpose.

Some images are added via protected modules and templates; have you thought about how an "add alt text" tool for newbie editors (probably a different bad idea, but that should be a different discussion) would avoid pointing editors to images that they are unable to modify?

My comments on mw:Talk:Wikimedia Apps/Team/Android/Image Recommendations were already linked, but I strongly oppose making this either a LINT or a Growth task.

Some core remarks:

You need an advanced understanding what blind people or others shall take from a textual or audible image description if they shall really benefit.
A kind of poetry is necessary: If you close your eyes, then only hearing the description, the image must appear inside your head. When opening the eyes, the same picture shall be visible on the screen. That’s no business for everybody. If no similar image appeared within your mind, the description is waste of time.
To create a good image description in current context will take five or ten minutes. There is no staff to add this for millions of images.
We learnt that almost all article authors create counterproductive alt= texts if they are asked to equip images, and they make the situation worse.
An operational guidance, a cookbook is to be provided first, before pushing people to write alt= texts.
- In English, some good help is available by associations outside WMF, but I am not aware of any wiki page which is breeding alt= text editors.
- In German, the external sites did not establish user guidance yet. Some first steps were made, but not sufficient. German Wikipedia is attempting to develop an operational manual, currently on user page level.
If you clutter many images with unhelpful but time-consuming waffle, then blind people and others will switch off image descriptions or stop opening them.
Many images do not need any description but shall be mute; the existence is to be concealed.

In T344378#9611682, @Jonesey95 wrote:

Using Linter to track missing alt text is a Bad Idea. Missing alt text is not a syntax error. Please learn from your big mistake in the "wide table" Linter tracking fiasco. Linter tracking is for syntax errors that can and should be fixed in all cases; missing alt text, like "wide tables", does not meet any of those criteria.

Note that linters don't track *errors* per se, they track potential issues which you might need to fix up with manual intervention.

I have no particular attachment to the linter, though, and am happy to store data elsewhere. Note a tracking category is insufficient as it's missing the contextual information that's already supplied in the linter table, hence piggybacking on that.

Easy enough to move off if folks using the linter system aren't happy with it, though.

Missing alt text is not always an error; purely decorative images do not need alt text.

This can be signalled with alt= to give an empty alt text attribute.

If this condition is to be tracked and the communities do not get a say about whether it should be done at all, it should be done with a normal MediaWiki tracking category at Special:TrackingCategories. Tracking categories can be documented locally by editors from each community, hidden if desired, and ignored if necessary. That makes them a much better tool for this purpose.

Tracking categories don't include the location and target file information we have here.

Some images are added via protected modules and templates; have you thought about how an "add alt text" tool for newbie editors (probably a different bad idea, but that should be a different discussion) would avoid pointing editors to images that they are unable to modify?

Good feedback for the high-level feature.

In T344378#9611773, @PerfektesChaos wrote:

My comments on mw:Talk:Wikimedia Apps/Team/Android/Image Recommendations were already linked, but I strongly oppose making this either a LINT or a Growth task.

Some core remarks:

You need an advanced understanding what blind people or others shall take from a textual or audible image description if they shall really benefit.

A kind of poetry is necessary: If you close your eyes, then only hearing the description, the image must appear inside your head. When opening the eyes, the same picture shall be visible on the screen. That’s no business for everybody. If no similar image appeared within your mind, the description is waste of time.

To create a good image description in current context will take five or ten minutes. There is no staff to add this for millions of images.

We learnt that almost all article authors create counterproductive alt= texts if they are asked to equip images, and they make the situation worse.

An operational guidance, a cookbook is to be provided first, before pushing people to write alt= texts.

In English, some good help is available by associations outside WMF, but I am not aware of any wiki page which is breeding alt= text editors.

In German, the external sites did not establish user guidance yet. Some first steps were made, but not sufficient. German Wikipedia is attempting to develop an operational manual, currently on user page level.

If you clutter many images with unhelpful but time-consuming waffle, then blind people and others will switch off image descriptions or stop opening them.

Many images do not need any description but shall be mute; the existence is to be concealed.

All good feedback for the high-level feature; this feedback belongs on a parent task so it's not lost.

In T344378#9610271, @Graham87 wrote:

See the comments including mine at: https://en.wikipedia.org/wiki/Wikipedia_talk:Linter#Wikipedia_Mobile_App:_Image_Recommendations_and_New_Lint_Error

Making the default alt text "Refer to caption" or something for thumbnail images (so most images will have alt text) would be a million times more preferable and simpler than this.

All good feedback for the high-level feature, this needs to be copied to the appropriate high-level task so it's not lost.

(Do we.... have a good high-level task for this that covers the conceptual feature for both ios and android and not low-level implementation details? Sounds like we haven't done enough of a public consult on the whole project.)

In T344378#9612102, @bvibber wrote:

In T344378#9611682, @Jonesey95 wrote:

Using Linter to track missing alt text is a Bad Idea. Missing alt text is not a syntax error. Please learn from your big mistake in the "wide table" Linter tracking fiasco. Linter tracking is for syntax errors that can and should be fixed in all cases; missing alt text, like "wide tables", does not meet any of those criteria.

Note that linters don't track *errors* per se, they track potential issues which you might need to fix up with manual intervention.

Well, at least on the English Wikipedia, https://en.wikipedia.org/wiki/Special:LintErrors is called "Lint errors" and says "This special page displays lint errors." On the "Page information" page, there is a section called "Lint errors" that includes things that should not be listed as Lint errors, including "Big Tables that are hard to view on mobile". When I ask quarry to show me pages with Linter errors, "Big Tables that are hard to view on mobile" is included, even though it is not a syntax error, and I have to then exclude it from my report query.

So when you say "linter", I have to assume that whatever condition is being tracked will appear in these lists, all of which refer to the tracked conditions as "errors". It is that placement that I am objecting to, since I do not know of another way in which the word "linter" is used in Wikipedia. If you are using "linter" to mean something different, I recommend that you clarify what it means in this context, or perhaps use a different word. Thanks.

In T344378#9612102, @bvibber wrote:

In T344378#9611682, @Jonesey95 wrote:

If this condition is to be tracked and the communities do not get a say about whether it should be done at all, it should be done with a normal MediaWiki tracking category at Special:TrackingCategories. Tracking categories can be documented locally by editors from each community, hidden if desired, and ignored if necessary. That makes them a much better tool for this purpose.

Tracking categories don't include the location and target file information we have here.

Hence my suggestion to show an error message in Preview mode, as we do on the English Wikipedia with the unknown parameter detection module. You can also render a hidden error message that can be turned on via CSS, as we do with our Citation Style 1 modules. These are just two ways to indicate the location of a tracked condition on a page; I imagine that there are more.

In T344378#9612154, @bvibber wrote:

Do we.... have a good high-level task for this that covers the conceptual feature for both ios and android and not low-level implementation details?

In T359582 I just complained that

there should be a separate Phabricator trail for alt= image descriptions (even video).

In T344378#9612962, @Jonesey95 wrote:

In T344378#9612102, @bvibber wrote:

In T344378#9611682, @Jonesey95 wrote:

So when you say "linter", I have to assume that whatever condition is being tracked will appear in these lists, all of which refer to the tracked conditions as "errors". It is that placement that I am objecting to, since I do not know of another way in which the word "linter" is used in Wikipedia. If you are using "linter" to mean something different, I recommend that you clarify what it means in this context, or perhaps use a different word. Thanks.

The one and only LINT feature is meant when talking about linter.

The LINT categories shall be used for trivial syntax errors only, which can be solved within a few seconds by a broad audience.
The LINT system must not be used for any content issue, which needs special experience, and which cannot be solved immediately, or which will stay forever.
- The wide table “error” which is not necessarily an error since a table with more than five narrow columns might be easily presented on a mobile phone, and other tables cannot be presented in any other way and will remain an “error” in eternity.
- Image descriptions are another case of inappropriate usage.

In T344378#9612989, @Jonesey95 wrote:

In T344378#9612102, @bvibber wrote:

In T344378#9611682, @Jonesey95 wrote:

Hence my suggestion to show an error message in Preview mode, as we do on the English Wikipedia with the unknown parameter detection module. You can also render a hidden error message that can be turned on via CSS, as we do with our Citation Style 1 modules. These are just two ways to indicate the location of a tracked condition on a page; I imagine that there are more.

Untrained people must not be pushed or prompted to add any image description.

This needs experience with the circumstances and very good guidance.
If you ask people to add an image description they just copy the caption=legend again as alt= and blind people will hear the same story twice.
If you request an image description from people who did not dive into details yet they will create a nonsense description, which does not help anybody, but will make the situation worse.

I delete more than 75% of all the alt= texts I encounter when editing articles, since they were made in good faith by people who simply had no idea what they were supposed to do. You cannot learn this within five minutes.

Hi @PerfektesChaos, thanks for your feedback! I'm Jaz, the Lead PM for the apps. I do want to clarify that this is not a Growth feature, but an apps feature. I make that distinction because apps suggested edits are not for newbies. We have gate requirements for suggested edits in the apps. I wrote a more in-depth explanation about the target audience for the feature and our experiment plan on the Linter talk page. May I invite you to collaborate with me there, so that folks who are interested in the concept at a higher level can congregate in one place and not have to follow two separate conversations?

Feel free to continue to leave technical feedback and considerations on the task.

Seddon moved this task from Engineering Backlog to iOS Release FY2023-24 on the Wikipedia-iOS-App-Backlog board.Mar 8 2024, 10:47 AM

Seddon edited projects, added Wikipedia-iOS-App-Backlog (iOS Release FY2023-24); removed Wikipedia-iOS-App-Backlog.

Seddon moved this task from Tasks from Product Backlog to Doing on the Wikipedia-iOS-App-Backlog (iOS Release FY2023-24) board.Mar 8 2024, 11:04 AM

Updated the Parsoid-side patch in case we do want to make use of it later:

https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/1002672

And a Linter patch which keeps it disabled/hidden from UI by default, based on feedback that lint-errors folks don't want this in their data sets and we should plan to record it separately if we use it:

https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Linter/+/1002673

If we do move forward with something using a lint-based check, we'd have to coordinate pushing the Parsoid side first, then either tweak Linter further or piggy-back on it with another extension.

I think the patches are safe to merge as-is, since they won't be recorded in the lint-errors tables, but we can keep the patch in reserve to be conservative, pending more specific work moving forward that would use this lint as the data source.

I'm going to retool this to move the specific alt-text check into our own extension, with a clean hook point in core, as we'd previously discussed. I think this should allow us to piggyback on the lint-time checks more easily without having to put stuff in core parsoid. :D

Change #1002673 merged by jenkins-bot:

[mediawiki/extensions/Linter@master] Add hidden lint missing-image-alt-text

https://gerrit.wikimedia.org/r/1002673

ReleaseTaggerBot added a project: MW-1.43-notes (1.43.0-wmf.8; 2024-06-04).May 30 2024, 12:00 AM

Change #1037542 had a related patch set uploaded (by Bvibber; author: Bvibber):

[mediawiki/core@master] Descriptions for new linter error missing-image-alt-text

https://gerrit.wikimedia.org/r/1037542

Change #1037542 merged by jenkins-bot:

[mediawiki/core@master] Descriptions for new linter error missing-image-alt-text

https://gerrit.wikimedia.org/r/1037542

Change #1002672 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Add missing-image-alt-text lint

https://gerrit.wikimedia.org/r/1002672

Maintenance_bot removed a project: Patch-For-Review.Jun 4 2024, 7:31 PM

Change #1039739 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.20.0-a7

https://gerrit.wikimedia.org/r/1039739

gerritbot added a project: Patch-For-Review.Jun 6 2024, 2:50 PM

For Tech News (per the User-notice tag that was added earlier), please could someone clarify when the entry needs to be announced, and what it should say? (1-4 sentences, 1-2 links). Thanks!

In T344378#9869476, @Quiddity wrote:

For Tech News (per the User-notice tag that was added earlier), please could someone clarify when the entry needs to be announced, and what it should say? (1-4 sentences, 1-2 links). Thanks!

I suppose it should go out before the update goes live, though since it's hidden by default (priority=none) it won't be user-visible unless you're specifically requesting it. Assuming this is going out with the train next week?

Could say something like "A hidden parsoid lint for missing image alt text is being added to track images without alt text in articles. This has priority 'none' and is hidden from Special:LintErrors by default because it is expected to make a large number of matches; it's intended for use by experimental mobile apps tooling for aiming editors at a specific alt text contribution funnel, but is also available for general queries if missing-image-alt-text lint is specifically requested."

[added second paragraph i forgot to paste before submitting whoops]

ah yes and include prolly a link to https://www.mediawiki.org/wiki/Wikimedia_Apps/iOS_Suggested_edits_project/Alt_Text_Experiment for the wider work :D

Ah, if there are no user-visible changes (or potential unintended effects) then it doesn't need inclusion in Tech News. Thank you for explaining the context though!
I'll remove the User-notice tag from here. If there's a related user-facing aspect to this deployed in the future, please add the tag to the related task!

Woohoo!

Change #1039739 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.20.0-a7

https://gerrit.wikimedia.org/r/1039739

Maintenance_bot removed a project: Patch-For-Review.Jun 7 2024, 11:31 PM

Quiddity unsubscribed.Jun 8 2024, 12:59 AM

Please note that there are other means in addition to <gallery> and File: transclusion.

Consider aria-hidden="true" role="presentation" which will hide anything within from screenreaders.

These are two approaches with the same effect; if one fails on old software the other one will be caught.
See e.g. Hinweisbaustein in German Wikipedia.
- There is an eye-catcher on the left which summarizes the message on the right hand side at a glance.
- Naturally, “eye-catcher” and “glance” are pointless for blind people, therefore suppressed.
- {{{ICON}}} may be anything, a small image or something built from HTML text elements. Whatever it might be, it is always hidden.
- Therefore an |alt=| is never required.
- 1,881,238 transclusions, which will result in more than 1.5 million false positive LINT errors.
- Standardbaustein is the same story, with 60,590 transclusions.
Thousands more templates are providing and suppressing a duplicated graphical and textual information.
- E.g. flag icons, which just tell “USA” and they shall not repeat the textual message and they shall not start to chatter about 13 horizontal stripes alternating white and red and a blue rectangle with 50 white stars. Just omit that by aria-hidden="true" role="presentation" and fine.

Ah, good catch! It should be relatively straightforward to tweak the lint check to exclude subtrees with a suitable aria-hidden etc. I'll make some notes and see if I can knock that out this week.

Change #1041248 had a related patch set uploaded (by Bvibber; author: Bvibber):

[mediawiki/services/parsoid@master] Suppress missing-image-alt-text lint with aria-hidden or role=presentation

https://gerrit.wikimedia.org/r/1041248

gerritbot added a project: Patch-For-Review.Jun 10 2024, 10:56 PM

I think the above patch is correct; it will suppress the lint for anything with aria-hidden=true or its descendants, and for any media elements that _themselves_ use role=presentation or role=none (it's not clear to me from checking the docs on MDN that those should apply to descendants unless they're specifically part of the parent element's role).

Should avoid false positives on explicit wrapper divs/spans in templates, as well as anything that's producing <img> output direct from the wiki with the role=presentation or role=none on it.

Advice on whether there are more corner cases, or arguments for reinterpreting that role check, is welcome. :D

From my understanding the role= is applied to the wrapping element, and only this one is mentioned and linked when generating a TOC from all role=navigation or role=region elements (in addition to <h2> etc.). Same goes for each single role=alert which may consist of several content elements.

The major effect comes from aria-hidden:

Indicates that the element and all of its descendants are not visible.

Since role=presentation is older than aria-hidden this is provided as a fallback.

aria-hidden may be unknown to older user agents.
Never change a running system if you are blind and cannot read the installation guidance. Therefore screenreaders which are not updated to the most recent standard are quite common. If updating fails on your only device you are also mute.
Older template implementations may still not provide aria-hidden.

For wikitext purpose, if hidden by role=presentation this should apply to all children as well.

That is the behaviour of the former screenreaders.
If in wikitext an element is judged as presentational, the entire story is visual only. We are not providing audible content which would change roles.
The modernized interpretation is quite complicated and merely revokes another role from this particular element, which does not matter for the LINT issue.

Please note that display:none and visibility:hidden also hide elements from all users, but I am not aware that images are used in wikitext here at large scale. The visibility may be toggled by gadgets, and those images would come into effect again, therefore those should get explicit |alt=| if any.

Yeah, on closer reading of the spec I misread it. :D Lemme adjust that so the 'role' inherits as well. Agreed that it's best not to enforce on visibility:hidden/display:none since that's likely to be expandable text that could become visible later, plus it's hard to ensure without the external stylesheet. :)

patch updated:

https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/1041248
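
For readers following along, here is a rough sketch of the rule as described in this thread: an image with no alt attribute is flagged unless it, or any ancestor, carries aria-hidden=true or role=presentation/role=none. This is not the Parsoid code (Parsoid is PHP and works on its own DOM); it is a standalone Python illustration using only the standard library, and the names MissingAltFinder, marks_decorative and find_images_missing_alt are made up for this example. It checks plain <img> elements only and, as agreed above, ignores display:none and visibility:hidden.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
PRESENTATIONAL_ROLES = {"presentation", "none"}


def marks_decorative(attrs):
    """True if the element's own attributes hide it from assistive technology."""
    aria_hidden = (attrs.get("aria-hidden") or "").strip().lower()
    role = (attrs.get("role") or "").strip().lower()
    return aria_hidden == "true" or role in PRESENTATIONAL_ROLES


class MissingAltFinder(HTMLParser):
    """Collects the src of every <img> that lacks alt and is not suppressed."""

    def __init__(self):
        super().__init__()
        self.stack = []    # (tag, suppressed) for each currently open element
        self.missing = []  # src values of flagged images

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        inherited = self.stack[-1][1] if self.stack else False
        # Suppression is inherited from any ancestor, per the updated patch.
        suppressed = inherited or marks_decorative(attrs)
        # An explicit alt="" counts as present; only a missing attribute is flagged.
        if tag == "img" and "alt" not in attrs and not suppressed:
            self.missing.append(attrs.get("src"))
        if tag not in VOID_TAGS:
            self.stack.append((tag, suppressed))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def find_images_missing_alt(html):
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing
```

With this sketch, an image nested anywhere under a wrapper such as `<div aria-hidden="true">` or `<span role="presentation">` is skipped, while a bare `<img src="d.png">` outside any such wrapper is reported.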

While this might be solved technically, the entire task is not a good idea for two reasons:

An issue cannot be solved within a few seconds like other LINT “errors” (for example, adding a missing '); it is actually not an error.
1. It will need several minutes and requires creative writing and deeper understanding of the context of the image.
2. There is no helpful guidance in most wikis, partial in enWP.
3. The target is to write a text which is “illustrative”: if you close your eyes, the image should appear in your mind even though you have never seen it.
4. If I encounter an alt=, I delete it in most cases. They do not describe, they are confusing, and 50% just repeat the legend, which is then told twice to the blind.
AI (artificial intelligence) is currently conquering automatic image description for blind people, integrated into screenreaders.
1. They get a button “Tell me” for each image, and a few seconds later the speaker starts describing the image; much better than most wiki authors do.
2. There are apps for mobile phones in daily usage now. Blind people hold the camera in the direction incognita, and after some seconds the phone tells about houses, streets, inscriptions of shops, plate with the street name or reading the plate of a monument. They are heading for a dialog, “Which shops are there?”, and the phone will answer “barber shop, grocery, tailor”.
3. Within some years, I guess, alt= will be history: a nice-to-have rather than something to push people to equip web pages with.

Change #1041248 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Suppress missing-image-alt-text lint on aria-hidden or role=presentation

https://gerrit.wikimedia.org/r/1041248

Maintenance_bot removed a project: Patch-For-Review. Jun 13 2024, 2:30 AM

In T344378#9884535, @PerfektesChaos wrote:

While this might be solved technically, the entire task is not a good idea for two reasons:
[snipped]

Actual planned workflows are very much in flux and are intended to be thoughtful and relatively complete if deployed -- we don't want to deploy something half-assed. :)

Note this spike task is meant for the specific technical issues of detecting missing alt text in existing edits, and isn't really the place for general feedback on the idea of encouraging people to edit alt text, but the feedback is appreciated and will be taken into consideration.

Re the task complexity of editing alt text vs what it means for something to be a 'lint error' -- we've set it to priority=none so it's hidden from Special:LintErrors and its API by default while we continue to evaluate this plan. :) We may end up retooling things and doing entirely different kinds of tracking based on the same checks, we'll see. Agree we want to keep the complexity out of people's way when they're working on existing Special:LintErrors workflows.

Re 'AI' (using machine learning bits to assist with alt text, either in our system or outside in the browsers or a bit of both) -- that's another potential area of great interest but outside the scope of current work. Because context is important, this isn't a trivial case of classic "identify the object" and presumably needs to be able to interpret some surrounding text to really work well, but it's something that a lot of people outside our org are working on, which is always helpful in that we can piggyback on browser and OS tools. Very exciting times! :D

Change #1043324 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.20.0-a8

https://gerrit.wikimedia.org/r/1043324

gerritbot added a project: Patch-For-Review. Jun 14 2024, 12:49 AM

Change #1043324 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.20.0-a8

https://gerrit.wikimedia.org/r/1043324

Maintenance_bot removed a project: Patch-For-Review. Jun 14 2024, 3:30 AM

HNordeenWMF mentioned this in T367907: Identification of eligible articles for Alt Text article flow C. Jun 18 2024, 6:23 PM

HNordeenWMF mentioned this in T367908: Identification of eligible image for Alt Text Flow C: Article Editor.

HNordeenWMF moved this task from Doing to Blocked or Waiting on the Wikipedia-iOS-App-Backlog (iOS Release FY2023-24) board. Jul 11 2024, 2:50 PM

Seddon moved this task from iOS Release FY2023-24 to iOS Release FY2024-25 on the Wikipedia-iOS-App-Backlog board. Jul 16 2024, 1:20 PM