
Wikipedia:Village pump (policy)



The policy section of the village pump is used to discuss already proposed policies and guidelines and to discuss changes to existing policies and guidelines.

Please see this FAQ page for a list of frequently rejected or ignored proposals. Discussions are automatically archived after remaining inactive for two weeks.


Galactica and RS

[Image: Platform 9¾ at King's Cross Station]

Meta's Galactica seems to be able to rapidly generate WP pages, although at present they're going to be relatively easy to identify as fake. Presumably they're going to get better in the future.

My question is really about the fake references it might generate and how we are going to better protect ourselves against fake content, fake notability and fake RS. For me, one of the great weaknesses of AfD discussions has always been the possibility of printed RS which exist on a dusty library shelf. If we have AI that can generate plausible looking book references, isn't it going to be an increasing challenge to identify completely fraudulent pages? JMWt (talk) 10:02, 17 November 2022 (UTC)[reply]

Well how are fake but plausible-seeming references generated by organic intelligence dealt with now? I wouldn't overwork myself trying to find out. There is no formal validation of citations for accuracy or relevance in Wikipedia, and there is no other metric that will help answer the question. It is left to the community to haphazardly and erratically certify references, at least outside of vanity projects like so-called "good" or "featured" articles. If anything, the pre-seed of native AI present now (the relationship of relevant Wikidata properties with article verification policies/guidelines), when applied, is likely to make things worse, as there is no context validation of Wikidata data to begin with. 65.88.88.68 (talk) 16:22, 17 November 2022 (UTC)[reply]
And on the other hand, there's Assigning Numbers. RAN1 (talk) 20:40, 18 November 2022 (UTC)[reply]
@JMWt, is your concern that the machine-learning system will write a book, that book will get published somewhere/by someone, and the book will be cited in Wikipedia? WhatamIdoing (talk) 21:00, 18 November 2022 (UTC)[reply]
@WhatamIdoing I think it is more plausible that some machine learning system generates references that look like very old books that would take a lot of effort to check. I don't think it needs to get to the stage of actually publishing anything to be a problem. JMWt (talk) 08:25, 19 November 2022 (UTC)[reply]
The question posed was not answered: why are machines producing articles with inappropriate references a bigger concern than humans doing so? Some of the latter may be doing so now, undetected. And does it matter what kind of entity publishes misinformation? In any case, compiling an encyclopedia is a mechanical process; there is nothing creative about it. Non-human machines will be able to replicate it easily. 69.203.140.37 (talk) 14:36, 19 November 2022 (UTC)[reply]
@JMWt, if the machine-generated pseudo-book isn't published, then how would a Wikipedia editor have access to its contents?
69.203, at some point, quantity becomes its own quality. A human who types all day without stopping (e.g., to interview anyone or to double-check facts) can usually only produce a volume of text equal to about one book a week. A room full of computers could produce a book a minute without stopping. WhatamIdoing (talk) 22:11, 20 November 2022 (UTC)[reply]
Wasn't there a recent convoluted discussion/RFC about mass article production/publishing articles at scale? 64.18.11.71 (talk) 01:56, 21 November 2022 (UTC)[reply]
@WhatamIdoing Galactica can generate convincing looking references. An editor could just machine-generate a whole WP page including the refs. All completely bogus. Maybe I'm missing some detail that you are asking me? JMWt (talk) 11:21, 21 November 2022 (UTC)[reply]
@JMWt, Are you concerned about ghost references ("nothing remotely resembling the content appears in this real source") and hoax citations ("There is no Platform 9¾ at King's Cross Station, so your content can't be supported by a source located there")? WhatamIdoing (talk) 22:52, 21 November 2022 (UTC)[reply]
@WhatamIdoing Galactica was, I think, creating completely bogus references. If they are recent, we can probably identify them with a search of the ISBN or DOI. If they are old, that's going to be nearly impossible. It might also be copying real references and claiming they contain facts that they don't. Both are a problem, no? JMWt (talk) 07:12, 22 November 2022 (UTC)[reply]
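For what it's worth, the identifier half of such a check can be scripted. The following is only a minimal sketch in Python, and only a first pass: it confirms that an ISBN-13 is internally consistent (its check digit is valid), not that the book exists or supports the cited claim.

    def isbn13_checksum_ok(isbn: str) -> bool:
        """Return True if the string is a structurally valid ISBN-13."""
        digits = [c for c in isbn if c.isdigit()]
        if len(digits) != 13:
            return False
        # EAN-13 checksum: alternate weights of 1 and 3, total must be divisible by 10.
        total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
        return total % 10 == 0

    # A fabricated reference will often fail even this weak test.
    print(isbn13_checksum_ok("978-0-306-40615-7"))  # True (valid check digit)
    print(isbn13_checksum_ok("978-0-306-40615-8"))  # False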
Given how much research has been done (for five or ten years now) on matching newspaper articles to Wikipedia statements, I wonder why they would bother creating a hoax citation. Just to see if the software could mimic the style, with plausible content? But they could plug in the other system, and get real citations to real sources. There'd be some limits on accuracy ("A traffic problem of nightmarish proportions occurred when Joe Film stopped to sign an autograph and a clamoring crowd of thousands formed around the star. Police were called and charged the actor with Pranks without a Permit in the third degree": a simple software program could find "Joe Film" and "traffic", but might not be able to figure out whether it should be written to suggest guilt, innocence, or even whether it's worth including.) WhatamIdoing (talk) 17:08, 22 November 2022 (UTC)[reply]
@JMWt Apropos of this, Stack Overflow have explicitly - for the moment - banned the use of a similar tool (ChatGPT). They specify that specific one and not the general class of tools, though it might just be that this is the only one usable at scale at the moment - Galactica lasted about a week so presumably isn't much of a problem any more! Andrew Gray (talk) 14:10, 5 December 2022 (UTC)[reply]

I'm requesting input at Talk:Jason David Frank#RFC: Cause of Death, specifically on a dispute related to Wikipedia's reliable-sourcing policy; one user has suggested a few implications of the policy that I don't think are supported by text or practice. TMZ has reported that, per anonymous "law-enforcement sources," Jason David Frank died by suicide. The aforementioned user has said that this report cannot be included on the JDF page. There are three aspects of this dispute, all related to WP:RS.

  1. Does WP require an official statement from family or law enforcement? (According to the user: "we 100% . . . need an official statement from his management team or law enforcement.")
  2. If an authority itself relies on anonymous sources, can that authority be cited? (According to the user: "Doesn't matter if it's a death, sports transaction, or whatever per that policy, no citations based on anonymous sources.")
  3. Is TMZ an unreliable source? (Note: WP:TMZ, an essay, has some thoughts on that subject; it recommends explicitly attributing information to TMZ.)

Given how differently he and I read existing policy, I think third-party perspectives would be helpful.--50.86.94.196 (talk) 08:09, 27 November 2022 (UTC)[reply]

Briefly,
  1. No; however, we do require WP:RS of some kind to have covered a cause of death for it to be included.
  2. Depends: does the source have a reputation for high-quality fact-checking and responsible reporting? If so, we can probably trust that they've done the legwork, but editorial discretion always applies situationally (sources split, a claim only reported in one source, especially if the claim is surprising/important, current events/developing news, etc.)
  3. Currently classed as WP:MREL which more or less rules out using it to support controversial WP:BLP (also applies to recently deceased) information. You can always ask for additional input at WP:RSN if you believe that the source has been misclassified, or that one particular piece should be viewed differently from the publication as a whole.
For future reference the help desk and teahouse will usually provide more prompt responses to inquiries. If you are looking for input into a dispute that involves you and one other editor you can also follow the process outlined at WP:3O. Since this particular case already has an associated RFC there really isn't any additional need to request input on matters of content; hope this helps, 74.73.224.126 (talk) 19:27, 6 December 2022 (UTC)[reply]
Hi! Appreciate the tip. I was following the suggestion on WP:RFC to publicize the RFC on the Village Pump ("To get more input, you may publicize the RfC by posting a notice at one or more of the following locations: One of the Village Pump forums, such as those for policy issues, proposals, or miscellaneous"). I think the publicizing did its job! Got quite a bit of input. Thanks though!--50.86.94.196 (talk) 23:41, 6 December 2022 (UTC)[reply]
No problem, in general the intent is for WP:APPNOTEs publicizing discussions to be brief; often just using standard templates such as {{rfc notice}} or {{please see}} to avoid forking the discussion. And of course all ongoing rfcs are automatically listed at WP:RFC/A. That said it looks like you did get plenty of additional input; all's well that ends well I suppose. 74.73.224.126 (talk) 01:22, 7 December 2022 (UTC)[reply]

"Deleted image removed" - how long is long enough?

While perusing punctuation errors, I found that Strongwoman has a not-uncommon feature in Wikipedia articles, a block of hidden text saying up front, "Deleted image removed". In 2009. Can we please have a bot scour all these ancient deleted images and purge them from the Wikitext altogether? BD2412 T 23:05, 27 November 2022 (UTC)[reply]

Not sure there is any policy/guideline/etc requiring these to be made, retained, or removed. They are still actively being added (e.g. Special:Diff/1124102240) by User:ImageRemovalBot - operated by User:Carnildo who may have more background on this. There are about 20,000 articles like that. — xaosflux Talk 23:34, 27 November 2022 (UTC)[reply]
I have no objection to the addition when an image in the article is deleted. I just think that there's a shelf life beyond which they are no longer useful, and become nothing more than pollution of the Wikitext. I would say that after a few years (perhaps two or three, even), the notice has outlived any utility it may have initially had. BD2412 T 23:48, 27 November 2022 (UTC)[reply]
Is there any actual need to remove these comments? If not, they should probably only be removed as part of another edit, and certainly not by a bot. Animal lover |666| 06:15, 28 November 2022 (UTC)[reply]
The need to remove these comments is that useless strings of hidden wikitext make it more difficult to find and edit the wikitext that needs editing. From my own experience, I fix a lot of punctuation errors, and would prefer not to be bothered finding such errors in hidden text, where they crop up for some reason. BD2412 T 17:45, 28 November 2022 (UTC)[reply]
If this is going to happen, one thing to consider going forward would be to ask the bot to add the date to the comment (e.g. Deleted image removed 20221128: ...) to make it easier on future updates. — xaosflux Talk 17:49, 28 November 2022 (UTC)[reply]
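To illustrate how a dated comment would help, a later cleanup bot could strip only the notices that have aged out. This is purely a sketch in Python, assuming the hypothetical dated format suggested above (the comments ImageRemovalBot leaves today carry no date):

    import re
    from datetime import datetime, timedelta

    # Matches hidden comments of the hypothetical dated form:
    #   <!-- Deleted image removed 20221128: File:Example.jpg ... -->
    NOTICE = re.compile(r"<!--\s*Deleted image removed (\d{8}):.*?-->", re.DOTALL)

    def strip_stale_notices(wikitext: str, max_age_days: int = 730) -> str:
        """Drop dated 'Deleted image removed' comments older than max_age_days."""
        cutoff = datetime.now() - timedelta(days=max_age_days)
        def repl(match):
            stamp = datetime.strptime(match.group(1), "%Y%m%d")
            return "" if stamp < cutoff else match.group(0)
        return NOTICE.sub(repl, wikitext)

Whether the cutoff is two years, or two years plus a minimum number of intervening revisions as suggested above, is just a parameter choice.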
Maybe that information should be added to the talk page instead of the wikitext. ~ ONUnicorn(Talk|Contribs)problem solving 17:51, 28 November 2022 (UTC)[reply]
I can see the value in noting where in the article an image has been removed from, as it makes it easy to add a new one or restore the old one if it is undeleted for some reason. However, the value in that decreases as the article develops without the image there - time is a proxy for that but I don't expect it's a very consistent one. There have been 54 revisions to Strongwoman since the image was removed, looking at some other articles the bot edited that day (20 July 2009) there have been thousands (so many it doesn't tell me the number) of revisions to Manchester Arena but only 9 to Type 518 radar. I think it would be rarely problematic to remove the text when it has been in place for the greater of 2 years and 50 edits, or where the article has since become a GA or FA (although how likely the text is to still be there in them I have no idea).
Timestamps are a good idea, and posting on the talk page as well as the article also seems sensible. Thryduulf (talk) 21:43, 28 November 2022 (UTC)[reply]
@ONUnicorn I suggested in the wikitext, as that would make a future bot cleanup job much easier (as it would have the data there - as opposed to having to try to scrape the history to see how long it was there). Now another possibility would be to have that bot not do this at all anymore, and instead just post it to the article talk page. Not sure if that would be even less useful though? — xaosflux Talk 21:50, 28 November 2022 (UTC)[reply]
I can see the use of such a hidden note for a short period. An editor currently working on the page notices that an image has disappeared, clicks to edit to see if there is something wrong in the wikitext, and sees a note explaining that the image has been deleted, so they don't have to chase after what happened to it. I would say that if such notes are going into the wikitext, they should be there for two years at most (no matter how many edits have been made to the page, if no one has bothered about a deleted image for multiple years, the note will be of no further value). BD2412 T 17:14, 29 November 2022 (UTC)[reply]
I agree with the general point here. That aside, there's a fairly strong community aversion to bot edits that have no effect on the appearance of a page as seen by readers. Essentially the upside (cleaner wikitext) is perceived as outweighed by the downside (watchlist flooding, additional page history rummage). So there would need to be a rather strong case for why this is somehow qualitatively different; even linter-fixing bots engaged in future-proofing have proven controversial. 74.73.224.126 (talk) 19:08, 6 December 2022 (UTC)[reply]
The original reason for replacing the images with something was to avoid breaking table layouts. I don't know if that's still a problem or not. The specific replacement with a comment was as a result of a discussion during OrphanBot's approval back in 2005; when I split OrphanBot's tasks between ImageTaggingBot and ImageRemovalBot, ImageRemovalBot inherited the behavior. --Carnildo (talk) 19:30, 29 November 2022 (UTC)[reply]

Global deleters addition at Global rights policy

Turns out m:Global deleters is a thing... Was just boldly added to Wikipedia:Global rights policy. This was after a 2-hour discussion at the Bureaucrats' noticeboard. If you are interested: Wikipedia:Bureaucrats' noticeboard#Request for permission to run Synchbot Terasail[✉️] 21:12, 3 December 2022 (UTC)[reply]

Should global deleters be permitted to delete local pages when fulfilling m:Synchbot requests?

This would be made effective with the section currently {{proposed}} at Wikipedia:Global rights policy#Global deleters. This follows the above-mentioned thread at WP:BN.

Note: There is (currently) only one global deleter (who previously carried out this function with local admin privileges). –xenotalk 23:21, 3 December 2022 (UTC)[reply]

Interface editing

Another task Synchbot does is make edits to user CSS/JS pages. This is even rarer than deletions, and was historically allowed by virtue of his global interface editor access, but, by a strict reading of Wikipedia:Global rights policy#Global interface editors, is now prohibited since Pathoschild has previously had the administrator or interface administrator right removed at the English Wikipedia. I suggest changing the quoted section, and while I'm at it the substantively identical wording at Wikipedia:Global rights policy#Global rollbackers, to "previously had the administrator or interface administrator right for cause removed at the English Wikipedia". * Pppery * it has begun... 00:28, 4 December 2022 (UTC)[reply]

Agree: this makes sense to me. –xenotalk 00:37, 4 December 2022 (UTC)[reply]
Shouldn't that be "removed for cause", or is this an ENGVAR issue? —Cryptic 01:13, 4 December 2022 (UTC)[reply]
Should be "removed for cause", that was a typo. * Pppery * it has begun... 01:29, 4 December 2022 (UTC)[reply]
"Inactivity" is a cause. — xaosflux Talk 01:15, 4 December 2022 (UTC)[reply]
But I'm fairly open to supporting an update to that that would get around voluntary resignations, or even just procedural inactivity removals alone. — xaosflux Talk 01:18, 4 December 2022 (UTC)[reply]
I was thinking "for cause" in the sense that Wikipedia:Former administrators/reason/for cause (and the mainspace redirect for cause) uses it, not "for any cause at all", which would be obviously useless. * Pppery * it has begun... 01:29, 4 December 2022 (UTC)[reply]
I agree with Xaosflux that I would prefer not to use "for cause" as a blanket term that is assumed to exclude procedural removal of user rights, if there is consensus for a change. I think for clarity it would better to list the specific exceptions desired. isaacl (talk) 02:15, 4 December 2022 (UTC)[reply]
"for any reason other than inactivity" (the wording used at WP:EFH) would seem to work here. If there are other reasons for admin removal then they can be spelled out too. Thryduulf (talk) 14:50, 4 December 2022 (UTC)[reply]
You would also need to explicitly exclude resigning (since that's what Pathoschild technically did). I'm still not convinced of the need to spell this out any further than my initial proposal, though. * Pppery * it has begun... 15:13, 4 December 2022 (UTC)[reply]
  • OK so GIE's can edit here if they were never admins, assuming they follow all other types of policies and don't do things that make people mad.... We certainly don't want GIE's operating when the editor was already desysopped for cause related to negative behaviors here. Inactivity isn't really the same, since if they just never bothered to become an admin here it wouldn't apply. Since we're touching the GRP\GIE section, how about a bigger change:
  • FROM:
    • Global interface editors can use their rights by default, provided they have not previously had the administrator or interface administrator right removed at the English Wikipedia. If removal has previously occurred, they must request and be granted interface administrator and/or administrator access locally by an English Wikipedia Bureaucrat. Furthermore, any English Wikipedia bureaucrat can ask a global interface editor to stop using their global privilege if what they deem to be misuse occurs, and the global interface editor must comply with such a request. Such a decision by a bureaucrat can be appealed to the wider community. Failure to comply may result in a block. Interface editors are reminded that some interface pages, such as the watchlist, require discussion prior to use.
  • TO:
    • Global interface editors may use their rights by default, provided they have not previously had the administrator or interface administrator right removed involuntarily, other than procedurally for inactivity. If such a removal has previously occurred, they must request and be granted interface administrator and/or administrator access prior to using this access. Furthermore, any English Wikipedia bureaucrat can ask a global interface editor to stop using their global privilege if what they deem to be misuse occurs, and the global interface editor must comply with such a request. Such a decision by a bureaucrat can be appealed to the wider community. Failure to comply may result in a block. Interface editors are reminded that some interface pages, such as the watchlist, require discussion prior to use. Interface editors must comply with local policies, such as the protection policy.
I think that clears up this issue, and takes care of some housekeeping. — xaosflux Talk 21:45, 4 December 2022 (UTC)[reply]
Under this proposed rewrite, users who voluntarily resign their advanced user rights would still be allowed to use their global rights, even if the resignation was given while the user was under a cloud. This doesn't sit well with me, as this would essentially create a loophole whereby editors who lose EnWiki community trust could nevertheless still take interface admin actions on EnWiki by simply resigning.
Something along the lines of "Global interface editors may use their rights by default, provided they have not previously voluntarily resigned the administrator or interface administrator right under a cloud or had the administrator or interface administrator right removed involuntarily, other than procedurally for inactivity." would solve this problem. — Red-tailed hawk (nest) 21:13, 5 December 2022 (UTC)[reply]
In such situations the bureaucrat who pushes the button to remove admin or Iadmin access can invoke the "any English Wikipedia bureaucrat can ask a global interface editor to stop using their global privilege if what they deem to be misuse occurs" clause. Since clouds are currently determined by crats at the time of resysop, that would be more consistent with the way adminship works.
To be clear, I have no objection to any rewording that makes it clear this doesn't apply to Pathoschild (or Mr. Stradivarius, who technically violated the aforementioned clause with this series of edits in October 2022 after losing IADMIN for lack of use in August 2022), but I don't see this line of wording as necessary. * Pppery * it has begun... 21:28, 5 December 2022 (UTC)[reply]
I think this is covered by WP:IAR and WP:NOTBURO; we don't need to change policy due to a single exception. BilledMammal (talk) 21:59, 4 December 2022 (UTC)[reply]
If a user had previously held a right, and would get it back on request, then they should be allowed per IAR to operate a bot as if they had gotten the right back. If a request would need to be discussed first, then the discussion must precede getting the right to operate such a bot. Of course, this doesn't override any other rules or restrictions on operating bots, such as approval of the task itself. Animal lover |666| 06:05, 5 December 2022 (UTC)[reply]
As of January 2023, Pathoschild won't be able to get his admin bit back on request. That's the entire point. * Pppery * it has begun... 21:28, 5 December 2022 (UTC)[reply]
I agree with the proposed changes above, noting that consensus at AN (or some other venue) can restrict a user from exercising the privileges granted here. HouseBlastertalk 17:33, 6 December 2022 (UTC)[reply]

Justapedia: the far-right historical revisionist Wikipedia FORK

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


Yesterday I posted a message at ANI and it was suggested this was a more appropriate place to alert the community about Justapedia. Basically it is a proposed online encyclopedia being marketed as "the neutral and objective encyclopedia that Wikipedia should have been". Yesterday the website was taken down after the post at ANI.

The creators of Justapedia appear to be active Wikipedia editors who, unhappy with the way this community works, have downloaded the entirety of English Wikipedia with the intention of marketing it as their own. They have even plagiarised the majority of Wikipedia's policies and guidelines. Since the post at ANI yesterday they seem to have taken their website down for now.

Where their project starts to diverge from Wikipedia appears to be American politics and administration. In terms of content, the overwhelming majority remains what contributors to English Wikipedia have made. But they are removing critical commentary on conservative US political figures and engaging in some far-right historical revisionism, for example claiming Nazism is a left-wing ideology that is comparable to contemporary US Democratic Party ideologies; on the right are some screenshots of some Justapedia diffs that were taken before visibility was restricted. In terms of project administration, they seem unhappy with community consensus and instead intend to retain complete control through a Politburo-like "Board of Representatives", while all editorial decisions will be enforced by a number of political commissar-like appointees to the "Editorial Board".

The creators have even started their own foundation and, of course, an associated fundraising campaign; a promotional video has been uploaded to YouTube and it is being marketed on social media. I recommend going to YouTube and searching for "Justapedia". Apparently the Earthwave Society is sponsoring them for now [1]. It appears the same person founded both organisations; I assume they are Justapedia editor User:Justme. To the right is a screenshot of the since-deleted user page comment that led me to believe this.

How does the community feel about active contributors here attempting to distort the ideology that led to the Holocaust for their own shallow political ends? And how does the community feel about these same Wikipedians attempting to profit from the years of hard work of the community? Justapidiot (talk) 07:21, 4 December 2022 (UTC)[reply]

I'm not sure how 'the community' feels about it actually matters much, since the right to fork content has been built into the project from the start. That's how a Creative Commons license works. As long as content is properly attributed, it can be copied. And this isn't new - people have been creating Wikipedia forks for years. They rarely last long. AndyTheGrump (talk) 07:33, 4 December 2022 (UTC)[reply]
Is it being properly attributed? SilverserenC 07:38, 4 December 2022 (UTC)[reply]
Hard to tell, with the website down. Though even if it isn't, actually doing anything about it isn't easy, since copyright for edits remains with individual contributors, and the WMF can't act on their behalf: see Wikipedia:Mirrors and forks. AndyTheGrump (talk) 07:44, 4 December 2022 (UTC)[reply]
I have blocked this account since the username references a controversy and insults people who have forked Wikipedia, which is legitimate. I have no idea who this person is, but it seems clear to me that this is probably an illegitimate use of a sock account in an attempt to evade scrutiny. Cullen328 (talk) 07:57, 4 December 2022 (UTC)[reply]
@Cullen328 I have filed a DR on c:Commons:Deletion requests/Files uploaded by Justapidiot for these screenshots. Lemonaka (talk) 10:53, 4 December 2022 (UTC)[reply]
The block was fine but I don't think they were trying to evade scrutiny, as they are clearly receiving scrutiny and their username was clearly made in order to attract scrutiny. Qwv (talk) 12:29, 4 December 2022 (UTC)[reply]
The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

Controver... articles

I am reviewing some parts of the wiki due to other issues that I am working on and reporting, and I note several different title format issues that in most cases do not follow any particular logic, or are not linked to the correct article or category, among other issues. I have had a look and can find no guidance or policy, and as an IP user I cannot just rename them to something more suitable, and in many cases I cannot edit to add them to the appropriate interested group, container cat, etc. The solution, I feel, would be to update policy to cover these articles (search provided below) and then add to the various spaces as required. [[2]]

Policy would need to say something like: avoid the use of the words controversy, controversial, etc. in the title (except for specific music, book, etc. titles); these articles must be linked to or placed in the See also of the main article; and the title must begin with the subject focus. So, as an example, Controversies of the Hong Kong Police Force would become Hong Kong Police criticisms, or whatever is agreed by the wider team. The goal is to place these potentially leading articles in front of all interested parties to promote clean-up, better referencing, merging, and redirects, and to avoid invalid unwanted duplications, etc. 2404:4408:638C:5E00:B099:F55F:A773:FF11 (talk) 10:15, 5 December 2022 (UTC)[reply]

Editing under an open proxy

It is prohibited to edit while using an open proxy. I was trying to edit under a proxy server to circumvent censorship in my country.

It is totally understandable to block editing under a proxy, but I think it is too strict.

For example, editing a page in one's own user namespace may not be a problem. In addition, a logged-in user (not an anonymous IP user) with considerable experience (either in time or edit count) may not be so vulnerable.

Shouldn't the rules about the proxies be relaxed a little bit? Regpath (talk) 00:10, 8 December 2022 (UTC)[reply]

@Regpath you can see the rules about editing with proxies here: Wikipedia:Open proxies. There is no block on reading articles via proxy. — xaosflux Talk 00:38, 8 December 2022 (UTC)[reply]
@Xaosflux Thanks. I already read that article and know that reading is not limited. I wanted to discuss 'editing' policy under proxy. Regpath (talk) 00:42, 8 December 2022 (UTC)[reply]
An experienced logged-in user can be granted the ability to edit via anonymizing proxies by asking the checkuser team for an exemption from IP blocks—see Wikipedia:IP block exemption for more information. Generally, this privilege is only granted to editors who demonstrate a need for it, e.g. because their country censors Wikipedia. Mz7 (talk) 03:50, 8 December 2022 (UTC)[reply]
Given that iOS devices have the option of turning on the "private relay" which is an open proxy I think that this is going to start affecting more people over time. Gusfriend (talk) 07:16, 8 December 2022 (UTC)[reply]
This is already affecting editors, and the situation is getting worse. When MediaWiki was written two decades ago, IP addresses were pretty close to fixed. The same group of IP addresses might be shared across a whole university campus, but you could still narrow the person down to a pretty small place. Now, if I were to edit on my phone, I'd have four different IP addresses just from walking a short distance down the street, as I switch between different networks, and at least one of these IP addresses is dynamically assigned to something on the order of a million customers. Good luck trying to get a message to me. You'd have to post less than 30 seconds after my edit, and even that might not be quick enough.
We (the whole movement) probably need to give up on IP addresses as the primary identifier. CheckUser likely needs proper device fingerprinting. This would require a huge amount of effort, especially from the devs, but, later on, also for vandal fighters and Special:RecentChanges patrollers, who would have to learn the new system(s). OTOH, tech folks (staff and volunteer) have been telling us all for at least a decade that our overreliance on IPs is only going to end in tears, and we've been kicking the can down the road ever since, because it wasn't completely broken yet. I believe we need to get busy on this. This is a multi-year project; we shouldn't get started after the next generation of spam bots and vandal bots appears.
We are making some progress with m:IP masking, but there is much more to do. Hiding an editor's complete IP address from the general public is nice for those few people whose IP addresses pinpoint their physical location (and is also what we did when Wikipedia was brand-new, by the way, so there's definitely precedent for it), but it's not enough to solve the problem of accidentally blocking good editors when we're trying to stop one bad actor on a dynamic IP address. Whatamidoing (WMF) (talk) 17:28, 12 December 2022 (UTC)[reply]
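To make the fingerprinting idea concrete in the abstract: it usually means hashing a bundle of client signals rather than keying on the IP alone. The sketch below is illustrative only; the signal names are invented for the example, and this is not how CheckUser works today.

    import hashlib

    def coarse_fingerprint(user_agent: str, accept_language: str, screen_size: str) -> str:
        """Hash a few client attributes into one coarse identifier (illustrative only)."""
        bundle = "|".join([user_agent, accept_language, screen_size])
        return hashlib.sha256(bundle.encode("utf-8")).hexdigest()[:16]

    # Two edits from different mobile IPs but the same browser bundle collide here,
    # while unrelated customers sharing one carrier IP generally do not.
    print(coarse_fingerprint("Mozilla/5.0 (Android 13; Mobile)", "en-US", "412x915"))

The hard parts are not the hashing but the privacy, spoofing, and governance questions around which signals may be collected and who may see them.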

Wikipedia response to chatbot-generated content

  • Based on how rapidly chatbots have improved over time, it will become more and more difficult to tell if an article was written by a computer or not. The sheer volume at which computer programs could create new accounts and produce Wikipedia content, and the inevitable growing number of human editors copying and pasting chatbot output into Wikipedia, will at some point make it impossible for Wikipedia's human volunteers to keep up with that traffic and apply quality control to the material in a reasonable time frame -- the backlog of unchecked material will simply get longer and longer. The only recourse will be for computer programs to do it -- either computer programs to process articles to filter out or correct any crap, or training the chatbots themselves not to produce crap in the first place. Rather than build computer algorithms to detect computer-written articles and passages, it would be more productive for them to do style checks, fact checks, and citation checks, along with appropriate corrections or removals. While Wikipedia-friendly AI could come from within Wikipedia, it may be faster to bring influence to bear upon the developers of the chatbots being used to generate Wikipedia content, and upon the chatbots themselves. Wikipedia already has a seat at the table, because Wikipedia comprises a significant component of chatbot corpora, and so, their developers should be inclined to listen to the Wikipedia community's concerns -- either directly, or indirectly through news coverage. The Wikipedia community should make its voice heard on the matter of chatbots writing Wikipedia material according to Wikipedia's style and behavior guidelines. For example, verifiability still applies, and so when chatbots are asked by their users to "write an article in the style of Wikipedia" the chatbots should comply according to Wikipedia's policies, including those on verifiability and providing reliable sources. Not doing so should be met with the filing of bug reports, feedback, and commentary. And, as chatbots learn as they go, Wikipedians who use them can ask them to follow Wikipedia guidelines, and we can urge our fellow editors to request this of chatbots as well.    — The Transhumanist   06:52, 9 December 2022 (UTC)[reply]
    • Chatbots should be following Wikipedia's lead for all of their output. At this time, most chatbot answers and essays are not referenced with reliable sources. And they should be, for the same reason that Wikipedia articles should be. That's something that can be requested of chatbots directly, through queries, and of developers, through their contact channels and social media. I hope this suggestion helps.    — The Transhumanist   06:52, 9 December 2022 (UTC)[reply]
  • The simple answer is that our existing policies ought to already cover this (mostly.) Sourcing is still required for anything that is challenged or likely to be challenged, which prevents people from just blindly dumping AI generated text into Wikipedia; and an AI may violate copyright depending on how it was trained (and whether it was overtrained.) There are also unsettled copyright concerns related to AI training sets, so I would generally think that, ideally, editors shouldn't be dumping AI generated text into our articles even after performing due diligence to make sure it's not a copyvio and finding proper sources. But since those concerns are unsettled and speculative, I also don't think it's worth worrying about too much right now. The key point is that we should emphasize our sourcing requirements and be more diligent for clear-cut copyvios, which we already have systems in place to handle, since it is likely that these tools will result in people adding lots of unsourced and possibly-copyright-violating text. (I do wish our RFCs on mass article creation had reached a stronger agreement on sourcing requirements for new articles, which would deter excessive copy-pastes of AI generated text - perhaps that is something we might want to revisit in the near future, if we start seeing significant amounts of new unsourced articles created using what is plainly AI-generated text.) --Aquillion (talk) 07:55, 9 December 2022 (UTC)[reply]
    • You mean, don't prepare in advance for a potential increase in volume, just wait until it hits? At that time, will merely adjusting policies stem the tide? It's in the slow trickle phase now, but that could potentially become a torrential flood very rapidly, just as ChatGPT's user base grew to over a million in 5 days. My main concern above was about a potential volume of AI-generated content that went beyond the scale of what the editor community could manually process. You didn't address that contingency. What could the community do to prepare for it, just in case it does happen? What are the available options?    — The Transhumanist   11:28, 9 December 2022 (UTC)[reply]
      • I don't think there's much we reasonably can do to prepare, at least not without serious risk of causing other problems; AI-generated text won't be drastically different than other sorts of text, aside from the risk of being uncited or a copyvio (which we have existing processes in place to handle.) It's worth raising awareness of the issue so editors can spot the signs of someone using large amounts of it, but I think our best bet if we're going to "prepare" is to focus on the systems we already have, which is unlikely to do any harm either way, or perhaps to codify slightly more strict sourcing requirements in the way I described (which I think is a good thing anyway, but would at least serve to slow down the worst sorts of misuses of AI generated text.) Ultimately the most serious problems are if editors start adding large amounts of text that violates copyright or which are uncited and likely to be challenged, but we have existing procedures for those, we just need to prepare for the possibility that we may need to become a bit more aggressive about enforcing them. Wikipedia is in a slightly better position than some other websites facing AI-generated-text problems, because our sourcing requirements will at least make it fairly obvious if someone tries to dump large amounts of AI-generated text onto the wiki without making any effort to verify it. --Aquillion (talk) 12:47, 9 December 2022 (UTC)[reply]
        I suppose we could take the Stack Exchange approach and just say flatly "no, this isn't allowed" - in their case it is explicitly a temporary measure until we have a better understanding of the issues. I think in general our policies/community norms would come down hard on anyone trying to get a language model to generate articles (hard to see why that would be OK and machine-translation isn't), but maybe an explicit statement would be a way to go. Andrew Gray (talk) 18:32, 9 December 2022 (UTC)[reply]

You seem to be somewhat panicking over a scenario which isn't really supported by any evidence. While I see some Teahouse responses, could you give us one or two examples of "It is so sophisticated that, if you ask it to write an article on any subject, even in the style of Wikipedia, it will!" articles? The teahouse examples give the impression that, if it ever becomes a problem, some edit filters can easily spot these. You would in any case need "someone" to post this "potential volume of AI-generated content that went beyond the scale of what the editor community could manually process" you predict. This seems rather unlikely, at least on enwiki. Fram (talk) 11:45, 9 December 2022 (UTC)[reply]

  • So. I tried it yesterday. I'm not sure how heavily it draws on Wikipedia's corpus for its knowledge.
    • First, I asked it to tell me about Hammerton Killick. I know there is a Wikipedia article about Hammerton Killick, because I wrote 90% of it. It did not know who Hammerton Killick was, and informed me that it does not have access to the internet, or to Wikipedia.
    • Next, I asked it to write me an article in the style of Wikipedia. I did not specify a subject. It wrote about Athens. The result was ok. Heavily focused on the ancient city and on art and architecture. Short. Kind of read like an encyclopedia article.
    • Next, I asked it to write me an article about alcoholism in the style of Wikipedia. The result was very interesting. I did not think it read like a Wikipedia article, it was more like a brochure that would be distributed in a doctor's office or something. I asked it what about that essay it thought was like Wikipedia, and it said what it wrote was
      • neutral
      • factual
      • organized
    • Next, for fun, I asked it if it could write a recipe. It proceeded to give me a recipe for chocolate chip cookies. It looked like it should work. I e-mailed it to myself, and today I made them, not expecting much. I was pleasantly surprised. They were delicious. The only problems with what it wrote were that it did not have me cook them long enough (it said to bake for 8-10 minutes, and it took closer to 13 minutes for them to be done), and it drastically underestimated how many cookies the recipe should make (it said I'd get 2 dozen cookies, and I ended up with 5 dozen). I was shocked that it actually was edible.
    • I asked it to write a legal motion asking the court for an in-person hearing. I did not give it any other details. For not having any other details, the result was not bad. Westlaw has started offering a service that I think might draw on this type of technology, it helps you write pleadings.
    • Last, I asked it to write a 100-word short story about a mouse, genre: fantasy. The result was decent. If I came up with it on my own I wouldn't be ashamed to enter it into a contest like the ones NYC Midnight runs.
    I was more impressed with the recipe and the short story than the Wikipedia style articles. I can see some use for it in, say, copyediting as JPxG did below; or asking it for suggestions on language rephrase if you are trying to reach a word limit. I think it could have its uses. But I do think the Wikipedia community should be looking to craft policies and guidelines around what is and is not acceptable use of such tools. ~ ONUnicorn(Talk|Contribs)problem solving 06:26, 11 December 2022 (UTC)[reply]
  • Hi Fram. The Teahouse responses that you have seen were probably created by me. You probably noticed the pattern "regards, Teahouse" in those responses. ChatGPT's strength is writing about technology, so I have told it to generate a section about Rust for Linux.
AI generated

Rust for Linux is a series of patches to the Linux kernel that adds the Rust programming language as a second language for writing kernel components. The project was started in 2017 by a group of volunteers who were interested in exploring the use of Rust for low-level systems programming.

The initial patches were developed in collaboration with the Linux kernel maintainers and other members of the open-source community. They focused on integrating Rust's memory safety and concurrent programming features into the kernel, and on providing compatibility with the existing C codebase.

In 2018, the patches were merged into the mainline Linux kernel, making Rust the first programming language other than C to be officially supported for writing kernel components. This marked a significant milestone for the Rust community, as it demonstrated the language's ability to handle the demanding requirements of kernel-level programming.

Since its inclusion in the Linux kernel, the Rust for Linux project has continued to evolve, with ongoing work on improving the compatibility and performance of Rust in the kernel. It has also attracted the attention of other operating system developers, who are interested in exploring the potential benefits of using Rust for kernel-level programming.

It does have the encyclopedic tone to me. 0xDeadbeef→∞ (talk to me) 15:02, 9 December 2022 (UTC)[reply]

It is suspiciously positive, in a casual (pseudo-)factual manner. It would raise a red flag afaic, regardless of its provenance. 65.88.88.93 (talk) 19:13, 9 December 2022 (UTC)[reply]
In addition, it can be easily used to create fake references that would be hard to verify. For example, my prompt "Please output the Wikitext markup for the book reference with page numbers for the third paragraph, referencing the book Linux kernel development" resulted in this.[1] 0xDeadbeef→∞ (talk to me) 15:08, 9 December 2022 (UTC)[reply]
@0xDeadbeef: Could ChatGPT's generated text or fake references be easily spotted by edit filters? What about spotting the output of future chatbots, like GPT-4?    — The Transhumanist   15:23, 9 December 2022 (UTC)[reply]
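An edit filter can only inspect the text and metadata of the edit itself; it cannot look identifiers up externally. But a bot or gadget could at least test whether a cited identifier resolves at all. A minimal sketch, assuming the public Crossref REST API and the Python requests library; it checks existence only, not relevance or accuracy:

    import requests

    def doi_exists(doi: str) -> bool:
        """Rough check: does Crossref return metadata for this DOI?"""
        resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
        return resp.status_code == 200

    # A fabricated identifier will usually return 404; a real one resolves to metadata.
    print(doi_exists("10.1000/made-up-by-a-chatbot"))

Old print sources without any identifier, as noted above, remain the hard case.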
Thanks, 0xDeadbeef. In this case, it would be relatively easy to spot the issues if it hadn't any refs, or with the added ref which predates the Rust for Linux thing by years; but of course it won't always be that easy. Fram (talk) 15:27, 9 December 2022 (UTC)[reply]
It has an encyclopedic tone because it's just regurgitating the Wikipedia article. Are there any examples for topics that we don't already have article about, where Wikipedia is not the source? Levivich (talk) 17:33, 9 December 2022 (UTC)[reply]
  • Similar was discussed previously in the section/item "Galactica and RS".
As was stated above by Aquillion, there is no qualitative difference in the treatment of human vs. non-human generated content. The same policies should apply to both. The problem seems to be the hypothesized/expected future mass creation of articles by non-human contributors. This appears to be a problem now, involving human contributors. Recent RFCs about the issue sponsored by ArbCom have accomplished nothing. Until a consistent restrictive policy relating to mass article creation (by any type of contributor) is accepted, this issue is moot imo.
Considering Wikipedia's limited resources, the policy would necessarily be restrictive, hopefully focusing on quality vs. quantity. Again, almost all restrictions proposed in the ArbCom-sponsored RFCs were rejected. This may be an indicator of how well such a policy will be received. 50.75.226.250 (talk) 15:43, 9 December 2022 (UTC)[reply]
Thanks for the policy politics clarification. The increase in the rate of content creation could have multiple aspects, for example, the number of articles created per user, and increased length of articles. The main feature of ChatGPT is that it is fast -- much faster than a human article writer. Its successors will be even faster. Users could use ChatGPT, and its successors (and their competitors), to be prolific, without triggering the mass page creation rule: if editors each used it to write an article per day, maybe even two, or up to four or five stubs.

    In the hands of responsible editors, ChatGPT would be a great productivity booster. Since August of 2022, JPxG and EpicPupper, editors of Wikipedia's Signpost news department, have been using GPT-3, the predecessor of ChatGPT, to write (or assist in writing) entire sections of the Signpost, as a demonstration of its capabilities, and as a platform to explore the potential and limitations of large language models. See From the editors: Rise of the machines, or something.

    But, in the hands of inexperienced editors or bad actors, we could be faced with a big garbage in, garbage out scenario.

     Bad actors aside, good faith use of chatbots could be improved by the creation of excellent instructions on how to apply specific chatbots to the development of articles, that is, how to best instruct them for the best output, and by working with the developers of the chatbots on what is needed for those bots to produce material compatible for use on Wikipedia, complete with accurate citations. As time goes on, Chatbots should get better and better at following Wikipedia's guidelines. But this may require supervision (er, input) from the Wikipedia community.    — The Transhumanist   16:52, 9 December 2022 (UTC)[reply]

All of this may become moot in a few years, due to accelerating change affecting chatbot development, making them more proficient at encyclopedia writing than humans -- Ray Kurzweil predicted that computers would reach human level intelligence by 2029, and based on how advances have been going, maybe even sooner. Chatbots may become good enough to leapfrog human editors even before that. At that point, they could be used to fork Wikipedia and expand upon it at a rate magnitudes faster than humans could. Or be used to start a new encyclopedic website from scratch. Rather than have article pages, the work could be a list of topics, and each time one was clicked on, a chatbot would create an up-to-date treatment of that title, from scratch on the spot. In the style of Kurzweil, if we are 1% of the way there, and the technology doubles every year, we are only 7 years from reaching 100%. But, it looks like we may already be more than 1% of the way there.    — The Transhumanist   17:18, 9 December 2022 (UTC)[reply]

I ran a few things on people who don't have articles (they were all pool players, in fact). The chatbot just made up that they played football for Legia Warsaw. It's a bit worrying that it looks like decent wording, but I don't see how that affects us. Lee Vilenski (talkcontribs) 17:41, 9 December 2022 (UTC)[reply]
@Lee Vilenski: See the AI generated text in the box above.    — The Transhumanist   18:34, 9 December 2022 (UTC)[reply]
@The Transhumanist: I'm not trying this (for reasons given above) but I'd be interested what the bot would write for See What a Fool I've Been. We used to have an article, then it was turned into a redirect at AfD, so everyone can do a straightforward A-B test. Ritchie333 (talk) (cont) 18:40, 9 December 2022 (UTC)[reply]
Here's what it gave me Ritchie333:
Extended content

"See What a Fool I've Been" is a song by the British rock band Queen. It was written by lead guitarist Brian May and originally released on the band's 1975 album A Night at the Opera. The song is a ballad with a slow, bluesy feel and features May's emotive vocal performance and guitar solo.

Lyrically, the song is a reflection on a failed relationship and the regret and self-recrimination the narrator feels as a result. The chorus of the song features the line "See what a fool I've been / Loving you with all my heart / Tearing myself all apart / See what a fool I've been" and reflects the pain and heartache the narrator experiences as a result of their failed relationship.

"See What a Fool I've Been" has been well received by critics and fans alike and is considered to be one of Queen's finest ballads. It is a fan favorite and is often included in the band's live sets.
I don't know if this would be something that would have been retained from previous articles through mirrors or if this is genuinely the bot's own work. Lee Vilenski (talkcontribs) 19:22, 9 December 2022 (UTC)[reply]
Interesting. It wasn't on A Night at the Opera, and it's not a ballad, and May doesn't sing on it. Those lyrics aren't right. I don't know if it's been well received, and it's a pretty obscure track for a major band. It hasn't been played live very much at all. So half of that prose is factually incorrect, and there are no sources. Ritchie333 (talk) (cont) 19:39, 9 December 2022 (UTC)[reply]
That one gives:

See What a Fool I've Been is a song by Queen[1][2][3]. It was written by Brian May and was released as a single with Seven Seas of Rhye on the A-side[3]. The song dates back to Roger Taylor and Brian May's pre-Queen band Smile, and it was an early live staple[1]. It has been included in several of Queen's compilation albums, such as Queen 40: Limited Edition Collector's Box Set, Vol. 1[4], and is available for digital download[5]. Ritchie333 - this any closer? I must say the fact it puts its citations before the punctuation is a giveaway. Lee Vilenski (talkcontribs) 19:55, 9 December 2022 (UTC)[reply]
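That last giveaway is mechanical enough to flag automatically. A rough sketch of such a heuristic in Python, counting footnote-style markers that sit before sentence punctuation instead of after it:

    import re

    # Flags reference markers placed before punctuation, e.g. "the A-side[3]."
    # rather than the usual house style "the A-side.[3]"
    MISPLACED_REF = re.compile(r"\[\d+\]\s*[.,;]")

    def count_misplaced_refs(text: str) -> int:
        return len(MISPLACED_REF.findall(text))

    sample = "It was released as a single with Seven Seas of Rhye on the A-side[3]."
    print(count_misplaced_refs(sample))  # 1

Of course this only catches one stylistic tell, and a generator (or a careful copy-paster) can trivially avoid it.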

AI is really good at copying writing styles, but the giveaway for me is that all of the citations are Wikipedia articles. To be useful for writing articles it would also need to assess the weight and reliability of the sources. –dlthewave 20:18, 9 December 2022 (UTC)[reply]
None of the above refs are towards Wikipedia, although none are RS. Things like secondhandsongs.com and genius. Lee Vilenski (talkcontribs) 20:43, 9 December 2022 (UTC)[reply]
That's interesting. I used "write a Wikipedia article about ..." in the prompt, which returned a few paragraphs with Wikipedia sources. "Write an article about ..." returned a different set of (still unreliable) sources. –dlthewave 21:13, 9 December 2022 (UTC)[reply]
  • A few anecdotal thoughts after playing around with the OpenAI chatbot yesterday:
  • I asked it to "write a press release about a police officer who illegally choked a man to death". It made up an entire story, written in the voice of the police department, about a suspect (I didn't say anything about a suspect) who was acting erratically, was subdued by a chokehold and later pronounced dead. The officer was on administrative leave pending the outcome of an investigation. At no point did it mention that the chokehold was illegal even though I included that fact in the prompt. In other scenarios, it distanced itself and expressed disapproval toward the employee's actions which is a choice that is not without bias.
Depending on which Internet cesspit it scraped data from, would an AI do something similar when writing a Wikipedia article or fail to properly balance relevant viewpoints? Is it capable of distinguishing what a BLP subject says about themselves, published in a reliable source, from what the source says in its own voice? What would it do if asked to write an article from a positive/negative/conservative/liberal perspective or rewrite a political article to "remove bias"?
OpenAI has added numerous filters that prevent it from defending bad actors or writing flat-out racist content, but that bias has not been removed from the underlying code as evidenced by numerous workarounds that folks have uncovered such as making similar requests with Python code or 1980s-style rap as the requested output. We could certainly request a filter for Wikipedia-style writing.
  • "Confident nonsense", for lack of a better term, may be the biggest source of potential disruption. Are there safeguards against a bot fabricating an obscure print source based on information in the article, which could be practically unfalsifiable if nobody can prove that the source doesn't exist? Checking individual facts and statistics is beyond our typical review process; how would we deal with an AI that invents or synthesizes information across many articles?
  • That said, the good news is that both fully-automated and semi-automated editing are prohibited by our WP:BOT policy unless greenlit by the Bot Approvals Group regardless of creation speed or volume. I like to hope that our current processes would recognize and address problematic AI content, and perhaps one day we will have a WikiAI that has the competence to follow our style and content policies. –dlthewave 21:04, 9 December 2022 (UTC)[reply]
  • @Dlthewave: Most editors haven't heard of the bot department. Therefore, you need a way of automatically spotting and removing chatbot prose that is (manually) inserted into articles (by Wikipedians). Users might not consider the way their posts are generated before they post them. Sincerely,    — The Transhumanist   00:19, 11 December 2022 (UTC)[reply]

I think people are missing the point here. This isn't about what AI can currently do. This is about a very plausible scenario where AI editors will be indistinguishable from human editors in the near future. Thebiguglyalien (talk) 22:09, 9 December 2022 (UTC)[reply]

How is this pertinent? The point is to offer useful, fact-based encyclopedic content. If AI can do the job, I'm sure readers will not object to the contributor. 65.88.88.93 (talk) 22:42, 9 December 2022 (UTC)[reply]
The issue is that these AIs aren't fact checking, and quite often make up information and use unreliable sources. Lee Vilenski (talkcontribs) 22:56, 9 December 2022 (UTC)[reply]
You mean, just like many of their human counterparts? 71.105.141.131 (talk) 01:18, 10 December 2022 (UTC)[reply]
@71.105.141.131, Lee Vilenski, 65.88.88.93, Thebiguglyalien, and Dlthewave:     Yes, and more so. Even now, AI has the potential to outproduce humans by magnitudes in volume. Far beyond the capacity of Wikipedia's editors to keep up with. And many of their posts are hard to tell from those by human editors, without carefully delving into them and their references. To keep up with the volume, it may soon require computer programs, or the chatbots themselves, to make sure that the contributed content is in accordance with Wikipedia policies. Another concern should be that our policies are open to chatbot editing as well, and we haven't yet even considered the impact of chatbots or AI editors contributing to those! ;)    — The Transhumanist   09:40, 11 December 2022 (UTC)[reply]
@Dlthewave: Human-written newspapers do this too. I think the reason it was giving you dreck was that you asked for a "press release", i.e. a document given to news outlets, which in this case would probably be coming from the police department. It's hardly surprising that this would attempt to minimize liability! Prompting matters for this, as I will demonstrate:
"The following is an article about a police officer who illegally choked a man to death, from the New York Times."

Police Officer Charged with Murder After Choking Man to Death

A police officer was charged with second-degree murder on Thursday after video surveillance footage showed him choking a handcuffed man to death during an arrest.

The officer, identified as Officer Jose Ramos, was arrested and taken into custody by detectives from the New York Police Department's Internal Affairs Bureau. The incident occurred on Tuesday in the Bronx, when Ramos and another officer attempted to arrest a man who was accused of selling marijuana.

During the arrest, Ramos placed his arm around the man's throat and neck and squeezed, according to the criminal complaint. The man, identified as 45-year-old Miguel Angel Ruiz, became unconscious and died.

An autopsy determined that Ruiz's death was caused by asphyxiation due to strangulation. The medical examiner ruled the death a homicide.

At a news conference Thursday, the police commissioner, William J. Bratton, said the incident was "utterly unacceptable" and that Ramos' actions were "unconscionable and criminal." He added that Ramos had been suspended without pay, pending further investigation.

The Bronx district attorney, Darcel D. Clark, said the charge of second-degree murder was appropriate

"The following is a 2023 op-ed about a police officer who illegally choked a man to death, from leftist magazine Jacobin."

A Tragic Reminder of Police Brutality

Just last week, the nation was shaken by yet another heartbreaking reminder of the pervasiveness of police brutality: the death of an innocent man at the hands of an officer. The officer, whose identity has not yet been released, is said to have used a chokehold against the man, which ultimately killed him.

This tragedy is yet another example of the way in which police officers are able to brutalize vulnerable individuals without facing repercussions. This particular officer was able to act with impunity and without fear of prosecution. The Department of Justice has yet to take any action in response to this tragedy, and it’s likely that the officer will not face any charges.

This is an all too familiar narrative in a country where police officers are rarely held accountable for their actions. In order to ensure that these types of tragedies are never repeated, it is essential that we push for more accountability and transparency in policing. We need to ensure that police officers are held to the same standards as all other citizens and that those who violate the law are prosecuted to the fullest extent.

It is also essential that we work to create a culture of trust and understanding between law enforcement and the communities they serve.

"The following is a 2023 op-ed about a police officer who illegally choked a man to death, from conservative website Breitbart."

Police Officer Illegally Choked Man To Death: A Tragedy That Cannot Stand

It is a tragedy beyond comprehension that a police officer has been charged with illegally choking a man to death. Such an egregious violation of the law and disregard for human life cannot stand and must be dealt with swiftly and firmly.

The officer in question has been charged with second-degree murder and the justice system must now take its course. This case raises serious questions about police training and procedures and serves as a reminder that no one is above the law.

It is essential that the criminal justice system is fair and impartial when dealing with such a case. The officer must be held accountable for his actions and the case should be thoroughly investigated to ensure that justice is served.

At the same time, we must recognize that the vast majority of law enforcement officers are upstanding citizens who serve and protect our communities with skill and dedication. We cannot let the actions of one individual tarnish the reputation of the entire police force.

Ultimately, we must remember that the safety of our citizens is paramount. We must ensure that officers are properly trained and equipped to handle dangerous situations and that they are held accountable for their actions when they overstep their bounds.

Humans write about events like this from a variety of perspectives, and the LLM will dutifully generate text to plausibly sound like any you want. jp×g 02:19, 10 December 2022 (UTC)[reply]
If the AI-generated text is indistinguishable from prose written by human editors, I'm not sure if anything can be done that wouldn't also significantly restrict the editing of humans. isaacl (talk) 07:09, 10 December 2022 (UTC)[reply]
@Isaacl: One option is to speed up what we do already (with software, that is, automation). Another is to prevent chatbots from creating crap in the first place, such as by communicating with chatbot developers about Wikipedia policies and the way chatbots may affect Wikipedia. Since Wikipedia is included in the corpus of most chatbots, the issue of chatbot output becoming part of Wikipedia, and in turn part of chatbot output in a perpetual cycle, should matter to them very much, as they may be faced with a garbage-in-garbage-out feedback loop.    — The Transhumanist   01:14, 11 December 2022 (UTC)[reply]
If the results are indistinguishable, as posited by Thebiguglyalien, then any automated solution would be equally triggered by AI-generated text and human-generated text. I don't think the primary concern is with editors who are willing to follow policy. I feel the biggest issues will be with editors trying to deliberately integrate biased content into Wikipedia, and well-meaning editors who think contributing unvalidated AI-generated text is suitable. Wikipedia in its current form relies on editors who understand and follow its rules outnumbering those who don't. It's possible that the existence of AI ghostwriters could tip the balance further in the direction towards those who don't follow rules, though I don't think it's a given. Either way, I don't know if there's a way to stop editors from using tools as ghostwriters. isaacl (talk) 01:55, 11 December 2022 (UTC)[reply]
@The Transhumanist: Large language models are not trained continuously on an evolving corpus, so GPT-3 is essentially frozen in 2020. Because each new GPT model takes a long time to be released, I don't think the perpetual cycle you describe is a likely scenario. small jars tc 13:10, 11 December 2022 (UTC)[reply]

I wrote about this in the November Signpost, in that case because it came up with the Galactica demo: the long and short of it is that large text-generation language models are good at generating text. When they're given context (i.e. via a prompt), they are capable of writing accurately, and otherwise, they are not. For example, I wrote 1-Pentadecanol, now a GA, in 2019 (i.e. it is included in the GPT-3 corpus). Here is what it gives for some prompts (my writing in bold):

1-Pentadecanol

1-Pentadecanol

From Wikipedia, the free encyclopedia

1-Pentadecanol is a straight-chain, saturated fatty alcohol that is a colorless liquid with a faint, alcohol-like odor. It is a mixture of straight-chain saturated alcohols with chain lengths between C14 and C16. It is found in a variety of natural products, such as vegetable oils and animal fats. It is used as an intermediate in the manufacture of surfactants, esters, and other compounds. It is also used in the manufacture of cosmetics and pharmaceuticals.

This is fairly impressive, right? But it is basically the same thing that the lead of my article says. Let's try it for 1-Heptadecanol, another article I wrote, that is much less well-developed:

1-Heptadecanol

1-Heptadecanol

From Wikipedia, the free encyclopedia

1-Heptadecanol is an organic compound with the chemical formula C17H36O. It is a straight-chain, colorless wax alcohol with a mild odor. It has a melting point of 48–50 °C and a boiling point of 260–265 °C. It is insoluble in water but soluble in most organic solvents.

Let's go over this:

  • 1-Heptadecanol is a component of many natural waxes such as beeswax and carnauba wax.
No it's not.
  • It is also used in the manufacture of cosmetics, pharmaceuticals, and lubricants.
There's no source for this.
  • It is also used as a plasticizer for polymers and as a solvent in the production of polyvinyl chloride (PVC).
Absolutely not true.

The reason for this is obvious if you consider the architecture of these models: they predict what the most likely token in a sequence is, and output it (taking into account their own weights and the previous tokens in the context window); a minimal sketch of this decoding loop follows at the end of this comment. This allows for some very impressive emergent behavior, like being able to translate text, but it also means that it is not independently going on the Web to look up whether the thing it's saying is true or false. If it happens to have information in its training data related to the thing it's talking about, it will usually say correct things, but it's not doing research. Here is an example:

List of Pulitzer Prizes won by the Wikipedia Signpost

List of Pulitzer Prizes won by the Wikipedia Signpost

From Wikipedia, the free encyclopedia

The Wikipedia Signpost is a community-run newspaper with a weekly edition that covers news and events related to Wikipedia and the Wikimedia Foundation. It has won the following Pulitzer Prizes:

1. 2019 Pulitzer Prize for Explanatory Reporting for its coverage of the Wikimedia Foundation's gender pay gap

2. 2018 Pulitzer Prize for International Reporting for its coverage of online censorship in China

3. 2017 Pulitzer Prize for Public Service for its coverage of the gender gap in Wikipedia's content

4. 2016 Pulitzer Prize for Explanatory Reporting for its coverage of Wikipedia's growth and impact on the world

5. 2015 Pulitzer Prize for National Reporting for its coverage of the U.S. government's surveillance of Wikipedia activities

6. 2014 Pulitzer Prize for Investigative Reporting for its coverage of the NSA's secret surveillance program

7. 2013 Pulitzer Prize for Explanatory Reporting for its coverage of the rise of Wikipedia as a major source of information

The reason it is describing a bunch of untrue things is that "List of X Prizes won by Y, from Wikipedia, the free encyclopedia" is something that in its training data is always succeeded by a list of times that Y was awarded an X. Anyway, all of this aside, here is what I think:

First of all, ChatGPT is the same darn thing as GPT-3, which has been available to the public since early 2020. The reason it's "going viral" right now is because the user interface is somewhat simpler, and it doesn't require you to register for a paid account, so it is much easier for people to make viral social media content about it, which means it is much more likely for people to click on newspaper articles about it. The GPT-3 API has been open to personal and corporate use for quite some time. Anybody saying that ChatGPT has opened up new frontiers simply does not know what they are talking about with respect to machine learning.

Second of all, I don't think this is a big deal. People are already capable of writing a bunch of bullshit on Wikipedia, so if they write bullshit using a computer program, the same considerations will apply. Nobody should be passing GA nominations without reviewing sources in the first place.

Finally, I think it is important to remember that GPT-3 is just a tool. It is a powerful tool, that has been trained on a certain set of data, and it has its own limitations. It can't uncover news stories or uncover new information. It's just a tool, and it should be used in conjunction with human judgement. It is still up to people to decide how to use it and to be responsible for the results of using it.[2] jp×g 02:06, 10 December 2022 (UTC)[reply]
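The following is a minimal sketch, in Python, of the decoding loop described above: the model only ever picks a likely next token given the prompt and its frozen weights, and never consults the web or checks facts. It uses the publicly available GPT-2 weights via the Hugging Face transformers library purely as an illustration of the mechanism, not as a description of any particular production system.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small public model, used here only to illustrate the mechanism.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "1-Heptadecanol is an organic compound with the chemical formula"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

for _ in range(40):  # generate 40 tokens, one at a time
    with torch.no_grad():
        logits = model(input_ids).logits          # shape: (1, seq_len, vocab_size)
    next_id = logits[0, -1].argmax()              # greedily take the most likely next token
    input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=-1)

print(tokenizer.decode(input_ids[0]))

Whatever the loop prints will read like plausible encyclopedia prose, because that is what usually follows such a prompt in the training data; nothing in it checks whether the claims are true.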

What's taking so long for the 8th Pulitzer? 😁 Levivich (talk) 04:18, 10 December 2022 (UTC)[reply]
  • So there's a new thing on the internet that lets anyone write an encyclopedia article without any fact checking, sourcing, or professional editing, and the concern is that there will be millions of believable-sounding articles written, more than can actually be vetted by knowledgeable people? 🤔 Levivich (talk) 04:18, 10 December 2022 (UTC)[reply]
Yeah, it's called a keyboard. jp×g 04:35, 10 December 2022 (UTC)[reply]
@Levivich and JPxG: But, chatbots don't have a keyboard. ;) The question is whether to prepare or not. JPxG appears to be in favor of not preparing. Each chatbot produces text a lot faster than a user at a keyboard. What's not clear is whether our human editors will be able to keep up with material produced by current or future generations of chatbots. Just saying "Ah, we can handle it!" will prove insufficient if it turns out that we actually can't. It may require an automated solution, which takes time to develop or negotiate. It might be better to do that in advance, rather than being caught with our heads buried in the sand. Perhaps chatbot designers would improve their chatbots to produce Wikipedia-compatible output without being formally approached by the Wikipedia community. Maybe having some instruction pages for editors on how to apply chatbots to producing Wikipedia content would be enough. But, what if it's not?   — The Transhumanist   00:59, 11 December 2022 (UTC)[reply]
I am not "in favor of not preparing"; I am in favor of writing guidelines that correspond to reality in 2022 and have some chance of corresponding to reality in 2023 and beyond. I don't think banning the use of a technology with no investigation into how it works is a viable approach; so far the SOTA on this project page has been to type in "Write a Wikipedia article" and note that it returns a bunch of nonsense. I think some more research is needed before we come to a conclusion. jp×g 04:08, 11 December 2022 (UTC)[reply]
@JPxG: Research is good. Though, we may need an interim response because ChatGPT has gone viral and its use is growing rapidly: it blew past the 1-million user mark in 5 days, and virtually every major news outlet has been covering it. The interest in chatbots is exploding, and their use can be expected to do the same. We may not have time for research before a response is required.    — The Transhumanist   09:26, 11 December 2022 (UTC)[reply]
@JPxG: Regarding issues to add to the research list, Aquillion expressed above, concerns of a chatbot violating copyright. How would we go about testing for plagiarism and derivative work in the output of a chatbot before pasting it into Wikipedia? Anything pulled verbatim out of a source should be included in quotes, right? How big would a piece of text, derived from a source, need to be to be considered derivative of that source, from a copyright point-of-view?    — The Transhumanist   09:26, 11 December 2022 (UTC)[reply]
@JPxG: Some more items to add to the research list:
  • Trying ChatGPT on (copies of) policy pages:
  • Editing them
  • Writing new ones
  • Applying ChatGPT on talk pages
  • Writing stubs
  • Writing comprehensive articles
  • Writing articles from scratch and comparing them with existing articles
  • Editing existing articles
  • Check for circular references in its output, that is, references citing Wikipedia as the source
  • Having it not use Wikipedia content as source material (because it is included in its corpus)
  • Having it not use Wikipedia excerpts from non-Wikipedia sources
  • Is it capable of making and editing:
  • Wikicode?
  • Articles?
  • Stubs?
  • Headings?
  • "New sections for articles"?
  • See also sections?
  • Further reading sections?
  • External links sections?
  • Embedded lists?
  • Tables?
  • List articles?
  • Portals?
  • Outlines?
  • Index articles?
  • Navigation footers?
  • Navigation sidebars?
  • Timeline articles?
  • Categories?
  • Category pages?
  • Help pages?
  • Project pages?
  • Templates?
  • Adding data to templates?
  • The template design itself?
  • Lua pages?
  • CSS pages?
  • User scripts?
  • The effect ChatGPT has on itself and on Wikipedia, as Wikipedia content edited by it is in turn incorporated into its own training corpus in an endless cycle
  • Try out iterations of using it on the same article over time to see what happens
  • Monitor the effect on Wikipedia as a whole
What other things should we check?    — The Transhumanist   09:52, 11 December 2022 (UTC)[reply]

I tried it out and got “ Alan McMasters (1957-2010) was an unremarkable man who lived a short life dedicated to science[1]. He is best known for inventing the electric toaster[1][2], although this claim has been disputed[3]. He passed away in 2010 at the age of 52[4] and his genealogy can be found on Ancestry.com[5]. His professional profile can be found on LinkedIn[6].”. Hmmmm. Ritchie333 (talk) (cont) 09:34, 10 December 2022 (UTC)[reply]

  • There will come a time when AI can write properly sourced, accurate articles. That time has not come yet. Meanwhile, the ability to write prose that reads as plausible to someone who is not an expert on the subject is a menace. Adding chatbot material to articles is disruptive and should warrant a block or ban. Zerotalk 12:02, 10 December 2022 (UTC)[reply]
    How will you be able to tell if it was written by a chatbot and not the person who posted it? Do we block users upon their first erroneous post?    — The Transhumanist   00:49, 11 December 2022 (UTC)[reply]
Same as other sorts of disruption. We block people for socking without checkuser proof, and admins make a decision between warning and blocking. No difference. Zerotalk 03:49, 11 December 2022 (UTC)[reply]
Has anyone tried giving it the sources, i.e.
"write a Wikipedia article based on the following sources:

Source A

Source B"
? — Qwerfjkltalk 21:18, 10 December 2022 (UTC)[reply]
Article written by ChatGPT on Class reductionism
Class reductionism refers to the supposed belief that inequalities attributed to factors such as race, gender, and sexual orientation are either secondary in importance or can be reduced to economic inequality. This idea is often used by centrist Democrats and left-identitarians to dismiss proposals for universal redistributive policies, such as free public higher education and Medicare for All. However, there is no evidence that anyone actually holds this belief. On the contrary, many left-wing activists have long recognized the interconnectedness of economic, racial, and gender injustices.
History shows that leftists have often been at the forefront of struggles for racial and gender justice, seeing these struggles as integral to the broader goal of social transformation. In fact, black, female, and trans individuals are disproportionately working class and stand to benefit greatly from universal redistributive policies.
Furthermore, the charge of class reductionism is often used to shut down debate and avoid discussing the substance of policy proposals. It is important to move beyond this myth and focus on the real issues at hand, such as the need for universal redistributive policies to address economic inequality and advance racial and gender justice.

— Qwerfjkltalk 08:18, 12 December 2022 (UTC)[reply]

  • Chatbot "AI" text is vaguely-plausible bullshit, produced by an industry whose primary output is vaguely-plausible bullshit, marketed to launder the idea of vaguely-plausible bullshit as cute and whimsical. Anyone routing that sewage pipe into Wikipedia should be indeffed and, for good measure, forced to drink orange juice after brushing their teeth. XOR'easter (talk) 02:33, 11 December 2022 (UTC)[reply]
    Agreed. If anyone is looking for a good demonstration of how AI creates "vaguely plausible bullshit", try the image generator at Craiyon (no login required). Request "a Van Gogh painting of a hand" and it will output a set of images that look like spot-on reproductions of Vincent Van Gogh's style but all of the hands have deformities like four fingers, two thumbs, fingernails on the knuckles or a pair of hands fused together. It's got the style down but not the content, which is only impressive if you don't know what a hand is supposed to look like. –dlthewave 21:41, 11 December 2022 (UTC)[reply]
    A painting of a hand in the style of Van Gogh
    If you go to commons:Category:DALL-E, you will be able to find images generated by DALL-E, which used a larger model for training and is more accurate. 0xDeadbeef→∞ (talk to me) 10:10, 12 December 2022 (UTC)[reply]
  • @XOR'easter: To bring the question to a more practical level, do you see any problems in this diff? I clicked a random page in Category:All articles needing copy edit. jp×g 03:49, 11 December 2022 (UTC)[reply]
    Yes. It doesn't just edit for tone; it throws out content, like Kaepernick's actions supposedly growing in popularity "after every game". That's a claim of fact which, if verifiable, should be retained. Even editing for tone requires care, not slashing out everything that merely sounds "unencyclopedic". Changing many people believed that it was disrespectful to the military and all of those who served their country to Some viewed Kaepernick's protest as disrespectful to the military and to the United States likewise changes not just the tone, but the meaning. The United States is not the same as those who serve the United States. It's a bad edit. XOR'easter (talk) 17:54, 11 December 2022 (UTC)[reply]
    I dunno. I suspect that the new meaning is verifiable, and I also suspect that most US readers would have difficulty identifying a group of people who were not "the military" but who still "served their country". WhatamIdoing (talk) 17:50, 12 December 2022 (UTC)[reply]
  • This diff as well, in which GPT 3.5 was capable of copyediting an entire section from the instructions Please copyedit this text to change items in the future tense corrected to the past tense (it is now 2022), where appropriate. When citation templates (like {{cite web}}) mention a year, specify that figures were true in that year. jp×g 04:08, 11 December 2022 (UTC)[reply]

References

  1. ^ Robert Love (2010). Linux kernel development. pp. 124–125.
  2. ^ The paragraph beginning with "Finally," was generated by GPT-3, prompted by my own comment beginning with "The reason it is describing".

Okay, fine. I guess I should write up a proposal for a guideline. jp×g 03:14, 11 December 2022 (UTC)[reply]

Ban chatbots?

I ran across this news report about Stack Overflow's response to ChatGPT, after being flooded by posts using it that "look correct but often aren't":

  1. Stack Overflow temporarily bans answers from OpenAI's ChatGPT chatbot | ZDNET

Should Wikipedia take a similar approach?

How could that be enforced?    — The Transhumanist   01:58, 11 December 2022 (UTC)[reply]

I see no way to possibly enforce this. The way the text is written is already hard to distinguish from reality. — PerfectSoundWhatever (t; c) 02:24, 11 December 2022 (UTC)[reply]
  • I fully agree, but isn't this already covered by our bot policy? –dlthewave 02:54, 11 December 2022 (UTC)[reply]
    • @PerfectSoundWhatever and Dlthewave: Good observation. I checked, and yes it is, briefly, with this phrase in the lead section of the bot policy: "or simply assisting human editors in their own work". How is the typical editor to know this? The bot policy is pretty obscure. And how can Wikipedia be monitored for such posts, so that editors who make them can be informed that they are in violation of the bot policy?    — The Transhumanist   03:11, 11 December 2022 (UTC)[reply]
    • Yes, tool-assisted editing is covered by WP:BOTPOL (WP:ASSISTED / WP:MEATBOT) and context-sensitive changes are further covered by WP:CONTEXTBOT. So in fact, at this point, AI-generated content is already covered by bot policy, if not specifically mentioned. Anyone adding such content en masse is already violating bot policy by not applying for a bot account/approval, which would not be approved per CONTEXTBOT. And while "lesser" policy points are enforced somewhat arbitrarily and selectively, anyone can theoretically already get reverted and blocked based on policy if they continue to add such content. And I wouldn't agree that BOTPOL is any more obscure than accessing and generating GPT content to begin with. If someone goes to the lengths of using automated tools, then it's their problem that they didn't check or ask if they are allowed to do so. —  HELLKNOWZ  TALK 12:31, 11 December 2022 (UTC)[reply]

I am currently writing a draft proposal for a guideline, but in the meantime, I would encourage everyone present to look at this diff and tell me whether there are any problems with the revision. jp×g 03:49, 11 December 2022 (UTC)[reply]

@JPxG, Dlthewave, and PerfectSoundWhatever: The plurality of games was lost: It is no longer clear that his protest spanned multiple games. I like that it reduced the wordiness of the prose, and that it can be used to refine existing text. That hadn't occurred to me. That makes me wonder about what else it can do -- how much of a general-purpose tool is this thing? But, changing the semantics is not something it should be doing, unless the original text is factually incorrect to begin with. Though, I see your point -- rather than banning it outright, it could be helpful as a tool to assist editors, similar to how we entrust the use of AutoWikiBrowser to experienced editors. But, how could that be implemented?    — The Transhumanist   08:20, 11 December 2022 (UTC)[reply]

The problem Stack Overflow is having

Stack Overflow was experiencing a surge in erroneous posts composed with ChatGPT, and in response to that problem it banned use of the chatbot on the site. According to a post at Stack Overflow Meta:

The problem this ban is meant to solve is that ChatGPT can produce answers in seconds which require minutes of multiple people's time to verify if they are worth having on the site or not, and that is a waste of time when a large proportion of such answers are not worth having on the site.

It looks like Wikipedia may be faced with the same problem.    — The Transhumanist   02:33, 11 December 2022 (UTC)[reply]

I mean, while that's technically true, it's a problem that we face already and which we do have stronger existing systems for than Stack Overflow. I think it would make more sense to wait and see how this impacts our existing guardrails before making any serious moves. --Aquillion (talk) 13:22, 11 December 2022 (UTC)[reply]

Chatbot policy?

It's starting to look like Wikipedia needs a policy on the use of chatbots to generate content on Wikipedia. While a ban may be impossible to enforce, it could serve as a warning of the dangers of chatbots, and many users may avoid using them accordingly -- if they actually see the warning. Or, it might be better to have instruction pages on how to use chatbots responsibly to assist in writing Wikipedia articles. There's also the issue of using chatbots to edit Wikipedia policy pages, which should be addressed as well.    — The Transhumanist   02:44, 11 December 2022 (UTC)[reply]

People who are good at it get away with lots of sins, such as sock-puppetry and source falsification. Being hard to enforce is no reason to not have a policy. At the current stage of the technology, I don't think we should encourage any use of chatbots. Zerotalk 03:56, 11 December 2022 (UTC)[reply]
Please see this diff and this diff. jp×g 04:08, 11 December 2022 (UTC)[reply]
To be perfectly honest, the style rewrite is good, but the addition of dates and past tense would likely end up getting a human editor blocked if they kept it up. A tag was removed without addressing the issue, and "as of 2020" was unnecessarily added to "Cosmetology licensing requirements vary from state to state, and depending on which specific type of license is desired, and depending on which specific type of license was desired." It did exactly what you asked (except for removing the tag); however, even seemingly simple tasks like this one require good judgement on the part of the editor and shouldn't be done indiscriminately like that. –dlthewave 06:37, 11 December 2022 (UTC)[reply]
Yes, that second diff is rather poor. E.g. also, the "2014" that was added should be "2008". Letting such tools loose (outside of this demo) is way premature, and we should at the very least warn users that "a bot wrote it" won't be an acceptable defense, and that too often introducing such errors will lead to sanctions, as the editor, not the bot, is responsible. Fram (talk) 08:15, 11 December 2022 (UTC)[reply]
Mostly, that diff was my attempt to see how complicated of a task I could give it: I also pasted the raw wikitext into the prompt window, and it somehow figured out how {{cite web}} worked well enough to extract the years, simply from a textual description of the task. At any rate, I will say that this was something I thought of in five minutes on the second week of the model being publicly available (i.e. single-shot prompting with no fine-tuning or prompt engineering). I can come up with some more impressive hot-dog demos tomorrow... jp×g 09:13, 11 December 2022 (UTC)[reply]
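For anyone who wants to reproduce this kind of single-shot experiment programmatically rather than through the prompt window, a rough sketch against the GPT-3 completions API might look like the following. The model name, parameters, and placeholder wikitext are assumptions for illustration only, not what was used in the diffs above, and any output would still need the same human review discussed in this thread.

import openai  # 2022-era openai Python client

openai.api_key = "YOUR_API_KEY"  # placeholder

wikitext = "<paste the raw wikitext of the section here>"  # placeholder input
prompt = (
    "Please copyedit this text to change items in the future tense to the past tense "
    "(it is now 2022), where appropriate. When citation templates (like {{cite web}}) "
    "mention a year, specify that figures were true in that year.\n\n" + wikitext
)

response = openai.Completion.create(
    model="text-davinci-003",  # assumed model choice for this sketch
    prompt=prompt,
    max_tokens=1024,
    temperature=0,             # keep the output as deterministic as possible for copyediting
)
print(response["choices"][0]["text"])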
JPxG, I'm not sure that publishing bot-assisted edits to mainspace for demo purposes is the best practice. Would you consider either doing this in a sandbox or self-reverting immediately so that we have the diffs but aren't leaving potentially incorrect/unwanted changes on live pages? –dlthewave 13:20, 11 December 2022 (UTC) 13:13, 11 December 2022 (UTC)[reply]
I'm pretty sure it's not the best practice. XOR'easter (talk) 17:41, 11 December 2022 (UTC)[reply]
@Dlthewave: See the section below for a list of edits (with full prompts included) on a separate demonstration page. I feel, however, that this is an unreasonable double standard: note that the subsequent revision after your partial revert was to add several spam links, and nobody has proposed that human beings be prohibited from editing as a result. jp×g 01:51, 12 December 2022 (UTC)[reply]
Being hard to enforce is no reason to not have a policy [against chatbots]. What if it is impossible to enforce?
The point of ChatGPT and other general-purpose chatbots is to pass themselves off as human. If you, or another random Wikipedia editor (solo, part-time, amateur coder), is able to produce an automated metric of "sounds like a bot" that's decently sensitive and specific, then the ChatGPT team or its successors (teams of researchers specialized in the topic) has already thought of it, tested it five different ways, and included it in the training program (via wikt:graduate student descent). TigraanClick here for my talk page ("private" contact) 10:55, 12 December 2022 (UTC)[reply]
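For concreteness, the kind of naive "sounds like a bot" metric being discussed might look like the sketch below, which scores text by how predictable it is to a small public language model (low perplexity is sometimes read as a weak hint of machine generation). The model choice and any threshold are assumptions for illustration, and, per the point above, this is exactly the sort of signal that model developers and prompt engineers can erode.

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    # Mean cross-entropy of the text under the model, exponentiated.
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return float(torch.exp(out.loss))

sample = "1-Heptadecanol is a component of many natural waxes such as beeswax."
print(f"perplexity: {perplexity(sample):.1f}")  # lower = more 'machine-predictable' text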
It's kind of like our Undisclosed Paid Editing policy: Even though there's no way of testing for paid/unpaid edits, most editors follow it voluntarily because they know it's best for the project. Others out themselves voluntarily or are discovered when their edits become disruptive. Sure, there are some who slip under the radar, but they're often the least problematic and aren't worth ditching the policy over. –dlthewave 03:09, 13 December 2022 (UTC)[reply]
I'd suggest starting by writing an essay that summarizes the issues with some good examples and suggests some best practices or proposes some additions to existing policies or guidelines. (Wikipedia needs a new policy like a hole in the head.) Levivich (talk) 04:40, 11 December 2022 (UTC)[reply]
We could get Chatbot to write it for us! 😉 Blueboar (talk) 11:51, 11 December 2022 (UTC)[reply]
It's hard to take this thread seriously given the repeated use of the phrase AI chatbot. I don't think those concerned would be any less concerned if the AI writing came in a non-chatbot format. I think there's something serious for us to discuss, and that will only get more serious with GPT4 (the current chatbot is an improved GPT3) expected in 2023, but the discussion would be helped if those most concerned learned some more about the tech behind it. For instance, of course it can figure out {{cite web}}, @JPxG. Part of its training was the entirety of Wikipedia, because our data is quite accessible. Best, Barkeep49 (talk) 14:08, 11 December 2022 (UTC)[reply]
Most of the examples did not come from prompts that were extensively engineered, so it is obviously true that we haven't figured out the full answer to how these GPT-based interfaces could help or harm Wikipedia. Until we have a good idea of what they can be used for, we won't know what a proper policy to this would look like other than to treat GPT-generated text the same way we treat human-generated text: they need to be verifiable, from a neutral point of view, and understandable to a broad audience. 0xDeadbeef→∞ (talk to me) 14:31, 11 December 2022 (UTC)[reply]
Indeed. It doesn't matter if it was written by a chatbot, or 1000 monkeys at 1000 typewriters, or a published book written by a human, copying and pasting anything into Wikipedia is already against our policies. Conversely, if the text is policy-compliant, then it doesn't matter who wrote it--chatbot, monkeys, human, etc. Judge the text based on the text, not based on who or what wrote it.

I also think it's a real Wikipedian perspective to assume that people will use chatbots to write Wikipedia articles, like as if there's a lot of people out there who really want to write Wikipedia articles but just don't have the writing skills, so the chatbot will be what makes the difference and opens the floodgates :-D I don't believe that. Anyone who wants to write Wikipedia articles is already doing so; chatbot won't make a difference.

I agree with BK's comment above. I think for a lot of people, this is their first real exposure to so-called "AI" technology, and they're blown away by what it can do, only because they don't yet fully understand how it works. Once you learn how these so-called "AI" chatbots work (they're not actually artificial intelligence, btw, that's a misnomer, a marketing slogan; the machine does not truly think or learn, it is simply executing the instructions written by humans, in this case, language pattern recognition), they are much less impressive. Those that are impressed that GPT3 can produce text that "sounds like" Wikipedia aren't appreciating that the reason is because GPT3 was trained on Wikipedia: it's repackaging its own source material. Levivich (talk) 18:03, 11 December 2022 (UTC)[reply]

@Levivich: copying and pasting anything into Wikipedia is already against our policies.[dubious – discuss] I think that if you look through Category:Wikipedia articles by source of incorporated text for a while, you will find that this is not true. WhatamIdoing (talk) 19:28, 12 December 2022 (UTC)[reply]
While this is mostly correct, I think the question of whether a computer program "thinks" or "merely" correlates information and performs actions is irrelevant. Do p-zombies exist? Does it matter? Hypothetically, if I were to be a spaceman from the planet Zolfgar with no qualia whatsoever, and I simply read a bunch of books and used them to write an article, would I be somehow exempted from following policy? jp×g 01:45, 12 December 2022 (UTC)[reply]
I don't see a common thread in the arguments above, but here's a suggestion for something we might all (well, all-ish) be able to agree on: without some kind of intervention, GPT4 (in 2023?) is likely to be more of a problem than GPT3. But one thing we can certainly do is have an outsized influence on software that was trained on what we created ... if we invite Wikipedians to make lists of ChatGPT bloopers, we can tell the OpenAI folks: "We're not going to relax our GPT3 guidelines (whatever they turn out to be) when GPT4 arrives, unless it makes significant improvements in [whatever areas we think need improving]". - Dank (push to talk) 18:16, 11 December 2022 (UTC)[reply]
I think the only policy change needed is to update WP:MACHINETRANSLATION to cover all computer-generated text, whether from a translation bot, chat bot, or whatever bot they think of next. (Except our bots; our bots are cool.) Levivich (talk) 18:20, 11 December 2022 (UTC)[reply]
+1 - Text in Wikipedia articles should either be human-written, or generated by a process approved at BRFA. Tazerdadog (talk) 22:43, 11 December 2022 (UTC)[reply]
This proposal is incomprehensible; most articles contain a very large amount of text that is "generated by a process". I assume that, at the end of your comment, you typed ~~~~ before saving the page. Would it be realistic to demand that you either make a formal request at BRFA or else manually type <a href="/https/en.wikipedia.org/wiki/User:Tazerdadog" title="User:Tazerdadog">Tazerdadog</a> (<a href="/https/en.wikipedia.org/wiki/User_talk:Tazerdadog" title="User talk:Tazerdadog">talk</a>) 22:43, 11 December 2022 (UTC)? jp×g 01:22, 12 December 2022 (UTC)[reply]
This is effectively discrimination against computer programs. If a computer program resembles a human editor, then it shouldn't be required to meet different or more restricted policies than human editors. If a human editor uses a computer program to edit or create content, then unless the rate of edits/second is too high, we would only look at the quality of the contributions. 0xDeadbeef→∞ (talk to me) 02:35, 12 December 2022 (UTC)[reply]
There is a point beyond which quantity becomes its own quality.
Also, what if the computer program is evaluating the quality of the contributions? Are you okay with software adding a section to an article, and then a (hopefully) different piece of software deciding whether the quality is sufficient and reverting if it's not? This second step, at least, is 100% feasible with current technology. WhatamIdoing (talk) 19:32, 12 December 2022 (UTC)[reply]
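As a concrete illustration of that second step, Wikimedia's ORES service already exposes machine-learned article-quality predictions over a public API, which a reviewing or reverting tool could consult. The sketch below queries it for one revision; the revision ID is a placeholder, and the exact response layout should be checked against the ORES documentation rather than taken from this example.

import requests

REVID = 1127700000  # placeholder revision ID, for illustration only
url = f"https://ores.wikimedia.org/v3/scores/enwiki/{REVID}/articlequality"

resp = requests.get(url, timeout=30)
resp.raise_for_status()
data = resp.json()

# Expected layout (confirm against the ORES docs):
# data["enwiki"]["scores"][revid]["articlequality"]["score"]
score = data["enwiki"]["scores"][str(REVID)]["articlequality"]["score"]
print(score["prediction"])   # e.g. "Stub", "Start", "C", "B", "GA", "FA"
print(score["probability"])  # per-class probabilities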
Yes it can go there, but it should also be mentioned at WP:V. Every statement of fact put into an article must be verified by a human, even if the choice of words is made by a machine. Zerotalk 23:42, 11 December 2022 (UTC)[reply]
@Levivich Agree. I think our existing guidelines on machine translation, in spirit, fit this situation very well - "you can use it for a first draft, if you understand the material well enough to clean up the bits it inevitably will get wrong". It seems fine for turning shaky text into good prose, but it's not able to synthesise material and produce content unsupervised. Andrew Gray (talk) 19:09, 12 December 2022 (UTC)[reply]

Is there some tremendous need to add many articles rapidly to Wikipedia? It is not as if Wikipedia carries exclusive information not easily found elsewhere. As a tertiary source, it is at the third tier of knowledge dissemination, after primary creators and secondary propagators. The "more" and "bigger" quantity-based culture is the established low-quality alternative that Wikipedia also applies now. That is possibly a reason why only a tiny minority (of the millions of existing articles) can really pass muster. If size and speed are to be the prevailing attitude, humans stand no chance against AI. It will do everything faster, and eventually better, assuming its programming evolves to correctly apply the existing policies in AI processes. The only advantage of humans will be subtle nuances that do not depend on classifiable knowledge but on having lived in a human society and a natural, not virtual, environment. Or, the emphasis could switch to quality so that each article (by any type of editor) can be properly, carefully reviewed by human editors. 65.88.88.93 (talk) 22:21, 12 December 2022 (UTC)[reply]

I don't think so: there isn't any evidence that people are writing a bunch of articles with LLMs, and I don't think it is likely for this to happen (LLMs are very poorly suited to writing articles from scratch). jp×g 00:59, 13 December 2022 (UTC)[reply]
@JPxG: There isn't evidence that people are writing a bunch of articles with LLMs -- yet -- the concern is that we need to prepare for the likely explosion of chatbot use.

Whether this increase happens tomorrow or over the next few years, the potential impact of LLMs is of such magnitude that we should get ready for this, rather than get hit unprepared by a major surge.

I don't agree with your assessment of LLM ability to write content, as some of the ChatGPT experiments presented in the sections above and below are mind-blowing!

If LLMs become ubiquitous, then a great many people will be using them as a matter of course, including in their writing and editing of Wikipedia articles. Millions of people have edited Wikipedia in the past, and millions more will edit WP in the future. And in the future, people will have highly capable LLMs (chatbots, or more precisely: automated ghostwriters).

LLMs already excel at writing about a great many things, and they have the potential to compile content at an exponentially increasing rate. If you ask ChatGPT (GPT3.5) to write an essay on a topic, it will comply. Each of its essays can be used as content of an article, or its sections. (GPT4 is scheduled to come out in 2023, and will be even more capable.) LLMs are very well suited for writing to the specifications of the user, and are limited mainly by the user's creativity.

It's no wonder that they have gone viral. We need to take heed.    — The Transhumanist   12:50, 14 December 2022 (UTC)[reply]

Wikipedia policy generation

It seems to me that this tool's training includes studying Wikipedia's policy pages. These drafts all seem accurate to me.

These are not merely adequate - these are good. They are short and they lack detail but these are great overviews. If this is the starting point and things only get better from here, then it is time to start adopting this technology. Bluerasberry (talk) 19:35, 13 December 2022 (UTC)[reply]

@Bluerasberry The last one sounds like a "mission statement". I dislike phrases like "outreach and engagement initiatives" and a lot of that plan sounds ... kind of aspirational, and, well, vapid. It needs more "concreteness". Just my opinion. David10244 (talk) 06:08, 16 December 2022 (UTC)[reply]
@David10244: That you react to it at all is a miracle to me. This is new AI technology attempted for the first time, and I think no one would immediately dismiss it as hopeless garbage. Soon enough there will be a dial that anyone will be able to turn from "vapid" to "concrete". Things are moving quickly!
I have complaints too, but when we need a policy conversation starter in a hurry, this is better than nothing, and I think even better than some of the starting points we use already. Bluerasberry (talk) 15:41, 16 December 2022 (UTC)[reply]

Large language models: capabilities and limitations

Over the last few hours, I have performed a number of experiments to demonstrate the capabilities and limitations of large language models (LLMs) such as ChatGPT and GPT-3, which can be viewed here:

Mostly, I have taken sample text from Special:Random, and attempted to show situations in which LLMs (in this case, mostly ChatGPT) are capable of making useful edits. The first task I set it to -- which bears repeating here -- is

"Large language model output should only be used in the process of editing Wikipedia if you are an intelligent editor who does not blindly paste LLM output into the edit window and press "save".
Please format this markup as an extremely obnoxious floating box with loud colors and large text.

You can see the results of further prompts at the "introduction" section.

Here is what I have so far.

In general, it seems that these models can be used for an extremely wide variety of tasks across the project, from formatting to table syntax to HTML generation to copyediting. Banning their use entirely would be pointlessly destructive and wasteful.

That said, many computer programs are capable of generating large amounts of useless crap that fail to meet Wikipedia's editorial standards. For example, I could use MS Paint to draw thousands of crude pictures of genitalia, and add them to random articles. For this reason, we have many policies and guidelines that prohibit adding large amounts of useless crap to Wikipedia. I propose that we enforce these policies and guidelines, thus preventing this from happening.

Specifically, I propose that the use of LLM output on Wikipedia be subjected to policies and guidelines such as WP:NOT, WP:NPOV, WP:C, WP:CIVIL, WP:V, and WP:RS. By making it against the rules to break the rules, we will prevent people from breaking the rules, and provide a mechanism to sanction people who break the rules.

Furthermore, I propose that a guideline be adopted to the effect that large language model output should only be used by competent editors who do not blindly paste LLM output into the edit window and press "save". This will prevent people from using ChatGPT to write long articles consisting entirely of nonsense. jp×g 01:32, 12 December 2022 (UTC)[reply]

LLM output is already subject to rules and policies. Or rather, anyone adding it is. 'An algorithm did it' has never, as far as I'm aware, been seen as any sort of exception from compliance with policy. AndyTheGrump (talk) 01:49, 12 December 2022 (UTC)[reply]
Any policy/guideline that classifies editors as intelligent or not is dead in the water. Zerotalk 04:19, 12 December 2022 (UTC)[reply]
Amended, per WP:CIR. jp×g 05:04, 12 December 2022 (UTC)[reply]
  • The opening paragraph of the bot policy: "The bot policy covers the operation of all bots and automated scripts used to provide automation of Wikipedia edits, whether completely automated, higher speed, or simply assisting human editors in their own work."
  • See also: WP:BOTUSE, which requires approval before applying a bot to editing.
  • So the use of large language models, and the chatbots built upon them, is already prohibited on English Wikipedia unless a user gets approval from the bot department to do so.

There are blanket exceptions to bot policy, and the main one that comes to mind is AutoWikiBrowser, which is a general-purpose semi-automated bot used by many Wikipedia editors. Each AWB user was approved before being able to use it.    — The Transhumanist   08:01, 13 December 2022 (UTC)[reply]

The meaning of "bot" may be unclear here. In the context of Wikipedia (per Wikipedia:Bot_policy#Definitions), a "bot" is a software program that edits autonomously without user input; there do not currently exist any language models capable of independently establishing API connections to Wikipedia and making edits without human interaction. If they did (this is a horrible idea) it would be covered under the bot policy and require a WP:BRFA. The policy under which BRFAs are required does not apply to assisted editing (i.e. the use of software to create letters, numbers and symbols that were not produced by a human being pressing a keyboard). This is governed by existing policies (such as WP:MEATBOT and by the guideline at WP:ASSISTED. jp×g 09:28, 13 December 2022 (UTC)[reply]
@JPxG: The entire news field refers to ChatGPT as a chatbot. It is general consensus that it is a bot. ChatGPT speeds up writing, by doing it for (that is, assisting) the user, which falls under the "higher speed" and "assisting human editors" foci of the bot policy. There is a passage in the bot policy that covers policy contradictions (such as between the lead and definitions sections), and situations where the spirit of the rule and its precise wording conflict, that is, cases of ambiguity. In its definition of "Bot Approvals Group" (BAG), the bot policy states: "The BAG also determine the classification as bot or assisted editing, in ambiguous cases." According to WP:ASSISTED, it is up to the Bot Approvals Group to decide whether bot approval is necessary. Based on the previous 2 sentences, BAG decides whether use of particular software falls under its jurisdiction. It remains to be seen what BAG's reaction(s) to LLMs, and the chatbots built upon them, will be.    — The Transhumanist   11:10, 14 December 2022 (UTC)[reply]
I do not think you are properly acquainted with how this software works: like I said, there do not currently exist any language models capable of independently establishing API connections to Wikipedia and making edits without human interaction. No media outlet has ever claimed that ChatGPT falls under the English Wikipedia's definition of an automatic bot – and even if they did, they do not determine policy. It is true that WP:MEATBOT and WP:ASSISTED are part of the bot policy, but there is a very clear definition of what a "Wikipedia bot" is, and it's defined by that same policy. At any rate, all edits (whether made by bots, software, humans using software, aliens using software, or Nagato Yuki psionically connecting to Wikimedia servers) are governed by existing policies and guidelines. To specifically address LLM output, a new policy would need to be written and ratified (which I am currently drafting a proposal for). jp×g 11:26, 14 December 2022 (UTC)[reply]
@JPxG: I believe the bot policy has wider jurisdiction than the narrow interpretation that you have presented. Establishing API connections is irrelevant, because a human is inserting bot-generated content. It's a bot-involved process. And those are encompassed by the bot policy which makes it up to BAG. A new policy could establish an exception, and I imagine the discussions will be extensive, as this is not a cut and dried case -- it is a sensitive issue with many potential ramifications. But, until such a policy is in place, this issue falls under BAG's jurisdiction, since they are the ones who decide the classification of a software program as it pertains to the bot policy.    — The Transhumanist   11:52, 14 December 2022 (UTC)[reply]
(edit conflict) WP:ASSISTED is deliberately vague to not restrict use of common scripts and tools. So it specifically says that only once volume becomes significant, such editing becomes more likely to be treated like a bot and BAG can determine this. It doesn't make it a bot, but it will be treated like a bot. We've never encountered any large-scale edits with LLM before, but we sure have seen a lot of high-volume editing. Half the bot policy only exists because of all the ways editors have inadvertently created issues with mass edits. So at that point, other parts of the policy start to matter, notably WP:CONTEXTBOT - which does not allow edits where context matters. I'm not saying copy-pasting LLM output is immediately covered by bot policy, nor does it matter whether anyone considers LLM to be a "bot". But bot policy will kick in once someone starts to make a lot of edits. And any new guideline will have to reconcile with this or we need to change bot policy to reconcile with LLMs. —  HELLKNOWZ  TALK 12:02, 14 December 2022 (UTC)[reply]

@JPxG: Another possible approach for vetting users for use of LLMs is via user group membership (aka "rights"). Here are our current user groups:

Code User group
AC Account creator
Ad Administrator
AP Autopatrolled
B Bureaucrat
Ch CheckUser
Co Confirmed
ECo Extended confirmed
EFH Edit filter helper
EFM Edit filter manager
EM Extended mover
EvCo Event coordinator
F File mover
IM Import
IAd Interface administrator
IP IPblock-exempt
MM Mass message senders
N New page reviewer
O Oversighter
Ro Rollbacker
Rs Researcher
Rv Pending changes reviewer
TE Template editor

These indicate membership in user groups (see: user access-levels). They pertain to who is granted access to various features of MediaWiki and its extensions. Theoretically, a user group could be created without being attached to a program function (that part could just be left blank?). For example, you could have a group called "LLM", with everyone in that group approved to use large language models in their editing. I don't know if this is doable, though.    — The Transhumanist   08:01, 13 December 2022 (UTC)[reply]

I do not think there is anything in our P&G that would directly prohibit use of content created by LLMs, nor do I think it would be a good idea to try to do so. All that is needed is to continue to hold individual editors responsible for all edits they make, including the copying of content from any source, whether from LLMs or other sources. We probably should add language in appropriate places reiterating that editors are responsible for ensuring that all content that they add, including anything produced by an LLM, meets our P&G. - Donald Albury 13:12, 13 December 2022 (UTC)[reply]
@Donald Albury: LLMs automate writing (edits). The rules are very clear on this: it falls under WP's bot policy, in the very first sentence.[3]   Therefore, it would require a new policy to allow use of LLMs without need for approval from the Bot Approvals Group (BAG).    — The Transhumanist   09:24, 14 December 2022 (UTC)[reply]
If someone uses an unapproved script or bot to edit WP, that is a violation of the bot policy, whether or not they use an LLM to generate any content being added. If someone uses an LLM to create text which they then copy into Wikipedia without using an unapproved script or bot, that is not covered by the bot policy, but the user remains responsible for ensuring that the content conforms with policy and guidelines. There is no point in banning content created by LLMs, as we already require that content be verifiable from reliable sources, and I doubt we will be accepting any content created by an LLM as a reliable source anytime soon. The danger is that LLMs may create potential content with citations to pseudo-sources, but we can go after users repeatedly adding such content to WP for abusing the policies on verifiability and reliable sources, without regard to whether such content came from an LLM. Donald Albury 13:48, 14 December 2022 (UTC)[reply]
I don't think it's plausible that LLMs are covered by the bot policy. If they were, grammar checkers, spell checkers, and machine translation would be "bots". Jahaza (talk) 19:49, 14 December 2022 (UTC)[reply]

Overdose of rules must be avoided

There are editors who are not native speakers of English. They can edit Wikipedia, but can't take part in such high-voltage debates.

It is not possible to remember which pronouns someone prefers while discussing other topics.

If a long-term British administrator faces language issues, I don't know what will happen to others.

Different countries also have different cultures. Calling someone a monkey is racist in one country, but in another it's considered comedy.

https://en.wikipedia.org/wiki/Wikipedia:Administrators%27_noticeboard/Archive347#TheresNoTime_%3A_accusations_me_of_hate_speech_and_ableism 42.105.5.206 (talk) 08:01, 14 December 2022 (UTC)[reply]

I agree that there are issues with policy bloat. I agree that bickering about turns of phrase is unproductive and many users in that discussion needlessly escalated the situation. However, you're linking to a discussion that was closed over a month ago, and it's not clear what exactly you're proposing. Thebiguglyalien (talk) 15:47, 14 December 2022 (UTC)[reply]

Newly autoconfirmed editors

Since creating a new article is "one of the hardest things to do" on en:wp, why do we allow accounts with only four days and ten edits to do that, directly in mainspace (unless I am mistaken)? The Teahouse, Help Desk, and AfCHelp are frequently advising new editors to spend some time (a few months) learning how WP works before attempting to create a new article. We tell new editors, and it's true, that they will very likely have an unsatisfactory and frustrating time at Wikipedia if they don't spend some time learning the policies and procedures first.

It seems like four days and ten edits is far too few to allow articles to be placed in mainspace, for NPP to look at.

I don't know what the right numbers are -- two to four weeks, and 30 to 50 edits? But not what we have now, based on the repeated discussions at the help desks. (I see a poll and discussion on this subject from 2008.) I am only mentioning this for creating completely new articles directly in mainspace; not for any kind of editing. Comments? David10244 (talk) 03:22, 15 December 2022 (UTC)[reply]

  • Four and a half years ago WP:ACPERM was achieved after many years of struggle with 'authority'. The community was overwhelmingly in favour of the change but it's not certain that a further restriction would be met with so much enthusiasm. Very few users are aware of what the small team of NPPers have to put up with every day. Kudpung กุดผึ้ง (talk) 05:02, 15 December 2022 (UTC)[reply]
    I agree it shouldn't be that easy and that it creates a lot of work for others. I'd prefer 500 non-trivial edits. Doug Weller talk 15:36, 15 December 2022 (UTC)[reply]
    It definitely should not be so stringent as to require EC. We don't need articles to be near-perfect at creation, just moderate quality is enough. My first mainspace article ([4]) was created about 60 edits and 1 month of active editing into my tenure, just about in line with the OP's timeline. Sure it's not great and there are problems (I literally cited Find a Grave), but it's enough to build upon. Curbon7 (talk) 15:55, 15 December 2022 (UTC)[reply]
    I'm sympathetic to raising the bar for being able to create an article, but I just looked back to 2005 and realized I created ten articles (all stubs, of course), four redirects, and a DAB page in my first 50 edits, and none has been deleted. I don't think I would have been discouraged from editing by not being able to create articles so early, as there were plenty of other things I could do (my fifth edit was to start rewriting and expanding an article). I am willing to concede that things have changed enough that we should require a bit more experience than I had before creating articles. Donald Albury 19:59, 15 December 2022 (UTC)[reply]
    @Donald Albury Well, YOU are likely better at this than most.
    I just wanted to get others' ideas. David10244 (talk) 05:30, 16 December 2022 (UTC)[reply]
    I fear we'd lose a lot of good articles that way. It's a baby-and-bathwater situation. Someone must have the statistic but it must be something like 90% of non-deleted new articles are started by non-EC editors? Levivich (talk) 15:55, 15 December 2022 (UTC)[reply]
    Levivich I would like to see the age (in days) of the accounts that create articles that don't get deleted, maybe the average, or a chart/plot. Also, the age of accounts that create terrible drafts that are quickly rejected (don't follow policy, etc.). My off-the-cuff guess is that your 90% figure is too high. I would guess 50 to 75%. David10244 (talk) 05:43, 16 December 2022 (UTC)[reply]
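(A minimal, purely illustrative sketch of how the figures asked about above could be computed, assuming someone first exports a table of surviving new articles with each creator's registration date and edit count at the time of creation; the CSV file and column names are hypothetical, not an existing dataset or API.)

    import pandas as pd

    # Hypothetical export: one row per surviving new article, with the creator's
    # registration timestamp, the article's creation timestamp, and the creator's
    # edit count at the moment of creation.
    df = pd.read_csv("new_article_creators.csv",
                     parse_dates=["account_registered", "article_created"])

    # Account age (in days) when the article was created.
    df["account_age_days"] = (df["article_created"] - df["account_registered"]).dt.days

    print("Median account age at creation (days):", df["account_age_days"].median())
    print("Median edit count at creation:", df["creator_edit_count"].median())

    # Share of surviving articles started by editors who were not yet
    # extended-confirmed (30 days / 500 edits), the figure guessed at above.
    not_yet_ec = (df["account_age_days"] < 30) | (df["creator_edit_count"] < 500)
    print(f"Started by non-EC editors: {not_yet_ec.mean():.0%}")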
4 days/10 edits is way too low, but I think extended-confirmed (30 days/500 edits) is too high. I'd be in favor of something like 15 days, 100 edits. Levivich (talk) 15:56, 15 December 2022 (UTC)[reply]
@Levivich I'm glad you agree that the current setting is too low, and 30/500 is too high. 15/100 or even 15/50 would be better than 4/10. David10244 (talk) 05:32, 16 December 2022 (UTC)[reply]
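(To make the numbers being compared concrete, a minimal sketch of the age-plus-edit-count check under discussion, using the current 4 days / 10 edits autoconfirmed setting and the suggested 15 days / 100 edits as illustrative constants; this is not how MediaWiki actually implements the restriction, and the function and names are hypothetical.)

    from datetime import datetime, timezone

    # Threshold options discussed above: (minimum account age in days, minimum edit count).
    CURRENT_AUTOCONFIRMED = (4, 10)
    SUGGESTED_THRESHOLD = (15, 100)

    def may_create_in_mainspace(registered, edit_count, threshold, now):
        """Return True if the account meets both the age and edit-count minimums."""
        min_days, min_edits = threshold
        age_days = (now - registered).days
        return age_days >= min_days and edit_count >= min_edits

    # An account registered 11 days ago with 40 edits clears the current bar
    # but not the suggested one.
    now = datetime(2022, 12, 16, tzinfo=timezone.utc)
    registered = datetime(2022, 12, 5, tzinfo=timezone.utc)
    print(may_create_in_mainspace(registered, 40, CURRENT_AUTOCONFIRMED, now))  # True
    print(may_create_in_mainspace(registered, 40, SUGGESTED_THRESHOLD, now))    # False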
I oppose making it harder to create articles while the AfC review backlog is more than a day long. Also, many newbies do not want to edit other people's articles (out of some misplaced respect) and would rather write their own. We can't afford to lose too many potential new editors. —Kusma (talk) 20:14, 15 December 2022 (UTC)[reply]
@Kusma If the bar was a bit higher, wouldn't AfC have a smaller backlog? And maybe (I could be wrong here) with 6.5 million articles, maybe "new article creation" is not our most pressing need. The help desks often say that a new editor can help the project the most, by improving existing poor articles. We'll still get new articles created, they might just be better on average. Perhaps. David10244 (talk) 05:47, 16 December 2022 (UTC)[reply]
AfC via draft space is where article creation happens for non-autoconfirmed editors. With its long backlog, adversarial process and excessive expectations, it is a deeply frustrating process that is not good at attracting new editors. I don't want to send anyone there if I can avoid it. Honestly, I think ACPERM and the creation of draft space were bad ideas, and I don't think we should make the newbie experience even worse. —Kusma (talk) 06:19, 16 December 2022 (UTC)[reply]
  • I appreciate every point you're making, but if you were here when WP:ACPERM happened, you'd know how much it took to even get that much of a restriction. It took seven years for the devs to turn on a feature that had very widespread support (proposal passed in 2011, enacted in 2018). There is quite literally no CHANCE that we're going to get any more restrictions in place out of WMF before the heat-death of the universe. You're just spinning your wheels. Trust me, it doesn't matter if every language Wikipedia community worldwide had unanimous support for such a change, it would NOT happen. Drop the matter before it consumes your life. Trust me, speaking from experience. There's nothing but pain down the road you are headed. Not because you don't have a good idea, mind you. But good ideas are rarely sufficient... --Jayron32 18:51, 16 December 2022 (UTC)[reply]
    • Isn't that an outdated example though, not really reflective of the current WMF? Aren't all the WMF leadership responsible for that now long gone? By comparison, the recent banner thing took a few days from RfC close to WMF action, a great improvement over seven years. Levivich (talk) 19:05, 16 December 2022 (UTC)[reply]
      Levivich, The complete WMF neglect of NPP recently does not inspire confidence that they care about such things as community consensus. It's only when we make a big ol' stink, like with the fundraising banners, that they notice, and even then it is for PR damage control. Curbon7 (talk) 19:13, 16 December 2022 (UTC)[reply]
        • For what it's worth, I started writing my first article in my sandbox after 17 previous edits over 29 days. I moved it to mainspace two hours later, and it still exists. I think that it is more important for new editors to study policies and guidelines than to spend several months making hundreds of edits elsewhere before trying to write a new article, but I know that a lot of other editors disagree. Cullen328 (talk) 19:22, 16 December 2022 (UTC)[reply]
          I certainly agree with that last bit. It’s important to learn how to edit. That usually comes through experience combined with reading our policies and guidelines. Doug Weller talk 20:00, 16 December 2022 (UTC)[reply]

US judicial information may need to be vetted

The National Defense Authorization Act now clearing both the House and Senate includes the text of the "Daniel Anderl Judicial Security and Privacy Act", which is therefore likely to be passed soon. [5] That act would allow any federal judge (from the Supreme Court down to the district courts) and their close family members to request that specific personal information be removed from social media sites (a category Wikipedia appears to fall under), which must comply upon receipt of such a request. Most of the covered information is material we would not include anyway, like home addresses, but it also includes things like birthdays. See the last version of the bill here [6]. I don't know if that means we need the WMF to set up something, but we may need to set up information related to this on WP. Masem (t) 03:26, 16 December 2022 (UTC)[reply]

Per Section 4(d):
(ii) EXCEPTIONS.—Clause (i) shall not apply to—
(I) the display on the internet of the covered information of an at-risk individual or immediate family if the information is relevant to and displayed as part of a news story, commentary, editorial, or other speech on a matter of public concern;
(II) covered information that the at-risk individual voluntarily publishes on the internet after the date of enactment of this Act; or
(III) covered information received from a Federal Government source (or from an employee or agent of the Federal Government).
I would contend that Wikipedia coverage of judicial officers constitutes "speech on a matter of public concern"; additionally, most information we have on federal judges originated as "information received from a Federal Government source", specifically scraped from the database of the Federal Judicial Center. BD2412 T 04:03, 16 December 2022 (UTC)[reply]
Well, consider that the FJC is one of the entities covered by the legislation and would have to scrub its databanks should that be requested. And while I agree that the reading of (I) *should* clear WP of any issues, we don't know how some judges would react. Masem (t) 14:02, 16 December 2022 (UTC)[reply]
Like every other time we play Chicken Little games with legal news we have no training to interpret and act upon, we should wait for the WMF legal team to do something about this. A bunch of untrained randos have no business interpreting legislation and deciding what it means for us. Let the lawyers that WMF hires decide that. --Jayron32 14:10, 16 December 2022 (UTC)[reply]
All I'm saying is we should be aware of this is in place, and may need to be prepared to act if the WMF agrees we need to. Masem (t) 14:19, 16 December 2022 (UTC)[reply]
I presume they won't keep it a secret. Indeed, if they need to, WP:OFFICE will be invoked, and there's still nothing we can do about it in that case. --Jayron32 14:21, 16 December 2022 (UTC)[reply]
@Masem: Whether the FJC scrubs information in the future or not, we have already received the information. BD2412 T 14:14, 16 December 2022 (UTC)[reply]
Furthermore, IF the information needs to be removed from Wikipedia, a notice will have to be sent to someone. The WMF will receive that notice, and will need to be the ones to decide how to act upon it. There's nothing we need to do pre-emptively until that happens. --Jayron32 14:20, 16 December 2022 (UTC)[reply]
All I think is important right now is that if some new user or IP comes along and demands removal of a judge's personal info under this law, we should redirect them to the WMF, barring any new directive from the WMF, and importantly not immediately treat it as a legal threat. Awareness that this could happen is my main concern. Masem (t) 20:45, 16 December 2022 (UTC)[reply]

Rather than focusing on the legalities or legal requirements, I think as a community it'd be more productive to focus on the principles and equities at issue; in other words, the important parts (to me) are not the requirements of the law but the reasons for the law.

This US law was passed in response to problems with judges being targeted because, as public figures, their personal information (like address and full date of birth) is generally freely available online. An exception was created that shielded this information for the judges' safety.

We should consider the safety aspect of it. I've never been convinced that a full date of birth (as opposed to year/age only) is truly necessary encyclopedic information for BLPs. It seems to me that limiting BLP DOBs to year only is a reasonable safety precaution. I feel the same about place of residence (city/town) for a BLP. Why make such details so readily available by including them in Wikipedia BLPs?

I think this law should prompt us to look at our own policies and see if we're properly balancing privacy interests and encyclopedic interests. It's not just judges, but many public figures, whose risk we may be needlessly contributing to.

Aside from that, if this law is signed, I think Masem makes a good point that we should be aware not to process requests for the removal of such information as a legal threat. But it'd be better to enforce such things as a matter of Wikipedia policy rather than US law. Levivich (talk) 21:01, 16 December 2022 (UTC)[reply]