User talk:GreenC/2017

Happy happy!

Happy New Year!
North America1000 01:42, 1 January 2017 (UTC)[reply]

Why?

[1] It would help to provide edit summaries when you're adding a bunch of cite templates to the ignore list so suddenly.—CYBERPOWER (Around) 13:36, 9 January 2017 (UTC)[reply]

Well, I had a late night :) But we had just had a discussion on phab; there are edits where these templates caused problems, but in retrospect it may not be the templates' fault but something else. What do you think?

-- GreenC 15:14, 9 January 2017 (UTC)[reply]

I'm asking why you'd put them in the ignore list and not the cite list.—CYBERPOWER (Chat) 16:40, 9 January 2017 (UTC)[reply]
Yeah, that's OK; I'm probably misunderstanding how it works. The templates {{cite court}} and {{cite act}} don't support |archiveurl= / |archivedate=. Same with {{Internet}} and {{Website}}. Since they were generating errors, I thought it made sense to disable them. What kind of templates should go in the cite list? -- GreenC 17:17, 9 January 2017 (UTC)[reply]
Generally templates that support the url, accessdate, archivedate, and archiveurl parameters. v1.3 will be disabling the converting and tagging of URLs that are within a template, while still allowing them to be directly replaced with an archive.—CYBERPOWER (Chat) 17:56, 9 January 2017 (UTC)[reply]
Excellent. If possible I'd like to leave a comment in the .js to that effect, so the cfg is self-documenting. Would JavaScript comments be possible? I don't want to break the parser, though, if it doesn't expect comments. -- GreenC 18:15, 9 January 2017 (UTC)[reply]
I don't think comments work for the PHP JSON Decoder. When a comment was added by another admin, it broke the bot.—CYBERPOWER (Chat) 18:34, 9 January 2017 (UTC)[reply]
Oh, I forgot. How about a top-level entry called "documentation:" before "link scan:" with a value of "Documentation for this page at User:InternetArchiveBot/Dead-links.js/doc"? -- GreenC 19:13, 9 January 2017 (UTC)[reply]
I doubt the page would take it since it's told to only accept JSON.—CYBERPOWER (Chat) 19:25, 9 January 2017 (UTC)[reply]
It's valid JSON. Might need to escape some characters, not sure. Or if you mean the /doc path, it can reside anywhere. -- GreenC 19:33, 9 January 2017 (UTC)[reply]
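
To illustrate the point being made in this thread: standard JSON has no comment syntax, so a // comment breaks a strict decoder (as it broke the bot's PHP decoder), but an ordinary string-valued key is legal JSON and can carry the documentation pointer. A minimal sketch in Python; the "link scan" value shown is a placeholder, not the real config:

  import json

  # A "//" comment inside this string would make json.loads() raise an
  # error, exactly as it would for a strict PHP JSON decoder. A normal
  # string-valued key, by contrast, parses cleanly and acts as a
  # pseudo-comment.
  config_text = '''
  {
      "documentation": "Documentation for this page at User:InternetArchiveBot/Dead-links.js/doc",
      "link scan": "placeholder value"
  }
  '''

  config = json.loads(config_text)
  print(config["documentation"])  # the pseudo-comment is just data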

RfA

Have you ever considered running for adminship? Pinging User:Ritchie333.—CYBERPOWER (Chat) 22:20, 9 January 2017 (UTC)[reply]

This is the same Green Cardamom that's been running that Green C bot I've noticed around the place .... mmm, will have to see how much havoc (if any) that's caused. Firstly, read Wikipedia:Advice for RfA candidates then User:Kudpung/RfA criteria (which is the basic criteria everyone should start off with, though by no means the only one) thoroughly. When you've done that and understand everything in it, pop back and I'll see if you are suitable. Ritchie333 (talk) (cont) 22:25, 9 January 2017 (UTC)[reply]
User:Ritchie333, thanks for the offer, I'm going to pass for now, but will keep it in mind for the future. I probably would do alright, no skeletons, but there's no need for access to the toolset at the moment. One day I'd like to create a new tool for admins that functions as a rap sheet for problematic users, searching out and displaying on a single page points of trouble (e.g. history of username changes, intersections of those names in SPI, etc.) .. automating some of the techniques admins currently do manually to research editors, but that's down the road; right now I'm doing a lot of work fixing link rot, e.g. havoc :) -- GreenC 14:40, 11 January 2017 (UTC)[reply]
I'm sorry, that is the wrong answer, an RfA will now be forced on you as punishment. :p—CYBERPOWER (Chat) 14:57, 11 January 2017 (UTC)[reply]

You're welcome

Re: your compliments on the Hlj user page, you're welcome. Hal — Preceding unsigned comment added by 2601:646:C200:934D:F14B:A1A3:9F1C:F1AF (talk) 16:45, 15 January 2017 (UTC)[reply]

New Wikiproject!

Hello, Green Cardamom! I saw you recently edited a page related to the Green party and green politics. There is a new WikiProject that has been formed - WikiProject Green Politics and I thought this might be something you'd be interested in joining! So please head on over to the project page and take a look! Thanks for your time. Me-123567-Me (talk) 21:38, 22 January 2017 (UTC)[reply]

ARCHIVEIS (archive.fo) suggestion for Wayback Medic

WP:ARCHIVEIS usually gives users a short URL to its archives, but like WebCite, there is actually a "full form" URL available, as given in its FAQ. Such "full form" URLs are required by an RfC.

A quick way to obtain a "full URL" from a short one is:

  1. Fetch the short URL.
  2. Let dom be the fetched DOM tree.
  3. Let rel be the result of running the selector head > link[rel=bookmark].
  4. Return the "href" property of rel. Optionally replace "archive.today" with the site name found in the initial URL.

--Artoria2e5 contrib 16:10, 3 February 2017 (UTC)[reply]

Hi, good idea. I tried it and it worked; the DOM tree is better than web scraping. Yes, replacing archive.today with the original is good too. Currently the bot adds long-form for Webcitation.org but not archive.is .. should be easy to adapt the bot's webcite module for archive.is -- GreenC 16:50, 3 February 2017 (UTC)[reply]
@Green Cardamom: Found a caveat with the "bookmark" rel: instead of giving an https protocol as enforced by its archive.is redirection (yes, they do a redirection back…), the rel gives an http link. Shouldn't be hard to work around, I guess. (PS: While http://archive.today takes you to https://archive.is, http://archive.is redirects to https://archive.fo. Weird.) --Artoria2e5 contrib 02:11, 9 March 2017 (UTC)[reply]
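
Putting the numbered steps and the caveat together, a minimal sketch in Python (requests, BeautifulSoup, and the function name are assumptions for illustration; the bot itself is not written this way):

  import requests
  from bs4 import BeautifulSoup
  from urllib.parse import urlsplit, urlunsplit

  def expand_archive_today(short_url):
      # Step 1: fetch the short URL.
      resp = requests.get(short_url, timeout=30)
      resp.raise_for_status()
      # Steps 2-3: parse the DOM and run the selector.
      dom = BeautifulSoup(resp.text, "html.parser")
      rel = dom.select_one("head > link[rel=bookmark]")
      if rel is None:
          return None
      # Step 4, plus the caveat above: the bookmark rel reports http, so
      # force https, and keep the hostname from the initial URL
      # (archive.is, archive.fo, ...) rather than the generic archive.today.
      parts = urlsplit(rel["href"])
      return urlunsplit(parts._replace(scheme="https",
                                       netloc=urlsplit(short_url).netloc))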

Wayback template

Will there be a bot that will convert all existing usage of "Wayback" to the new "Webarchive"? I'm happy to switch to using the "Webarchive" template but it would be a lot of work going back and correcting existing/original usage. WhisperToMe (talk) 15:20, 5 February 2017 (UTC)[reply]

Your timing is excellent because the {{wayback}} merge completed last night. It was a module of WaybackMedic and converted about 126k instances, plus {{webcite}} and {{cite archives}}. -- GreenC 15:29, 5 February 2017 (UTC)[reply]
Until the template is red-linked (soon), it displays this if it's used:
Template:Wayback
-- GreenC 15:34, 5 February 2017 (UTC)[reply]
Great! Thanks for the info! WhisperToMe (talk) 15:47, 5 February 2017 (UTC)[reply]

In case you didn't see it, there was a message for you at TFD. I deleted it since it's been a week, but here's an old version with it. Primefac (talk) 18:03, 9 February 2017 (UTC)[reply]

Ahh I thought you meant delete/merge any new instances of the template. -- GreenC 18:55, 9 February 2017 (UTC)[reply]

HTTP→HTTPS for Wayback Machine

It seems like your bot is not converting HTTP→HTTPS for Wayback Machine (anymore?). Did you turn that feature off? If so, should I let my bot do these conversions? --bender235 (talk) 17:03, 13 February 2017 (UTC)[reply]

InternetArchiveBot does it.—CYBERPOWER (Be my Valentine) 17:44, 13 February 2017 (UTC)[reply]
Is it still running? Doesn't appear so. --bender235 (talk) 17:55, 13 February 2017 (UTC)[reply]
@Bender235: Please see this.—CYBERPOWER (Be my Valentine) 18:03, 13 February 2017 (UTC)[reply]
I noticed. So is HTTP→HTTPS for Wayback Machine still a feature of IAbot? --bender235 (talk) 18:39, 13 February 2017 (UTC)[reply]
Yes. All Wayback HTTP links are flagged as invalid and get forcibly converted to HTTPS.—CYBERPOWER (Be my Valentine) 18:40, 13 February 2017 (UTC)[reply]

That was a special case where it moved the URL from one argument to another, bypassing the normal routine that would have normalized the URL. It's an oversight and I'll fix it. -- GreenC 19:21, 13 February 2017 (UTC)[reply]
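
For illustration, a minimal sketch of the kind of normalization being discussed; the function name and hostname list are assumptions, and the real bots cover more cases than this:

  from urllib.parse import urlsplit, urlunsplit

  def normalize_wayback(url):
      # Force https:// on Wayback Machine snapshot URLs, leave other
      # URLs untouched. Sketch only: the bots' real normalization
      # handles more variants than shown here.
      parts = urlsplit(url)
      if parts.netloc.lower() in ("web.archive.org", "wayback.archive.org"):
          return urlunsplit(parts._replace(scheme="https"))
      return url

  # normalize_wayback("http://web.archive.org/web/2016/http://example.com/")
  #   -> "https://web.archive.org/web/2016/http://example.com/"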

Great. Thanks. --bender235 (talk) 22:19, 13 February 2017 (UTC)[reply]

Custom search engines

Does the guide at Wikipedia:Newspaper search engines by country still work, or do you recommend a new method? All the ones listed are 404. Is that just because the Drive folders are no longer public?—አቤል ዳዊት?(Janweh64) (talk) 05:07, 23 February 2017 (UTC)[reply]

Hi Janweh64, it appears Google Drive, Docs and Sites no longer work with the CSE. I've updated the instructions and included the HTML source for the three engines I created. The one for India was created by someone else, so I don't have the info for it. -- GreenC 16:55, 23 February 2017 (UTC)[reply]

Your feedback matters: Final reminder to take the global Wikimedia survey

Hi. Thank you for your recent edits. Wikipedia appreciates your help. We noticed though that when you edited Perma.cc, you added a link pointing to the disambiguation page Memento (check to confirm | fix with Dab solver). Such links are almost always unintended, since a disambiguation page is merely a list of "Did you mean..." article titles. Read the FAQ • Join us at the DPL WikiProject.

It's OK to remove this message. Also, to stop receiving these messages, follow these opt-out instructions. Thanks, DPL bot (talk) 09:53, 12 March 2017 (UTC)[reply]

HM Stanley

There is no evidence that Stanley was a "pathological brute" or a "historical monster" (that is clear to anyone who has ever read Jeal's book), so why do those claims have to be repeated in Tim Jeal? Can we at least have constructive criticism only, please? 1982vdven (talk) 02:52, 14 March 2017 (UTC)[reply]

Follow up on the article talk page. -- GreenC 15:57, 14 March 2017 (UTC)[reply]

Wayback Medic query

Hi there. I'm messaging you regarding this edit [8] by GreenC bot. Can you explain to me what is wrong with using the short URL from archive.is as opposed to the long one? I only ask as I've only ever added the short URLs (I must have manually added hundreds to articles) and now I'm worried I've done something wrong. Freikorp (talk) 12:54, 19 March 2017 (UTC)[reply]

See Wikipedia:Using_archive.is#Use_within_Wikipedia and the linked RfC for more info. Don't worry about having added the short form; the bots should eventually take care of it, though best practice is the long form. Wikipedia has a policy against link-shortening services: they can hide malicious links, among other problems. -- GreenC 13:38, 19 March 2017 (UTC)[reply]
Thanks for clearing that up. :) Freikorp (talk) 13:40, 19 March 2017 (UTC)[reply]
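
As a concrete illustration of the short/long distinction, a small heuristic sketch in Python; the hostname list and the regex are assumptions, not the bot's actual filter:

  import re

  # Long-form archive.today URLs embed a snapshot timestamp (4-14 digits)
  # followed by the original URL; short forms are just a few characters.
  LONG_FORM = re.compile(
      r"^https?://archive\.(?:is|fo|today)/\d{4,14}/https?://", re.I)

  def is_short_form(url):
      return not LONG_FORM.match(url)

  # is_short_form("https://archive.is/abc12")                            -> True
  # is_short_form("https://archive.is/20170319000000/http://example.com/") -> False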

GreenC bot

Could you explain why GreenC bot replaced an archive.is link with one from archive.org, here? This after InternetArchiveBot rendered the link thus. Thank you. -- Ham105 (talk) 20:23, 22 March 2017 (UTC)[reply]

It thought the archive.is link was a soft 404. Archive.is has a problem with many soft 404s in their database, which makes it unusable for bots trying to add or maintain links. So I developed an algo that aggressively checks and filters them. The downside is that it sometimes gets false positives, like this case; the upside is that it's able to add many archive.is links that it previously couldn't. So it's a fine balance and I keep fine-tuning the algo. In this case I see a way to improve it. -- GreenC 21:02, 22 March 2017 (UTC)[reply]
The error I see is that archive.is does not support https ("This site can’t provide a secure connection. archive.today uses an unsupported protocol. ERR_SSL_VERSION_OR_CIPHER_MISMATCH") after InternetArchiveBot changed it from http (the original http link worked okay, and still does). Bots editing the edits of other bots. -- Ham105 (talk) 00:39, 23 March 2017 (UTC)[reply]
The https link works for me (Firefox, Win7) .. it doesn't work for you? I see a locked padlock next to the URL bar, meaning it's secure. It's possible archive.today doesn't support https. -- GreenC 00:48, 23 March 2017 (UTC)[reply]
Works for me too. I only convert to HTTPS on sites that support it.—CYBERPOWER (Chat) 00:50, 23 March 2017 (UTC)[reply]

The link works for me on Firefox as well. Must be an issue with my instance of Chrome rather than some universal problem. Green C, what are the bot's statistics on these link swapouts? How many archive.is links has your bot replaced with archive.org links, and how many has the bot encountered (since the first was swapped) and left unchanged? -- Ham105 (talk) 06:14, 23 March 2017 (UTC)[reply]

It could also be a configuration problem at archive.is ... might be worth tracking down why it failed in Chrome in case there is a bigger problem at archive.is. In terms of changes, that's a good question. In the last batch of 10k articles it found 77 preexisting archive.is links. Of those, 14 were considered inoperable by the bot and, after manually checking them, 8 are false positives, i.e. they actually are good links. The upside is the bot added 1,347 new archive.is links, which it couldn't have done without this algo and for which no other archives are available. I'll check those 8, see why the algo thought they were bad, and try to tune it. -- GreenC 14:00, 23 March 2017 (UTC)[reply]
Thanks for looking at the issue. -- Ham105 (talk) 11:18, 24 March 2017 (UTC)[reply]
The solution implemented: if the link is already on Wikipedia, it's likely a good link, so the algo will be in "easy" mode with basic filtering rules. If the link is sourced to an external API as a candidate for inclusion, the algo will be in "strict" mode with additional filtering rules. This errs on the side of good faith for existing links and more verification for new links. It worked for all but 1 link. -- GreenC 02:55, 25 March 2017 (UTC)[reply]
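
Expressed as a sketch (every name here is invented; basic_check and strict_check stand in for the bot's real soft-404 filtering rules, which aren't reproduced):

  def accept_archive_link(url, already_in_article, basic_check, strict_check):
      if already_in_article:
          # "easy" mode: a link a human already placed is probably good,
          # so only the basic filtering rules apply.
          return basic_check(url)
      # "strict" mode: candidates from an external API get the basic
      # rules plus the additional verification steps.
      return basic_check(url) and strict_check(url)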

GreenC bot, query 2

Hi GreenC, thanks for your earlier replies and fine-tuning to (at least on my reading of your summary) effectively give the original human editor some priority. I do recognise the usefulness of your bot in maintaining these links. As an editor using archives heavily, I'm also interested in seeing the process improved where it can be. So, checking my watchlist today has prompted me to raise another query.

  • The recent Green C bot edit here added an archive link on Sydney University at line 141.
  • However the previous edit by Green C bot here on 15 September 2016 removed the original archive link for the same citation (then at line 139), ...
  • ... as well as another on NSW Rugby History, here (then at line 148).
  • All archive links I refer to this time are from the Wayback Machine (archive.org).

Even though the link added by Green C bot for the Sydney University citation is a more recent capture, it is an inferior version (format-wise and as a reproduction of the original web page) compared to the one originally included. The NSW Rugby History citation now has no archive URL at all, although the link originally included is still viable. So my query is fourfold:

  1. Does archive.org also suffer from soft 404s (or, if not some intermittent error, why were the URLs deleted on 15 September)?
  2. Is there a reason an archive URL was then restored on 26 March for one of the two instances above, but not the other?
  3. In cases where there are multiple captures of an archived webpage, how does the bot decide which capture date to use?
  4. Can a case be made, for instances where links are being deleted, to change to a two-stage process? Perhaps instead of deleting or substituting the archived link, a parameter with an error code might be added instead. The page could be scanned again at a later date and, if the same issue is still found, the deletion or substitution made. Using an error code would at least provide some transparency to human editors as to why the bot is making deletions, even after the event.

Thank you -- Ham105 (talk) 03:13, 27 March 2017 (UTC)[reply]

The bot keeps detailed records. In the case of saintsandheathens.wordpress.com on Sept 15, the link didn't exist at Wayback. This was checked by 2 API requests and verified by header checks (physically attempting to access the page), plus lots of timeouts and retries. Since that version of the bot, I've added a new recheck step in the process, so there is about a 24hr delay before it finally decides a link is dead. The error code idea is interesting and I will consider it .. transparency and seeing progressive checks is useful, though it adds a lot of minor page edits. It could be tracked offline too. Archive.org has some soft 404s which the bot checks for, but not nearly as bad as archive.is, where every page is header status 200 .. The snapshot date is what the API says is the best available for the date requested (I think it checks access-date first, then date). -- GreenC 05:03, 27 March 2017 (UTC)[reply]
Looking at the data from the March 26 edit for saintsandheathens.wordpress.com, it was a difficult case for the bot to find the link. The API is still saying the link doesn't exist, so it went through a long process of trying different things before it finally found a snapshot. This is new code, which is why the Sept 15 run didn't find it but the March run did. Similar problem with sydneywomensrugby.rugby.net.au .. it found it, but not easily, using the new code. As for saintsandheathens.wordpress.com/states/nsw-waratahs/, it didn't get processed on March 26 because there is no {{dead link}} tag; the Sept 15 run didn't leave one because it thought the source link was still live (dead-url was set to "no"). -- GreenC 05:17, 27 March 2017 (UTC)[reply]
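
For reference, the snapshot lookup described above can be approximated with the public Wayback availability API. A minimal sketch; the endpoint is real, but the surrounding retry, header-check, and 24hr-recheck logic is only hinted at in the comments:

  import requests

  def closest_snapshot(url, timestamp=None):
      # Ask the Wayback availability API for the best snapshot near the
      # requested date (e.g. the citation's access-date, falling back to
      # its date). The real pipeline adds a second API source, header
      # checks on the result, retries, and a ~24hr recheck before a link
      # is finally declared dead.
      params = {"url": url}
      if timestamp:
          params["timestamp"] = timestamp
      resp = requests.get("https://archive.org/wayback/available",
                          params=params, timeout=30)
      resp.raise_for_status()
      closest = resp.json().get("archived_snapshots", {}).get("closest")
      if closest and closest.get("available"):
          return closest["url"], closest["timestamp"]
      return None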

Hi. Thank you for your recent edits. Wikipedia appreciates your help. We noticed though that when you edited The Forgotten Soldier, you added a link pointing to the disambiguation page New Yorker (check to confirm | fix with Dab solver). Such links are almost always unintended, since a disambiguation page is merely a list of "Did you mean..." article titles. Read the FAQ • Join us at the DPL WikiProject.

It's OK to remove this message. Also, to stop receiving these messages, follow these opt-out instructions. Thanks, DPL bot (talk) 10:08, 29 March 2017 (UTC)[reply]

Your edit in the List of TCP and UDP port numbers article created some referencing errors.

  • Reference 107 (port 1207): Error: If you specify |archivedate=, you must first specify |url=. I don't expect this to be changed, since |url= is generated from the other parameters such as |rfc=. This template ignoring the existence of |rfc= is a bug, of course.
  • Reference 221 (port 7542): |deadurl=no is ignored. This is also a bug.

This has previously been explained in Special:Diff/772452835, among a few other Template:Cite IETF bug workarounds in the article's edit summary history.

I'd like to see Template:Cite IETF documentation and parameters get fixed. Alternatively I'd like to see it deprecated in favor of Template:Cite web, but for now I'll keep using Template:Cite IETF as it seems to be more semantic. 80.221.152.17 (talk) 13:22, 13 April 2017 (UTC)[reply]

I guess a workaround is to use the standalone archive template {{webarchive}}. Since {{Cite IETF}} is only used in 181 articles and programming a fix for it will be difficult, I'll tell my bot to ignore the template for now. -- GreenC 14:19, 13 April 2017 (UTC)[reply]