Help:Using archive.today: Difference between revisions


Revision as of 21:44, 14 April 2022

This help page is a how-to guide.

It explains concepts or processes used by the Wikipedia community. It is not one of Wikipedia's policies or guidelines, and may reflect varying levels of consensus.


archive.today is an on-demand web archiving service at https://archive.today. A web archiving service allows Wikipedia editors to reduce link rot by preserving a copy of an online source that can be accessed if the original page moves, changes, or disappears. Not all web pages can be archived using archive.today.^[1]

archive.today can archive HTML web pages, style sheets, JavaScript, and digital images.

Archive.today compared to .is, .li, .fo, .ph, .vn and .md

Besides https://archive.today, the site is accessible through other domains, including https://archive.is, .li, .fo, .ph, .vn and .md.

The owner of archive.today requests that Wikipedia always use archive.today. It is a gateway that redirects to one of the final destinations (.is, .li, .fo, .ph, .vn and .md) based on load and availability, which gives archive.today the flexibility to dynamically redirect traffic to other domains and servers.
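The request above can be sketched as a small normalizer that rewrites a link on any of the listed final-destination domains to the gateway. This is a minimal illustration, not an official tool: the helper name `toGatewayUrl` is hypothetical, the gateway address is assumed to be `https://archive.today`, and the sketch assumes the path part of the link is the same on every domain.

```javascript
// Final-destination domains listed on this page.
const MIRROR_HOSTS = new Set([
  "archive.is", "archive.li", "archive.fo",
  "archive.ph", "archive.vn", "archive.md",
]);

// Hypothetical helper: point a mirror link at the archive.today gateway.
// The scheme and host are split off by hand so that the rest of the string
// (which may itself contain an embedded URL) is left untouched.
function toGatewayUrl(archiveUrl) {
  const match = /^https?:\/\/([^\/]+)(\/.*)?$/i.exec(archiveUrl);
  if (!match) return archiveUrl;            // not an http(s) URL: leave as-is
  const host = match[1].toLowerCase();
  if (!MIRROR_HOSTS.has(host)) return archiveUrl; // not a known mirror
  // Assumption: the gateway is always addressed over https.
  return "https://archive.today" + (match[2] || "/");
}

console.log(toGatewayUrl("https://archive.ph/20220414214400/http://www.example.com"));
```

Links on other hosts, including other archive providers, are returned unchanged.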

Differences from other archivers

Other web archiving services include the Wayback Machine and at least 20 other providers in use on Wikipedia, although over 80% of all archives in use are Wayback. archive.today and the Wayback Machine operate differently, and certain pages can be archived by one but not the other. Similar to archive.today, the Wayback Machine takes snapshots of web pages at certain times, and it also offers user-initiated on-demand archiving called "Save Page Now" (SPN).^[2]^[3]

Copyright and robots.txt

archive.today removes archived pages at the request of copyright holders per the U.S. DMCA;^[4] requests can be made with the "Report abuse" link on archive.today archived pages. Re-hosting U.S. copyrighted material without permission may be a violation of the U.S. Digital Millennium Copyright Act (DMCA). For this reason, to avoid implicating Wikipedia in violations of copyright laws and incurring DMCA take-down requests, archive.today should be used with some caution regarding U.S.-copyrighted content.

The history of robots.txt and archive providers is longer and more complex than this essay's focus. Briefly, the robots exclusion standard was never designed for use by archive providers. Using robots.txt for this purpose is essentially a hack that led to unintended consequences. For example, when a domain is hijacked or changes ownership, the new owner may add a robots.txt, which triggers archive providers to block the display of archives from the original site, even though the old site never had a robots.txt. Nevertheless, some archive providers agreed to use robots.txt as a method for end-users to signal when they did not want their pages publicly archived and/or displayed (if already archived). archive.today does not abide by the robots exclusion standard.^[5] The Wayback Machine formerly^[6] used it to avoid archiving material that site owners did not want archived.^[7]^[8]

Note that it can sometimes be a good idea to add multiple archive providers for key material. Multiple links can be added to Wikipedia using {{webarchive}}.
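As a rough illustration of listing several archive copies in one {{webarchive}} call, the sketch below assembles the wikitext from a list of copies. The parameter names (`url`, `date`, `title`, then `url2`, `date2`, `title2`, and so on) are an assumption here and should be checked against the template's documentation; the helper name `webarchiveTemplate` and the sample values are hypothetical.

```javascript
// Hypothetical helper: build a {{webarchive}} call for several archive copies.
// Assumed parameter naming: the first copy uses url/date/title, later copies
// append their position (url2, date2, title2, ...).
function webarchiveTemplate(copies) {
  const parts = copies.map((copy, index) => {
    const n = index === 0 ? "" : String(index + 1);
    return `|url${n}=${copy.url} |date${n}=${copy.date} |title${n}=${copy.title}`;
  });
  return `{{webarchive ${parts.join(" ")}}}`;
}

console.log(webarchiveTemplate([
  { url: "https://archive.today/20220414214400/http://www.example.com",
    date: "14 April 2022", title: "Example (archive.today)" },
  { url: "https://web.archive.org/web/20220414214400/http://www.example.com",
    date: "14 April 2022", title: "Example (Wayback Machine)" },
]));
```

The sketch does not escape a literal `|` inside a value, which would break the template call.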

How to archive

There are several ways to submit a web page to archive.today for archiving. For new users, the website form is suggested. The other methods are better suited to those who use archive.today regularly.

Website form

This method is easy to use. It requires going to the archive.today website to archive a web page.

At https://archive.today/, enter the URL of the web page you wish to archive into the "My url is alive and I want to archive its content" field (the red one).
Click the "Submit" button. When the archiving process completes (it usually takes 5-15 seconds), you will be sent to the archived page.
It is recommended that you view the archived page to check if the archive process has been successful.

Android app

The Android app Share2Archive can access archive.today by means of a Share action. When viewing a web page in an Android web browser (e.g., Chrome, Edge, Firefox), share it to Share2Archive, and the page archive will open in the default web browser (not necessarily the same web browser). If the page is already archived, the archived copy will open; otherwise, a new archive of the page will be initiated.

Browser extensions

Browser extensions that can archive and search archive.today, by means of a toolbar icon and a context menu entry, are available.

Bookmarklet

Note: Bookmarklets have been deprecated in favor of Browser extensions.
A bookmarklet is a web browser bookmark which performs a certain function. The archive.today bookmarklet, when clicked, takes the URL of the page you are currently looking at and submits it to archive.today for archiving. This method is straightforward to set up, and is convenient. It is recommended that you have your Bookmarks/Favorites bar visible or at least have your bookmarks accessible within a click or two. This method only allows you to archive the page you are currently viewing. To archive a different web page you will have to use another method.

To set up the bookmarklet, first create a bookmark for any page. Then follow the next two steps to change it to work.
Change or enter the name for the bookmark (e.g. archive.today).
Change or enter javascript:void(open('https://archive.today/?run=1&url='+encodeURIComponent(document.location))) into the Location field.

To use the bookmarklet, simply click on it when you are on a web page you wish to archive. It initiates the archiving process. When the process is complete (it usually takes 5-15 seconds) you will be sent to the archived page.
It is recommended that you view the archived page to check if the archive process was successful.
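The bookmarklet wraps the page address in `encodeURIComponent` before appending it to the submission URL. The sketch below expands that one-liner into two functions to show why this matters: without encoding, a `&` in the page address is read as the start of a new query parameter and the submitted address is cut short. The function names are hypothetical, and the base address `https://archive.today/` is an assumption.

```javascript
// Assumed base address of the submission form.
const ARCHIVE_BASE = "https://archive.today/";

// What the bookmarklet builds: the page address is percent-encoded.
function buildSubmitUrl(pageUrl) {
  return ARCHIVE_BASE + "?run=1&url=" + encodeURIComponent(String(pageUrl));
}

// The same request without encoding, for comparison only.
function buildSubmitUrlUnencoded(pageUrl) {
  return ARCHIVE_BASE + "?run=1&url=" + String(pageUrl);
}

const page = "http://www.example.com/search?q=cats&page=2";

// Read the "url" parameter back the way a server-side query parser would.
const encoded = new URL(buildSubmitUrl(page)).searchParams.get("url");
const unencoded = new URL(buildSubmitUrlUnencoded(page)).searchParams.get("url");

console.log(encoded);   // http://www.example.com/search?q=cats&page=2
console.log(unencoded); // http://www.example.com/search?q=cats
```

In a browser, `pageUrl` would be `document.location`, which `String()` converts to the page's address.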

Firefox smart keyword

Firefox smart keywords are commonly used to perform searches through the Firefox address bar or to open a bookmark by typing a keyword into the Firefox address bar. Here we are going to use a smart keyword to submit a URL to archive.today for archiving. This method is moderately simple to set up.

To set up the smart keyword, press Ctrl+Shift+B to open your Bookmarks Library (or click the orange Firefox button at the top left of the window, then go to "Bookmarks", then "Show All Bookmarks").
Browse to a location you would like to save the smart keyword bookmark in.
In the menu at the top of the window, click "Organize", then "New Bookmark".
Enter a name for the bookmark (e.g. archive.today).
Enter https://archive.today/?run=1&url=%s into the Location field.
Enter a keyword for the bookmark. You should choose something short (e.g. wc), and this keyword must not already be used for another bookmark.
Click the "Add" button. Close the Bookmarks Library.

To use the smart keyword, type the keyword you chose ("wc" in the above example) followed by a space (" ") in front of the URL of the web page you would like to archive in the Firefox address bar (e.g. if you are using "wc" as your keyword, the text in the address bar would be wc http://www.example.com/pageyouwantoarchive.html).
Hit Enter. This initiates the archiving process. When the archiving process completes (it usually takes 5-15 seconds), you will be sent to the archived page.
It is recommended that you view the archived page to check if the archive process has been successful.
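What the address bar does with a smart keyword can be approximated as follows: the first word selects the bookmark, and the rest of the text replaces `%s` in the bookmark's location. This is only a sketch of the idea, not Firefox's actual implementation; the function name `expandSmartKeyword`, the use of `encodeURIComponent` for the substitution, and the base address `https://archive.today/` are assumptions.

```javascript
// Hypothetical model of a keyword lookup: keyword -> bookmark location.
const keywordBookmarks = {
  wc: "https://archive.today/?run=1&url=%s",
};

// Approximate the expansion of "keyword rest-of-text" into a final URL.
function expandSmartKeyword(bookmarks, typed) {
  const space = typed.indexOf(" ");
  if (space === -1) return null;              // no argument after the keyword
  const keyword = typed.slice(0, space);
  const rest = typed.slice(space + 1).trim();
  const template = bookmarks[keyword];
  if (!template) return null;                 // unknown keyword
  // A function replacer avoids any special handling of "$" in the result.
  return template.replace("%s", () => encodeURIComponent(rest));
}

console.log(expandSmartKeyword(
  keywordBookmarks,
  "wc http://www.example.com/pageyouwantoarchive.html"
));
```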

Chrome search engine

Although this is created through Chrome's search engine feature, it functions just like a smart keyword in Firefox. This method is moderately simple to set up.

To set up the "search engine", right click the address bar and select "Edit search engines...". At the bottom of the list that comes up, you can add a "search engine".
Enter a name for the "search engine" in the first field (e.g. archive.today).
Enter a keyword for the "search engine" in the second field. You should choose something short (e.g. wc), and this keyword must not already be in use.
Enter https://archive.today/?run=1&url=%s& into the third field.
Hit Enter to save the "search engine".

To use the "search engine", type the keyword you chose ("wc" in the above example) followed by a space (" ") in front of the URL of the web page you would like to archive in the Chrome address bar (e.g. if you are using "wc" as your keyword, the text in the address bar would be wc http://www.example.com/pageyouwantoarchive.html).
Hit Enter. You will be sent to a page containing a link to the archive URL of the web page you wished to archive.
It is recommended that you view the archived page to check if the archive process has been successful.

Use within Wikipedia

Links archived with archive.today should appear in long format. (See Wikipedia talk:Using archive.today § RfC: Should we use short or long format URLs?)

An example long format:

https://archive.today/YYYYMMDDhhmmss/http://www.example.com
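The long format can be sketched as a timestamp followed by the original address. The example below formats a snapshot time as YYYYMMDDhhmmss and joins the pieces. The function names are hypothetical, the base address `https://archive.today` is assumed, and treating the timestamp as UTC is also an assumption. Building such a URL does not create a snapshot; it only shows the shape of the link.

```javascript
// Format a Date as YYYYMMDDhhmmss (assumption: the timestamp is in UTC).
function archiveTimestamp(date) {
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  return (
    pad(date.getUTCFullYear(), 4) +
    pad(date.getUTCMonth() + 1) +   // getUTCMonth() is zero-based
    pad(date.getUTCDate()) +
    pad(date.getUTCHours()) +
    pad(date.getUTCMinutes()) +
    pad(date.getUTCSeconds())
  );
}

// Hypothetical helper: join the timestamp and the original URL.
function longFormatUrl(snapshotDate, originalUrl) {
  return `https://archive.today/${archiveTimestamp(snapshotDate)}/${originalUrl}`;
}

const snapshot = new Date(Date.UTC(2022, 3, 14, 21, 44, 0)); // 14 April 2022, 21:44:00 UTC
console.log(longFormatUrl(snapshot, "http://www.example.com"));
// https://archive.today/20220414214400/http://www.example.com
```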

This archive URL can be inserted into the archive-url= parameter, together with its supporting archive-date= and url-status= parameters, in any of the citation templates. If the original URL is no longer accessible, the url-status parameter value should be set to dead. If the original URL is still accessible, the url-status parameter value should be set to live.

<ref>{{cite web |last= |first= |title= |work= |publisher= |date= |url= |archive-url= |archive-date= |url-status= }}</ref>.
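To make the parameter choices concrete, the sketch below fills in a subset of the template above from a few values and picks `url-status` according to whether the original page is still reachable. The function name `citeWebArchive`, its field names, and the sample values are hypothetical; only the template parameters come from this page.

```javascript
// Hypothetical helper: fill in a {{cite web}} reference with archive details.
function citeWebArchive({ title, url, archiveUrl, archiveDate, originalIsLive }) {
  // url-status is "live" while the original still works, "dead" otherwise.
  const status = originalIsLive ? "live" : "dead";
  return (
    `<ref>{{cite web |title=${title} |url=${url} ` +
    `|archive-url=${archiveUrl} |archive-date=${archiveDate} ` +
    `|url-status=${status}}}</ref>`
  );
}

console.log(citeWebArchive({
  title: "Example page",
  url: "http://www.example.com",
  archiveUrl: "https://archive.today/20220414214400/http://www.example.com",
  archiveDate: "14 April 2022",
  originalIsLive: false,
}));
```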

Searching for previously archived web pages

Web pages previously archived through archive.today are accessible through a searchable database. Users may search by URL, domain or their wildcards.

Consensus

The request for comment (RfC) held at Wikipedia:Archive.is RFC 4 ended in June 2016 with consensus to remove archive.is from the blacklist. The previous consensus, established earlier at Wikipedia:Archive.is RFC 3, was to blacklist links to archive.today, as soon as all the existing links were removed.

References

^ "FAQ". A page may not be archived for a number of reasons. archive.today does not support archiving Portable Document Format files, audio and video. The page may be too big (there is 50mb limit for a single page). The content may be inaccessible from the archive.today network (this is particularly likely if you are attempting to access subscription based content which your institution subscribes to on its users' behalf). Also, the content may be unreadable by the archive.today archiver (too complex JavaScript based pages can crash its browser or be executed too long time, or ones involving browser checks sometimes cause our archive engine to fail). … Pages which violate our hoster's rules (cracks, porn, etc) may be deleted. Also, completely empty pages (or pages which have nothing but text like "502 Server Timeout") may be deleted.
^ Harihareswara, Sumana (3 September 2013). "Wikitech-l - format of Recent Changes feed". Wikimedia.org technical mail list. Archived from the original on 26 October 2013.
^ "Save Pages in the Wayback Machine". Internet Archive. 2018. Archived from the original on 14 July 2020. Retrieved 19 May 2021. Save Page Now: Put a URL into the form, press the button, and we save the page. You will instantly have a permanent URL for your page. Please note, this method only saves a single page, not the whole site.
^ "How can I delete an archived page". Blog. 24 January 2013. Archived from the original on 26 September 2013.
^ Dascalescu, Dan (18 February 2013). "Web page archiving". Wiki. Dan Dascalescu. Archived from the original on 22 September 2013.
^ "Robots.txt meant for search engines don't work well for web archives".
^ "Removing Documents From the Wayback Machine". Archived from the original on 15 October 2002.
^ "Some sites are not available because of robots.txt or other exclusions. What does that mean?". FAQ. Archived from the original on 4 October 2002.


@@ Line 46: / Line 46: @@
 # To '''set up''' the bookmarklet, first create a bookmark for any page. Then follow the next two steps to change it to work.
 # Change or enter the name for the bookmark (e.g. <code>archive.today</code>).
-# Change or enter <code><nowiki>javascript:void(open('https://archive.today/?run=1&url='+document.location))</nowiki></code> into the Location field.
+# Change or enter <code><nowiki>javascript:void(open('https://archive.today/?run=1&url='+encodeURIComponent(document.location)))</nowiki></code> into the Location field.
 # To '''use''' the bookmarklet, simply click on it when you are on a web page you wish to archive. It initiates the archiving process. When the process is complete (it usually takes 5-15 seconds) you will be sent to the archived page.