Archived Pages

Goals
The Internet Archive wants to help fix broken outlinks on Wikipedia and make citations more reliable. Are there members of the community who can help build tools that put links to archived pages in the appropriate places? If you would like to help, please join the discussion, annotate this page, and/or email alexis@archive.org.

Wayback API
To this end, we developed a new Wayback Availability API, which answers whether a given URL is archived and currently accessible in the Wayback Machine. The API also accepts an optional timestamp, in which case it returns the closest good capture to that date. For example,

GET http://archive.org/wayback/available?url=example.com

might return

{
    "archived_snapshots": {
        "closest": {
            "available": true,
            "url": "http://web.archive.org/web/20130919044612/http://example.com/",
            "timestamp": "20130919044612",
            "status": "200"
        }
    }
}

Please see the API documentation page for details.
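A minimal Python sketch of calling the Availability API with only the standard library. The endpoint and the url parameter come from the example above; the timestamp is assumed here to be passed as a parameter named timestamp in YYYYMMDDhhmmss form (possibly truncated), and the helper names build_query, parse_closest and check_archived are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the example request above.
API_ENDPOINT = "http://archive.org/wayback/available"


def build_query(url, timestamp=None):
    """Build an Availability API query; timestamp (assumed name) is YYYYMMDDhhmmss, may be truncated."""
    params = {"url": url}
    if timestamp is not None:
        params["timestamp"] = timestamp
    return API_ENDPOINT + "?" + urllib.parse.urlencode(params)


def parse_closest(payload):
    """Return the 'closest' snapshot dict if an available capture exists, else None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest
    return None


def check_archived(url, timestamp=None, timeout=10):
    """Query the live API (network access required) and return the closest snapshot or None."""
    with urllib.request.urlopen(build_query(url, timestamp), timeout=timeout) as resp:
        return parse_closest(json.load(resp))
```

Only check_archived touches the network; build_query and parse_closest can be tried offline against the sample response above.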

IA is crawling Wikipedia outlinks
We are running specialized crawls to make this API more useful for the Wikipedia community. Newly crawled URLs are generally available through the Wayback Machine within a few hours.
 * IA is crawling all new external links, citations, and embeds made on Wikipedia pages within a few hours of their creation or update.
 * IA has been bulk-crawling external links periodically for the past 2 years.

Implementation Ideas
What useful tools and services can we build on top of this new API and these crawls? We invite community members to come up with ideas and implementations. Here are a few ideas:

1. Bot fixing broken external links. When an external link is dead, query the Wayback Availability API to discover if there is a working archived version of the page. If the page is available in Wayback, either a) rewrite the link to point directly to the archived version, or b) annotate the link somehow to indicate that there is an archived version available.
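Idea 1 could start from something like the following sketch, which implements option a), rewriting a dead link to its archived copy. The dead-link test is a deliberately crude assumption: a HEAD request that fails or returns an HTTP error counts as dead. A real bot would likely need retries, a GET fallback for servers that reject HEAD, and repeated checks over time before declaring a link dead. All function names here are hypothetical.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

AVAILABILITY_API = "http://archive.org/wayback/available"


def is_dead(url, timeout=10):
    """Crude check (assumption): an HTTP error status or a network failure means the link is dead."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return False
    except urllib.error.HTTPError as err:
        return err.code >= 400
    except OSError:
        # URLError (DNS failure, refused connection) and timeouts are OSError subclasses.
        return True


def find_snapshot(url, timeout=10):
    """Return the URL of the closest available capture, or None (network access required)."""
    query = AVAILABILITY_API + "?" + urllib.parse.urlencode({"url": url})
    with urllib.request.urlopen(query, timeout=timeout) as resp:
        closest = json.load(resp).get("archived_snapshots", {}).get("closest")
    return closest["url"] if closest and closest.get("available") else None


def fix_link(url, dead_check=is_dead, lookup=find_snapshot):
    """Option a): point a dead link at its archived copy; leave live or unarchived links alone."""
    if not dead_check(url):
        return url
    archived = lookup(url)
    return archived if archived else url
```

Passing dead_check and lookup as parameters keeps the decision logic testable without network access, and makes it easy to swap in a more careful liveness check.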

2. Make citations more reliable. When someone cites content on the web, they are citing that URL as it existed at that moment in time. It would be useful to point readers to the exact version of the page that was cited, rather than whatever the same URL shows when Wikipedia readers visit it later, which may have changed since the citation was created. For new citations, the Wayback Machine should have an archived version close to the citation date and time. For older citations, IA may or may not have a version that is appropriately close. If the Wayback Machine has an archived version close to the citation date, we could annotate the link somehow to indicate that there is an archived version.
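For idea 2, one possible approach is to pass the citation date as the timestamp and accept the returned capture only if it falls within some tolerance of that date. The 30-day default tolerance, the timestamp parameter name, and the function names below are assumptions for illustration, not anything the API defines; datetimes are treated as naive values assumed to be UTC.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timedelta

AVAILABILITY_API = "http://archive.org/wayback/available"
WAYBACK_TS_FORMAT = "%Y%m%d%H%M%S"  # 14-digit Wayback timestamp, e.g. 20130919044612


def wayback_timestamp(dt):
    """Format a datetime (assumed UTC) as a Wayback timestamp."""
    return dt.strftime(WAYBACK_TS_FORMAT)


def closest_snapshot(url, timestamp, timeout=10):
    """Ask the Availability API for the capture closest to timestamp (network access required)."""
    query = AVAILABILITY_API + "?" + urllib.parse.urlencode({"url": url, "timestamp": timestamp})
    with urllib.request.urlopen(query, timeout=timeout) as resp:
        closest = json.load(resp).get("archived_snapshots", {}).get("closest")
    return closest if closest and closest.get("available") else None


def is_close_enough(snapshot_ts, cited_at, tolerance=timedelta(days=30)):
    """True if the capture time is within tolerance of the citation time, in either direction."""
    captured = datetime.strptime(snapshot_ts, WAYBACK_TS_FORMAT)
    return abs(captured - cited_at) <= tolerance


def archived_citation(url, cited_at, lookup=closest_snapshot, tolerance=timedelta(days=30)):
    """Return the archived URL matching the citation date, or None if no close capture exists."""
    snapshot = lookup(url, wayback_timestamp(cited_at))
    if snapshot and is_close_enough(snapshot["timestamp"], cited_at, tolerance):
        return snapshot["url"]
    return None
```

A tighter tolerance favours captures that really match what the editor saw; a looser one finds more matches for older citations at the cost of possibly pointing to a different version of the page.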