User:Legoktm/archive.txt

From MediaWiki.org

This is Legoktm's Lua module proposal for dealing with dead links. The main goal is to make it easier for editors to mark links as dead, which we do by automating the lookup of the matching Wayback Machine URL.

In a perfect world we'd be able to automate the actual detection of dead links, but past experience has shown that to be unreliable because of soft 404s, websites having unintentional downtime, etc. We can probably provide suggestions for links that are likely dead, but IMO the final decision to mark a link as dead should be made by a human (for now; we can always revisit this in the future).

The extension would expose a Lua function that takes a URL and a timestamp. It would return a URL pointing to the archive.org copy that is within a reasonable time of the provided timestamp (probably plus or minus 6 months). Using a Lua module adds the flexibility of letting individual wikis integrate it into their existing citation templates however they want.
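The lookup described above could work roughly as follows. This is only a sketch in Python rather than Lua, to show the selection logic; the function names, the 183-day window, and the assumption that the snapshot list for a URL is already available and sorted are all mine, not part of the proposal. The only thing taken as given is the usual Wayback Machine URL shape, `https://web.archive.org/web/<14-digit timestamp>/<original URL>`.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

# Assumed tolerance: the proposal only says "probably plus or minus 6 months".
MAX_DRIFT = timedelta(days=183)


def closest_snapshot(snapshots, wanted):
    """Return the snapshot datetime nearest to `wanted`, or None.

    `snapshots` must be a sorted list of datetimes for a single URL.
    """
    i = bisect_left(snapshots, wanted)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = snapshots[max(i - 1, 0):i + 1]
    if not candidates:
        return None
    best = min(candidates, key=lambda ts: abs(ts - wanted))
    return best if abs(best - wanted) <= MAX_DRIFT else None


def archive_url(url, snapshots, wanted):
    """Build a Wayback Machine URL for `url`, or None if nothing is close enough."""
    best = closest_snapshot(snapshots, wanted)
    if best is None:
        return None
    stamp = best.strftime("%Y%m%d%H%M%S")  # 14-digit Wayback timestamp
    return "https://web.archive.org/web/{}/{}".format(stamp, url)
```

Returning None when no snapshot falls inside the window leaves it to the calling citation template to decide what to show, which seems in keeping with letting each wiki integrate this however it wants.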

It would be too expensive to hit an archive.org API for every invocation of the Lua function on every parse, so we would keep a local mirror of the Wayback Machine's index: which URLs they have archived, and the timestamps of those snapshots. This wouldn't necessarily need to be a live copy; it could be synchronized every few days or weekly.
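One way to picture the local mirror is a single table of (URL, timestamp) pairs that a periodic job merges new rows into. The sketch below uses Python's sqlite3 purely for illustration; the table name, column names, and helper functions are hypothetical, and a real deployment would presumably use whatever database the wiki already runs on.

```python
import sqlite3


def open_mirror(path=":memory:"):
    """Open (and if needed create) the hypothetical local snapshot index."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS snapshots ("
        " url TEXT NOT NULL,"
        " ts TEXT NOT NULL,"  # 14-digit YYYYMMDDhhmmss string
        " PRIMARY KEY (url, ts))"
    )
    return conn


def sync(conn, rows):
    """Merge a periodic dump of (url, ts) pairs; duplicates are ignored."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO snapshots (url, ts) VALUES (?, ?)", rows
        )


def timestamps_for(conn, url):
    """Return all known snapshot timestamps for `url`, oldest first."""
    cur = conn.execute(
        "SELECT ts FROM snapshots WHERE url = ? ORDER BY ts", (url,)
    )
    return [row[0] for row in cur]
```

Because the timestamps are fixed-width digit strings, sorting them as text also sorts them chronologically, so the Lua function would only ever need an indexed read from this table at parse time.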

Since we know which URLs the Wayback Machine has stored, we can also let them know about URLs newly added to the wiki that they don't have, so they can pre-emptively archive them. We'd have a queue where newly added links are stored, and if they are still in the article after 24 hours, we let archive.org know so they can archive them. This isn't a high priority because IA (the Internet Archive) is already crawling new links added to Wikipedia.
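The 24-hour queue could be sketched like this. Again it is in Python for illustration only, and the class and method names are made up; how archive.org would actually be notified is left out because the proposal doesn't specify it.

```python
from datetime import datetime, timedelta

# Hold period from the proposal: links must survive 24 hours in the article.
HOLD = timedelta(hours=24)


class PendingLinks:
    """Hypothetical queue of newly added external links awaiting archiving."""

    def __init__(self):
        self._added = {}  # url -> datetime the link was first seen

    def link_added(self, url, now):
        # Keep the earliest sighting if the link is added more than once.
        self._added.setdefault(url, now)

    def link_removed(self, url):
        # A link that is reverted or removed is simply dropped from the queue.
        self._added.pop(url, None)

    def due(self, now, already_archived):
        """Pop every URL held for at least HOLD; return those not yet archived."""
        ready = [u for u, t in self._added.items() if now - t >= HOLD]
        for u in ready:
            del self._added[u]
        return sorted(u for u in ready if u not in already_archived)
```

Here `already_archived` stands in for a lookup against the local mirror, so only URLs the Wayback Machine doesn't already have would be passed along.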