Extension:ArchiveLinks/Project/UserStories

Theme: Render external links with a "cache" link in MediaWiki articles.
 * Render external links differently. ✅
 * Render external links differently based on configuration in LocalSettings.php ✅
 * Create a sample config that would work with www.archive.org. ✅
 * Create a sample config that would work with wikiwix.org. ✅
 * Create a sample config that would work with a local spidering system. ✅
 * Internationalize any UI that needs it (the word that appears in the "archive" link, anything else?) ✅
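
The per-service configuration described above could be sketched as a table of URL templates. This is only an illustration in Python, not the extension's actual configuration format: the service keys, the `archive_link` helper, and the Wikiwix and local URL patterns are hypothetical placeholders. Only the `web.archive.org/web/<url>` pattern is the Wayback Machine's public URL scheme.

```python
from urllib.parse import quote

# Hypothetical per-service URL templates. Only the Wayback Machine pattern is
# a real public URL scheme; the other two are placeholders for illustration.
ARCHIVE_SERVICES = {
    "internet_archive": "https://web.archive.org/web/{url}",
    "wikiwix": "https://archive.wikiwix.example/cache/?url={quoted_url}",
    "local": "/wiki/Special:ViewArchive?url={quoted_url}",
}


def archive_link(url, service, label="archive"):
    """Return (href, label) for the extra link shown next to an external link.

    `label` stands in for the internationalized link text.
    """
    template = ARCHIVE_SERVICES[service]
    # Templates pick whichever form of the URL they need; unused keys are ignored.
    href = template.format(url=url, quoted_url=quote(url, safe=""))
    return href, label
```

Switching services then only means changing the `service` key, which mirrors the idea of selecting the behaviour from LocalSettings.php.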

Theme: queueing links for spidering
 * On article save, get external links. ✅
 * On article save, get external links, place into a queue. ✅
 * Write another program that consumes links from the queue and prints them to the screen. ✅
 * Ensure that a second instance of this program, invoked at the same time, doesn't contend with the first.
 * Create a permanent blacklist for domains we don't want to spider. ✅ (the blacklist table is checked but there is no UI to populate it)
 * Ensure that any Wiki administrator can edit this blacklist. ❌ (I'm holding off on this for now, will come back to it later)
 * Ensure that we don't queue such links for archival. ✅

Theme: spidering a link and storing HTML.
 * Expand the program above to invoke wget to spider the link. ✅ (at least partially; there are still some lingering problems with meta tags not being followed and links not being rewritten properly)
 * Store these files in a permanent manner. ✅ (at least in the most basic manner; Swift support has yet to be added)
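
A wget invocation for this step might look like the sketch below. The flags are standard GNU wget options, but the particular combination, the timeout and retry values, and the hash-named directory layout are assumptions made for illustration, not what the spider actually uses.

```python
import hashlib
import os


def wget_command(url, archive_root):
    """Build a wget argument list that saves one page plus its requisites."""
    # Hypothetical layout: one directory per URL, named by a hash of the URL.
    target = os.path.join(
        archive_root, hashlib.sha1(url.encode("utf-8")).hexdigest())
    return [
        "wget",
        "--page-requisites",   # also fetch images, CSS, and scripts
        "--convert-links",     # rewrite links to point at the local copies
        "--adjust-extension",  # add .html/.css suffixes where missing
        "--span-hosts",        # allow requisites hosted on other domains
        "--timeout=30",
        "--tries=2",
        "--directory-prefix", target,
        url,
    ]
```

The list can be passed to `subprocess.run` as is. Building it as a list rather than a shell string avoids quoting problems with URLs taken from article text.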

Theme: linking to the stored HTML
 * Create web handler for archived links stored locally.
 * If the local file doesn't exist, show an interstitial page and then send users to the original URL.
 * If the local file does exist, make it show a header, like Google Cache, with a placeholder for content.
 * If we think the content should be there but we still can't find it, show an error message.
 * Make it so the archive links within articles link to the locally archived content as described above.
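
The three handler outcomes above reduce to a small decision function. The sketch below assumes the handler already knows two facts, whether the local file exists and whether the spider recorded a successful fetch. The `View` names and `choose_view` are hypothetical.

```python
from enum import Enum


class View(Enum):
    ARCHIVED = "archived"          # header plus the stored content
    INTERSTITIAL = "interstitial"  # notice, then send the user to the original URL
    ERROR = "error"                # content should exist but cannot be found


def choose_view(file_exists, expected):
    """Pick the page to show for an archive link.

    file_exists: the locally archived file is present.
    expected: the spider recorded a successful fetch for this URL.
    """
    if file_exists:
        return View.ARCHIVED
    if expected:
        return View.ERROR
    return View.INTERSTITIAL
```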

Theme: feed for external archive partners
 * Decide on a format for the feed with partners (Archive.org, etc.?) ✅
 * Push from MediaWiki (hard), or have Archive.org poll (easier)? ✅ They will pull via the API.
 * What format -- RSS, ATOM, or something directly from API? http://www.mediawiki.org/wiki/API:Data_formats ? ✅ Data format
 * Figure out how to serve this format ✅
 * Create either a Special page or an API module for feed access
 * Make sure the job queue runs and actually inserts stuff into the queue

Theme: putting it all together
 * Create a way for links to be spidered automatically
 * Retry with exponential backoff if the URL returns a 404
 * Develop heuristics to decide if you should re-spider a link, or just use previously cached items.
 * Develop heuristics to avoid spidering and storing links that are determined to host malware.
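
The exponential backoff mentioned above could follow a schedule like this one. The base delay of one hour and the 30-day cap are assumed values, and `next_retry_delay` is a hypothetical helper.

```python
def next_retry_delay(failures, base=3600, cap=30 * 24 * 3600):
    """Seconds to wait before re-spidering a URL.

    failures: number of consecutive failed fetches (for example 404s), >= 1.
    The delay doubles with each failure and is capped at `cap`.
    """
    if failures < 1:
        raise ValueError("failures must be at least 1")
    return min(cap, base * 2 ** (failures - 1))
```

With these defaults the first three retries wait one, two, and four hours, and a URL that keeps failing is retried at most once every 30 days.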