User:Kevin Brown/ArchiveLinks/UserStories

From MediaWiki.org

Theme: Render external links with a "cache" link in MediaWiki articles.

  • Render external links differently. ✔ Done
  • Render external links differently based on configuration in LocalSettings.php. ✔ Done
  • Create a sample config that would work with www.archive.org. ✔ Done
  • Create a sample config that would work with wikiwix.org. ✔ Done
  • Create a sample config that would work with a local spidering system. ✔ Done
  • Internationalize any UI that needs it (the word that appears in the "archive" link, anything else?) ✔ Done, I think?
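A rough sketch of config-driven rendering, written in Python rather than the extension's PHP. The service table, its URL templates (including the `Special:ViewArchive` path for the local case) and the function names are hypothetical illustrations, not the extension's actual LocalSettings.php options; the archive.org and wikiwix.org templates should be checked against those services' current URL schemes. `cache_label` is where the internationalized message text would be passed in.

```python
import html
from urllib.parse import quote

# Hypothetical service table: name -> (URL template, percent-encode the original URL?).
# Path-style services take the URL as-is; query-style ones need it encoded.
ARCHIVE_SERVICES = {
    "archive.org": ("https://web.archive.org/web/{url}", False),
    "wikiwix": ("http://archive.wikiwix.com/cache/?url={url}", True),
    "local": ("/index.php?title=Special:ViewArchive&archive_url={url}", True),
}


def archive_url_for(url, service):
    """Build the "cache" URL for one external link under the chosen service."""
    template, encode = ARCHIVE_SERVICES[service]
    target = quote(url, safe="") if encode else url
    return template.format(url=target)


def render_external_link(url, label, service, cache_label="cache"):
    """Return HTML for an external link followed by a bracketed cache link."""
    archive_url = archive_url_for(url, service)
    # html.escape also turns "&" into "&amp;", which is required inside href.
    return (
        f'<a class="external" href="{html.escape(url)}">{html.escape(label)}</a> '
        f'[<a class="archive-link" href="{html.escape(archive_url)}">'
        f"{html.escape(cache_label)}</a>]"
    )
```

Switching services is then a one-word config change, which is the point of keeping the per-service details in a table instead of in the rendering code.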

Theme: Queueing links for spidering

  • On article save, get external links. ✔ Done
  • On article save, get external links and place them into a queue. ✔ Done
  • Write another program that consumes links from the queue and prints them to the screen. ✔ Done
  • Ensure that two copies of this program running at the same time don't contend with each other.
  • Create a permanent blacklist for domains we don't want to spider. ✔ Sort of done (the blacklist table is checked, but there is no UI to populate it)
  • Ensure that any wiki administrator can edit this blacklist. ✘ Not done (I'm holding off on this for now and will come back to it later)
  • Ensure that we don't queue such links for archival. ✔ Done
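A minimal sketch of the queue, the blacklist check and a consumer that doesn't contend with a second copy of itself, using SQLite as a stand-in for the wiki database. The table names (`link_queue`, `link_blacklist`), columns and functions are all hypothetical. The key idea is the conditional `UPDATE ... WHERE claimed_by IS NULL`: only one worker's update can match a given row, so two consumers started at the same time never take the same link.

```python
import sqlite3
from urllib.parse import urlsplit

SCHEMA = """
CREATE TABLE IF NOT EXISTS link_queue (
    id INTEGER PRIMARY KEY,
    url TEXT NOT NULL,
    claimed_by TEXT            -- NULL until a consumer takes the row
);
CREATE TABLE IF NOT EXISTS link_blacklist (
    domain TEXT PRIMARY KEY    -- 'example.com' also blocks 'www.example.com'
);
"""


def is_blacklisted(conn, url):
    """True if the link's host, or any parent domain of it, is blacklisted."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # a.b.example.com -> ["a.b.example.com", "b.example.com", "example.com", "com"]
    candidates = [".".join(labels[i:]) for i in range(len(labels))]
    marks = ",".join("?" * len(candidates))
    row = conn.execute(
        f"SELECT 1 FROM link_blacklist WHERE domain IN ({marks})", candidates
    ).fetchone()
    return row is not None


def enqueue_links_on_save(conn, external_links):
    """Queue every non-blacklisted external link found when a page is saved."""
    queued = 0
    for url in external_links:
        if not is_blacklisted(conn, url):
            conn.execute("INSERT INTO link_queue (url) VALUES (?)", (url,))
            queued += 1
    conn.commit()
    return queued


def claim_next(conn, worker_id):
    """Claim the oldest unclaimed link for this worker, or return None.

    The UPDATE only matches while claimed_by is still NULL, so if two
    consumers race for the same row exactly one of them gets rowcount 1.
    """
    while True:
        row = conn.execute(
            "SELECT id, url FROM link_queue WHERE claimed_by IS NULL "
            "ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        cur = conn.execute(
            "UPDATE link_queue SET claimed_by = ? "
            "WHERE id = ? AND claimed_by IS NULL",
            (worker_id, row[0]),
        )
        conn.commit()
        if cur.rowcount == 1:
            return row[1]
        # Another consumer won this row; look for the next one.
```

A consumer that just prints links would loop on `claim_next(conn, "worker-1")` and print each result until it returns `None`.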

Theme: Spidering a link and storing HTML

  • Expand the program above to invoke wget to spider the link. ✔ Done (at least partially; there are still lingering problems with meta tags not being followed and links not being rewritten properly)
  • Store these files permanently. ✔ Done (at least in the most basic manner; Swift support has yet to be added)
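One way the wget step and the storage layout could look. The flag set follows the wget manual's recipe for saving a single page with everything needed to display it (`-E -H -k -p`); the SHA-1 directory layout and the function names are assumptions for illustration, with local disk standing in for Swift.

```python
import hashlib
import os
import subprocess


def archive_dir_for(url, root):
    """Map a URL to a stable directory under root (hypothetical layout).

    Hashing avoids filesystem-unsafe characters in URLs; the two-level
    prefix keeps any single directory from growing too large.
    """
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return os.path.join(root, digest[:2], digest[2:4], digest)


def wget_command(url, dest_dir):
    """Build the wget argument list for saving one page and its requisites."""
    return [
        "wget",
        "--adjust-extension",   # -E: give HTML files an .html suffix
        "--span-hosts",         # -H: requisites often live on other hosts (CDNs)
        "--convert-links",      # -k: rewrite links to point at the local copies
        "--page-requisites",    # -p: also fetch images, CSS and scripts
        "--tries=3",
        "--timeout=30",
        "--directory-prefix=" + dest_dir,
        url,
    ]


def spider(url, root):
    """Run wget for url; return (directory, wget exit status)."""
    dest = archive_dir_for(url, root)
    os.makedirs(dest, exist_ok=True)
    # check=False: wget exits non-zero even on partial failures (say, one
    # missing image), so the caller decides how to treat the status.
    result = subprocess.run(wget_command(url, dest), check=False)
    return dest, result.returncode
```

Because the directory is derived from the URL alone, the viewer can find a stored copy without an extra lookup, at the cost of overwriting older snapshots unless a timestamp is added to the path.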

Theme: Linking to the stored HTML

  • Create a web handler for archived links stored locally. In progress
  • If the local file doesn't exist, show an interstitial page and then send users to the original URL.
  • If the local file does exist, show a header, like Google Cache, with a placeholder for the content.
  • If we think the content should be there but still can't find it, show an error message.
  • Make the archive links within articles point to the locally archived content as described above.
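The three cases the web handler has to distinguish reduce to a small decision function. `ArchiveRecord`, `View` and the header markup are hypothetical; `record` stands for whatever the database says about a URL, and `file_exists` is injectable so the logic can be tested without touching disk.

```python
import html
import os
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class View(Enum):
    INTERSTITIAL = "interstitial"  # nothing archived: warn, then send to the original URL
    CACHED = "cached"              # serve the stored copy under a cache-style header
    ERROR = "error"                # an archive was recorded, but the file is gone


@dataclass
class ArchiveRecord:
    url: str
    path: Optional[str]  # where the spider stored the page, if it ever succeeded


def choose_view(record: Optional[ArchiveRecord],
                file_exists: Callable[[str], bool] = os.path.exists) -> View:
    """Pick which of the three pages the handler should show."""
    if record is None or record.path is None:
        return View.INTERSTITIAL
    if file_exists(record.path):
        return View.CACHED
    return View.ERROR


def cache_header(original_url: str, archived_on: str) -> str:
    """Header shown above the archived copy, with the page body as a placeholder."""
    link = html.escape(original_url)
    return (
        '<div class="archive-header">This is an archived copy of '
        f'<a href="{link}">{link}</a> as it appeared on {html.escape(archived_on)}.'
        "</div>\n<!-- archived content goes here -->"
    )
```

Keeping the decision separate from the page rendering means the error case ("we think it should be there") is explicit rather than falling out of a missing-file exception.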

Theme: Feed for external archive partners

  • Decide on a feed format with partners (Archive.org, etc.?) ✔ Done
  • Figure out how to serve this format. ✔ Done
    • Create either a Special page or an API module for feed access. ✔ Done
  • Make sure the job queue runs and actually inserts items into the queue. ✔ Done
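The agreed feed format isn't recorded here, so this is only a sketch under an assumed format: a JSON list of queued links, paged by ID with a `continue` value in the spirit of MediaWiki API continuation. The table name, fields and paging scheme are all hypothetical, and SQLite stands in for the wiki database.

```python
import json
import sqlite3


def make_demo_queue(urls):
    """In-memory stand-in for the wiki's queue table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE link_queue (id INTEGER PRIMARY KEY, url TEXT NOT NULL)")
    conn.executemany("INSERT INTO link_queue (url) VALUES (?)", [(u,) for u in urls])
    conn.commit()
    return conn


def build_feed(conn, since_id=0, limit=100):
    """Return up to `limit` queued links with id > since_id as JSON.

    When the page is full, "continue" holds the last id so a partner can
    ask for the next page with since_id=<that value>.
    """
    rows = conn.execute(
        "SELECT id, url FROM link_queue WHERE id > ? ORDER BY id LIMIT ?",
        (since_id, limit),
    ).fetchall()
    feed = {"links": [{"id": row_id, "url": url} for row_id, url in rows]}
    if rows and len(rows) == limit:
        feed["continue"] = rows[-1][0]
    return json.dumps(feed)
```

Paging on an increasing ID rather than an offset lets a partner resume from where it stopped even while new links are being queued.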

Theme: Putting it all together

  • Create a way for links to be spidered automatically.
  • Retry with exponential backoff if the URL returns a 404.
  • Develop heuristics to decide whether to re-spider a link or just use previously cached items.
  • Develop heuristics to avoid spidering and storing links determined to host malware.
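A sketch of the retry and re-spider rules, with made-up defaults: a one-hour base delay that doubles per consecutive 404, capped at 30 days, and a 90-day freshness window. None of these numbers or names come from the extension; they only show the shape of the heuristics. The malware check is left out because it needs some outside signal that isn't specified here.

```python
from datetime import datetime, timedelta
from typing import Optional

BASE_DELAY = timedelta(hours=1)   # hypothetical: first retry after an hour
MAX_DELAY = timedelta(days=30)    # hypothetical: never wait longer than this
MAX_AGE = timedelta(days=90)      # hypothetical: re-spider copies older than this


def next_retry(last_attempt: datetime, consecutive_404s: int,
               base: timedelta = BASE_DELAY, cap: timedelta = MAX_DELAY) -> datetime:
    """Exponential backoff: wait base * 2**(n - 1) after n consecutive 404s, capped."""
    if consecutive_404s < 1:
        return last_attempt  # no failures: eligible immediately
    delay = base
    for _ in range(consecutive_404s - 1):
        if delay >= cap:
            break  # stop doubling once the cap is reached (also avoids overflow)
        delay *= 2
    return last_attempt + min(delay, cap)


def should_respider(last_success: Optional[datetime], now: datetime,
                    max_age: timedelta = MAX_AGE) -> bool:
    """Re-spider if we have no good copy yet or the newest copy is stale."""
    return last_success is None or now - last_success > max_age
```

A scheduler would skip a link until `next_retry(...)` has passed, and only queue it again when `should_respider(...)` says the cached copy is missing or stale.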