Wikimedia Services/Revision storage for HTML and structured data: Use cases

We are considering the costs and benefits of
 * 1) offering predictable performance for access to HTML and metadata of old revisions, and
 * 2) archiving citable HTML renders of articles as they looked at the time.
To this end, we are collecting relevant use cases on this page. Please be bold, and add / tweak as needed.

Use cases fall into two categories. In the first, unavailability of stored (previously rendered) HTML is merely a performance / latency issue and does not affect the functionality / feature. In the second, re-rendering the page in the future affects functionality, because the re-rendered HTML will not be identical to the HTML that was generated at the time the revision was created. It is useful to identify which category each use case belongs to.

Hot linking to old revisions
Needs: Fast access to old revisions

Both external sites & wiki content occasionally link to specific revisions of an article. Performance for those accesses should be reasonable and predictable.

FIXME: Whether this is a performance issue or a functionality issue depends more specifically on why the old revision was hot-linked to. So, I think this use case should be merged with more specific use cases where the actual reason for hotlinking is articulated.

(Also: revision tagging)

Visual diffing
Needs: Fast access to old revisions

The editing team is working towards a visual diffing service, with a view towards it eventually becoming the default change review experience. This would enable VisualEditor users without wikitext knowledge to review edits.

Most diffs are expected to be against relatively recent revisions, but it would at least be a bonus if performance did not degrade significantly when flipping through older diffs.

Not having stored HTML for older revisions could also affect a diff in scenarios where wikitext semantics / syntax have evolved or HTML versions have been updated. So, this is not merely a performance issue.

Citing Wikipedia content
Needs: Long-term storage of specific renders, "as they looked at the time".

Template and software changes make it difficult to reliably cite a specific revision of a Wikipedia article. MediaWiki always uses the latest version of any transcluded content. Facts in infoboxes can disappear when the template is edited, and news items or featured content on the main page are replaced every day.

Stored HTML versions are required for functionality.
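
To illustrate why re-rendering cannot stand in for a stored render, here is a minimal toy sketch in Python. It is not MediaWiki's implementation: the `TEMPLATES` table, the integer timestamps, and the `render` / `template_at` helpers are all hypothetical names made up for this example. It only models the behaviour described above, namely that a re-render picks up the latest version of transcluded content.

```python
import re

# Hypothetical toy data: template name -> versions as (timestamp, text), oldest first.
TEMPLATES = {
    "Infobox": [(1, "Population: 100"), (5, "Population: unknown")],
}


def template_at(name, when=None):
    """Return the template text as of `when`, or the latest version if `when` is None."""
    versions = TEMPLATES[name]
    if when is None:
        return versions[-1][1]
    eligible = [text for ts, text in versions if ts <= when]
    return eligible[-1]


def render(wikitext, when=None):
    """Expand {{Name}} placeholders using the template version selected by `when`."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda match: template_at(match.group(1), when),
        wikitext,
    )


# The revision text itself never changes; only the transcluded template does.
revision_text = "Intro. {{Infobox}}"

# What a reader saw when the revision was saved (timestamp 2 in this toy model).
rendered_then = render(revision_text, when=2)

# What a re-render produces today: the latest template version is used.
rendered_now = render(revision_text)

print(rendered_then)
print(rendered_now)
```

In this toy model the two renders differ even though the revision's own text is unchanged, which is why this use case falls into the "functionality" category rather than the "performance" one.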

Research and analytics
Needs: Reasonably fast and high-volume access to old revisions

Research / analytics use cases frequently have a need to extract information from a large number of revisions of an article. Examples include machine learning like ORES, as well as projects aimed at establishing the trustworthiness of specific parts of an article.

Currently, many of these projects are using custom wikitext parsers, which presents a high bar to entry. Offering HTML and structured data formats that can be processed with standard tools would lower that bar, likely resulting in more contributions, especially from outside researchers and tool writers.
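
As a rough sketch of what the lower bar to entry could look like, the following Python example pulls internal link targets out of an HTML string using only the standard library, with no wikitext parser involved. It assumes internal links are marked with `rel="mw:WikiLink"`, as in Parsoid-style HTML; that marker, the sample markup, and the `WikiLinkCollector` / `extract_wikilinks` names are assumptions made for illustration, not something specified on this page.

```python
from html.parser import HTMLParser


class WikiLinkCollector(HTMLParser):
    """Collect href values of <a> tags marked as internal wiki links."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        # Assumption: internal links carry rel="mw:WikiLink" (Parsoid-style markup).
        rels = (attr.get("rel") or "").split()
        if "mw:WikiLink" in rels and attr.get("href"):
            self.targets.append(attr["href"])


def extract_wikilinks(html):
    """Return the internal link targets found in an HTML string, in document order."""
    collector = WikiLinkCollector()
    collector.feed(html)
    collector.close()
    return collector.targets


# Hypothetical sample standing in for the stored HTML of one revision.
SAMPLE_HTML = (
    '<p>See <a rel="mw:WikiLink" href="./Foo">Foo</a> and '
    '<a rel="mw:ExtLink" href="https://example.org">a site</a>.</p>'
)

links = extract_wikilinks(SAMPLE_HTML)
print(links)
```

Running the same function over the stored HTML of many revisions would give a per-revision link history, which is the kind of high-volume access this use case needs.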

HTML dumps
Needs: Reasonably fast and high-volume access to old revisions