Parsoid/Minimal performance strategy for July release

Peak edit rates are around 50 requests/second across all Wikipedias. In July, we need to sustain rates close to that, as the Visual Editor is scheduled to become the default editor on all Wikipedias. Parsoid itself can be scaled up by adding machines. However, we use the MediaWiki API to expand templates and extensions and to retrieve information about images. A single large page can require hundreds of API requests, which at these rates would overload the API cluster.
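To make the scale concrete, here is a back-of-envelope sketch in Python. The figure of 200 API requests per parse is a hypothetical stand-in for "hundreds", and the model assumes every edit at peak triggers exactly one parse of such a large page, so the result is an upper bound rather than a measurement.

```python
# Back-of-envelope estimate of the MediaWiki API load caused by Parsoid at peak.
PEAK_EDITS_PER_SEC = 50  # peak edit rate across all Wikipedias (from the text)


def api_requests_per_sec(edits_per_sec: int, api_calls_per_parse: int) -> int:
    """Upper bound: assumes every edit causes exactly one Parsoid parse and
    every parse issues api_calls_per_parse template/extension/image API requests."""
    return edits_per_sec * api_calls_per_parse


# Hypothetical: 200 API requests for a large page ("hundreds" in the text).
print(api_requests_per_sec(PEAK_EDITS_PER_SEC, 200))  # 10000
```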

Our roadmap outlines a long-term performance strategy that also addresses the API overload problem. However, we might not be able to implement enough of it before July, so we need a minimal backup strategy to avoid overloading the API cluster.

Leverage cached parse results to avoid API overload
We have a Varnish cache in front of the Parsoid cluster, which caches the parse result for a given revision. We can use this cached parse result to speed up subsequent parses. The data we care about most (template and extension expansions, image dimensions and paths) is already present in the previous revision's HTML and is marked up in a way that makes it relatively easy to extract and reuse.
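A minimal sketch of the extraction step, using Python's standard-library HTMLParser. It assumes simplified Parsoid-style markup: typeof="mw:Transclusion" and typeof="mw:Extension/<name>" elements with an about id and a JSON data-mw attribute, and images rendered as <img> tags carrying a resource attribute. The class name, sample HTML and upload URL are hypothetical. It only collects descriptors and image attributes; a real cache would also need the expanded HTML grouped by about id.

```python
import json
from html.parser import HTMLParser


class CachedDomExtractor(HTMLParser):
    """Collects reusable expansion data from a previous revision's Parsoid HTML.

    Simplified markup assumptions (not the full Parsoid DOM spec):
    transclusions carry typeof="mw:Transclusion" and extensions carry
    typeof="mw:Extension/<name>", both with an "about" id and a JSON data-mw
    attribute; rendered images are <img> tags with a "resource" attribute.
    """

    def __init__(self):
        super().__init__()
        self.templates = {}   # about id -> parsed data-mw of the transclusion
        self.extensions = {}  # about id -> (extension name, parsed data-mw)
        self.images = {}      # resource (file name) -> (src, width, height)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # HTMLParser has already unescaped attribute values
        types = (a.get("typeof") or "").split()
        about = a.get("about")
        if "data-mw" in a:
            if "mw:Transclusion" in types:
                self.templates[about] = json.loads(a["data-mw"])
            for t in types:
                if t.startswith("mw:Extension/"):
                    name = t.split("/", 1)[1]
                    self.extensions[about] = (name, json.loads(a["data-mw"]))
        if tag == "img" and "resource" in a:
            self.images[a["resource"]] = (a.get("src"), a.get("width"), a.get("height"))


# Hypothetical cached HTML for illustration.
cached_html = (
    '<p><span about="#mwt1" typeof="mw:Transclusion" '
    'data-mw=\'{"parts":[{"template":{"target":{"wt":"Foo"},'
    '"params":{"1":{"wt":"bar"}},"i":0}}]}\'>expanded Foo</span>'
    '<sup about="#mwt2" typeof="mw:Extension/ref" '
    'data-mw=\'{"name":"ref","attrs":{}}\'>[1]</sup></p>'
    '<figure typeof="mw:Image/Thumb"><img resource="./File:Example.jpg" '
    'src="//upload.example/Example.jpg" width="220" height="165"/></figure>'
)
extractor = CachedDomExtractor()
extractor.feed(cached_html)
extractor.close()
print(extractor.templates["#mwt1"]["parts"][0]["template"]["target"]["wt"])  # Foo
print(extractor.extensions["#mwt2"][0])  # ref
print(extractor.images["./File:Example.jpg"])  # ('//upload.example/Example.jpg', '220', '165')
```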

Outline:
 * Retrieve the previous revision's HTML DOM from the cache
 * Extract template, extension and image data from it and pre-populate internal caches with that data
 * Parse the new revision; this triggers API requests only for changed or new template transclusions, extensions and images
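The outline can be sketched as a cache that is seeded from the previous revision and falls back to the API only on a miss. ExpansionCache, parse_revision, fake_api and the (kind, source) key are hypothetical names chosen for illustration, not Parsoid's internals.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical cache key: (kind, source), e.g. ("template", "{{Foo|bar}}").
Key = Tuple[str, str]


class ExpansionCache:
    """Expansion cache seeded from the previous revision; misses go to the API."""

    def __init__(self, fetch_from_api: Callable[[Key], str]) -> None:
        self._fetch = fetch_from_api
        self._entries: Dict[Key, str] = {}
        self.api_calls = 0  # number of cache misses that hit the API

    def prepopulate(self, entries: Dict[Key, str]) -> None:
        # Outline step 2: seed with data extracted from the cached DOM.
        self._entries.update(entries)

    def get(self, key: Key) -> str:
        # Outline step 3: only new or changed expansions miss and cost an API call.
        if key not in self._entries:
            self.api_calls += 1
            self._entries[key] = self._fetch(key)
        return self._entries[key]


def parse_revision(keys: Iterable[Key], cache: ExpansionCache) -> List[str]:
    """Stand-in for a Parsoid parse: resolves every expansion the page uses."""
    return [cache.get(k) for k in keys]


def fake_api(key: Key) -> str:
    """Stands in for a MediaWiki API expansion request."""
    return f"<expanded {key[1]}>"


cache = ExpansionCache(fake_api)
cache.prepopulate({
    ("template", "{{Infobox|a}}"): "<table>...</table>",
    ("image", "File:Example.jpg"): "220x165",
})
# The edit changed one template argument and left the image untouched.
parse_revision([("template", "{{Infobox|b}}"), ("image", "File:Example.jpg")], cache)
print(cache.api_calls)  # 1
```

Keying on source wikitext is a simplification: an expansion that depends on context such as the page title or the current time would need a more careful key or a forced refresh.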

Cache updates
Templates and images in particular can change, so we have to make sure our cached expansions do not become too stale. A simple and promising option is to piggyback on the linksUpdate job with a hook. The hook action can then either purge the cached copy and re-request it, or implicitly refresh the Varnish copy.
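The HTTP side of such a hook can be sketched as follows. The hook itself would run in MediaWiki; this Python sketch only builds the requests, and sending one would be a matter of passing it to urllib.request.urlopen. The Parsoid URL layout, the assumption that Varnish accepts PURGE, and the use of Cache-Control: no-cache as the force-refresh trigger are all assumptions about the VCL, not a description of the deployed configuration.

```python
from typing import List
from urllib.parse import quote
from urllib.request import Request

# Hypothetical Parsoid URL layout behind Varnish; not specified in this document.
PARSOID_BASE = "http://parsoid.example/enwiki/"


def refresh_requests(title: str, oldid: int, mode: str = "refresh") -> List[Request]:
    """Build the HTTP requests a linksUpdate hook could send for one page.

    mode="purge":   PURGE the cached DOM, then GET it again to re-populate it.
    mode="refresh": one GET that Varnish (assumed to be configured along the
                    lines of the force-refresh VCL example) treats as a miss,
                    replacing the cached copy without a separate purge.
    """
    url = f"{PARSOID_BASE}{quote(title.replace(' ', '_'))}?oldid={oldid}"
    if mode == "purge":
        # Assumes the VCL accepts PURGE requests from the job runners.
        return [Request(url, method="PURGE"), Request(url, method="GET")]
    if mode == "refresh":
        # Assumption: the VCL forces a cache miss for Cache-Control: no-cache.
        return [Request(url, headers={"Cache-Control": "no-cache"}, method="GET")]
    raise ValueError(f"unknown mode: {mode!r}")


for r in refresh_requests("Main Page", 42, mode="purge"):
    print(r.get_method(), r.full_url)
# PURGE http://parsoid.example/enwiki/Main_Page?oldid=42
# GET http://parsoid.example/enwiki/Main_Page?oldid=42
```

The refresh mode needs one request instead of two and avoids the window between purge and re-fetch in which the page is uncached.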

On edit, requesting the new oldid (which is part of the URL) leaves the previous oldid's DOM in the cache until Parsoid no longer needs it. Parsoid can then explicitly purge that DOM after retrieving it.
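The ordering on edit can be sketched with a hypothetical in-memory FakeVarnish standing in for the real cache and a caller-supplied parse function. In production the new DOM would enter Varnish as a side effect of the GET for the new oldid rather than through an explicit put.

```python
from typing import Callable, Dict, Optional


class FakeVarnish:
    """In-memory stand-in for the Varnish cache, keyed by oldid (illustrative only)."""

    def __init__(self) -> None:
        self.store: Dict[int, str] = {}

    def get(self, oldid: int) -> Optional[str]:
        return self.store.get(oldid)

    def put(self, oldid: int, dom: str) -> None:
        self.store[oldid] = dom

    def purge(self, oldid: int) -> None:
        self.store.pop(oldid, None)


def handle_edit(cache: FakeVarnish, prev_oldid: int, new_oldid: int,
                parse: Callable[[int, Optional[str]], str]) -> str:
    # The new oldid yields a new URL, so the previous DOM is still cached.
    prev_dom = cache.get(prev_oldid)
    # Once Parsoid holds the previous DOM, the cached copy is no longer needed.
    cache.purge(prev_oldid)
    new_dom = parse(new_oldid, prev_dom)
    # Varnish caches the response under the new oldid's URL.
    cache.put(new_oldid, new_dom)
    return new_dom


varnish = FakeVarnish()
varnish.put(100, "<html>rev 100</html>")
dom = handle_edit(varnish, 100, 101,
                  lambda oldid, prev: f"<html>rev {oldid}, reused={prev is not None}</html>")
print(dom)                    # <html>rev 101, reused=True</html>
print(sorted(varnish.store))  # [101]
```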

For this to work, the cache-busting page_touched parameter needs to be removed from the GET URL, so that the cached DOM for a revision can be addressed by its oldid alone.
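The requirement is that the URL, and with it the Varnish cache key, is determined by the oldid alone. The sketch below, with hypothetical URLs and page_touched values, shows that two requests for the same revision get different keys while the parameter is present and the same key once it is dropped. strip_cache_buster only illustrates the effect; the actual change is to stop adding the parameter when building the GET URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_cache_buster(url: str, param: str = "page_touched") -> str:
    """Return url without the given query parameter, keeping all others in order."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Hypothetical URLs for one revision, before and after a template edit bumped page_touched.
before = "http://parsoid.example/enwiki/Foo?oldid=123&page_touched=20130501120000"
after = "http://parsoid.example/enwiki/Foo?oldid=123&page_touched=20130502083000"
print(before == after)  # False: two cache entries for the same revision
print(strip_cache_buster(before) == strip_cache_buster(after))  # True
print(strip_cache_buster(before))  # http://parsoid.example/enwiki/Foo?oldid=123
```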

Relevant links:
 * Implicit refresh without purge: https://www.varnish-cache.org/trac/wiki/VCLExampleEnableForceRefresh
 * Parsoid page on wikitech

Possibly relevant for other invalidation approaches:
 * |info&rvprop=content&titles=Foo returns both touched and timestamp in the page source response. After an edit, touched can in theory differ from the revision timestamp by at most a few seconds, which is precise enough to distinguish template updates from edit updates.
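A minimal sketch of that distinction in Python, assuming the ISO 8601 timestamps (e.g. 2013-05-01T12:00:00Z) that the API returns for touched and the revision timestamp. The 10-second EDIT_SLACK threshold is a hypothetical value for "a few seconds", and a large gap may also come from causes other than template changes, such as a purge.

```python
from datetime import datetime, timedelta

API_TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # ISO 8601 timestamps as returned by the API
EDIT_SLACK = timedelta(seconds=10)    # hypothetical bound for "a few seconds"


def classify_update(touched: str, rev_timestamp: str) -> str:
    """Guess why page_touched changed.

    "edit" if touched lies within EDIT_SLACK of the latest revision's timestamp,
    otherwise "template" (a template update, or another non-edit re-render).
    """
    t = datetime.strptime(touched, API_TS_FORMAT)
    r = datetime.strptime(rev_timestamp, API_TS_FORMAT)
    return "edit" if abs(t - r) <= EDIT_SLACK else "template"


print(classify_update("2013-05-01T12:00:03Z", "2013-05-01T12:00:00Z"))  # edit
print(classify_update("2013-05-02T08:30:00Z", "2013-05-01T12:00:00Z"))  # template
```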