Parsoid/Ops needs for July 2013

Overview
We are planning to make VisualEditor the default editor on all (or at least most) Wikipedias by July 1st, 2013. This will cause most edits to be made by VisualEditor, which will put a lot of load on Parsoid. We need to make sure that Parsoid and the other affected subsystems can handle this.

Impact
(For the current architecture of our Parsoid deployment, see Parsoid.)

Every time a user clicks the Edit button, VisualEditor loads the HTML for the page from Parsoid. This is a GET request going through the Varnish caches, so it may be served from cache. If and when the user actually saves the edit, VisualEditor POSTs the edited page to Parsoid and gets wikitext back, which it then saves to the database as a regular edit. These POST requests are not cacheable. The load on the Parsoid backends should therefore be one POST request and roughly one GET request per saved edit: VisualEditor issues a GET for every edit attempt, but the responses are cached in Varnish and only invalidated when the page is edited.

The GET parse request in particular may take a lot of time and resources on the Parsoid backends. Each request also causes Parsoid to request the wikitext source of the page from api.php, and may cause additional API requests if there are templates, extensions or images on the page.

When performing a regular edit, however, we can reuse most template expansions from the previous version's cached DOM. Updates after template edits are more problematic. During the recent Wikidata-related bot runs, up to 9 million pages were scheduled for re-rendering per day. Even a more conservative 5 million pages per day works out to about 58 requests per second averaged over the day. Without more precise dependency information per template-generated fragment, all templates need to be re-expanded in these requests, which means that linksUpdate jobs will generate the bulk of our API requests. We might have to limit the rate of re-renders, either by delaying them so that several re-renders of the same page are collapsed into one, or by performing only a fraction of all updates in peak periods. See the detailed Parsoid performance plan for the July release for more detail.
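The rate arithmetic and the idea of collapsing several re-renders of one page into one can be sketched as follows. This is only an illustration under stated assumptions: the class name RerenderQueue, its methods and the 60-second delay are hypothetical and are not part of Parsoid or the job queue.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def mean_rate_per_second(pages_per_day):
    """Average request rate if the daily volume were spread evenly."""
    return pages_per_day / SECONDS_PER_DAY


class RerenderQueue:
    """Hypothetical queue that collapses repeated re-renders of a page.

    A page scheduled several times within the delay window is
    re-rendered only once, when its first deadline expires.
    """

    def __init__(self, delay_seconds):
        self.delay_seconds = delay_seconds
        self.due_at = {}  # page title -> time at which the re-render is due

    def schedule(self, title, now):
        # Keep the earliest deadline so a frequently touched page
        # cannot be postponed forever.
        self.due_at.setdefault(title, now + self.delay_seconds)

    def pop_due(self, now):
        """Return (and forget) all titles whose deadline has passed."""
        ready = sorted(t for t, due in self.due_at.items() if due <= now)
        for title in ready:
            del self.due_at[title]
        return ready


if __name__ == "__main__":
    print(round(mean_rate_per_second(5_000_000), 1))  # conservative estimate
    print(round(mean_rate_per_second(9_000_000), 1))  # recent bot-run volume

    queue = RerenderQueue(delay_seconds=60)  # the delay is an assumed value
    queue.schedule("Barack Obama", now=0)
    queue.schedule("Barack Obama", now=10)   # collapsed into the first entry
    print(queue.pop_due(now=60))
```

A longer delay collapses more updates per page at the cost of staler cached HTML, so the value would have to be chosen from real traffic data.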

Benchmarks
The last measurement for en:Barack Obama on a single backend yielded 0.4 requests per second at a concurrency of 10, which corresponds to about 26 seconds per request. Obama is one of the more complex pages (it shows up in the upper half of slow-parse.log), so most pages will parse significantly faster. The mean parse time across all pages is probably an order of magnitude lower than for Obama (TODO: find a representative mix of edited pages and parse them).
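The relation between the two benchmark figures is just throughput = concurrency / time per request. A minimal sketch, with a hypothetical function name, assuming all 10 concurrent requests are kept busy for the whole run:

```python
def throughput(concurrency, seconds_per_request):
    """Requests completed per second when `concurrency` requests
    are always in flight and each takes `seconds_per_request`."""
    return concurrency / seconds_per_request


if __name__ == "__main__":
    # Obama benchmark: 10 concurrent requests at about 26 s each.
    print(round(throughput(10, 26), 2))
```

The result rounds to the 0.4 requests per second quoted above.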

We have not yet collected precise data on the API request volume. What we have so far:
 * complex pages can result in hundreds of API requests (one for the page source, plus one for each template and extension)
 * the Parsoid round-trip setup already performs the API requests corresponding to about 80k pages in 24 hours, and has not caused issues in the API cluster
 * the average edit rate is of the same order of magnitude and can easily be handled by the API cluster
 * the peak edit rate of about 50 edits per second is about 50 times higher than our current round-trip testing rate and would very likely bring the API cluster to its knees. We will however avoid most API requests for re-parses, so editing is not that much of a problem.
 * the rate of parses from refreshLinks is also very high, and seems to be the most troublesome, as template expansions cannot be reused in these. We will probably have to avoid actually executing each of these re-parses. A minimum TTL for template expansions or a probabilistic update are options we will consider.
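The two options named in the last item, a minimum TTL and a probabilistic update, could be combined roughly as below. This is a sketch only: the function should_reparse, its parameters and the example values are hypothetical, and nothing here reflects how Parsoid or the job queue actually decide.

```python
import random


def should_reparse(last_expanded_at, now, min_ttl_seconds,
                   update_probability, rng=random.random):
    """Decide whether a refreshLinks-triggered re-parse is executed.

    Expansions younger than the minimum TTL are never refreshed;
    older ones are refreshed only with the given probability.
    """
    if now - last_expanded_at < min_ttl_seconds:
        return False
    return rng() < update_probability


if __name__ == "__main__":
    # Assumed values: 1 hour minimum TTL, half of eligible updates executed.
    print(should_reparse(last_expanded_at=0, now=600,
                         min_ttl_seconds=3600, update_probability=0.5))
```

With a probability of 1.0 this degrades to a pure minimum TTL, and with a TTL of 0 to a purely probabilistic update, so either option can be tried in isolation.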

Deployment plan
The short-term performance plan is simple enough that we aim to deploy it at the beginning of May. This means that we'll track all edits and template updates for a full month before the VE deployment, which will prime the caches and give us real-world performance data. It will also give us enough time to find workarounds for any issues we find.

Resources
'''Note: This is a first approximation, and hinges very much on the determination of a mean parse time across a representative set of edits. More precise measurements are needed.'''

Based on the benchmarks above we are assuming a mean parse time of 5 seconds (~1/5th of Obama), which results in a rate of 2 requests per second per 10-core node. A peak edit rate of 50 requests per second would therefore require about 25 nodes. Sustaining an additional 50 refreshLinks jobs per second would require a cluster of 50 nodes. Fully performing all of these re-parses would overload the API cluster, though, so we expect to skip or delay some of them, and a Parsoid cluster of 35 nodes similar to wtp1004 will probably be sufficient.
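The sizing above can be reproduced with a few lines, which also makes it easy to redo once a measured mean parse time is available. The function names are hypothetical, and treating a 10-core node as 10 parallel parse workers is an assumption taken from the benchmark concurrency.

```python
import math


def nodes_needed(requests_per_second, mean_parse_seconds, workers_per_node):
    """Nodes required to sustain the given parse rate."""
    return math.ceil(requests_per_second * mean_parse_seconds / workers_per_node)


def cluster_capacity(nodes, mean_parse_seconds, workers_per_node):
    """Parses per second a cluster of `nodes` can sustain."""
    return nodes * workers_per_node / mean_parse_seconds


if __name__ == "__main__":
    print(nodes_needed(50, 5, 10))       # peak edits only
    print(nodes_needed(50 + 50, 5, 10))  # edits plus refreshLinks jobs
    print(cluster_capacity(35, 5, 10))   # parses/s for the proposed cluster
```

Under these assumptions a 35-node cluster sustains 70 parses per second, which covers the peak edit rate plus 20 refreshLinks re-parses per second.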

In addition, two Varnish boxes, each with ideally 1 TB of SSD space (512 GB minimum), are needed to provide a cache large enough to make template reuse effective.