Parsoid/Ops needs for July 2013

Overview
We are planning to make VisualEditor the default editor on all (or at least most) Wikipedias by July 1st, 2013. Most edits will then be made through VisualEditor, which will put a lot more load on Parsoid. We need to make sure that Parsoid and the other affected subsystems can handle this.

Impact
(For the architecture of our Parsoid deployment, see Parsoid.)

Every time a user clicks the Edit button, VisualEditor loads the HTML for the page from Parsoid. This is a GET request that goes through the Varnish caches, so it may be served from cache. If and when the user actually saves their edit, VisualEditor POSTs the edited page to Parsoid and gets wikitext back, which it then saves to the database as a regular edit. These POST requests are not cacheable. There is one GET request per edit attempt, but these responses are cached in Varnish and are invalidated only when the page is edited. The load on the Parsoid backends should therefore be roughly one POST request and one GET request per saved edit.
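The reasoning above can be sketched as a toy model. This is only an illustration, not a description of the real Varnish setup: the class and method names are made up for this page, and the model assumes the cache never expires or evicts an entry except when the page is saved.

```python
from dataclasses import dataclass, field


@dataclass
class ParsoidLoadModel:
    """Toy model: counts requests that actually reach the Parsoid backends."""

    # Pages whose HTML is currently held in the (idealized) cache.
    cached_pages: set = field(default_factory=set)
    backend_gets: int = 0
    backend_posts: int = 0

    def open_editor(self, page):
        """A user clicks Edit: GET the HTML, served from cache when possible."""
        if page not in self.cached_pages:
            self.backend_gets += 1          # cache miss reaches a backend
            self.cached_pages.add(page)

    def save_edit(self, page):
        """A user saves: the POST is never cached, and the page is invalidated."""
        self.backend_posts += 1
        self.cached_pages.discard(page)


model = ParsoidLoadModel()
for _ in range(3):
    model.open_editor("Example")   # three edit attempts, one cache miss
model.save_edit("Example")         # one saved edit, one POST
model.open_editor("Example")       # first attempt after the save misses again
```

In this model every saved edit costs one POST plus at most one later GET miss for the same page, no matter how many edit attempts happen in between. Cache expiry, eviction and cold caches would add GETs that the model does not count.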

Each such request may take a lot of time and resources on the Parsoid backends. Each request also causes Parsoid to request the wikitext source of the page from api.php, and may cause additional API requests if there are templates on the page. An increase in Parsoid request volume will therefore also cause an increase in API request volume. The Parsoid team is looking into storing the generated HTML for each revision to reduce the need for API fetches for POST requests.

Benchmarks
Based on his current numbers, Gabriel believes that two Parsoid backends can sustain the throughput of an average day's edits, but we'd need more than that to be able to sustain peak rates. He currently estimates we'll need 10 backends (servers identical to wtp1004).
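Once measured numbers are available, the backend count could be derived with a calculation along these lines. This is a rough sketch and not Gabriel's actual method: the function name, its parameters and the headroom factor are assumptions made for illustration, and no measured values are filled in.

```python
import math


def backends_needed(peak_requests_per_second,
                    requests_per_second_per_backend,
                    headroom=0.0):
    """Estimate how many Parsoid backends are needed to sustain peak load.

    peak_requests_per_second: peak rate of requests reaching the backends
        (POSTs plus GET cache misses); must come from measurements.
    requests_per_second_per_backend: sustained throughput of one backend;
        must come from benchmarks.
    headroom: extra capacity as a fraction, e.g. 0.5 for 50% spare capacity
        (hypothetical knob, not from the benchmarks).
    """
    if requests_per_second_per_backend <= 0:
        raise ValueError("per-backend throughput must be positive")
    required = peak_requests_per_second * (1.0 + headroom)
    # Round up, and never go below one backend.
    return max(1, math.ceil(required / requests_per_second_per_backend))
```

Sizing for the peak rate rather than the daily average is what separates the two-backend figure from the larger estimate above.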

TODO insert actual numbers here

We have not yet collected data on the associated API request volume.