Parsoid/Performance/Landscape

Known performance problems

 * a large number of concurrent requests to the MW API (potential for overload exists; currently mitigated by reducing retries and limiting concurrency)
 * inefficient processing of template expansions (1000s of small requests, each with client-side and server-side API overheads)
 * because per-request overheads on the MW API end are non-trivial, a lot of MW API cluster cpu time goes to API overheads instead of real work
 * occasional transient load spikes on the Parsoid cluster
 * timeouts on large pages (some of these may have other causes, such as loops and other inefficiencies)
 * high p95 (~5s) and p99 (~50s) parse times
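
The mitigation mentioned in the first item (limiting concurrency and reducing retries) can be sketched roughly as follows. This is only an illustration, not Parsoid's actual code: `fetch_with_limit`, its parameters, and the `fetch` callable are hypothetical names chosen for the example.

```python
import asyncio


async def fetch_with_limit(fetch, requests, max_concurrency=4, max_retries=1):
    """Run `fetch(req)` for every request with bounded concurrency and retries.

    `fetch` is a hypothetical coroutine function standing in for one MW API call.
    At most `max_concurrency` calls are in flight at any time, and each request
    is retried at most `max_retries` times before its error is propagated.
    """
    sem = asyncio.Semaphore(max_concurrency)

    async def one(req):
        async with sem:  # wait here until a concurrency slot is free
            for attempt in range(max_retries + 1):
                try:
                    return await fetch(req)
                except Exception:
                    if attempt == max_retries:
                        raise  # retry budget exhausted

    # Results come back in the same order as `requests`.
    return await asyncio.gather(*(one(r) for r in requests))
```

A lower `max_concurrency` protects the API cluster but, for a page with many transclusions, stretches the time until the last response arrives.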

Performance targets

 * reduce MW API cluster load + cpu usage
 * reduce Parsoid cluster cpu usage
 * improve latency of individual parse requests

Features yet to be supported

 * fast HTML <-> wikitext context editing in VE
 * support HTML editing of transclusion parameters. This is not currently enabled because of:
   * potential increase in load on the Parsoid and MW API clusters
   * HTML bloat and additional storage in RESTbase (addressed by separating html parameters into a different storage bucket)
   * potential impact on VE load / editing times (addressed by loading html params on demand from storage)

Full system context

 * RESTbase proxies parse requests
   * limits impact of parse latencies to a narrow set of use cases (back-to-back VE edits, HTML <-> wikitext switching in VE)
   * enables reuse from old revisions
   * enables incremental parsing
 * We want to keep system complexity down and, over time, reduce it
 * We want to support HTML <-> wikitext switching in VE
 * We want to enable HTML editing of transclusion parameters
 * HTML -> WT serialization is not currently an issue: barring spikes, our average p95 is < 1s.
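
For readers less familiar with the p95 / p99 figures quoted on this page, here is a minimal sketch of how such a percentile can be computed from a list of request timings. It uses the nearest-rank definition, which is an assumption: the monitoring stack that produced the numbers above may interpolate differently.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Rank is 1-based; clamp to 1 so that very small p still maps to a sample.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]
```

With 100 samples, `percentile(samples, 95)` returns the 95th-smallest value, so a handful of very slow requests moves the p99 far more than the p95.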

Performance ideas / projects
+ = PRO, - = CON, 0 = neutral / unknown

Increase tokenizer efficiency
+ can improve latency
+ some parts of this are already done, awaiting final patches and deployment
- doesn't help with MW API cluster load

MW API end: fix API overheads
(not within the mandate of the parsing team)

+ potential to help all API clients, not just Parsoid
- optimizing MW setup time is probably a large undertaking (addressing startup time is a difficult problem in most VMs and usually requires a lot of code untangling and rewrites)
0 concurrency limiting on the Parsoid end limits the latency benefit for large numbers of individual API requests

MW API end: Set up a batching proxy in front of the PHP API
(not within the mandate of the parsing team)

+ potentially clean solution with no impact on consumers or PHP code
+ benefits all API consumers
+ gets to batch requests from many clients, so more batching with less latency cost
- increases request latency: large delay => bigger batches, small delay => not much batching
- hard to establish adequate request timeouts for a batch
- hard to handle partial failures
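
The delay trade-off listed above can be made concrete with a small simulation. This is a sketch under assumed semantics, not a design for the proxy: a batch opens when its first request arrives and is flushed `delay` time units later, and `window_batches` / `mean_added_wait` are hypothetical helper names.

```python
def window_batches(arrivals, delay):
    """Group arrival times into batches that flush `delay` after their first request."""
    batches = []
    for t in sorted(arrivals):
        if batches and t < batches[-1][0] + delay:
            batches[-1].append(t)  # batch still open: join it
        else:
            batches.append([t])    # previous batch already flushed: open a new one
    return batches


def mean_added_wait(batches, delay):
    """Average time a request sits in the proxy before its batch is flushed."""
    waits = [(b[0] + delay) - t for b in batches for t in b]
    return sum(waits) / len(waits)
```

For arrivals at 0, 1, 2, 10, 11 and 30 ms, a 1 ms delay yields six single-request batches (no batching at all), while a 25 ms delay yields two batches at the cost of a 21 ms average added wait.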

Parsoid end: batching of API requests (expand N transclusions in 1 MW API request)
+ potentially low-hanging fruit on the Parsoid end
+ large reduction in MW cpu usage (higher MW API overheads from individual HTTP requests and associated req setup / response parse costs)
+ potential reduction in Parsoid cpu usage (lower node.js overheads from individual HTTP requests and associated req setup / response parse costs)
0 latency may not necessarily improve
- potential for higher peak memory usage and increased gc pressure (could offset the benefits from the Parsoid-side reduction in API overheads)
- potential for added complexity in Parsoid for batching logic
- harder to establish an adequate timeout for the entire batch
- difficult to handle partial failures properly and within the time budget

Currently, Tim is working on a prototype so we get a realistic sense of what this can get us.
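
As a rough illustration of the batching idea and of the partial-failure concern, the sketch below groups transclusions into fixed-size batches and tracks which ones still need handling. It is not the prototype mentioned above: `expand_batch` is a hypothetical callable standing in for a single batched MW API request, assumed to return a dict from transclusion source to expanded HTML for the items that succeeded.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def expand_in_batches(transclusions, expand_batch, batch_size=50):
    """Expand transclusions in batches; return (expanded, failed).

    `expand_batch(batch)` is an assumed interface: one API request for the whole
    batch, returning {source: html} for the items it managed to expand.
    """
    expanded, failed = {}, []
    for batch in chunked(transclusions, batch_size):
        try:
            results = expand_batch(batch)
        except Exception:
            failed.extend(batch)  # whole-batch failure (e.g. timeout)
            continue
        for src in batch:
            if src in results:
                expanded[src] = results[src]
            else:
                failed.append(src)  # partial failure inside a successful batch
    return expanded, failed
```

With N transclusions this issues ceil(N / batch_size) requests instead of N. The `failed` list is where the hard part lives: the caller still has to decide whether to retry those items individually, and whether there is time budget left to do so.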

Reuse transclusion output from previous revisions
(currently disabled - see )

+ can reduce calls to MW API
+ can improve cpu usage
+ can improve latency
- hacks in place to enable dom fragment reuse
- potentially error-prone in some edge cases, even in single-template scenarios
- does not work for multi-template scenarios
- does not work when templates are edited


 * smarter analyses to reuse in more scenarios can address last two negatives
 * requires fixing our testing holes -- see
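
The conservative reuse conditions listed above (single template only, template not edited since the previous parse) can be sketched as a simple filter. This is an illustration only and does not reflect how Parsoid's DOM fragment reuse is implemented: the data shapes and the name `pick_reusable` are assumptions made for the example.

```python
def pick_reusable(prev, current_sources, template_revs):
    """Split current transclusions into (reused, to_expand).

    Assumed shapes, for illustration only:
      prev            {source: (html, {template_name: rev_at_previous_parse})}
      current_sources list of transclusion source strings in the new revision
      template_revs   {template_name: current_rev}
    """
    reused, to_expand = {}, []
    for src in current_sources:
        entry = prev.get(src)
        if entry is not None:
            html, revs_at_parse = entry
            single_template = len(revs_at_parse) == 1
            unchanged = all(
                template_revs.get(name) == rev for name, rev in revs_at_parse.items()
            )
            if single_template and unchanged:
                reused[src] = html  # safe to reuse under these conservative rules
                continue
        to_expand.append(src)  # new, multi-template, or template was edited
    return reused, to_expand
```

Everything in `to_expand` still costs an MW API call, which is why smarter analyses that widen the reusable set matter.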

Incremental parsing
+ most efficient of all approaches
- not low-hanging fruit
- requires fixing templating model
- long term project

Rewrite tokenizer in Rust as a separate service
+ can help with latency of individual requests
- doesn't help with MW API cluster load
- not a small undertaking
- yet another language introduced into the mix and associated ops dependencies and complexities

Rewrite Parsoid in C++
(strawman; not going to happen)

+ can help with latency of individual requests
- doesn't help with MW API cluster load; could exacerbate it, if anything

Implement template expansion natively in Parsoid
(strawman; not going to happen)

+ helps with MW API cluster load
- may exacerbate latency of individual parse requests
- introduces a lot more complexity in Parsoid, moving us away from our goal of paring it down

Throw hardware at the problem
± lazy solution
- not a real solution
- does not address parse latency problems
- does nothing for 3rd party users