Parsoid/Performance/Landscape

From mediawiki.org

As of July 13, 2015, the Parsoid cluster runs at under 20% load and the MW API cluster at under 15%, so the performance issues outlined here are not critical. This is primarily because we have taken the route of throwing hardware at the problem. However, it is now time to address performance more seriously, for three reasons: Parsoid's current level of functionality, the performance issues that remain, and new features that require fast, efficient wikitext -> HTML generation.

Known performance problems

  • Lots of concurrent requests to the MW API (potential for overload; currently mitigated by reducing retries and limiting concurrency)
  • inefficient processing of template expansions (thousands of small requests, each with client- and server-side API overheads)
  • non-trivial per-request overheads on the MW API end, so much of the MW API cluster's CPU time is spent on API overhead instead of real work
  • occasional transient load spikes on the Parsoid cluster
  • timeouts on large pages (some of these may have other causes, such as loops and other inefficiencies)
  • high p95 (~5s) and p99 (~50s) parse times
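The mitigation in the first item combines a cap on concurrent API requests with a small retry budget. It can be sketched with an asyncio semaphore. This is a hypothetical Python illustration, not Parsoid code (Parsoid itself runs on node.js). The names `ApiRequestLimiter`, `max_concurrent` and `max_retries` are made up, and the sketch assumes that `ConnectionError` is the only retryable failure.

```python
import asyncio
from typing import Awaitable, Callable, Tuple, TypeVar

T = TypeVar("T")


class ApiRequestLimiter:
    """Caps in-flight MW API requests and bounds retries (hypothetical helper)."""

    def __init__(self, max_concurrent: int, max_retries: int) -> None:
        # Create inside a running event loop (see demo below).
        self._sem = asyncio.Semaphore(max_concurrent)
        self._max_retries = max_retries
        self.in_flight = 0       # requests currently outstanding
        self.peak_in_flight = 0  # highest concurrency observed so far

    async def run(self, request: Callable[[], Awaitable[T]]) -> T:
        attempt = 0
        while True:
            async with self._sem:  # wait for a free slot
                self.in_flight += 1
                self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
                try:
                    return await request()
                except ConnectionError:
                    if attempt >= self._max_retries:
                        raise  # give up rather than add more load to the API
                finally:
                    self.in_flight -= 1
            attempt += 1  # slot released; the retry competes for a slot again


async def demo() -> Tuple[int, int]:
    limiter = ApiRequestLimiter(max_concurrent=4, max_retries=2)

    async def fake_expand() -> str:
        await asyncio.sleep(0.01)  # stand-in for one HTTP round trip
        return "<p>expanded</p>"

    results = await asyncio.gather(*(limiter.run(fake_expand) for _ in range(20)))
    return len(results), limiter.peak_in_flight


if __name__ == "__main__":
    print(asyncio.run(demo()))  # (20, 4)
```

A retry goes back through the semaphore. A failing request therefore cannot hold a slot while it waits, and the total load on the API stays bounded even during a burst of failures.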

Performance targets

  • reduce MW API cluster load and CPU usage
  • reduce Parsoid cluster CPU usage
  • improve latency of individual parse requests

Features yet to be supported

  • fast HTML <-> wikitext switching in VE
  • HTML editing of transclusion parameters. This is not currently enabled because of:
    • a potential increase in load on the Parsoid and MW API clusters
    • HTML bloat and additional storage in RESTBase (addressed by storing HTML parameters in a separate storage bucket)
    • potential impact on VE load and editing times (addressed by loading HTML parameters from storage on demand)
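The on-demand loading idea can be sketched as a lazy, memoized lookup. In this hypothetical sketch, the page HTML carries only a key into the separate parameter bucket, and the parameter HTML is fetched the first time an editor asks for it. `TransclusionParams`, `html_for` and the `fetch` callback are illustrative names; they are not RESTBase or VisualEditor APIs.

```python
from typing import Callable, Dict, List, Optional


class TransclusionParams:
    """Lazily loads the HTML of one transclusion's parameters (hypothetical).

    The page HTML only carries a key into a separate parameter bucket, so
    opening a page in VE does not pay for parameter HTML nobody edits.
    """

    def __init__(self, bucket_key: str,
                 fetch: Callable[[str], Dict[str, str]]) -> None:
        self._bucket_key = bucket_key
        self._fetch = fetch  # assumed storage lookup: key -> {param name: HTML}
        self._params: Optional[Dict[str, str]] = None

    def html_for(self, name: str) -> str:
        if self._params is None:
            # First access: one round trip to the parameter bucket, then memoized.
            self._params = self._fetch(self._bucket_key)
        return self._params[name]


fetched: List[str] = []

def fake_fetch(key: str) -> Dict[str, str]:
    fetched.append(key)
    return {"1": "<b>first</b>", "caption": "<i>A caption</i>"}


params = TransclusionParams("hypothetical/page/42/transclusion/0", fake_fetch)
print(fetched)                                   # []
print(params.html_for("1"))                      # <b>first</b>
print(params.html_for("caption"), len(fetched))  # <i>A caption</i> 1
```

With this approach, opening the page costs no parameter fetches. Editing a transclusion costs a single fetch, however many of its parameters are then touched.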

Full system context

  • RESTBase proxies parse requests
    • limits the impact of parse latency to a narrow set of use cases (back-to-back VE edits, HTML <-> wikitext switching in VE)
    • enables reuse of output from old revisions
    • enables incremental parsing
  • We want to keep system complexity down and, over time, reduce it
  • We want to support HTML <-> wikitext switching in VE
  • We want to enable HTML editing of transclusion parameters
  • HTML -> wikitext serialization is currently not an issue: barring spikes, our average p95 is under 1s.

Performance ideas / projects

+ = PRO, - = CON, 0 = neutral / unknown

Increase tokenizer efficiency

+ can improve latency
+ some of this work is already done and awaiting final patches and deployment ( https://gerrit.wikimedia.org/r/#/c/220698/ )
- doesn't help with MW API cluster load

Once this is deployed, we'll have a realistic idea of how much it helps parsing performance. Additional work here is no longer low-hanging fruit.

MW API end: fix API overheads

(not within the mandate of the parsing team)

+ potential for helping all API clients, not just Parsoid
- optimizing MW setup time is probably a large undertaking (addressing startup time is a difficult problem in most VMs and usually requires a lot of code untangling and rewriting)
0 concurrency limiting on the Parsoid end limits the latency benefit of issuing a large number of individual API requests
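A rough model makes the last point concrete. Assume, purely for illustration, that a parse issues $N$ independent API requests, that Parsoid keeps at most $C$ of them in flight, and that each request takes about $t$ seconds end to end, including the per-request overhead discussed above. The API phase of that parse then takes at least

$$T_{\mathrm{api}} \gtrsim \left\lceil \frac{N}{C} \right\rceil t$$

Reducing per-request overhead shrinks $t$. For pages with many transclusions, however, the $\lceil N/C \rceil$ factor still dominates. With hypothetical values $N = 1000$, $C = 10$ and $t = 20$ ms, the bound is $100 \times 20\ \mathrm{ms} = 2$ s.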

MW API end: Set up a batching proxy in front of the PHP API (task T45888)

(not within the mandate of the parsing team)

+ potentially clean solution with no impact on consumers or PHP code
+ benefits all API consumers
+ can batch requests from many clients, giving more batching at a lower latency cost
- increases request latency: large delay => bigger batches, small delay => not much batching
- hard to establish adequate request timeouts for a batch
- hard to handle partial failures
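The delay-versus-batch-size trade-off can be seen in a small simulation of a time-window batcher. A batch is flushed when it is full or when `max_delay` has elapsed since its first request arrived. This is a hypothetical sketch of one common batching policy, not the design proposed in T45888, and the parameter names are illustrative.

```python
from typing import List, Tuple


def simulate_batching(arrivals: List[float], max_delay: float,
                      max_batch: int) -> Tuple[List[List[float]], float]:
    """Group request arrival times (ms) into batches, as a time-window proxy would.

    A batch opens with its first request and is flushed when it holds
    max_batch requests or when max_delay ms have passed since it opened.
    Returns the batches and the worst queueing delay any request incurred.
    """
    batches: List[List[float]] = []
    current: List[float] = []
    worst_wait = 0.0

    def flush(flush_time: float) -> None:
        nonlocal current, worst_wait
        # The first request in a batch always waits the longest.
        worst_wait = max(worst_wait, flush_time - current[0])
        batches.append(current)
        current = []

    for t in sorted(arrivals):
        if current and t > current[0] + max_delay:
            flush(current[0] + max_delay)  # window expired before t arrived
        current.append(t)
        if len(current) == max_batch:
            flush(t)  # batch is full: send immediately
    if current:
        flush(current[0] + max_delay)  # last partial batch waits out its window
    return batches, worst_wait


arrivals = [float(i) for i in range(10)]  # one request per ms
for delay in (0.0, 5.0, 20.0):
    batches, wait = simulate_batching(arrivals, max_delay=delay, max_batch=10)
    print(f"max_delay={delay}: {len(batches)} batches, worst added latency {wait} ms")
```

For ten requests arriving 1 ms apart, a zero delay yields ten single-request batches. A 5 ms window yields two batches, at the cost of up to 5 ms of added latency per request. A 20 ms window yields one batch, which is flushed early only because it hits `max_batch`.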

Parsoid end: batching of API requests (task T45888: expand N transclusions in 1 MW API request)

+ potentially low-hanging fruit on the Parsoid end
+ large reduction in MW CPU usage (avoids the high MW API overhead of many individual HTTP requests, including per-request setup and response parsing costs)
+ potential reduction in Parsoid CPU usage (lower node.js overhead from fewer individual HTTP requests and the associated request setup / response parsing costs)
0 latency may not necessarily improve
- might increase overall Parsoid request latency
- potential for higher peak memory usage and increased GC pressure (could offset the benefits of reduced API overhead on the Parsoid side)
- added complexity in Parsoid for batching logic
- harder to establish an adequate timeout for the entire batch
- difficult to handle partial failures properly and within time budget

Tim is currently working on a prototype so that we can get a realistic sense of what this would gain us.
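The basic shape of Parsoid-side batching, including a per-item fallback for partial failures, can be sketched as follows. `expand_batch` and `expand_one` are hypothetical stand-ins for a batched and a single-transclusion MW API call. This is neither the API of the prototype mentioned above nor a real MediaWiki endpoint, and the sketch does not model timeouts.

```python
from typing import Callable, List, Optional

# Hypothetical stand-ins for MW API calls (not a real MediaWiki API):
# a batch call returns one result per input, with None marking a failed item.
BatchExpander = Callable[[List[str]], List[Optional[str]]]
SingleExpander = Callable[[str], str]


def expand_all(transclusions: List[str], expand_batch: BatchExpander,
               expand_one: SingleExpander, batch_size: int) -> List[str]:
    """Expand N transclusions with ceil(N / batch_size) batch requests.

    Items that fail inside a batch are retried one at a time, so one bad
    transclusion does not fail the whole batch. Timeouts are not modeled.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    results: List[str] = []
    for start in range(0, len(transclusions), batch_size):
        chunk = transclusions[start:start + batch_size]
        outputs = expand_batch(chunk)  # one HTTP round trip for the whole chunk
        if len(outputs) != len(chunk):
            raise ValueError("batch response does not match request size")
        for src, html in zip(chunk, outputs):
            # Partial failure: fall back to a single-item request.
            results.append(html if html is not None else expand_one(src))
    return results


calls: List[int] = []

def fake_batch(items: List[str]) -> List[Optional[str]]:
    calls.append(len(items))
    return [None if "Broken" in s else "<p>" + s + "</p>" for s in items]

def fake_one(src: str) -> str:
    return "<span>fallback " + src + "</span>"

srcs = ["{{T%d}}" % i for i in range(7)] + ["{{Broken}}"]
html = expand_all(srcs, fake_batch, fake_one, batch_size=3)
print(calls)     # [3, 3, 2]
print(html[-1])  # <span>fallback {{Broken}}</span>
```

In this example, eight transclusions need three batch requests plus one single-item fallback, compared with eight separate requests. The fallback path is where the partial-failure and time-budget concerns listed above show up in practice.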

Reuse transclusion output from previous revisions

(currently disabled - see task T98995)

+ can reduce calls to MW API
+ can improve CPU usage
+ can improve latency
- relies on hacks to enable DOM fragment reuse
- potentially error-prone in some edge cases, even in single-template scenarios
- does not work for multi-template scenarios
- does not work when templates are edited
  • smarter analyses that enable reuse in more scenarios could address the last two negatives
  • requires fixing the gaps in our testing (see task T57438)
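The simplest form of reuse keys each expanded transclusion on its wikitext and on the revision IDs of the templates it depends on. This is a hypothetical sketch and not Parsoid's implementation. It assumes the caller already knows a transclusion's template dependencies and their current revision IDs. It shows why a template edit rules out reuse: the key changes and the entry misses. It also covers only a single self-contained transclusion. Output that is split across several templates (the multi-template case) does not fit this model. The key also omits page context that an expansion can depend on, such as magic words like {{PAGENAME}}, which is one way this kind of reuse can go wrong.

```python
from typing import Dict, Mapping, Optional, Tuple

# Key: (transclusion wikitext, sorted (template title, revision id) pairs)
CacheKey = Tuple[str, Tuple[Tuple[str, int], ...]]


class TransclusionReuseCache:
    """Reuses expanded HTML for a single, self-contained transclusion (hypothetical).

    Entries are keyed on the transclusion's wikitext plus the revision IDs of
    every template it depends on, so editing any of those templates changes
    the key and forces a fresh expansion.
    """

    def __init__(self) -> None:
        self._store: Dict[CacheKey, str] = {}

    @staticmethod
    def _key(wikitext: str, deps: Mapping[str, int]) -> CacheKey:
        return wikitext, tuple(sorted(deps.items()))

    def get(self, wikitext: str, deps: Mapping[str, int]) -> Optional[str]:
        return self._store.get(self._key(wikitext, deps))

    def put(self, wikitext: str, deps: Mapping[str, int], html: str) -> None:
        self._store[self._key(wikitext, deps)] = html


cache = TransclusionReuseCache()
cache.put("{{Infobox|name=X}}", {"Template:Infobox": 101}, "<table>...</table>")
print(cache.get("{{Infobox|name=X}}", {"Template:Infobox": 101}))  # <table>...</table>
print(cache.get("{{Infobox|name=X}}", {"Template:Infobox": 102}))  # None
```

The template titles and revision IDs above are made-up example values.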

Incremental parsing

+ most efficient of all approaches
- not low-hanging fruit
- requires fixing the templating model
- long term project

Rewrite tokenizer in Rust as a separate service or binary node module

+ can help with latency of individual requests
- doesn't help with MW API cluster load
- not a small undertaking
- introduces yet another language into the mix, with associated ops dependencies and complexity

Rewrite Parsoid in C++

(strawman; not going to happen)

+ can help with latency of individual requests
- doesn't help with MW API cluster load -- could exacerbate it, if anything

Implement template expansion natively in Parsoid

(strawman; not going to happen)

+ helps with MW API cluster load
- may exacerbate latency of individual parse requests
- introduces a lot more complexity into Parsoid, moving away from our goal of paring it down

Throw hardware at the problem

± lazy solution
- not a real solution
- does not address parse latency problems
- does nothing for third-party users