Parsoid/Parser Unification/Pixel Diff Testing Stats

To compare rendering fidelity between Parsoid and the core parser, we run pixel diff tests on a subset of pages from various wikis and monitor progress over time. We are starting small (~25K pages across ~20 wikis) and will eventually expand the test set. In these initial stages, we are likely to see a number of false positives as we iron out wrinkles in the testing infrastructure.

Known issues

 * Test timeouts: If the differences between the core parser rendering and the Parsoid rendering of a page are large, the diffing algorithm (uprightdiff) might take too long or use too much memory. In that case the test run on that page does not complete, since each test is given a fixed time budget (~5 mins). That is why you see a test completion rate below 100%. As we fix sources of diffs, the completion rate should naturally improve: with fewer diffs, the diffing algorithm is more likely to run to completion.
 * Unstyled Parsoid output: Parsoid's output is unstyled. The testing infrastructure loads the Vector skin styles and applies them to the output.
 * Known CSS diffs: jawiki, hewiki, and arwiki seem to have CSS diffs between Parsoid and core parser output that are artificially deflating the numbers.
 * Known Cite CSS diffs: Parsoid generates identical HTML for refs and references across all wikis and relies on CSS to generate varied styling across wikis. On the core parser side, the core Cite extension actually generates varied output for different wikis and is not CSS-based. Parsoid's approach is better suited for editing clients, but we haven't yet done the work of identifying the precise CSS needed to emulate the output on all wikis. This work is captured on Phabricator. Pixel-diff testing will let us isolate these differences and fix up the CSS.
 * Known missing JS modules in Parsoid output: The testing infrastructure attempts to expand all collapsed content on pages before comparing output. Parsoid's output is currently uncollapsed by default (because of missing JS modules in Parsoid's output). While some JS scripts attempt to expand collapsed sections, this is currently error-prone and doesn't capture all collapsed content, so we get large false-positive differences when the collapsed state differs between the two screenshots. Even when we fix T162399 and add JS modules to Parsoid's output, we will still need to figure out how to expand all content so that we test rendering of all content.
 * For the same reason as above (missing JS modules in Parsoid's output), some pages are missing jQuery-inserted table sorter buttons and classes, which also leads to large visual diffs on tables.

As we resolve these and other issues in the coming months, we will remove them from this list.
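
To make the "Test timeouts" item above more concrete, here is a minimal Python sketch of how a fixed time budget per page leads to a completion rate below 100%. It is an illustration only, not the actual test harness: the function names `run_diff` and `completion_rate` are hypothetical, the ~5 minute default is taken from the description above, and the diff command is a placeholder because the real uprightdiff invocation is not shown on this page.

```python
import subprocess
import sys

# Assumption: mirrors the "~5 mins" per-test budget mentioned above.
TIMEOUT_SECONDS = 5 * 60


def run_diff(cmd, timeout=TIMEOUT_SECONDS):
    """Run one diff command (hypothetical helper).

    Returns True if the command finished within the time budget,
    False if it was killed for running too long.
    """
    try:
        subprocess.run(
            cmd,
            check=False,
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return False


def completion_rate(results):
    """Percentage of test runs that completed (hypothetical helper)."""
    if not results:
        return 0.0
    return 100.0 * sum(results) / len(results)


if __name__ == "__main__":
    # Placeholder commands standing in for the real diff tool:
    # one finishes immediately, one outlives a deliberately short budget.
    fast = [sys.executable, "-c", "pass"]
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    results = [run_diff(fast, timeout=10), run_diff(slow, timeout=0.5)]
    print(f"completion rate: {completion_rate(results):.1f}%")
```

In this sketch a page whose diff exceeds the budget counts as not completed, so reducing the number of large diffs raises the completion rate without any change to the budget itself.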