Parsoid/Parser Unification/Pixel Diff Testing Stats

To compare rendering fidelity between Parsoid and the core parser, we are going to run pixel diff tests on a subset of pages from various wikis and monitor progress. Initially, we are starting small (~25K pages across ~20 wikis); eventually we will expand the test set. In these initial stages, we are likely to get a number of false positives as we iron out wrinkles in the testing infrastructure.

Known issues

 * Test timeouts: If the differences between the core parser rendering and the Parsoid rendering of a page are large, the diffing algorithm (uprightdiff) might take too long or use too much memory. Each test is given a fixed time to complete (~5 mins), so in that case the test run on that page will not complete. That is the reason you see a test completion rate below 100%. As we fix sources of diffs, this completion rate should naturally improve, since with fewer diffs the diffing algorithm is more likely to run to completion.
 * Unstyled Parsoid output: Parsoid's output is unstyled. The testing infrastructure loads the Vector skin styles and applies them to the output. While this is now mostly working, there may still be areas where the right styles do not apply because of HTML structure differences between Parsoid and legacy HTML. For example, this is the case with HTML output for media. The legacy parser will soon be updated to emit Parsoid-compatible HTML structures for media, which will eliminate some of the CSS diffs we currently see in these visualdiff test runs.
 * Known Cite CSS diffs: Parsoid generates identical HTML for refs and references across all wikis and relies on CSS to generate varied styling across wikis. On the core parser side, the core Cite extension actually generates varied output for different wikis and is not CSS-based. Parsoid's approach is better suited for editing clients, but we haven't yet done the work of identifying the precise CSS needed to emulate the output on all wikis. This work is captured on Phabricator. Pixel-diff testing will let us isolate these differences and fix up the CSS.
 * Known missing JS modules in Parsoid output: The testing infrastructure attempts to expand all collapsed content on pages before comparing output. Parsoid's output is currently uncollapsed by default (because of missing JS modules in Parsoid's output). While some JS scripts attempt to expand collapsed sections, this is currently error-prone and doesn't capture all collapsed content, so we get large false positive differences when the collapsed state differs between the two screenshots.

As we resolve these and other issues in the coming months, we will remove them from this list.
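
The test-timeout issue in the list above can be illustrated with a minimal sketch. This is not the actual testing infrastructure: the names `DiffResult`, `run_diff_with_budget`, and `completion_rate` are hypothetical, the diff command is passed in by the caller rather than being a real uprightdiff invocation, and only the time budget is modeled (not the memory limit). It shows why pages whose diff exceeds the fixed budget lower the reported completion rate.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class DiffResult:
    """Outcome of one page's diff run (hypothetical structure)."""
    page: str
    completed: bool             # False when the command exceeded the budget
    returncode: Optional[int]   # None when the run did not complete


def run_diff_with_budget(page: str,
                         command: Sequence[str],
                         budget_seconds: float = 300.0) -> DiffResult:
    """Run a diff command, giving up after a fixed time budget.

    The 300 second default mirrors the ~5 minute limit mentioned above.
    """
    try:
        proc = subprocess.run(list(command),
                              capture_output=True,
                              timeout=budget_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return DiffResult(page, False, None)
    return DiffResult(page, True, proc.returncode)


def completion_rate(results: List[DiffResult]) -> float:
    """Percentage of test runs that finished within the budget."""
    if not results:
        return 0.0
    finished = sum(1 for r in results if r.completed)
    return 100.0 * finished / len(results)


if __name__ == "__main__":
    # Stand-in commands: one finishes immediately, one sleeps past the budget.
    quick = [sys.executable, "-c", "pass"]
    slow = [sys.executable, "-c", "import time; time.sleep(10)"]
    results = [
        run_diff_with_budget("Page_A", quick, budget_seconds=30.0),
        run_diff_with_budget("Page_B", slow, budget_seconds=0.5),
    ]
    print(f"completion rate: {completion_rate(results):.1f}%")
```

In this sketch a timed-out page is recorded as incomplete rather than as a failure, which matches how the completion rate is described above: it measures how many runs finished, separately from what the diffs found.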