Parsoid/Parser Unification/Pixel Diff Testing Stats

In order to compare rendering fidelity between Parsoid and the core parser, we will run pixel-diff tests on a subset of pages from various wikis and monitor progress. Initially, we are starting small (~25K pages across ~20 wikis) and will eventually expand the test set. In these initial stages, we will likely see a number of false positives as we iron out wrinkles in the testing infrastructure.
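The basic idea of a pixel diff can be illustrated with a small sketch: render the same page with both parsers, take a screenshot of each, and report the fraction of pixel positions that differ. This is a minimal illustration, not the actual testing infrastructure. The function name `pixel_diff_ratio`, the list-of-RGB-tuples image representation, and the `tolerance` parameter are all hypothetical.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]      # hypothetical: one (R, G, B) value per pixel
Image = List[List[Pixel]]         # hypothetical: rows of pixels, top to bottom


def pixel_diff_ratio(a: Image, b: Image, tolerance: int = 0) -> float:
    """Return the fraction of pixel positions that differ between two screenshots.

    Both screenshots are compared over the union of their areas, so rows or
    columns present in only one screenshot count as differing pixels.
    A pixel matches if no colour channel differs by more than `tolerance`.
    """
    height = max(len(a), len(b))
    width = max((len(row) for row in a + b), default=0)
    if height == 0 or width == 0:
        return 0.0

    def at(img: Image, x: int, y: int) -> Optional[Pixel]:
        # None means this screenshot has no pixel at (x, y).
        if y < len(img) and x < len(img[y]):
            return img[y][x]
        return None

    differing = 0
    for y in range(height):
        for x in range(width):
            pa, pb = at(a, x, y), at(b, x, y)
            if pa is None and pb is None:
                continue  # neither screenshot covers this position
            if pa is None or pb is None:
                differing += 1  # only one screenshot covers this position
            elif max(abs(ca - cb) for ca, cb in zip(pa, pb)) > tolerance:
                differing += 1
    return differing / (height * width)


# Usage: a 4x2 white "core parser" screenshot vs. a copy with one black pixel.
WHITE = (255, 255, 255)
core = [[WHITE] * 4 for _ in range(2)]
parsoid = [row[:] for row in core]
parsoid[0][0] = (0, 0, 0)
print(pixel_diff_ratio(core, parsoid))  # 1 of 8 pixels differs
```

Because extra rows count as fully different, a difference in page height (for example, one screenshot with a section collapsed and the other with it expanded) inflates the score sharply, which is why such cases show up as large false positives.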

Known issues

 * Parsoid's output is unstyled. The testing infrastructure loads the Vector skin styles and applies them to the output.
 * jawiki, hewiki, and arwiki seem to have CSS diffs between Parsoid and core parser output that are artificially deflating the numbers.
 * Parsoid generates identical HTML for refs and references across all wikis and relies on CSS to produce wiki-specific styling. On the core parser side, the Cite extension generates different output for different wikis instead of relying on CSS. Parsoid's approach is better suited for editing clients, but we haven't yet done the work of identifying the precise CSS needed to emulate the core output on every wiki. This work is captured on Phabricator. Pixel-diff testing will let us isolate these differences and fix up the CSS.
 * The testing infrastructure attempts to expand all collapsed content on pages before comparing output. Parsoid's output is currently uncollapsed by default (because JS modules are missing from Parsoid's output). Some JS scripts attempt to expand collapsed sections, but this is currently error-prone and doesn't capture all collapsed content, so we get large false-positive differences whenever the collapsed state differs between the two screenshots. Even after we fix T162399 and add JS modules to Parsoid's output, we will need to figure out how to expand all content so that we test rendering of everything on the page.
 * For the same reason (missing JS modules in Parsoid's output), some pages lack the jQuery-inserted table-sorter buttons and classes, which also leads to large visual diffs on tables.

As we resolve these and other issues in the coming months, we will remove them from this list.