Parsing/Replacing Tidy/Linter/Stats/Pixel Diff Testing Stats

We are stopping these diffing runs as of May 29th.

Every week, we run a mass pixel-diff test on about 72K pages from 60 different wikis across 4 different projects. For each test page, we generate one rendering of its latest revision using Tidy and another using Remex, then compare the two images with the uprightdiff package. If the two images are not completely identical, the diffing algorithm checks whether the differences can be accounted for by purely vertical whitespace shifts.
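The three outcomes described above (identical, differing only by vertical whitespace, or genuinely different) can be illustrated with a minimal Python sketch. This is not uprightdiff's algorithm, which is more sophisticated; it is a simplified stand-in that treats an image as a list of pixel rows and assumes that a "vertical whitespace shift" means the two renderings match once all-background rows are dropped. The names `classify`, `is_blank`, and `BACKGROUND` are hypothetical.

```python
from typing import List, Sequence

Row = Sequence[int]
BACKGROUND = 255  # assumption: blank rows are pure white (grayscale 255)


def is_blank(row: Row) -> bool:
    """A row counts as vertical whitespace if every pixel is background."""
    return all(pixel == BACKGROUND for pixel in row)


def strip_blank_rows(image: List[Row]) -> List[tuple]:
    """Drop all-background rows, keeping the remaining rows in order."""
    return [tuple(row) for row in image if not is_blank(row)]


def classify(tidy_image: List[Row], remex_image: List[Row]) -> str:
    """Classify a pair of renderings into one of three buckets."""
    if [tuple(r) for r in tidy_image] == [tuple(r) for r in remex_image]:
        return "identical"
    # Simplification: if the non-blank rows agree, the only difference
    # is where (and how much) vertical whitespace appears.
    if strip_blank_rows(tidy_image) == strip_blank_rows(remex_image):
        return "vertical-shift-only"
    return "different"
```

For example, two renderings that contain the same content rows but with an extra blank row inserted in one of them would be classified as `"vertical-shift-only"`, while any change to a content row yields `"different"`.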

The table below shows how the results have changed since August 2017, when we first started these tests. At that time, our internal target for complete Tidy replacement was for 95% of pages to render identically (after accounting for vertical whitespace shifts). Reaching 100% is not possible because shortcomings in our tooling produce false positives and noise in the tests.
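As a rough illustration of how a run could be checked against the 95% target, the sketch below counts pages that render identically either outright or after accounting for vertical whitespace shifts. The function names and the counts in the usage note are hypothetical and are not taken from the actual test runs.

```python
TARGET_PERCENT = 95.0  # internal target stated above


def pass_rate(identical: int, shift_only: int, total: int) -> float:
    """Percentage of pages rendering identically, counting pages whose
    only differences are vertical whitespace shifts as passing."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * (identical + shift_only) / total


def meets_target(identical: int, shift_only: int, total: int) -> bool:
    """True if the run reaches the 95% target."""
    return pass_rate(identical, shift_only, total) >= TARGET_PERCENT
```

With made-up counts, `meets_target(900, 40, 1000)` would be false, since only 94% of pages pass.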

As the table shows, as editors fix linter-identified issues on pages, this percentage has been slowly increasing.