Parsing/Visual Diff Testing

To evaluate changes to parsing or to the parser setup, we use mass visual diff testing. In this setup, we have two MediaWiki installs: the default (base) install and the experimental (expt) install. Currently we run these via mediawiki-vagrant on labs VMs, but they could be set up anywhere. The two VMs are currently mw-base.wikitextexp.wmflabs.org and mw-expt.wikitextexp.wmflabs.org. Each of them is a multi-wiki setup initialized with production content from about 41 wikis from Wikipedia, Wikisource, Wiktionary, and Wikivoyage. As of April 29, 2016, there are about 50K titles that are usable for running tests.

Separately, on promethium.wikitextexp.wmflabs.org, we run a testreduce-based testing setup that runs a visualdiff test on a test client. The visualdiff test requests the test title from $wiki.base.wikitextexp.wmflabs.org and $wiki.expt.wikitextexp.wmflabs.org, generates screenshots for each of those via phantomjs (after doing some CSS and JS post-processing to strip the chrome, expand all collapsed boxes, etc.), and then compares the two screenshots via uprightdiff, which in turn generates a diff image with differences marked up while accounting for vertical pixel shifts of content on the page.
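As a rough illustration of how the $wiki placeholder expands into the two URLs that get screenshotted, here is a minimal sketch. The helper name visualDiffUrls, the enwiki prefix, the http scheme, and the /wiki/ path are all assumptions made for the example; the real fetch logic lives in the visualdiff config and may differ.

```javascript
// Hypothetical helper: expands the $wiki placeholder for both installs.
// The http scheme and the /wiki/ path are assumptions, not taken from
// the real visualdiff config.
function visualDiffUrls(wikiPrefix, title) {
  // MediaWiki page URLs conventionally use underscores for spaces.
  const page = encodeURIComponent(title.replace(/ /g, '_'));
  const urlFor = (flavor) =>
    `http://${wikiPrefix}.${flavor}.wikitextexp.wmflabs.org/wiki/${page}`;
  return { base: urlFor('base'), expt: urlFor('expt') };
}

// 'enwiki' is an illustrative prefix only.
const urls = visualDiffUrls('enwiki', 'Main Page');
console.log(urls.base);
console.log(urls.expt);
```

Each of the two URLs would then be rendered to a screenshot, and the pair of screenshots handed to uprightdiff.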

The test results are accessible at http://mw-expt-tests.wmflabs.org/.

Additional information about setup on promethium
The instructions to set up a private instance of the round-trip test server can be found here. A MySQL database is needed to keep the set of pages and the testing results.

Testreduce code
The testreduce code is in /srv/testreduce and is used to run the mw-expts-vd and mw-expts-vd-client services. The systemd unit files for these services are /lib/systemd/system/mw-expts-vd.service and /lib/systemd/system/mw-expts-vd-client.service. These files are derived from the puppetized code for similar services on ruthenium used for Parsoid's roundtrip testing.

The testreduce server config is in /etc/testreduce/mw-expts-vd.settings.js. The testreduce client config is in /etc/testreduce/mw-expts-vd-client.config.js which also includes a section that provides the config for the visual diff tests that are to be run.

Visualdiff code
The visualdiff code is in /srv/visualdiff, which also provides the config and hooks needed to use it with testreduce. /etc/testreduce/mw-expts-vd-client.config.js also provides the visualdiff config. It specifies how to fetch the HTML for the two screenshots, specifies uprightdiff as the diffing engine to use, and sets a few other parameters that control these. The comments in the file should be fairly self-explanatory. The uprightdiff code is in /srv/uprightdiff.

Managing services
To stop, restart, or start all clients on a VM (not normally needed), stop, restart, or start the mw-expts-vd-client systemd service. Client logs are in the systemd journal for that service.

Updating the code to test (and being run by the clients)
Unlike Parsoid, where the code to test is determined by the latest git commit, in the mw-expts setup the code to run lives on a separate VM. Sometimes the change is only in the config files and may not be available in a git repository (at least as of today). The testreduce codebase implicitly assumes that the test to run is a git commit. However, the testreduce client config file (/etc/testreduce/mw-expts-vd-client.config.js) can declare a getGitCommit function that is then used by the server and clients to identify the test run in the database. In our case, this function simply returns a unique string identifying the test run based on changes to the code on the mw-expts labs VM. So, to initiate a new test run, change the string returned by this function, save the file, and restart the mw-expts-vd-client service, and you will be ready to go (of course, you have updated the code correctly on the mw-expts labs VM, right?).
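A minimal sketch of what such a declaration might look like is shown here. Only the getGitCommit name comes from the description above; the TEST_RUN_ID constant, its value, and the overall shape of the config object are assumptions for illustration. The real testreduce client may expect a different signature or return value, so check the testreduce code before copying this.

```javascript
// Sketch of the relevant part of a client config file. Only the
// getGitCommit name comes from the text above; the label and the
// overall object shape are assumptions.

// Hypothetical label: change this string whenever the code or config on
// the VM under test changes, then restart the mw-expts-vd-client service.
const TEST_RUN_ID = 'expt-run-example-001';

const config = {
  // Returns the identifier recorded for this test run in place of a
  // real git commit hash.
  getGitCommit: function () {
    return TEST_RUN_ID;
  },
};

// Export for Node.js when loaded as a CommonJS module.
if (typeof module === 'object') {
  module.exports = config;
}
```

Keeping the identifier in a single constant means a new test run needs only a one-line edit, followed by a service restart.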

Updating the testreduce, visualdiff, uprightdiff code
Of course, there will continue to be bug fixes and tweaks to these codebases. To update the relevant code, simply go to /srv/testreduce, /srv/visualdiff, or /srv/uprightdiff, and do a git pull, and restart the affected services. As simple as that!

Resource usage and # of test clients
promethium is a one-off bare-metal labs VM with 12 CPU cores, 32 GB of memory, and a 400+ GB disk. Even so, visual diff testing can use up all these resources. About 20 testreduce clients seems to be the upper end of how many can run at the same time; this is enough to sometimes bring CPU load to 13-15 and memory usage to 28+ GB. 16 clients is probably a more comfortable number. The # of test clients to run can be tweaked by editing /lib/systemd/system/mw-expts-vd-client.service.
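For a back-of-the-envelope feel for the numbers above, the following sketch estimates memory use at a given client count. It assumes memory use scales linearly with the number of clients, which is a simplification and not something measured here; the function name estimateMemoryGb is made up for the example.

```javascript
// Rough estimate only: assumes memory use scales linearly with the
// number of clients, which is a simplification.
const OBSERVED_CLIENTS = 20;   // upper end reported above
const OBSERVED_MEMORY_GB = 28; // memory usage reported at that count
const TOTAL_MEMORY_GB = 32;    // memory on promethium

// Hypothetical helper: scales the observed per-client usage.
function estimateMemoryGb(clients) {
  const perClientGb = OBSERVED_MEMORY_GB / OBSERVED_CLIENTS;
  return perClientGb * clients;
}

const at16 = estimateMemoryGb(16);
const headroom = TOTAL_MEMORY_GB - at16;
console.log(`16 clients: ~${at16.toFixed(1)} GB used, ~${headroom.toFixed(1)} GB headroom`);
```

Under that assumption, 16 clients would use roughly 22.4 GB and leave close to 10 GB free, which is consistent with 16 being the more comfortable setting.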

Accessing rendering and diff screenshots
The screenshots from phantomjs and from uprightdiff are written to /data/visualdiffs/pngs, organized by wiki prefix. These images are accessible via HTTP at http://mw-expt-tests.wmflabs.org/visualdiffs/pngs/. The images are overwritten with each test run because storing them per run takes too much disk space: each test run uses 125 GB. In the future, we could consider storing results from the most recent 2-3 runs, or get a larger disk and expand that range a bit more.
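The 2-3 run figure follows from simple arithmetic on the disk size, sketched here. The helper name runsThatFit and the 50 GB reserve in the second call are assumptions for illustration; 400 GB is taken as the disk size even though the disk is described as 400+ GB.

```javascript
// Hypothetical helper: how many full runs of screenshots fit on disk
// after setting aside some space for everything else.
function runsThatFit(diskGb, perRunGb, reservedGb = 0) {
  return Math.max(0, Math.floor((diskGb - reservedGb) / perRunGb));
}

const PER_RUN_GB = 125; // disk used per test run, from the text above
const DISK_GB = 400;    // lower bound for the "400+ GB" disk

console.log(runsThatFit(DISK_GB, PER_RUN_GB));     // 3
console.log(runsThatFit(DISK_GB, PER_RUN_GB, 50)); // 2 (50 GB reserve is illustrative)
```

With no space reserved, three runs fit; reserving even a modest amount for the database and logs brings that down to two.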