Parsoid/Round-trip testing

The Parsoid code includes a round-trip testing system that tests code changes, composed of a server that gives out tasks and presents results and clients that do the testing and report back to the server. The code is in Parsoid/tests/server and Parsoid/tests/client.

There's a publicly accessible instance of the server at http://parsoid.wmflabs.org:8001/. It currently tests a representative set of about 160,000 pages from Wikipedias in different languages.

Private setup
The instructions to set up a private instance of the round-trip test server are in Parsoid/tests/README. A MySQL database is needed to keep the set of pages and the testing results.

wmflabs setup
The coordinator runs on parsoid.wmflabs.org (port 8001 for the web interface, port 8002 for the internal API). About 50 clients run on various parsoid-* VMs (36 cores as of early Dec 2012). Clients run code from the shared /data/project/parsoid-deploy/ repository and exit when the revision of that checkout changes. The client VM hostnames are parsoid-roundtrip{4-7}-8core and parsoid-roundtrip3. You need to ssh -A into parsoid.wmflabs.org before you can log into the client VMs.
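
The hostname shorthand and the two-step login can be spelled out in shell. This is a minimal sketch: the helper name rt_client_hosts is hypothetical, and it assumes the {4-7} range is inclusive, which gives five client VMs.

```shell
#!/bin/sh
# Hypothetical helper: print the round-trip client VM hostnames named above.
# Assumes the {4-7} range is inclusive.
rt_client_hosts() {
    echo parsoid-roundtrip3
    for n in 4 5 6 7; do
        echo "parsoid-roundtrip${n}-8core"
    done
}

# Usage (not run here): forward the SSH agent to the coordinator first,
# then log into a client VM from there.
#   ssh -A parsoid.wmflabs.org
#   ssh parsoid-roundtrip4-8core
```

Agent forwarding (-A) lets the second hop authenticate with the key held on your own machine, so no private key has to be copied to the coordinator.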

Both the coordinator and the clients are managed and restarted by upstart. The configuration is in /etc/init/parsoid-rt-{server,client}.conf.

To stop, restart, or start all clients on a VM (not normally needed):

sudo service parsoid-rt-client stop
sudo service parsoid-rt-client restart
sudo service parsoid-rt-client start

Logs are in /var/log/upstart/parsoid-rt-{server,client}.log.

Updating the code under test (the code run by the clients)
On parsoid.wmflabs.org, as root:

cd /data/project/parsoid-deploy/src
git pull

Clients exit when they notice that the code has changed, and upstart restarts them with the new code. To restart them manually on a client node:

service parsoid-rt-client restart
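
The exit-on-change behaviour can be approximated in shell. This is only a sketch and not the actual client code (that lives in Parsoid/tests/client). The function names are hypothetical, and it assumes the checkout is a git repository and that a process manager such as upstart respawns the process after it exits.

```shell
#!/bin/sh
# Hypothetical sketch: return when the checked-out revision changes, so the
# caller can exit and let upstart respawn it on the new code.

# Print the current revision of the git checkout at $1.
current_revision() {
    git -C "$1" rev-parse HEAD
}

# Succeed (exit status 0) when the two revisions differ.
revision_changed() {
    [ "$1" != "$2" ]
}

# Poll the checkout every $2 seconds (default 60) and return once the
# revision differs from the one seen at start-up.
watch_revision() {
    repo=$1
    interval=${2:-60}
    start=$(current_revision "$repo") || return 1
    while :; do
        sleep "$interval"
        now=$(current_revision "$repo") || return 1
        if revision_changed "$start" "$now"; then
            echo "revision changed: $start -> $now"
            return 0
        fi
    done
}
```

A wrapper could call watch_revision /data/project/parsoid-deploy/src in the background and stop the worker when it returns. Polling is used here for simplicity.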

Updating the round-trip server code
cd /data/project/parsoid-deploy-rtserver/src
git pull
service parsoid-rt-server restart

Todo / Roadmap
Please look at the general Parsoid roadmap.

Server UI and other usability improvements
We recently changed the server to use a templating system that separates the code from the presentation. Further improvements can now be made to the presentation itself.

Ideas for improvement:

 * Improve pairwise regressions/fixes interface on commits list. Done!
 * Flag certain types of regressions that we currently search for by eye. Create views with:
   * regressions introducing exactly one semantic/syntactic diff into a perfect page, and
   * other introductions of semantic diffs to pages that previously had only syntactic diffs.
 * Improve diffing in results views:
   * investigate other diffing libraries for speed,
   * apply word-based diffs on diffed lines,
   * diff results pages between revisions to detect new semantic/syntactic errors,
   * currently new diff content appears before old, which is confusing; change this.
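
The two regression views proposed in the list above can be illustrated with a small filter. This is a sketch under stated assumptions, not the server's implementation. The input format is hypothetical: one page per line, with a title containing no whitespace followed by the old semantic, old syntactic, new semantic, and new syntactic diff counts. A "perfect page" is taken to mean zero diffs of either kind.

```shell
#!/bin/sh
# Hypothetical filter over lines of the form:
#   title old_semantic old_syntactic new_semantic new_syntactic
flag_regressions() {
    awk '
        {
            old_sem = $2 + 0; old_syn = $3 + 0
            new_sem = $4 + 0; new_syn = $5 + 0
            # View 1: exactly one diff introduced into a perfect page.
            if (old_sem + old_syn == 0 && new_sem + new_syn == 1)
                print "perfect-page-single-diff", $1
            # View 2: semantic diffs appear where only syntactic ones existed.
            else if (old_sem == 0 && old_syn > 0 && new_sem > 0)
                print "new-semantic-diff", $1
        }
    '
}
```

Pages matching neither rule are dropped, so the output is limited to the cases that are currently searched for by eye.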

Some dsh tricks
Restart all rt clients:

dsh -cf /home/gwicke/rtclients sudo service parsoid-rt-client restart

Show memory usage of all node processes on all nodes:

dsh -M -cf /home/gwicke/rtclients ps -C nodejs -o rss= | sort -n -k2
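
The per-process figures from the memory command can also be totalled per host. This is a minimal sketch: the function name rss_per_host is hypothetical, and it assumes that dsh -M prefixes every output line with "hostname: " and that ps reports rss in kilobytes.

```shell
#!/bin/sh
# Hypothetical helper: read "host: rss" lines on stdin and print the total
# resident set size per host, smallest first.
rss_per_host() {
    awk '
        { sub(/:$/, "", $1); total[$1] += $2 }
        END { for (h in total) printf "%s %d\n", h, total[h] }
    ' | sort -n -k2
}

# Usage (not run here):
#   dsh -M -cf /home/gwicke/rtclients ps -C nodejs -o rss= | rss_per_host
```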