Parsoid/Round-trip testing

The Parsoid code includes a round-trip testing system for checking code changes. It consists of a server that hands out tasks and presents results, and clients that run the tests and report back to the server. The code is in Parsoid/tests/server and Parsoid/tests/client.

A publicly accessible instance of the server runs at http://parsoid.wmflabs.org:8001/. It currently tests a representative set of about 160,000 pages from different Wikipedia language editions.

Private setup
The instructions to set up a private instance of the round-trip test server are in Parsoid/tests/README. A MySQL database is needed to keep the set of pages and the testing results.

wmflabs setup
The coordinator runs on parsoid.wmflabs.org (port 8001 for the web interface and port 8002 for the internal API). About 50 clients run on various parsoid-* VMs (36 cores as of early December 2012). Clients run code from the shared /data/project/parsoid-deploy/ repository and exit when the revision of that checkout changes. The client VM hostnames are parsoid-roundtrip{4-8}.8core. You need to ssh -A into parsoid.wmflabs.org before you can log into the client VMs.
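The two-hop login path described above can be sketched in shell. This is only an illustration: the helper names rt_client_hosts and rt_client_ssh_commands are hypothetical, and reading {4-8} as the integers 4 through 8 is an assumption about the hostname pattern. The second helper prints the ssh commands rather than running them.

```shell
# Print the client VM hostnames, one per line.
# Assumption: "{4-8}" in the hostname pattern means the integers 4 through 8.
rt_client_hosts() {
    i=4
    while [ "$i" -le 8 ]; do
        printf 'parsoid-roundtrip%s.8core\n' "$i"
        i=$((i + 1))
    done
}

# Print (do not execute) a two-hop ssh command for each client VM.
# -A forwards the local ssh agent to the coordinator so the second hop
# can authenticate; -t requests a terminal for the nested session.
rt_client_ssh_commands() {
    for host in $(rt_client_hosts); do
        printf 'ssh -A -t parsoid.wmflabs.org ssh %s\n' "$host"
    done
}
```

Calling rt_client_ssh_commands lists one command per client VM, which can be copied into a terminal as needed.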

Both the coordinator and the clients are managed and restarted by upstart. Their configuration is in /etc/init/parsoid-rt-{server,client}.conf.

To {stop,restart,start} all clients on a VM (not normally needed):

sudo service parsoid-rt-client stop
sudo service parsoid-rt-client restart
sudo service parsoid-rt-client start

Logs are in /var/log/upstart/parsoid-rt-{server,client}.log.

Updating the code under test (the code run by the clients)
On parsoid.wmflabs.org, as root:

cd /data/project/parsoid-deploy/src
git pull

Clients exit when they notice that the code has changed, and upstart restarts them with the new code. To restart them manually on a client node:

service parsoid-rt-client restart
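This page does not show how a client notices the change. One plausible mechanism, sketched below, is to record the checkout revision at startup and compare it periodically. The function names revision_changed and watch_revision and the 60-second polling interval are hypothetical, and the actual client may work differently.

```shell
# Succeed (exit status 0) when the two revision strings differ.
revision_changed() {
    [ "$1" != "$2" ]
}

# Hypothetical watcher: remember the revision of the checkout at $1,
# poll it once a minute, and return once it has changed so that the
# caller can exit and let the service manager start a fresh process.
watch_revision() {
    repo=$1
    start=$(git -C "$repo" rev-parse HEAD) || return 1
    while sleep 60; do
        now=$(git -C "$repo" rev-parse HEAD) || return 1
        if revision_changed "$start" "$now"; then
            echo "revision changed: $start -> $now"
            return 0
        fi
    done
}
```

With this approach, a git pull in the shared checkout is enough to roll every client over to the new code without logging into the client VMs.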

Updating the round-trip server code
cd /data/project/parsoid-deploy-rtserver/src
git pull
service parsoid-rt-server restart

Todo / Roadmap
Please look at the general Parsoid roadmap.

Improve the server UI
Right now the results are presented very simply, and the HTML is generated by plain string concatenation. We should switch to a templating system to separate the code from the presentation.

The presentation itself could also be improved; right now it consists entirely of simple text tables.