Parsoid/SelserTesting


The parserTests.js file is maturing, and one of the big steps has been the addition of tests for the SelectiveSerializer class. We use that class on English Wikipedia to avoid dirty diffs on big pages, especially when the troublesome bits of the page weren't modified.

The background

The VisualEditor team gave us a great home-field advantage with a few patches that enabled change marking. This meant that anything VisualEditor saw change would be tagged in the DOM, so that when the DOM got to Parsoid, we could see the tags and serialize only those parts of the page. The format for change markers can be found on this page.

The theory

Since the SelectiveSerializer class skips unmodified content, we need to give the test run some modifications. But why change the DOM if a change marker is what the SelSer class really needs? So we just marked the DOM up with random change markers and tried serializing things. After a long few weeks, SelSer is actually up to par with the regular serializer, and beginning to surpass it!
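The idea of skipping unmodified content can be illustrated with a toy model. This is only a sketch, not Parsoid's implementation: the node shape (`type`, `text`, `src`, `changed`) and the functions `serializeNode` and `selectiveSerialize` are hypothetical names invented for this example, and the `changed` flag stands in for a real change marker.

```javascript
// Toy model: each node keeps the wikitext it was parsed from (`src`) and a
// `changed` flag standing in for a VisualEditor change marker.
// All names here are hypothetical, not Parsoid's real data structures.

// Stand-in for the regular serializer: regenerates wikitext from the node.
function serializeNode(node) {
  if (node.type === 'heading') {
    return '== ' + node.text + ' ==';
  }
  return node.text;
}

// Selective serialization: reuse the original source for unmarked nodes and
// run the regular serializer only on nodes that carry a change marker.
function selectiveSerialize(nodes) {
  return nodes
    .map(function (node) {
      return node.changed ? serializeNode(node) : node.src;
    })
    .join('\n');
}

const nodes = [
  // The original heading has no inner spaces; a full re-serialization
  // would normalize it and produce a dirty diff.
  { type: 'heading', text: 'Title', src: '==Title==', changed: false },
  { type: 'paragraph', text: 'New text', src: 'Old text', changed: true }
];

const wikitext = selectiveSerialize(nodes);
```

In this sketch the untouched heading keeps its original spelling, while only the marked paragraph goes through the regular serializer.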

Now we're also testing with actual changes. We create changes based on the DOM and a list of saved change markers, which allows us to run 20 rounds of selective serialization per test run with no variation between runs. That is to say, we get random-looking changes, but everyone on the team sees the same ones.
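One way to get random-looking but repeatable changes is sketched below. It is an assumption-laden illustration rather than the code in parserTests.js: the seeded generator (mulberry32), the 30% marking probability, the seed, and the names `generateChangeLists` and `applyChangeList` are all choices made for this example. Only the 20 rounds and the idea of saving the change list come from the text above.

```javascript
// Small seeded PRNG (mulberry32), so the "random" choices are reproducible.
// Using a seeded generator is an assumption of this sketch.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Build one list of node indices to change per round (hypothetical format).
function generateChangeLists(nodeCount, rounds, seed) {
  const random = mulberry32(seed);
  const lists = [];
  for (let round = 0; round < rounds; round++) {
    const marked = [];
    for (let i = 0; i < nodeCount; i++) {
      if (random() < 0.3) {
        marked.push(i);
      }
    }
    lists.push(marked);
  }
  return lists;
}

// Replay one saved list: make an actual change to each listed node and mark it.
function applyChangeList(nodes, markedIndices) {
  return nodes.map(function (node, i) {
    if (markedIndices.indexOf(i) === -1) {
      return Object.assign({}, node, { changed: false });
    }
    return Object.assign({}, node, { text: node.text + ' (edited)', changed: true });
  });
}

const ROUNDS = 20;
// Save the lists once (for example, to a JSON file) and replay them later.
const saved = JSON.stringify(generateChangeLists(5, ROUNDS, 42));
const replayed = JSON.parse(saved);
```

Because the lists are generated from a fixed seed and then stored, every replay applies the same changes in the same rounds.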

Moving forward

Considering that this system is mostly built around the Parsoid infrastructure, it's probably not as useful as it could be to other projects. A multi-stage test environment with pluggable interfaces (we use async.waterfall to achieve this) could be useful in other places, but that infrastructure is currently too tied up in our own code to be used elsewhere.
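As a rough picture of what a multi-stage, pluggable pipeline looks like, here is a minimal sketch. To keep it dependency-free it uses a small stand-in `waterfall` function that mimics the calling convention of async.waterfall (each stage receives the previous stage's results plus a Node-style callback) instead of the async library itself. The stages `parse`, `markAll`, and `serialize` and the helper `runPipeline` are hypothetical and far simpler than the real test stages.

```javascript
// Minimal stand-in that mimics the calling convention of async.waterfall.
// It is not the async library, and it calls stages synchronously.
function waterfall(stages, done) {
  let index = 0;
  function next(err, ...results) {
    if (err) {
      return done(err);
    }
    if (index === stages.length) {
      return done(null, ...results);
    }
    const stage = stages[index++];
    stage(...results, next);
  }
  next(null);
}

// Hypothetical stages: each takes its input and a callback(err, output).
function parse(source, cb) {
  cb(null, source.split('\n').map(function (line) {
    return { src: line, changed: false };
  }));
}

function markAll(nodes, cb) {
  cb(null, nodes.map(function (node) {
    return Object.assign({}, node, { changed: true });
  }));
}

function serialize(nodes, cb) {
  // Changed nodes are "re-serialized" (here: trimmed); others keep their source.
  cb(null, nodes.map(function (node) {
    return node.changed ? node.src.trim() : node.src;
  }).join('\n'));
}

// The middle stages are pluggable: callers choose what goes between
// parsing and serializing.
function runPipeline(source, middleStages, done) {
  const first = function (cb) { parse(source, cb); };
  waterfall([first].concat(middleStages, [serialize]), done);
}

let output;
runPipeline('a \nb', [markAll], function (err, result) {
  if (err) { throw err; }
  output = result;
});
```

Swapping `[markAll]` for a different list of stages changes what the test exercises without touching the parse or serialize steps.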