Parsoid/OPW tasks

TODO: Copy / move to Mentorship_programs/Possible_projects when ready

Cassandra backend for distributed round-trip test server
Our distributed round-trip test setup thoroughly tests Parsoid by converting 160,000 Wikipedia articles from wikitext to HTML and back. Currently all result data is stored in MySQL, which does not cope well with the volume of data we throw at it. This project will address this by building a Cassandra backend for the round-trip test server. This will involve working with node.js and Cassandra. A candidate will ideally have at least some node.js experience and be interested in distributed systems and storage.
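As a rough sketch, round-trip results map naturally onto a Cassandra table keyed by wiki prefix and title. The table name, columns, and result shape below are assumptions for illustration, not the actual round-trip server schema; with the cassandra-driver module, the query and parameter list would be handed to `client.execute()`:

```javascript
// Hypothetical CQL for storing one round-trip result per (prefix, title).
// Column names are invented for this sketch.
const INSERT_CQL =
    'INSERT INTO results (prefix, title, commit_hash, score, errors) ' +
    'VALUES (?, ?, ?, ?, ?)';

// Turn one round-trip test result object into the parameter list for the
// prepared statement above. With cassandra-driver this would be used as:
//   client.execute(INSERT_CQL, resultToParams(result), { prepare: true }, cb);
function resultToParams(result) {
    return [
        result.prefix,      // e.g. 'enwiki'
        result.title,       // page title
        result.commitHash,  // Parsoid commit being tested
        result.score,       // round-trip diff score
        result.errors,      // error count
    ];
}
```

Denormalizing into a few query-specific tables like this (rather than porting the relational schema directly) is the usual approach in Cassandra, since it has no joins.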

Improve round-trip test server web UI
Our distributed round-trip test setup has a very simple web UI, the product of features accumulating over time. Moreover, the HTML is currently generated directly in the code using simple string concatenation. This project would improve that on two fronts:
 * Separate the code from the UI using either a templating system or some other form of structured HTML generation.
 * Improve the UI, making it more visually appealing and easier to understand at a glance.
This project involves working with node.js. A candidate will ideally have some notions of web design, data visualization and JavaScript programming.
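To sketch the first bullet: even a tiny placeholder-substitution renderer (hypothetical here; a real project would more likely adopt an existing templating library) already separates markup from code and HTML-escapes values, which plain string concatenation does not:

```javascript
// Minimal template renderer: replaces each {{key}} placeholder with the
// HTML-escaped value from `data`. The {{key}} syntax is an assumption for
// this sketch, not the actual rt-server code.
function render(template, data) {
    return template.replace(/\{\{(\w+)\}\}/g, function (match, key) {
        const value = data[key] === undefined ? '' : String(data[key]);
        // Escape &, <, > so page titles can't inject markup into the UI.
        return value
            .replace(/&/g, '&amp;')
            .replace(/</g, '&lt;')
            .replace(/>/g, '&gt;');
    });
}

// The template lives apart from the code that fills it in:
const row = render('<tr><td>{{title}}</td><td>{{errors}}</td></tr>',
                   { title: 'Main_Page', errors: 0 });
// row === '<tr><td>Main_Page</td><td>0</td></tr>'
```

In practice an established node.js templating library would replace this hand-rolled function, but the separation of template text from rendering code is the point.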

Parser migration tool
Periodically we come across some bit of wikitext markup we'd like to deprecate. See Parsoid/limitations, Parsoid/Broken wikitext tar pit, and (historically) MNPP for examples. We'd like a really slick tool to improve communication with Wikipedia editors about these issues:
 * It would display a list of wiki titles (filtered by Wikipedia project) which contain deprecated wikitext. Each title would link to a page briefly describing the problem(s), giving general advice on how the wikitext should be rewritten, and (perhaps) pointing to some previously-corrected pages for editors to look at.
 * Ideally this would be integrated with a wiki workflow and/or contain "revision tested" information so that editors can 'claim' pages from the list to fix and don't step on each other's work. Fixed/revised pages would be removed from the list until their new contents could be rechecked.
 * It should be as easy as possible for Parsoid developers to add new "bad" pattern tests to the tool. These would be added to the test runs, with appropriate documentation of the problem, so that editors don't have to learn about a new tool/site for every broken pattern.
 * Some of these broken bits of wikitext might be correctable by bot. The tool could still create a task list for the bot, and collect and display the bot's fixes for editors to review.
 * The backend which looks for broken wikitext could be based on the existing round-trip test server. Instead of repeatedly collecting statistics on a subset of pages, however, it would work its way through the entire Wikipedia project looking for broken wikitext (and preventing regressions).
 * Some cleverness might be helpful to properly attribute bad wikitext to a template rather than the page containing the template. This is probably optional; editors can figure out what's going on if they need to.

This project involves working with node.js, and probably MediaWiki bots and/or extensions as well. A candidate will ideally have some node.js experience and some notions of web and UX design. This task could be broken into parts if a candidate wants to work only on the front-end or back-end portions of the tool. This task is bug 46705.
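One way to keep new "bad" pattern tests easy to add is a declarative registry: each deprecated construct is one table entry with a name, a description for editors, and a matcher. The pattern names and regexes below are invented examples for illustration, not actual Parsoid deprecations:

```javascript
// Hypothetical registry of deprecated-wikitext checks. Adding a new check
// means adding one entry here; the scanner and UI pick it up automatically.
const badPatterns = [
    {
        name: 'self-closed-b',
        description: 'Self-closing <b/> tag; use <b>...</b> instead',
        re: /<b\s*\/>/g,
    },
    {
        name: 'nowiki-in-link',
        description: '<nowiki> inside a wikilink target',
        re: /\[\[[^\]]*<nowiki>/g,
    },
];

// Scan one page's wikitext and report which registered patterns it hits,
// with a match count for each.
function findBadPatterns(wikitext) {
    const hits = [];
    badPatterns.forEach(function (p) {
        const matches = wikitext.match(p.re);
        if (matches) {
            hits.push({ name: p.name, count: matches.length });
        }
    });
    return hits;
}
```

Real checks would likely run over Parsoid's token stream or DOM rather than raw regexes, but the registry shape (one self-describing entry per pattern) is what keeps the tool cheap to extend.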