Parsoid/Roadmap

Parsoid is now relatively mature, and supports the ongoing roll-out of the VisualEditor in July 2013. Page loads are normally fast as pages are pre-parsed right after edits. Saves are slightly slower, but still fast enough for most pages.

In the next steps we will integrate Parsoid more closely with MediaWiki core, and research support for exciting new features such as switching between HTML and Wikitext.

Image editing refinement [straightforward, Q3 2013]
Support editing of all image features and help others add extensions for other content like video. Also support Wikia in a Gallery port.

Performance: More efficient template updates [straightforward, Q3 2013]
Avoid most API load by re-expanding only those transclusions that actually used an edited template. This will need changes to the core API to provide this information, and the capability to persist it on the Parsoid side.
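
The dependency tracking this implies can be sketched as follows. This is only an illustration under stated assumptions: the `TransclusionIndex` class, its method names and the in-memory storage are hypothetical and are not part of Parsoid or the MediaWiki API. The idea is to record which templates each transclusion used, then look up only the affected transclusions when a template is edited.

```python
from collections import defaultdict


class TransclusionIndex:
    """Hypothetical reverse index: template title -> transclusions that used it."""

    def __init__(self):
        # template title -> set of (page title, transclusion id) pairs
        self._users = defaultdict(set)

    def record(self, page, transclusion_id, templates_used):
        """Remember every template that one transclusion expanded."""
        for template in templates_used:
            self._users[template].add((page, transclusion_id))

    def affected_by(self, template):
        """Return the transclusions to re-expand after `template` is edited."""
        return sorted(self._users.get(template, ()))


index = TransclusionIndex()
index.record("Berlin", "t1", ["Template:Infobox city", "Template:Coord"])
index.record("Berlin", "t2", ["Template:Citation"])
index.record("Paris", "t1", ["Template:Infobox city"])

# Only the two infobox transclusions are returned; Berlin's "t2" is left alone.
print(index.affected_by("Template:Infobox city"))
```

A persistent version would need the core API to report the templates used per transclusion, which is the API change mentioned above.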

Core API: Provide public HTML API [straightforward, Q3 2013]
Provide a public MediaWiki API for the retrieval, re-expansion and saving of HTML+RDFa content. Asked for by Google, Kiwix and others, and will be used internally by VE too. Will enable creation of HTML-based bots and other editing tools.

Language variant support [hard, Q3-Q4? 2013]
The language variant conversion implementation will need a serious overhaul to work well with Parsoid. See bug 41716 for the details. In Q3, we are shooting for an interim solution that at least makes sure that existing language variant uses are preserved. For Q4 and possibly later, we plan to actually clean up language variants so that they become properly editable and can be processed efficiently in Parsoid.

Research / prototype: Support switching between HTML and Wikitext within one edit [hard, Q3-Q4 2013?]
Users would like to be able to switch back and forth between VE and wikitext editing. Supporting this while preserving DOM-based metadata and clean diffs is difficult, but not impossible. We have some ideas on this, which we'll prototype and evaluate. When this is working, provide a public API for this switch that can be used by clients like the VE.

Core storage integration: HTML / Wikitext compound storage [medium, Q3-Q4 2013]
Start work on a compound document type and associated content handler that can contain, for each revision:
 * Page properties: Categories, TOC settings, redirects etc.
 * Wikitext
 * HTML+RDFa
 * Parsoid round-trip information

See bug 49143 for some initial details. This will be further refined in discussion with architects and ops. There is also potential overlap with the needs for Flow storage. Coordinate on this with Erik and the Flow team.
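
As a rough sketch, the per-revision contents listed above could be modelled like this. All class and field names are hypothetical, chosen for illustration only; the actual design is still to be refined as described in this section.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PageProperties:
    """Hypothetical container for page-level metadata."""
    categories: list = field(default_factory=list)
    show_toc: bool = True                   # stands in for "TOC settings"
    redirect_target: Optional[str] = None   # None when the page is not a redirect


@dataclass
class CompoundRevision:
    """Hypothetical compound document holding all representations of one revision."""
    revision_id: int
    properties: PageProperties
    wikitext: str
    html_rdfa: str
    round_trip_info: dict = field(default_factory=dict)


rev = CompoundRevision(
    revision_id=1,
    properties=PageProperties(categories=["Cities in Germany"]),
    wikitext="'''Berlin''' is a city.",
    html_rdfa="<p><b>Berlin</b> is a city.</p>",
)
print(rev.properties.categories, rev.properties.redirect_target)
```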

In Q4, start to use it for VE edits and the public HTML API instead of the Varnish cache setup.

Research / prototype: Enforce proper nesting of most templates, and encapsulate compound content blocks [hard, Q3-Q4 2013]
Transclusions can currently affect arbitrary parts of the page by producing unbalanced HTML. This makes both WYSIWYG editing and efficient updates difficult. We would thus like to move towards properly nested templates as much as possible. For existing multi-transclusion content we would like to enforce nesting as a unit, possibly using an extension-like tag wrapping such a content block.

We'll research and prototype ways to establish which templates should emit properly nested output, and how to encapsulate multi-transclusion content. Ideally the solution should also work consistently for old revisions.
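
One simple way to test whether a template expansion is properly nested is to check that every opened tag is closed in order. The sketch below uses Python's standard `html.parser` module; the `BalanceChecker` class and `is_balanced` function are illustrative names, and the check is deliberately stricter than real HTML parsing because it ignores implied end tags (for example an unclosed `<p>` or `<li>`).

```python
from html.parser import HTMLParser

# Elements that never have an end tag in HTML
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class BalanceChecker(HTMLParser):
    """Tracks open tags on a stack and flags mismatched end tags."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.balanced = True

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        # An end tag must match the most recently opened tag
        if not self.stack or self.stack.pop() != tag:
            self.balanced = False


def is_balanced(fragment):
    """True if every tag in `fragment` is opened and closed in order."""
    checker = BalanceChecker()
    checker.feed(fragment)
    checker.close()
    return checker.balanced and not checker.stack


print(is_balanced("<table><tr><td>x</td></tr></table>"))  # complete table
print(is_balanced("<table><tr><td>"))                     # table-start template
print(is_balanced("</td></tr></table>"))                  # table-end template
```

The second and third fragments are the kind of output a pair of table-start and table-end templates produces: each is unbalanced on its own, but the block they form together is balanced and could be encapsulated as a unit.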

Testing infrastructure [straightforward, Q3-Q4 2013]

 * Improve performance of our round-trip testing setup and generalize it further so that it can be used by VE and Mobile.
 * Collect additional statistics about performance metrics.
 * Add test infrastructure for extension handling and our web API.
 * ParserTests script
 * Handle html2wt and html2html better (fixing normalizing, tests)
 * Deal with various bugs in BZ
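
The core of a round-trip test can be sketched as below: convert wikitext to HTML and back, then diff the result against the original. The two converters are passed in as plain callables. The toy converters here handle only bold markup and merely stand in for the real wt2html and html2wt passes; none of these function names come from the Parsoid code base.

```python
import difflib


def round_trip_diff(wikitext, wt2html, html2wt):
    """Convert wikitext to HTML and back; return a unified diff as a list of lines."""
    result = html2wt(wt2html(wikitext))
    return list(difflib.unified_diff(
        wikitext.splitlines(), result.splitlines(),
        fromfile="original", tofile="round-tripped", lineterm=""))


def toy_wt2html(wt):
    """Toy converter: turns the first '''...''' pair into <b>...</b>."""
    return wt.replace("'''", "<b>", 1).replace("'''", "</b>", 1)


def toy_html2wt(html):
    """Toy converter: turns <b>...</b> back into '''...'''."""
    return html.replace("<b>", "'''").replace("</b>", "'''")


def lossy_html2wt(html):
    """Toy converter that also drops trailing whitespace, producing a dirty diff."""
    return toy_html2wt(html).rstrip()


clean = round_trip_diff("'''bold''' text", toy_wt2html, toy_html2wt)
dirty = round_trip_diff("'''bold''' text  ", toy_wt2html, lossy_html2wt)
print(len(clean), len(dirty) > 0)
```

An empty diff means the page round-trips cleanly; any non-empty diff is a candidate for the normalization fixes listed above.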

Miscellaneous [straightforward, Q3 2013]

 * Core API: Add core HTTP API for expansion of extension tags and transclusion content instead of wikitext-based expandtemplates and parse actions
 * Ongoing bug fixing, minor feature development and performance improvements
 * Cleanup and refactoring

Feature: Parse most transclusion parameters to DOM [medium, Q4 2013]
Once we have knowledge about which transclusion parameters can be balanced DOM fragments (or flatten to a string type) from TemplateData or another source, we can parse those parameters to DOM and thus enable visual parameter editing. This includes simple string parameters that can optionally be templated.

Research / prototype: HTML-only wiki support [hard, Q4 2013 - Q2 2014]
The Parsoid web service adds a complex dependency to MediaWiki installations, which is problematic for simple MediaWiki installations that just want to use the VisualEditor. Wikis interested in editing through the VisualEditor exclusively don't necessarily need wikitext-based storage. Instead, they can use HTML storage directly. This will require changes to the save and page load logic. An HTML-based visual diff will be used to compare two versions of an HTML document.

The biggest issue is going to be compatibility with the myriad PHP extensions that currently use hooks in the PHP parser. This includes important extensions like AbuseFilter, which also operates on wikitext. For mixed HTML / Wikitext wikis we can continue to work on Wikitext for a while but would like to move towards We will try to get an overview of the issues we are facing in this area, but don't expect to solve them in Q4 2013.

In Q1 2014, start to use Parsoid HTML for views with appropriate CSS rules to match the PHP parser's rendering. Continue to iterate on extension compatibility.

Support extensions in non-Wikipedia projects [hard, Q4 2013 - ?]
Non-Wikipedia projects like Wikibooks use specialized extensions for tasks like labeled section transclusion. We do support most of those extensions generically, but this won't necessarily result in a pleasant editing experience. In the case of labeled section transclusions for example, large parts of the page wrapped in <section> tags will currently be editable as wikitext only. Since there is a <section> element in HTML5, it might make sense to treat those section tags as HTML5 elements instead.

Similar solutions for all extensions used in sister projects might take longer than a quarter. Some of these solutions might also depend on HTML-only wiki support.

Research: Identify transclusion parameters in transclusion expansions for inline editing [hard, depends on DOM parameters, Q4 2014]
It would be great if simple DOM-based transclusion parameters could be identified and edited inline in the rendered transclusion expansion. This will require deeper changes to the PHP preprocessor, and depends on the DOM parameter parsing scheduled for Q4 2013.

Research / prototype: DOM-based templating [hard, Q1 2014 - ?]
MediaWiki's templating is strongly tied to wikitext: Template parameters are (wikitext) strings, and the template output is wikitext which is further interpreted by a multi-pass parser. Templates are a mix of logic (typically heavily using parser functions) and wikitext snippets, which has given them a reputation for being hard to read. The unstructured nature makes visual editing of templates difficult.

The prospect of HTML-only wikis without a dependency on Parsoid prompts us to re-examine how we do templating in MediaWiki. DOM-based templating with a clear separation between logic and the actual templates looks like a particularly promising option to us.

The main things we need in templates are:
 * Simple expressions: provide access to modules and logic, but cannot define infinite loops or variables
 * Iteration: Iterate over finite data structures (JSON objects for example)
 * Conditionals: Include / evaluate a sub-DOM depending on an expression
 * Variable interpolation in attributes and text content
 * Ability to compute expressions and splice output in attributes and text content
 * Ability to invoke other templates and splice the output DOM into the template

This minimal functionality is relatively simple to implement on the DOM. It is desirable to make templates valid HTML documents, so that they can be edited in a visual editor. This can be achieved by encoding the control directives above in attributes similar to TAL, Distal or Genshi.
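
To make the attribute-based approach concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The directive names (`data-each`, `data-if`, `data-text`) and the dotted-path expression syntax are assumptions made for this illustration; they are not taken from TAL, Distal, Genshi or any planned MediaWiki syntax. Only iteration, conditionals and text interpolation are shown.

```python
import copy
import xml.etree.ElementTree as ET


def lookup(path, scope):
    """Resolve a dotted path such as 'city.name' against nested dicts."""
    value = scope
    for key in path.split("."):
        value = value[key]
    return value


def render(element, scope):
    """Return a list of rendered copies of `element` (zero, one or many)."""
    each = element.get("data-each")
    if each is not None:
        # Iteration over a finite list: "item in items"
        var, _, path = each.partition(" in ")
        results = []
        for item in lookup(path.strip(), scope):
            clone = copy.deepcopy(element)
            del clone.attrib["data-each"]
            results.extend(render(clone, {**scope, var.strip(): item}))
        return results

    cond = element.get("data-if")
    if cond is not None and not lookup(cond, scope):
        return []  # conditional: drop the whole sub-DOM

    out = ET.Element(element.tag)
    for name, value in element.attrib.items():
        if name not in ("data-if", "data-text"):
            out.set(name, value)
    out.tail = element.tail

    text_path = element.get("data-text")
    if text_path is not None:
        out.text = str(lookup(text_path, scope))  # interpolation in text content
        return [out]

    out.text = element.text
    for child in element:
        out.extend(render(child, scope))
    return [out]


template = ET.fromstring(
    '<ul><li data-each="city in cities">'
    '<b data-text="city.name"/>'
    '<i data-if="city.capital">capital</i>'
    '</li></ul>')
scope = {"cities": [{"name": "Berlin", "capital": True},
                    {"name": "Hamburg", "capital": False}]}
html = ET.tostring(render(template, scope)[0], encoding="unicode")
print(html)
# <ul><li><b>Berlin</b><i>capital</i></li><li><b>Hamburg</b></li></ul>
```

Because the directives live in ordinary attributes, the template itself remains a well-formed document that a visual editor could load. Expressions are limited to path lookups, so a template cannot define variables or loop indefinitely.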

Type information for template parameters can be used to improve the user interface for editing individual parameters. Instead of (wikitext) strings, we plan to support parameters with JSON-compatible types (Objects, Arrays, Numbers, Strings, Booleans, Date) or DOM fragments. The return type of a template is a DOM fragment instead of wikitext.

Logic can be implemented in an actual programming language (Lua and possibly JavaScript through Scribunto), and can return the same JSON-compatible types. This adds some dependencies, but should still be within the reach of shared hosting installs. Logic should also be able to call templates and return the resulting DOM fragment, in which case it acts as a controller to a template. In this case, the logic should be called in the same namespace as templates so that adding a controller to a template does not require changes to existing callers.

On wikis with Parsoid installed, the wikitext-based template system can be integrated into a DOM-based template system to provide a transition path. Wikitext templates would accept only strings as parameters (other types would be coerced to strings), and would expand to DOM fragments after being parsed by Parsoid.

In this quarter, we plan to implement a first prototype of an HTML DOM-based templating system in PHP (possibly using the built-in XML DOM and XPath bindings), which should provide a good basis for a deeper evaluation.

Use Parsoid HTML for all page views [hard, stretch goal, Q2 2014]
Assuming the HTML-only wiki work in Q1 2014 works out, we should be in a position where we can consider using Parsoid HTML for regular page views even on mixed HTML / Wikitext wikis like Wikipedia. This can speed up visual editing by eliminating the need to re-load content for editing.

To make this possible at a large scale, the size of compressed Parsoid HTML needs to be closer to that of the PHP parser. Site-specific CSS rules for the content need to be adjusted to work on Parsoid's more semantic HTML structures.
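
The size comparison can be tracked with a simple measurement like the one below, which compares gzip-compressed sizes using Python's standard `gzip` module. The two HTML snippets are invented samples for illustration, not real parser output, and the function names are hypothetical.

```python
import gzip


def compressed_size(html):
    """Size in bytes of the gzip-compressed UTF-8 encoding of `html`."""
    return len(gzip.compress(html.encode("utf-8")))


def size_ratio(parsoid_html, php_html):
    """How many times larger the compressed Parsoid output is than the PHP parser's."""
    return compressed_size(parsoid_html) / compressed_size(php_html)


# Invented samples: the Parsoid-style markup carries extra metadata attributes.
php_sample = '<p><a href="/wiki/Foo">Foo</a></p>'
parsoid_sample = ('<p data-parsoid=\'{"dsr":[0,7,0,0]}\'>'
                  '<a rel="mw:WikiLink" href="./Foo" '
                  'data-parsoid=\'{"stx":"simple"}\'>Foo</a></p>')

print(round(size_ratio(parsoid_sample, php_sample), 2))
```

Running such a measurement over a representative set of pages would show how far the ratio is from 1.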