Parsoid/Roadmap

Parsoid is now relatively mature, and supports the ongoing roll-out of the VisualEditor in July 2013. Page loads are normally fast as pages are pre-parsed right after edits. Saves are slightly slower, but still fast enough for most pages.

As next steps, we will integrate Parsoid more closely with MediaWiki core, and research support for new features such as switching between HTML and wikitext editing.

Image editing refinement [Q2, bug 54844 and sub-tasks]
Support editing of all image features and help others add extensions for other content like video. Also support Wikia in a Gallery port.

Full-stack testing [Q2, bug 56590]
Switch our round-trip testing to use our web API rather than calling Parsoid directly, so that we cover the entire Parsoid stack.
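A minimal sketch of what such a round-trip check does, assuming the two conversion directions are available as plain callables. In a full-stack setup those callables would issue HTTP requests to the Parsoid web API; the endpoints are not shown here, and the names round_trip_failures, fake_wt2html and fake_html2wt are hypothetical stand-ins for illustration only.

```python
import re
from typing import Callable, Iterable, List, Tuple

Converter = Callable[[str], str]


def normalize(wikitext: str) -> str:
    """Strip trailing whitespace so that only meaningful differences count."""
    return "\n".join(line.rstrip() for line in wikitext.strip().splitlines())


def round_trip_failures(
    pages: Iterable[Tuple[str, str]],
    wt2html: Converter,
    html2wt: Converter,
) -> List[str]:
    """Return titles whose wikitext changes after wikitext -> HTML -> wikitext."""
    failures = []
    for title, wikitext in pages:
        result = html2wt(wt2html(wikitext))
        if normalize(result) != normalize(wikitext):
            failures.append(title)
    return failures


# Stand-in converters (assumption): they only understand bold markup.
def fake_wt2html(wikitext: str) -> str:
    return re.sub(r"'''(.+?)'''", r"<b>\1</b>", wikitext)


def fake_html2wt(html: str) -> str:
    return html.replace("<b>", "'''").replace("</b>", "'''")


pages = [
    ("Clean", "Some '''bold''' text"),
    ("Dirty", "Some <b>bold</b> text"),  # literal HTML gets normalized to quotes
]
failures = round_trip_failures(pages, fake_wt2html, fake_html2wt)
print(failures)
```

The second page fails because the stand-in serializer rewrites the literal HTML tag as quote markup, which is the kind of dirty diff round-trip testing is meant to catch.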

Revision storage with Rashomon [Q2, bug 49143]
Revision storage API with Cassandra backend, which is currently being tested. See User:GWicke/Notes/Storage.

Core API: Provide public HTML API [Q2]
Provide a public MediaWiki API for the retrieval, re-expansion and saving of HTML+RDFa content. This was requested by Google, Kiwix and others, and will also be used internally by the VE. It will enable the creation of HTML-based bots and other editing tools. See bug 48483. This work is also strongly related to revision storage.

Parse template parameters to DOM [Q2]
This will expose the nested information in template parameters and enable visual editing of these. Initially we'll parse all parameters to both wikitext and HTML and support editing in either. Later we'll disable HTML for parameters that are marked as 'unbalanced-wikitext' in wikidata.

Basic language variant support [Q2/Q3]
The language variant conversion implementation will need a serious overhaul to work well with Parsoid. See bug 41716 for the details. In Q2, we are shooting for an interim solution that at least makes sure that existing language variant uses are preserved. We might also later clean up the language variant subsystem in general, but don't plan to work on that for now.

Research / prototype: Stable element ids [hard, Q2-Q3?]
Users would like to be able to switch back and forth between VE and wikitext editing. Supporting this while preserving DOM-based metadata and clean diffs is difficult, but not impossible. We have some ideas on this, which we'll prototype and evaluate. Once this works, we will provide a public API for this switch that can be used by clients like the VE. See bug 52936 for some pointers on metadata storage and preservation.

Performance: More efficient template updates [Q3]
Avoid most API load by only re-expanding transclusions that actually used an edited template. This will need changes to the core API to provide this information, and the capability to persist it on the Parsoid side. Depends on revision storage.
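The idea can be sketched as follows, assuming each top-level transclusion on a page is stored together with the set of templates used during its expansion. The Transclusion record and both helper functions are hypothetical; they do not describe an existing Parsoid or core API data structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Transclusion:
    """One top-level transclusion on a page (hypothetical record)."""
    index: int    # position of the transclusion on the page
    target: str   # template invoked directly
    # Every template touched while expanding, including nested ones.
    used_templates: Set[str] = field(default_factory=set)


def build_reverse_index(pages: Dict[str, List[Transclusion]]) -> Dict[str, Set[str]]:
    """Map each template name to the titles of the pages that use it."""
    index: Dict[str, Set[str]] = {}
    for title, transclusions in pages.items():
        for transclusion in transclusions:
            for name in transclusion.used_templates:
                index.setdefault(name, set()).add(title)
    return index


def transclusions_to_reexpand(page: List[Transclusion], edited: str) -> List[int]:
    """Select only the transclusions whose expansion used the edited template."""
    return [t.index for t in page if edited in t.used_templates]


pages = {
    "Berlin": [
        Transclusion(0, "Infobox city", {"Infobox city", "Coord"}),
        Transclusion(1, "Citation needed", {"Citation needed"}),
    ],
    "Paris": [
        Transclusion(0, "Citation needed", {"Citation needed"}),
    ],
}
reverse_index = build_reverse_index(pages)

# After an edit to "Coord", only one page and one transclusion are affected.
for title in sorted(reverse_index.get("Coord", set())):
    print(title, transclusions_to_reexpand(pages[title], "Coord"))
```

Persisting the per-transclusion template set is what makes the selective update possible, which is why this item depends on revision storage.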

Research / prototype: Enforce proper nesting of most templates, and encapsulate compound content blocks [hard, Q3-Q4]
See bug 55524. Transclusions can currently affect arbitrary parts of the page by producing unbalanced HTML. This makes both WYSIWYG editing and efficient updates difficult. We would thus like to move towards properly nested templates as much as possible. For existing multi-transclusion content we would like to enforce nesting as a unit, possibly using an extension-like tag wrapping such a content block.

We'll research and prototype ways to establish which templates should emit properly nested output, and how to encapsulate multi-transclusion content. Ideally the solution should also work consistently for old revisions.

Research / prototype: HTML-only wiki support [hard, Q3-?]
The Parsoid web service adds a complex dependency to MediaWiki installations, which is problematic for simple MediaWiki installations that just want to use the VisualEditor. Wikis interested in editing through the VisualEditor exclusively don't necessarily need wikitext-based storage. Instead, they can use HTML storage directly. This will require changes to the save and page load logic. An HTML-based visual diff will be used to compare two versions of an HTML document.

The biggest issue is going to be compatibility with the myriad PHP extensions that currently use hooks in the PHP parser. This includes important extensions like AbuseFilter, which also operates on wikitext. For mixed HTML / Wikitext wikis we can continue to work on Wikitext for a while but would like to move towards ... We will try to get an overview of the issues we are facing in this area, but don't expect to solve them in Q4 2013.

In Q1 2014, start to use Parsoid HTML for views with appropriate CSS rules to match the PHP parser's rendering. Continue to iterate on extension compatibility.

Support extensions in non-Wikipedia projects [hard, Q3-?]
Non-Wikipedia projects like Wikibooks use specialized extensions for tasks like labeled section transclusion. We do support most of those extensions generically, but this won't necessarily result in a pleasant editing experience. In the case of labeled section transclusions, for example, large parts of the page wrapped in <section> tags will currently be editable as wikitext only. Since there is a <section> element in HTML5, it might make sense to treat those section tags as HTML5 elements instead. We'll have to investigate if and how the arbitrary overlapping section feature in LST is used, and whether sections are often marked up in unbalanced ways.

Similar solutions for all extensions used in sister projects might take longer than a quarter. Some of these solutions might also depend on HTML-only wiki support.

Research: Identify transclusion parameters in transclusion expansions for inline editing [hard, depends on DOM parameters, Q4 2014]
It would be great if simple DOM-based transclusion parameters could be identified and edited inline in the rendered transclusion expansion. This will require deeper changes to the PHP preprocessor, and depends on the DOM parameter parsing scheduled for Q4 2013.

Research / prototype: DOM-based templating [hard, Q3-?]
MediaWiki's templating is strongly tied to wikitext: Template parameters are (wikitext) strings, and the template output is wikitext which is further interpreted by a multi-pass parser. Templates are a mix of logic (typically heavily using parser functions) and wikitext snippets, which has given them a reputation for being hard to read. The unstructured nature makes visual editing of templates difficult.

The prospect of HTML-only wikis without a dependency on Parsoid prompts us to re-examine how we do templating in MediaWiki. DOM-based templating with a clear separation between logic and the actual templates looks like a particularly promising option to us.

The main things we need in templates are
 * Simple expressions: provide access to modules and logic, but cannot define infinite loops or variables
 * Iteration: Iterate over finite data structures (JSON objects for example)
 * Conditionals: Include / evaluate a sub-DOM depending on an expression
 * Variable interpolation in attributes and text content
 * Ability to compute expressions and splice output in attributes and text content
 * Ability to invoke other templates and splice the output DOM into the template

This minimal functionality is relatively simple to implement on the DOM. It is desirable to make templates valid HTML documents, so that they can be edited in a visual editor. This can be achieved by encoding the control directives above in attributes similar to TAL, Distal or Genshi.
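As a rough illustration of attribute-encoded directives, the following sketch expands a template that is itself well-formed markup. The directive names data-each, data-if and data-text are invented for this example (TAL, Distal and Genshi each use their own attribute names), expressions are restricted to dotted-path lookups, and only iteration, conditionals and text interpolation are covered. Attribute interpolation and template invocation are left out.

```python
import copy
import xml.etree.ElementTree as ET
from typing import Any, Dict, List

# Hypothetical directive attributes; they are removed from the output.
DIRECTIVES = ("data-each", "data-if", "data-text")


def lookup(path: str, scope: Dict[str, Any]) -> Any:
    """Resolve a dotted path such as 'city.name' against nested dicts."""
    value: Any = scope
    for key in path.strip().split("."):
        value = value[key]
    return value


def render(node: ET.Element, scope: Dict[str, Any]) -> List[ET.Element]:
    """Expand one template element into zero or more output elements."""
    each = node.get("data-each")
    if each is not None:
        # Iteration: 'item in items' repeats the element once per item.
        var, _, path = each.partition(" in ")
        results: List[ET.Element] = []
        for item in lookup(path, scope):
            clone = copy.deepcopy(node)
            del clone.attrib["data-each"]
            results.extend(render(clone, {**scope, var.strip(): item}))
        return results

    cond = node.get("data-if")
    if cond is not None and not lookup(cond, scope):
        return []  # Conditional: drop the whole sub-DOM.

    attrs = {k: v for k, v in node.attrib.items() if k not in DIRECTIVES}
    out = ET.Element(node.tag, attrs)
    out.tail = node.tail

    text_path = node.get("data-text")
    if text_path is not None:
        # Interpolation: the looked-up value replaces the placeholder content.
        out.text = str(lookup(text_path, scope))
        return [out]

    out.text = node.text
    for child in node:
        out.extend(render(child, scope))
    return [out]


template = (
    '<ul><li data-each="city in cities" data-if="city.capital">'
    '<b data-text="city.name">placeholder</b></li></ul>'
)
context = {
    "cities": [
        {"name": "Berlin", "capital": True},
        {"name": "Hamburg", "capital": False},
        {"name": "Paris", "capital": True},
    ]
}
[result] = render(ET.fromstring(template), context)
html = ET.tostring(result, encoding="unicode")
print(html)  # <ul><li><b>Berlin</b></li><li><b>Paris</b></li></ul>
```

Because the directives live in attributes, the unexpanded template remains a document that a visual editor can load, with the placeholder text standing in for the interpolated value.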

Type information for template parameters can be used to improve the user interface for editing individual parameters. Instead of (wikitext) strings, we plan to support parameters with JSON-compatible types (Objects, Arrays, Numbers, Strings, Booleans, Date) or DOM fragments. The return type of a template is a DOM fragment instead of wikitext.

Logic can be implemented in an actual programming language (Lua and possibly JavaScript through Scribunto), and can return the same JSON-compatible types. This adds some dependencies, but should still be within the reach of shared hosting installs. Logic should also be able to call templates and return the resulting DOM fragment, in which case it acts as a controller to a template. In this case, the logic should be called in the same namespace as templates so that adding a controller to a template does not require changes to existing callers.

On wikis with Parsoid installed, the wikitext-based template system can be integrated into a DOM-based template system to provide a transition path. Wikitext templates would accept only strings as parameters (other types would be coerced to strings), and would expand to DOM fragments after being parsed by Parsoid.
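One possible shape for that coercion step is sketched below. The rules shown (JSON serialization for non-string values, ISO 8601 for dates) are assumptions made for illustration, not a decided design, and coerce_to_wikitext_param is a hypothetical name. DOM fragment parameters are not handled here.

```python
import json
from datetime import date, datetime
from typing import Any


def coerce_to_wikitext_param(value: Any) -> str:
    """Turn a JSON-compatible value into the string a wikitext template receives."""
    if isinstance(value, str):
        return value  # Strings pass through unchanged.
    if isinstance(value, (date, datetime)):
        return value.isoformat()  # Assumption: dates become ISO 8601 strings.
    # Numbers, booleans, arrays and objects: assume their JSON form.
    return json.dumps(value, ensure_ascii=False, sort_keys=True)


params = {
    "name": "Berlin",
    "population": 3500000,
    "capital": True,
    "founded": date(1237, 1, 1),
    "tags": ["city", "state"],
}
coerced = {key: coerce_to_wikitext_param(value) for key, value in params.items()}
print(coerced)
```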

In this quarter, we plan to implement a first prototype of an HTML DOM-based templating system in PHP (possibly using the built-in XML DOM and XPath bindings), which should provide a good basis for a deeper evaluation.

Use Parsoid HTML for all page views [hard, stretch goal, Q4]
Assuming the HTML-only wiki work in Q1 2014 works out, we should be in a position to consider using Parsoid HTML for regular page views even on mixed HTML / Wikitext wikis like Wikipedia. This can speed up visual editing by eliminating the need to re-load content for editing.

To make this possible at a large scale, the size of compressed Parsoid HTML needs to be closer to that of the PHP parser's output. Site-specific CSS rules for the content need to be adjusted to work on Parsoid's more semantic HTML structures.