Parsoid/DOM notes

Editability

 * Edit transclusions in VE without using wikitext.
 * Easily preview updated transclusions in VE after edits.

Performance

 * Reuse prior transclusions from cache.
 * If possible, make the cached output a drop-in replacement that needs no further processing. Currently, a reused fragment is represented by placeholder tokens in the token stream, passes through the handlers, and is unpacked at the end. DOM reuse issues are discussed further below.

What gets in the way
This would be simple if the output of templates were well-formed DOM trees on their own. But this is not true of text-based wikitext templates.
 * Template output need not be a well-formed DOM tree. It can affect an arbitrary amount of the surrounding context in the page in which it is included.
 * Often, a mix of one or more transclusions, top-level page wikitext, and possibly extension output together forms a well-formed DOM subtree.

Editability

 * With mixed transclusion output and page content, the page content in the mixed output has to be edited in wikitext mode (currently not supported, so that content is uneditable).
 * When transclusion args are changed, the changes can leak out of the DOM structure to other sections, and in the worst case, an entire section might have to be re-rendered (assuming sections are hard boundaries for well-balanced trees).

Performance

 * Hampers the ability to reuse HTML of prior expansions.

Longer-term strategy for fixing templates
One or more of the following:
 * Gradually move to DOM-based templates.
 * Consider separating data from presentation. For example, in large tables, there is a lot of repetitive wikitext that serves no purpose except to introduce syntax errors, foster-parentable content, etc.
 * Use new extension tags to wrap transclusions and other output that collectively produce a well-formed DOM tree but individually do not.
 * For the rest, enforce well-formedness of output from text-based templates that are seen bare on the page.
 * Maybe provide new wikitext sugar/tools/syntax that makes it easier for template authors to write templates that produce well-formed DOM trees.
 * Edit templates and use bots to fix uses where possible to minimize cases that require wrapping.

Considerations

 * Acceptability of the solution to editors.
 * What do we do about all the old revisions, which we cannot go in and edit / fix / wrap in extension tags?

DOM fragment reuse issues
When reparsing a page P, let F be a DOM fragment that corresponds to the transclusion of a template T. Let N be the container node within which F gets inserted. Let us also assume that the output of the template is always well-balanced.

There are three scenarios to deal with when reusing DOM fragments:
 * 1) Page P is edited to P'. The output of F is unchanged. How do we reuse F from P when parsing P'? This is the common workflow on page edits.
 * 2) Page P is unchanged. Template T, which produces F, is changed and now produces F'. How do we re-render P to incorporate F'?
 * 3) Page P is edited in VE. Parameters to T are modified in VE, which changes F to F'. How does VE re-render P to incorporate F'?

When reparsing P, F is currently converted to representative wrapper tokens, which then participate in various transformations (pre, list, p-wrapping, etc.). During post-processing of the DOM, F is unwrapped and inserted. This technique lets us handle scenarios 1 and 2. But without additional guarantees/constraints on F' and the container node N, VE won't be able to just take F' and drop it in place on the client side. In the worst case, it will require a serialize + reparse to get HTML nesting constraints (as implemented in the HTML5 parser) exactly right.
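
The reuse goal behind scenarios 1 and 2 can be pictured as a keyed lookup. The following is a minimal sketch, not Parsoid's implementation: `FragmentCache`, `fake_expand`, and the key shape (template title, template revision, sorted arguments) are all hypothetical, and it assumes the output of T depends only on those three things, which real templates may violate (for example, through nested templates).

```python
from typing import Callable, Dict, Mapping, Tuple

# Hypothetical cache key: (template title, template revision, sorted args).
Key = Tuple[str, int, Tuple[Tuple[str, str], ...]]


class FragmentCache:
    """Illustrative cache of expanded fragments (F) keyed by their inputs."""

    def __init__(self, expand: Callable[[str, int, Mapping[str, str]], str]) -> None:
        self._expand = expand
        self._store: Dict[Key, str] = {}
        self.misses = 0  # counts how often a real expansion was needed

    def get(self, template: str, revision: int, args: Mapping[str, str]) -> str:
        key = (template, revision, tuple(sorted(args.items())))
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._expand(template, revision, args)
        return self._store[key]


def fake_expand(template: str, revision: int, args: Mapping[str, str]) -> str:
    # Stand-in for a real template expansion; returns an HTML string.
    return f"<span>{template} r{revision} {sorted(args.items())}</span>"


cache = FragmentCache(fake_expand)
f_first = cache.get("T", 1, {"x": "1"})   # first parse of P: expansion needed
f_reused = cache.get("T", 1, {"x": "1"})  # scenario 1: P edited, F reused
f_prime = cache.get("T", 2, {"x": "1"})   # scenario 2: T changed, F' produced
```

Note that a cache hit only answers whether F can be reused. Whether the reused fragment can be dropped into N without further processing is a separate question.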

DOM scopes
If we introduce a notion of self-contained DOM scopes in a page P (within which all DOM trees are balanced and changes do not leak out to surrounding scopes), then VE would have to request Parsoid to re-render the closest enclosing DOM scope that contains N, the container node for F (and F').

DOM scopes are trees within a page P that are balanced to full DOMs on their own, independent of surrounding context, and for which replacement DOMs can be dropped in without any analysis or transformations.

For example, wikitext sections are natural independent DOM scopes within a page. But, more generally, all direct children of the tag of P are natural DOM scopes.
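
Finding the closest enclosing DOM scope for a container node N amounts to an ancestor walk. The following is a minimal sketch under stated assumptions: the `Node` type, `is_top_level`, and `enclosing_scope` are hypothetical names, and the sketch assumes the page root is a node tagged `body` whose direct children count as scopes. A different scope predicate can be passed in.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)  # nodes are compared by identity, not by content
class Node:
    tag: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)

    def add(self, tag: str) -> "Node":
        """Create a child node with the given tag and return it."""
        child = Node(tag, parent=self)
        self.children.append(child)
        return child


def is_top_level(node: Node) -> bool:
    # Assumption: direct children of the root ("body") are DOM scopes.
    return node.parent is not None and node.parent.tag == "body"


def enclosing_scope(n: Node, is_scope: Callable[[Node], bool] = is_top_level) -> Node:
    """Walk up from n (inclusive) to the nearest node that is a DOM scope."""
    cur: Optional[Node] = n
    while cur is not None:
        if is_scope(cur):
            return cur
        cur = cur.parent
    raise ValueError("no enclosing DOM scope found")


body = Node("body")
section = body.add("section")
container = section.add("ul").add("li")  # N, the container node for F
scope = enclosing_scope(container)       # the node to re-render
```

Here `scope` is `section`, so a change to F inside the list item would trigger a re-render of that whole top-level subtree.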

Currently, Parsoid treats certain kinds of extension content, image captions, and link targets as independent DOM balancing contexts. However, they can still affect the surrounding page context, depending on the DOM tree ancestor nodes within which the output is inserted. For example, even after balancing the output of a link target independently, if it contains an , it causes a restructuring of the parent  and introduces new sibling nodes there.

So, for the purpose of DOM-fragment reuse, in the general case, it is not possible to guarantee drop-in replacement except for top-level nodes of P (children of ). However, in certain constrained contexts and with some knowledge about the dom fragment F (or F'), and its container node N, we can do better than that.

As long as we always have the fallback solution of dealing with the enclosing DOM scope for the DOM fragment F (or F', as the case may be), both Parsoid and VE can then use the simple drop-in solution in certain scenarios.

Examples: In general, the acceptability criterion is whether P' == DOM.parse(P'.serialize()), where P' = P.replace(F, F'). If so, F' can be dropped into N (both in Parsoid and VE) without any additional analysis or transformations.
 * 1) N =  and F' =
 * 2) N =  and F' = 
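
The round-trip criterion can be sketched as a small check. This is an illustration only: the tuple-based tree, `ToyTreeBuilder`, `replace_fragment`, and `is_drop_in_safe` are hypothetical, and the toy parser models a single HTML5 nesting rule (a block-level start tag closes an open `p`) rather than the full tree-construction algorithm. Text is assumed to contain no characters that need escaping.

```python
from html.parser import HTMLParser

# Tags whose start tag implicitly closes an open <p> in this toy model.
CLOSES_P = {"p", "div", "ul"}


class ToyTreeBuilder(HTMLParser):
    """Builds (tag, children) tuples; children are tuples or text strings."""

    def __init__(self):
        super().__init__()
        self.root = ("#root", [])
        self.stack = [self.root]

    def _pop_through(self, tag):
        # Pop the innermost open element with this tag, and anything above it.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                return

    def handle_starttag(self, tag, attrs):
        if tag in CLOSES_P:
            self._pop_through("p")
        node = (tag, [])
        self.stack[-1][1].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        self._pop_through(tag)  # stray end tags are ignored in this toy

    def handle_data(self, data):
        self.stack[-1][1].append(data)


def parse(html):
    builder = ToyTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def serialize(node):
    if isinstance(node, str):
        return node
    tag, children = node
    inner = "".join(serialize(c) for c in children)
    return inner if tag == "#root" else f"<{tag}>{inner}</{tag}>"


def replace_fragment(node, old, new):
    """Return a copy of the tree with the node `old` (by identity) replaced."""
    if isinstance(node, str):
        return node
    if node is old:
        return new
    tag, children = node
    return (tag, [replace_fragment(c, old, new) for c in children])


def is_drop_in_safe(page, old, new):
    """True if P' survives serialize + reparse unchanged."""
    candidate = replace_fragment(page, old, new)
    return parse(serialize(candidate)) == candidate


F = ("span", ["old"])
P = ("#root", [("p", ["intro ", F, " outro"])])
inline_ok = is_drop_in_safe(P, F, ("b", ["new"]))
block_ok = is_drop_in_safe(P, F, ("div", ["new"]))
```

In this toy model, replacing F with inline content round-trips cleanly, while replacing it with a `div` inside a `p` does not, because reparsing closes the paragraph early. The second case is the kind that would fall back to re-rendering the enclosing DOM scope.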