Parsoid/HTML5 DOM with RDFa

Note: An early version of this document was discussed in a wikitext-l thread in February 2012. After further investigation we now use RDFa, as it provides some features we can make very good use of. See RDFa vocabulary for current design work.

Wikitext can be divided into shorthand notation for HTML elements and higher-level features like templates, media display or categories.

The shorthand portion of wikitext maps quite directly to an HTML DOM. Details such as the handling of unbalanced tags while building the DOM tree, and remembering extra whitespace or wiki vs. HTML syntax for round-tripping, need to be considered but appear to be quite manageable, especially if some normalization in edge cases can be tolerated. We plan to localize normalization (and thus mostly avoid dirty diffs) by serializing only modified DOM sections while using the original source for unmodified DOM parts. Attributes are used to track the original source offsets of DOM elements.
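The selective serialization idea can be illustrated with a minimal Python sketch. It is not Parsoid's implementation: the `Node` class, its `src_range` field and the `selective_serialize` function are hypothetical names chosen for this illustration, and the "DOM" is reduced to a flat list of top-level nodes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a top-level DOM element."""
    wikitext: str                         # what a full serializer would emit for this node
    src_range: Optional[Tuple[int, int]]  # (start, end) offsets into the original source; None if new
    modified: bool = False


def selective_serialize(source: str, nodes: List[Node]) -> str:
    """Reuse the original source for unmodified nodes; re-serialize only modified ones."""
    out = []
    cursor = 0  # position in `source` up to which text has already been emitted
    for node in nodes:
        if node.src_range is None:
            # Newly inserted node: there is no original source to fall back on.
            out.append(node.wikitext)
            continue
        start, end = node.src_range
        # Text between nodes (whitespace, separators) is copied verbatim.
        out.append(source[cursor:start])
        out.append(node.wikitext if node.modified else source[start:end])
        cursor = end
    out.append(source[cursor:])  # trailing source after the last node
    return "".join(out)


source = "<i>foo</i>  \n\n'''bar'''\n"
first = Node("''foo''", (0, 10))                     # unmodified, written with HTML syntax
second = Node("'''baz'''", (14, 23), modified=True)  # edited by the user
result = selective_serialize(source, [first, second])
```

In this sketch the unmodified first node keeps its original `<i>` syntax and trailing whitespace instead of being normalized to `''foo''`, so the resulting diff only touches the edited node.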

Higher-level features can be represented in the HTML DOM using different extension mechanisms:
 * Introduce custom elements with specific attributes. For display or WYSIWYG editing, these elements would first have to be expanded.
 * Expand these features to their presentational DOM, but identify and annotate the result. Template arguments and similar information are stored as JSON in data attributes, which made their conversion to the JSON-based WikiDom format quite easy.
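As a rough illustration of storing template arguments as JSON in a data attribute, the following Python sketch wraps an expansion and reads the arguments back. The attribute name `data-args`, the payload layout and the helper names are assumptions made for this example, not the names used by Parsoid.

```python
import html
import json
from html.parser import HTMLParser

ARGS_ATTR = "data-args"  # hypothetical attribute name, chosen for this sketch


def wrap_expansion(expanded_html, name, args):
    """Wrap expanded template output in a div that carries its arguments as JSON."""
    payload = json.dumps({"name": name, "args": args}, sort_keys=True)
    # The JSON must be attribute-escaped, since it contains double quotes.
    return '<div {}="{}">{}</div>'.format(
        ARGS_ATTR, html.escape(payload, quote=True), expanded_html
    )


class ArgsExtractor(HTMLParser):
    """Collects the decoded JSON payload of every element carrying ARGS_ATTR."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            if key == ARGS_ATTR and value is not None:
                # HTMLParser has already unescaped the attribute value.
                self.found.append(json.loads(value))


def extract_args(markup):
    parser = ArgsExtractor()
    parser.feed(markup)
    parser.close()
    return parser.found


markup = wrap_expansion("<p>Hello</p>", "Template:Greeting", {"1": 'a "quoted" <value>'})
recovered = extract_args(markup)
```

Because the payload is plain JSON, converting it to another JSON-based structure only requires reading the attribute and calling `json.loads`.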

Both are custom solutions for internal use. For an external interface, a standardized annotation mechanism such as RDFa seems to fit our needs well.

Assuming a template that expands to a div and some content, this would be represented like this:

In this case, an expanded template argument within (for example) an infobox is identified inside the template-provided HTML structure, which could enable in-place editing.
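To make this concrete, here is a hedged Python sketch of how an annotated template expansion might look and how an editor could locate the expanded argument. The `about`, `typeof` and `property` attributes are standard RDFa attributes, but the values used here (`#t1`, `ex:Template`, `ex:arg/population`) are placeholders invented for this example and are not the actual Parsoid vocabulary.

```python
from html.parser import HTMLParser

# Hypothetical markup: a template expanding to a div, with one argument
# marked up inside the template-provided structure.
EXAMPLE = (
    '<div about="#t1" typeof="ex:Template">'
    '<b>Population:</b> '
    '<span about="#t1" property="ex:arg/population">3,500,000</span>'
    '</div>'
)


class ArgFinder(HTMLParser):
    """Collects the text of every element that carries a `property` attribute.

    Simplification: assumes well-nested markup without void elements
    such as <br>, which would never be popped from the stack.
    """

    def __init__(self):
        super().__init__()
        self.args = {}    # property name -> text content
        self._stack = []  # one entry per open element: its property name or None

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        self._stack.append(prop)
        if prop is not None:
            self.args[prop] = ""

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Text belongs to every enclosing element that is an argument.
        for prop in self._stack:
            if prop is not None:
                self.args[prop] += data


finder = ArgFinder()
finder.feed(EXAMPLE)
finder.close()
```

After parsing, `finder.args` maps each annotated argument to its rendered text, which is the information an in-place editor would need to find the editable region.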

Unused arguments (which are not found in the template expansion) or unexpanded templates can be represented using non-displaying meta elements:
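A minimal Python sketch of the meta-element idea follows. It assumes the caller already knows which arguments were found in the expansion, and it uses the placeholder property prefix `ex:arg/`; both the function name and the vocabulary are hypothetical.

```python
import html


def meta_for_unused(about, args, used):
    """Return one non-displaying <meta> element per argument not found in the expansion.

    about: identifier of the template instance (placeholder format, e.g. "#t1")
    args:  mapping of argument name -> source value
    used:  set of argument names that were located in the expanded HTML
    """
    elements = []
    for name in sorted(args):
        if name in used:
            continue  # already represented inside the expansion itself
        elements.append(
            '<meta about="{}" property="ex:arg/{}" content="{}">'.format(
                html.escape(about, quote=True),
                html.escape(name, quote=True),
                html.escape(args[name], quote=True),
            )
        )
    return elements


metas = meta_for_unused(
    "#t1",
    {"population": "3,500,000", "note": 'see "talk"'},
    {"population"},
)
```

Since `meta` elements are not rendered, the unused argument survives in the DOM for round-tripping without affecting the displayed page.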