Specs/HTML

See Parsoid/HTML5 DOM with microdata for the general idea and background. This is a work in progress; feel free to suggest improvements! See http://rdfa.info/ for RDFa documentation and a live parser.

RDFa structures
Global prefix mappings:
 * Convention: Capital for types, lowercase for attributes.
 * Generally use the prefix instead of vocab definitions to avoid clashes (and allow mixing) with user-supplied RDFa. User-supplied RDFa with the mw prefix is moved to a non-clashing prefix in Parsoid.
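The prefix rewriting described above can be sketched as follows. This is only an illustration under stated assumptions: the replacement prefix "mw-user" and the function name are hypothetical, since this page does not say which non-clashing prefix Parsoid actually uses or how it performs the move.

```python
# Hypothetical replacement prefix; the spec does not name the one Parsoid uses.
SAFE_PREFIX = "mw-user"


def rename_user_prefix(value: str, reserved: str = "mw", safe: str = SAFE_PREFIX) -> str:
    """Rewrite CURIEs in a user-supplied RDFa attribute value.

    The value is treated as a whitespace-separated list of terms (as in
    rel, typeof or property). Every term that uses the reserved prefix is
    moved to the safe prefix; all other terms are left untouched.
    """
    renamed = []
    for token in value.split():
        if token.startswith(reserved + ":"):
            # Keep the ":" and the local name, swap only the prefix.
            token = safe + token[len(reserved):]
        renamed.append(token)
    return " ".join(renamed)
```

For example, a user-supplied value of "mw:wikiLink foaf:knows" would become "mw-user:wikiLink foaf:knows", so it can no longer be confused with markup generated by Parsoid.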

Wiki links

 * "mw:wikiLink" for regular wikilinks with editable content, "mw:simpleWikiLink" for the [[Foo]] variant where the content is derived from the target.
 * This produces a triple of type http://mediawiki.org/rdf/wikiLink from the current article to the link target.
 * We might want to add more attribute information about this link (the namespace for example). It is not clear how to do this in RDFa without adding extra HTML structures (meta elements for example). Namespaced attributes would be useful for this, but are not supported in HTML5.
 * The remaining information (presence of generated link content, link tail) is stored in the data-psd:rt round-trip info. This data is private to Parsoid, must not be modified, and can change without notice.
 * in data-mw indicates generated content.
 * in data-mw indicates link tail (see example below)
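To illustrate how a consumer might read the triple described in this list, here is a minimal sketch using only the Python standard library. It assumes the link type is carried in the rel attribute of an a element and that "mw" maps to http://mediawiki.org/rdf/ (inferred from the triple URI above); the class name and the sample markup are hypothetical, and a real consumer would use a full RDFa parser instead.

```python
from html.parser import HTMLParser

# Assumed expansion of the "mw" prefix, inferred from the triple URI above.
MW_NS = "http://mediawiki.org/rdf/"


class WikiLinkTriples(HTMLParser):
    """Collect (subject, predicate, object) triples from mw:* link relations."""

    def __init__(self, subject: str):
        super().__init__()
        self.subject = subject  # the current article
        self.triples = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        # rel may hold several whitespace-separated terms.
        for term in (attr.get("rel") or "").split():
            if term.startswith("mw:"):
                predicate = MW_NS + term[len("mw:"):]
                self.triples.append((self.subject, predicate, href))


parser = WikiLinkTriples("Main_Page")
parser.feed('<a rel="mw:wikiLink" href="Foo">Foo</a>')
print(parser.triples)
```

The sketch ignores the private round-trip data entirely, which matches the rule that clients must not depend on it.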

Nowiki blocks
There are two options to handle nowiki editing:
 * 1) Strip the tags from the DOM and let the serializer add those that are needed after each edit
 * 2) Keep them in the DOM for more accurate round-tripping of manually created nowiki blocks, and prevent non-text content from being entered into these blocks in the editor (TODO)

We picked option 2 for now. The nowiki content remains editable. If the content is modified in a way that makes nowiki unnecessary, Parsoid can remove the wrapper in the serializer.
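The serializer decision described above could look roughly like the following sketch. The pattern list is an assumption and deliberately incomplete; it is not Parsoid's actual escaping logic, and the function names are hypothetical. It also does not handle content that itself contains a closing nowiki tag.

```python
import re

# Assumed, incomplete set of sequences that wikitext would interpret as markup:
# link and template brackets, quote-based emphasis, HTML-like tags, and
# line-initial list, heading or indent characters.
WIKITEXT_SYNTAX = re.compile(
    r"\[\[|\]\]|\{\{|\}\}|''|<[A-Za-z/!]|^[*#:;= ]",
    re.MULTILINE,
)


def needs_nowiki(text: str) -> bool:
    """Return True if the text contains something wikitext would parse."""
    return WIKITEXT_SYNTAX.search(text) is not None


def serialize_nowiki(text: str) -> str:
    """Serialize the content of a nowiki block, dropping the wrapper
    when the edited content no longer requires it."""
    if needs_nowiki(text):
        return "<nowiki>" + text + "</nowiki>"
    return text
```

With this heuristic, content edited down to "plain text" is emitted without a wrapper, while "[[Foo]]" keeps its nowiki tags so that it is not turned into a link.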

TODO
The following constructs still need an RDFa markup definition. Initially, they will only be marked with data-gen="both" for simple read-only round-tripping.
 * Unexpanded and expanded templates
 * template parameter references
 * noinclude, onlyinclude, includeonly
 * behavior switches (only data-gen="both" currently, source-based round-tripping)
 * category links (only data-gen="both" currently, source-based round-tripping)
 * tag extensions including citations
 * redirects
 * ISBN / RFC / PMID autolinks