Specs/HTML

See Parsoid/HTML5 DOM with microdata for the general idea and background. This is a work in progress; feel free to suggest improvements! See http://rdfa.info/ for RDFa documentation and a live parser.

RDFa structures
Global prefix mappings:
 * Convention: Capital for types, lowercase for attributes.
 * Generally use the prefix instead of vocab definitions to avoid clashes (and allow mixing) with user-supplied RDFa. User-supplied RDFa with the mw prefix is moved to a non-clashing prefix in Parsoid.

mw:Placeholder and general client behavior
A typeof="mw:Placeholder" attribute protects DOM structures from any editing. Clients are expected to preserve and protect subtrees marked as such. Clients are also expected to preserve any DOM subtrees marked up with RDFa in the http://mediawiki.org/rdf/ namespace that they don't understand. This decouples clients from Parsoid development and lets them concentrate on editing constructs whose special semantics they understand, without having to implement all possible content elements.
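The client rule above can be sketched roughly as follows. This is only an illustration, not Parsoid or client code: it assumes the markers appear as tokens in the `typeof` or `rel` attributes, that the `mw:` prefix stands for the http://mediawiki.org/rdf/ namespace, and that `KNOWN_TERMS` (a hypothetical name) lists the constructs a given client knows how to edit.

```python
from html.parser import HTMLParser

# Hypothetical: the mw: terms this particular client understands and may edit.
KNOWN_TERMS = {"mw:wikiLink", "mw:simpleWikiLink"}


def is_read_only(attrs):
    """Return True if an element with these attributes must be preserved as-is."""
    for name in ("typeof", "rel"):
        # RDFa attributes hold whitespace-separated token lists.
        for token in (attrs.get(name) or "").split():
            if token == "mw:Placeholder":
                return True  # explicitly protected from editing
            if token.startswith("mw:") and token not in KNOWN_TERMS:
                return True  # mw: construct the client does not understand
    return False


class ReadOnlyCollector(HTMLParser):
    """Collects the tag names of elements that open a read-only subtree."""

    def __init__(self):
        super().__init__()
        self.read_only_tags = []

    def handle_starttag(self, tag, attrs):
        if is_read_only(dict(attrs)):
            self.read_only_tags.append(tag)


def find_read_only(html):
    collector = ReadOnlyCollector()
    collector.feed(html)
    collector.close()
    return collector.read_only_tags
```

A real editor would mark the whole subtree below each such element as non-editable; the sketch only identifies the subtree roots.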

Thumbnails
 

Simple image
 

Wiki links

 * "mw:wikiLink" for regular wikilinks with editable content, "mw:simpleWikiLink" for the [[Foo]] variant where the content is derived from the target.
 * The link relation produces a triple of type http://mediawiki.org/rdf/wikiLink from the current article to the link target.
 * We might want to add more attribute information about this link (the namespace, for example). It is not clear how to do this in RDFa without adding extra HTML structures (meta elements, for example). Namespaced attributes would be useful for this, but are not supported in HTML5.
 * The remaining round-trip information is stored in data-parsoid. This is private to Parsoid, must not be modified, and can change without notice.
 * An entry in data-mw indicates a link tail (see the example below).
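As a rough illustration of the triple described in the list above, the following sketch expands the `mw:wikiLink` term and builds a (subject, predicate, object) tuple. The mapping of the `mw` prefix to http://mediawiki.org/rdf/ is inferred from the URLs in this document; the function names and the example page URLs are hypothetical.

```python
# Assumed prefix mapping, inferred from the namespace URL used in this document.
PREFIXES = {"mw": "http://mediawiki.org/rdf/"}


def expand_curie(curie, prefixes=PREFIXES):
    """Expand a compact term such as 'mw:wikiLink' into a full IRI."""
    prefix, _, local = curie.partition(":")
    return prefixes[prefix] + local


def wikilink_triple(page_iri, target_iri, relation="mw:wikiLink"):
    """Build the (subject, predicate, object) triple for one wiki link."""
    return (page_iri, expand_curie(relation), target_iri)
```

For example, `wikilink_triple("http://example.org/wiki/Main_Page", "http://example.org/wiki/Foo")` uses http://mediawiki.org/rdf/wikiLink as the predicate.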

 alternate linked content 

 Main Page 

Link with tail:  Potatoes 

Category links
 

 

Language links
 Foo 

Interwiki non-language links
 en:Foo

Autolinked URLs
 http://example.com

Numbered external link

Named external link
 Link content

Nowiki blocks
There are two options to handle nowiki editing:
 * 1) Strip the tags from the DOM and let the serializer add those that are needed after each edit
 * 2) Keep them in the DOM for more accurate round-tripping of manually created nowiki blocks, and prevent non-text content from being entered into these blocks in the editor (TODO)

We picked option 2 for now. The nowiki content remains editable. If the content is modified in a way that makes nowiki unnecessary Parsoid can remove the wrapper in the serializer.
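The serializer-side decision mentioned above (drop the nowiki wrapper once the content no longer needs it) might look roughly like this. The pattern list is a deliberately incomplete, hypothetical heuristic, not Parsoid's actual logic, and the function names are made up for illustration.

```python
import re

# Hypothetical, incomplete list of sequences that wikitext would interpret:
# link and template brackets, quote runs, tag openers, list or heading
# markers at the start of a line, behavior switches, and signatures.
_WIKITEXT_SYNTAX = re.compile(
    r"\[\[|\]\]|\{\{|\}\}|''|<[a-zA-Z/!]|^[*#:;=]|__[A-Z]+__|~~~",
    re.MULTILINE,
)


def needs_nowiki(text):
    """Return True if the text contains something wikitext would interpret."""
    return _WIKITEXT_SYNTAX.search(text) is not None


def serialize_nowiki(text):
    """Keep the nowiki wrapper only when the edited content still needs it."""
    if needs_nowiki(text):
        # Note: content containing a literal closing nowiki tag is not handled.
        return "<nowiki>" + text + "</nowiki>"
    return text
```

Erring on the side of keeping the wrapper is the safer choice here, since an unnecessary nowiki is harmless while a missing one changes the rendered page.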

 foo

HTML entities
 œ

Behavior switches
See Help:Magic_words. Not yet implemented; tracked in bug 37909.

 __NEWSECTIONLINK__

 __NONEWSECTIONLINK__

 __NOGALLERY__

 __HIDDENCAT__

 __NOCONTENTCONVERT__

 __NOCC__

 __NOTITLECONVERT__

 __NOTC__

 __NOINDEX__

 __INDEX__

 __STATICREDIRECT__
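Since behavior switches are currently round-tripped from source, a client or test tool may want to locate them in wikitext. The sketch below is an assumption-laden illustration: it recognizes only the switch names that appear in this section, and `find_behavior_switches` is a hypothetical helper, not part of Parsoid.

```python
import re

# Only the switch names listed in this section; the full set is documented
# in Help:Magic_words.
BEHAVIOR_SWITCHES = {
    "NEWSECTIONLINK", "NONEWSECTIONLINK", "NOGALLERY", "HIDDENCAT",
    "NOCONTENTCONVERT", "NOCC", "NOTITLECONVERT", "NOTC",
    "NOINDEX", "INDEX", "STATICREDIRECT",
}

_SWITCH_RE = re.compile(r"__([A-Z]+)__")


def find_behavior_switches(wikitext):
    """Return the known switch names in order of appearance."""
    return [
        match.group(1)
        for match in _SWITCH_RE.finditer(wikitext)
        if match.group(1) in BEHAVIOR_SWITCHES
    ]
```

Unknown double-underscore words are ignored rather than reported, so ordinary text is left alone.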

Template content
Implementation progress tracked in bug 37911.

 <meta about="#mw-t1" property="mwt0:Foo#1" content="positional">
 <meta about="#mw-t1" property="mw:src" content="http://en.wikipedia.org/wiki/Template:Foo">


 * Define a global prefix for the template namespace (mwt0 in this case). Reasoning: prefix definitions are scoped to a DOM subtree, so a local prefix definition would need to be repeated for multi-rooted template output. A global prefix should also be easier to figure out, and it makes semantic sense, since we are talking about the same property even if it is transcluded repeatedly. The trailing colon in the namespace URL apparently needs to be URL-encoded, at least to satisfy http://rdfa.info/play.
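To make the URL-encoding remark concrete, here is a small sketch of how the prefix IRI for the template namespace could be built and how a property such as `mwt0:Foo#1` would then expand. The wiki base URL is taken from the mw:src example above; the assumption that mwt0 maps to the encoded "Template:" URL, and both function names, are illustrative only.

```python
from urllib.parse import quote


def template_prefix_iri(wiki_base, namespace="Template"):
    """Build the prefix IRI, percent-encoding the trailing colon."""
    # safe="" makes quote() encode every reserved character, including ":".
    return wiki_base + quote(namespace + ":", safe="")


def expand_template_property(curie, prefix_iri, prefix="mwt0"):
    """Expand e.g. 'mwt0:Foo#1' against the template prefix IRI."""
    name, _, local = curie.partition(":")
    if name != prefix:
        raise ValueError("unexpected prefix: " + name)
    return prefix_iri + local
```

With the base URL `http://en.wikipedia.org/wiki/`, the prefix IRI ends in `Template%3A`, and `mwt0:Foo#1` expands to the same IRI wherever the template is transcluded.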

Templates in attributes
<tt> Some text content </tt>

<tt> <div style="">... </tt>

For editing purposes, the attribute content could be represented as serialized HTML DOM. Alternatively, we could include it directly as a sub-DOM in a div-wrapped section at the start or end of the document.

Extension content

noinclude / includeonly / onlyinclude
Not yet implemented; tracked in bug 40305. We only care about these in the actual page context, not in transcluded pages or templates.

<tt> foo bar baz </tt>

<tt> foo bar baz </tt>

<tt> foo bar baz </tt>

TODO
The following constructs still need an RDFa markup definition. They will initially only be marked with typeof="mw:Placeholder" for simple read-only round-tripping.
 * Unexpanded and expanded templates
 * template parameter references
 * noinclude, onlyinclude, includeonly
 * behavior switches (only typeof="mw:Placeholder" currently, source-based round-tripping)
 * tag extensions including citations (partly done)
 * redirects
 * ISBN / RFC / PMID autolinks
 * galleries

Magic links
Proposed!

ISBN link
<tt> ISBN 978-1413304541 </tt>

RFC link
<tt> RFC 1945 </tt>

PMID link
<tt> PMID 20610307 </tt>
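For orientation, the three magic link forms shown above can be recognized in plain text with a pattern along these lines. This is a simplified sketch, not the pattern MediaWiki itself uses: it accepts 10- or 13-digit ISBNs with optional hyphens or spaces and plain decimal RFC and PMID numbers, and `find_magic_links` is a hypothetical helper name.

```python
import re

# Simplified, assumed patterns:
#   ISBN: optional 978/979 prefix, then 10 digits (last may be X),
#         with optional single hyphens or spaces between digits.
#   RFC / PMID: a plain decimal number.
_MAGIC_LINK_RE = re.compile(
    r"\b(?:"
    r"(ISBN)\s+((?:97[89][- ]?)?(?:[0-9][- ]?){9}[0-9Xx])"
    r"|(RFC|PMID)\s+([0-9]+)"
    r")\b"
)


def find_magic_links(text):
    """Return (kind, identifier) pairs in order of appearance."""
    links = []
    for match in _MAGIC_LINK_RE.finditer(text):
        if match.group(1):
            links.append((match.group(1), match.group(2)))
        else:
            links.append((match.group(3), match.group(4)))
    return links
```

The identifier is returned exactly as written in the text, so a caller that needs a canonical ISBN would still have to strip the separators.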