Parser 2011/Core tag hooks

Parsing and detection
Tag hooks resemble HTML/XML elements in appearance:
 * XML-style <foo>...</foo>, or <foo/> for empty elements
 * except for some HTML-alikes, an implied close should not be relied on (an implied close sometimes sort of works, but... todo)
 * tag & attribute names are case-insensitive (like HTML, unlike XML)
 * unknown tags will be treated as plain text source, including the element start/end portions
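
The detection rules above can be sketched roughly as follows. This is only an illustration, not the parser's actual code: the KNOWN_TAG_HOOKS registry, the TAG_RE pattern and the split_tags function are hypothetical names, and a regular expression is a simplification that ignores nesting of same-named tags and implied closes.

```python
import re

# Hypothetical registry of known tag hooks (names stored lower-case).
KNOWN_TAG_HOOKS = {"nowiki", "pre", "math", "ref"}

# Matches <name attrs/> or <name attrs>...</name>.
# IGNORECASE makes the closing-name backreference case-insensitive too.
TAG_RE = re.compile(
    r"<(?P<name>[A-Za-z][A-Za-z0-9]*)(?=[\s/>])(?P<attrs>[^<>]*?)"
    r"(?:/>|>(?P<body>.*?)</(?P=name)\s*>)",
    re.IGNORECASE | re.DOTALL,
)


def split_tags(source):
    """Split source into ("text", str) and ("hook", name, attrs, body) parts."""
    parts = []
    text_start = 0  # start of the pending plain-text run
    scan = 0        # where to resume searching for a tag
    while True:
        m = TAG_RE.search(source, scan)
        if m is None:
            break
        name = m.group("name").lower()
        if name not in KNOWN_TAG_HOOKS:
            # Unknown tag: leave its start/end portions in the plain text,
            # but keep looking for known tags inside it.
            scan = m.start() + 1
            continue
        if m.start() > text_start:
            parts.append(("text", source[text_start:m.start()]))
        # body is None for the empty-element form <name/>
        parts.append(("hook", name, m.group("attrs").strip(), m.group("body")))
        text_start = scan = m.end()
    if text_start < len(source):
        parts.append(("text", source[text_start:]))
    return parts
```

For example, split_tags("a <NoWiki>''b''</nowiki> <bogus>c</bogus>") yields a text part, a "nowiki" hook part with the body "''b''", and a trailing text part that still contains the unknown <bogus> element verbatim.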

There are some differences between HTML-alikes and tag hooks:
 * the contents of an HTML-alike are always parsed as further wikitext
 * the contents of a tag hook node might be interpreted and output as anything (e.g., arbitrary HTML block output)

HTML-alikes may also be handled a little differently with respect to nesting etc.? (todo)

Tag hooks may also be invoked using parser function syntax; HTML-alikes may not.

HTML-alikes
These tags are pretty much passed through to their HTML equivalents, with contained text allowed to be expanded further as wiki markup. There are some limitations!


 * Block
   * h1, h2, h3, h4, h5, h6
   * div
   * center
   * blockquote
   * ol
   * ul
   * dl
   * p
   * hr (empty)
   * li
   * dt
   * dd
 * Table specials (block-ish)
   * table
   * caption
   * thead
   * tbody
   * tfoot
   * tr, td, th
 * Freaky mixed
   * ins, del
 * Inline
   * b
   * i
   * u
   * font
   * big
   * small
   * sub
   * sup
   * cite
   * code
   * em
   * s
   * strike
   * strong
   * tt
   * var
   * ruby, rt, rb, rp
   * span
   * abbr
   * dfn
   * kbd
   * samp
   * br (empty)

Attribute filtering
Only whitelisted attributes may be passed through; some attributes will also have validation applied.

Context settings:
 * $wgAllowRdfaAttributes
 * $wgAllowMicrodataAttributes
 * $wgHtml5


 * id validation:
   * @id -- on any element, ids are scrubbed for character validity, and some values get excluded
 * CSS validation:
   * @style -- on any element, the style attribute is scrubbed to remove unsafe constructs
 * URL validation:
   * HTML standard: @rel, @rev
   * RDFa: @about, @property, @resource, @datatype, @typeof
   * HTML5 microdata: @itemid, @itemprop, @itemref, @itemscope, @itemtype
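
As a rough illustration of whitelist-plus-validation filtering, here is a minimal sketch. Everything in it is an assumption rather than the canonical behaviour: the BASE_ALLOWED set is a made-up stand-in for the real whitelists (which are still a todo here), the allow_rdfa and allow_microdata parameters merely mimic the context settings, and the id, style and URL checks are deliberately simplistic placeholders for the real scrubbing rules.

```python
import re

# Hypothetical base whitelist; the canonical lists are not defined here.
BASE_ALLOWED = {"id", "class", "style", "title", "lang", "dir", "rel", "rev"}
RDFA = {"about", "property", "resource", "datatype", "typeof"}
MICRODATA = {"itemid", "itemprop", "itemref", "itemscope", "itemtype"}
# Attributes that get URL validation applied.
URL_CHECKED = {"rel", "rev"} | RDFA | MICRODATA


def filter_attributes(attrs, allow_rdfa=False, allow_microdata=False):
    """Return only whitelisted attributes, with names lower-cased."""
    allowed = set(BASE_ALLOWED)
    if allow_rdfa:
        allowed |= RDFA
    if allow_microdata:
        allowed |= MICRODATA

    result = {}
    for name, value in attrs.items():
        name = name.lower()
        if name not in allowed:
            continue  # not whitelisted: drop silently
        if name == "id":
            # Placeholder scrub: replace characters outside a safe set.
            value = re.sub(r"[^A-Za-z0-9_.:-]", "_", value)
        elif name == "style":
            # Placeholder check: reject styles with obviously risky constructs.
            if re.search(r"expression\s*\(|url\s*\(|javascript:", value, re.I):
                continue
        elif name in URL_CHECKED:
            # Placeholder check: reject script URLs.
            if value.strip().lower().startswith("javascript:"):
                continue
        result[name] = value
    return result
```

With the defaults, filter_attributes({"ID": "my id", "onclick": "x()"}) keeps only the scrubbed id; RDFa and microdata attributes pass through only when the corresponding flag is set.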

todo: provide canonical whitelists

nowiki
The <nowiki> tag hook is always available by default; it suppresses parsing of wikitext within its contents, leaving them as plain text.

Expected rendering:
 * inline... or also block?
 * contents taken as plain text, no further processing
 * drops into the surrounding block context (a generic paragraph if none is given)

Attributes:

pre
The <pre> tag hook is always available by default. It works slightly differently from a plain HTML <pre> tag, as it also suppresses parsing of wikitext within its contents.

Expected rendering:
 * block
 * contents taken as plain text, much like <nowiki>
 * put text in a non-wrapping area, using newlines to force line breaks
 * use a monospace font by default
 * the area may be visually set off from the surrounding content
 * if handling of wide text is needed, scrolling is recommended rather than wrapping

Attributes:
 * pass through legit HTML attributes for an HTML pre, more or less (todo)
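
A minimal sketch of how an HTML renderer might treat the contents of these two hooks as plain text, assuming HTML output. The render_nowiki and render_pre names are hypothetical, and attribute pass-through is left out because it is still a todo above.

```python
import html


def render_nowiki(body):
    """Emit the contents as plain text: escape markup, parse no wikitext."""
    return html.escape(body, quote=False)


def render_pre(body):
    """Like nowiki, but wrapped in a block-level <pre> so that newlines
    force line breaks and the text is not re-wrapped."""
    return "<pre>" + html.escape(body, quote=False) + "</pre>"
```

Monospace display, visual offset and scrolling for wide text would then be left to the stylesheet rather than to the markup.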

gallery
(todo)

html
Optional: in the classic parser this is only used when $wgRawHtml is enabled. It should not be used in general, as it is unsafe!

Expected rendering:
 * inline
 * contents taken as raw HTML and shoved to output in place

Note that this may actually... explode horribly, depending on how it's used. It might need to be dropped entirely for some usages (Wikimedia fundraising sites need love here).

Attributes:

Extension:Math
As implemented in Extension:Math in MediaWiki 1.19 and later (built into core in 1.18 and earlier).

math
Expected rendering:
 * LaTeX fragments should be validated and rendered in a suitable way that shows the equation nicely.
 * Plaintext-only: when only plaintext output is available, render the LaTeX source text as if it were plain text, but forced to left-to-right display direction.
 * Screen: screen-sized bitmap output rendering is acceptable, but scalable & selectable text is better if possible.
 * Print: dedicated print output should produce scalable output if possible; if rendering to bitmap, consider making a higher-resolution form than you would for screen.
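
For the plaintext-only case, one possible way to force left-to-right display is to wrap the source in Unicode directional formatting characters. This is an assumption about the mechanism, not something this page specifies, and math_plaintext is a hypothetical name.

```python
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def math_plaintext(latex_source):
    """Return the LaTeX source unchanged, but forced to left-to-right display."""
    return f"{LRE}{latex_source}{PDF}"
```

In HTML output the same effect could instead be achieved with a dir="ltr" attribute on a wrapping element.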

Attributes:

todo: details of validation / whitelist of allowed sequences / common LaTeX setup code

Extension:Cite
Extension:Cite provides tag hooks for rendering basic footnotes / endnotes / citations, and is widely used within Wikipedia and related sites.

In Extension:ParserPlayground's JS code, this is implemented in the MWRefTagHook and MWReferencesTagHook classes. These sample implementations attempt to expand into further parse nodes, so shouldn't require special renderer support (in theory).

ref
Represents an anchored link to a reference note, optionally defining the text of the note.

Expected rendering:
 * if a 'group' attribute is provided, this sets which footnote group is used. If none is provided, a default group is used.
 * if a 'name' attribute is provided, it should be used to associate with another matching ref or cite entry in the same group. Otherwise, the next unused sequence number within that group will be assigned. Each group starts numbering at 1.
 * if content is provided, it should be stored for later display by a <references> tag (associated with the group and index number)
 * inline rendering should show a small trigger, such as a superscript footnote link or toggle, which points to or displays the actual note
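
The grouping and numbering rules above can be sketched as a small registry. This is an illustrative sketch only and is not the MWRefTagHook implementation: the RefRegistry class, its method names, the use of an empty string for the default group, and the marker format are all assumptions.

```python
from collections import defaultdict


class RefRegistry:
    """Tracks ref notes per group; numbering starts at 1 in each group."""

    def __init__(self):
        # group name -> list of notes, in order of first appearance
        self.groups = defaultdict(list)
        # (group, name) -> 1-based index of the note with that name
        self.named = {}

    def add_ref(self, content=None, name=None, group=""):
        """Register a ref and return its index within the group."""
        notes = self.groups[group]
        key = (group, name)
        if name is not None and key in self.named:
            # Named ref seen before in this group: reuse its number.
            index = self.named[key]
        else:
            # Otherwise take the next unused sequence number.
            notes.append({"name": name, "content": None})
            index = len(notes)
            if name is not None:
                self.named[key] = index
        note = notes[index - 1]
        if content and note["content"] is None:
            # Store the text for later display by a references tag.
            note["content"] = content
        return index

    def marker(self, index, group=""):
        """Inline trigger, here a plain superscript label."""
        label = f"{group} {index}" if group else str(index)
        return f"<sup>[{label}]</sup>"
```

A references tag for a given group would then walk self.groups[group] in order and emit each stored note next to its number.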