Topic on Extension talk:Cite

Nbarth (talkcontribs)

Cite.php syntax is not compatible with HTML5; this is an issue for consistently using HTML5 in MediaWiki – see mw:HTML5.

Cite.php uses XML-style syntax for “empty/void elements” – <ref name="foo" /> and <references /> – in a way that is not compatible with HTML5; the XML-style is from XHTML 1.0. This is not necessarily a problem per se, as these are Tag extensions, not actual HTML/XML tags, but as the web moves to HTML5, this will be a cause of inconsistency and confusion to users unless corrected. Further, it requires the MediaWiki software to support inconsistent syntax.

The easiest solution is probably to use templates to insulate users from the details of the tag syntax; this is detailed below.

Issue

The issue is that in HTML5 the minimized tag syntax <foo/> is invalid for non-void elements and is generally interpreted as a bare start tag, <foo>: the trailing slash (/) is ignored rather than treated as a terminator. Because the Cite.php tags are non-void (they can have content), <ref name="foo" /> would be read as a bare opening tag, without a closing tag.

Thus instead of XHTML <ref name="foo" /> and

<references />

in HTML5 one would need <ref name=foo></ref> and

<references></references>

(note that HTML5 allows unquoted attribute values).

Because the ref and references tags are very widely used (notably on English Wikipedia), this issue has very broad scope, even if the impact is minor.

Background

The details are confusing; here is some background.

XML allows minimized tag syntax <foo/> for empty elements (see XML: Tags for Empty Elements and XHTML1: Empty elements). In XML, “empty element” means “has no content, whether or not the element type is declared with the keyword EMPTY (meaning ‘cannot have content’)”, so it covers both elements that cannot have content, such as <br> (usually serialized as <br/>, where the slash is a terminator, but also serializable as <br></br>), and elements that can have content but do not, like an empty paragraph: <p></p>.
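To illustrate the XML side of this, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It only shows that an XML parser treats the minimized and expanded forms of an empty element as the same thing; the helper name describe is just for illustration.

```python
import xml.etree.ElementTree as ET

# Parse the minimized and the expanded form of an empty element.
minimized = ET.fromstring("<p/>")
expanded = ET.fromstring("<p></p>")


def describe(element):
    """Return (tag name, text content, number of children) for an element."""
    return (element.tag, element.text, len(element))


# Both forms yield an element with the same tag, no text, and no children.
same_structure = describe(minimized) == describe(expanded)

# Both forms also serialize back to identical bytes.
same_serialization = ET.tostring(minimized) == ET.tostring(expanded)
```

An XML parser sees no difference between the two spellings, which is why <ref name="foo" /> is unremarkable when read as XHTML.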

Per the HTML5 “Elements” documentation, an HTML5 “void element” is one that cannot have content; it corresponds to XML’s EMPTY keyword. Further, a trailing / is ignored (formally, the slash is forbidden on non-void elements). This changes behavior for empty non-void elements. Void elements have only a single tag, on which the / is optional: <br> and <br/> are both valid. Empty non-void elements, like empty paragraphs, need a closing tag (though this may be implied in some cases). For example, <p/> in HTML5 does not behave like <p></p> (an empty paragraph), as it does in XHTML; it is invalid syntax and generally acts as a bare opening tag: <p>.
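The rule above can be sketched as a small lookup. This is only a model of the behavior described here, not a real HTML5 parser: the function name self_closing_effect is made up, the set is the list of void elements as I understand the current HTML standard, and it ignores SVG/MathML content, where the self-closing slash is honored.

```python
# Void elements per the HTML standard (assumed list; check the spec
# before relying on it).
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


def self_closing_effect(tag_name):
    """Model how <tag_name/> is read in HTML5 (HTML content only).

    Returns "complete" if the tag forms a whole element by itself,
    or "open" if the slash is ignored and the element stays open.
    """
    if tag_name.lower() in VOID_ELEMENTS:
        # Void element: never has an end tag, so the slash changes nothing.
        return "complete"
    # Non-void element: the slash is ignored, so this is just a start tag.
    return "open"
```

Under this model, self_closing_effect("br") gives "complete", while "p" (and, by the same reasoning, a non-void tag like ref) gives "open".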

Proposal

The <ref name="foo" /> syntax is tricky, but <references /> should be easy to handle. The {{reflist}} template, which outputs (or at least can output) <references>...</references>, already insulates users from this. Further, <references></references> has the same behavior whether it has content or not: it includes any references in its content and lists pending inline refs.

The <ref name="foo" /> syntax is trickier because there are two behaviors for the <ref name="foo"> tag: create a new named ref (which is non-void), or refer to an existing ref (which is void, or handled by “if empty, then…”). At the tag level, it would be clearer to separate these, say by using <ref name="foo">...</ref> to create a named reference and <ref2 name="foo"> to refer to it later.

This would be easier if this issue were isolated from users via a template. At English Wikipedia {{ref}} is already used for a separate type of referencing, but {{fn}} (for “footnote”) is an almost unused template, so it could be used, and would be consistent with the {{sfn}} and {{efn}} template names. Using this, users could write {{fn|name=foo|...}} and then {{fn|name=foo}} later.

In more technical detail, the existing XML-style tags could be supported indefinitely, or deprecated then removed, with an automated migration tool. The tool would:

  • Replace <ref name="foo" /> by <ref2 name="foo"> (the quotes can stay, as they are still allowed) – or even by {{fn|name=foo}} and
  • Replace <references /> by <references></references> – or even by {{reflist}}
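The two replacements above could be sketched roughly as follows. This is an illustration only: <ref2> is the hypothetical tag proposed in this post (it does not exist in Cite.php), the function name migrate is made up, and a real tool would also need to skip <nowiki> sections, HTML comments and the like, which plain regular expressions do not handle.

```python
import re

# Self-closing <ref ... />, e.g. <ref name="foo" /> or <ref name=foo/>.
# \b keeps this from matching <references ... />.
SELF_CLOSING_REF = re.compile(r"<ref\b([^<>]*?)\s*/>", re.IGNORECASE)

# Self-closing <references ... />, with optional attributes.
SELF_CLOSING_REFERENCES = re.compile(
    r"<references\b([^<>]*?)\s*/>", re.IGNORECASE
)


def migrate(wikitext):
    """Rewrite XML-style Cite tags to the forms proposed above."""
    # <ref name="foo" />  ->  <ref2 name="foo">  (hypothetical tag)
    wikitext = SELF_CLOSING_REF.sub(r"<ref2\1>", wikitext)
    # <references />  ->  <references></references>
    wikitext = SELF_CLOSING_REFERENCES.sub(
        r"<references\1></references>", wikitext
    )
    return wikitext
```

A ref that already has content, such as <ref name="foo">text</ref>, is left untouched, since only the self-closing form is matched.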

Summary

In summary:

  • Authors should use templates: {{fn}} and {{reflist}}
  • Behind the scenes, add a <ref2> tag and migrate existing usage.

How does this overall approach sound?

Dantman (talkcontribs)