Thread:Extension talk:Cite/HTML5-style syntax?

Cite.php syntax is not compatible with HTML5; this is an issue for consistently using HTML5 in MediaWiki – see HTML5.

Cite.php uses XML-style syntax for “empty/void elements” – <ref name="foo" /> and <references /> – in a way that is not compatible with HTML5; the XML style comes from XHTML 1.0. This is not necessarily a problem per se, as these are Tag extensions, not actual HTML/XML tags, but as the web moves to HTML5, it will cause inconsistency and confusion for users unless corrected. Further, it requires the MediaWiki software to support inconsistent syntax.

The easiest solution is probably to use templates to insulate users from the details of the tag syntax; this is detailed below.

Issue
The issue is that in HTML5, the minimized tag syntax <foo /> is invalid for non-void elements, and is generally interpreted simply as an opening tag – <foo> – because the trailing slash (/) is stripped rather than interpreted as a terminator. As the Cite.php tags are non-void (they can have content), a minimized tag is interpreted as a bare opening tag, without a closing tag.

Thus instead of XHTML <ref name="foo" /> and <references />, in HTML5 one would need <ref name=foo></ref> and <references></references> (note that HTML5 allows unquoted attribute values).

Because the ref and references tags are very widely used (notably on English Wikipedia), this issue has very broad scope, even if the impact is minor.

Background
The details are confusing; here is some background.

XML allows minimized tag syntax <foo /> for empty elements (see XML: Tags for Empty Elements and XHTML1: Empty elements). In XML, “empty element” means “has no content, whether or not element type declared with keyword EMPTY (meaning ‘cannot have content’)”, so it refers both to elements that cannot have content, such as <br> (usually serialized as <br />, where the slash is a terminator, but can be serialized as <br></br>), and to elements that can have content but do not, like an empty paragraph: <p></p>.

Per the HTML5 “Elements” documentation, an HTML5 “void element” means “cannot have content”; it corresponds to XML’s EMPTY keyword. Further, trailing slashes are dropped (formally, the slash is actually forbidden for non-void elements). This changes behavior for empty non-void elements. Void elements have only a single tag, where the / is optional, as in <br>, though <br /> is also valid. Empty non-void elements, like empty paragraphs, need a closing tag (though this may be implied in some cases). For example, <p /> in HTML5 does not behave like <p></p> (an empty paragraph) as it does in XHTML; it is invalid syntax, and generally acts instead as a bare opening tag: <p>.
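The rule described above can be sketched in a few lines of Python. This is only an illustration of the behavior, not parser code: the helper name `interpret_start_tag` is made up for this example, the void-element list is the one I believe the current HTML standard uses (older HTML5 drafts listed a few more), and foreign content such as SVG or MathML, where the slash does count, is ignored.

```python
# Void elements as listed in the HTML standard at the time of writing
# (an assumption worth re-checking; older drafts included a few more).
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


def interpret_start_tag(name: str, self_closing: bool) -> str:
    """Describe how an HTML5 parser treats a start tag (hypothetical helper)."""
    if name.lower() in VOID_ELEMENTS:
        # The slash is optional and has no effect: <br> and <br /> are equivalent.
        return "complete void element"
    if self_closing:
        # The slash is ignored on non-void elements: <p /> acts like <p>.
        return "opening tag only; slash ignored"
    return "opening tag"


if __name__ == "__main__":
    for tag, slash in [("br", True), ("p", True), ("ref", True)]:
        print(tag, slash, "->", interpret_start_tag(tag, slash))
```

Under this rule a name like ref, which is not a void element, falls into the second branch, which is why the minimized Cite.php tags read as bare opening tags.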

Proposal
The <ref> syntax is tricky, but <references> should be easy to handle. The use of the {{reflist}} template – which always (can?) output <references></references> – means that users are insulated from this. Further, <references> has the same behavior whether it has content or not: it includes any references in its content and lists pending inline refs.

The <ref> syntax is trickier because there are two behaviors for the <ref> tag: create a new named ref (which is non-void), or refer to an existing ref (which is void, or handled by “if empty, then…”). At the tag level, it would be clearer to separate these, say by  followed by   to create and then refer to a named reference.

This would be easier if this issue were isolated from users via a template. At English Wikipedia  is already used for a separate type of referencing, but   (for “footnote”) is an almost unused template, so it could be used, and would be consistent with the   and   template names. Using this, users could write and then  later.

In more technical detail, the existing XML-style tags could be supported indefinitely, or deprecated then removed, with an automated migration tool. The tool would:
 * Replace  by   (leave quotes as they are allowed) – or even by   and
 * Replace  by   – or even by
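As a rough idea of what such a tool might do, here is a minimal Python sketch. It assumes the simplest possible target, namely expanding each self-closed ref or references tag into an explicit open/close pair; a real migration might target a new tag or a template instead. The function name `expand_self_closed` is hypothetical, and the regex assumes attribute values contain no ">" and does not skip nowiki sections or HTML comments.

```python
import re

# Matches a self-closed <ref ... /> or <references ... /> tag.
# Assumption: attribute values never contain a ">" character.
SELF_CLOSED = re.compile(r"<(ref|references)\b([^>]*?)\s*/>", re.IGNORECASE)


def expand_self_closed(wikitext: str) -> str:
    """Rewrite self-closed Cite tags as explicit open/close pairs (sketch only)."""
    def repl(match: re.Match) -> str:
        name, attrs = match.group(1), match.group(2)
        # Attributes are kept verbatim, including any quotes.
        return f"<{name}{attrs}></{name}>"

    return SELF_CLOSED.sub(repl, wikitext)


if __name__ == "__main__":
    sample = 'Text.<ref name="foo" /> More.<ref name="bar">Source</ref>\n<references />'
    print(expand_self_closed(sample))
```

Tags that already have content are left alone, since the pattern only matches when the tag ends in "/>".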

Summary
In summary:
 * Authors should use templates:  and
 * Behind the scenes, add  tag and migrate existing usage.

How does this overall approach sound?