HTML5

A page for ideas about what features of HTML5 we can use in MediaWiki, assuming we end up switching to it. See the spec.

FAQ about MediaWiki use of HTML5

 * Isn't HTML5 unfinished?
 * Some parts of HTML5 are stable and interoperably implemented, and there's no reason not to use those parts. See the WHATWG FAQ entry "When will HTML5 be finished?".  Part of the purpose of this page is to identify which parts of the spec are currently usable (and useful to us) in practice.


 * But HTML5 is tag soup!
 * HTML5 doesn't require XML well-formedness – e.g., you can omit attribute quote marks – but it does permit it. MediaWiki currently still outputs well-formed XML by default. This means that by default, you can still (modulo bugs) parse MediaWiki pages using XML libraries, transform them via XSLT, etc. MediaWiki administrators who want to reduce the size of output HTML can set $wgWellFormedXml to false.


 * So MediaWiki outputs XHTML5?
 * According to the HTML5 spec, XHTML5 must be output with an XML MIME type. If you configure MediaWiki and/or your server to serve an XML MIME type, then it's possible you can get it to serve XHTML5, because XHTML5 is fairly close to a subset of HTML5, and it's possible by design to create documents that can be served as either HTML5 or XHTML5.  However, for practical purposes, people don't use XML MIME types much, so normally MediaWiki won't output XHTML5, but rather HTML5 that happens to be well-formed XML (if $wgWellFormedXml is true).  This gives most of the advantages of XML (such as they are), but doesn't cause minor output bugs or typos in system messages to break the site.


 * Will MediaWiki continue to support XHTML 1.0 Transitional?
 * Currently MediaWiki trunk still supports XHTML 1.0 Transitional (just disable $wgHtml5). It's not clear how long non-HTML5 output formats will remain supported.
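
The well-formedness answers above can be illustrated with a small sketch. It uses Python's standard xml.etree.ElementTree to check whether a page parses as XML. The two sample documents and the helper name parses_as_xml are made up for illustration; they are not actual MediaWiki output.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: HTML5 that also happens to be well-formed XML
# (quoted attributes, every element closed).
WELL_FORMED = (
    '<!DOCTYPE html>\n'
    '<html><head><title>Test</title></head>'
    '<body><p class="note">Hello<br /></p></body></html>'
)

# Hypothetical sample: valid-looking HTML5 "tag soup"
# (unquoted attribute value, omitted closing tags).
TAG_SOUP = (
    '<!DOCTYPE html>\n'
    '<html><head><title>Test</title>'
    '<body><p class=note>Hello<br>'
)


def parses_as_xml(markup):
    """Return True if the markup can be read by an XML parser."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print("well-formed:", parses_as_xml(WELL_FORMED))
    print("tag soup:   ", parses_as_xml(TAG_SOUP))
```

Note that this only checks well-formedness; it says nothing about whether the document is valid HTML5.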

Short term
Things we can start doing as soon as we like, without too much effort and without harmful side effects.


 * Support <video>/<audio> without JavaScript.
 * Need to confirm fallback behavior on Safari without XiphQT, to make sure the fallback content (e.g., the Java player) is displayed correctly.
 * HTML5 form attributes: required, pattern, etc. Maybe also new input types like date (is this workable in the short term?).
 * Started in . Now used in a bunch of places, more being added.
 * Remove useless elements and attributes:, type="" on script/style.
 * Started in . Work is ongoing.
 * Use data-* if that's useful anywhere (HTML diffs used something like this at one point, before it was removed for XHTML validity reasons).
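
As a rough sketch of how the form attributes mentioned above could be emitted, the following Python helper renders an <input> start tag with boolean attributes such as required. The function name render_input, its well_formed_xml flag, and the field name are hypothetical and do not describe MediaWiki's actual code. The sketch assumes that a boolean attribute is written as name="" when XML well-formedness matters, which HTML5 also accepts.

```python
from html import escape


def render_input(attrs, well_formed_xml=True):
    """Render an <input> start tag from a dict of attributes.

    A value of True marks a boolean attribute (e.g. required);
    False or None means the attribute is left out entirely.
    """
    parts = ["<input"]
    for name, value in attrs.items():
        if value is True:
            # required="" is well-formed XML; bare "required" is not.
            parts.append(f' {name}=""' if well_formed_xml else f" {name}")
        elif value is False or value is None:
            continue
        else:
            parts.append(f' {name}="{escape(str(value), quote=True)}"')
    # Self-close the void element only in well-formed mode.
    parts.append(" />" if well_formed_xml else ">")
    return "".join(parts)


if __name__ == "__main__":
    field = {"type": "text", "name": "username",
             "required": True, "pattern": "[A-Za-z]+"}
    print(render_input(field))
    print(render_input(field, well_formed_xml=False))
```

Browsers that don't understand required or pattern simply ignore them, so server-side validation is still needed either way.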

Medium term
Things that will take more care or effort, or will require broader browser support to be useful.


 * Remove closing tags, attribute quotes, etc. Need to be careful: breaking XML well-formedness might break bots.
 * The potential benefit here is reduced HTML output size, and thus reduced bandwidth usage and faster page loads. Gains will be moderate, but they add up. The downside noted is that client-side tools that screen-scrape the UI with an XML parser would fail; scrapers would need to use a proper HTML parser instead or move to the API. The worst case would be that they switch to regex-based scraping. :)
 * Also note that some devs have expressed strong objections to moving away from well-formed XML. This will need some arguing over.  :)
 * This is now controlled by the $wgWellFormedXml setting, which defaults to true (keep outputting well-formed XML).
 * Embed MathML and SVG inline, at least for some users. We'd have to be very careful about sanitizing this to avoid XSS, especially in the case of a browser that doesn't support inline MathML/SVG and so will treat the contents of the tags as HTML. (We could do this with XHTML 1 too, but only if we serve content with an XML MIME type, which we'd probably rather avoid. So it would be more convenient with HTML5.)
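
For scrapers affected by the loss of well-formedness, switching to an HTML parser is not much work. The sketch below uses Python's standard html.parser to collect link targets from markup with unquoted attributes and omitted closing tags. The class name LinkCollector, the helper collect_links, and the sample markup are assumptions for illustration only; for real tools the API remains the better option.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> start tags, tolerating tag soup."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value is not None:
                self.hrefs.append(value)


def collect_links(markup):
    """Return all link targets found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()
    return parser.hrefs


if __name__ == "__main__":
    # Hypothetical output with an unquoted attribute and no </li> or </a>.
    sample = '<ul><li><a href=/wiki/Foo>Foo<li><a href="/wiki/Bar">Bar</ul>'
    print(collect_links(sample))
```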

Long term
Things that we can't do without more browser support. There's not much point in putting a lot of work into these yet; too much will depend on how browser development progresses.


 * Start using semantic HTML5 tags like, , etc., and allow (some of) them in user input. This doesn't work acceptably in IE right now without JavaScript hacks: the elements can't be used for styling, so are mostly pointless.
 * Use new HTML5 functional tags like (in addition to / ). Long term because this doesn't seem to be possible to do usefully in a backward-compatible manner without script (is it?).
 * Yes, it is: the contents of and and such are available as fallback. But it's hard to think of uses for these.

Validity issues
Once we're sure we're going with HTML5, we need to start handling validity issues. One validity checker is at http://validator.w3.org/ (although of course, like any validator, it will not catch all errors). Bugs should be filed on these, but here are some that are already known:


 * There are likely still places where the software outputs deprecated stuff like cellpadding, align, font, etc.
 * Line-initial : without a ; is usually used for indentation, and creates a <dl> without any <dt> elements. We need to make this output <div class=indent> or something, if we can't persuade people to use semantic markup instead.
 * Might be a bad idea, seeing how this is being used on talk pages. -- Ms2ger 14:23, 17 July 2009 (UTC)
 * That's why we can't just make it invalid. (It's used in articles too, to indent quotations and such.) We can still change the markup generated so it's clearly presentational rather than pretending to be a definition list, while maintaining the current visual effect. -- Simetrical 16:52, 17 July 2009 (UTC)
 * You could use  instead of   on Talk pages. 90.135.214.11 11:47, 18 September 2009 (UTC)
 * No we can't, it was removed three days ago. It would probably have been inappropriate anyway; the recommended way to do stuff like blog comments is nested <article> tags, or maybe nested <section>. See the article element section. -- Simetrical 18:05, 18 September 2009 (UTC)
 * Users can currently still input now-invalid elements/attributes like <font>, cellpadding, etc. We could try to automatically translate these, but it would be tricky in some cases. Also, maybe it's better to give the validation errors and encourage users to use semantic HTML instead of auto-translating their presentational garbage?
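
To get a feel for how often the <dl>-without-<dt> case shows up, one could scan rendered output for it. The following is a minimal sketch that assumes the output is well-formed XML (so it can be read with xml.etree.ElementTree). The function name dls_without_dt and the sample fragment are hypothetical.

```python
import xml.etree.ElementTree as ET


def dls_without_dt(markup):
    """Return the <dl> elements that have no direct <dt> child.

    Assumes the markup is well-formed XML; raises ET.ParseError otherwise.
    """
    root = ET.fromstring(markup)
    return [dl for dl in root.iter("dl") if dl.find("dt") is None]


if __name__ == "__main__":
    # Hypothetical fragment: the first list is indentation via ":",
    # the second is a real definition list written with ";" and ":".
    sample = (
        "<div>"
        "<dl><dd>An indented reply</dd></dl>"
        "<dl><dt>Term</dt><dd>Definition</dd></dl>"
        "</div>"
    )
    print(len(dls_without_dt(sample)), "list(s) used purely for indentation")
```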

Also note Manual:$wgValidateAllHtml. But is Tidy ready for HTML5?