HTML5

From MediaWiki.org

A page for ideas about which HTML5 features we can use in MediaWiki, assuming we end up switching to it. See the spec. HTML5 was first enabled in MediaWiki by default in r53034 (July 10, 2009) and was enabled on Wikimedia sites on November 12, then immediately disabled again because it caused problems with XMLHttpRequest not liking named entities. Yay. As of r59741, that bug should be fixed, and HTML5 was enabled again on Wikimedia wikis on 2011-02-23, then disabled again shortly thereafter.

FAQ about MediaWiki use of HTML5

What exactly do you mean by HTML5?
This page only discusses using HTML5 for our static HTML markup, instead of XHTML 1.0 Transitional. Although HTML5 includes JavaScript APIs (getElementsByClassName(), drag-and-drop, etc.) and other things too, using these is entirely uncontroversial and isn't covered here.
Why should we use HTML5?
First, HTML5 is the next HTML standard. All new HTML features will be added there, and we'll have to use it eventually if we want to take advantage of them. Currently MediaWiki uses XHTML 1.0, which became a W3C Recommendation in January 2000 – eleven years ago. There are some modest immediate benefits to using HTML5 already (see #Already done and #Short term below), so we may as well switch sooner rather than later.

Second, and more idealistically, a major goal of HTML5 is to advance the open web by supplanting proprietary technologies like Flash and Silverlight. It goes to great lengths to reduce the need for closed-source and vendor-locked software by introducing elements like <video> and <canvas>. It also aims to drastically lower the barrier to creating a new browser by specifying huge amounts of behavior that previously had to be painstakingly reverse-engineered from existing browsers. All of this accords very closely with Wikimedia's values of "thriving open formats and open standards on the web", and both MediaWiki and Wikipedia should do what they can to be trendsetters and help advance these goals.

Isn't HTML5 unfinished?
HTML5 is a Draft Standard at the WHATWG, and is a late Working Draft at the W3C. Large parts of HTML5 are stable, and there's no reason not to use those parts. See the WHATWG FAQ entry "When will HTML5 be finished?". Part of the purpose of this page is to identify which parts of the spec are currently usable (and useful to us) in practice.
But HTML5 is tag soup!
HTML5 doesn't require XML well-formedness – e.g., you can omit attribute quote marks – but it does permit it. MediaWiki currently still outputs well-formed XML by default. This means that by default, you can still (modulo bugs) parse MediaWiki pages using XML libraries, transform them via XSLT, etc. MediaWiki administrators who want to reduce the size of output HTML can disable $wgWellFormedXml. When HTML5 has been around for a while and HTML5 parsing libraries are as prevalent as XML parsing libraries, this benefit might not be so compelling anymore.
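As a rough illustration of what well-formedness buys XML-based tools, the sketch below feeds two fragments to Python's standard XML parser. The fragments and the helper name is_well_formed_xml are made up for this example and are not MediaWiki output; the point is only that quoted attributes and closed tags parse as XML, while the terser form HTML5 also permits does not.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup (hypothetical helper)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Quoted attribute, self-closed <br />: valid HTML5 and well-formed XML.
well_formed = '<p class="note">Hello<br /></p>'

# Unquoted attribute, unclosed <br>: valid HTML5 but not well-formed XML.
tag_soup = '<p class=note>Hello<br></p>'

print(is_well_formed_xml(well_formed))  # True
print(is_well_formed_xml(tag_soup))     # False
```

A tool that relies on an XML parser keeps working only as long as output looks like the first fragment, which is the case the default setting preserves.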
So MediaWiki outputs XHTML5?
According to the HTML5 spec, XHTML5 must be output with an XML MIME type. If you configure MediaWiki and/or your server to serve an XML MIME type, then it's possible you can get it to serve XHTML5, because XHTML5 is fairly close to a subset of HTML5, and it's possible by design to create documents that can be served as either HTML5 or XHTML5. However, for practical purposes, people don't use XML MIME types much, so normally MediaWiki won't output XHTML5, but rather HTML5 that happens to be well-formed XML (if $wgWellFormedXml is true). This gives most of the advantages of XML (such as they are), but doesn't cause minor output bugs or typos in system messages to break the site.
Will MediaWiki continue to support XHTML 1.0 Transitional?
Currently MediaWiki trunk still supports XHTML 1.0 Transitional (just disable $wgHtml5). It's not clear how long non-HTML5 output formats will remain supported, but XHTML 1.0 remains supported for the 1.16 release, and possibly many more after that if there's demand for it.

Already done

Useful things that have already been at least partially implemented in trunk and that require $wgHtml5 to be on.

  • HTML5 form attributes: required, pattern, etc. Maybe also new input types like date. Started in r54567.
  • Remove useless elements and attributes: <head>, type="" on script/style. Started in r54695, lots more done in later revisions but lots more still to do.
  • Use the spellcheck attribute where useful. r59360 does this on edit summaries.
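To make the first two items concrete, here is a minimal sketch of output that changes with an HTML5 switch. The functions script_tag and text_input and their html5 parameter are hypothetical stand-ins written for this example; they are not MediaWiki's actual helpers, and the flag only loosely mirrors what $wgHtml5 controls.

```python
from html import escape


def script_tag(src: str, html5: bool = True) -> str:
    """Render a script element; HTML5 makes the type attribute unnecessary."""
    src_attr = f'src="{escape(src, quote=True)}"'
    if html5:
        return f"<script {src_attr}></script>"
    return f'<script type="text/javascript" {src_attr}></script>'


def text_input(name: str, required: bool = False, html5: bool = True) -> str:
    """Render a text input; the required attribute is only emitted for HTML5."""
    attrs = [f'name="{escape(name, quote=True)}"']
    if html5 and required:
        # An empty value keeps the boolean attribute well-formed as XML too.
        attrs.append('required=""')
    return "<input " + " ".join(attrs) + " />"


print(script_tag("/load.js"))
print(script_tag("/load.js", html5=False))
print(text_input("wpName", required=True))
```

Writing the boolean attribute as required="" is one way to stay compatible with well-formed XML output; a bare required would be shorter but would not parse as XML.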

Short term

Things we can start doing as soon as we like, without too much effort and without harmful side effects.

  • Support <video>/<audio> without JavaScript.
    • Need to confirm fallback behavior on Safari without XiphQT, to make sure the fallback content (e.g., the Java player) is displayed correctly.
      • Should be possible to do with JavaScript. Currently we require JavaScript to get videos to work on any browser, so this will be a step forward! At least Firefox and Chrome won't need it.
    • See Extension:TimedMediaHandler (it does not output the object tag or the Java Cortado player as a child of the video tag, but we could add that).
  • Use data-* if that's useful anywhere (HTML diffs used something like this at some point before removing them for XHTML validity reasons).
  • Use more comprehensible HTML IDs. Currently we restrict them to forms acceptable to HTML4, meaning a subset of ASCII, which results in horrible stuff for foreign languages (or even punctuation in English) with lots of dots and hex codes when we auto-generate header anchors. HTML5 says IDs can be any Unicode string that doesn't contain whitespace, which would allow us to output much nicer-looking IDs. (XHTML1 is more generous than HTML4 as well, but significantly less generous than HTML5: it doesn't allow most punctuation.)
    • Things to be careful of: what do browsers accept in practice? (Need to test anchors in links, in HTTP redirects, in CSS selectors, in JavaScript, anything else?) Should we rule out some ASCII characters for the sake of better compatibility with, e.g., CSS? (IDs with "." or ">" or such will require escaping for use in CSS.) How should we handle backward compatibility with existing links from external sources?
    • The code for this is largely done (grep for $wgExperimentalHtmlIds), but can't be enabled until at least Opera 10.10 becomes irrelevant.
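The difference between the two ID styles can be sketched as follows. This is an approximation written for illustration, assuming the HTML4-safe form is produced by percent-encoding the UTF-8 bytes and swapping "%" for "."; the function names html4_style_id and html5_style_id are hypothetical, and MediaWiki's real escaping code may differ in details (for example, HTML4 also requires IDs to start with a letter, which is ignored here).

```python
import re
from urllib.parse import quote


def html4_style_id(heading: str) -> str:
    """Approximate an HTML4-safe anchor: ASCII only, with dots and hex codes."""
    underscored = heading.strip().replace(" ", "_")
    # Percent-encode everything outside a small ASCII set, then turn
    # each "%XX" into ".XX" so the result stays within HTML4's ID rules.
    return quote(underscored, safe=":").replace("%", ".")


def html5_style_id(heading: str) -> str:
    """HTML5 only forbids whitespace in IDs, so just replace whitespace runs."""
    return re.sub(r"\s+", "_", heading.strip())


for title in ["Café", "What's new?"]:
    print(html4_style_id(title), "|", html5_style_id(title))
# Caf.C3.A9 | Café
# What.27s_new.3F | What's_new?
```

The HTML5-style IDs are far more readable, but characters such as "'" and "?" are exactly the ones that would need escaping in CSS selectors, which is the compatibility question raised above.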

Medium term

Things that will take more care or effort, or will require broader browser support to be useful.

  • Remove closing tags, attribute quotes, etc. Need to be careful: breaking XML well-formedness might break bots.
    • The potential benefit here is reduced HTML output size, thus reduced bandwidth usage and faster page loads. (Hypothetically, inconsistent use of quotes might increase gzipped HTML size, but testing on a sample page gave 4061 bytes gzipped when always using quotes and 4045 when not.) Gains will be moderate, but they add up. The downside noted is that client-side tools that screen-scrape the UI with an XML parser would fail; scrapers would need to use a proper HTML parser instead, or move to using the API. Worst case, they would switch to regex-based scraping. :)
    • Also note that some devs have expressed strong objections to moving away from well-formed XML. This will need some arguing over. :)
    • This is now controlled by the $wgWellFormedXml setting, which defaults to true (keep outputting well-formed XML).
  • Embed MathML and SVG inline, at least for some users. We'd have to be very careful about sanitizing this to avoid XSS, especially in the case of a browser that doesn't support inline MathML/SVG and so will treat the contents of the tags as HTML. (We could do this with XHTML 1 too, but only if we serve content with an XML MIME type, which we probably don't want to do if we can avoid it. So it would be more convenient with HTML5.)
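The kind of size comparison mentioned in the first item of this list (quoted versus unquoted attributes, raw and gzipped) can be reproduced roughly as below. Everything here is an assumption made for illustration: the sample markup is synthetic and highly repetitive, so it will not reproduce the 4061/4045 figures, and strip_attribute_quotes is a naive regex, not how MediaWiki produces its output.

```python
import gzip
import re


def strip_attribute_quotes(html: str) -> str:
    """Drop quotes around attribute values made only of 'safe' characters.

    Naive sketch: values containing spaces, slashes, etc. stay quoted, and
    the regex does not check that the match is really inside a tag.
    """
    return re.sub(r'="([A-Za-z0-9_.:-]+)"', r"=\1", html)


def gzipped_size(html: str) -> int:
    """Size in bytes of the gzip-compressed UTF-8 encoding."""
    return len(gzip.compress(html.encode("utf-8")))


# Synthetic sample page (hypothetical markup, not real MediaWiki output).
row = '<li id="item" class="mw-row"><a href="/wiki/Main_Page" title="Main Page">Main</a></li>\n'
quoted = "<ul>\n" + row * 200 + "</ul>\n"
unquoted = strip_attribute_quotes(quoted)

print("raw bytes:    ", len(quoted), "vs", len(unquoted))
print("gzipped bytes:", gzipped_size(quoted), "vs", gzipped_size(unquoted))
```

Running this against real page output rather than a repeated row would give a more meaningful comparison, since gzip compresses repetitive input unusually well.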

Long term

Things that we can't do without more browser support. Not much point in working too much on this; too much will depend on how browser development progresses.

  • Start using semantic HTML5 tags like <article>,