Requests for comment/Content API

Problem statement
With the growing popularity of mobile apps, JavaScript in the browser and moves towards fragment caching (ESI), MediaWiki's content is increasingly accessed through web APIs. The existing MediaWiki API is not optimized for high-volume content access. Per-request overhead is relatively high (20-30ms), and caching and URL rewriting are generally not possible because the URL scheme is not deterministic and many endpoints are POST-based.

The storage service proposes a REST-style content interface for internal use. Part of this internal interface can also serve as a public content API. To make this work well, the design of the storage service needs to take into account the issues that matter for an external content API.

Goals

 * Support high request volumes -- provide an efficient API for retrieving content, used by mobile apps, ESI, bots etc.
 * Caching support -- avoid arbitrary query-parameter URLs that cannot be purged.
 * Support rewriting -- use URL patterns that support URL-based rewriting in something like Varnish.
 * API versioning -- enable evolution of APIs without breaking users unnecessarily.
 * Consistency -- use essentially the same URL scheme externally and internally. Return the same content internally and externally, and make links in content work in both contexts without extensive rewriting.

API entry points
Our page names have established URIs. Page-related sub-resources (versions) in the content API can be conveniently and intuitively exposed as subresources of the canonical article resource. Example:

Other public content that is not page-related will need other entry points. Candidates:
 * : separate entry point, does not work well for wikis without a  style prefix.
 * : Stay within the wiki namespace, but don't collide with articles, since article titles cannot start with an underscore. Works well with or without a  style prefix. This looks like the best option so far.
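The underscore rule is what lets a single front-end route requests without ambiguity. The following is a minimal sketch, assuming the `/wiki/_api/` prefix from the general content API strawman and the `?api/...` query form from the page-related strawman; the `classify` function and its return labels are hypothetical.

```python
from urllib.parse import urlsplit

# Prefix for non-page content (as in the /wiki/_api/ example request).
# Article titles cannot start with an underscore, so no real page can
# ever live under this prefix.
CONTENT_API_PREFIX = "/wiki/_api/"
ARTICLE_PREFIX = "/wiki/"


def classify(url: str) -> str:
    """Return 'content-api', 'page-api' or 'article' for a request URL.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    if parts.path.startswith(CONTENT_API_PREFIX):
        return "content-api"
    if parts.path.startswith(ARTICLE_PREFIX):
        # Page-related API calls hang off the canonical article URL,
        # with the API path carried in the query string.
        if parts.query.startswith("api/"):
            return "page-api"
        return "article"
    raise ValueError(f"not a wiki URL: {url}")
```

Because the check is a plain string-prefix test, the same rule can be expressed as a URL pattern in a cache such as Varnish.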

Deterministic URIs for caching
Query strings should be deterministic so we can purge content from caches. This means that there should be exactly one query parameter. Options considered are:
 * : Sounds odd, as the key does not really match the sub-resource on the right.
 * : Would require query-string key order normalization (alphabetic ordering) in caches, as many client libraries don't let users control the order of parameters. Requests with missing mandatory parameters or invalid combinations would be rejected. It is unclear how listings would be modeled in a pure key-value model; with paths, listings naturally fall out of incomplete paths ending in a trailing slash. Valid parameter combinations are also harder to discover, whereas with a path any path prefix is valid.
 * : Looks more path-y, but is longer and more noisy.
 * : Short, and does not suggest a misleading key=value meaning. The path is a bit more broken up than in the second option, but looks natural and less noisy to people used to query strings. Current favorite.
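To make the cost of the key=value option concrete, here is a minimal sketch of the key-order normalization a cache would need before computing a cache key. The parameter names `rev` and `format` are hypothetical and only illustrate two requests that differ solely in parameter order.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_cache_key(url: str) -> str:
    """Return a canonical form of *url* with query parameters sorted by key.

    Requests that differ only in parameter order map to the same key,
    so purging the canonical URL covers all of them.
    """
    parts = urlsplit(url)
    # keep_blank_values so that "?a=&b=1" is not silently turned into "?b=1"
    params = parse_qsl(parts.query, keep_blank_values=True)
    # Sorting (key, value) tuples orders keys alphabetically, then values
    # for repeated keys.
    query = urlencode(sorted(params))
    # The fragment is never sent to the server, so drop it.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))
```

The path-style options avoid this step entirely: the query string is a single opaque path, so the URL a client sends is already canonical.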

Slashes in page names
Page names can contain slashes. This complicates the use of relative links in content, especially on rename or where content fragments from several pages are combined into one output page (think Flow timelines). One option is to prefix relative links in a page called  with ../../. This is the current Parsoid behavior.

Another option, which we intend to move to, is to make all links relative to the wiki root and make this work by setting  in the skin. This also avoids issues with accesses to  style URIs. Setting base href is much cheaper than rewriting all hrefs in content, and it allows content fragments to be combined even where such rewriting is not easily possible (ESI).
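The difference between the two options can be checked with standard relative-URL resolution. The following is a minimal sketch, assuming a hypothetical wiki root of `https://example.org/wiki/`, a page called `A/B/C`, and a link target `Target`; `depth_prefix` is a hypothetical helper that mimics the per-page prefix approach described above, not Parsoid's actual code.

```python
from urllib.parse import urljoin

WIKI_ROOT = "https://example.org/wiki/"  # hypothetical wiki root


def depth_prefix(title: str) -> str:
    """Prefix that climbs from a page URL back up to the wiki root.

    A page called "A/B/C" lives at /wiki/A/B/C, so its links need "../../";
    a page without slashes only needs "./".
    """
    depth = title.count("/")
    return "../" * depth if depth else "./"


page_url = urljoin(WIKI_ROOT, "A/B/C")

# Option 1: per-page prefixes. Correct on the page itself ...
on_page = urljoin(page_url, depth_prefix("A/B/C") + "Target")
# ... but wrong once the same fragment is embedded in a page without slashes.
embedded = urljoin(urljoin(WIKI_ROOT, "X"), depth_prefix("A/B/C") + "Target")

# Option 2: links relative to the wiki root, resolved against <base href>.
# The result no longer depends on which page the fragment ends up in.
with_base = urljoin(WIKI_ROOT, "./Target")
```

With a fixed base, `./Target` resolves to the same URL no matter which page (or ESI fragment) contains it, which is why no per-fragment rewriting is needed.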

Strawman page-related API tunneled to Rashomon backend
Following the goal of using the same URL scheme internally and externally, the page-related subresources can be made publicly available as:

GET /wiki/Main_Page?api/v1/rev/latest/html -- returns the latest HTML; purged on new revision or re-render
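Because the public URL keeps the API path intact in the query string, tunneling it to the backend is a mechanical rewrite. A minimal sketch follows; the backend host `rashomon.internal` and the backend path layout `/<version>/page/<title>/<subresource>` are assumptions for illustration, not the storage service's actual scheme.

```python
from urllib.parse import quote, unquote, urlsplit

# Hypothetical internal address of the Rashomon storage backend.
BACKEND = "http://rashomon.internal"


def to_backend(public_url: str) -> str:
    """Map /wiki/<title>?api/<version>/<subresource...> to a backend URL.

    The backend path layout used here is an assumption for illustration.
    """
    parts = urlsplit(public_url)
    if not parts.path.startswith("/wiki/") or not parts.query.startswith("api/"):
        raise ValueError(f"not a page API URL: {public_url}")
    title = unquote(parts.path[len("/wiki/"):])
    # "api/v1/rev/latest/html" -> ("api", "v1", "rev/latest/html")
    _, version, subresource = parts.query.split("/", 2)
    # Titles may contain slashes, so percent-encode them to keep the title
    # a single path segment on the backend.
    return f"{BACKEND}/{version}/page/{quote(title, safe='')}/{subresource}"
```

Since a cache can apply the same rewrite with a URL pattern, the public and internal URLs stay in one-to-one correspondence, which keeps purging straightforward.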

See the storage service RFC for more example URLs following the same pattern.

Strawman general content API handled by storage service backend
An example request to a public key-value bucket as mentioned in the storage service RFC:

GET /wiki/_api/v1/math-png/96d719730559f4399cf1ddc2ba973bbd.png
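Putting the version directly into the path also supports the API versioning goal: each (version, bucket) pair can be routed independently. A minimal sketch, assuming a hypothetical handler registry and a hypothetical internal storage host `storage.internal`:

```python
from typing import Callable, Dict, Tuple

PREFIX = "/wiki/_api/"

# Hypothetical registry: (API version, bucket name) -> handler taking a key.
Handler = Callable[[str], str]
HANDLERS: Dict[Tuple[str, str], Handler] = {
    ("v1", "math-png"): lambda key: f"http://storage.internal/math-png/{key}",
}


def dispatch(path: str) -> str:
    """Route /wiki/_api/<version>/<bucket>/<key> to a registered handler."""
    if not path.startswith(PREFIX):
        raise ValueError(f"not a content API path: {path}")
    version, bucket, key = path[len(PREFIX):].split("/", 2)
    handler = HANDLERS.get((version, bucket))
    if handler is None:
        # Unknown versions or buckets are rejected (e.g. with a 404) instead
        # of silently falling back; v1 and a future v2 can be registered
        # side by side.
        raise LookupError(f"no handler for {version}/{bucket}")
    return handler(key)
```

Adding a new API version then means registering new handlers alongside the old ones rather than changing the behavior of existing URLs.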