Requests for comment/Content API

Problem statement
With the growing popularity of mobile apps, JavaScript in the browser, and moves towards fragment caching (ESI), MediaWiki's content is increasingly accessed through web APIs. The existing MediaWiki API is not optimized for high-volume content access. Per-request overhead is relatively high (20-30ms), and caching and URL rewriting are generally not possible because the URL schema is not deterministic and many endpoints are POST-based.

The storage service RFC proposes a REST-style content interface for internal use. A part of this internal interface can also be used as a public content API. To make this work well, issues important for an external content API need to be considered in the design of the storage service.

Goals

 * Support high request volumes -- provide an efficient API to retrieve content from mobile apps, ESI, bots, etc.
 * Caching support -- no random query parameter URLs that cannot be purged.
 * Support rewriting -- use URL patterns that support URL-based rewriting in something like Varnish.
 * API versioning -- enable evolution of APIs without breaking users unnecessarily.
 * Consistency -- use essentially the same URL scheme externally and internally. Return the same content internally and externally, and make links in content work in both contexts without extensive rewriting.
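
The caching goal can be illustrated with a small sketch. The two URLs below are illustrative examples of a query-parameter request written in two parameter orders; they are assumptions for this example, not part of the proposal. A cache keyed on the raw URL string stores them as separate entries, so a purge would need to know every variant.

```python
from urllib.parse import parse_qsl, urlsplit

# Illustrative query-parameter URLs: same request, different parameter order.
query_a = "/w/api.php?action=parse&page=Main_Page&format=json"
query_b = "/w/api.php?format=json&page=Main_Page&action=parse"


def same_request(url_a: str, url_b: str) -> bool:
    """Return True if both URLs share a path and the same query parameters."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    return a.path == b.path and sorted(parse_qsl(a.query)) == sorted(parse_qsl(b.query))


# A cache keyed on the raw URL string sees two distinct keys here,
# although both URLs describe the same request.
cache_keys = {query_a, query_b}
```

Here `same_request(query_a, query_b)` is true while `cache_keys` holds two entries. A deterministic, path-based URL has exactly one spelling per resource, so there is a single key to cache and purge.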

Resource / URI layout considerations
The design of a URI layout involves a lot of trade-offs, which are discussed in more detail in these notes. Your feedback on this is more than welcome. This is a summary of the current thinking:

API entry point
or

See the notes for more options and detail.

Page sub-resources
Page-related information like revisions or metadata is most naturally represented as sub-resources. The main issue here is that page names can contain slashes. Another issue is that URIs should be deterministic so that they can be cached.

Main options:
 * Slashes in page title not encoded
   * Query string for sub-resource path

 * Slashes in page title encoded
   * Regular REST path for sub-resources
   * Main disadvantage: Breaks relative links from content within the API

See the notes for details and more options.
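
As a rough sketch of the two main options, the helpers below build a URI for a sub-resource of a page whose title contains a slash. The function names, the `/api/pages` prefix, and the `Foo/Bar` title are hypothetical and chosen only for illustration.

```python
from urllib.parse import quote


def unencoded_title_uri(prefix: str, title: str, sub_resource: str) -> str:
    """Option 1: keep slashes in the title; sub-resource path goes in the query string."""
    return f"{prefix}/{quote(title, safe='/')}?{sub_resource}"


def encoded_title_uri(prefix: str, title: str, sub_resource: str) -> str:
    """Option 2: percent-encode slashes in the title; sub-resource is a regular REST path."""
    return f"{prefix}/{quote(title, safe='')}/{sub_resource}"


# Hypothetical prefix and title, used only for illustration.
option_1 = unencoded_title_uri("/api/pages", "Foo/Bar", "rev/latest/html")
option_2 = encoded_title_uri("/api/pages", "Foo/Bar", "rev/latest/html")
```

With these inputs, `option_1` is `/api/pages/Foo/Bar?rev/latest/html` and `option_2` is `/api/pages/Foo%2FBar/rev/latest/html`. Both are deterministic; they differ in where the ambiguity between title slashes and path separators is resolved.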

Relative links in stored and rendered content vs. URIs
We would like to use relative links in stored content wherever possible. Page names containing slashes complicate this a bit, as normal browser behavior is to interpret relative links relative to the page name.

The current solution used by Parsoid is to prefix relative links in a page called  with ../../. Sadly this does not work so well when content fragments from several pages are combined in one output page, for example in Flow timelines. All links in the content would need to be rewritten so that they work with a different page name. Similar issues occur when pages are renamed.

A promising alternative is to make all links relative to the wiki root, and to make this work even for pages containing slashes by setting a base href in the skin. This also avoids issues with alternate path-less entry points. Setting base href is much cheaper than rewriting all hrefs in content, and allows the combination of content fragments even where rewriting is not easily possible (ESI).
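
The difference between the two approaches can be sketched with Python's `urljoin`, which resolves relative references the way a browser does. The host `example.org`, the page title `Foo/Bar/Baz`, and the link target `Other_Page` are assumptions made up for this example.

```python
from urllib.parse import urljoin

# Hypothetical URL of a page whose title contains two slashes.
page_url = "https://example.org/wiki/Foo/Bar/Baz"

# Without a base href, a relative link resolves against the page URL,
# so it lands under the "directory" part of the page name.
plain = urljoin(page_url, "Other_Page")

# Prefix workaround: "../../" climbs back to the wiki root, here two
# levels to match the two slashes in this hypothetical title.
prefixed = urljoin(page_url, "../../Other_Page")

# With a base href pointing at the wiki root, the unprefixed link
# resolves to the intended target regardless of slashes in the title.
base_href = "https://example.org/wiki/"
with_base = urljoin(base_href, "Other_Page")
```

Here `plain` resolves to `https://example.org/wiki/Foo/Bar/Other_Page`, which is the wrong target, while `prefixed` and `with_base` both resolve to `https://example.org/wiki/Other_Page`. Only the base href variant leaves the link text independent of the page it is embedded in, which is what makes combining fragments from several pages possible.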

Strawman page-related API tunneled to Rashomon backend
Following the goal of using the same URL schema internally and externally, the page-related subresources can be made publicly available as:

GET /wiki/::1/pages/Main_Page?rev/latest/html -- returns latest html, purged on new revision / re-render

See the storage service RFC for more example URLs following the same pattern.

Strawman general content API handled by storage service backend
Example requests to public key-value buckets as mentioned in the storage service RFC:

GET /enwiki/::1/pages/Main_Page?rev/latest/html
GET /enwiki/::1/math-png/96d719730559f4399cf1ddc2ba973bbd.png
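
A minimal sketch of how a front end might split such a request path into its parts before routing it to a backend. The field names (`domain`, `marker`, `bucket`, `key`, `sub_resource`) and the function name are assumptions for illustration; in particular, the sketch passes the `::1` segment through unchanged without assuming what it means.

```python
from typing import NamedTuple, Optional


class ContentRequest(NamedTuple):
    domain: str                  # first path segment, e.g. "enwiki" (name is an assumption)
    marker: str                  # second segment, "::1" in the examples; not interpreted here
    bucket: str                  # e.g. "pages" or "math-png"
    key: str                     # item key; may itself contain unencoded slashes
    sub_resource: Optional[str]  # query-string path such as "rev/latest/html", if any


def parse_content_path(url: str) -> ContentRequest:
    """Split a strawman content URL into its components (hypothetical helper)."""
    path, _, query = url.partition("?")
    # Split at most three times so slashes inside the key are preserved.
    segments = path.lstrip("/").split("/", 3)
    if len(segments) != 4 or not all(segments):
        raise ValueError(f"unexpected content path: {url!r}")
    domain, marker, bucket, key = segments
    return ContentRequest(domain, marker, bucket, key, query or None)


page = parse_content_path("/enwiki/::1/pages/Main_Page?rev/latest/html")
image = parse_content_path("/enwiki/::1/math-png/96d719730559f4399cf1ddc2ba973bbd.png")
```

For the first request, `page.bucket` is `pages`, `page.key` is `Main_Page`, and `page.sub_resource` is `rev/latest/html`. For the second, the bucket is `math-png` and there is no sub-resource. Because the key is taken as everything after the bucket segment, a title with unencoded slashes stays intact.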