Page Previews/API Specification

Up until now, we've mostly gotten away with using the  MediaWiki API provided by TextExtracts and RESTBase to allow us to scale out Page Previews to a couple of large Wikipedias without issue.

However, the requirement that certain classes of pages be handled differently means that TextExtracts is no longer the most appropriate place to house the notion of what a page preview is. We should aim to keep TextExtracts as simple and as general as possible. It may be that we compose the  API and the new Page Preview API rather than integrating them, but this is not a goal of this work.

To be clear, the primary goal of this work is to minimise the amount of text/HTML processing in the Page Previews client: the less work the client has to do to display a preview, the better.

Intros
The Page Preview API returns well-formed HTML5 representing the introductory elements of a page, which are defined as follows (herein we'll refer to these elements as an "intro"):
 * The first paragraph from the introductory section.
 * The first ordered, unordered, or definition list that is the next sibling of the first paragraph.
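The intro definition above can be sketched as a small extraction function. This is a minimal sketch, not the API's implementation: it assumes the introductory section's HTML is well-formed enough to parse as XML (named entities such as `&nbsp;` would need handling first), it only inspects direct children of the section, and it reads "next sibling" as "next element sibling". The name `extract_intro` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The list types that may follow the first paragraph.
LIST_TAGS = {"ol", "ul", "dl"}


def extract_intro(section_html: str) -> str:
    """Return the first <p> of a section plus an immediately following list, if any.

    Assumption: the section HTML parses as XML once wrapped in a single root.
    """
    root = ET.fromstring(f"<section>{section_html}</section>")
    children = list(root)
    for i, child in enumerate(children):
        if child.tag != "p":
            continue
        intro = [child]
        # "Next sibling" is read as the next *element*; whitespace text
        # between the paragraph and the list is ignored.
        if i + 1 < len(children) and children[i + 1].tag in LIST_TAGS:
            intro.append(children[i + 1])
        for element in intro:
            element.tail = None  # don't serialise text that follows the element
        return "".join(ET.tostring(el, encoding="unicode") for el in intro)
    return ""  # no paragraph found


print(extract_intro("<p>A <b>bold</b> start.</p>\n<ul><li>one</li></ul>\n<p>Later</p>"))
```

Any inline markup inside the paragraph (such as `<b>`) is kept here; deciding which tags survive is a separate step.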

Generic intros
The notion of a "generic" preview was introduced early on in the rewrite of Page Previews (T151054).

A generic preview should be shown when Page Previews cannot generate a meaningful intro for a page, even though the page may have meaningful content.
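The spec does not define "meaningful", so the following is only one possible reading: an intro is treated as meaningful if any non-whitespace text remains once its tags are removed. Both the heuristic and the labels `"standard"` and `"generic"` are hypothetical, chosen for illustration.

```python
import re
from html import unescape

# Crude tag matcher; adequate for deciding whether *any* text is present.
TAG_RE = re.compile(r"<[^>]+>")


def has_meaningful_intro(intro_html: str) -> bool:
    """Assumed heuristic: True if visible text survives after removing tags."""
    text = unescape(TAG_RE.sub("", intro_html))
    return bool(text.strip())


def preview_kind(intro_html: str) -> str:
    # Hypothetical labels; the spec's actual type values are not given here.
    return "standard" if has_meaningful_intro(intro_html) else "generic"


print(preview_kind("<p> </p>"))
```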

Markup allowed in an intro
The Page Preview API must strip all tags from the intro, with the following exceptions.

Emphasis
The Page Preview API must retain any bolded or italicised text in the intro, i.e. the Page Preview API must not strip,  , and   tags.
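Stripping every tag except an allow-list can be sketched with Python's standard `html.parser`. This is a sketch under assumptions: the allow-list is a parameter, and the `{"b", "i"}` set in the usage line is only an illustrative example, not the spec's exact list. Text inside removed tags is kept, and attributes on retained tags are dropped.

```python
from html import escape
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Drops every tag not in `allowed`, keeping the text inside it."""

    def __init__(self, allowed):
        super().__init__(convert_charrefs=True)
        self.allowed = allowed
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.allowed:
            self.out.append(f"<{tag}>")  # attributes are deliberately dropped

    def handle_endtag(self, tag):
        if tag in self.allowed:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Character references were decoded by the parser; re-escape the text.
        self.out.append(escape(data, quote=False))


def strip_tags(html: str, allowed: set) -> str:
    parser = TagStripper(allowed)
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.out)


print(strip_tags('<p>The <b>cat</b> <a href="/x">sat</a> &amp; <i>purred</i>.</p>', {"b", "i"}))
```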

Formulae/MathML
In order to support browsers that don't support MathML, the Page Preview API:
 1. Must strip   tags; and
 2. Must not strip either the inline or block layout fallback images generated by #math while parsing the page.
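The two MathML rules can be sketched as a filter that removes a formula's MathML subtree (its tags *and* contents, since raw MathML is not readable text) while passing `<img>` fallbacks through untouched. The tag name is elided above; this sketch assumes it is MathML's `<math>` element. All other tags pass through unchanged, so a filter like this would run before any general tag stripping. The class and function names are hypothetical.

```python
from html import escape
from html.parser import HTMLParser


class MathFallbackFilter(HTMLParser):
    """Drops <math> subtrees (assumed MathML) but keeps everything else,
    including the <img> fallback images."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.math_depth = 0  # > 0 while inside a <math> element
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "math":
            self.math_depth += 1
        elif self.math_depth == 0:
            self.out.append(self.get_starttag_text())  # keep tag and attributes

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <img .../>: emit as-is, with no end tag.
        if self.math_depth == 0 and tag != "math":
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "math":
            self.math_depth = max(0, self.math_depth - 1)
        elif self.math_depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.math_depth == 0:
            self.out.append(escape(data, quote=False))


def filter_math(html: str) -> str:
    parser = MathFallbackFilter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(filter_math('<p>Let <span><math><mi>x</mi></math>'
                  '<img class="fallback" src="x.svg" alt="x"/></span> be.</p>'))
```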

Stripping of parenthetical statements
The Page Preview API must strip all content enclosed within balanced parentheses.
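Stripping balanced parentheses can be sketched with a stack of buffers: each `(` opens a buffer, a matching `)` discards it, and any groups still open at the end were unbalanced, so they are restored verbatim. This is a sketch that works on plain text; applied to HTML it would need to run on text nodes only, so that parentheses inside attributes are not touched. The function name is hypothetical.

```python
def strip_parentheticals(text: str) -> str:
    """Remove every balanced (...) group, including nested ones."""
    out = []    # characters outside any open group
    stack = []  # one buffer per "(" that has not been closed yet
    for ch in text:
        if ch == "(":
            stack.append(["("])
        elif ch == ")" and stack:
            stack.pop()  # balanced: discard the whole group
        elif stack:
            stack[-1].append(ch)
        else:
            out.append(ch)  # includes any unmatched ")"
    # Groups never closed are unbalanced, so keep their text as-is.
    for buf in stack:
        out.extend(buf)
    return "".join(out)


print(strip_parentheticals("Foo (bar (baz)) qux"))
```

Whitespace around a removed group is left alone, so `"Foo (bar) qux"` becomes `"Foo  qux"` with a double space. A real implementation would likely normalise that separately.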

Responses
A successful response from the Page Preview API must have the following properties:

An  type property, in turn, must have the following properties:

For a page in the wiki's content namespace(s)
The Page Preview API must respond with 200 OK.

The  property of the response must be set to.

For a page outside of the wiki's content namespaces
The Page Preview API must respond with 200 OK.

The  property of the response must be set to.

The  and   properties of the response must not be set.

For a disambiguation page
The Page Preview API must respond with 200 OK.

The  property of the response must be set to.

The intro property of the response should be set to the intro of the page so that the client may display it if appropriate.

For a page that doesn't exist
The Page Preview API must respond with 404 Not Found.

The response body must be empty.

For a page that redirects to another page
The Page Preview API must respond with 302 Found.

The  HTTP header must be set to the URL that will get the intro for the target page.

The response body must be empty.

For a page that doesn't have an intro section
The Page Preview API must respond with 200 OK.

The  property of the response must be set to.

The  property of the response must be set to.
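The status codes and empty bodies in the cases above can be summarised in a single dispatch function. This is a minimal sketch that assumes a hypothetical `PageKind` classification computed elsewhere. The properties of the 200 responses are left to the caller and passed through as an opaque `payload`. The redirect header name is not given above, so this sketch assumes HTTP's standard `Location` header for a 302.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional


class PageKind(Enum):
    """Hypothetical classification of the requested title."""
    CONTENT = auto()
    NON_CONTENT = auto()
    DISAMBIGUATION = auto()
    NO_INTRO = auto()
    MISSING = auto()
    REDIRECT = auto()


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)
    body: Optional[dict] = None  # None stands for "empty body"


def respond(kind: PageKind, redirect_url: Optional[str] = None,
            payload: Optional[dict] = None) -> Response:
    if kind is PageKind.MISSING:
        return Response(404)  # 404 Not Found, empty body
    if kind is PageKind.REDIRECT:
        if redirect_url is None:
            raise ValueError("a redirect needs the target page's intro URL")
        # Assumption: the header elided in the spec is HTTP's Location header.
        return Response(302, {"Location": redirect_url})  # empty body
    # Content, non-content, disambiguation and no-intro pages: 200 OK.
    return Response(200, body=payload if payload is not None else {})


print(respond(PageKind.MISSING))
```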