Requests for comment/Storage service

Request for comment (RFC): Storage service
Component: Services
Author(s): Gabriel Wicke
Document status: declined (see Phabricator)

RESTBase revision store prototype (previously known under the working title Rashomon). Other bucket types are still TODO.

Problem statement

In the short term, the Parsoid team needs a way to store revisioned HTML, wikitext and metadata efficiently and reliably. This information needs to be accessible through a web API for efficient retrieval of revision data. The generality of a web interface can make this storage solution available to both Node.js code like Parsoid and PHP code in MediaWiki. A part of this storage interface can also be used as a public content API, which is discussed in a separate RFC.

As we argue in the service RFC, it is also desirable to have more general storage interfaces available to MediaWiki as a whole. The revision storage service needed for Parsoid can serve as a first example of what such a more general storage interface might look like.

Goals

  • Storage backend abstraction -- let storage specialists optimize storage, and free others from having to deal with it.
  • Shared storage implementations -- reuse implementations of different storage solutions across applications
  • Extensibility -- provide extension points for content handlers like Parsoid, on-demand metadata generation etc.
  • Scalability -- easily add more boxes to handle growing load
  • Reliability -- no Single Point of Failure, cross-datacenter replication
  • API versioning -- enable evolution of APIs without breaking users unnecessarily
  • Consistency -- use essentially the same URL scheme externally and internally. Return the same content internally and externally, and make links in content work in both contexts without rewriting.

Revision storage service as a first step

While this RFC is about a more general storage service, we have been working on one part of it that we need right now: a revision storage service.

Read API

GET /v1/enwiki/page/Main_Page/rev/ -- list revisions, latest first

This returns a JSON listing of revisions. Long lists can be retrieved with a paging URI.

Each revision can have multiple timestamped properties. The very latest properties of this page can be retrieved this way:

GET /v1/enwiki/page/Main_Page/rev/latest/ -- list properties of latest revision, cached & purged

Properties include:

  • wikitext: a single wikitext entry per revision
  • html: potentially several HTML entries per revision. Each re-render on template or image update is stored with a timestamp. The timestamp lets us efficiently retrieve the page as it looked at time X in the past, including templates etc at the time.
  • meta: JSON page metadata like categories, global behavior switches etc; split into a static and a dynamic (template-generated) part.
  • parsoid: Parsoid round-trip information
  • <some other key>: arbitrary metadata added by extensions (blame maps, annotations etc) (See the Element ID page.)

For example, this will retrieve the latest HTML for [[Main Page]]:

GET /v1/enwiki/page/Main_Page/rev/latest/html -- returns latest html, cached & purged
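
As an illustration, a client such as Parsoid could retrieve this resource with a plain HTTP GET. A minimal Node.js sketch follows (Node 18+ for the built-in fetch; the host restbase.example.org is a placeholder, not part of this RFC; the title is percent-encoded here, one of the options discussed in the URL layout section below):

// Minimal sketch: fetch the latest stored HTML for a page.
// restbase.example.org is a placeholder host.
async function getLatestHtml(domain, title) {
    const uri = `http://restbase.example.org/v1/${domain}/page/` +
        `${encodeURIComponent(title)}/rev/latest/html`;
    const res = await fetch(uri);
    if (!res.ok) {
        throw new Error(`storage service returned HTTP ${res.status}`);
    }
    return res.text();
}

getLatestHtml('enwiki', 'Main_Page').then(html => console.log(html.length));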

Several kinds of revision identifiers are supported. One is the familiar MediaWiki revision ID, an integer:

GET /v1/enwiki/page/Main_Page/rev/12345/ -- list properties for this MediaWiki revision

The newest HTML for a given MediaWiki revision can be retrieved with this redirect:

GET /v1/enwiki/page/Main_Page/rev/12345/html -- redirects to latest html timeuuid URI

A point-in-time query can be performed by directly passing a timestamp instead:

GET /v1/enwiki/page/Main_Page/rev/2013-02-23T22:23:24Z/html
    find the revision as it was at time X; not cacheable, redirects to the timeuuid URI

Internally, all entries are stored with a timeuuid. This lets us store several timestamp-ordered properties per logical MediaWiki version. It also makes point-in-time queries efficient.

GET /v1/enwiki/page/Main_Page/rev/8f545ba0-2601-11e3-885c-4160918f0fb9/html
    stable revision snapshot identified by Type 1 UUID. Immutable apart from HTML spec updates 
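
The ordering property can be made concrete with a short sketch that recovers the embedded timestamp from a type 1 UUID. This is the standard RFC 4122 bit layout, not RESTBase-specific code:

// Sketch: recover the millisecond timestamp embedded in a type 1 UUID.
// A v1 UUID timestamp counts 100 ns intervals since 1582-10-15 (RFC 4122).
function timeuuidToDate(uuid) {
    const [low, mid, hi] = uuid.split('-'); // time_low, time_mid, time_hi_and_version
    const ticks = ((BigInt('0x' + hi) & 0x0fffn) << 48n) |
                  (BigInt('0x' + mid) << 32n) |
                  BigInt('0x' + low);
    const GREGORIAN_OFFSET_MS = 12219292800000n; // 1582-10-15 to 1970-01-01
    return new Date(Number(ticks / 10000n - GREGORIAN_OFFSET_MS));
}

timeuuidToDate('8f545ba0-2601-11e3-885c-4160918f0fb9');
// => a Date in late September 2013; sorting by timeuuid sorts by this time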

Some details of the URI layout are currently under reconsideration, see #Resource_/_URL_layout_considerations.

Write API

POST /v1/enwiki/page/Main_Page/rev/
    Atomically create a new revision with several properties.
    Required post vars, with example values:
        _parent=Main_Page/rev/12344
            The parent revision. Returned as an x-parent header with regular GET requests, and part of the JSON history info returned at /enwiki/page/Main_Page?rev/.
        _rev=12345
            The new revision. Returned as an x-rev: Main_Page?rev/12345 header with regular GET requests. Also part of the history info.
    Optional post vars:
        _timestamp=2013-09-25T09:43:09Z
            Timestamp to use in timeuuid generation. Needed to import old revisions; should require special rights. Normal updates should use the current time.
    Typical property post vars:
        html, wikitext, parsoid
            The html, wikitext or parsoid information of this revision. All entries that are passed in are stored atomically.
        meta
            The page metadata, JSON-encoded: language links, categories etc. Divided into a static part (in page content) and a dynamic part (template-generated, can change on re-expansion).
    Returns:
        JSON status with the new timeuuid on success, a JSON error message otherwise. Implicitly purges caches.

POST /v1/enwiki/page/Main_Page/rev/8f545ba0-2601-11e3-885c-4160918f0fb9/
    Insert new (versions of) properties for the given timeuuid base revision. A new timeuuid will be generated.
    Typical property post vars:
        html, wikitext
            The html and wikitext of this revision.
        meta
            The page metadata, JSON-encoded, as above.
    Returns:
        JSON status with the new timeuuid on success, a JSON error message otherwise. Implicitly purges caches.

POST /v1/enwiki/page/Main_Page/rev/12345/
    Insert new (versions of) properties for the given revid base revision; an alternative form of the timeuuid-based update above. A new timeuuid will be generated.
    Typical property post vars:
        html, wikitext
            The html and wikitext of this revision.
        meta
            The page metadata, JSON-encoded, as above.
    Returns:
        JSON status with the new timeuuid on success, a JSON error message otherwise. Implicitly purges caches.

PUT /v1/enwiki/page/Main_Page/rev/8f545ba0-2601-11e3-885c-4160918f0fb9/html
    Destructively update a versioned property. The property needs to exist already. Example use case: updating stored HTML to the latest DOM spec. Requires elevated rights.
    Returns:
        JSON status on success, a JSON error message otherwise. Implicitly purges caches.
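
As an illustration, a save through this API from a Node.js client might look like the following sketch. The host name and values are placeholders taken from the examples above, and the exact JSON shape of the meta value is illustrative only:

// Sketch: atomically store wikitext, html and metadata for a new revision.
// Host, revision numbers and the shape of 'meta' are placeholders.
async function saveRevision() {
    const body = new URLSearchParams({
        _parent: 'Main_Page/rev/12344',
        _rev: '12345',
        wikitext: '== Intro ==\nSome wikitext.',
        html: '<h2>Intro</h2><p>Some HTML.</p>',
        meta: JSON.stringify({ static: { categories: [] }, dynamic: {} })
    });
    const res = await fetch(
        'http://restbase.example.org/v1/enwiki/page/Main_Page/rev/',
        { method: 'POST', body });
    return res.json(); // JSON status with the new timeuuid, or a JSON error
}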

Resource / URL layout considerations

The design of a URI layout involves many trade-offs, some of which are discussed in more detail in these notes. Feedback on this is more than welcome. The current thinking is summarized below:

External API entry points

/wiki/pages/Main_Page or /wiki/_api/v1/pages/Main_Page

See the notes for more options and detail.

Page sub-resources

Page-related information like revisions or metadata is most naturally represented with sub-resources. The main issue here is that page names can contain slashes. Another issue is that URIs should be deterministic so that they can be cached.

Options (compared in the sketch below):

/wiki/_api/v1/pages/Foo%2FBar/rev/latest/html
    Slashes in the page title are percent-encoded.
    Regular REST path for sub-resources.
    Disadvantage: inconsistent with normal read URIs.
/wiki/_api/v1/pages/Foo/Bar?rev/latest/html
    Slashes in the page title are left unencoded.
    The query string carries the sub-resource path.
    Disadvantage: ugly and a somewhat atypical use of the query string.
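
To make the difference concrete, here is a short sketch building both candidate URIs for a page named Foo/Bar (illustrative only; only one scheme would ultimately be chosen):

// Sketch: the two candidate URIs for the html property of page "Foo/Bar".
const title = 'Foo/Bar';

// Option 1: percent-encode slashes in the title (regular REST path).
const pathStyle = '/wiki/_api/v1/pages/' + encodeURIComponent(title) +
    '/rev/latest/html';
// => "/wiki/_api/v1/pages/Foo%2FBar/rev/latest/html"

// Option 2: leave the title unencoded; the query string carries the sub-resource.
const queryStyle = '/wiki/_api/v1/pages/' + title + '?rev/latest/html';
// => "/wiki/_api/v1/pages/Foo/Bar?rev/latest/html"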

See the notes for details and more options.

Relative links in content

We would like to use relative links in stored content wherever possible. Page names containing slashes complicate this a bit, as normal browser behavior is to interpret relative links relative to the page name.

The current solution used by Parsoid is to prefix relative links in a page called Foo/Bar/Baz with ../../. Unfortunately, this does not work well when content fragments from several pages are combined in one output page, for example in Flow timelines: all links in the content would need to be rewritten to work with a different page name. Similar issues occur when pages are renamed.

A promising alternative is to make all links relative to the wiki root (href="Foo"), and to make this work even for pages containing slashes by setting <base href="/wiki/"> in the skin. This also avoids issues with alternate path-less entry points like index.php?title=foo&.... Setting a base href is much cheaper than rewriting all hrefs in the content, and allows combining content fragments even where rewriting them is not easily possible (for example with ESI).
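
The effect can be checked with the URL resolution rules browsers apply; a quick sketch (example.org is a placeholder):

// Sketch: how a wiki-root-relative link resolves under <base href="/wiki/">,
// even from a page whose name contains slashes.
const withBase = new URL('Foo', 'https://example.org/wiki/');
console.log(withBase.pathname); // "/wiki/Foo" -- correct target

// Without the base element, the same link on page Foo/Bar/Baz resolves
// relative to the page path instead:
const withoutBase = new URL('Foo', 'https://example.org/wiki/Foo/Bar/Baz');
console.log(withoutBase.pathname); // "/wiki/Foo/Bar/Foo" -- wrong target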

Revision storage service implementation

Front-end: RESTBase

We implemented a Node.js-based HTTP service called RESTBase. This stateless server runs on each storage node and load balances backend requests across storage backend servers. The Node server processes use little CPU and can sustain thousands of requests per second. Clients can connect to any server they know about, which avoids making these servers a single point of failure or a bottleneck.
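
The following is a much-simplified sketch of that pattern, not the actual RESTBase code; backend host names and the port are placeholders:

// Much-simplified sketch of a stateless front-end: every node runs this
// server and spreads storage requests across backend servers round-robin.
const http = require('http');

const backends = [
    { host: 'backend1.example', port: 7231 },
    { host: 'backend2.example', port: 7231 }
];
let next = 0;

http.createServer((req, res) => {
    const backend = backends[next];          // round-robin backend choice
    next = (next + 1) % backends.length;
    const proxied = http.request({
        host: backend.host,
        port: backend.port,
        path: req.url,
        method: req.method,
        headers: req.headers
    }, backendRes => {
        res.writeHead(backendRes.statusCode, backendRes.headers);
        backendRes.pipe(res);
    });
    req.pipe(proxied);
}).listen(8080); // no shared state: clients may contact any node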

The current implementation is fairly basic and does not yet provide desirable features like authentication. It does, however, provide the basic revision storage functionality and lets us start storing HTML and metadata soon.

First supported backend: Cassandra

In the MediaWiki setup at the Wikimedia Foundation, the wikitext of revisions is stored in ExternalStore, a blob store based on MySQL. As a pure key-value store, ExternalStore relies on external data structures to capture revision information, for example in the MySQL revision table. This complicates storage management tasks like the grouped compression of consecutive revisions. The use of MySQL also makes it relatively difficult to make both indexing and ExternalStore highly available without a single point of failure. Finally, reads and writes of current revisions are not evenly spread across machines in the cluster, which is not ideal for performance.

After considering Riak and HBase, we investigated and tested Cassandra as an alternative backend storage solution with good results. Features of Cassandra include:

  • Symmetric DHT architecture based on the Dynamo paper, with no single point of failure
  • Local storage based on journaling and log-structured merge trees (similar to LevelDB or BigTable), with compression support. An import of an enwiki wikitext dump compresses to approximately 16% of the input text size on disk, including all index structures.
  • Scalable by adding more boxes; load is distributed automatically and all machines serve reads and writes
  • Replication support with consistency configurable per query; rack and datacenter awareness (see the sketch after this list)
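
As an example of the per-query consistency setting, here is a sketch using the Node.js cassandra-driver; contact points, keyspace and table layout are hypothetical, not the actual RESTBase schema:

// Sketch: per-query consistency with the Node.js cassandra-driver.
// Contact points, keyspace and table are hypothetical.
const cassandra = require('cassandra-driver');

const client = new cassandra.Client({
    contactPoints: ['backend1.example', 'backend2.example'],
    localDataCenter: 'dc1',
    keyspace: 'enwiki'
});

async function latestHtml(page) {
    // Require a quorum of replicas in the local datacenter for this read:
    const result = await client.execute(
        'SELECT value FROM html WHERE page = ? LIMIT 1', [page],
        { prepare: true, consistency: cassandra.types.consistencies.localQuorum });
    return result.first();
}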

Performance in write tests using three miscellaneous servers with spinning disks was around 900 revisions per second, which is well beyond production requirements. Stability of the new Cassandra 2.0 branch is on track for production use in January. Overall, this led us to choose Cassandra as the first supported storage backend. The storage service interface makes it straightforward to add or switch to different backends in the future without clients having to know about it.

Generalizing the storage service with other bucket types

In addition to a revision storage bucket as implemented by RESTBase, other types with different characteristics and features can be added. One of those types is a simple key-value store without versioning, very similar to what is proposed in the DataStore RFC. Other specialized bucket types could include a key-value store with support for range queries, counters, queues or time series.

Create a simple blob bucket:

PUT /v1/enwiki/math-png
Content-type: application/json; spec=mediawiki.org/specs/bucket/1.0

{"type": "blob"}

Get bucket properties:

GET /v1/enwiki/math-png
Content-type: application/json; spec=mediawiki.org/specs/bucket/1.0
{"type": "blob"}

Add an entry to a bucket:

PUT /v1/enwiki/math-png/96d719730559f4399cf1ddc2ba973bbd.png
Content-type: image/png

Fetch the image back:

GET /v1/enwiki/math-png/96d719730559f4399cf1ddc2ba973bbd.png

List bucket contents:

GET /v1/enwiki/math-png/ -- returns a JSON list of 50 or so entries in random order, plus a paging URL
=>
Content-type: application/json; spec=mediawiki.org/specs/bucketlisting/1.0

{ .. }
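
From a Node.js client, the same blob round trip might look like this sketch (Node 18+; restbase.example.org and the local file name are placeholders):

// Sketch: store a PNG in the math-png bucket and fetch it back.
const { readFile } = require('fs/promises');

async function roundTrip() {
    const uri = 'http://restbase.example.org/v1/enwiki/math-png/' +
        '96d719730559f4399cf1ddc2ba973bbd.png';

    // PUT the image into the bucket with its content-type...
    await fetch(uri, {
        method: 'PUT',
        headers: { 'content-type': 'image/png' },
        body: await readFile('formula.png')
    });

    // ...then fetch it back.
    const res = await fetch(uri);
    return Buffer.from(await res.arrayBuffer());
}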

Similarly, other bucket types can be created. An example for a bucket type that supports efficient range and inexact-match queries on byte-string keys:

// Create an ordered blob bucket
PUT /v1/enwiki/timeseries
Content-Type: application/json; spec=mediawiki.org/specs/bucket/1.0
{ "type": "ordered-blob" }
// Add an entry
PUT /v1/enwiki/timeseries/2012-03-12T22:30:23.56Z-something
// Get a list of entries matching an inequality
GET /v1/enwiki/timeseries/?lt=2012-04&limit=1
// range query
GET /v1/enwiki/timeseries/?gt=2012-02&lt=2012-04&limit=50

Another example, this time using a counter bucket:

// Create a counter bucket
PUT /v1/enwiki/views
Content-Type: application/json; spec=mediawiki.org/specs/bucket/1.0
{ "type": "counter" }
// Read the current count
GET /v1/enwiki/views/pages
// Increment the counter, optionally with an increment parameter
POST /v1/enwiki/views/pages

Notes:

  • Access rights and content-types can be configured per bucket. Entries in public buckets are directly accessible to users who can read regular page content through the public web API: GET /v1/enwiki/math-png/96d719730559f4399cf1ddc2ba973bbd.png
  • Paging through all keys in a bucket is possible with most backends, but is not terribly efficient.
  • The ordered-blob type can be implemented with secondary indexes or backend-maintained extra index tables.

API versioning

We support two different mechanisms for API versioning:

  1. A coarse global API version as in /v1/. This is only incremented when the URI layout or high-level API style changes completely. Everything below /v1/ might work differently after an increment.
  2. Fine-grained per-resource spec negotiation using the Content-Type. We opt to use a content-type parameter ('spec') for versioning instead of the main content-type itself: Content-Type: application/json; spec=mediawiki.org/specs/bucket/1.0
    The main reasons for this choice are:
    • The main content type works as-is in browsers and clients
    • The spec URI is self-describing and provides easy access to the actual specification
    • Varnish processing is relatively simple and generic:
      • We use Vary: Accept
      • Any incoming Accept header without a spec parameter starting with mediawiki.org/specs/ is mapped to the empty string; headers with such a parameter are order-normalized and then forwarded (a sketch of this rule follows below).
    • The content is also available without a specific Accept header, in which case the latest spec version of the resource is returned.
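
Here is a sketch of that normalization rule, written in JavaScript for illustration (the production implementation would live in Varnish VCL):

// Sketch of the cache normalization rule described above.
function normalizeAccept(accept) {
    // Accept headers that carry no mediawiki.org spec parameter are collapsed
    // to the empty string so that the cache does not fragment on them.
    if (!accept || !/spec="?mediawiki\.org\/specs\//.test(accept)) {
        return '';
    }
    // Headers with a spec parameter are order-normalized so that equivalent
    // headers map to a single cache object, then forwarded.
    return accept.split(',').map(part => part.trim()).sort().join(', ');
}

normalizeAccept('text/html;q=0.9, application/json'); // => ''
normalizeAccept('application/json; spec=mediawiki.org/specs/bucket/1.0');
// => 'application/json; spec=mediawiki.org/specs/bucket/1.0'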
