RESTBase/StorageDesign/Short longterm

Current (low latency access)

 * 1) Storage of current revisions (the most up-to-date render of the most current revision)
 * 2) Resilience in the face of non-linearized writes; precedence is defined by revision ID and render timestamp, not write time
 * 3) Storage of past revisions for (at least) a TTL period after they have been superseded by something newer (aka recent)
 * 4) Storage of arbitrarily old revisions (on request) for (at least) a TTL period from the time of the request (aka historical)
 * 5) 50p read latencies of 5ms, 99p of <100ms
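Requirement 2 implies that writes may arrive out of order, so "newest" must be computed from the data itself rather than from arrival time. A minimal sketch of that precedence rule (the names are illustrative, not the RESTBase schema):

```python
# Precedence among non-linearized writes is decided by (revision ID,
# render timestamp), never by write/arrival order.
from typing import NamedTuple

class Render(NamedTuple):
    revision: int     # MediaWiki revision ID
    render_ts: float  # timestamp at which the render was produced
    body: str

def newer(a: Render, b: Render) -> Render:
    """Return whichever render takes precedence."""
    return a if (a.revision, a.render_ts) >= (b.revision, b.render_ts) else b

def apply_writes(writes: list[Render]) -> Render:
    """Fold writes in *arrival* order; the result is independent of that order."""
    current = writes[0]
    for w in writes[1:]:
        current = newer(current, w)
    return current
```

Because precedence is a total order on `(revision, render_ts)`, any interleaving of the same writes converges on the same "current" value.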

Archive

 * 1) 99p read latencies < 200ms.

Recency
One of the requirements is for a window of recently superseded values: values must be preserved for a predefined period after they have been replaced by something newer. This sliding window of recent history is needed to support application concurrency (see: multiversion concurrency control (MVCC)). An example use-case is Visual Editor: a user begins an edit by retrieving the HTML of the most recent revision of a document, Rn. While they are working on their edit, another user commits a change, making the most recent revision Rn+1. The first user subsequently attempts to commit their change, requiring the Parsoid meta-data for Rn, despite it having now been superseded by Rn+1.
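The sliding window can be modelled in a few lines. `RECENCY_TTL` and the class name below are assumptions for illustration, not the production configuration:

```python
# Toy model of the recency window: a superseded render stays readable for
# RECENCY_TTL seconds after it is replaced, so an editor who started from
# revision Rn can still fetch Rn's data after Rn+1 has landed.
RECENCY_TTL = 86400  # hypothetical retention window, in seconds

class RecentStore:
    def __init__(self):
        self._values = {}      # revision -> value
        self._superseded = {}  # revision -> time at which it was replaced

    def write(self, revision, value, now):
        # Mark every older revision as superseded as of this write
        for rev in self._values:
            if rev < revision and rev not in self._superseded:
                self._superseded[rev] = now
        self._values[revision] = value

    def read(self, revision, now):
        replaced = self._superseded.get(revision)
        if replaced is not None and now - replaced > RECENCY_TTL:
            return None  # aged out of the recency window
        return self._values.get(revision)
```

In the Visual Editor scenario, the first user's read of Rn succeeds for up to `RECENCY_TTL` seconds after Rn+1 is committed.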

An open question remains regarding the latency requirements for recent data. For example: Should access be comparable to that of current? Is the aggregate of current and a secondary lookup of comparable latency acceptable (2x current)? Is the aggregate of current and a secondary lookup of archive storage acceptable (current + (10x current))?

Option 1

Short term
In the short term, this approach uses two tables per logical table, plus two shared timeline tables (revision_timeline and render_timeline). Both per-logical tables are identical to the current schema (and can optionally reuse the current table); they differ only in that the TTL table has a default time-to-live configured.

Updates on page edit

Update the shared timeline tables:
 * 1) Update the revision timeline table
 * 2) Update the render timeline table

Write to a logical table:
 * 1) Read the old latest value from current_and_recent
 * 2) Write the new value to current_and_recent
 * 3) If the old latest value has a revision ID greater than the new revision ID, write the new value to the TTL table

Asynchronous (and potentially probabilistic) retention policy update:
 * 1) Select expired revisions from the revision_timeline table
 * 2) Select expired renders from the render_timeline table
 * 3) Range delete expired revisions from current_and_recent
 * 4) Range delete expired renders from current_and_recent
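The edit-time steps above can be sketched with an in-memory stand-in for the tables. All names are placeholders, and the retention pass consults only the revision timeline for brevity:

```python
def on_edit(tables, key, revision, render_ts, value, now):
    """One page edit, following the steps above (in-memory model)."""
    # Shared timeline tables: record when this revision/render was written
    tables["revision_timeline"].append((key, revision, now))
    tables["render_timeline"].append((key, revision, render_ts, now))
    # Logical table: read the old latest value, then write the new one
    page = tables["current_and_recent"].setdefault(key, {})
    old_latest = max(page) if page else None  # (revision, render_ts) key
    page[(revision, render_ts)] = value
    # Out-of-order write: the "new" value is already superseded, so it also
    # goes straight to the TTL table
    if old_latest is not None and old_latest[0] > revision:
        tables["ttl"][(key, revision, render_ts)] = value

def retention_pass(tables, window, now):
    """Asynchronous (potentially probabilistic) retention policy update."""
    expired = {(k, rev) for (k, rev, ts) in tables["revision_timeline"]
               if now - ts > window}
    for key, page in tables["current_and_recent"].items():
        latest = max(page)
        # Stands in for the Cassandra range deletes; the latest value survives
        for rev_rt in [r for r in page if (key, r[0]) in expired and r != latest]:
            del page[rev_rt]
```

Note how the out-of-order case is detected purely from revision IDs, consistent with the precedence requirement for non-linearized writes.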

Reads

 * 1) The   table is consulted
 * 2) On a miss, the   table is consulted
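The two-step read is a plain read-through with fallback; `primary` and `fallback` below are hypothetical stand-ins for the blanked-out table names:

```python
def read_with_fallback(primary, fallback, key):
    """Consult the primary table; only on a miss, consult the fallback."""
    value = primary.get(key)
    if value is None:  # miss in the first table
        value = fallback.get(key)
    return value
```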

Pros

 * Slightly lower latency for recent render cache misses: requests for recent renders can always be handled by the current_and_recent table, and don't fall back to the TTL table.
 * One fewer write per logical table than Option 2 (but 1-2 additional writes shared across tables).

Cons

 * The shared timeline introduces a requirement to synchronize updates across services. For example, a format migration in the mobile HTML output would force a re-render & render update of the underlying Parsoid HTML.
 * Corner case: A fully qualified lookup against  can be a hit despite the record being eligible for deletion. After the successful read, a probabilistically applied range delete removes the record. The likelihood of this happening can be reduced by increasing the range delete probability (at the expense of generating more tombstones), but it cannot be eliminated entirely while the range delete probability is < 1.0.
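The corner case can be made concrete: a read of a deletion-eligible row succeeds, and the follow-up range delete fires only with probability p. This is a toy model, not the production code path:

```python
import random

def read_maybe_delete(store, key, p, rng=random.random):
    """Serve a value that is eligible for deletion; with probability p,
    remove it after the read (stand-in for the probabilistic range delete)."""
    value = store.get(key)
    if value is not None and rng() < p:
        del store[key]
    return value
```

With p < 1.0 the expired row may be served several times before a delete fires; raising p shortens that tail at the cost of more tombstones.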

Long term
Long term, archival storage will replace the TTL table.

Updates on page edit

Write to a logical table:
 * Update the revision timeline tables
 * Write the new value to current_and_recent
 * Write the new value to archival storage

Asynchronous (and potentially probabilistic) retention policy update:
 * Select expired revisions from the revision_timeline table
 * Select expired renders from the render_timeline table
 * Range delete expired revisions from current_and_recent
 * Range delete expired renders from current_and_recent
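The long-term write path differs from the short-term one only in that every new value is written to archival storage instead of conditionally to a TTL table. A sketch with the same in-memory conventions (all names are placeholders):

```python
def on_edit_longterm(tables, key, revision, render_ts, value, now):
    """Long-term variant: archival storage replaces the TTL table."""
    # Shared timeline updates
    tables["revision_timeline"].append((key, revision, now))
    tables["render_timeline"].append((key, revision, render_ts, now))
    # Every new value goes to both stores, unconditionally
    tables["current_and_recent"].setdefault(key, {})[(revision, render_ts)] = value
    tables["archive"].setdefault(key, {})[(revision, render_ts)] = value
```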

Reads

 * 1) The   table is consulted
 * 2) On a miss, the   table is consulted

Pros

 * Lower latency for recent render cache misses: Requests for recent renders can always be handled by Cassandra storage, and don't fall back to archival storage.

Cons

 * The shared timeline introduces a requirement to synchronize updates across services. For example, a format migration in the mobile HTML output would force a re-render & render update of the underlying Parsoid HTML.
 * Corner case: A fully qualified lookup against  can be a hit despite the record being eligible for deletion. After the successful read, a probabilistically applied range delete removes the record. The likelihood of this happening can be reduced by increasing the range delete probability (at the expense of generating more tombstones), but it cannot be eliminated entirely while the range delete probability is < 1.0.

Option 2

Short term
In the short term, this approach uses two tables per logical table. Both tables are identical to the current schema (and can optionally reuse the current table); they differ only in that the TTL table has a default time-to-live configured. The first of the two tables uses (probabilistic) deletes of all previous revisions and/or renders on update in order to maintain a view of current versions. The second table uses Cassandra TTLs to automatically expire records, and stores recently superseded values along with any historical values that had to be generated in-line.
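The second table's behaviour can be modelled as per-record expiry; Cassandra's `default_time_to_live` table option does the equivalent server-side. A toy model:

```python
class TTLTable:
    """Records expire a fixed time after they are written."""
    def __init__(self, ttl):
        self.ttl = ttl
        self._rows = {}  # key -> (written_at, value)

    def write(self, key, value, now):
        self._rows[key] = (now, value)

    def read(self, key, now):
        row = self._rows.get(key)
        if row is None or now - row[0] > self.ttl:
            return None  # expired rows behave as if deleted
        return row[1]
```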

Writes (updates)

 * 1) Read the latest render from the   table
 * 2) Batch:
   * Write the value read above to the   table
   * Write the updated render to the   table
   * Write the updated render to the   table

Asynchronously, and potentially probabilistically:
 * Apply a range delete for previous renders of the revision (and for previous revisions, if the   policy is used)
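One plausible reading of the batched write above, with hypothetical names (`current`, `recent_ttl`) standing in for the blanked-out tables:

```python
def write_update(current, recent_ttl, page, revision, value, now):
    # 1) Read the latest render from the current-view table
    prev = current.get(page)
    # 2) Batch the writes (a logged batch in Cassandra keeps them atomic):
    if prev is not None:
        #    copy the value read above into the TTL table, preserving it
        #    for the recency window
        recent_ttl[(page, prev[0])] = (now, prev[1])
    #    write the updated render to both tables
    current[page] = (revision, value)
    recent_ttl[(page, revision)] = (now, value)
    # Overwriting current[page] stands in for the asynchronous range delete
    # of previous renders/revisions in the real schema.
```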

Reads

 * 1) The   table is consulted
 * 2) On a miss, the   table is consulted

Pros

 * Simplicity.
 * Avoids the need to index revisions and renders by the time of their replacement.
 * Avoids race conditions between timeline updates & actual render updates.

Cons

 * One more write than Option 1 in the common case (latest revision update).
 * Corner case: A fully qualified lookup against  can be a hit despite the values copied to   having since expired. After the successful read, a probabilistically applied range delete removes the record. The likelihood of this happening can be reduced by increasing the range delete probability (at the expense of generating more tombstones), but it cannot be eliminated entirely while the range delete probability is < 1.0.

Long term
As in Option 1, the recent_and_historical table is replaced by archival storage.

Writes (updates)

 * 1) Write the updated render to the   table
 * 2) Write the updated render to the   table

Asynchronously, and potentially probabilistically:
 * Apply a range delete for previous renders of the revision (and for previous revisions, if the   policy is used)

Reads
Note: Archival storage is expected to perform asynchronous thinning & compression activities in the background. These are expected to only affect renders older than 24 hours or so.
 * 1) The   table is consulted
 * 2) On a miss, the   table is consulted
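The asynchronous thinning pass might look like the following: for renders older than the 24-hour horizon, keep only the newest render of each revision. This is an assumed policy for illustration, not a specification:

```python
THINNING_HORIZON = 24 * 3600  # seconds; per the note above

def thin(archive, now):
    """archive: dict mapping (page, revision, render_ts) -> value."""
    # Find the newest render timestamp for each (page, revision)
    newest = {}
    for (page, rev, ts) in archive:
        newest[(page, rev)] = max(newest.get((page, rev), ts), ts)
    # Drop old, non-newest renders; the newest render of each revision is
    # always retained, however old it is
    for key in [k for k in archive
                if now - k[2] > THINNING_HORIZON and k[2] != newest[(k[0], k[1])]]:
        del archive[key]
```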

Pros

 * Simplicity.
 * Avoids the need to index revisions and renders by the time of their replacement.
 * Avoids race conditions between timeline updates & actual render updates.

Cons

 * Archival solution has a requirement to implement asynchronous thinning.
 * Read latency: Requests for old revisions / renders that are not hits in Varnish will hit archival storage, which is likely to have higher latency than current revision storage.
 * Corner case: A fully qualified lookup against  can be a hit despite the values copied to   having since expired. After the successful read, a probabilistically applied range delete removes the record. The likelihood of this happening can be reduced by increasing the range delete probability (at the expense of generating more tombstones), but it cannot be eliminated entirely while the range delete probability is < 1.0.