RESTBase/StorageDesign/Short longterm

Current (low latency access)

 * 1) Storage of current revisions (the most up-to-date render of the most current revision);
 * 2) Resilience in the face of non-linearized writes; precedence is defined by revision ID and render timestamp, not write time
 * 3) Storage of past revisions for (at least) a TTL period after they have been superseded by something newer (aka recent)
 * 4) Storage of arbitrarily old revisions (on request), for (at least) a TTL period from the time of the request (aka historical)
 * 5) 50p read latencies of 5ms, 99p of <100ms

Archive

 * 1) 99p read latencies < 200ms.

Recency
One of the requirements is for a window of recently superseded values: values must be preserved for a predefined period after they have been replaced by something newer. This sliding window of recent history is needed to support application concurrency (see: multiversion concurrency control (MVCC)). An example use-case is Visual Editor: a user begins an edit by retrieving the HTML of the most recent revision of a document, Rn. While they are working on their edit, another user commits a change, making the most recent revision Rn+1. The first user subsequently attempts to commit their change, requiring the Parsoid metadata for Rn, despite it having now been superseded by Rn+1.
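The recency window and the precedence rule (revision ID over write time) can be sketched as follows. This is a minimal in-memory illustration, not RESTBase code; the class, method names, and the 24h retention value are all assumptions for the sake of the example.

```python
import time

RETENTION = 86400  # seconds a superseded value remains readable (assumed 24h)

class RecentStore:
    """Keeps the current value plus recently superseded ones (MVCC window)."""

    def __init__(self):
        self._versions = {}  # key -> {rev_id: (value, superseded_at or None)}

    def put(self, key, rev_id, value, now=None):
        now = time.time() if now is None else now
        versions = self._versions.setdefault(key, {})
        latest = max(versions, default=-1)
        if rev_id >= latest:
            if rev_id > latest and latest >= 0:
                # The old latest is superseded now; keep it for RETENTION.
                old_value, _ = versions[latest]
                versions[latest] = (old_value, now)
            versions[rev_id] = (value, None)
        else:
            # Non-linearized write: an older revision arriving late is
            # already superseded, regardless of its (later) write time.
            versions[rev_id] = (value, now)

    def get(self, key, rev_id=None, now=None):
        now = time.time() if now is None else now
        versions = self._versions.get(key, {})
        if rev_id is None:
            rev_id = max(versions, default=None)
        if rev_id not in versions:
            return None
        value, superseded_at = versions[rev_id]
        if superseded_at is not None and now - superseded_at > RETENTION:
            return None  # fell out of the recency window
        return value
```

In the Visual Editor scenario above: after `put(page, n, ...)` and `put(page, n+1, ...)`, a `get(page, rev_id=n)` still succeeds until the retention period elapses.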

An open question remains regarding the latency requirements for recent data. For example: should access be comparable to that of current? Is the aggregate of current and a secondary lookup of comparable latency acceptable (2x current)? Is the aggregate of current and a secondary lookup of archive storage acceptable (current + (10x current))?

Short term
In the short term, this approach uses two tables per logical table, plus two shared timeline tables. Both per-logical tables are identical to the current schema (and can optionally reuse the current table); they differ only in the TTL table having a default time-to-live configured. The two shared timeline tables index revisions and renders by the time of their replacement.

Writes (updates)
Not all of the steps are required on every write: renders can only be deleted when old_rev == new_rev, and revisions only otherwise, so at most 2 delete queries are issued.

Read the previous value:
 * 1) Read old latest value from current_and_recent

Once per render / revision:
 * 1) Update the revision timeline table
 * 2) Update the render timeline table

Once per logical table write:
 * 1) Write new value to current_and_recent
 * 2) If old latest value has revision id > new revision id: write new value to TTL table (unlikely to happen)

Asynchronous (and potentially probabilistic) retention policy update:
 * 1) Select expired revisions from the revision_timeline table
 * 2) Select expired renders from the render_timeline table
 * 3) Range delete revisions from current_and_recent
 * 4) Range delete renders from current_and_recent
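The write path above can be sketched as follows. This is an illustrative in-memory model, not a Cassandra/RESTBase API: the `Table` stand-in, the function names, and the TTL and GC-probability values are all assumptions, and the timeline bookkeeping is simplified to one entry per superseded item.

```python
import random
import time

TTL = 86400    # assumed retention period for superseded values (24h)
GC_PROB = 0.1  # probability of running the retention update on a write

class Table:
    """Minimal in-memory stand-in for a Cassandra table (illustrative)."""
    def __init__(self):
        self.rows = []

    def insert(self, **row):
        self.rows.append(row)

def write_render(tables, key, rev, render_ts, value, now=None,
                 rng=random.random):
    now = time.time() if now is None else now

    # Read the previous value: old latest from current_and_recent
    mine = [r for r in tables["current_and_recent"].rows if r["key"] == key]
    old = max(mine, key=lambda r: (r["rev"], r["render_ts"]), default=None)

    # Once per render / revision: update the shared timeline tables
    # (simplified here to one entry per item being superseded)
    if old is not None and old["rev"] < rev:
        tables["revision_timeline"].insert(key=key, rev=old["rev"],
                                           replaced_at=now)
    elif old is not None and old["rev"] == rev:
        tables["render_timeline"].insert(key=key, rev=rev,
                                         render_ts=old["render_ts"],
                                         replaced_at=now)

    # Once per logical table write
    tables["current_and_recent"].insert(key=key, rev=rev,
                                        render_ts=render_ts, value=value)
    if old is not None and old["rev"] > rev:
        # Non-linearized write of an older revision (unlikely): stash in TTL
        tables["ttl"].insert(key=key, rev=rev, render_ts=render_ts,
                             value=value)

    # Asynchronous (and potentially probabilistic) retention policy update;
    # at most two range deletes per pass (renders when old_rev == new_rev,
    # revisions otherwise)
    if rng() < GC_PROB:
        cutoff = now - TTL
        dead_revs = {r["rev"] for r in tables["revision_timeline"].rows
                     if r["key"] == key and r["replaced_at"] < cutoff}
        dead_renders = {(r["rev"], r["render_ts"])
                        for r in tables["render_timeline"].rows
                        if r["key"] == key and r["replaced_at"] < cutoff}
        tables["current_and_recent"].rows = [
            r for r in tables["current_and_recent"].rows
            if r["key"] != key or (r["rev"] not in dead_revs and
                                   (r["rev"], r["render_ts"]) not in dead_renders)]
```

Passing `rng` explicitly makes the probabilistic GC deterministic in tests; a real implementation would issue the deletes as CQL range deletes rather than list rewrites.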

Reads

 * 1) The current_and_recent table is consulted
 * 2) On a miss, for a request for a specific revision / render, the TTL table is consulted
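The fall-through logic can be sketched as below; plain lists of `(key, rev, render_ts, value)` tuples stand in for the two tables, and the function name is illustrative.

```python
def read(current_and_recent, ttl, key, rev=None, render_ts=None):
    """Consult current_and_recent first; only a request for a specific
    revision / render falls through to the TTL table on a miss."""
    for table in (current_and_recent, ttl):
        hits = [(r, t, v) for (k, r, t, v) in table
                if k == key
                and (rev is None or r == rev)
                and (render_ts is None or t == render_ts)]
        if hits:
            # Precedence: highest revision id, then newest render timestamp
            return max(hits)[2]
        if rev is None:
            return None  # requests for the latest never consult the TTL table
    return None
```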

Pros

 * Slightly lower latency for recent render cache misses: Requests for recent renders can always be handled by the current_and_recent table, and don't fall back to the TTL table.
 * One less write per logical table than Option 2 (but 1-2 additional writes per Parsoid render; this load can be decreased with probabilistic updates, at the cost of slower GC).

Cons

 * Corner case: A by-revision or fully qualified lookup against current_and_recent can be a hit despite the record being eligible for deletion; after the successful read, a probabilistically applied range delete removes the record. The likelihood of this happening can be reduced by increasing the range delete probability (at the expense of generating more tombstones). The possibility cannot be entirely eliminated while the range delete probability is < 1.0.
 * Possible solutions:
 * Do not check the current_and_recent table for fully qualified renders?
 * Add a static column in current_and_recent indicating the currently latest revision, and if the revision received is not the latest, stash it to the TTL table?
 * Timeline indexing is specific to Parsoid revisions, and does not readily generalize to arbitrary table schemas. We would lose the retention policy abstraction.

Long term
Long term, archival storage will replace the TTL table.

Writes (updates)
Once per render / revision:
 * Update the revision timeline tables

Once per write to a logical table:
 * Write new value to current_and_recent
 * Write new value to archival storage

Asynchronous (and potentially probabilistic) retention policy update:
 * Select expired revisions from the revision_timeline table
 * Select expired renders from the render_timeline table
 * Range delete revisions from current_and_recent
 * Range delete renders from current_and_recent

Reads

 * 1) The current_and_recent table is consulted
 * 2) On a miss, for a request for a specific revision / render, archival storage is consulted

Pros

 * Lower latency for recent render cache misses: Requests for recent renders can always be handled by Cassandra storage, and don't fall back to archival storage.

Cons

 * Corner case: A by-revision or fully qualified lookup against current_and_recent can be a hit despite the record being eligible for deletion; after the successful read, a probabilistically applied range delete removes the record. The likelihood of this happening can be reduced by increasing the range delete probability (at the expense of generating more tombstones). The possibility cannot be entirely eliminated while the range delete probability is < 1.0.
 * Timeline indexing is specific to Parsoid revisions, and does not readily generalize to arbitrary table schemas. We would lose the retention policy abstraction.
 * The archival solution is expected to handle thinning & compression.

Short term
In the short term, this approach uses two tables per logical table. Both tables are identical to the current schema (and can optionally reuse the current table); they differ only in the TTL table having a default time-to-live configured.

Writes (updates)
 * 1) Read the latest render from the current table
 * 2) Batch:
 * 3) Write the value read above to the TTL table
 * 4) Write the updated render to the current table
 * 5) Write the updated render to the TTL table

Asynchronously, and potentially probabilistically:
 * Apply a range delete for previous renders of the revision (and for previous revisions, if the "latest" policy is used)
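A sketch of this write path, under stated assumptions: `current` and `ttl` are plain dicts of `key -> list of (rev, render_ts, value)` rows standing in for the two tables, the `"latest"` policy name is taken from the pros list below, and the GC probability is arbitrary. None of this is a RESTBase API.

```python
import random

GC_PROB = 0.1  # probability of applying the range delete on a given write

def update_render(current, ttl, key, rev, render_ts, value,
                  policy="default", rng=random.random):
    rows = current.setdefault(key, [])
    ttl_rows = ttl.setdefault(key, [])

    # 1) Read the latest render from the current table
    latest = max(rows, default=None)

    # 2) Batch:
    if latest is not None:
        # 3) Write the value read above to the TTL table (it is now "recent")
        ttl_rows.append(latest)
    # 4) Write the updated render to the current table
    rows.append((rev, render_ts, value))
    # 5) Write the updated render to the TTL table
    ttl_rows.append((rev, render_ts, value))

    # Asynchronously, and potentially probabilistically: range delete
    # previous renders of this revision (and previous revisions, under the
    # assumed "latest" retention policy)
    if rng() < GC_PROB:
        current[key] = [
            (r, t, v) for (r, t, v) in rows
            if (r, t) == (rev, render_ts) or (r != rev and policy != "latest")
        ]
```

Note how the batch copies the superseded value into the TTL table before the range delete can remove it from the current table; this is what avoids the race conditions with timeline updates mentioned under "Pros".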

Reads

 * 1) The current table is consulted
 * 2) On a miss, for a request for a specific revision / render, the TTL table is consulted

Pros

 * Simplicity.
 * Avoids the need to index revisions and renders by the time of their replacement
 * Avoids race conditions between timeline updates & actual render updates.
 * Generalizes to arbitrary tables and retention policies. Example: "latest" retention policy can omit archival parts.

Cons

 * One more content write than Option 1 in the common case (latest revision update).
 * Corner case: A fully qualified lookup against the current table can be a hit despite the values copied to the TTL table having since expired; after the successful read, a probabilistically applied range delete removes the record. The likelihood of this happening can be reduced by increasing the range delete probability (at the expense of generating more tombstones). The possibility cannot be entirely eliminated while the range delete probability is < 1.0.

Long term
As in Option 1, the TTL table is replaced by archival storage.

Writes (updates)
 * 1) Write the updated render to the current table
 * 2) Write the updated render to archival storage

Asynchronously, and potentially probabilistically:
 * Apply a range delete for previous renders of the revision (and for previous revisions, if the "latest" policy is used)

Reads
Note: Archival storage is expected to perform asynchronous thinning & compression activities in the background. These are expected to only affect renders older than 24 hours or so.
 * 1) The current table is consulted
 * 2) On a miss, for a request for a specific revision / render, archival storage is consulted

Pros

 * Simplicity.
 * Avoids the need to index revisions and renders by the time of their replacement.
 * Avoids race conditions between timeline updates & actual render updates.
 * Generalizes to arbitrary tables and retention policies. Example: "latest" retention policy can omit archival parts.

Cons

 * The archival solution is required to implement asynchronous thinning & compression.
 * Read latency: Requests for old revisions / renders that are not hits in Varnish will hit archival storage, which is likely to have higher latency than current revision storage.
 * Corner case: A fully qualified lookup against the current table can be a hit despite the values copied to archival storage having since expired; after the successful read, a probabilistically applied range delete removes the record. The likelihood of this happening can be reduced by increasing the range delete probability (at the expense of generating more tombstones). The possibility cannot be entirely eliminated while the range delete probability is < 1.0.