RESTBase/StorageDesign

Retention policies using application-level TTLs
This approach uses a schema identical to that of the current storage model, one that utilizes wide rows to model a one-to-many relationship between a title and its revisions, and a one-to-many relationship between each revision and its corresponding renders. It differs only in how it approaches retention.

Since renders are keyed on a type-1 UUID, maintaining a single current render, plus (at least) 24 hours' worth of past renders, is as simple as batching a range delete with each new render insert, using a predicate 24 hours less than the value being inserted.

Limiting revisions is more challenging, since a revision is a monotonically increasing integer without temporal context. As a result, an additional table is needed to establish this relationship, mapping timestamps to their corresponding revisions. Records in this table are keyed by domain (on the assumption that MediaWiki sharding would never be more granular than this). Updates can be performed probabilistically, if necessary, and TTLs can be applied to prevent unbounded partition growth.

Strawman Cassandra schema:
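The strawman below is a hypothetical reconstruction, not the actual RESTBase schema; all table and column names (data, revision_timeline, tid, and the 10-day TTL) are illustrative assumptions.

```cql
-- Hypothetical sketch; names and TTL values are illustrative.
-- Wide rows: one-to-many from title to revisions, and from each
-- revision to its renders (identified by a type-1 UUID).
CREATE TABLE data (
    "_domain" text,
    title     text,
    revision  int,
    tid       timeuuid,   -- type-1 UUID identifying a render
    value     blob,
    PRIMARY KEY (("_domain", title), revision, tid)
) WITH CLUSTERING ORDER BY (revision DESC, tid DESC);

-- Maps timestamps to revisions, keyed by domain only; the table
-- default TTL bounds partition growth.
CREATE TABLE revision_timeline (
    "_domain" text,
    ts        timestamp,
    revision  int,
    PRIMARY KEY (("_domain"), ts)
) WITH CLUSTERING ORDER BY (ts DESC)
   AND default_time_to_live = 864000;  -- e.g. 10 days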

Sample queries:
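Assuming the hypothetical schema sketched above, the queries might look as follows; the 24-hour cutoff timestamp is computed by the application.

```cql
-- Latest render of the latest revision:
SELECT * FROM data WHERE "_domain" = ? AND title = ? LIMIT 1;

-- Renders of a specific revision:
SELECT * FROM data WHERE "_domain" = ? AND title = ? AND revision = ?;

-- Revision current as of a given timestamp:
SELECT revision FROM revision_timeline
 WHERE "_domain" = ? AND ts <= ? LIMIT 1;

-- New render insert, batched with a range delete of renders more
-- than 24 hours older than the one being inserted:
BEGIN BATCH
  INSERT INTO data ("_domain", title, revision, tid, value)
  VALUES (?, ?, ?, now(), ?);
  DELETE FROM data
   WHERE "_domain" = ? AND title = ? AND revision = ?
     AND tid < maxTimeuuid(?);  -- ? = now minus 24 hours
APPLY BATCH;
```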

Properties of the revision-timestamp table
Only one such table is needed, serving all logical RESTBase tables (e.g. Parsoid HTML, data, and section offsets, mobileapps, etc).


 * The number of domains is on the order of 100s
 * Most domains have an edit frequency on the order of 100s per day
 * 4 domains see an edit frequency on the order of 10s of K per day
 * 2 domains see an edit frequency on the order of 100s of K per day
 * Distribution on this table will be poor, but does it matter?

Table-per-query
This approach materializes views of results using distinct tables, each corresponding to a query.

Queries

 * The most current render of the most current revision (table: )
 * The most current render of a specific revision (table: )
 * A specific render of a specific revision (table: )

Algorithm
Data in the current table must be durable, but the contents of the other two tables can be ephemeral (indeed should be, to prevent unbounded growth), lasting only for a time-to-live after the corresponding value in the current table has been superseded by something more recent. There are two ways of accomplishing this: either (a) copying the values on read, or (b) copying them on update, prior to replacing a value in the current table. Neither of these strategies is ideal.

For example, for non-VE use-cases, copy-on-read is problematic due to the write-amplification it creates (think: HTML dumps). Additionally, in order to fulfill the VE contract, the copy must be done in-line to ensure the values are there for the forthcoming save, introducing additional transaction complexity and latency. Copy-on-update over-commits by default, copying from the current table for every new render regardless of the probability it will be edited, but it happens asynchronously without impacting user requests, and can be done reliably. This proposal uses the copy-on-update approach.

Update logic pseudo-code:
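A hypothetical sketch of the copy-on-update sequence follows; the table names (current, by_revision, by_render) and the 24-hour TTL are illustrative assumptions.

```cql
-- Hypothetical copy-on-update sequence (names illustrative).
-- Performed asynchronously for each new render:

-- 1. Read the value about to be superseded from the current table.
SELECT revision, tid, value FROM current
 WHERE "_domain" = ? AND title = ?;

-- 2. If a value was present, copy it into the ephemeral tables with
--    a TTL, so it survives only briefly after being superseded.
INSERT INTO by_revision ("_domain", title, revision, tid, value)
VALUES (?, ?, ?, ?, ?) USING TTL 86400;
INSERT INTO by_render ("_domain", title, revision, tid, value)
VALUES (?, ?, ?, ?, ?) USING TTL 86400;

-- 3. Replace the value in the current table.
INSERT INTO current ("_domain", title, revision, tid, value)
VALUES (?, ?, ?, ?, ?);
```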

Option 1a
Precedence is first by revision, then by render; the current table must always return the latest render of the latest revision, even in the face of out-of-order writes. This presents a challenge for a table modeled as strictly key-value, since Cassandra conflict resolution is last-write-wins. As a workaround, this option proposes using a constant write-time, effectively disabling the database's built-in conflict resolution. Since Cassandra falls back to a lexical comparison of values when it encounters identical timestamps, a binary value encoded first with the revision and then with a type-1 UUID is used to satisfy the precedence requirements.

Strawman Cassandra schema:
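One way the 1a table might look; this is a speculative sketch, and the table name, column names, value encoding, and the particular constant timestamp are all assumptions.

```cql
-- Hypothetical sketch of the 1a current table (names illustrative).
-- Strictly key-value: one row per (domain, title).
CREATE TABLE current (
    "_domain" text,
    title     text,
    -- Encoded so that lexical comparison orders by revision, then
    -- render: big-endian revision || type-1 UUID || payload.
    value     blob,
    PRIMARY KEY (("_domain", title))
);

-- Every write uses the same constant timestamp, disabling
-- last-write-wins; conflicting writes fall back to lexical
-- comparison of the encoded values.
INSERT INTO current ("_domain", title, value)
VALUES (?, ?, ?) USING TIMESTAMP 1;
```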

Issues/Drawbacks

 * Breaks delete semantics (without distinct timestamps, tombstones do not take precedence)
 * Defeats a read optimization designed to exclude SSTables from reads (optimization relies on timestamps)
 * Defeats a compaction optimization meant to eliminate overlaps for tombstone GC (optimization relies on timestamps)
 * Is an abuse of the tie-breaker mechanism
 * Lexical value comparison is only meant as a fall-back for what is considered a rare occurrence (coincidentally identical timestamps)
 * Lexical value comparison is not part of the contract and could change in the future without warning (it has changed in the past without warning)
 * Cassandra semantics are explicitly last-write-wins; this pattern violates intended use and best practice, and is isolating in nature

Option 1b
Identical to the 1a proposal above, with the exception of how the current table is implemented; in this approach it is modeled as "wide rows", utilizing a revision-based clustering key. For any given revision, re-renders result in the render attributes being overwritten each time. To prevent unbounded growth of revisions, range deletes are batched with each insert.

Strawman Cassandra schema:
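A hypothetical sketch of the 1b variant; names are illustrative assumptions. Here revision is a clustering key, so a re-render of a given revision simply overwrites that revision's tid and value columns.

```cql
-- Hypothetical sketch of the 1b current table (names illustrative).
-- Wide rows: revision is a clustering key; re-renders overwrite the
-- tid/value columns for their revision under last-write-wins.
CREATE TABLE current (
    "_domain" text,
    title     text,
    revision  int,
    tid       timeuuid,
    value     blob,
    PRIMARY KEY (("_domain", title), revision)
) WITH CLUSTERING ORDER BY (revision DESC);
```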

Example: Batched INSERT+DELETE
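Against the sketched schema above, the batched write might look as follows (names illustrative); the range delete on a clustering column is what creates the Cassandra 3.x dependency.

```cql
-- Hypothetical batch: write the new revision and range-delete all
-- older revisions in one go (range deletes require Cassandra 3.x).
BEGIN BATCH
  INSERT INTO current ("_domain", title, revision, tid, value)
  VALUES (?, ?, ?, ?, ?);
  DELETE FROM current
   WHERE "_domain" = ? AND title = ? AND revision < ?;
APPLY BATCH;
```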

Issues/Drawbacks

 * Creates a hard dependency on Cassandra 3.x