Topic on Talk:RESTBase/StorageDesign

Update algorithms for multi-table approaches

GWicke

I was a bit surprised to see a discussion of read-before-write approaches in the multi-table section. The original intention of the multi-table designs was to avoid wide rows, avoid race conditions, and preserve eventual consistency. Read-before-write strategies that don't use idempotent writes (explicit TIMESTAMP or value ordering) sacrifice eventual consistency across datacenters. Since updates are going to be DC-local and not idempotent, a network partition between datacenters can easily lead to the wrong version being permanently considered "latest".
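A minimal Python sketch of the partition scenario, assuming Cassandra-style last-write-wins reconciliation on the write timestamp. The datacenter labels, the clock values, and the idea that DC B re-renders revision 1 during the partition are illustrative assumptions, not part of the design:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    rev: int       # revision id recorded as "latest"
    write_ts: int  # write timestamp used for last-write-wins reconciliation


def reconcile(a: Cell, b: Cell) -> Cell:
    """Cross-DC merge: the cell with the higher write timestamp wins."""
    return a if a.write_ts >= b.write_ts else b


def read_before_write(local: Cell, rev: int, now: int) -> Cell:
    """DC-local update: read "latest", then write using the local clock.

    The check only sees this DC's state, and the write timestamp is the
    time of the write rather than anything derived from the revision.
    """
    if rev >= local.rev:  # newer than, or a re-render of, the local "latest"
        return Cell(rev, now)
    return local


# Both DCs start in sync with revision 1.
dc_a = dc_b = Cell(rev=1, write_ts=0)

# During a partition, DC A stores a new edit (rev 2) at t=10 ...
dc_a = read_before_write(dc_a, rev=2, now=10)
# ... while DC B, still seeing rev 1, re-renders it at t=20 (hypothetical).
dc_b = read_before_write(dc_b, rev=1, now=20)

# The partition heals and both DCs converge on the same, wrong, answer.
healed = reconcile(dc_a, dc_b)
print(healed.rev)  # 1
```

Both DCs end up agreeing that revision 1 is "latest", because the merge compares write times rather than anything tied to the revision itself.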

I re-added idempotent writes as an update strategy option, but am wondering whether it is worth considering read-before-write strategies at all.

EEvans (WMF)

To the best of my knowledge, the semantics we are interested in require that we retain historic revisions for up to a specified TTL after they have been superseded by something newer. How do we conform to these semantics if all we do is write through on update?

For example: revision 1 is written to all 3 tables at 2018-01-01T00:00:00 with a TTL of 24 hours. A user begins an edit at 2018-01-01T23:58:00 and attempts to save at 2018-01-02T00:05:00, after the records have expired.
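The timing in this example can be checked with a few lines of Python. The sketch assumes revision 1 counts as superseded at the moment the save completes, and compares a TTL counted from the original write with one counted from supersession:

```python
from datetime import datetime, timedelta

TTL = timedelta(hours=24)

rev1_written = datetime(2018, 1, 1, 0, 0, 0)
edit_started = datetime(2018, 1, 1, 23, 58, 0)
save_attempted = datetime(2018, 1, 2, 0, 5, 0)

# TTL counted from when the row was written (write-through on update only).
expires_from_write = rev1_written + TTL
# TTL counted from when revision 1 is superseded, i.e. when the save lands.
expires_from_supersession = save_attempted + TTL

print(expires_from_write - edit_started)     # 0:02:00 left when the edit began
print(save_attempted > expires_from_write)   # True: revision 1 is already gone
print(expires_from_supersession)             # 2018-01-03 00:05:00
```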

GWicke

Idempotent writes use explicit TIMESTAMP or byte ordering to let the latest revision win in an eventually consistent manner.
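A small sketch of the two idempotent-write options, assuming reconciliation picks the highest timestamp and, on a tie, the greater value (Cassandra breaks timestamp ties by comparing values). The revision timestamps are hypothetical, and the stored value is reduced to an encoded revision id rather than a full render:

```python
import functools
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: bytes   # stored payload; here just the encoded revision id
    write_ts: int  # explicit TIMESTAMP supplied by the writer


def reconcile(a: Cell, b: Cell) -> Cell:
    # Highest timestamp wins; equal timestamps fall back to the greater value.
    return max(a, b, key=lambda c: (c.write_ts, c.value))


def encode(rev: int) -> bytes:
    # Fixed-width big-endian, so byte order matches numeric revision order.
    return rev.to_bytes(8, "big")


def explicit_ts_write(rev: int, rev_ts: int) -> Cell:
    # Option 1: TIMESTAMP derived from the revision, not from the wall clock.
    return Cell(encode(rev), rev_ts)


def value_ordered_write(rev: int) -> Cell:
    # Option 2: constant TIMESTAMP; the value tie-break picks the newest revision.
    return Cell(encode(rev), 0)


def winners(writes):
    """Reconcile the writes in every possible arrival order; collect the results."""
    return {functools.reduce(reconcile, order)
            for order in itertools.permutations(writes)}


REV_TS = {1: 100, 2: 200}  # hypothetical revision save times

# Revision 1 is written, revision 2 is written, then revision 1 is re-rendered.
for make in (lambda r: explicit_ts_write(r, REV_TS[r]), value_ordered_write):
    result = winners([make(1), make(2), make(1)])
    print(len(result), [int.from_bytes(c.value, "big") for c in result])  # 1 [2]
```

Because every write for a given revision is identical no matter when or where it is issued, each arrival order yields the same winner, so a late re-render of revision 1 cannot displace revision 2.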

EEvans (WMF)

This doesn't answer my question. If writes to the TTL tables only occur on update (when the corresponding value is also written to the current table), then those records will only last for the TTL period after they were created; any access after that is subject to a miss. IOW, if the semantics are that we keep past versions around for a period of TTL after they were superseded (which is what the current semantics are), then this will fail. If this is intentional, what are you proposing we do on such a miss? Perform an in-line request to Parsoid to re-generate and re-store the content?

GWicke

There are three cases we need to consider:

  1. Old revision is found in storage:
    1. Remaining TTL is sufficient to finish typical tasks like VE editing: do nothing; return the content.
    2. Remaining TTL is not sufficient to finish typical tasks like VE editing: rewrite the data associated with the render to extend its TTL (proposal in the original discussion).
  2. Old revision is not found in storage: render on demand; the fresh TTL will be sufficient.
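The three cases could be sketched as a read path like the one below. The TTL table is an in-memory stand-in, `render` represents an on-demand request to Parsoid, and both the 24-hour TTL and the 6-hour "enough time to finish an edit" threshold are assumed values; the discussion does not specify them:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Key = Tuple[str, int]  # (page title, revision id)

# Assumed values; the discussion gives no concrete numbers.
TTL_SECONDS = 24 * 3600
MIN_REMAINING_SECONDS = 6 * 3600  # "enough to finish a typical VE edit"


@dataclass
class StoredRender:
    content: str
    expires_at: float  # absolute expiry time in seconds


class TTLStore:
    """Toy in-memory stand-in for a TTL table keyed by (title, revision)."""

    def __init__(self) -> None:
        self.rows: Dict[Key, StoredRender] = {}

    def get(self, key: Key, now: float) -> Optional[StoredRender]:
        row = self.rows.get(key)
        if row is None or row.expires_at <= now:
            return None  # absent or expired: treated as a miss
        return row

    def put(self, key: Key, content: str, now: float) -> None:
        # Every write starts a fresh TTL, like an INSERT ... USING TTL.
        self.rows[key] = StoredRender(content, now + TTL_SECONDS)


def get_old_revision(store: TTLStore, key: Key, now: float,
                     render: Callable[[Key], str]) -> str:
    row = store.get(key, now)
    if row is not None:
        if row.expires_at - now >= MIN_REMAINING_SECONDS:
            return row.content            # case 1.1: enough TTL left, just return
        store.put(key, row.content, now)  # case 1.2: rewrite to extend the TTL
        return row.content
    content = render(key)                 # case 2: render on demand
    store.put(key, content, now)          # the new row gets a full TTL
    return content


# Usage with a hypothetical renderer that records how often it is called.
calls: List[Key] = []


def fake_render(key: Key) -> str:
    calls.append(key)
    return f"<html>{key[0]} r{key[1]}</html>"


store = TTLStore()
key = ("Main_Page", 1)
get_old_revision(store, key, now=0, render=fake_render)          # miss: render
get_old_revision(store, key, now=3600, render=fake_render)       # 23h left: serve
get_old_revision(store, key, now=20 * 3600, render=fake_render)  # 4h left: extend
print(len(calls), store.rows[key].expires_at)  # 1 158400
```

Extending the TTL only when it runs low keeps the common path read-only, while a render on demand naturally starts with a full TTL.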