Flow/Architecture/Memcache

In addition to sharding the database, Flow will take full advantage of the memcache infrastructure available. We should also take Redis into consideration. Both caches have their benefits and we can likely benefit from both. For most general key/value operations I suggest using memcache, but there are operations that could benefit greatly from structures in Redis, like the sorted sets implementation[1].

What can we cache?
Everything. The reverse-proxy cluster that serves most of the wiki content to visitors does not apply to editors. To provide editors the responsiveness they deserve we must aggressively cache within the application. To this end it may be useful to add a small extension to the current profiling (or hopefully it already exists and just needs to be configured?) to make it obvious how many queries a web page runs specifically against the Flow database shards, rather than the full gamut.
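One possible shape for that profiling hook, sketched here as a thin wrapper that counts queries per shard group. All names (`CountingConnection`, the `"flow"` group label) are illustrative assumptions, not MediaWiki's actual profiling API:

```python
# Hypothetical sketch: wrap a database connection so that queries against the
# Flow shards are counted separately from other database traffic, letting a
# page profile report the two numbers side by side.
class CountingConnection:
    def __init__(self, conn, counters, shard_group):
        self._conn = conn          # the real DB connection being wrapped
        self._counters = counters  # shared dict: shard group -> query count
        self._group = shard_group  # e.g. "flow" vs. "core"

    def query(self, sql, params=()):
        self._counters[self._group] = self._counters.get(self._group, 0) + 1
        return self._conn.query(sql, params)
```

A per-request `counters` dict then makes it obvious at a glance how many queries a page issued against the Flow shards.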

Reads
All reads in Flow must hit the memcache infrastructure first. All queries must be answerable by a key/value store. By providing that guarantee we can utilize CAS transactions within memcache to read/verify/write data back to memcache, ensuring sequential access. This scheme does not fully protect against race conditions involving multiple keys. In scenarios where that is a problem we can utilize Redis along with Lua scripts to atomically adjust multiple keys (although that only works against a single Redis instance, so the keys involved must be sharded to the same instance; the current production configuration involves 16 servers, each running memcache and Redis).
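The read/verify/write loop looks roughly like this. `FakeMemcache` is a minimal in-memory stand-in for a memcached client's `gets`/`cas` operations, purely for illustration; `append_id` and the retry count are assumptions, not Flow code:

```python
import itertools

# In-memory stand-in for a memcached client exposing gets()/add()/cas(),
# just to make the CAS loop below runnable. Not a real client.
class FakeMemcache:
    def __init__(self):
        self._data = {}                 # key -> (value, cas token)
        self._tokens = itertools.count(1)

    def gets(self, key):
        v = self._data.get(key)
        return (None, None) if v is None else v

    def add(self, key, value):
        if key in self._data:
            return False
        self._data[key] = (value, next(self._tokens))
        return True

    def cas(self, key, value, token):
        # Succeeds only if nobody wrote the key since our gets().
        if key not in self._data or self._data[key][1] != token:
            return False
        self._data[key] = (value, next(self._tokens))
        return True

def append_id(cache, key, new_id, retries=10):
    """Read a list of ids, append one, and write it back under CAS."""
    for _ in range(retries):
        ids, token = cache.gets(key)
        if token is None:
            if cache.add(key, [new_id]):
                return True
            continue                     # lost the race to add; re-read
        if cache.cas(key, ids + [new_id], token):
            return True                  # no interleaved write; value stuck
    return False                         # give up; caller rebuilds from the DB
```

If the CAS fails, another writer got in between our read and write, so we simply re-read and retry; this is the "sequential access" guarantee for a single key.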

There must still be one source of truth within the memcache cluster. Basically that means that for each database id there should be one matching key in memcache containing its row content. Other query answers should be stored as an id, a list of ids, or perhaps some other structure that still contains ids. There are likely exceptions to this.
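That layout can be sketched as follows. The key formats and function names are illustrative assumptions, not Flow's actual scheme:

```python
# "One source of truth": each row lives in memcache exactly once under a key
# derived from its database id; cached query answers hold only ids that point
# back at those row keys, never copies of the row content itself.
def row_key(row_id):
    return "flow:row:%s" % row_id

def query_key(name, *args):
    return "flow:query:%s:%s" % (name, ":".join(str(a) for a in args))

def load_topic_posts(cache, topic_id):
    """Resolve a cached query answer (a list of ids) into full rows."""
    ids = cache.get(query_key("topic-posts", topic_id)) or []
    # Each id dereferences to the single authoritative cached copy of the row.
    return [cache.get(row_key(i)) for i in ids]
```

Because every cached answer dereferences through the row key, updating a row in one place updates it for every query that mentions it.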

We must strive to fetch multiple keys in single requests. Heavy usage of memcached could mean quite a few round trips. We should minimize those where possible, when it doesn't overly complicate the code. In that vein, ops has recently enabled twemproxy on the foundation web servers. All memcached requests will go through twemproxy, which multiplexes many client connections onto one or a few server connections. This setup makes it ideal for pipelining requests and responses, and hence for saving on round trip time.
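The difference batching makes can be sketched with a stand-in cache that counts round trips. `FakeCache` and the helper functions are illustrative, not a real client:

```python
# get_multi fetches N rows in one round trip where per-key get() takes N;
# with twemproxy multiplexing connections, that single batched request also
# pipelines efficiently on the shared server connection.
class FakeCache:
    def __init__(self, data):
        self._data = data
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self._data.get(key)

    def get_multi(self, keys):
        self.round_trips += 1
        return {k: self._data[k] for k in keys if k in self._data}

def load_rows_naive(cache, ids):
    # One round trip per row.
    return [cache.get("flow:row:%d" % i) for i in ids]

def load_rows_batched(cache, ids):
    # All rows in a single round trip.
    found = cache.get_multi(["flow:row:%d" % i for i in ids])
    return [found.get("flow:row:%d" % i) for i in ids]
```

Twenty rows cost twenty round trips the naive way and one round trip batched; the code-complexity tradeoff is deciding how far upstream to gather the ids before issuing the multi-get.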

Writes
For the most part wikis don't delete anything. Even things that are deleted are only really hidden from view, except in specific cases. This greatly simplifies our task of caching: we mostly need to cache data that is guaranteed not to change. We must tailor our data model ideas to a write-once scenario. For the small bits of data that can be changed we need to use CAS to read the value from memcache, update it as necessary, and write it back to memcache.

We must write to memcached before we write to the database. Writes to memcache are fast, and we are already reading everything from memcache, so writing to the database first would only provide more time for stale data to be read and for user actions to be performed against that stale data. By writing to memcache not only when a key is requested and not found, but by pre-filling it with all appropriate data, we minimize the time stale data exists and provide editors with a very responsive backend.
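The ordering can be sketched as below; `save_post` and the key format are illustrative assumptions rather than Flow's actual write path:

```python
# Cache first, database second: since readers always check memcache first,
# the new data is visible the instant cache.set() returns, and the durable
# database copy lands afterwards.
def save_post(cache, db, post):
    key = "flow:row:%s" % post["id"]
    cache.set(key, post)           # visible to all readers from this point
    db.insert("flow_post", post)   # durable copy lands afterwards
```

The open question this glosses over is what happens when the database insert fails after the cache write succeeded, which is one of the difficult questions deferred below.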

So how does this actually work?
Yeah, I'm not sure either. There are many difficult questions glossed over here, which I'm sure you're thinking about right now. We'll tackle them as we see them.

Redis Sorted Sets
The foundation has recently added Redis to the set of servers powering the WMF. Redis has a structure called a sorted set. Each member of a sorted set has a value and a score. Scores are numbers (double precision floating point); possible scores include a timestamp or the row number within a query result. The value can be any string; ids are useful.

The set can be queried in a variety of ways. For example, a query that only displays 20 items could fetch 100 from the database and store them in the sorted set. The next requested page can then be retrieved by issuing 'ZRANGE myzset 20 39', which has favorable big-O complexity and is fairly memory efficient.
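The pagination above can be modelled with a plain Python list standing in for the Redis sorted set; the key name and page size are just the example's:

```python
# With 20 items per page, page n corresponds to
#   ZRANGE myzset n*20 n*20+19
# (ZRANGE start and stop are inclusive rank indices).
PAGE = 20

def zrange(members, start, stop):
    """members: (score, value) pairs; returns values for ranks start..stop."""
    return [value for _, value in sorted(members)[start:stop + 1]]

# A query fetched 100 rows from the database and cached (timestamp, id) pairs.
cached = [(1000 + i, "id-%d" % i) for i in range(100)]
page2 = zrange(cached, PAGE, 2 * PAGE - 1)   # like: ZRANGE myzset 20 39
```

So five pages are served from one database query, and only a page miss past rank 99 forces another trip to the database.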

In a sample test I created 100k sorted sets, each with 100 members. The scores were 64 bit integers, and the stored values were also 64 bit integers. This utilized 240M VIRT and 198M RSS, which gives us a density of perhaps 400k users, each with 100 Flow UUIDs sorted by time, in 1GB of memory.
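A back-of-the-envelope check of those numbers:

```python
# 100k sorted sets of 100 members in ~198M RSS works out to roughly 2 KB per
# set, so 1 GB of memory holds about 500k such sets; "perhaps 400k" leaves
# headroom for per-member overhead growing with real UUID-sized values.
rss_bytes = 198 * 1024 * 1024
sets = 100_000
per_set = rss_bytes / sets        # ~2076 bytes per 100-member set
capacity = (1024 ** 3) / per_set  # ~517k sets per GB
```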

Redis Lua
Among its many features, Redis embeds a Lua interpreter on the server side. It is possible that we could store text representations of the discussion trees in Redis and perform manipulations, like adding a reply, in Lua to provide atomicity to updates without retries. This may also be a premature optimization and will not be used initially.