Core Platform Team/Initiatives/Mainstash Multi-DC
In MediaWiki (MW) there is an abstract key-value (KV) interface called BagOStuff. Mainstash is an implementation of BagOStuff currently backed by Redis, though Mainstash itself is agnostic of the storage engine used.
In order to migrate our infrastructure to an active-active multi-DC context, we will switch the key-value store backing Mainstash from Redis to Cassandra. The major issue with Redis is that it does not support multiple DCs, as it provides no means of replicating to other DCs.
The core work here is determining how the routing of the data accessed via Mainstash should be changed from Redis to Cassandra. The configuration should be changed to point to an entity that is multi-DC capable.
There are at least 3 options:
- Access Cassandra directly from PHP. The Cassandra driver for PHP is just a wrapper around the C++ driver, which means deploying and maintaining native C++ code. While there is a minor penalty to having a service atop Cassandra, the unknown is PHP's connection management: we could either be creating a new connection to Cassandra for every request, or we would have to maintain a pool of connections that are constantly open to Cassandra. In the latter case, this is paradigmatically the same as having a service atop Cassandra. Equally, Cassandra connections can be persisted.
- Kask. There would need to be some small changes in Kask - for example, Kask should be able to accept a TTL - but in terms of developer resources the changes are relatively trivial overall.
- Use the current RestBase-Cassandra infrastructure, via the API exposed by the to-be-built RestBase storage service that will exist after the RestBase split has been completed.
- Significance and Motivation
In the future, our infrastructure will migrate to an active-active DC configuration and this project is in service of that initiative.
Previously, when switching DCs, it was necessary for Mainstash to place MediaWiki in Read Only mode, copy over the contents found in Redis, and finally disable Read Only mode. This is impractical because (a) copying and moving the content can take some time, and (b) in the case of an emergency where a switch is needed, all the caches are lost, which causes dramatic performance degradation. This is true for Multi-DC as a whole but also pertains to this specific case.
The original purpose of Mainstash was for session data; however, its uses have grown and it is now used by many applications as a general-purpose KV cache. Though each entity manages its own keys, there is no clear separation within Mainstash: nothing prevents one application from reading or writing any key that is stored within. At present there is a convention that keys are prefixed with the name of the owning entity or extension.
Over 90% of the data within Mainstash is currently Echo notifications. There is a caveat: while 90% of keys are Echo-related, that doesn't mean that other usages are negligible. Echo dominates Redis because it uses it as permanent storage, so at any given point in time it will have the highest number of keys. Other usages still make heavy use of Redis, but they expire keys on a regular basis.
Echo uses Mainstash in an unusual way: it treats it in part as a source of truth, which is inconsistent with the behaviour of Redis, since Redis evicts keys. In some cases on MW sites, notifications that have been read are not correctly marked as read because the data stored in Redis corresponding to the read is lost. Last-seen data is stored in Mainstash, possibly in part for performance reasons.
Continuous replication is made unreliable under Redis by both its implementation of TTL and its internal management of key removal from the cache. Removal can be done manually, but when Redis reaches a threshold within range of its memory limit it will also begin evicting keys on its own. So even though you are replicating, you can end up with different keys in different DCs, making continuous replication impossible. There are also some edge-case security concerns: if, after replication, a delete is sent but this information is lost and not replicated, someone could theoretically retrieve that data from the other DC, though in practice this is extremely difficult because at the first milestones writes will still only go to one DC. The same Redis that serves Mainstash also serves session data; we can't guarantee both security and consistency.
- Specification of Mainstash to Cassandra interface
- Implementation of Mainstash to Cassandra interface
- Mainstash in production is configured to access a Cassandra key-value store before the next DC changeover (end of October 2019)
- Baseline Metrics
- Target Metrics
- All services currently using Mainstash as a storage location; we should break this out into specific groups and contacts.
- Known Dependencies/Blockers
- We may need to coordinate work with owners of components using Mainstash. Most notably, since Echo uses it as a permanent storage solution, we need to work with them on a migration strategy, cf. https://phabricator.wikimedia.org/T222851 .
Epics, User Stories, and Requirements
- Persistent data storage (not LRU or other space-bounded caching)
- General-case storage
- Specific requirements for particular components will be handled separately (e.g. session storage)
- Most components cannot be configured to use different storage
- Key-value storage matching BagOStuff interface
- key type: string
- value type: string, integer
- Client-set time-to-live (TTL)
- Atomic increment
- Set-if-not-exists (setnx, add)
- lock/unlock keys
- get/set/delete batch
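The interface requirements above can be modelled with a short sketch. This is an illustrative Python mock - the class name, in-memory backing, and exact method signatures are assumptions for illustration only; the real interface is MediaWiki's PHP BagOStuff class - showing string/integer values, client-set TTLs, atomic increment, set-if-not-exists, key locks, and batch operations:

```python
import time

class MainStashSketch:
    """Hypothetical in-memory model of the BagOStuff-style interface
    described above. Not the real implementation."""

    def __init__(self):
        self._data = {}     # key -> (value, expiry timestamp or None)
        self._locks = set()

    def _live(self, key):
        """Return the stored (value, expiry) pair, lazily expiring stale keys."""
        item = self._data.get(key)
        if item is None:
            return None
        value, expiry = item
        if expiry is not None and expiry <= time.time():
            del self._data[key]   # TTL elapsed: treat as absent
            return None
        return item

    def set(self, key, value, ttl=None):
        expiry = time.time() + ttl if ttl else None
        self._data[key] = (value, expiry)

    def get(self, key):
        item = self._live(key)
        return item[0] if item else None

    def add(self, key, value, ttl=None):
        """Set-if-not-exists (setnx semantics): fail if the key is live."""
        if self._live(key) is not None:
            return False
        self.set(key, value, ttl)
        return True

    def incr(self, key, step=1):
        """Atomic increment of an integer value; None if the key is absent."""
        item = self._live(key)
        if item is None:
            return None
        value, expiry = item
        self._data[key] = (value + step, expiry)
        return value + step

    def lock(self, key):
        if key in self._locks:
            return False
        self._locks.add(key)
        return True

    def unlock(self, key):
        self._locks.discard(key)

    # Batch operations
    def get_multi(self, keys):
        return {k: self.get(k) for k in keys}

    def set_multi(self, items, ttl=None):
        for k, v in items.items():
            self.set(k, v, ttl)

    def delete_multi(self, keys):
        for k in keys:
            self._data.pop(k, None)
```

Any backend we choose (Cassandra directly, Kask, or the RestBase storage service) must be able to support each of these operations.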
- Multi-data-centre access
- read-write in primary data centre
- read-only in secondary data centre(s)
- a single Web request uses only one data centre
- access to different data centres may occur in the same Web session
- All requests with write HTTP methods (POST/PUT/DELETE) go to primary data centre
- "datacenter_preferred" cookie provides ~10s affinity for primary data centre after write request
- Synchronous access for web requests
- 10^0 OOM reads per request
- 10^0 OOM writes per request
- Current single-DC solution is Redis (10^0 OOM get, 10^2 OOM set)
- Deployment without a noticeable change in performance for end-users
- Minimize code changes required for components of MediaWiki or extensions that access main stash
- add new BagOStuff subclass for this storage
- Configuration for this storage
- Hot migration to new storage
- Retain current stored values and TTLs
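A hot migration that retains current values and TTLs could work roughly as follows. This Python sketch is illustrative only: `MemoryStore` is a stand-in for both sides (the real source would be Redis, queried via something like SCAN and TTL, and the destination Cassandra), and the helper method names are assumptions:

```python
class MemoryStore:
    """Stand-in store for the migration sketch; keys map to (value, ttl),
    where ttl is the remaining time-to-live in seconds, or None for no expiry."""
    def __init__(self, data=None):
        self.data = dict(data or {})
    def get(self, key):
        return self.data[key][0] if key in self.data else None
    def remaining_ttl(self, key):
        return self.data[key][1]
    def set(self, key, value, ttl=None):
        self.data[key] = (value, ttl)
    def keys(self):
        return list(self.data)

def hot_migrate(src, dst):
    """Copy every live key from src to dst, preserving the remaining TTL.
    Assumes writes are dual-written (or already routed to dst) while this
    runs, so keys written after the scan starts are not lost."""
    migrated = 0
    for key in src.keys():
        value = src.get(key)
        if value is None:
            continue  # expired or deleted mid-migration; skip it
        dst.set(key, value, ttl=src.remaining_ttl(key))
        migrated += 1
    return migrated
```

The key design point is copying the *remaining* TTL rather than the original one, so migrated keys expire at the same wall-clock time they would have in Redis.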
- Which of the available options is optimal in the short and medium term?
- How do we ensure we have consistency of access across data storage use-cases?
- Does this work necessitate a deeper conversation about our intentions for storage long term or can that conversation happen in parallel with this work?
- How will data Echo/Notification expects to be persistent be handled in the future?
- Are there clear usecases for arbitrary TTLs?
- How does this affect our compliance with the PII retention policy?
- Redis data was considered ephemeral, meaning it did not strictly contravene the policy. Ideally, we'd be able to reason about the retention of data in a given namespace (table in Cassandra), so we should definitely think carefully before trading that away for arbitrary TTLs.
- Mainstash presently does not logically separate data into strict namespaces. We can follow the same pattern established by session storage, using logically separated namespaces.
- We need to investigate if there are usecases that prevent this
- We need to discuss enforcing this separation with stakeholders