Wikimedia Performance Team/Multi-DC MediaWiki

Active-active MediaWiki (a.k.a. "Multi DC") is a long-term cross-cutting project driven by the Performance Team to give MediaWiki the ability to serve read requests from multiple datacenters. Currently MediaWiki is only capable of serving requests from the primary datacenter.

The ability to serve MediaWiki requests from multiple datacenters will bring significant performance gains to our logged-in users, who currently must connect to our primary datacenter to access content while they are logged in. This is a large performance penalty for anyone distant from our primary datacenter, and it stands in contrast to the logged-out experience of getting content from the nearest cache PoP, which is often much closer. Logging into our sites therefore incurs an immediate performance penalty, due to MediaWiki's current inability to work in multiple datacenters concurrently.

Having MediaWiki served from two or more datacenters during normal operations also ensures better resilience in case of a datacenter failure.

History
The project was formalised via an RFC in 2015. Since then, Aaron Schulz has driven the effort of converting all the MediaWiki sub-systems to make them able to work in an active-active context with more than one datacenter serving MediaWiki requests. You can see the history of subtasks on Phabricator.

This document focuses on the major blockers that remain before this project can be finished and active-active serving of MediaWiki enabled. They all require cross-team coordination.

ChronologyProtector
ChronologyProtector is a system which ensures that editors always see their own edits after making them. As of September 2020, an architectural solution has been found. The Performance Team, in collaboration with Service Operations, will migrate ChronologyProtector to a new data store (either Memcached or Redis) during Q2 2020-2021, which will make it capable of working in an active-active context.
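
The "editors always see their own edits" guarantee is often described as read-your-writes consistency. The following is a minimal sketch of that general idea, not MediaWiki's actual implementation. All class and function names (`Primary`, `Replica`, `PositionStore`, `read_for_client`) are hypothetical, and it assumes that the shared store records, per client, the database position of that client's latest write, so that a later read can insist on a replica that has caught up to it.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical read replica that has applied writes up to `position`."""
    position: int = 0
    rows: dict = field(default_factory=dict)


class Primary:
    """Hypothetical primary database with a monotonically increasing position."""

    def __init__(self):
        self.position = 0
        self.log = []  # ordered list of (position, key, value)

    def write(self, key, value):
        """Apply a write and return the position it was recorded at."""
        self.position += 1
        self.log.append((self.position, key, value))
        return self.position

    def replicate_to(self, replica, up_to):
        """Apply all log entries the replica is missing, up to `up_to`."""
        for pos, key, value in self.log:
            if replica.position < pos <= up_to:
                replica.rows[key] = value
                replica.position = pos


class PositionStore:
    """Stand-in for the shared store (e.g. Memcached or Redis) keyed by client."""

    def __init__(self):
        self._positions = {}

    def save(self, client_id, position):
        # Never move a client's recorded position backwards.
        current = self._positions.get(client_id, 0)
        self._positions[client_id] = max(current, position)

    def load(self, client_id):
        return self._positions.get(client_id, 0)


def read_for_client(client_id, key, primary, replica, store):
    """Read from the replica, but only once it has seen the client's own writes."""
    needed = store.load(client_id)
    if replica.position < needed:
        # A real system would wait (with a timeout) for replication to catch up.
        # This sketch simply replays the missing writes synchronously.
        primary.replicate_to(replica, up_to=needed)
    return replica.rows.get(key)
```

In this sketch, a client that has just written (and whose position was saved in the store) triggers a catch-up before reading, while a client with no recorded writes reads whatever the replica currently holds. In an active-active setup the position store itself has to be reachable from every datacenter, which is one way to understand why the storage backend matters here.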

Session storage
User sessions are being migrated from Redis to a new data storage system, Kask, by the Core Platform Team with help from the Performance Team. The last part of that work, migrating CentralAuth sessions, is currently scheduled for completion in Q2 2020-2021.

Main Object Stash
The Redis cluster that was used to store sessions also stored miscellaneous user data, known as the Main Object Stash, which needs to be moved out of Redis as well. The plan is to move this data to a new, small MariaDB cluster. This project requires new hardware, which is being procured and set up in Q2 2020-2021 by the Data Persistence Team. The Performance Team will migrate the Main Object Stash as soon as the new database cluster is available, i.e. in Q2 or Q3 2020-2021.

MariaDB cross-datacenter secure writes
Even once MediaWiki is active-active, writes will still go only to the primary datacenter. However, a fallback is required for edge cases where a write is attempted in a secondary datacenter. In order to preserve our users' privacy, such writes need to be sent encrypted across datacenters. Multiple solutions are being considered, but a decision has yet to be made on which one will be implemented. This work will be a collaboration between the Data Persistence Team and the Performance Team. We hope for it to happen during fiscal year 2020-2021.
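
The routing rule described above can be illustrated with a small sketch. This is a hypothetical model, not one of the solutions under consideration: the names `DbTarget` and `choose_db_target`, and the datacenter labels used in the example, are assumptions made for illustration. It only encodes the stated policy that reads stay local, writes go to the primary datacenter, and a write leaving a secondary datacenter must be encrypted in transit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DbTarget:
    """Where a database operation is sent, and whether encryption is required."""
    datacenter: str
    require_encryption: bool


def choose_db_target(is_write, local_dc, primary_dc):
    """Pick the datacenter for one database operation (hypothetical policy)."""
    if not is_write:
        # Reads are served from the local datacenter's replicas.
        return DbTarget(datacenter=local_dc, require_encryption=False)
    if local_dc == primary_dc:
        # Normal case: the write already originates in the primary datacenter.
        return DbTarget(datacenter=primary_dc, require_encryption=False)
    # Edge case: a write attempted in a secondary datacenter crosses
    # datacenters, so it must travel over an encrypted connection.
    return DbTarget(datacenter=primary_dc, require_encryption=True)
```

For example, with hypothetical datacenters `"dc-a"` (primary) and `"dc-b"` (secondary), `choose_db_target(True, "dc-b", "dc-a")` returns a target in `"dc-a"` with `require_encryption=True`, while any read in `"dc-b"` stays in `"dc-b"`.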