Wikimedia Performance Team/Multi-DC MediaWiki

Active-active MediaWiki (a.k.a. "Multi DC") is a long-term cross-cutting project driven by the Performance Team to give MediaWiki the ability to serve read requests from multiple datacenters. Currently, MediaWiki can only serve requests from the primary datacenter.

The ability to serve MediaWiki requests from multiple datacenters will bring significant performance gains to our logged-in users, who currently must connect to our primary datacenter to access content while they are logged in. This is a large performance penalty for anyone far from the primary datacenter. It contrasts with the logged-out experience, where content is served from the nearest cache PoP, which is often much closer. In effect, logging in to our sites immediately imposes a performance penalty, because MediaWiki cannot yet serve requests from multiple datacenters concurrently.

History
The project was formalised via an RFC in 2015. Since then, Aaron Schulz has driven the effort of converting all the MediaWiki sub-systems to make them able to work in an active-active context with more than one datacenter serving MediaWiki requests. You can see the history of completed tasks on Phabricator.

This document focuses on the few tasks that remain before the project is complete.

1) ChronologyProtector
ChronologyProtector is a system that ensures editors always see their own edits. As of September 2020, an architectural solution has been found: the Performance Team, in collaboration with SRE, will migrate ChronologyProtector to a new data store during Q2 2020-2021, which will allow it to work in an active-active context.
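Roughly speaking, ChronologyProtector provides read-your-writes consistency on top of replicated databases. After a request writes to the primary database, the primary's replication position is saved under the client's identity, and a later request from that client waits until its replica has applied at least that position before reading. The Python sketch below illustrates only this idea and is not MediaWiki's implementation: `PositionStore`, `Replica`, and `wait_for_replica` are hypothetical names, and plain integers stand in for real replication coordinates. The shared store is the part that matters for active-active: a request served by either datacenter must be able to read a position recorded by the other.

```python
import time


class Replica:
    """Hypothetical replica that applies the primary's writes with some lag."""

    def __init__(self, applied_position: int = 0) -> None:
        self.applied_position = applied_position

    def catch_up(self, position: int) -> None:
        # Simulate replication applying writes up to `position`.
        self.applied_position = max(self.applied_position, position)


class PositionStore:
    """Hypothetical store, reachable from every datacenter, keyed by client ID."""

    def __init__(self) -> None:
        self._positions: dict[str, int] = {}

    def record(self, client_id: str, position: int) -> None:
        # Keep only the highest position written on behalf of this client.
        self._positions[client_id] = max(self._positions.get(client_id, 0), position)

    def get(self, client_id: str) -> int:
        # Clients with no recorded writes impose no requirement.
        return self._positions.get(client_id, 0)


def wait_for_replica(replica: Replica, required: int,
                     timeout: float = 1.0, poll: float = 0.01) -> bool:
    """Return True once the replica has applied `required`, False on timeout."""
    deadline = time.monotonic() + timeout
    while replica.applied_position < required:
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    return True
```

Usage: after an edit, the request records the primary's position with `store.record("client-a", 42)`. A later request from the same client calls `wait_for_replica(replica, store.get("client-a"))` and only trusts the replica once it returns `True`. What to do on timeout (read from the primary, or serve possibly stale data with a warning) is a policy choice this sketch leaves open.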

2) Session storage
User sessions are being migrated to a new data storage system by the Core Platform Team with help from the Performance Team. This work is currently scheduled for completion in Q2 2020-2021.

3) MariaDB cross-datacenter secure writes
Even with MediaWiki active-active, writes will still only go to the primary datacenter. However, a fallback is required for edge cases where a write is attempted from a secondary datacenter. To preserve our users' privacy, such writes must be encrypted when sent across datacenters. Multiple solutions have been considered, but a decision has yet to be made on which one will be implemented. This work will be a collaboration between the Data Persistence Team and the Performance Team.

4) MainStash out of local Redis
https://phabricator.wikimedia.org/T212129

Hardware is being procured and set up in Q2.