Core Platform Team/Initiative/Serve read requests from multiple datacenters/Initiative Description

Project Lead
Marko Obrovac

Current state
In development

Expected start
In progress

Summary
TBD

Significance and motivation
TBD

Milestones and major tasks

 * Replace all the DB slaves in codfw (new hardware is needed to support the traffic)
 * Update and/or replace GTID_Wait / heartbeat
 * Decide where to move mainstash data (non-session data in Redis)
 * Migrate mainstash data to the new location
 * Decide what the acceptable replication lag for MediaWiki is
 * Current lag is 10-15 seconds; ~1 second would be acceptable. At that level, 90% of reads get what they need; the remaining 10% need reengineering
 * Evaluate the lag for codfw; if it is not acceptable, engineer a solution
 * Update MediaWiki code to wait for replication (if needed)
 * TLS Proxy work
 * Separate traffic at the Varnish level
 * Serve thumbnails

Outcome
Increase the scalability of the platform for future applications and new types of content, as well as for a growing user base and volume of content

Baseline

 * TBD

Target

 * TBD

Methodology and rationale
The primary measure of the success of this project is that services can access storage directly and do not require RESTBase.

Time and resource estimate
Core Platform: TBD

Performance: TBD

SRE: TBD

Dependencies
Multi-DC Session Storage

Collaborators
Core Platform

Performance

SRE

Stakeholders
Core Platform

SRE

Open questions
Many parts of MediaWiki assume that the DB is "close by and secure", which will change for Multi-DC. How do we address this?

Long term, what do we do with MySQL? It does not support master-master replication, which blocks us from accepting writes in both locations.

If we remove the performance tricks in MediaWiki, would performance still be acceptable, enabling us to generalize the DB abstraction? Work on the Abstract Schema Migrations RFC may clear up some of these questions.

Phabricator
TBD

Relevant materials, plans and RFCs
TBD