Wikimedia Technology/Annual Plans/FY2019/TEC1: Reliability, Performance, and Maintenance
This program is the cornerstone of every other program at the Foundation: without the work that goes into maintaining our infrastructure, the Foundation cannot deliver on its mission.
Teams contributing to the program
Site Reliability Engineering, Analytics, MediaWiki Platform, Wikimedia Cloud Services, Performance
Annual Plan priorities
Primary Goal: 3. Knowledge as a Service - evolve our systems and structures
How does your program affect annual plan priority?
Wikimedia's sites, services, and underlying technical infrastructure are core to furthering its mission. This program is about sustaining and evolving the infrastructure and structures that support previous achievements as well as future work.
We will maintain the availability of Wikimedia’s sites and services for our global audiences and ensure they run reliably, securely, and with high performance. We will do this while modernizing our infrastructure and improving current levels of service in testing, deployments, and maintenance of software and hardware.
Program Goal
Outcome 1: Current levels of service are maintained and/or improved for all production sites, services and underlying infrastructure.
- Output 1.1
- Deploy, update, configure, maintain, and improve production services, platforms, tooling, and infrastructure (Traffic infrastructure, databases & storage, MediaWiki application servers, (micro)services, network, Infrastructure Foundations, Analytics infrastructure, developer & release tooling, and miscellaneous sites & services)
- Output 1.2
- Maintain data center infrastructure and equipment lifecycle from procurement through break-fix to decommissioning
- Output 1.3
- Improve security, stability, performance and scalability of MediaWiki.
- Output 1.4
- Perform incident response, diagnosis, and follow-up on system outages or alerts across our stack.
- Output 1.5
- We have scalable, reliable and secure systems for data transport and storage.
Outcome 2: Better designed systems
- Output 2.1
- Assist in the architectural design of new services and in making them operate at scale
Outcome 3: Users can leverage a reliable and public Infrastructure as a Service (IaaS) product ecosystem for VPS hosting.
- Output 3.1
- Maintain existing OpenStack infrastructure and services
- Output 3.2
- Pay down technical debt and allow upgrading of the core OpenStack platform to modern, supported releases by replacing the current network topology layer with OpenStack Neutron, which has become the standard for most OpenStack deployments.
- Output 3.3
- Increase availability of compute resources for the IaaS product by expanding deployment of physical resources beyond the current single broadcast domain
Outcome 4: Members of the Wikimedia movement are able to develop and deploy technical solutions with a reasonable investment of time and resources on the Wikimedia Cloud Services Platform as a Service (PaaS) product.
- Output 4.1
- Maintain existing Grid Engine and Kubernetes web services infrastructure and ecosystems.
Outcome 5: Performance and function of Wikimedia properties on mobile devices are tested and monitored
- Output 5.1
- Performance testing of both the mobile web and native app experiences is conducted regularly to identify regressions in the user experience
- Output 5.2
- Wikimedia native applications are instrumented for performance monitoring similarly to our web properties
Outcome 6: Improved MediaWiki availability and reduced read-only impact from data center fail-overs
- Output 6.1
- Deploy to production the routing of MediaWiki GET/HEAD requests to the secondary data center.
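The routing idea in Output 6.1 can be illustrated with a minimal sketch. It assumes that only the read-only HTTP methods GET and HEAD may be served from the secondary data center, while every other method goes to the primary. The function name `pick_datacenter`, the `secondary_available` flag, and the labels `primary` and `secondary` are hypothetical and do not describe the actual Wikimedia traffic configuration.

```python
# Hypothetical sketch: choose a data center based on the HTTP method.
# GET and HEAD are treated as read-only, so they can be served by the secondary site.
SAFE_METHODS = frozenset({"GET", "HEAD"})


def pick_datacenter(method: str, secondary_available: bool = True) -> str:
    """Return 'secondary' for read-only requests when it is available, else 'primary'."""
    if method.upper() in SAFE_METHODS and secondary_available:
        return "secondary"
    # Writes (POST, PUT, DELETE, ...) and all traffic during a secondary outage
    # fall back to the primary data center.
    return "primary"
```

Under these assumptions, a write request such as `pick_datacenter("POST")` always resolves to the primary site, which is why only read traffic benefits from the secondary data center.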
Resources
|Site Reliability Engineering||
|Travel & Other||
Targets
- Ubuntu operating systems completely replaced by Debian
- Measurement method
- 100% of OpenStack infrastructure services served from hosts running Debian Jessie or newer operating systems by end of FY2018/19 Q3.
- 100% of Cloud VPS hosted instances running Debian Jessie or newer operating systems by end of FY2018/19 Q3.
- Full deployment of OpenStack Neutron as the software-defined networking (SDN) layer for Cloud Services OpenStack clusters
- Measurement method
- Nova-network SDN removed from all Cloud Services OpenStack clusters by end of FY2018/19 Q2.
- Expand OpenStack hosting to multiple broadcast domains
- Measurement method
- Virtual machine hosting in a second broadcast domain available for alpha testing by end of FY2018/19 Q4.
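The "Debian Jessie or newer" measurements above could be computed along the following lines. This is only a sketch under stated assumptions: the inventory format (a mapping from host name to release codename), the function name `percent_compliant`, and the short release list are hypothetical, and any codename missing from the list (for example an Ubuntu release) is counted as non-compliant.

```python
from typing import Mapping

# Debian release codenames in chronological order (oldest first).
# Assumption: only the releases relevant to the target are listed here.
DEBIAN_RELEASES = ["wheezy", "jessie", "stretch", "buster"]


def percent_compliant(host_os: Mapping[str, str], minimum: str = "jessie") -> float:
    """Return the percentage of hosts running Debian `minimum` or newer."""
    if not host_os:
        return 0.0
    floor = DEBIAN_RELEASES.index(minimum)
    compliant = sum(
        1
        for release in host_os.values()
        # Unknown codenames (e.g. Ubuntu releases) are treated as non-compliant.
        if release in DEBIAN_RELEASES and DEBIAN_RELEASES.index(release) >= floor
    )
    return 100.0 * compliant / len(host_os)
```

With such a helper, the target would be met when the function returns 100.0 for both the infrastructure hosts and the hosted instances.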