Wikimedia Performance Team/Sprints

Status

 * 🔴 < 30% done
 * 🟡 < 70% done
 * 🟢 70 to 100% done

2021-2022
See also internal 2021-2022 roadmap and internal Jan-Mar 2022 achievements.

Outreach:


 * Support product development by Inuka Team (Wikipedia Preview), Reading Web (NearbyPages and RelatedArticles), CPT (WebAuthn), Design Systems Team (WVUI/Vue.js), and WMDE (Kartographer-revid).
 * Participate in the SLO working group to help establish an SLO for MediaWiki Save Timing.
 * Participate in W3C WebPerf WG, provide feedback to Chrome team on Google Web Vitals and Chrome bugs.
 * Organise four Web Perf Hero awards.

Insights:


 * Migrate our device lab to BitBar.
 * Evaluate and build proof-of-concept synthetic testing on bare metal instead of at AWS.
 * Write runbooks for investigating RUM alerts, WPT alerts, and WPR alerts.
 * Support SRE Observability in developing a new Prometheus-compatible MW-Stats client library.
 * On-going maintenance of WebPageTest, WebPageReplay, and Fresh-node.

Improvement:


 * Multi-DC: Deploy MainStash DB and migrate away from Redis-based MainStash (T212129).
 * Multi-DC: MariaDB-TLS tested and enabled for all wikis.
 * Multi-DC: CDN routing logic written and deployed to Beta and Prod behind feature flag.
 * ResourceLoader debug mode v2: reduce wait time on complex pages from ~1 minute to ~1 second.
 * Guidance and code review for DBA-led normalization of "templatelinks" MediaWiki database table, to reduce storage pressure and improve query performance. (T299417)
 * Support SRE ServiceOps with the MW-on-K8s project.
 * Develop precache-based GlobalUserEdit API for CentralAuth, following an incident.

2020-2021
See also internal 2020-2021 roadmap.

Outreach:


 * Support product launches by Anti-Harassment Team (IPInfo extension) and CPT (API Portal skin, API Portal OAuth extension, changes to the OAuth extension).
 * Support development kick-off of Abstract Wikipedia (WikiLambda) through early check-in and 1-month team residency/matrixing in both directions.
 * Organise the Web Performance devroom for FOSDEM 2021 (recordings).
 * Organise the first Web Perf Hero award.
 * Speak at the We Love Speed conference (recording).
 * Get published in the Web Performance Calendar (4x: Human performance metrics, Profiling PHP at scale, Future of Web Vitals from a non-Googler, Setting up a device lab).
 * Enable teams to create their own production error dashboards in Logstash with a template, written guide, and video presentation.

Insights:


 * Expand navtiming RUM metrics pipeline with new Layout Shift metric.
 * Kobiton setup for our device lab, expand to include iOS in addition to Android.
 * Explore BitBar for our device lab.
 * Explore moving WPT/WPR infra away from AWS.

Improvement:


 * Multi-DC: Implement multi-dc strategy for ChronologyProtector (T254634).
 * Multi-DC: Determine and start implementing strategy for MainStash DB (T212129).

2019-2020
See also 2019-20 Q1#Performance and internal 2019-2020 roadmap.


Outreach:

 * Support product launches by Parsing Team (Parsoid-PHP launch), Editing Team (DiscussionTools launch), Growth Team (GrowthExperiments launch), and Inuka Team (Wikipedia KaiOS app launch).
 * Support RelEng around establishing production error triage workflows and semi-automation thereof.
 * Organise the first Web Performance conference at FOSDEM (blogpost, recordings).
 * Organise WMF-wide frontend web performance training.
 * Provide performance expertise to Frontend Architecture Working Group (FAWG).
 * Get published in the Web Performance Calendar (2x: Measuring LT and FID, Big questions on RUM).

Insights:

 * Organise and oversee implementation of First Paint metric in WebKit for Apple Safari (blogpost).
 * Introduce detailed metrics from WANCache time spans for MediaWiki developers (T197849).
 * Explore new RUM metrics for navtiming pipeline, such as First Input Delay.
 * Participate in Chrome Origin trial for Element Timing and provide feedback on upcoming W3C standard (blogpost).
 * Release WikimediaDebug v2 (blogpost).
 * Create our own Mobile Device Lab.
 * On-going maintenance of WebPageTest, WebPageReplay, and XHGui (migrate from Mongo to MySQL).

Improvement:

 * PHP7 Transition: Finish the transition from HHVM and support SRE with instrumentation, sampling, and benchmarking.
 * Multi-DC: Start work on MainStash DB.
 * Reduce MediaWiki backend startup time to reclaim PHP7 latency increase in certain areas (T233886, T189966).
 * Reduce frontend page startup cost in ResourceLoader (blogpost).

2018-2019
See also 2018-19 Q1, 2018-19 Q2, and internal 2018-2019 roadmap.

Insights:


 * Annual Plans/FY2019/TEC1: Current levels of service are maintained and/or improved.
 * Expand synthetic testing to more non-English wikis.
 * Introduce Excimer, sampling profiler for PHP 7 to replace HHVM Xenon (T176916).
 * Introduce Fresnel, performance testing in MediaWiki CI jobs (T133646).
 * Research, develop, and test new RUM metrics that better match user perception (T187299, Rossi 2019 paper).

Outreach:


 * Design and implement the AS Report, to expand and formalize collaborations and leverage our influence with browser vendors and ISPs (announcement on Techblog).
 * Initiate and work on Wikimedia Foundation becoming an official W3C member organization. This expands the Performance Team's participation in web standards and moves us from an "invited expert" (individual) to a represented membership organisation. (Announcement on wikimediafoundation.org)
 * Publish the first post in the Perf Matters at Wikipedia series.
 * Get published in the Web Performance Calendar (5x: Magic numbers, Comparing HAR, Measuring Wikipedia, Why perf matters, AVIF).

Improvement:


 * Annual Plans/FY2019/TEC1: Improve MediaWiki availability and reduce read-only impact from data center switchovers.
 * Annual Plans/FY2019/TEC4: PHP7 Migration: Guide the work and support other teams.
 * Introduce support for packageFiles to ResourceLoader (T133462).
 * Introduce support for WebP compression format to Thumbor.
 * Reduce page load time by refactoring the startup module to need only one roundtrip instead of two (T192623).
 * Guidance, code review, and testing for new AbuseFilter parser (development by Daimona) to improve Save Timing (T156095).

2017-2018
See also Annual Plan/2017-2018#Technology, 2017-18 Q3, 2017-18 Q4, and internal 2017-2019 roadmap.

Outreach:


 * Measure performance from Asia both before and after the Singapore data center comes online. Includes adding a capability to navtiming for geographic oversampling.
 * Publish in the Web Performance Calendar (Automate performance regression alerts).

Insights:


 * Program 1. Availability, performance, and maintenance.
 * All production sites and services maintain current levels of availability or better.
 * Maintain a comprehensive toolset to measure the performance of our platforms.
 * Enhance performance testing infrastructure using the Chrome Tracelog (T182510).
 * Review current research on performance perception (T165272).
 * Build sampling profiler for PHP 7 to replace HHVM Xenon (T176916). Includes creation of the new php-excimer extension.
 * Implement new "Backend-Timing" metric on Apache PHP web servers, as first full measurement of MediaWiki latencies. Backed by Prometheus (T131894).
 * Develop new "navtiming2" metric definitions, addressing what we learned since 2015, and enable use of stacked graphs (T104902).
 * Migrate WebPageTest hosting from Windows to Linux.

Improvement:


 * Support for HHVM-PHP7 migration and upgrade.
 * Expand support in Thumbor to private wikis.
 * Program 8. Multi-datacenter support.

2016-2017
See Annual Plan/2016-2017 Program 4: Improve site performance on Meta-Wiki.