Wikimedia Performance Team

As the Wikimedia Foundation’s Performance Team, we create value for readers and editors by making it possible to retrieve and render content at the speed of thought, from anywhere in the world, on the broadest range of devices and connection profiles.

We focus on providing equal access to a frustration-free experience, whether someone is using a brand-new laptop on a fast network in a large metropolitan area or an inexpensive mobile device in a rural area with unreliable internet connectivity.

Focus
Outreach. Our team strives to develop a performance-first culture in the movement. We work to ensure that performance is a prime consideration in technology and product development across the movement.

Monitoring. We develop better tooling, design better metrics, and automatically track regressions, all in a way that anyone can reuse. This lets us monitor the right metrics and discover issues that can be hard to detect.

Knowledge. We are the movement's reference on all things performance, which requires keeping up with rapid changes in technology across our entire stack. We are actively involved in new performance standards being built for the web.

Improvement. Some performance gains only become possible after complex preparatory work that requires a high level of expertise. We undertake complex projects that can yield important performance gains in the long run.

Presentations and blog posts

 * Performance blog
 * Why performance matters at Wikimedia (video, restricted to staff), Gilles Dubuc, 2021
 * The role of the performance team (video, restricted to staff), Gilles Dubuc, 2021
 * Humans can (also) measure your performance at WeLoveSpeed (video), Gilles Dubuc, 2021
 * How to Logstash (video, restricted to staff), Timo Tijhof, 2020
 * How to make sense of real user performance metrics (RUM) at Velocity Conference Berlin (video), Gilles Dubuc, 2019
 * Keeping Wikipedia Fast at WeLoveSpeed (video), Peter Hedenskog, 2019
 * Tech talk "Creating Useful Dashboards with Grafana" (video), Timo Tijhof, 2016
 * Tech talk "Let's talk about web performance" (video), Peter Hedenskog, 2015

Milestones

 * 2014: Migration from Zend PHP to HHVM. This greatly reduced backend response time (2x faster).
 * 2014: Statsv. Greatly simplified sending lightweight data to statsd and Graphite from front-end code and apps.
 * 2015: Helped with the HTTPS + HTTP/2 migration.
 * 2015: Asynchronous ResourceLoader top queue.
 * 2015: Optimistic save (aka "edit stashing").
 * 2015: Improve cache hits for front-end resources.
 * 2015-2016: DeferredUpdates (src). Greatly contributed to bringing the median edit save time below 1 second.
 * 2015: WebPageTest. Now used by several teams to do synthetic performance testing.
 * 2015: Xenon/Flame graphs. Surfaces performance data for our entire PHP backend.
 * 2015: First team offsite attending the Velocity conference. Being high profile enough to speak at that conference became an aspirational goal.
 * 2016: Introduction of many Grafana dashboards to track performance.
 * 2017: Implemented a performance metric alert system on top of Grafana.
 * 2017: Migrated MediaWiki and all extensions to jQuery 3.
 * 2017: Published first of many team tech blog posts.
 * 2015-2018: Thumbor. Rewrote the media thumbnailing layer for Wikimedia production.
 * 2018-2019: Migrated from HHVM to PHP 7.
 * 2019: Spoke at major performance conferences, including Velocity's last edition ever, closing the loop on our 2015 inspiration to speak there.
 * 2019: Published our first research paper.
 * 2020: Created a real device performance monitoring lab.
 * 2020-2021: Hosted a web performance devroom at FOSDEM.
 * 2020-2021: Trained 30+ staff members on frontend web performance.
 * 2020: Awarded the first Web Perf Hero award.

Dashboards
A big part of our work is devoted to collecting and analyzing site performance data to ensure that we have a holistic and accurate understanding of what users experience when they access Wikimedia sites. You can discover our dashboards by visiting the Wikimedia Performance portal. A selection of our dashboards is also linked below:
 * Navigation Timing
 * ResourceLoader
 * WebPageTest
 * Save Timing
 * Edit Stash
 * MySQL aggregate

Tools
Below is an overview of the various applications, tools, and services we use for collecting, processing, and displaying our data.

Data collection
Maintained by Wikimedia:

 * wikimedia/arc-lamp - [PHP] Collect data from Excimer and send aggregated and sampled profiles from production requests to Redis. Used for flame graphs.
 * Navigation Timing (docs | GitHub) - [JS] MediaWiki plugin to collect Navigation Timing data.
 * WebPageTest – Synthetic testing of web performance, at wpt.wmftest.org.
 * WebPageReplay – Synthetic testing of web performance.
 * WebPageTest runner (GitHub) - [JS] Collect data from WebPageTest API and send to Statsd or Graphite.
 * Jenkins configuration (GitHub) - [YAML] Jenkins job that triggers WebPageTest runs.
 * Tendril (GitHub) - [PHP] Real-time MariaDB analytics and performance.

We also use:

 * Tideways-XHProf (GitHub) - Profile any request via X-Wikimedia-Debug and view it in XHGui.
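
The arc-lamp entry above mentions aggregating sampled profiles for flame graphs. As a rough illustration of what such aggregation can look like, the sketch below collapses sampled call stacks into the "folded" text format that flamegraph.pl (from brendangregg/FlameGraph) accepts as input. It is not arc-lamp's actual code: the function `fold_stacks` and the sample frame names are hypothetical, and the sketch assumes each sample is already available as a list of frame names.

```python
from collections import Counter


def fold_stacks(samples):
    """Aggregate sampled call stacks into folded-stack lines.

    Each sample is a list of frame names ordered from the outermost caller
    to the innermost callee. Each output line has the form
    "frame;frame;frame count", where count is the number of samples that
    share that exact stack.
    """
    counts = Counter(";".join(frames) for frames in samples if frames)
    # Sorted only to make the output deterministic.
    return [f"{stack} {count}" for stack, count in sorted(counts.items())]


# Hypothetical samples; the frame names are illustrative only.
samples = [
    ["main", "render_page", "parse_wikitext"],
    ["main", "render_page", "parse_wikitext"],
    ["main", "render_page", "build_skin"],
]

for line in fold_stacks(samples):
    print(line)
```

Stacks that appear in many samples get a higher count and therefore a wider bar in the resulting flame graph, which is what makes hot code paths easy to spot.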

Processing and display
Maintained by Wikimedia:

 * performance.wikimedia.org (see | GitHub) - Static website that serves as portal to Flame Graphs, profiling, and other dashboards.
 * navtiming (GitHub) – [Python] Process data from Navigation Timing beacons and submit the data to Statsd/Graphite.
 * EventLogging – [Python] Platform for schema-based data.
 * coal (see | GitHub) - [Python] Custom Graphite writer and Web API. Frontend graphs made with D3.
 * PerformanceInspector (docs | GitHub) - [JS] MediaWiki plugin to profile the current page and find potential performance problems.
 * Statsv – [Python] Receiver for simple statistics over HTTP for statsd.
 * perflogbot (source) - [JS] An IRC bot that tracks the behaviour of ResourceLoader in Wikimedia production.
 * dbtree - Detailed MySQL cluster information (see also: Grafana: MySQL dashboard).

We also use:
 * Grafana – Dashboard for visualising data from Prometheus and Graphite, publicly viewable at grafana.wikimedia.org.
 * Flame graphs (brendangregg/FlameGraph) – Viewing data from the sampling profiler for production traffic to Wikipedia and other sites, at https://performance.wikimedia.org/xenon.
 * XHGui – Viewing data from XHProf, a function-level hierarchical profiler, used when manually debugging individual requests, at https://performance.wikimedia.org/xhgui.
 * Statsd – Metrics aggregation between instrumented applications and Graphite.
 * Logstash, at logstash.wikimedia.org (NDA restricted).
 * Memkeys
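
To make the Statsv and Statsd entries above more concrete, here is a minimal sketch of the kind of translation an HTTP-to-statsd bridge performs. It is not Statsv's actual implementation. The request shape (a query string of `name=<number><type>` pairs) is an assumption made for illustration, and `query_to_statsd` is a hypothetical helper; only the output format, `name:value|type`, is the standard statsd line format.

```python
import re
from urllib.parse import parse_qsl

# Assumed value shape: a number followed by a statsd type suffix
# (ms = timer, c = counter, g = gauge).
VALUE_RE = re.compile(r"^(-?\d+(?:\.\d+)?)(ms|c|g)$")


def query_to_statsd(query):
    """Convert a query string such as 'a.b=12ms&c.d=1c' into statsd lines."""
    lines = []
    for name, raw in parse_qsl(query):
        match = VALUE_RE.match(raw)
        if match is None:
            continue  # Ignore parameters that do not look like a metric.
        value, metric_type = match.groups()
        lines.append(f"{name}:{value}|{metric_type}")
    return lines


# Hypothetical metric names, for illustration only.
for line in query_to_statsd("frontend.load=1235ms&edits=1c&bad=abc"):
    print(line)
```

A real bridge would then send each line to a statsd daemon, typically over UDP. That step is left out here so the sketch runs without a network.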

Data storage

 * Prometheus – Storage of metrics and statistics. See also: Prometheus (internal runbook).
 * Kafka – Distributed streaming and storing of events. See also: Kafka (internal runbook).
 * Graphite – Timeseries database. See also: Graphite (internal runbook).

Workflow

 * Grafana dashboards
 * Gerrit Code-Review: Performance Team dashboard (how-to: Add Gerrit navigation link)

Contact

 * Phabricator workboard (Issue tracker)
 * Freenode IRC: