Wikimedia Performance Team/Backend performance

These are the performance guidelines for MediaWiki backend development aimed at being deployed to Wikimedia Foundation wikis.

What to do (summary)

 * Know where the time is spent. Regularly measure performance both in your development environment and later in production (See also: How to measure).
 * Identify relevant latency metrics, and take responsibility for them. As developers you have the best idea of usage patterns and what areas are worth monitoring to learn when an experience becomes slower and/or when there are capacity problems. Consider documenting your latency metrics, and using them on a monthly or quarterly basis to prioritise maintenance as-needed to uphold the quality of service. (Example)
 * When performance is subpar, this can often be indicative of a deeper issue in how the code is solving a particular need. Think about the root cause, and whether certain costs can be structurally avoided or deferred. (See also: #You are not alone.)

Ballpark numbers

 * When accessing information (e.g. a view or API request over GET), aim for your backend to respond within 50ms at the median and within 200ms at the p99. In other words, common requests to popular data that benefit from internally warm caches respond within 50ms (e.g. database server cache, or Memcached/Apcu), and requests that encounter internal cache misses still gather and render all data within 200ms.
 * When performing write actions (e.g. a POST request), aim for your backend to respond within 1 second. Make sure that the amount of work a request performs is naturally and deterministically limited (e.g. do not rely on server-level memory limits or timeouts). It is encouraged to deny and incentive against usage patterns that we cannot maintain at scale. Remember that you can schedule tasks via the Job queue and Deferred updates which let you run code asynchronously either on the same server after the response has been sent ("post-send deferred update"), or within a few seconds on a separate server cluster ("jobqueue job").

General performance principles
Wikimedia infrastructure:
 * Your code is running in a shared environment. Thus, long-running SQL queries can't run as part of the web request. Instead they should be made to run on a dedicated server (use the JobQueue), and watch out for deadlocks and lock-wait timeouts.


 * The tables you create will be shared by other code. Every database query must be able to use one of the indexes (including write queries!). Use EXPLAIN on your queries and create new indices where required.


 * Choose the right persistence layer for your needs: Redis job queue, MariaDB database, or Swift file store. Only cache if your code can always performantly handle the cached data disappearing; otherwise, persist the data.
 * Wikimedia uses and depends heavily on many different caching layers, so your code needs to work in that environment! (But it also must work if everything misses cache.)
 * The cache hit ratio should be as high as possible; watch out if you're introducing new cookies, shared resources, bundled requests or calls, or other changes that will vary requests and reduce cache hit ratio.

Measure
Measure how fast your code runs, and make decisions based on these facts to avoid having to rely on superstition or feeling. Use the below guidelines together with the Architecture guidelines and Security guidelines.

Focus both on individually perceived performance (your code runs relatively fast) and scalability (your code can run fast, even on a large wikis and when run many times concurrently).

Percentiles
Always consider high percentile values rather than median. For backend code, consider the 99% numbers (99th percentile) instead of mean averages or medians. (Learn why:  ) In other words, you don't want half of your users to have a good experience, you want all of them to. So you need to look at the 99th slowest sample to really understand performance.

It is very common that performance data in the web has two different "signals": one for users accessing the application on a warm cache and other from users accessing the application on cold cache. Calculating averages on a dataset with these two signals is pointless. To do a fast check on the data make sure you have at least 10.000 data points and calculate the 50th and 90th percentile. Those numbers might differ greatly and that would be an indication on performance issues that you can fix. For example, if network roundtrips are quite slow and you have a lot of resources being fetched you shall see a huge difference between users coming to your site with cached resources (thus avoiding all those slow round trips) and not. Even better, if you have sufficient data, you can calculate 1, 50, 90 and 99 percentiles. A good rule of thumb is that to have statistical significance you need 10.000 data points to calculate a 90th percentile, 100.000 for a 99th percentile and 1 million for a 99.9.

This rule of thumb oversimplifies matters a bit, but works well for performance analysis.

Latency
Aim for your software to provide a reasonably fast experience, regardless of network latency. Client-side latency depends on numerous factors, including: your backend response time, the CDN response time, the round-trip time for the client's network (RTT from CDN to device and back), and the transfer rate (bandwidth) the client's network is capable of.

The RTT and bandwidth of a connection are not always correlated. For example, a Gigabit connection with an RTT of 2 seconds will not transfer anything in less than 2 seconds, regardless of payload size. One could imagine that bandwidth is like the size of a truck (or the number of lanes on a highway), and RTT is how fast one may travel on the highway. If 1000 kg are transferred by the truck in 1 hour, this does not mean that 1 kg will arrive in 3 seconds. We recommend the free e-book High Performance Browser Networking by Ilya Grigorik. Examples of high RTT connections are mobile devices for any web request after a brief period of inactivity. They first need to negiotiate and establish their credentals and connection to a celltower before the actual request even starts.

Strategies for improving or obscuring latency: See also Page load performance#Latency for advice on preloading and stale revalidation.
 * Recognise when code paths will always be fast, e.g. fast response and low error/timeout threshold (memcache, simple and optimised database queries, etc).
 * Recognise when code paths may be slower at times, e.g. under load or for other variable reasons (e.g. parsing a page, fetching information from another wiki, loading something from a URL). When creating a code path that may be slow, document this fact and/or follow a naming convention within your code base to differentiate the two. Such as " " vs " ".
 * Consider breaking operations into smaller pieces, that can be individually re-used and composed at a higher level. This avoids building up an ever growing amount of computations that happen whenever anything happens (which essentially guruantees you're always doing more work than you need even for the common case).
 * Keep a tight timeout. Once you've set a p99 latency objective for your backend responses, think about what scenarios might cause a response to fall in that last 1%. For example, if your feature includes internal requests to other services, what timeouts do they have? What re-tries do they allow? Consider what happens if suddenly a majority of web traffic starts to exercise your 1% scenario. Do we start to exhaust all backend capacity, or do we fail quickly and free up resources for other features and re-tries? Once you've set and achieved a latency objective, put limits in place that ensure you will shift toward errors instead of shifting to all responses being slow and exhausting all available resources.
 * Responses to unregistered users (web clients without a session) must generally be cachable by the CDN.
 * For page views, page actions, and special pages, this is controlled by OutputPage.
 * A good example of a cachable Action API module is ApiOpenSearch.
 * A good example of a cacheable REST API route is Rest\Handler\SearchHandler.

Remember to leverage the platform to benefit and build atop all prior work for scaling MediaWiki to our performance needs. (More that below)

Leverage the platform!

 * Move work to post-send Deferred updates or Jobs if it doesn't have to happen in the critical path of a web response.
 * When interacting with memcache, use the  idiom and use WANObjectCache. Avoid calling memcached directly. WANCache automatically takes care of numerous "at scale" needs such as stampede protection, purging, mutex locks, keep your caches warm by asynchronously and pre-emptively regenerating values before they expire. This reduces the chances of hot keys ever getting a cache miss. WANCache also integrates with dev tooling such as rate and latency stats on Grafana: WANObjectCache.
 * Utilize core service classes whenever possible and look for batch methods that can process several of your requests in parallel. Our database abstraction layer, cache interfaces, and HTTP clients all support batching. For example:,  ,  ,.

How often will my code run?
Think about how often the server will execute your code. We tend to categorise server-side concerns in these buckets:
 * 1) Always.
 * 2) * Critical code running unconditionally on every web request. This should generally be kept very direct, minimal, and obvious in what it costs to do its work. This includes early Setup hooks, extension.json callbacks, and service wiring.
 * 3) * The typical budget for new needs from core or extension hooks during the Setup phase is under 0.1ms.
 * 4) On page views.
 * 5) * Code running on all HTML web responses. This is almost always, but not for action=raw requests, or HTTP 304 "Not Modified" responses to a page view URL.
 * 6) * The typical budget for new output on a pageview is 1ms. Keep in mind that we aim to respond within 50ms to a majority of requests (#Ballpark numbers).
 * 7) When parsing page content.
 * 8) * Most backend requests for pages views fetch page content from the ParserCache and render a skinned page view. The parsing of page content happens on a relatively small portion of page views only (e.g. cache miss), as well as during the response to saving of edits. For that reason, it is acceptable to perform a limited number of more expensive operations during the parser run, e.g. from a parser hook. These hooks generally do not execute while someone is waiting during a page view, as the result of these operations is retained in the parser cache.
 * 9) * The typical budget for components that render content is 10ms (for most pages using your feature), or upto 100ms for pages that utilize your feature in an unusually complex manner.
 * 10) When performing a write action.
 * 11) * Editing is the most common non-read action. The latency of processing an edit is actively monitored as Save Timing.
 * 12) ** The typical budget for components that parse content or otherwise synchronously hook into edit submissions is 10ms (for most pages), or upto 100ms for unusually complex pages. Keep in mind that we aim to respond within 1 second for write actions, and there are many extensions participating in that shared budget (#Ballpark numbers).

You are not alone
If your code is amplifying a pre-existing performance issue in other component or service, identity these and ensure relevant teams are made aware. The Performance Team can help you in finding and/or advocating for these cross-component issues.

Work with Performance Team to understand general performance targets before you start architecting your system. For example, a user-facing application might have an acceptable latency of 200ms while a database might have something like 20ms or less, especially if further access is decided based on the results of previous queries. You don't want to prematurely optimize, but you need to understand if your targets are physically possible.

You might not need to design your own backend; consider using an existing service, or having someone design an interface for you. Consider modularization. Performance is hard; do not try to reinvent the wheel.

Use of shared resources and execution context
Be mindful that your code is running in an environment that uses shared resources such as databases, queues, and cache servers. Thus, long-running queries (e.g. 5+ seconds) should run on dedicated servers. For example, regeneration of complex special page lists use the "vslow" database servers. Watch out query patterns that are prone to deadlocks and lock-wait timeouts; long-running transactions, inefficient WHERE clauses in either write or locking SELECT queries, insertion queries preceded by "gap locking" queries in the same transaction. When assessing whether your queries will take "a long time" or cause contention, profile them. These numbers will always be relative to the performance of the server in general, and to how often it will be run.

The main execution context is that of the response to a single web request, with the other context being CLI mode (e.g. Maintenance scripts). Be mindful of the fact that various extensions can add extra queries and updates via hooks. To minimize the risk of timeouts, deadlocks, and half completed updates due to interactions between core and extensions, one should strive for speed and simplicity of RDBMS and object store writes during the main transaction round. Updates that take non-trivial time or are complex should use DeferredUpdates or the JobQueue when possible, to better isolate different modules from one another. Use simple cache purges over re-computations when data entries change to avoid slowdowns (also to avoid problems with race conditions as well as multi-datacenter replication).

Rate limiting
If your product exposes new user actions that make database modifications beyond the standard page creation / page editing mechanism, then firstly consider whether this is appropriate and scalable. You're going to have a lot less maintenance overhead and operationa risk if you adopt "Everything is a wiki page". See also Choose boring technology by Dan McKinley.

If you do have to expose new "write" actions, make sure a rate limit is applied.

Example:


 * UrlShortener exposes API to create new short URLs, which needs a rate limit. Typically powered by . T133109

For expensive computations that are not write actions, such as power user features that may expose slow or expensive computations,, consider implementing a throttle based on PoolCounter to limit overall server load.

Example:


 * Special:Contributions exposes a database read query that can be slow. This is rate limited by PoolCounter. See also T234450 and T160985.

Long-running queries
Long-running queries that do reads should be on a dedicated server, as Wikimedia does with analytics. MySQL uses snapshots for SELECT queries, and the snaphotting persists until COMMIT if BEGIN was used. Snapshots implement REPEATABLE-READ by making sure that, in the transaction, the client sees the database as it existed in single point in time (the time of the first SELECT). Keeping one transaction open for more than (ideally) seconds is a bad idea on production servers. As long as a REPEATABLE-READ transaction is open (that did at least one query), MySQL has to keep the old versions of rows around in the index that were since deleted or changed because the long-running transaction should see them in any relevant SELECT queries. These rows can clutter up the index of hot tables that have nothing to do with the long-running query. There are research databases - use those. Special pages can use the "vslow" query group to be mapped to dedicated databases.

Locking
Wikimedia's MySQL/MariaDB servers use InnoDB, which supports repeatable read transactions. Gap locking is part of "Next-key Locks", which is how InnoDB implements REPEATABLE READ transaction isolation level. At Wikimedia, repeatable read transaction isolation is on by default (unless the code is running in Command-Line Interaction (CLI) mode, as with the maintenance scripts), so all the SQL SELECT queries you do within one request will automatically get bundled into a transaction. For more information, see the Wikipedia articles on Isolation (database systems) and look up repeatable read (snapshot isolation), to understand why it's best to avoid phantom reads and other phenomena.
 * Anytime you are doing a write/delete/update query that updates something, it will have gap locks on it unless it is by a unique index. Even if you are not in repeatable read, even if you are doing one SELECT, it will be internally consistent if, for example, it returns multiple rows.Thus: do your operations, e.g., DELETE or UPDATE or REPLACE, on a unique index, such as a primary key. The situations where you were causing gap locks and you want to switch to doing operations on a primary key are ones where you want to do a SELECT first to find the ID to operate on; this can't be SELECT FOR UPDATE since it has the same locking problems. This means you might have to deal with race condition problems, so you may want to use INSERT IGNORE instead of INSERT.

Here's a common mistake that causes inappropriate locking: take a look at, for instance, the table  (line 208 of tables.sql), in which you have a three-column table that follows the "Entity-value-attribute" pattern.
 * 1) Column 1: the object/entity (here, UserID)
 * 2) Column 2: the name of a property for that object
 * 3) Column 3: the value associated with that property for the object

That is, you have a bunch of key-values for each entity that are all in one table. (This table schema is kind of an antipattern. But at least this is a reasonably specific table that just holds user preferences.)In this situation, it's tempting to create a workflow for user preference changes that deletes all the rows for that userID, then reinserts new ones. But this causes a lot of contention for the database. Instead, change the query so you only delete by the primary key. SELECT it first, and then, when you INSERT new values, you can use INSERT IGNORE (which ignores the insert if the row already exists). This is more efficient. Alternatively, you can use a JSON blob, but this is hard to use in JOINs or WHERE clauses in single entries. See "On MySQL locks" for some explanation of gap locks.

Transactions
Every web request and every database operation, in general, should occur within a transaction. However, be careful when mixing a database transaction with an operation on something else, such as another database transaction or accessing an external service like Swift. Be particularly careful with locking order. Every time you update or delete or insert anything, ask:
 * what you are locking?
 * are there other callers?
 * what are you doing, after making the query, all the way to making the commit?

Avoid excessive contention. Avoid locking things in an unnecessary order, especially when you're doing something slow and committing at the end. For instance, if you have a counter column that you increment every time something happens, then DON'T increment it in a hook just before you parse a page for 10 seconds.

Do not use READ UNCOMMIT (if someone updates a row in a transaction and has not committed it, another request can still see it) or SERIALIZABLE (every time you do SELECT, it's as if you did SELECT FOR UPDATE, a.k.a. lock-and-share mode -- locking every row you select until you commit the transaction -leads to lock-wait timeouts and deadlocks).

Examples
Good example:. When message blobs (JSON collections of several translations of specific messages) change, it can lead to updates of database rows, and the update attempts can happen concurrently. In a previous version of the code, the code locked a row in order to write to it and avoid overwrites, but this could lead to contention. In contrast, in the current codebase, the  method performs a repeated attempt at update until it determines (by checking timestamps) that there will be no conflict. See lines 212-214 for an explanation and see line 208-234 for the outer do-while loop that processes  until it is empty.

Bad example: The former structure of the ArticleFeedbackv5 extension. Code included:

Bad practices here include the multiple counter rows with id = '0' updated every time feedback is given on any page, and the use of DELETE + INSERT IGNORE to update a single row. Both result in locks that prevent >1 feedback submission saving at a time (due to the use of transactions, these locks persist beyond than the time needed by the individual statements). See minutes 11-13 of Asher Feldman's performance talk & page 17 of his slides for more explanation.

Indexing
The tables you create will be shared by other code. Every database query must be able to use one of the indexes (including write queries!).

Unless you're dealing with a tiny table, you need to index writes (similarly to reads). Watch out for deadlocks and for lock-wait timeouts. Try to do updates and deletes by primary query, rather than some secondary key. Try to avoid UPDATE/DELETE queries on rows that do not exist. Make sure join conditions are cleanly indexed.

You cannot index blobs, but you can index blob prefixes (the substring comprising the first several characters of the blob).

Compound keys - namespace-title pairs are all over the database. You need to order your query by asking for namespace first, then title!

Use EXPLAIN & MYSQL DESCRIBE query to find out which indexes are affected by a specific query. If it says "Using temporary table" or "Using filesort" in the EXTRA column, that's often bad! If "possible_keys" is NULL, that's often bad (small sorts and temporary tables are tolerable though). An "obvious" index may not actually be used due to poor "selectivity". See the Measure backend performance in production guide, and for more details, see Roan Kattouw's 2010 talk on security, scalability and performance for extension developers, Roan's MySQL optimization tutorial from 2012 (slides), and Tim Starling's 2013 performance talk.

Indexing is not a silver bullet; more isn't always better. Once an index gets big enough that it doesn't fit into RAM anymore, it slows down dramatically. Additionally, an index can make reads faster, but writes slower.

Good example: See the  and   tables. One of them also offers a reverse index, which gives you a cheap alternative to SORT BY.

Bad example: See this changeset (a fix). As the note states, "needs to be id/type, not type/id, according to the definition of the relevant index in :  ". Rather than using the index that was built on the id-and-type combination, the previous code (that this is fixing) attempted to specify an index that was "type-and-id", which did not exist. Thus, MariaDB did not use the index, and thus instead tried to order the table without using the index, which caused the database to try to sort 20 million rows with no index.

Persistence layer
Choose the right persistence layer for your needs: job queue (like Redis), database (like MariaDB), or file store (like Swift). In some cases, a cache can be used instead of a persistence layer.

Wikimedia sites makes use of local services including Redis, MariaDB, Swift, and memcached. (Also things like Parsoid that plug in for specific things like VisualEditor.) They are expected to reside on a low-latency network. They are local services, as opposed to remote services like Varnish.

People often put things into databases that ought to be in a cache or a queue. Here's when to use which:
 * 1) MySQL/MariaDB database - longterm storage of structured data and blobs.
 * 2) Swift file store - longterm storage for binary files that may be large. See Media storage for details.
 * 3) Redis jobqueue - you add a job to be performed, the job is done, and then the job is gone. You don't want to lose the jobs before they are run. But you are ok with there being a delay.
 * (in the future maybe MediaWiki should support having a high-latency and a low-latency queue.)

A cache, such as memcached, is storage for things that persist between requests, and that you don't need to keep - you're fine with losing any one thing. Use memcached to store objects if the database could recreate them but it would be computationally expensive to do so, so you don't want to recreate them too often. You can imagine a spectrum between caches and stores, varying on how long developers expect objects to live in the service before getting evicted; see the Caching layers section for more.

Permanent names: In general, store resources under names that won't change. In MediaWiki, files are stored under their "pretty names", which was probably a mistake - if you click Move, it ought to be fast (renaming title), but other versions of the file also have to be renamed. And Swift is distributed, so you can't just change the metadata on one volume of one system.

Object size: Memcached sometimes gets abused by putting big objects in there, or where it would be cheaper to recalculate than to retrieve. So don't put things in memcached that are TOO trivial - that causes an extra network fetch for very little gain. A very simple lookup, like "is a page watched by current user", does not go in the cache, because it's indexed well so it's a fast database lookup.

When to use the job queue: If the thing to be done is fast (~5 milliseconds) or needs to happen synchronously, then do it synchronously. Otherwise, put it in the job queue. You do not want an HTTP request that a user is waiting on to take more than a few seconds. Examples using the job queue:


 * Updating link table on pages modified by a template change
 * Transcoding a video that has been uploaded

HTMLCacheUpdate is synchronous if there are very few backlinks. Developers also moved large file uploads to an asynchronous workflow because users started experiencing timeouts.

In some cases it may be valuable to create separate classes of job queues -- for instance video transcoding done by Extension:TimedMediaHandler is stored in the job queue, but a dedicated runner is used to keep the very long jobs from flooding other servers. Currently this requires some manual intervention to accomplish (see TMH as an example).

Extensions that use the job queue include RenameUser, TranslationNotification, Translate, GWToolset, and MassMessage.

Additional examples:
 * large uploads. UploadWizard has API core modules and core jobs take care of taking chunks of file, reassembling, turning into a file the user can view. The user starts defining the description, metadata, etc., and the data is sent 1 chunk at a time.
 * purging all the pages that use a template from Varnish & bumping the  column in the database, which tells parser cache it's invalid and needs to be regenerated
 * refreshing links: when a page links to many pages, or it has categories, it's better to refresh links or update categories after saving, then propagate the change. (For instance, adding a category to a template or removing it, which means every page that uses that template needs to be linked to the category -- likewise with files, externals, etc.)

How slow or contentious is the thing you are causing? Maybe your code can't do it on the same web request the user initiated. You do not want an HTTP request that a user is waiting on to take more than a few seconds.

Example: You create a new kind of notification. Good idea: put the actual notification action (emailing people) or adding the flags (user id n now has a new notification!) into the jobqueue. Bad idea: putting it into a database transaction that has to commit and finish before the user gets a response.

Good example: The Beta features extension lets a user opt in for a "Beta feature" and displays, to the user, how many users have opted in to each of the currently available Beta features. The preferences themselves are stored in  table. However, directly counting the number of opted-in users every time that count is displayed would not have acceptable performance. Thus, MediaWiki stores those counts in the database in the  table, but they are also stored in memcached. It's important to immediately update the user's own preference and be able to display the updated preference on page reload, but it's not important to immediately report to the user the increase or decrease in the count, and this information doesn't get reported in Special:Statistics.

Therefore, BetaFeatures updates those user counts every half hour or so, and no more. Specifically, the extension creates a job that does a SELECT query. Running this query takes a long time - upwards of 5 minutes! So it's done once, and then on the next user request, the result gets cached in memcached for the page https://en.wikipedia.org/wiki/Special:Preferences#mw-prefsection-betafeatures. (They won't get updated at all if no one tries to fetch them, but that is unlikely.) If a researcher needs a realtime count, they can directly query the database outside of MediaWiki application flow.

Code: UpdateBetaFeatureUserCountsJob.php and BetaFeaturesHooks.php.

Bad example: add one?

Multiple datacenters
See Database transactions

Once CDN requests reach (non-proxy) origin servers, the responding service (such as Apache/MediaWiki, Thumbor, or HyperSwitch) must limit its own read operations from persistence layers to only involve the local datacenter. The same applies to write operations to caching layers, except for allowing asynchronous purging broadcasts or asynchronous replication of caches that are profoundly expensive to regenerate from scratch (e.g. parser cache in MySQL). Write operations to source data persistence layers (MySQL, Swift, Cassandra) are more complex, but generally should only happen on HTTP POST or PUT requests from end-users and should be synchronous in the local datacenter, with asynchronous replication to remote datacenters. Updates to search index persistence layers (Elastic, BlazeGraph) can use either this approach, the Job queue, or Change propagation. The enqueue operations to the job/propagation systems are themselves synchronous in the local datacenter (with asynchronous replication to the remote ones).

HTTP POST/PUT requests to MediaWiki will be routed to the master datacenter and the MediaWiki job queue workers only run there as well (e.g. where the logic of  executes). An independent non-MediaWiki API service might be able to run write APIs correctly in multiple datacenters at once if it has very limited semantics and has no relational integrity dependencies on other source data persistence layers. For example, if the service simply takes end-user input and stores blobs keyed under new UUIDs, there is no way that writes can conflict. If updates or deletions are later added as a feature, then Last-Write-Wins might be considered a "correct" approach to handling write conflicts between datacenters (e.g. if only one user has permission to change any given blob then all conflicts are self-inflicted). If write conflicts are not manageable, then such API requests should be routed to the master datacenter.

Work involved during cache misses
Wikimedia uses and depends heavily on many different caching layers, so your code needs to work in that environment! (But it also must work if everything misses cache.)

Cache-on-save: Wikimedia sites use a preemptive cache-repopulation strategy: if your code will create or modify a large object when the user hits "save" or "submit", then along with saving the modified object in the database/filestore, populate the right cache with it (or schedule a job in the job queue to do so). This will give users faster results than if those large things were regenerated dynamically when someone hit the cache. Localization (i18n) messages, SpamBlacklist data, and parsed text (upon save) are all aggressively cached. (See "Caching layers" for more.)

At the moment, this strategy does not work well for memcached for Wikimedia's multi-datacenter use case. A workaround when using WANObjectCache is to use  as normal, but with "lockTSE" set and with a "check" key passed in. The key can be "bumped" via  to perform invalidations instead of using. This avoids cache stampedes on purge for hot keys, which is usually the main goal.

If something is VERY expensive to recompute, then use a cache that is somewhat closer to a store. For instance, you might use the backend (secondary) Varnishes, which are often called a cache, but are really closer to a store, because objects tend to persist longer there (on disk).

Cache misses are normal: Avoid writing code that, on cache miss, is ridiculously slow. (For instance, it's not okay to  and assume that a memcache between the database and the user will make it all right; cache misses and timeouts eat a lot of resources. Caches are not magic.) The cluster has a limit of 180 seconds per script (see the limit in Puppet); if your code is so slow that a function exceeds the max execution time, it will be killed.

Write your queries such that an uncached computation will take a reasonable amount of time. To figure out what is reasonable for your circumstance, ask the Site performance and architecture team.

If you can't make it fast, see if you can do it in the background. For example, see some of the statistics special pages that run expensive queries. These can then be run on a dedicated time on large installations. But again, this requires manual setup work -- only do this if you have to.

Watch out for cached HTML: HTML output may sit around for a long time and still needs to be supported by the CSS and JS. Problems where old JS/CSS hang around are in some ways more obvious, so it's easier to find them early in testing, but stale HTML can be insidious!

Good example: See the TwnMainPage extension. It offloads the recalculation of statistics (site stats and user stats) to the job queue, adding jobs to the queue before the cache expires. In case of cache miss, it does not show anything; see CachedStat.php. It also sets a limit of 1 second for calculating message group stats; see SpecialTwnMainPage.php.

Bad example: a change "disabled varnish cache, where previously it was set to cache in varnish for 10 seconds. Given the amount of hits that page gets, even a 10 second cache is probably helpful."

Caching layers
The cache hit ratio should be as high as possible; watch out if you're introducing new cookies, shared resources, bundled requests or calls, or other changes that will vary requests and reduce cache hit ratio.

Caching layers that you need to care about:
 * 1) Browser caches
 * 2) native browser cache
 * 3) LocalStorage. See meta:Research:Module storage performance to see the statistical proof that storing ResourceLoader storage in LocalStorage speeds page load times and causes users to browse more.
 * 4) CDN cache (Varnish frontend)
 * The Varnish caches cache entire HTTP responses, including thumbnails of images, frequently-requested pages, ResourceLoader modules, and similar items that can be retrieved by URL. The front-end Varnishes keep these in memory. A weighted-random load balancer (LVS) distributes web requests to the front-end Varnishes.
 * Because Wikimedia distributes its front-end Varnishes geographically (in the Amsterdam & San Francisco caching centers as well as the Texas and Virginia data centers) to reduce latency to users, some engineers refer to those front-end Varnishes as "edge caching" and sometimes as a CDN (content delivery network). See MediaWiki at WMF for some details.
 * 1) object cache (backed by Memcached at WMF)
 * The object cache is a generic service used for many things, e.g., the user object cache. It's a generic service that many services can stash things in. You can also use that service as a layer in a larger caching strategy, such as what we do with ParserCache at WMF, which is stored with high-retention in a SQL database (no LRU eviction), with an additional layer of ParserCache in the object cache (Memcached).
 * 1) database's buffer pool and query cache (not directly controllable)

To learn more about how we use object cache, and to help decide which abstraction layer to use, refer to Manual:Object cache.

Think about how you will invalidate or expire content from the various caching layers. Is it by purging? Directly pushing updates (setting keys into cache)?, Or by bumping the timestamp or version number of a URL or cache key? Your application needs will determine your cache purging strategy.

Since the CDN cache serves content by URL, URLs ought to be deterministic -- that is, they should not serve different content from the same URL. Different content belongs at a different URL. This should be true especially for anonymous users. (For logged-in users, WMF's configuration contains additional wrinkles involving session cookies).

Good example: (from the mw.cookie change) of not poisoning the cache with request-specific data (when cache is not split on that variable). Background:  will use MediaWiki's cookie settings, so client-side developers don't think about this. These are passed via the ResourceLoader startup module. Issue: However, it doesn't use Manual:$wgCookieSecure (instead, this is documented not to be supported), since the default value (' ') varies by the request protocol, and the startup module does not vary by protocol. Thus, the first hit could poison the module with data that will be inapplicable for other requests.

Bad examples:
 * Don't use the word "Token" in your cookie name. We learned from the GettingStarted extension, in this case, the cookie name hit a regular expression that Varnish CDN recognises as being for logged-in page views to never cache. See the code, an initial revert, another early fix, another revert commit, the Varnish layer workaround, the followup fix, the GettingStarted fix part 1 and part 2, and the regex fix.
 * WikidataClient was fetching a large object from memcached just to decide which project group it was on, when it would have been more efficient to simply recompute it by putting the very few values needed into a global variable. (See the changeset that fixed the bug.)
 * Template parse on every page view is a bad thing, as it obviates the advantage of the parser cache (the cache that caches parsed wikitext).

Multiple data centers
WMF runs multiple data centers ("eqiad", "codfw", etc.). The plan is to move to a master/slave data center configuration (see RFC), where users read pages from caches at the closest data center, while all update activity flows to the master data center. Most MediaWiki code need not be directly aware of this, but it does have implications for how developers write code; see RFC's Design implications.
 * TODO: bring guidelines from RFC to here and other pages.

Cookies
For cookies, besides the concerns having to do with caching (see "Caching layers", above), there is also the issue that cookies bloat the payload of every request, that is, they result in more data sent back and forth, often unnecessarily. While the effect of bloated header payloads in page performance is less immediate than the impact of blowing up Varnish cache ratios, it is not less measurable or important. Please consider the usage of localStorage or sessionStorage as an alternative to cookies. Client-side storage works well in non-IE browsers, and in IE from IE8 onward.

See also Google's advice on minimizing request overhead.

Articles

 * "Measuring Site Performance at the Wikimedia Foundation", Asher Feldman, March 2012
 * "How the Technical Operations team stops problems in their tracks", Sumana Harihareswara, February 2013


 * "Scalable Web Architecture and Distributed Systems" (book chapter), Kate Matsudaira, May 2012
 * "80% of end-user response time is spent on the frontend", Marcus Ljungblad, April 2014

Talks

 * "Why your extension will not be enabled on Wikimedia wikis in its current state and what you can do about it", Roan Kattouw, Wikimania, July 2010
 * Notes from Tim Starling's security and performance talk, WMF training session, July 2011
 * MediaWiki MySQL optimization tutorial (slides), Roan Kattouw, Berlin Hackathon, June 2012
 * "MediaWiki Performance Profiling" (video) (slides), Asher Feldman, WMF Tech Days, September 2012
 * "MediaWiki Performance Techniques", Tim Starling, Amsterdam Hackathon, May 2013
 * "Let's talk about web performance" (video), Peter Hedenskog, WMF tech talk, August 2015

Meta
Sources that helped influence these guidelines, and future drafts and ideas:


 * RRC: Performance standards for new features, Ori Livneh, December 2013.
 * Notes from performance discussion, Wikimedia Architecture Summit 2014, January 2014.