Wikimedia Performance Team/Backend performance/zh

这些是MediaWiki后端开发的性能指南，旨在部署到维基媒体基金会的维基网站.

该怎么做（概要）

 * 准备好对你的代码的性能感到惊讶；预测往往是坏的.
 * 严格测量性能（在你的开发环境和生产环境中），并知道时间花在哪里.
 * 当发现延迟时，要负责任并将其作为优先事项；你对使用模式和测试内容有最好的想法.
 * 性能往往与不良工程的其他症状有关；想想根本原因.
 * MediaWiki很复杂，可以以意想不到的方式互动. 你的代码可能会在其他地方暴露出性能问题，你需要识别.
 * 没被缓存的昂贵但有价值的操作应该最多需要5秒，2秒更好.
 * 如果这还不够，可以考虑使用作业队列在后台服务器上执行任务.

一般性能原则
MediaWiki应用程序开发：


 * 延迟加载那些不会影响页面初始渲染的模块，特别是“折叠内容之上”（用户屏幕上最初可见的页面顶部部分）. 因此，尽可能少地加载JavaScript，而是按需加载更多的组件. 有关更多信息，请参见加载模块.
 * 用户应该有一个流畅的体验；不同的组件应该逐步呈现. 保留元素的定位（例如，避免在页面重排时把内容挤到后面）.

维基媒体基础设施：
 * 你的代码是在一个共享环境中运行的. 因此，长期运行的SQL查询不能作为网络请求的一部分运行. 相反，应该让它们在专门的服务器上运行（使用JobQueue），并注意死锁和锁定等待超时的问题.


 * 你创建的表将被其他代码所共享. 每个数据库查询必须能够使用其中一个索引（包括写查询！）. EXPLAIN，并在需要时创建新的索引.


 * 为你的需求选择合适的持久层. Redis作业队列，MariaDB数据库，或Swift文件存储. 只有在你的代码能够始终有效地处理缓存数据消失的情况下，才进行缓存；否则，就持久化数据.
 * 维基媒体使用并严重依赖许多不同的缓存层，所以你的代码需要在这种环境下工作！（但如果所有东西都没被缓存命中，也必须正常运行. ）
 * 缓存命中率应尽可能高；注意你是否引入了新的cookies、共享资源、捆绑请求或调用，或其他会改变请求并降低缓存命中率的改变.

测量
测量你的代码运行的速度，这样你就可以基于事实而不是迷信或感觉来做决定. 将这些原则与架构指南和安全指南一起使用. 性能（你的代码运行得（相对）快）和可扩展性（你的代码在更大的维基上和多次并发实例化时不会变得更慢）都很重要；都要测量.

百分比
总是考虑高的百分位数值而不是中位数.

网络中的性能数据有两个不同的“信号”是很常见的：一个是用户通过热缓存访问应用程序，另一个是用户通过冷缓存访问应用程序. 在有这两个信号的数据集上计算平均数是毫无意义的. 为了对数据进行快速检查，请确保你至少有10,000个数据点，并计算出50%和90%的统计量. 这两个数字可能相差很大，这可能表明您可以修复性能问题. 例如，如果网络往返速度相当慢，而且你有很多资源被获取，你会看到用户来到你的网站时，有缓存资源（从而避免了所有那些缓慢的往返）和没有缓存资源的巨大差异. 如果你有足够的数据那就更好了，你可以计算出1、50、90和99的百分位数. 一个好的经验法则是，要有统计学意义，你需要10,000个数据点来计算第90个百分点，100,000个数据点来计算第99个百分点，100万个数据点来计算第99.9个百分点.

这条经验法则使事情有点过于简单化，但对性能分析很有效. （关于这个的一些文献）

Latency
The software should run at an acceptable speed regardless of network latency, but some operations may have surprisingly variable network latency, such as looking up image files when Instant Commons is enabled. Remember that latency also depends on the user's connection. Wikimedia sites serve many people on mobile or dialup connections which are both slow and have a high round-trip time. There are also fast connections with long RTT, for example satellite modem where 2 second RTT is not unusual.

There can be some ways to manage latency:
 * first, be aware of which code paths are meant to always be fast (database, memcache) and which may be slow (fetching File info or spam blacklists that might be cross-wiki and go over the internet)
 * when creating a code path that may be intermittently slow, document this fact.
 * be careful not to pile on requests -- for instance an external search engine might be slow to return under poor conditions while it's normally fast. Bottlenecking may cause all web servers to get caught up.
 * consider breaking operations into smaller pieces which can be separated
 * alternately, consider running operations in parallel -- this can be tricky though, as MediaWiki currently does not have good primitives for doing multiple HTTP fetches at once

(Latency of course depends on the user's connection somewhat. Wikimedia sites serve many people on mobile or dialup connections. This goal is reasonable up to 300 milliseconds round-trip time or so. If someone's on satellite with 2000ms RTT, then they can expect everything to be slow, but that's a small minority of users.)

In the worst case a request that is expensive but valuable and misses or cannot be cached should take at most 5 seconds of server compute time. Strive for two seconds.


 * example: saving a new edit to a page
 * example: rendering a video thumbnail

How often will my code run?
It's important to think about how often the site or the browser will have to execute your code. Here are the main cases:
 * Always. This is obviously the most critical.
 * On page views (when actually sending HTML) -- that is, nearly always, unless the user gets a  code or some such. Nearly every time an anonymous (not logged in) reader reads a Wikipedia page, they will get canned, pre-rendered HTML sent to them. If you add new code that runs every time anyone views a page, watch out.
 * When rendering page content. MediaWiki (as configured on Wikimedia sites) usually has to render page content (on the server side) only after an edit or after a cache miss, so renders are far less frequent than page views. For that reason, more expensive operations are acceptable. Rendering is typically not done while a user is waiting -- unless the user just edited the page, which leads to...
 * When saving an edit. This is the rarest code path, and the one on which the largest delays are acceptable. Users tend to accept a longer wait after performing an action that "feels heavy", like editing a page. (But Wikimedia wants to encourage more people to edit and upload, so this norm may change.)

Also watch out for failure code paths. Watch out, for instance, for a 'tight retry loop', which could cause hundreds of servers to get stuck in an error cycle. If possible, after failure, you should instead reschedule and/or cache the error for a short time, before trying again. (Incorrectly cached errors are also dangerous.)

You're not alone
Work with Performance Team to understand general performance targets before you start architecting your system. For example, a user-facing application might have an acceptable latency of 200 ms while a database might have something like 20 ms or less, especially if further access is decided based on the results of previous queries. You don't want to prematurely optimize, but you need to understand if your targets are physically possible.

You might not need to design your own backend; consider using an existing service, or having someone design an interface for you. Consider modularization. Performance is hard; do not try to reinvent the wheel.

Use of shared resources and execution context
Be mindful that your code is running in an environment that uses shared resources such as databases, queues, and cache servers. Thus, long-running queries (e.g. 5+ seconds) should run on dedicated servers. For example, regeneration of complex special page lists use the "vslow" database servers. Watch out query patterns that are prone to deadlocks and lock-wait timeouts; long-running transactions, inefficient WHERE clauses in either write or locking SELECT queries, insertion queries preceded by "gap locking" queries in the same transaction. When assessing whether your queries will take "a long time" or cause contention, profile them. These numbers will always be relative to the performance of the server in general, and to how often it will be run.

The main execution context is that of the response to a single web request, with the other context being CLI mode (e.g. Maintenance scripts). Be mindful of the fact that various extensions can add extra queries and updates via hooks. To minimize the risk of timeouts, deadlocks, and half completed updates due to interactions between core and extensions, one should strive for speed and simplicity of RDBMS and object store writes during the main transaction round. Updates that take non-trivial time or are complex should use DeferredUpdates or the JobQueue when possible, to better isolate different modules from one another. Use simple cache purges over re-computations when data entries change to avoid slowdowns (also to avoid problems with race conditions as well as multi-datacenter replication).

Rate limiting
If your product exposes new user actions that make database modifications beyond the standard page creation / page editing mechanism, then firstly consider whether this is appropiate and scalable. You're going to have a lot less maintenance overhead and operationa risk if you adopt "Everything is a wiki page". See also Choose boring technology by Dan McKinley.

If you do have to expose new "write" actions, make sure a rate limit is applied.

Example:


 * UrlShortener exposes API to create new short URLs, which needs a rate limit. Typically powered by . T133109

For expensive computations that are not write actions, such as power user features that may expose slow or expensive computations,, consider implementing a throttle based on PoolCounter to limit overall server load.

Example:


 * Special:Contributions exposes a database read query that can be slow. This is rate limited by PoolCounter. See also T234450 and T160985.

Long-running queries
Long-running queries that do reads should be on a dedicated server, as Wikimedia does with analytics. MySQL uses snapshots for SELECT queries, and the snaphotting persists until COMMIT if BEGIN was used. Snapshots implement REPEATABLE-READ by making sure that, in the transaction, the client sees the database as it existed in single point in time (the time of the first SELECT). Keeping one transaction open for more than (ideally) seconds is a bad idea on production servers. As long as a REPEATABLE-READ transaction is open (that did at least one query), MySQL has to keep the old versions of rows around in the index that were since deleted or changed because the long-running transaction should see them in any relevant SELECT queries. These rows can clutter up the index of hot tables that have nothing to do with the long-running query. There are research databases - use those. Special pages can use the "vslow" query group to be mapped to dedicated databases.

Locking
Wikimedia's MySQL/MariaDB servers use InnoDB, which supports repeatable read transactions. Gap locking is part of "Next-key Locks", which is how InnoDB implements REPEATABLE READ transaction isolation level. At Wikimedia, repeatable read transaction isolation is on by default (unless the code is running in Command-Line Interaction (CLI) mode, as with the maintenance scripts), so all the SQL SELECT queries you do within one request will automatically get bundled into a transaction. For more information, see the Wikipedia articles on en:Isolation (database systems) and look up repeatable read (snapshot isolation), to understand why it's best to avoid phantom reads and other phenomena.
 * Anytime you are doing a write/delete/update query that updates something, it will have gap locks on it unless it is by a unique index. Even if you are not in repeatable read, even if you are doing one SELECT, it will be internally consistent if, for example, it returns multiple rows.Thus: do your operations, e.g., DELETE or UPDATE or REPLACE, on a unique index, such as a primary key. The situations where you were causing gap locks and you want to switch to doing operations on a primary key are ones where you want to do a SELECT first to find the ID to operate on; this can't be SELECT FOR UPDATE since it has the same locking problems. This means you might have to deal with race condition problems, so you may want to use INSERT IGNORE instead of INSERT.

Here's a common mistake that causes inappropriate locking: take a look at, for instance, the table  (line 208 of tables.sql), in which you have a three-column table that follows the "Entity-value-attribute" pattern.
 * 1) Column 1: the object/entity (here, UserID)
 * 2) Column 2: the name of a property for that object
 * 3) Column 3: the value associated with that property for the object

That is, you have a bunch of key-values for each entity that are all in one table. (This table schema is kind of an antipattern. But at least this is a reasonably specific table that just holds user preferences.)In this situation, it's tempting to create a workflow for user preference changes that deletes all the rows for that userID, then reinserts new ones. But this causes a lot of contention for the database. Instead, change the query so you only delete by the primary key. SELECT it first, and then, when you INSERT new values, you can use INSERT IGNORE (which ignores the insert if the row already exists). This is more efficient. Alternatively, you can use a JSON blob, but this is hard to use in JOINs or WHERE clauses in single entries. See "On MySQL locks" for some explanation of gap locks.

Transactions
Every web request and every database operation, in general, should occur within a transaction. However, be careful when mixing a database transaction with an operation on something else, such as another database transaction or accessing an external service like Swift. Be particularly careful with locking order. Every time you update or delete or insert anything, ask:
 * what you are locking?
 * are there other callers?
 * what are you doing, after making the query, all the way to making the commit?

Avoid excessive contention. Avoid locking things in an unnecessary order, especially when you're doing something slow and committing at the end. For instance, if you have a counter column that you increment every time something happens, then DON'T increment it in a hook just before you parse a page for 10 seconds.

Do not use READ UNCOMMIT (if someone updates a row in a transaction and has not committed it, another request can still see it) or SERIALIZABLE (every time you do SELECT, it's as if you did SELECT FOR UPDATE, a.k.a. lock-and-share mode -- locking every row you select until you commit the transaction -leads to lock-wait timeouts and deadlocks).

Examples
Good example:. When message blobs (JSON collections of several translations of specific messages) change, it can lead to updates of database rows, and the update attempts can happen concurrently. In a previous version of the code, the code locked a row in order to write to it and avoid overwrites, but this could lead to contention. In contrast, in the current codebase, the  method performs a repeated attempt at update until it determines (by checking timestamps) that there will be no conflict. See lines 212-214 for an explanation and see line 208-234 for the outer do-while loop that processes  until it is empty.

Bad example: The former structure of the ArticleFeedbackv5 extension. Code included:

Bad practices here include the multiple counter rows with id = '0' updated every time feedback is given on any page, and the use of DELETE + INSERT IGNORE to update a single row. Both result in locks that prevent >1 feedback submission saving at a time (due to the use of transactions, these locks persist beyond than the time needed by the individual statements). See minutes 11-13 of Asher Feldman's performance talk & page 17 of his slides for more explanation.

Indexing
The tables you create will be shared by other code. Every database query must be able to use one of the indexes (including write queries!).

Unless you're dealing with a tiny table, you need to index writes (similarly to reads). Watch out for deadlocks and for lock-wait timeouts. Try to do updates and deletes by primary query, rather than some secondary key. Try to avoid UPDATE/DELETE queries on rows that do not exist. Make sure join conditions are cleanly indexed.

You cannot index blobs, but you can index blob prefixes (the substring comprising the first several characters of the blob).

Compound keys - namespace-title pairs are all over the database. You need to order your query by asking for namespace first, then title!

Use EXPLAIN & MYSQL DESCRIBE query to find out which indexes are affected by a specific query. If it says "Using temporary table" or "Using filesort" in the EXTRA column, that's often bad! If "possible_keys" is NULL, that's often bad (small sorts and temporary tables are tolerable though). An "obvious" index may not actually be used due to poor "selectivity". See the Performance profiling for Wikimedia code guide, and for more details, see Roan Kattouw's 2010 talk on security, scalability and performance for extension developers, Roan's MySQL optimization tutorial from 2012 (slides), and Tim Starling's 2013 performance talk.

Indexing is not a silver bullet; more isn't always better. Once an index gets big enough that it doesn't fit into RAM anymore, it slows down dramatically. Additionally, an index can make reads faster, but writes slower.

Good example: See the  and   tables. One of them also offers a reverse index, which gives you a cheap alternative to SORT BY.

Bad example: See this changeset (a fix). As the note states, "needs to be id/type, not type/id, according to the definition of the relevant index in :  ". Rather than using the index that was built on the id-and-type combination, the previous code (that this is fixing) attempted to specify an index that was "type-and-id", which did not exist. Thus, MariaDB did not use the index, and thus instead tried to order the table without using the index, which caused the database to try to sort 20 million rows with no index.

Persistence layer
Choose the right persistence layer for your needs: job queue (like Redis), database (like MariaDB), or file store (like Swift). In some cases, a cache can be used instead of a persistence layer.

Wikimedia sites makes use of local services including Redis, MariaDB, Swift, and memcached. (Also things like Parsoid that plug in for specific things like VisualEditor.) They are expected to reside on a low-latency network. They are local services, as opposed to remote services like Varnish.

People often put things into databases that ought to be in a cache or a queue. Here's when to use which:
 * 1) MySQL/MariaDB database - longterm storage of structured data and blobs.
 * 2) Swift file store - longterm storage for binary files that may be large. See Media storage for details.
 * 3) Redis jobqueue - you add a job to be performed, the job is done, and then the job is gone. You don't want to lose the jobs before they are run. But you are ok with there being a delay.
 * (in the future maybe MediaWiki should support having a high-latency and a low-latency queue.)

A cache, such as memcached, is storage for things that persist between requests, and that you don't need to keep - you're fine with losing any one thing. Use memcached to store objects if the database could recreate them but it would be computationally expensive to do so, so you don't want to recreate them too often. You can imagine a spectrum between caches and stores, varying on how long developers expect objects to live in the service before getting evicted; see the Caching layers section for more.

Permanent names: In general, store resources under names that won't change. In MediaWiki, files are stored under their "pretty names", which was probably a mistake - if you click Move, it ought to be fast (renaming title), but other versions of the file also have to be renamed. And Swift is distributed, so you can't just change the metadata on one volume of one system.

Object size: Memcached sometimes gets abused by putting big objects in there, or where it would be cheaper to recalculate than to retrieve. So don't put things in memcached that are TOO trivial - that causes an extra network fetch for very little gain. A very simple lookup, like "is a page watched by current user", does not go in the cache, because it's indexed well so it's a fast database lookup.

When to use the job queue: If the thing to be done is fast (~5 milliseconds) or needs to happen synchronously, then do it synchronously. Otherwise, put it in the job queue. You do not want an HTTP request that a user is waiting on to take more than a few seconds. Examples using the job queue:


 * Updating link table on pages modified by a template change
 * Transcoding a video that has been uploaded

HTMLCacheUpdate is synchronous if there are very few backlinks. Developers also moved large file uploads to an asynchronous workflow because users started experiencing timeouts.

In some cases it may be valuable to create separate classes of job queues -- for instance video transcoding done by Extension:TimedMediaHandler is stored in the job queue, but a dedicated runner is used to keep the very long jobs from flooding other servers. Currently this requires some manual intervention to accomplish (see TMH as an example).

Extensions that use the job queue include RenameUser, TranslationNotification, Translate, GWToolset, and MassMessage.

Additional examples:
 * large uploads. UploadWizard has API core modules and core jobs take care of taking chunks of file, reassembling, turning into a file the user can view. The user starts defining the description, metadata, etc., and the data is sent 1 chunk at a time.
 * purging all the pages that use a template from Varnish & bumping the  column in the database, which tells parser cache it's invalid and needs to be regenerated
 * refreshing links: when a page links to many pages, or it has categories, it's better to refresh links or update categories after saving, then propagate the change. (For instance, adding a category to a template or removing it, which means every page that uses that template needs to be linked to the category -- likewise with files, externals, etc.)

How slow or contentious is the thing you are causing? Maybe your code can't do it on the same web request the user initiated. You do not want an HTTP request that a user is waiting on to take more than a few seconds.

Example: You create a new kind of notification. Good idea: put the actual notification action (emailing people) or adding the flags (user id n now has a new notification!) into the jobqueue. Bad idea: putting it into a database transaction that has to commit and finish before the user gets a response.

Good example: The Beta features extension lets a user opt in for a "Beta feature" and displays, to the user, how many users have opted in to each of the currently available Beta features. The preferences themselves are stored in  table. However, directly counting the number of opted-in users every time that count is displayed would not have acceptable performance. Thus, MediaWiki stores those counts in the database in the  table, but they are also stored in memcached. It's important to immediately update the user's own preference and be able to display the updated preference on page reload, but it's not important to immediately report to the user the increase or decrease in the count, and this information doesn't get reported in Special:Statistics.

Therefore, BetaFeatures updates those user counts every half hour or so, and no more. Specifically, the extension creates a job that does a SELECT query. Running this query takes a long time - upwards of 5 minutes! So it's done once, and then on the next user request, the result gets cached in memcached for the page https://en.wikipedia.org/wiki/Special:Preferences#mw-prefsection-betafeatures. (They won't get updated at all if no one tries to fetch them, but that is unlikely.) If a researcher needs a realtime count, they can directly query the database outside of MediaWiki application flow.

Code: UpdateBetaFeatureUserCountsJob.php and BetaFeaturesHooks.php.

Bad example: add one?

Multiple datacenters
See Database transactions

Once CDN requests reach (non-proxy) origin servers, the responding service (such as Apache/MediaWiki, Thumbor, or HyperSwitch) must limit its own read operations from persistence layers to only involve the local datacenter. The same applies to write operations to caching layers, with the exception of allowing asynchronous purging broadcasts or asynchronous replication of caches that are profoundly expensive to regenerate from scratch (e.g. parser cache in MySQL). Write operations to source data persistence layers (MySQL, Swift, Cassandra) are more complex, but generally should only happen on HTTP POST or PUT requests from end-users and should be synchronous in the local datacenter, with asynchronous replication to remote datacenters. Updates to search index persistence layers (Elastic, BlazeGraph) can use either this approach, the Job queue, or Change propagation. The enqueue operations to the job/propagation systems are themselves synchronous in the local datacenter (with asynchronous replication to the remote ones).

HTTP POST/PUT requests to MediaWiki will be routed to the master datacenter and the MediaWiki job queue workers only run there as well (e.g. where the logic of  executes). An independent non-MediaWiki API service might be able to run write APIs correctly in multiple datacenters at once if it has very limited semantics and has no relational integrity dependencies on other source data persistence layers. For example, if the service simply takes end-user input and stores blobs keyed under new UUIDs, there is no way that writes can conflict. If updates or deletions are later added as a feature, then Last-Write-Wins might be considered a "correct" approach to handling write conflicts between datacenters (e.g. if only one user has permission to change any given blob then all conflicts are self-inflicted). If write conflicts are not manageable, then such API requests should be routed to the master datacenter.

Work involved during cache misses
Wikimedia uses and depends heavily on many different caching layers, so your code needs to work in that environment! (But it also must work if everything misses cache.)

Cache-on-save: Wikimedia sites use a preemptive cache-repopulation strategy: if your code will create or modify a large object when the user hits "save" or "submit", then along with saving the modified object in the database/filestore, populate the right cache with it (or schedule a job in the job queue to do so). This will give users faster results than if those large things were regenerated dynamically when someone hit the cache. Localization (i18n) messages, SpamBlacklist data, and parsed text (upon save) are all aggressively cached. (See "Caching layers" for more.)

At the moment, this strategy does not work well for memcached for Wikimedia's multi-datacenter use case. A workaround when using WANObjectCache is to use  as normal, but with "lockTSE" set and with a "check" key passed in. The key can be "bumped" via  to perform invalidations instead of using. This avoids cache stampedes on purge for hot keys, which is usually the main goal.

If something is VERY expensive to recompute, then use a cache that is somewhat closer to a store. For instance, you might use the backend (secondary) Varnishes, which are often called a cache, but are really closer to a store, because objects tend to persist longer there (on disk).

Cache misses are normal: Avoid writing code that, on cache miss, is ridiculously slow. (For instance, it's not okay to  and assume that a memcache between the database and the user will make it all right; cache misses and timeouts eat a lot of resources. Caches are not magic.) The cluster has a limit of 180 seconds per script (see the limit in Puppet); if your code is so slow that a function exceeds the max execution time, it will be killed.

Write your queries such that an uncached computation will take a reasonable amount of time. To figure out what is reasonable for your circumstance, ask the Site performance and architecture team.

If you can't make it fast, see if you can do it in the background. For example, see some of the statistics special pages that run expensive queries. These can then be run on a dedicated time on large installations. But again, this requires manual setup work -- only do this if you have to.

Watch out for cached HTML: HTML output may sit around for a long time and still needs to be supported by the CSS and JS. Problems where old JS/CSS hang around are in some ways more obvious, so it's easier to find them early in testing, but stale HTML can be insidious!

Good example: See the TwnMainPage extension. It offloads the recalculation of statistics (site stats and user stats) to the job queue, adding jobs to the queue before the cache expires. In case of cache miss, it does not show anything; see CachedStat.php. It also sets a limit of 1 second for calculating message group stats; see SpecialTwnMainPage.php.

Bad example: a change "disabled varnish cache, where previously it was set to cache in varnish for 10 seconds. Given the amount of hits that page gets, even a 10 second cache is probably helpful."

Caching layers
The cache hit ratio should be as high as possible; watch out if you're introducing new cookies, shared resources, bundled requests or calls, or other changes that will vary requests and reduce cache hit ratio.

Caching layers that you need to care about:
 * 1) Browser caches
 * 2) native browser cache
 * 3) LocalStorage. See meta:Research:Module storage performance to see the statistical proof that storing ResourceLoader storage in LocalStorage speeds page load times and causes users to browse more.
 * 4) Front-end Varnishes
 * The Varnish caches cache entire HTTP responses, including thumbnails of images, frequently-requested pages, ResourceLoader modules, and similar items that can be retrieved by URL. The front-end Varnishes keep these in memory. A weighted-random load balancer (LVS) distributes web requests to the front-end Varnishes.
 * Because Wikimedia distributes its front-end Varnishes geographically (in the Amsterdam & San Francisco caching centers as well as the Texas and Virginia data centers) to reduce latency to users, some engineers refer to those front-end Varnishes as "edge caching" and sometimes as a CDN (content delivery network). See MediaWiki at WMF for some details.
 * 1) Back-end Varnishes
 * If a frontend Varnish doesn't have a response cached, it passes the request to the back-end Varnishes via hash-based load balancing (on the hash of the URI). The backend Varnishes hold more responses, storing them ondisk. Every URL is on at most one backend Varnish.
 * 1) object cache (implemented in memcached in WMF production, but other implementations include Redis, APC, etc.)
 * The object cache is a generic service used for many things, e.g., the user object cache. It's a generic service that many services can stash things in. You can also use that service as a layer in a larger caching strategy, which is what the parser cache does in Wikimedia's setup. One layer of the parser cache lives in the object cache.
 * Generally, don't disable the parser cache. See: How to use the parser cache.
 * 1) database's buffer pool and query cache (not directly controllable)

How do you choose which cache(s) to use, and how to watch out for putting inappropriate objects into a cache? See "Picking the right cache: a guide for MediaWiki developers".

Figure out how to appropriately invalidate content from caching by purging, directly updating (pushing data into cache), or otherwise bumping timestamps or versionIDs. Your application needs will determine your Cache purging strategy.

Since the Varnishes serve content per URL, URLs ought to be deterministic -- that is, they should not serve different content from the same URL. Different content belongs at a different URL. This should be true for anonymous users; for logged-in users, Wikimedia's configuration contains additional wrinkles involving cookies and the caching layers.

Good example: (from the mw.cookie change) of not poisoning the cache with request-specific data (when cache is not split on that variable). Background:  will use MediaWiki's cookie settings, so client-side developers don't think about this. These are passed via the ResourceLoader startup module. Issue: However, it doesn't use Manual:$wgCookieSecure (instead, this is documented not to be supported), since the default value (' ') varies by the request protocol, and the startup module does not vary by protocol. Thus, the first hit could poison the module with data that will be inapplicable for other requests.

Bad examples:
 * GettingStarted error: Don't use Token in your cookie name. In this case, the cookie name hit a regular expression that Varnish uses to know what to cache and not cache. See the code, an initial revert, another early fix, another revert commit, the Varnish layer workaround, the followup fix, the GettingStarted fix part 1 and part 2, and the regex fix.
 * WikidataClient was fetching a large object from memcached just to decide which project group it was on, when it would have been more efficient to simply recompute it by putting the very few values needed into a global variable. (See the changeset that fixed the bug.)
 * Template parse on every page view is a bad thing, as it obviates the advantage of the parser cache (the cache that caches parsed wikitext).

Multiple data centers
WMF runs multiple data centers ("eqiad", "codfw", etc.). The plan is to move to a master/slave data center configuration (see RFC), where users read pages from caches at the closest data center, while all update activity flows to the master data center. Most MediaWiki code need not be directly aware of this, but it does have implications for how developers write code; see RFC's Design implications.
 * ''TODO: bring guidelines from RFC to here and other pages.

Cookies
For cookies, besides the concerns having to do with caching (see "Caching layers", above), there is also the issue that cookies bloat the payload of every request, that is, they result in more data sent back and forth, often unnecessarily. While the effect of bloated header payloads in page performance is less immediate than the impact of blowing up Varnish cache ratios, it is not less measurable or important. Please consider the usage of localStorage or sessionStorage as an alternative to cookies. Client-side storage works well in non-IE browsers, and in IE from IE8 onward.

See also Google's advice on minimizing request overhead.

Technical documents

 * WMF usage of Graphite
 * MediaWiki & Wikimedia use cases for Redis
 * API:Etiquette
 * Job class reference
 * Manual:Job queue (and Manual:Job queue/For developers)
 * Manual:How to debug
 * Manual:Profiling
 * Performance profiling for Wikimedia code

Talks

 * "Why your extension will not be enabled on Wikimedia wikis in its current state and what you can do about it", Roan Kattouw, Wikimania, July 2010
 * Notes from Tim Starling's security and performance talk, WMF training session, July 2011
 * MediaWiki MySQL optimization tutorial (slides), Roan Kattouw, Berlin Hackathon, June 2012
 * "MediaWiki Performance Profiling" (video) (slides), Asher Feldman, WMF Tech Days, September 2012
 * "MediaWiki Performance Techniques", Tim Starling, Amsterdam Hackathon, May 2013
 * "Let's talk about web performance" (video), Peter Hedenskog, WMF tech talk, August 2015

Posts and discussions

 * "Measuring Site Performance at the Wikimedia Foundation", Asher Feldman, March 2012
 * "How the Technical Operations team stops problems in their tracks", Sumana Harihareswara, February 2013
 * Requests for comment/Performance standards for new features, December 2013
 * Notes from performance discussion, Architecture Summit 2014, January 2014

General web performance

 * "Scalable Web Architecture and Distributed Systems" (book chapter), Kate Matsudaira, May 2012
 * "80% of end-user response time is spent on the frontend", Marcus Ljungblad, April 2014