Picking the right cache
This page is currently a draft.
More information and discussion about changes to this draft may be on the discussion page.
MediaWiki has a variety of caching and persistence layers, each with its own advantages, disadvantages, uses and misuses. When choosing where to store data in MediaWiki, consider the following questions:
- Is the data original, or is it generated from other data?
- How long does it take to regenerate the data?
- How big is the data, and what format is it in?
- For how long is the data expected to be stored and retrievable?
This document is meant to make you answer these questions and then to suggest where in MediaWiki your data should be stored.
|Layer||Description||Structured?||Persistent?||Per-user?|
|LocalStorage||This is a client-side storage layer in the user's browser. It is not guaranteed to be present, since it is not implemented in all browsers.||No||Yes||Yes|
|Browser cache||This is an HTML-only cache implemented in the user's browser. It caches web pages delivered by MediaWiki and can be controlled via caching headers sent to the browser.||No||No||Yes|
|HTTP cache||This is an HTML-only server-side reverse proxy implemented by software separate from MediaWiki, usually Varnish or Squid. It caches web pages delivered by MediaWiki, and while MediaWiki can control it to some extent, in general it acts autonomously in front of your web server.||No||No||No|
|Object cache||The first application-level cache in MediaWiki; it is entirely server-side. It is accessible via $wgMemc or wfGetMainCache() and is an instance of the BagOStuff class. Useful for temporary key-value storage.||No||No||No|
|Session data||Similar to the object cache, but more persistent and per-user. In some setups this layer is not actually persistent, but wherever this layer is used, that lack of persistence does not matter.||No||Yes||Yes|
|Key-value store||A key-based persistence layer that stores unstructured data in a distributed manner.||No||Yes||No|
|SQL store||The database.||Yes||Yes||No|
Interface v. implementation
There is a fundamental separation between the interface of a caching layer and its implementation. The interface is the contract between the programmer and MediaWiki: if a property is part of a layer's interface, every implementation of that layer is guaranteed to have that property. The implementation is the actual software that provides the layer, and it is free to behave in any way the interface does not rule out.
For example, the object cache layer is listed in the summary above as "not persistent". This is an interface detail: when you use that layer, you should expect data to be erased at any moment. When and how data is actually erased, however, is implementation-defined. In other words, a given implementation of the object cache layer can in fact be persistent; all it has to do is never erase data. Programmers should nonetheless treat the layer as non-persistent and use it as if data could disappear at any moment.
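The contract can be illustrated with a small sketch. It is written in Python rather than MediaWiki's PHP, and none of the names (`ObjectCache`, `NeverEvictingCache`, `TinyLRUCache`, `cached_compute`) are MediaWiki APIs; they are hypothetical stand-ins for one interface and two implementations. One implementation never erases data, the other evicts entries, and the caller works correctly with either because it treats every lookup as a possible miss.

```python
from abc import ABC, abstractmethod
from collections import OrderedDict
from typing import Any, Callable, Optional


class ObjectCache(ABC):
    """Hypothetical interface: any get() may miss, even right after a set()."""

    @abstractmethod
    def get(self, key: str) -> Optional[Any]:
        """Return the stored value, or None if it is absent (for example, evicted)."""

    @abstractmethod
    def set(self, key: str, value: Any) -> None:
        """Store a value; the implementation may discard it whenever it likes."""


class NeverEvictingCache(ObjectCache):
    """An implementation that happens to be persistent: it never erases data."""

    def __init__(self) -> None:
        self._data: dict = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


class TinyLRUCache(ObjectCache):
    """An implementation that evicts the least recently used entry beyond `capacity`."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._data: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self._capacity:
            self._data.popitem(last=False)  # drop the least recently used entry


def cached_compute(cache: ObjectCache, key: str, compute: Callable[[], Any]) -> Any:
    """Caller written against the interface only: a miss is always handled by recomputing."""
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.set(key, value)
    return value
```

Because the sketch uses `None` as the miss marker, it cannot cache a computed `None`; a real implementation would need a distinct sentinel value.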
In MediaWiki, every caching layer has a number of implementations. Each implementation functions differently, but is guaranteed to follow the contract of the layer. Here are some example implementations for each layer:
- LocalStorage and Browser cache
  - Google Chrome
- HTTP cache
  - Varnish
  - Squid
- Object cache
  - Memcached
  - Redis
- Session data
  - Built-in PHP file-based sessions
  - Redis
- Key-value store
  - Redis
- SQL store
Notice that some implementations can be used for several layers: Redis, for example, can serve as an object cache, a session data store, and a key-value store. This follows from the distinction explained above. An object cache implementation is allowed to be persistent (callers merely must not assume that it is), so a persistent store such as Redis satisfies the object cache contract as well.
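A minimal sketch of one backend serving two layers, again in Python with hypothetical names: `PersistentBackend` stands in for a persistent server such as Redis (here it is just a dictionary), and the two adapters expose it through an object cache interface and a key-value store interface. The key prefixes `cache:` and `kv:` are an assumption used to keep the two layers' data apart.

```python
from typing import Any, Optional


class PersistentBackend:
    """Hypothetical stand-in for a persistent server such as Redis (here a plain dict)."""

    def __init__(self) -> None:
        self._store: dict = {}

    def read(self, key: str) -> Optional[Any]:
        return self._store.get(key)

    def write(self, key: str, value: Any) -> None:
        self._store[key] = value


class ObjectCacheAdapter:
    """Object cache layer: callers must still treat every get() as a possible miss."""

    def __init__(self, backend: PersistentBackend) -> None:
        self._backend = backend

    def get(self, key: str) -> Optional[Any]:
        return self._backend.read("cache:" + key)

    def set(self, key: str, value: Any) -> None:
        self._backend.write("cache:" + key, value)


class KeyValueStoreAdapter:
    """Key-value store layer: callers may rely on stored data still being there."""

    def __init__(self, backend: PersistentBackend) -> None:
        self._backend = backend

    def get(self, key: str) -> Optional[Any]:
        return self._backend.read("kv:" + key)

    def put(self, key: str, value: Any) -> None:
        self._backend.write("kv:" + key, value)
```

The reverse does not work: an implementation that may drop data at any time satisfies the object cache contract but not the key-value store's.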
The first caching layer is the user's browser. The browser offers LocalStorage (along with other storage APIs, such as WebSQL, which MediaWiki does not currently use) and an HTML page cache. Some properties of browser caches are:
- They are not just per-user but per-browser: if a user switches to another browser or computer, the cache will be different. Only use these caches for data that can be re-fetched from the server if necessary.
- They are entirely client-side, so no download is necessary and reading from the cache is very fast for the user. Of course, this also means that you have no access to any server-side resources.
- They vary greatly between implementations. Unfortunately, not every browser implements caching the same way. There are standard interfaces, as set forth by the W3C, and in many cases browsers follow the standard, but always expect the unexpected.
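The rule that browser-cached data must always be re-fetchable can be sketched as follows. This is a language-neutral model written in Python, not browser code: `storage` stands in for something like LocalStorage and may be missing (`None`) or fail when used, and `fetch_from_server` is a hypothetical callback that retrieves the data from MediaWiki.

```python
from typing import Callable, MutableMapping, Optional


def load_with_client_cache(
    storage: Optional[MutableMapping[str, str]],
    key: str,
    fetch_from_server: Callable[[], str],
) -> str:
    """Read `key` from client-side storage if possible, else re-fetch it from the server."""
    if storage is not None:
        try:
            cached = storage.get(key)
        except Exception:
            cached = None  # storage exists but is unusable (e.g. disabled); treat as a miss
        if cached is not None:
            return cached
    value = fetch_from_server()  # the server remains the source of truth
    if storage is not None:
        try:
            storage[key] = value
        except Exception:
            pass  # e.g. quota exceeded; harmless, since the data can be re-fetched
    return value
```

The broad `except` clauses reflect the point above: browser storage can fail in implementation-specific ways, and none of those failures should break the page.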
The object cache is perhaps the most heavily used caching layer in MediaWiki. It is usually implemented by either Memcached or Redis and serves as a quick way to cache the results of expensive operations. This layer is not persistent, and you should expect data stored in it to disappear at any time, without warning! That said, since it is a cache, data does not normally disappear right away, and in general you can expect the object cache to keep data for a fairly long period of time.
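A minimal sketch of how code can use such a cache safely, assuming a hypothetical in-process `ExpiringCache` with a per-entry time-to-live (TTL) in place of Memcached or Redis. The injectable clock exists only to make expiry easy to demonstrate, and `evict()` simulates the backend dropping an entry early. `get_or_compute` is a hypothetical helper that recomputes the value whenever it is missing, whether it expired or was evicted.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class ExpiringCache:
    """Toy object cache: entries vanish after their TTL, or earlier via evict()."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave exactly like a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (self._clock() + ttl, value)

    def evict(self, key: str) -> None:
        """Simulate the backend dropping an entry without warning."""
        self._data.pop(key, None)


def get_or_compute(cache: ExpiringCache, key: str, ttl: float,
                   compute: Callable[[], Any]) -> Any:
    """Cache-aside: return the cached value, or recompute and store it on a miss."""
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.set(key, value, ttl)
    return value
```

Whatever the reason for a miss, the caller's behavior is the same: recompute and store again, which is exactly what the "not persistent" contract requires.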