Picking the right cache

From MediaWiki.org

MediaWiki has a variety of caching and persistence layers. Each layer has its own advantages and disadvantages, uses and misuses. In general, when choosing where to store data in MediaWiki, you should take into consideration the following:

  • Is the data original, or is it generated from other data?
    • How long does it take to regenerate the data?
  • How big is the data, and what format is it in?
  • For how long is the data expected to be stored and retrievable?

The goal of this document is to force you to answer these questions, and then provide suggestions on where your data should be stored in MediaWiki.

Summary

| Layer name | Description | Structured | Persistent | Per-user |
|---|---|---|---|---|
| LocalStorage | A client-side storage layer in the user's browser. It is not guaranteed to be present, since it is not implemented in all browsers. | No | Yes | Yes |
| Browser cache | An HTML-only cache implemented in the user's browser. It caches web pages delivered by MediaWiki and can be controlled via caching headers sent to the browser. | No | No | Yes |
| HTTP cache | An HTML-only server-side reverse proxy implemented by software separate from MediaWiki, usually Varnish or Squid. It caches web pages delivered by MediaWiki, and while it can be controlled somewhat from MediaWiki, in general it acts autonomously in front of your web server. | No | No | No |
| Object cache | The first application-level cache in MediaWiki, and completely server-side. It is accessible via $wgMemc or wfGetMainCache() and is an instance of the BagOStuff class. Useful for temporary key-value storage. | No | No | No |
| Session data | Similar to the object cache, but more persistent and scoped to a single user. Some implementations are not persistent, but this layer is only used where that does not matter. | No | Yes | Yes |
| Key-value store (not implemented) | A key-based persistence layer that stores unstructured data in a distributed manner. | No | Yes | No |
| SQL store | The database. | Yes | Yes | No |
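
One way to read the summary above is as a filter over the three properties. The following Python sketch encodes the rows as data and looks up which layers match a set of requirements. The names `Layer`, `LAYERS`, and `candidates` are hypothetical and exist only for this illustration; MediaWiki itself provides no such helper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Layer:
    """One row of the summary: a layer name and its three properties."""
    name: str
    structured: bool
    persistent: bool
    per_user: bool


# Rows copied from the summary (hypothetical encoding, not a MediaWiki API).
LAYERS = [
    Layer("LocalStorage", False, True, True),
    Layer("Browser cache", False, False, True),
    Layer("HTTP cache", False, False, False),
    Layer("Object cache", False, False, False),
    Layer("Session data", False, True, True),
    Layer("Key-value store", False, True, False),
    Layer("SQL store", True, True, False),
]


def candidates(structured: Optional[bool] = None,
               persistent: Optional[bool] = None,
               per_user: Optional[bool] = None) -> List[str]:
    """Return the names of layers matching every requirement that is not None."""
    result = []
    for layer in LAYERS:
        if structured is not None and layer.structured != structured:
            continue
        if persistent is not None and layer.persistent != persistent:
            continue
        if per_user is not None and layer.per_user != per_user:
            continue
        result.append(layer.name)
    return result


# Data that must survive and is shared between users:
print(candidates(persistent=True, per_user=False))
```

The properties only narrow the choice. The questions at the top of this page (how expensive the data is to regenerate, how large it is) still decide between the remaining candidates.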

Interface v. implementation

There is a fundamental separation between the interface of a caching layer and its implementation. The interface is the contract between the programmer and MediaWiki: when a caching layer has a certain property in its interface, it is guaranteed to follow that property. The implementation is the software that fulfils that contract.

For example, the object cache layer is listed in the above summary as being "not persistent". This is an interface detail. It means that when you use that caching layer, you should expect data to be erased at any moment. When and how data is erased, however, is implementation-defined. In other words, it is actually possible for a given implementation of the object cache layer to be persistent! All it has to do is never erase data. Even so, programmers should still treat the layer as non-persistent and use it as if data could disappear at any moment.
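
The contract described above can be illustrated with a minimal Python sketch. The class names `ObjectCache`, `ForgetfulCache`, and `NeverForgetsCache` are hypothetical stand-ins invented for this example; they are not MediaWiki's BagOStuff API. Both implementations satisfy the same interface ("get may return nothing at any time"), even though one of them happens never to erase data.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class ObjectCache(ABC):
    """Hypothetical interface. Contract: get() may return None for any key."""

    @abstractmethod
    def get(self, key: str) -> Optional[Any]:
        ...

    @abstractmethod
    def set(self, key: str, value: Any) -> None:
        ...


class ForgetfulCache(ObjectCache):
    """Holds at most max_items entries; evicts the oldest entry when full."""

    def __init__(self, max_items: int = 2) -> None:
        self._max_items = max_items
        self._data: Dict[str, Any] = {}

    def get(self, key: str) -> Optional[Any]:
        return self._data.get(key)

    def set(self, key: str, value: Any) -> None:
        if key not in self._data and len(self._data) >= self._max_items:
            # dicts keep insertion order, so the first key is the oldest one.
            oldest = next(iter(self._data))
            del self._data[oldest]
        self._data[key] = value


class NeverForgetsCache(ObjectCache):
    """Never erases data, yet still honours the same contract."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def get(self, key: str) -> Optional[Any]:
        return self._data.get(key)

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value


def page_title(cache: ObjectCache, page_id: int) -> str:
    """Caller written against the interface: a miss is always handled."""
    key = f"title:{page_id}"
    title = cache.get(key)
    if title is None:
        title = f"Page {page_id}"  # stands in for regenerating the data
        cache.set(key, title)
    return title


for cache in (ForgetfulCache(max_items=2), NeverForgetsCache()):
    titles = [page_title(cache, i) for i in (1, 2, 3, 1)]
    print(type(cache).__name__, titles)
```

Because `page_title` never assumes a key is still present, it returns the same results with either implementation. Code that relied on `NeverForgetsCache` keeping everything would break as soon as the implementation was swapped.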

In MediaWiki, every caching layer has a number of implementations. Each implementation functions differently, but is guaranteed to follow the contract of the layer. Here are some example implementations for each layer:

  • LocalStorage and Browser cache
    • Google Chrome
    • Firefox
  • HTTP cache
    • Varnish
    • Squid
  • Object cache
    • Memcached
    • Redis
  • Session data
    • Built-in PHP file-based sessions
    • Redis
  • Key-value store
    • Redis
    • MemcacheDB
    • DynamoDB
  • SQL store
    • MySQL
    • MariaDB
    • PostgreSQL

Notice that some implementations can be used for more than one layer, e.g., Redis can serve as an object cache, a session data store, and a key-value store. This follows from the separation explained above: since an object cache is allowed to be persistent (it is just assumed not to be), a persistent store such as Redis can be used to implement it.

Browser-level storage

The first step in caching is the user's browser. The browser has LocalStorage (among others, such as WebSQL, which MediaWiki does not use at the moment) and an HTML page cache. Some properties of browser caches are:

  • They are not just per-user, but per-browser, meaning a user can change computers and the cache will be different. This cache should only be used for data that can be re-fetched from the server if necessary.
  • They are entirely client-side. Thus there is no download necessary, and accessing the cache is very fast for the user. Of course, this means that you do not have access to any server-side resources.
  • They vary greatly between implementations. Unfortunately, not every browser implements caching the same way. There are standard interfaces, as set forth by the W3C, and in many cases the browser will stick to the standard, but always expect the unexpected.

MediaWiki internally uses LocalStorage to cache JavaScript ResourceLoader modules, so that they do not have to be re-downloaded for every page the user visits. Accordingly, this layer is useful for data that is not expected to change often, where contacting the server every time the data is needed would be wasteful.

Object cache

The object cache is perhaps the most used caching layer in MediaWiki. It is usually implemented by either Memcached or Redis, and functions as a quick way to cache results of expensive operations. This layer is not persistent, and you should expect data stored in this layer to disappear at any time, without warning! Of course, since it is a cache, the data will not disappear immediately, and in general you can expect the object cache to store data for a fairly long period of time.
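
A common way to use such a layer is "get, and regenerate on a miss". The Python sketch below illustrates that pattern with an in-memory stand-in. `TtlCache` and `get_or_compute` are hypothetical names made up for this example, and the expiry rule (a fixed time-to-live) is an assumption; real Memcached or Redis deployments may also evict entries earlier for other reasons, which is exactly why the caller must always handle a miss.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TtlCache:
    """In-memory stand-in for an object cache; entries expire after a TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # The entry is stale: drop it and report a miss.
            del self._data[key]
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (self._clock() + ttl, value)


def get_or_compute(cache: TtlCache, key: str, ttl: float,
                   compute: Callable[[], Any]) -> Any:
    """Return the cached value, regenerating and storing it on a miss.

    None is treated as a miss, so compute() should not return None.
    """
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.set(key, value, ttl)
    return value


calls = []


def expensive_count() -> int:
    """Stands in for an expensive operation, such as a slow query."""
    calls.append(1)
    return 42


cache = TtlCache()
first = get_or_compute(cache, "stats:count", 60.0, expensive_count)
second = get_or_compute(cache, "stats:count", 60.0, expensive_count)
print(first, second, len(calls))  # prints "42 42 1": the second call is a hit
```

The caller never needs to know whether the value came from the cache or was recomputed, so losing the cached entry costs time but never correctness.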