Object cache

MediaWiki uses caching in many components and at several layers. This page documents the kinds of caching used inside the MediaWiki PHP application.

In a nutshell
For object caching, MediaWiki works with two kinds of stores:


 * 1) Cache: a place to store the result of a computation, or data fetched from an external source, so that it can be accessed as fast as possible. This is a "cache" in the computer-science sense.
 * 2) Stash: a place to store lightweight data that is not stored anywhere else. It is sometimes also called a "hoard" of objects. It also holds values that should not be regenerated repeatedly, because doing so would needlessly load the server.

Terminology
A cache key is called "verifiable" when the application can quickly and easily confirm that its value is not outdated.

This is the case when a key stands for only one possible value. For example, the result of computing π to 100 decimal places can be cached under a key of its own. The result can safely be kept in high-speed storage without any coordination with other components, because it will never need to be updated or purged. If it expires from the cache, it can be recomputed and the result will still be the same. The same applies to storing the wikitext of a particular page revision: the content of revision 123 will always be the same. So if the application knows the ID of the revision it is looking for, a key built from that revision ID can be treated as a verifiable cache key.
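The idea of a verifiable key can be illustrated with a minimal Python sketch. It is not MediaWiki code: the `TinyCache` class, the `REVISIONS` table, and the `example-revision-text:` key format are assumptions made up for this illustration. It shows that an entry can be evicted and recomputed at any time without changing the result.

```python
from typing import Callable, Dict

# Hypothetical immutable source of truth: revision ID -> wikitext.
REVISIONS: Dict[int, str] = {123: "'''Hello''' world"}


class TinyCache:
    """In-memory stand-in for a cache; entries may vanish at any time."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self.misses = 0

    def get_or_compute(self, key: str, compute: Callable[[], str]) -> str:
        if key not in self._data:
            self.misses += 1
            self._data[key] = compute()
        return self._data[key]

    def evict(self, key: str) -> None:
        self._data.pop(key, None)


def revision_text(cache: TinyCache, rev_id: int) -> str:
    # The key embeds the revision ID, so it can only ever map to one value.
    key = f"example-revision-text:{rev_id}"
    return cache.get_or_compute(key, lambda: REVISIONS[rev_id])


cache = TinyCache()
first = revision_text(cache, 123)
cache.evict("example-revision-text:123")  # simulate expiry
second = revision_text(cache, 123)        # recomputed, same result
```

Because the value behind the key never changes, no update or purge logic is needed in this sketch; expiry only costs one recomputation.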



Storing structured data
MediaWiki supports storing both simple values (such as booleans, numbers, and strings) and complex, nested arrays. It is technically also possible to store plain objects (stdClass) and instances of arbitrary classes by relying on PHP serialization, but this approach is deprecated and should not be used. The reasons are security (T161647) and code stability: it becomes very difficult to change a class without breaking forward or backward compatibility with objects of that class that are already stored in the cache (e.g. T264257).

Code that writes to or reads from a cache must always be both forward and backward compatible. Typically, the code reading cached data will be the same version as, or newer than, the code that wrote it (requiring backward-compatible read logic, or forward-compatible writing ahead of time), but there are two important scenarios where the opposite is also needed:
 * 1) During a deployment, different servers and data centers briefly run old and new versions side-by-side with the same shared database and caching services.  As such, a cache may very well be written to and read from both old and new versions concurrently during this time.
 * 2) Site operators must be able to roll back the last deployment or upgrade of the software to the previous version.
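As a rough illustration of both directions, the following Python sketch uses two hypothetical releases that differ by one stored field. The field names and functions are invented for this example. The new reader supplies a default for entries written by the old release, and the old reader ignores fields it does not know.

```python
from typing import Any, Dict


def write_old(user_id: int, name: str) -> Dict[str, Any]:
    # Shape written by the hypothetical "old" release.
    return {"id": user_id, "name": name}


def write_new(user_id: int, name: str, edit_count: int) -> Dict[str, Any]:
    # Shape written by the hypothetical "new" release: one extra field.
    return {"id": user_id, "name": name, "editCount": edit_count}


def read_new(value: Dict[str, Any]) -> Dict[str, Any]:
    # New reader: tolerate old entries by supplying a default.
    return {
        "id": value["id"],
        "name": value["name"],
        "editCount": value.get("editCount", 0),
    }


def read_old(value: Dict[str, Any]) -> Dict[str, Any]:
    # Old reader: pick only the fields it knows and ignore unknown ones.
    return {"id": value["id"], "name": value["name"]}
```

With readers written this way, entries produced by either release can be consumed by the other during a deployment or after a rollback.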

Best practice:


 * Avoid placing version constants inside cache keys. Make use of the  idiom and its "version" option, which automatically takes care of forward- and backward compatibility, including invalidating cache keys across versions of the software.
 * Avoid storing class objects. Store primitives or (nested) arrays of primitives. Classes should be converted to and from simple arrays, and stored either as those simple arrays or as a string of JSON. The encoding and serialising for this must be done by the consumer and is not done by e.g., the BagOStuff or WANObjectCache interfaces. (In the future, MediaWiki may do this automatically for classes that implement JsonUnserializable, which was introduced in MediaWiki 1.36).
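The conversion described above might look like the following Python sketch. It is an illustration only, not the behaviour of BagOStuff or WANObjectCache: the `PageSummary` class, the envelope layout, and `FORMAT_VERSION` are assumptions. The object is reduced to primitives, stored as a JSON string, and a version number kept inside the value (not in the key) causes entries from another format version to be treated as a miss.

```python
import json
from dataclasses import dataclass
from typing import Dict, Optional

FORMAT_VERSION = 2  # hypothetical; bump when the stored shape changes


@dataclass
class PageSummary:
    title: str
    length: int

    def to_array(self) -> dict:
        return {"title": self.title, "length": self.length}

    @staticmethod
    def from_array(data: dict) -> "PageSummary":
        return PageSummary(title=data["title"], length=data["length"])


def store(cache: Dict[str, str], key: str, summary: PageSummary) -> None:
    # Store a JSON string of primitives; the version lives in the value.
    cache[key] = json.dumps({"v": FORMAT_VERSION, "data": summary.to_array()})


def load(cache: Dict[str, str], key: str) -> Optional[PageSummary]:
    raw = cache.get(key)
    if raw is None:
        return None
    envelope = json.loads(raw)
    if envelope.get("v") != FORMAT_VERSION:
        return None  # written by another version: treat as a cache miss
    return PageSummary.from_array(envelope["data"])
```

Keeping the key stable and versioning the value means old and new code address the same entry, and each side simply regenerates what it cannot read.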

Services
These are the abstract stores available to MediaWiki features; see the Uses section for examples.

Local server

 * Accessed through.
 * Configurable: No (automatically detected).
 * Behaviour: very fast (<0.1ms, from local memory), low capacity, not shared between application servers.

Values in this store are only kept in the local RAM of any given web server (typically using php-apcu). These are not replicated to the other servers or clusters, and have no update or purge coordination options.

If the web server does not have php-apcu (or equivalent) installed, this interface falls back to an empty placeholder where no keys are stored. It is also set to an empty interface for maintenance scripts and other command-line modes. MediaWiki supports APCu and WinCache.
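The fallback behaviour can be pictured as a null-object pattern. The Python sketch below is a simplified model, not the actual detection logic; the class and function names are hypothetical. Callers use the same interface either way and simply see every read as a miss when the placeholder is in use.

```python
from typing import Any, Dict, Optional, Union


class MemoryCache:
    """Stand-in for a store backed by the web server's local RAM."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def get(self, key: str) -> Optional[Any]:
        return self._data.get(key)

    def set(self, key: str, value: Any) -> bool:
        self._data[key] = value
        return True


class EmptyCache:
    """Placeholder with the same interface that never stores anything."""

    def get(self, key: str) -> Optional[Any]:
        return None

    def set(self, key: str, value: Any) -> bool:
        return False


def make_local_server_cache(
    has_memory_extension: bool, is_command_line: bool
) -> Union[MemoryCache, EmptyCache]:
    # Command-line modes and servers without the extension get the placeholder.
    if is_command_line or not has_memory_extension:
        return EmptyCache()
    return MemoryCache()
```

Code that uses this store should therefore stay correct when nothing is ever cached, and treat the cache purely as an optimisation.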

Local cluster

 * Accessed through.
 * Configurable: Yes, via $wgMainCacheType.
 * Behaviour: fast (~1ms, from service memory), medium capacity, shared between application servers but not replicated across data centers.

Mostly for internal use only, to offer limited coordination of actions within a given data centre. This uses the same storage backend as WAN cache, but under a different key namespace, and without any ability to broadcast purges to other data centres.

The local cluster cache is typically backed by Memcached, but may also use the database.

WAN cache

 * Accessed through.
 * Configurable: Yes, via $wgMainWANCache, which defaults to $wgMainCacheType.
 * Behaviour: fast (~1ms, from service memory), medium capacity, shared between application servers, with invalidation events being replicated across data centers.

Values in this store are stored centrally in the current data centre (typically using Memcached as backend). While values are not replicated to other clusters, "delete" and "purge" events for keys are broadcast to other data centres for cache invalidation. See WANObjectCache class reference for how to use this.

In short: Compute and store values via the  method. To invalidate caches, use key purging (not by setting a key directly).
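The reason for purging instead of setting can be seen in a toy model with two data centres. The Python sketch below is not the WANObjectCache implementation; the classes, the method names, and the `example:greeting` key are assumptions. Values stay in the data centre that computed them, and only the purge event reaches the others.

```python
from typing import Callable, Dict, List


class DataCentre:
    """One data centre with its own, non-replicated value store."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.values: Dict[str, str] = {}


class WanCacheSketch:
    def __init__(self, local: DataCentre, all_centres: List[DataCentre]) -> None:
        self.local = local
        self.all_centres = all_centres

    def get_with_callback(self, key: str, compute: Callable[[], str]) -> str:
        # Values are read from and written to the local data centre only.
        if key not in self.local.values:
            self.local.values[key] = compute()
        return self.local.values[key]

    def purge(self, key: str) -> None:
        # Only the purge event is broadcast; no value is copied anywhere.
        for centre in self.all_centres:
            centre.values.pop(key, None)


database = {"greeting": "hello"}
dc_a, dc_b = DataCentre("a"), DataCentre("b")
cache_a = WanCacheSketch(dc_a, [dc_a, dc_b])
cache_b = WanCacheSketch(dc_b, [dc_a, dc_b])


def load() -> str:
    return database["greeting"]


cache_a.get_with_callback("example:greeting", load)
cache_b.get_with_callback("example:greeting", load)

database["greeting"] = "hi"         # the source of truth changes
cache_a.purge("example:greeting")   # invalidate instead of setting the key
fresh_b = cache_b.get_with_callback("example:greeting", load)
```

Had the first data centre set the new value directly, the second one would have kept serving its old copy, since values themselves are never replicated in this model.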

See also WANObjectCache on wikitech.wikimedia.org.

Main stash

 * Accessed through.
 * Configurable: Yes, via $wgMainStash.
 * Behaviour: may involve disk read (1-10ms), semi-persistent, shared between application servers and replicated across data centers.

Values in this store are read and written in the same data centre, with writes expected to be replicated to and from other data centres. It typically uses MySQL as backend. (See MariaDB for Wikipedia's configuration.) By default, the table is used. It must be tolerated that reads can potentially be stale, for example due to brief unavailability of cache writes, race conditions where overlapping requests finish out of order, or writes from another data center taking a second to replicate.

This store is expected to have strong persistence and is often used for data that cannot be regenerated and is not stored elsewhere. However, data stored in the MainStash must be non-critical and result in minimal user impact if lost, thus allowing for the backend to sometimes be partially unavailable or wiped if under operational pressure without causing incidents.
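In practice this means that callers should handle a missing or slightly outdated value gracefully. A minimal Python sketch, using a plain dictionary as a stand-in for the stash and a hypothetical `example-counter:` key format:

```python
from typing import Dict


def bump_counter(stash: Dict[str, int], name: str) -> int:
    """Increment a non-critical counter kept only in the stash."""
    key = f"example-counter:{name}"
    # A wiped or unavailable stash simply restarts the count from zero.
    count = stash.get(key, 0) + 1
    stash[key] = count
    return count
```

The read-modify-write here is deliberately naive: overlapping requests may lose an increment, which is acceptable only because the data is non-critical.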

Session store
This is not really a cache, in the sense that the data is not stored elsewhere.
 * Accessed via  objects, which itself is accessed via SessionManager, or
 * Configured via.

Interwiki cache
See Interwiki cache for details, and also.

Parser cache
See Manual:Parser cache for details. See also purgeParserCache.php.
 * Accessed via the  class.
 * Backend configured by (typically MySQL).
 * Keys are canonical by page ID and populated when a page is parsed.
 * Revision ID is verified on retrieval.
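The combination of a key per page ID and a revision check on retrieval can be sketched as follows. This Python model is an assumption-laden simplification; the class name, the `example-pcache:` key format, and the stored fields are invented for illustration and ignore details such as parser options.

```python
from typing import Dict, Optional


class ParserCacheSketch:
    def __init__(self) -> None:
        self._entries: Dict[str, dict] = {}

    @staticmethod
    def _key(page_id: int) -> str:
        # One canonical key per page, independent of the revision.
        return f"example-pcache:{page_id}"

    def save(self, page_id: int, rev_id: int, html: str) -> None:
        self._entries[self._key(page_id)] = {"revId": rev_id, "html": html}

    def get(self, page_id: int, current_rev_id: int) -> Optional[str]:
        entry = self._entries.get(self._key(page_id))
        if entry is None or entry["revId"] != current_rev_id:
            return None  # missing or written for another revision: a miss
        return entry["html"]
```

Because the revision ID is compared on read, an entry left over from an earlier revision is never served for the current one, even if no purge reached it.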

Message cache

 * Access via.
 * Backend configurable by $wgMessageCacheType (defaults to $wgMainCacheType, with fallback to MySQL).

Revision text

 * Accessed via.
 * Stored in the WAN cache, using key class.
 * Keys are verifiable and values immutable. Cache is populated on demand.

Background
The main use case for caching revision text (as opposed to fetching directly from the  table or External Storage) is for handling cases where the text of many different pages is needed by a single web request.
 * Originally implemented in 2006 (commit 376014e).
 * Process cache added in 2016.
 * Adopted by MessageCache in 2017.

This is primarily used by:


 * Parsing wikitext. When parsing a given wiki page, the Parser needs the source of the current page, but also recursively needs the source of all transcluded template pages (and Lua module pages). It is not unusual for a popular article to indirectly transclude over 300 such pages. The use of Memcached saves time when saving edits and rendering page views.
 * MessageCache. This is a wiki-specific layer on top of LocalisationCache, which consists primarily of message overrides from "MediaWiki:"-namespace pages on the given wiki. When building this blob, the source text of many different pages needs to be fetched. This is cached per-cluster in Memcached, and locally per-server (to reduce Memcached bandwidth; commit 6d82fa2).

Example
Key.

"content address" refers to the  on the wiki's main database (e.g. "tt:1123"). This in turn refers to the table or (External Storage).

To reverse engineer which page/revision this relates to, Find  for the content address, then find the revision ID for that content slot.

The revision ID can then be used on-wiki in a URL like https://en.wikipedia.org/w/index.php?oldid=951705319, or you can look it up in the revision and page tables.

Revision metadata

 * Accessed via.
 * Stored in the WAN cache, using key class.
 * Keys are verifiable (by page and revision ID) and values immutable. Cache is populated on demand.

MessageBlobStore
Stores interface text used by ResourceLoader modules. It is similar to LocalisationCache, but includes the wiki-specific overrides. (LocalisationCache is wiki-agnostic). These overrides come from the database as wiki pages in the MediaWiki-namespace.


 * Accessed via.
 * Stored in the WAN cache, using key class.
 * Keys are verifiable (by ResourceLoader module name and hash of message keys). Values are mutable and expire after a week. Cache populated on demand.
 * All keys are purged when LocalisationCache is rebuilt. When a user saves a change to a MediaWiki-namespace page on the wiki, a subset of the keys is also purged.
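A key derived from the module name and a hash of its message keys could be built roughly as in the Python sketch below. The key prefix, the choice of SHA-1, and sorting the message keys before hashing are all assumptions of this illustration, not a description of the actual key format.

```python
import hashlib
import json
from typing import List


def message_blob_key(module_name: str, message_keys: List[str]) -> str:
    # Sorting (an assumption here) makes the key independent of list order.
    payload = json.dumps(sorted(message_keys)).encode("utf-8")
    digest = hashlib.sha1(payload).hexdigest()
    return f"example-messageblob:{module_name}:{digest}"
```

When a module starts using a different set of messages, the hash changes and a new key is used, so a stale blob is not served for the new set. Changes to the message text itself are not reflected in such a key, which is why purging and expiry are still needed.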

Minification cache
ResourceLoader caches the minified versions of raw JavaScript and CSS input files.
 * Accessed via.
 * Stored locally on the server (APCu).
 * Keys are verifiable (deterministic value). No purge strategy needed. Cache populated on demand.

LESS compilation cache
ResourceLoader caches the metadata and parser output of LESS files it has compiled.


 * Accessed via.
 * Stored locally on the server (APCu).

File content hasher
ResourceLoader caches the checksum of any file directly or indirectly used by a module. When serving the startup manifest to users, it needs the hashes of many thousands of files. To reduce I/O overhead, it caches this content hash locally, keyed by path and mtime.


 * Accessed via.
 * Stored locally on the server (APCu).