Manual:Parser cache

From mediawiki.org

The parser cache is responsible for caching the rendered output of a wiki page. It is the primary caching mechanism for serving page views in MediaWiki (not counting any web cache, such as Varnish, in front of MediaWiki). The main ParserCache stores the output for the latest revision of a page.

Terminology

Some terms relevant to the documentation on this page:

  • rendered output, or just output, is the HTML generated from a wiki page's content for viewing, along with any additional data that may be attached to it. In the case of wikitext, the output is generated by parsing, but other kinds of pages may use other mechanisms, provided by a ContentHandler, to generate output (that is, to render the page's content). In addition to the page content itself, the output may depend on other resources, most notably templates.
  • options can be used to modify the output generated for a page, based on user preferences or the context in which the output is used.
  • varying keys means using different cache keys based on (some) options, allowing multiple cache entries (variants) to co-exist for the same page and revision. Varying keys splits the cache and may thus lead to undesirable fragmentation which can exhaust cache capacity.
  • invalidation refers to the process of marking a cache entry as outdated (dirty). An entry in the parser cache becomes invalid when the content of the page changes. When resources the page depends on change, the cache entry will be invalidated asynchronously (see Manual:Job queue for more on this mechanism).
  • expiry occurs when a cache entry is older than some pre-defined maximum age. The maximum age may be defined for the entire cache or individually for each entry, or both.
  • eviction refers to the removal of entries from a cache to make room for new entries (see Cache replacement policies on Wikipedia).
  • pruning refers to the removal of expired entries from a cache to free up capacity.
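The difference between expiry, eviction, and pruning can be illustrated with a toy in-memory cache. This is a minimal Python sketch, not MediaWiki code: the class TinyCache, its least-recently-used eviction policy, and the explicit `now` argument (Unix seconds, passed in to keep the example deterministic) are assumptions made for illustration. Real parser cache backends implement their own policies.

```python
from collections import OrderedDict


class TinyCache:
    """Toy cache illustrating expiry, eviction (LRU) and pruning (hypothetical)."""

    def __init__(self, capacity, max_age):
        self.capacity = capacity
        self.max_age = max_age  # maximum age in seconds
        self._entries = OrderedDict()  # key -> (value, stored_at)

    def set(self, key, value, now):
        self._entries[key] = (value, now)
        self._entries.move_to_end(key)
        # Eviction: drop least recently used entries when over capacity,
        # whether or not they have expired.
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)

    def get(self, key, now):
        item = self._entries.get(key)
        if item is None:
            return None
        value, stored_at = item
        # Expiry: an entry older than max_age counts as a miss,
        # even though it still occupies storage.
        if now - stored_at > self.max_age:
            return None
        self._entries.move_to_end(key)
        return value

    def prune(self, now):
        # Pruning: physically remove expired entries to free capacity.
        expired = [k for k, (_, t) in self._entries.items() if now - t > self.max_age]
        for k in expired:
            del self._entries[k]
        return len(expired)
```

With a capacity of two, storing a third entry evicts the least recently used one even if it has not expired, while an expired entry keeps taking up space until `prune()` is called.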

Types

There are two kinds of caches of rendered page output.

ParserCache

The ParserCache class caches rendered output (HTML plus associated data) for the latest revision of a page. It serves as a semi-permanent store of a wiki's current content as seen by readers. The ParserCache supports varying keys based on options, and uses a two-tiered system to avoid unnecessary cache fragmentation.

Since MediaWiki version 1.36, it is possible to have multiple ParserCache instances side by side. This can be used in situations in which entirely different kinds of output need to be stored for each page, or the output varies on factors beyond what is covered by ParserOptions. One example of this is the FlaggedRevs extension, which uses a separate ParserCache to store the rendered output of the "stable" revision of each page, rather than the current revision. Another example is migration to a different parser (such as Parsoid), which makes it necessary for a while to have caches for the output of the new as well as of the old parser.

Different ParserCache instances are managed by the ParserCacheFactory, which can be obtained from MediaWikiServices.
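As a rough illustration of why separate instances help, the sketch below models a factory that hands out one shared cache per name, so that, for example, the current and the "stable" rendering of the same page live in separate key spaces. The classes ToyParserCache and ToyParserCacheFactory, the method names, and the instance names are hypothetical; the real ParserCacheFactory is a PHP service with its own API, which this Python model does not reproduce.

```python
class ToyParserCache:
    """One independent key space, e.g. for current or for "stable" output."""

    def __init__(self, name):
        self.name = name
        self._entries = {}  # page_id -> rendered HTML

    def save(self, page_id, html):
        self._entries[page_id] = html

    def get(self, page_id):
        return self._entries.get(page_id)


class ToyParserCacheFactory:
    """Hands out exactly one shared cache instance per name (hypothetical)."""

    def __init__(self):
        self._instances = {}

    def get_parser_cache(self, name):
        # Create the named instance on first use, then keep returning it.
        if name not in self._instances:
            self._instances[name] = ToyParserCache(name)
        return self._instances[name]
```

Because each name maps to its own instance, storing the stable rendering of page 42 cannot overwrite the current rendering of the same page.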

RevisionOutputCache

MediaWiki version: 1.36

The RevisionOutputCache class, introduced in MediaWiki version 1.36, implements caching for the rendered output of old revisions of a page. Like ParserCache, it supports varying keys based on options, but it uses a simpler system since it is not designed for long term persistence.

The intent of this cache is to protect against load spikes caused by certain old revisions being viewed by a large number of users, typically due to an external "deep" link to that revision.

As with ParserCache, instances of RevisionOutputCache are managed by the ParserCacheFactory.

Metadata and Payload Data

The primary content of the parser cache is rendered output (HTML) generated from the page content (typically wikitext). In addition, the ParserOutput in the cache contains the following kinds of information:

  • Cache metadata: this includes information about which revision the output was generated for, when it was generated, and when it should expire. In addition, ParserOutput records which options were used when generating the output (that is, which options the output varies on).
  • Derived data: this includes information about links and dependencies, e.g. which pages the output links to, which templates were used in its creation, and which images are included on the page. This also includes any special scripts or style sheets required to display the page, as well as arbitrary "page properties" that are to be placed in the page_props table.
  • Extension data: extensions can attach arbitrary data to the ParserOutput object, which will be cached along with the rendered output. This provides extensions with a way to pass information from code executed during parsing to code executed during page display in a later request.

Since MediaWiki version 1.36, data stored in the parser cache is encoded as JSON. For this reason, only primitive data and objects implementing the JsonUnserializable interface can be stored in the cache using setExtensionData(). Earlier versions of MediaWiki relied on PHP's built-in serialization mechanism and allowed for arbitrary objects to be stored, at the cost of robustness and security (see phab:T161647).
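The restriction to primitives and explicitly supported objects can be sketched in Python. This is an illustrative model only: the decorator json_storable, the helpers set_extension_data() and get_extension_data(), the to_json_data()/from_json_data() methods, and the "_type_" tag are all hypothetical stand-ins for the PHP setExtensionData() method and the JsonUnserializable interface, and do not reflect MediaWiki's actual serialization format.

```python
import json

# Hypothetical registry of classes allowed in the cache (a stand-in for
# MediaWiki's JsonUnserializable interface, not its real mechanism).
_REGISTRY = {}


def json_storable(cls):
    """Class decorator: mark cls as safe to round-trip through JSON."""
    _REGISTRY[cls.__name__] = cls
    return cls


def _encode(value):
    """Convert value to plain JSON data, rejecting arbitrary objects."""
    if value is None or isinstance(value, (bool, int, float, str)):
        return value
    if isinstance(value, list):
        return [_encode(v) for v in value]
    if isinstance(value, dict):
        return {str(k): _encode(v) for k, v in value.items()}
    if _REGISTRY.get(type(value).__name__) is type(value):
        # Tag registered objects so the decoder knows what to rebuild.
        return {"_type_": type(value).__name__, "data": _encode(value.to_json_data())}
    raise TypeError(f"cannot store {type(value).__name__} in the cache")


def _decode_hook(obj):
    """json.loads hook: rebuild registered objects from their tagged form."""
    if "_type_" in obj:
        return _REGISTRY[obj["_type_"]].from_json_data(obj["data"])
    return obj


def set_extension_data(store, key, value):
    store[key] = json.dumps(_encode(value))


def get_extension_data(store, key):
    return json.loads(store[key], object_hook=_decode_hook)


@json_storable
class Citation:
    def __init__(self, title, year):
        self.title = title
        self.year = year

    def to_json_data(self):
        return {"title": self.title, "year": self.year}

    @classmethod
    def from_json_data(cls, data):
        return cls(data["title"], data["year"])

    def __eq__(self, other):
        return isinstance(other, Citation) and (self.title, self.year) == (other.title, other.year)
```

Storing a registered Citation or a plain dict round-trips cleanly, while an arbitrary object raises TypeError before anything is written, which mirrors the trade-off of robustness over flexibility described above.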

Cache Structure and Key Space

The ParserCache class supports storing multiple ParserOutput objects for each page, based on the ParserOptions used when generating the output. To avoid duplicating cache entries by varying the cache key on options that were not actually used, a two-tiered system is employed:

The first tier is keyed by the page ID and stores a CacheTime object, which contains information about cache expiration and the list of options used during the parse of the page. For example, if only the dateformat and userlang options were accessed by the parser when producing output for the page, this fact will be stored in the metadata cache.

The second tier of the cache contains the actual ParserOutput objects. The key for the second tier is constructed from the page ID and values of any options that affected the output. Upon cache lookup, the list of used option names is retrieved from the first tier, and only the values of those options are used together with the page ID to produce a key, while the rest of the options are ignored. Following the example above where only the dateformat and userlang options affected the output for the page, the key may look something like page_id!dateformat=default:userlang=ru. Thus any cache lookup with dateformat=default and userlang=ru will hit the same cache entry regardless of the values of the rest of the options, since we know from the information in the first cache tier that they did not affect the output.
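The two-tier lookup can be sketched as follows. This is a simplified Python model, not MediaWiki's implementation: the class TwoTierCache and the function second_tier_key() are hypothetical, the first tier stores only a list of option names (standing in for the CacheTime object), and the key layout simply follows the illustrative page_id!dateformat=default:userlang=ru form above, whereas real keys contain additional components.

```python
def second_tier_key(page_id, used_options, options):
    """Build the second-tier key from only the options the parser used."""
    parts = [f"{name}={options[name]}" for name in sorted(used_options)]
    return f"{page_id}!" + ":".join(parts)


class TwoTierCache:
    """Hypothetical two-tier model; options is assumed to be a complete set."""

    def __init__(self):
        self.tier1 = {}  # page_id -> sorted list of option names used
        self.tier2 = {}  # second-tier key -> rendered output

    def save(self, page_id, used_options, options, output):
        self.tier1[page_id] = sorted(used_options)
        self.tier2[second_tier_key(page_id, used_options, options)] = output

    def get(self, page_id, options):
        used = self.tier1.get(page_id)
        if used is None:
            return None  # nothing cached for this page at all
        # Only the values of the options recorded in tier 1 matter here.
        return self.tier2.get(second_tier_key(page_id, used, options))
```

A lookup that differs only in an option the parser never read (thumbsize in the usage below) hits the same entry, while a different userlang misses.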

The RevisionOutputCache also varies cache keys based on parser options, but always considers all options. This simplifies the system and speeds up access, but may lead to fragmentation. This is acceptable since RevisionOutputCache entries generally have a low expiry time, making a large number of variants unlikely.
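The trade-off can be made concrete by comparing a key built from every option with one built only from the options actually used. Both functions and the key layout are hypothetical illustrations; the real RevisionOutputCache derives its key from ParserOptions in its own way.

```python
def all_options_key(rev_id, options):
    """Key on every option, whether or not the parser used it."""
    parts = [f"{name}={options[name]}" for name in sorted(options)]
    return f"{rev_id}!" + ":".join(parts)


def used_options_key(rev_id, used, options):
    """Key only on the options the parser actually read."""
    parts = [f"{name}={options[name]}" for name in sorted(used)]
    return f"{rev_id}!" + ":".join(parts)


# Three requests that differ only in an option the parser never read.
variants = [{"userlang": "ru", "thumbsize": t} for t in ("120", "220", "300")]

all_keys = {all_options_key(99, o) for o in variants}
used_keys = {used_options_key(99, ["userlang"], o) for o in variants}
```

Keying on all options yields three cache entries for identical output, whereas the used-options scheme yields one; with short-lived entries, as in RevisionOutputCache, that duplication stays bounded.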

Population, Invalidation, Expiry, and Eviction

The main ParserCache instance serves as a semi-permanent store of a wiki's content as seen by readers. The default ("canonical") rendering of the page is generated immediately when the page is edited, or when any template or other dependency of the output changes (see LinksUpdate). Output using different options is generated and cached on demand.

ParserCache uses a passive invalidation model based on timestamps: When the content of a page changes, a timestamp is updated in the database (specifically, the page_touched field in the page table). If a cached ParserOutput object is found to be older than this timestamp, it is considered outdated (dirty). Outdated content may still be served to the user depending on context.
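The passive, timestamp-based check can be sketched in a few lines. The names CachedOutput, is_dirty() and lookup(), and the allow_stale flag, are hypothetical; the sketch assumes both timestamps use MediaWiki's 14-digit YYYYMMDDHHMMSS format, whose fixed width makes plain string comparison order them chronologically.

```python
from dataclasses import dataclass


@dataclass
class CachedOutput:
    html: str
    cache_time: str  # when the output was generated, e.g. "20240101120000"


def is_dirty(entry, page_touched):
    """Outdated if generated before the page was last touched."""
    return entry.cache_time < page_touched


def lookup(entry, page_touched, allow_stale=False):
    """Return cached HTML, or None if missing or (unless allowed) outdated."""
    if entry is None:
        return None
    if is_dirty(entry, page_touched) and not allow_stale:
        return None
    return entry.html
```

Nothing is deleted when the page changes: the entry simply fails the comparison against page_touched on its next lookup, and a caller that tolerates outdated content can still use it.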

In addition to invalidation, entries in a ParserCache expire after a set period (see Manual:$wgParserCacheExpireTime). The expiry time can be lowered for an individual page, depending on its content, by calling updateCacheExpiry() on the ParserOutput object. Extensions that allow the inclusion of dynamic content may use this to ensure that the dynamic content is re-evaluated at an appropriate rate. Beyond this, the Manual:$wgCacheEpoch setting provides a way to expire all cache entries older than a specific point in time, e.g. to ensure that changes to the site's setup or configuration take effect.
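How these expiry rules combine can be sketched as below. The function names and the use of Unix timestamps in seconds are assumptions for illustration; the parameters loosely correspond to $wgParserCacheExpireTime (global_max_age), a value passed to updateCacheExpiry() (per_entry_max_age), and $wgCacheEpoch (cache_epoch). The sketch relies on the point made above that a per-page setting can only lower the expiry time, never raise it.

```python
def effective_expiry(global_max_age, per_entry_max_age=None):
    """A per-entry expiry can only lower the cache-wide maximum age."""
    if per_entry_max_age is None:
        return global_max_age
    return min(global_max_age, per_entry_max_age)


def is_expired(cache_time, now, global_max_age, per_entry_max_age=None, cache_epoch=0):
    """cache_time, now and cache_epoch are Unix timestamps in seconds."""
    # Anything generated before the epoch is expired, whatever its age.
    if cache_time < cache_epoch:
        return True
    return now - cache_time > effective_expiry(global_max_age, per_entry_max_age)
```

For example, a page whose dynamic content asks for a one-hour expiry is re-rendered after an hour even if the global limit is a day, while requesting ten days has no effect beyond the global limit.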

Depending on the configuration of the cache's storage backend (see Manual:$wgParserCacheType ), cache entries may or may not be evicted from the cache prior to expiry, or may or may not be pruned from storage once expired. In general, the parser cache should be configured to ensure a very good hit rate, since it directly affects the time it takes to load a page for reading.

For information about the setup of the parser cache backend for Wikimedia sites, see wikitech:Parser cache.

The RevisionOutputCache, in contrast, is much simpler: it is populated opportunistically when renderings become available, and stores data in the WAN cache (see Manual:Object cache#WAN cache) using a relatively short expiry time (see Manual:$wgOldRevisionParserCacheExpireTime). Low hit rates are expected under normal operation, since it is rare for the same old revision to be visited many times within a short time span.
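The load-spike protection described earlier can be illustrated with a short-TTL, get-or-render cache. This Python model is hypothetical: the class ToyRevisionOutputCache, the get_or_render() method, the render callback, and the explicit `now` argument are illustrative, and for brevity the key ignores parser options, which the real cache does take into account.

```python
class ToyRevisionOutputCache:
    """Hypothetical model: old-revision renderings kept only for a short TTL."""

    def __init__(self, ttl):
        self.ttl = ttl  # seconds an entry stays usable
        self._store = {}  # rev_id -> (html, expires_at)
        self.hits = 0
        self.misses = 0

    def get_or_render(self, rev_id, render, now):
        item = self._store.get(rev_id)
        if item is not None and now < item[1]:
            self.hits += 1
            return item[0]
        self.misses += 1
        html = render(rev_id)
        # Opportunistic population: whenever a rendering is produced,
        # keep it briefly in case more requests for it follow.
        self._store[rev_id] = (html, now + self.ttl)
        return html
```

If a deep link sends a hundred requests for the same old revision within the TTL, only the first triggers a render; once the entry expires, the next request renders again.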

Configuration

See Manual:Configuration settings#Parser Cache

See also