Manual:Varnish caching

Why Varnish?
Why not? Varnish is a lightweight, efficient reverse proxy server that acts as an HTTP accelerator in front of a web server, reducing the time taken to serve frequently requested pages.

Like Squid, Varnish stores copies of the pages served by the web server. The next time the same page is requested, Varnish will serve the copy instead of requesting the page from the web server. This "caching" process removes the need for MediaWiki to regenerate that same page again, resulting in a tremendous performance boost.

Varnish has the advantage of being designed specifically for use as an HTTP accelerator (reverse proxy). It stores much of its cached data in memory, creating fewer disk files and accessing the filesystem less often than the larger, more general-purpose Squid package. Like Squid, it serves often-requested pages to anonymous (IP) users from cache instead of requesting them from the origin web server. This reduces both CPU usage and database access on the base MediaWiki server.

Because of this performance gain, MediaWiki has been designed to integrate closely with a web cache and will notify Squid or Varnish when a page should be purged from the cache in order to be regenerated.

From MediaWiki's point of view, a correctly-configured Varnish installation is interchangeable with its Squid counterpart.

The architecture
An example setup of Varnish, Apache and MediaWiki on a single server is outlined below. A more complex caching strategy may use multiple web servers behind the same Varnish caches (all of which can be made to appear to be a single host) or use independent servers to deliver wiki or image content.

To the outside world, Varnish appears to act as the web server. In reality it passes requests on to the Apache web server, but only when necessary. Apache, running on the same server, listens only on localhost (127.0.0.1), while Varnish listens only on the server's external IP address. Both services run on port 80 without conflict because each is bound to a different IP address.

Configuring Varnish 2.x

Configuring MediaWiki
When configuring MediaWiki, act as if there is no cache in front of it: use the server name the outside world would use rather than the internal IP address. For example, use "meta.wikimedia.org" as the server name instead of "127.0.0.1".

Since Varnish makes its requests from localhost, Apache will receive "127.0.0.1" as the direct remote address. However, as Varnish forwards the requests to Apache, it is set up to add the "X-Forwarded-For" header containing the remote address that Varnish itself received. This way the remote address from the outside world is preserved.

By default, MediaWiki uses the direct remote address when recording changes and similar actions, so it must be configured to use the "X-Forwarded-For" header instead in order to function correctly. Make sure the LocalSettings.php file contains the following lines:
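
A minimal sketch of such lines, assuming Varnish runs on the same host at 127.0.0.1 and a MediaWiki release that still uses the Squid-named settings (later releases rename them, so check Manual:Configuration settings for the version in use):

```php
// LocalSettings.php fragment (illustrative; the address is an assumption
// for the single-server setup described above).

// Tell MediaWiki that a reverse-proxy cache sits in front of it, so that it
// sends purge requests and honours X-Forwarded-For from the listed proxies.
$wgUseSquid = true;

// Address(es) of the cache server(s); here, Varnish on the same machine.
$wgSquidServers = array( '127.0.0.1' );
```

With several cache servers, each address would be listed in the array.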

See also Manual:Configuration settings for all configuration settings related to Squid/Varnish caching.

Some notes
As most of the traffic is handled by the Varnish cache, a statistics package will not give meaningful data if configured to analyse Apache's access_log. There are packages available to log Varnish access data to a file for analysis if needed. Counters on individual wiki pages will also severely underestimate the number of views to each page (and to the site overall) if a web cache is deployed. Many large sites will turn off the counters with $wgDisableCounters.

The display of the user's IP address in the user interface must also be disabled by setting $wgShowIPinHeader = false; otherwise a cached page could show one visitor's address to other visitors.
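
The two settings mentioned above could be placed together in LocalSettings.php. This is only a sketch: both variables belong to the MediaWiki versions this page targets, and later releases may have deprecated or removed them, so verify them against the version in use.

```php
// LocalSettings.php fragment (illustrative).

// Stop maintaining per-page view counters, which undercount behind a cache.
$wgDisableCounters = true;

// Do not render the visitor's IP address in the page header.
$wgShowIPinHeader = false;
```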

Note that Varnish is an alternative to Squid, but does not replace other portions of a complete MediaWiki caching strategy such as:


 * Pre-compiled PHP code: The default behaviour of PHP under Apache is to load and interpret PHP web scripts each time they are accessed. Installation of a cache such as APC (yum install php-pecl-apc, then allocate memory by setting apc.shm_size=128 or better in /etc/php.d/apc.ini) can greatly reduce the amount of CPU time required by Apache to serve PHP content.
 * Localisation/Internationalisation: By default, MediaWiki (version 1.16 and later) creates a large l10n_cache database table and accesses it constantly, possibly more than doubling the load on the database server after an upgrade. To remedy this, set $wgLocalisationCacheConf so that the localisation information is stored in the file system instead.
 * Variables and session data: Storing variable data such as the MediaWiki sidebar, the list of namespaces or the spam blacklist to a memory cache will substantially increase the speed of a MediaWiki installation. Forcing user login data to be stored in a common location is also essential to any installation in which multiple, interchangeable Apache servers are hidden behind the same Varnish caches to serve pages for the same wikis. Install the memcached package and set the following options in LocalSettings.php to force both user login information and cached variables to use memcache:
     $wgMainCacheType = CACHE_MEMCACHED;
     $wgMemCachedServers = array( '127.0.0.1:11211' );
     $wgSessionsInMemcached = true;
     $wgUseMemCached = true;
   Note that, if you have multiple servers, the localhost address needs to be replaced with that of the shared memcached server(s), which must be the same for all of the matching web servers at your site. This ensures that logging a user into one server in the cluster logs them into the wiki on all the interchangeable web servers.
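
For the localisation cache item in the list above, a minimal sketch of a file-based setup might look like the following. The directory path is an assumption; any directory writable by the web server should do.

```php
// LocalSettings.php fragment (illustrative).

// Directory for MediaWiki's file caches (assumed path; it must exist and be
// writable by the web server user).
$wgCacheDirectory = "$IP/cache";

// Store localisation data in files instead of the l10n_cache database table.
$wgLocalisationCacheConf['store'] = 'files';
```

Only the 'store' key is overridden here, so the remaining defaults in $wgLocalisationCacheConf are left untouched.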

Apache 2.x log file settings
By default, the Apache web server log shows only the address of the Varnish cache server, in this example "127.0.0.1:80".

Apache may be configured to log the original user's address by capturing the "X-Forwarded-For" header in a custom log file format.

An example for Apache's httpd.conf that defines such a format, under the name "cached", is:

 LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" cached