Manual:Squid caching

Why Squid?
Squid is a high-performance proxy server that can also be used as an HTTP accelerator for a webserver. In layman's terms, Squid stores a copy of the pages served by the webserver, and the next time the same page is requested, Squid serves the copy. This process is called "caching". It spares the webserver from regenerating the same page again, which can greatly reduce the webserver's load and speed up responses.
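To make the idea concrete, here is a toy model in Python of what an accelerator does. It is a simplified sketch, not how Squid is implemented. The `backend` function stands in for the webserver, and the class and method names are hypothetical.

```python
from typing import Callable, Dict


class ToyAccelerator:
    """Toy cache in front of a backend: render on a miss, reuse on a hit."""

    def __init__(self, backend: Callable[[str], str]) -> None:
        self.backend = backend             # stands in for the webserver
        self.cache: Dict[str, str] = {}    # URL -> stored copy of the page
        self.backend_calls = 0             # how often the webserver was asked

    def get(self, url: str) -> str:
        if url in self.cache:
            return self.cache[url]         # cache hit: webserver not involved
        self.backend_calls += 1
        page = self.backend(url)           # cache miss: ask the webserver
        self.cache[url] = page             # keep a copy for next time
        return page

    def purge(self, url: str) -> None:
        # Drop the stored copy so the next request regenerates the page.
        self.cache.pop(url, None)


if __name__ == "__main__":
    proxy = ToyAccelerator(lambda url: f"<html>rendered {url}</html>")
    proxy.get("/wiki/Main_Page")
    proxy.get("/wiki/Main_Page")
    print(proxy.backend_calls)  # 1
```

The second request for the same URL never reaches the backend. That saved work is where the performance gain comes from.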

Since MediaWiki pages are generated entirely dynamically, there is a substantial performance gain in running Squid as an HTTP accelerator in front of your webserver. In fact, sites like Wikipedia use several Squid caches to improve their performance.

Because of this performance gain, MediaWiki has been designed to integrate closely with Squid. For example, MediaWiki will notify Squid when a page should be purged from the cache, so that it is regenerated on the next request.
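A purge notification is an ordinary HTTP request that uses the PURGE method. The Python sketch below shows roughly what such a request looks like. The helper names are hypothetical. Putting the full URL in the request line, rather than a path plus a Host header, is an assumption of this sketch and does not describe MediaWiki's own code.

```python
import socket


def build_purge_request(url: str) -> bytes:
    """Build a raw HTTP/1.0 PURGE request for an absolute URL (sketch)."""
    return f"PURGE {url} HTTP/1.0\r\n\r\n".encode("ascii")


def send_purge(url: str, squid_host: str, port: int = 80) -> str:
    """Send a PURGE to Squid and return the status line of its reply."""
    with socket.create_connection((squid_host, port), timeout=5) as sock:
        sock.sendall(build_purge_request(url))
        # The status line normally arrives in the first chunk of the reply.
        reply = sock.recv(4096).decode("iso-8859-1")
    return reply.split("\r\n", 1)[0]
```

Calling `send_purge(...)` with the address Squid listens on returns Squid's status line. Squid typically answers 200 when it removed a cached copy and 404 when there was nothing to remove.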

The architecture
How to set up a combination of Squid, Apache and MediaWiki is outlined below, assuming that all three run on the same server. It is possible to use a more complex caching strategy or different port numbers and IP addresses, but this simple example aims for the following single-server architecture:

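                    +---------------------------------------------+
                    | Server                                      |
                    | +----------------+       +----------------+ |
                    | | Squid          |       | Apache         | |
Outside world <---> | | accelerator    | <---> | webserver      | |
                    | | w.x.y.z:80     |       | 127.0.0.1:80   | |
                    | +----------------+       +----------------+ |
                    +---------------------------------------------+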

To the outside world, Squid acts as the webserver. In reality, it passes requests on to the Apache webserver only when necessary. Apache runs on the same server, but it listens only on the loopback address (127.0.0.1). Running both services on port 80 will not cause conflicts, since each is bound to a different IP address.

Setting it up like this means Apache cannot be accessed from the outside world directly, only through Squid. Apache is flexible enough to also allow direct access should you need it, but that is beyond the scope of this example.

Configuring Squid
Due to its versatility, Squid has a very large "squid.conf" configuration file. However, only a few settings are relevant when using Squid in accelerator mode.

First and foremost, Squid needs to know which IP address and port to listen on:

# Use your own external IP address
http_port 207.142.131.205:80

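Then, Squid needs to know which host to accelerate. In this case, Apache will be listening on port 80 on localhost:

# Accelerate only Apache on localhost
httpd_accel_host 127.0.0.1
httpd_accel_port 80
httpd_accel_single_host on

These httpd_accel_* directives belong to Squid 2.5 and earlier. Squid 2.6 and later replace them with the accel option of http_port together with a cache_peer line marked originserver.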

If you run virtual domains, you also want this:

httpd_accel_uses_host_header on
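With this option on, Squid builds the full URL of an accelerated request from the Host header. As a result, pages of different virtual domains are stored and looked up separately. The Python function below is a simplified, hypothetical model of that choice, not Squid's code. The fallback to the accelerated host when the header is missing is an assumption of the sketch.

```python
from typing import Optional


def accelerated_url(path: str, host_header: Optional[str], accel_host: str,
                    use_host_header: bool = True) -> str:
    """Return the URL under which a request would be cached (simplified)."""
    # With the Host header in use, the requested site name becomes part of the
    # URL; otherwise (or if the header is absent) the single accelerated host is used.
    host = host_header if (use_host_header and host_header) else accel_host
    return f"http://{host}{path}"
```

For example, `/wiki/Main_Page` requested for `en.example.org` and for `de.example.org` (hypothetical domains) yields two distinct URLs, so the two wikis never share a cached page.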

Now it is time to define what may be accessed and from where. This is done by defining access control lists ("acl"s) and allowing or denying http_access. Basically, there are a few things we need to allow in order for our setup to work:


 * 1) Access to the web port (80) must be allowed
 * 2) For maintenance purposes access to the cachemanager will be allowed for the localhost
 * 3) MediaWiki's requests to purge pages will be allowed for the localhost

All other access will be denied. This results in the following configuration:

# Minimum setup
acl all src 0.0.0.0/0.0.0.0
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl web_ports port 80
acl purge method PURGE

# Only allow cachemgr access from localhost
http_access allow manager localhost
http_access deny manager

# Allow purge from localhost only
http_access allow purge localhost
http_access deny purge

# Allow access to the web ports
http_access allow web_ports

# And finally deny all other access to this proxy
http_access deny all

Squid applies the first http_access line that matches. The cache manager and purge lines therefore come before "http_access allow web_ports". Otherwise any PURGE request to port 80, from anywhere, would already be allowed by that line.
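Squid checks the http_access lines from top to bottom, and the first line whose ACLs all match decides. If no line matches, Squid applies the opposite of the last line's action. The Python sketch below is a simplified model of that evaluation. The Request fields and ACL predicates are hypothetical stand-ins for Squid's real ACL types. The sketch compares two orderings of the rules: with "http_access allow web_ports" first, a PURGE sent from outside is allowed, simply because it is also a request to port 80.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Callable, Dict, List, Tuple


@dataclass
class Request:
    src: str               # client IP address
    method: str            # e.g. "GET" or "PURGE"
    port: int              # destination port of the request
    proto: str = "http"    # "cache_object" for cache manager requests


# ACL definitions mirroring the squid.conf acl lines (simplified model).
ACLS: Dict[str, Callable[[Request], bool]] = {
    "all": lambda r: True,
    "localhost": lambda r: ip_address(r.src) in ip_network("127.0.0.1/32"),
    "manager": lambda r: r.proto == "cache_object",
    "web_ports": lambda r: r.port == 80,
    "purge": lambda r: r.method == "PURGE",
}

Rule = Tuple[str, List[str]]  # ("allow" or "deny", ACL names that must all match)


def check_access(rules: List[Rule], req: Request) -> str:
    for action, acl_names in rules:
        if all(ACLS[name](req) for name in acl_names):
            return action  # first matching line wins
    # No line matched: use the opposite of the last line's action.
    return "deny" if rules[-1][0] == "allow" else "allow"


PURGE_LAST: List[Rule] = [
    ("allow", ["web_ports"]),
    ("allow", ["manager", "localhost"]), ("deny", ["manager"]),
    ("allow", ["purge", "localhost"]), ("deny", ["purge"]),
    ("deny", ["all"]),
]
PURGE_FIRST: List[Rule] = [
    ("allow", ["manager", "localhost"]), ("deny", ["manager"]),
    ("allow", ["purge", "localhost"]), ("deny", ["purge"]),
    ("allow", ["web_ports"]),
    ("deny", ["all"]),
]

if __name__ == "__main__":
    outside_purge = Request(src="203.0.113.7", method="PURGE", port=80)
    print(check_access(PURGE_LAST, outside_purge))   # allow
    print(check_access(PURGE_FIRST, outside_purge))  # deny
```

In both orderings, ordinary GET requests to port 80 are still allowed. Only the treatment of PURGE from outside differs.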

Note: A comment in the function sendCacheControl() in OutputPage.php mentions further rules that should be added to replace Cache-Control headers.

Configuring Apache
The Apache webserver now needs to be configured to listen only on port 80 of the loopback address (localhost). The file httpd.conf should contain the following line:

Listen 127.0.0.1:80

and, if you are using virtual hosts, also lines like:

NameVirtualHost 127.0.0.1:80
<VirtualHost 127.0.0.1:80>
    ServerName meta.wikimedia.org
    ...
</VirtualHost>
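At the socket level, "Listen 127.0.0.1:80" means Apache binds its listening socket to the loopback address only, so connections can arrive only from the server itself. The Python sketch below illustrates that with a hypothetical helper. It uses an arbitrary free port, because binding to port 80 normally requires root privileges.

```python
import socket


def listen_on(address: str, port: int) -> socket.socket:
    """Open a TCP listening socket bound to one specific local address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((address, port))  # comparable to Apache's "Listen 127.0.0.1:80"
    sock.listen()
    return sock


if __name__ == "__main__":
    server = listen_on("127.0.0.1", 0)  # port 0: let the OS pick a free port
    host, port = server.getsockname()
    client = socket.create_connection((host, port))
    conn, peer = server.accept()
    print(peer[0])  # 127.0.0.1: the connecting side is on loopback too
    for s in (conn, client, server):
        s.close()
```

The peer address printed is 127.0.0.1. Apache sees the same thing for every request Squid forwards to it.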

Configuring MediaWiki
When configuring MediaWiki, act as if there is no Squid: use the server name the outside world would use instead of the internal IP address. For example, use "meta.wikimedia.org" as the server name instead of "127.0.0.1".

Since Squid makes its requests to Apache from localhost, Apache will see "127.0.0.1" as the direct remote address. However, when Squid forwards a request to Apache, it adds an "X-Forwarded-For" header containing the remote address from which Squid itself received the request. This way the client's address in the outside world is preserved.
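The header handling can be sketched in Python as follows. This is a simplified, hypothetical helper, not Squid's code. If the request already carries an X-Forwarded-For header (for example, from another proxy), the new address is appended after a comma. For brevity, the sketch treats the header name case-sensitively, although HTTP header names are case-insensitive.

```python
from typing import Dict


def forward_headers(headers: Dict[str, str], client_ip: str) -> Dict[str, str]:
    """Return a copy of the request headers with client_ip added to X-Forwarded-For."""
    out = dict(headers)
    previous = out.get("X-Forwarded-For")
    # Append to an existing chain of addresses, or start a new one.
    out["X-Forwarded-For"] = f"{previous}, {client_ip}" if previous else client_ip
    return out
```

A request from 203.0.113.7 with no such header leaves Squid with "X-Forwarded-For: 203.0.113.7". The address Apache sees directly is still 127.0.0.1.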

By default MediaWiki uses the direct remote address when recording edits and similar actions. To function correctly behind Squid, it must be configured to use the "X-Forwarded-For" header instead. Make sure the LocalSettings.php file contains the following lines:

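// Tell MediaWiki it runs behind Squid
$wgUseSquid = true;
// Squid servers whose X-Forwarded-For headers are trusted and that receive purge requests
$wgSquidServers = array('127.0.0.1');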
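The trust logic behind this can be sketched in Python as follows. This is a simplified, hypothetical helper, not MediaWiki's actual code. The X-Forwarded-For header is only believed when the request comes directly from a listed Squid server. The header is read from the right, so that addresses a client may have forged further left are ignored.

```python
from typing import Iterable


def client_ip(remote_addr: str, xff: str, trusted_proxies: Iterable[str]) -> str:
    """Pick the address to record for a request (simplified sketch)."""
    trusted = set(trusted_proxies)
    if remote_addr not in trusted or not xff:
        # Not from our Squid (or no header): the socket address is the client.
        return remote_addr
    # Walk X-Forwarded-For from the right, skipping our own trusted proxies.
    for hop in reversed([h.strip() for h in xff.split(",")]):
        if hop not in trusted:
            return hop
    return remote_addr
```

With the settings above, a request arriving from 127.0.0.1 carrying "X-Forwarded-For: 203.0.113.7" is recorded as 203.0.113.7. A request arriving directly from 203.0.113.7 keeps that address, whatever header it sends.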

Some notes
In this setup, Squid will absorb most of the traffic that would otherwise reach Apache. Therefore, if you need reliable web statistics from a statistics package such as Webalizer, you will need to configure it to analyze Squid's access_log instead of Apache's.
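As a rough illustration of what such an analysis involves, the Python sketch below counts Squid result codes (such as TCP_HIT and TCP_MISS) in access_log lines. It assumes Squid's default native log format, in which the fourth field holds the result code and HTTP status (e.g. TCP_MISS/200). The sample lines are invented for illustration.

```python
from collections import Counter
from typing import Iterable


def summarize(lines: Iterable[str]) -> Counter:
    """Count Squid result codes in native-format access_log lines."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        action = fields[3].split("/", 1)[0]  # "TCP_MISS/200" -> "TCP_MISS"
        counts[action] += 1
    return counts


if __name__ == "__main__":
    sample = [  # invented example lines
        "1066036250.603    310 192.0.2.10 TCP_MISS/200 2056 GET "
        "http://meta.wikimedia.org/wiki/Main_Page - DIRECT/127.0.0.1 text/html",
        "1066036251.112      2 192.0.2.11 TCP_HIT/200 2056 GET "
        "http://meta.wikimedia.org/wiki/Main_Page - NONE/- text/html",
    ]
    counts = summarize(sample)
    print(counts["TCP_HIT"] / sum(counts.values()))  # 0.5
```

Only the misses ever reached Apache. This is why Apache's own log undercounts the real traffic.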