Manual:CloudFlare

From MediaWiki.org

CloudFlare (cloudflare.com) is a commercial content delivery network with integrated distributed denial of service (DDoS) defence. As it acts as a reverse proxy and domain name server to your website, it can provide a useful IPv6 transition mechanism if your hosting provider doesn't provide native IPv6.

As with other reverse proxies, such as Squid and Varnish, there are configuration issues with any server that places itself in-line between the user and MediaWiki. CloudFlare needs to be told what to cache (and when to discard outdated content) and needs to be told not to replace MediaWiki's custom 404 page (which invites the user to create a missing article) with its own "SmartErrors" page. Likewise, MediaWiki (or Apache) needs to be aware of any intervening trusted proxies to ensure that the user's IP address (and not that of CloudFlare's servers) is listed in Special:RecentChanges or logged by Extension:CheckUser.

Advantages and limitations

CloudFlare does not currently charge per megabit/second or per gigabyte or terabyte of data transferred; as of 2014, the basic service, with some limitations, is free. For sites receiving high-volume requests for the same unchanged content (such as images, which typically account for more than 80% of bandwidth costs), placing a busy domain behind a service like CloudFlare can substantially reduce its cost of operation. As fewer requests reach the origin servers, in some cases the site may also run faster.

Individual page view counters are broken under CloudFlare, as most requests never reach the origin servers and the individual wiki site owner does not have access to CloudFlare's logs. CloudFlare does provide analytics showing how many visitors reach your site, but not on a per-page basis.

The free version of CloudFlare is limited in its SSL capability. A paid version ($20/month per domain, no limit on number of subdomains) provides somewhat better SSL support, but content would still be decrypted and re-encrypted at CloudFlare's servers - a potential 'man in the middle' vulnerability.

CloudFlare takes control over DNS for your entire domain; this may be a problem if you use domains in which individual subdomains are shared between users with a service like freedns.afraid.org or if you use other services which expect to be your DNS provider.

CloudFlare places your site behind multiple anycast servers in multiple countries. This can be an advantage if being geographically closer to your users makes your site appear faster, but can be a legal disadvantage for sites which are targets of libel tourism or are dealing in politically-sensitive matters. A Wikileaks-like site might not want a US server in the data path if discussing sensitive information about US intelligence agencies; a site discussing activities lawful in its own country but illegal in one or more of the CloudFlare server locations might also wish to avoid using the service and keep their content distribution network at home.

As with any third-party free or paid service, there is also the risk that what is free today becomes an expensive paid service tomorrow, becomes slow or unstable, or simply disappears. Be prepared to switch your domain registration's DNS entries back to your original service (or another provider) if the 'freemium' CloudFlare service goes away in the future.

CloudFlare does not perform well in China as the Great Firewall often blocks traffic from CloudFlare servers.

Integration with MediaWiki

MediaWiki's caching strategy is designed to work with open-source reverse proxy servers such as Squid and Varnish. On a single-server configuration, this looked like:

Outside world <---> Server [ Squid/Varnish accelerator (w.x.y.z:80) <---> Apache webserver (127.0.0.1:80) ]

When an anonymous user requested a page, Squid (or Varnish) checked whether it had a copy already stored. If one was available, it was served directly without even querying the origin server. If not, the request would pass through to Apache and MediaWiki.

This saved significant amounts of processor time (as MediaWiki was no longer repeatedly regenerating the same HTML for frequently-read pages) but had various impacts on MediaWiki server configuration:

  • The address of the Squid proxy had to be kept out of Special:RecentChanges, instead using the user's IP address from the X-Forwarded-For header.
  • The proxy servers had to be notified when a page was changed, so they could discard the outdated version. This was done from LocalSettings.php with entries like:
$wgUseSquid = true;
$wgSquidServers = array('<your IPv4 address>');
$wgSquidServersNoPurge = array('127.0.0.1');
  • The proxy had to avoid caching dynamic content which frequently or pseudo-randomly changes, such as special pages or output from some extensions such as Extension:DynamicPageList or Extension:RandomSelection.
  • The user's IP address had to be removed from the upper-right corner of the page for anonymous users, with $wgShowIPinHeader = false;
  • Page view counters had to be disabled in LocalSettings.php with $wgDisableCounters = true; as caching means not all page views reach the MediaWiki/Apache server
  • Pages displaying a "You have new messages" banner or similar flags could not be cached
  • Pages for logged-in users (as they contain login names or identifying info) could not be cached
  • Protection against image hotlinking had to move out of Apache and onto whichever server was facing the user
  • IPv6 support also moved onto whichever server was facing the user.

MediaWiki sends HTTP headers (such as "Cache-Control: private" if something should not be cached, or an expiry time if something can be cached for a limited time) which are recognised by Squid. Squid sends "X-Forwarded-For:" to pass the user's IP address to a MediaWiki installation behind a Squid proxy. MediaWiki also sends HTTP PURGE requests to notify Squid that an item has changed and needs to be discarded.

As MediaWiki was designed to work with Squid (and Varnish can be configured to act in the same manner), the MediaWiki installation and the cache are tightly integrated.

CloudFlare in some cases provides a function which could serve a similar role, but implements it in a non-standard or incompatible manner. A MediaWiki installation could operate behind CloudFlare, but there are some configuration issues to be addressed.

Anonymous IP user identification

If a user connects directly to a MediaWiki installation (on Apache), the user's IP address is reported by PHP in $_SERVER['REMOTE_ADDR'] and no further configuration is required to get the information into Special:RecentChanges.

If a MediaWiki installation is behind Squid or Varnish, MediaWiki must be configured to use the "X-Forwarded-For:" header, with the list of trusted proxies configured in $wgSquidServers or $wgSquidServersNoPurge.
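The trusted-proxy idea can be sketched in a few lines of PHP. This is an illustration only, not MediaWiki's actual implementation: the function name resolveClientIp and the example addresses are hypothetical, and the trusted list is assumed to hold exact addresses rather than ranges.

```php
<?php
/**
 * Hypothetical helper (not MediaWiki's actual code): pick the address to
 * log for a request that may have passed through trusted reverse proxies.
 *
 * @param string   $remoteAddr     Address of the directly connected peer.
 * @param string   $forwardedFor   Raw X-Forwarded-For header value, may be ''.
 * @param string[] $trustedProxies Exact addresses of proxies we trust.
 */
function resolveClientIp( string $remoteAddr, string $forwardedFor, array $trustedProxies ): string {
    // The header reads "client, proxy1, proxy2": each proxy appends the
    // address it received the request from, so the nearest hop is last.
    $chain = array_filter( array_map( 'trim', explode( ',', $forwardedFor ) ), 'strlen' );

    $ip = $remoteAddr;
    // Step outward one hop at a time, but only while the current hop is trusted.
    while ( in_array( $ip, $trustedProxies, true ) && count( $chain ) > 0 ) {
        $ip = array_pop( $chain );
    }
    return $ip;
}

// Squid on localhost is trusted, so the forwarded address is used.
echo resolveClientIp( '127.0.0.1', '203.0.113.7', [ '127.0.0.1' ] ), "\n";
// An untrusted peer cannot spoof its address through the header.
echo resolveClientIp( '198.51.100.9', '203.0.113.7', [ '127.0.0.1' ] ), "\n";
```

The walk stops at the first address that is not on the trusted list. If a further proxy sits upstream of Squid and is not listed, it is that proxy's address, not the user's, which ends up in the logs.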

CloudFlare's proposal to patch MW 1.18 (not recommended)

If a MediaWiki installation is behind CloudFlare, that company's servers will present the user's IP in a non-standard format with "CF-Connecting-IP:" and "CF-IPCountry:" alongside the standard "X-Forwarded-For:".

CloudFlare at one point recommended changing core MediaWiki code to use $_SERVER['HTTP_CF_CONNECTING_IP'] but their proposed patch is, by their own admission, only suitable for "MediaWiki versions 1.18.0 and older". This is obsolete.

Installing mod_cloudflare in Apache

This leaves mod_cloudflare as the simplest currently-available option. This module, downloadable pre-compiled for various distributions from CloudFlare's site, is installed directly into Apache (not MediaWiki). It restores the original user's IPv4 or IPv6 address if CloudFlare is connecting to Apache directly. (If the MediaWiki/Apache server is sitting behind Squid, and that in turn is behind CloudFlare, the IP addresses in Special:RecentChanges will break.)

A list of (originally) fourteen IPv4 and IPv6 ranges used by CloudFlare's servers was hard-coded in the module's source code. CloudFlare added additional IPv4 blocks to this list in late 2016 for new servers; sites with the outdated list still installed now intermittently see CloudFlare's address in place of the user's address.

Two recent RHEL/CentOS versions are supported, as are a couple of Debian versions and a few versions of Ubuntu. For these, CloudFlare distributes an .rpm or .deb file which installs mod_cloudflare as a precompiled binary. CloudFlare is no longer updating the RHEL/CentOS 5 .rpm packages, and the server list in these packages is outdated.

Older OS versions may no longer be receiving updates to the mod_cloudflare packages; new major releases of distributions (such as CentOS 7, when it was initially released in 2014) are sometimes slow to be supported. Installing an earlier version of mod_cloudflare into the Apache server bundled with a newer OS distribution is likely to prevent the web server from starting, with errors.

If you're running a compatible distribution, but the package manager refuses to install mod_cloudflare with "incorrect GPG key", reinstall the key and remove the old key using (for .rpm distributions):

rpm --import http://pkg.cloudflare.com/pubkey.gpg
rpm -e gpg-pubkey-8e5f9a5d-*

Using the TrustedXFF MediaWiki extension

The other option is to use Extension:TrustedXFF and list every one of CloudFlare's IP addresses manually. The list is a long one and may change with time. See [1] and [2] for more information on recovering the original user address.

Unfortunately, TrustedXFF hard-codes its IPv6 trusted proxy addresses (instead of putting them in configuration files) and TrustedXFF/generate.php will refuse an IPv4 range above a certain size (an /18 or wider will fail).

The IPv6 issue can be avoided by having CloudFlare connect to your server in IPv4 (the user still sees either IPv4 or IPv6, as CloudFlare is a usable IPv6 transition mechanism). The restriction on large IPv4 ranges is a bit more problematic.

One workaround is to add the IPv4 ranges directly to TrustedXFF.body.php, in much the same way as the IPv6 ranges were kludged/hard-coded into this file.

Drop the list of CloudFlare's IPv4 addresses into the code alongside the existing IPv6 list at the beginning of the class definition:

        // FIXME: IPv4 ranges of /18 or wider won't compile with generate.php, so list them unexpanded
        // Each range is a quoted string; unquoted CIDR notation is not valid PHP.
        static $ipv4Ranges = array(
        '199.27.128.0/21',
        '173.245.48.0/20',
        '103.21.244.0/22',
        '103.22.200.0/22',
        '103.31.4.0/22',
        '141.101.64.0/18',
        '108.162.192.0/18',
        '190.93.240.0/20',
        '188.114.96.0/20',
        '197.234.240.0/22',
        '198.41.128.0/17',
        '162.158.0.0/15',
        '104.16.0.0/12',
        '172.64.0.0/13',
        '131.0.72.0/22');

        // FIXME: IPv6 ranges need to be put here for now...

Scroll down to function isTrusted( $ip ) and add a check for the hard-coded IPv4 ranges ahead of the IPv6 check:

                // try IPv4 ranges
                foreach ( self::$ipv4Ranges as $range ) {
                  if ( IP::isInRange( $ip, $range ) ) {
                    return true;
                  }
                }

                // Try IPv6 ranges...

This is not a clean solution, but (until a proper CloudFlare extension is created for MediaWiki) it will get valid IP addresses into Special:RecentChanges and keep the addresses of the CloudFlare servers out of the logs... at least until the next time CloudFlare adds to its list of servers.
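For readers unfamiliar with CIDR notation, the range test performed on each entry can be sketched as a standalone function. This is a minimal illustration, assuming 64-bit PHP 7 or later; the name ipv4InCidr is hypothetical and the code is not taken from MediaWiki or TrustedXFF.

```php
<?php
/**
 * Hypothetical helper: report whether an IPv4 address falls inside a
 * CIDR range such as "104.16.0.0/12". Assumes 64-bit PHP integers.
 */
function ipv4InCidr( string $ip, string $cidr ): bool {
    if ( strpos( $cidr, '/' ) === false ) {
        return false; // not in network/prefix form
    }
    list( $network, $bits ) = explode( '/', $cidr, 2 );
    $ipLong  = ip2long( $ip );
    $netLong = ip2long( $network );
    $bits    = (int)$bits;
    if ( $ipLong === false || $netLong === false || $bits < 0 || $bits > 32 ) {
        return false; // malformed address or prefix length
    }
    // A /N prefix keeps the top N bits of the 32-bit address.
    $mask = $bits === 0 ? 0 : ( ( ~0 << ( 32 - $bits ) ) & 0xFFFFFFFF );
    return ( $ipLong & $mask ) === ( $netLong & $mask );
}

// 104.16.0.0/12 spans 104.16.0.0 through 104.31.255.255.
var_dump( ipv4InCidr( '104.16.1.1', '104.16.0.0/12' ) );
var_dump( ipv4InCidr( '104.32.0.1', '104.16.0.0/12' ) );
```

A /12 covers over a million addresses, which gives some idea of why a tool that expands each range into individual addresses would refuse the wider blocks.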

Removing old Varnish

Any existing local caching servers (such as Squid or Varnish) will need to be removed when using mod_cloudflare; otherwise, mod_cloudflare will see Squid's address (which it doesn't recognise as a trusted proxy) and pass the request unchanged, at which point MediaWiki replaces the address of Squid with the next address upstream... CloudFlare. You can also try adding Squid to the list of trusted proxies by using the CloudFlareRemoteIPTrustedProxy directive.

As MediaWiki typically blocks abuse by IP address, showing CloudFlare as the address in Special:RecentChanges will interfere with efforts to selectively block spam and vandalism.

Cache control

To prevent CloudFlare from caching certain parts of your site, it is possible to use Page Rules.

By default, MediaWiki sets certain HTTP headers (such as "cache-control: private" for logged-in users, or "expires:" for cacheable pages) to control whether its dynamically-generated content is cached and when it should expire. For purely static content, such as image files, the headers are generated by the web server.
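The decision behind those headers can be illustrated with a small sketch. The function name cacheControlFor and the exact header strings are assumptions made for illustration; MediaWiki's real logic is more involved and its output may differ between versions.

```php
<?php
/**
 * Hypothetical helper: choose a Cache-Control header for a wiki page.
 * Personalised responses are marked private so that a shared cache
 * (Squid, Varnish or a CDN) does not store them.
 *
 * @param bool $loggedIn       The response contains the user's login name.
 * @param bool $hasNewMessages The response shows a "new messages" banner.
 * @param int  $sharedMaxAge   Seconds a shared cache may keep the page.
 */
function cacheControlFor( bool $loggedIn, bool $hasNewMessages, int $sharedMaxAge ): string {
    if ( $loggedIn || $hasNewMessages || $sharedMaxAge <= 0 ) {
        return 'Cache-Control: private, must-revalidate, max-age=0';
    }
    // s-maxage applies to shared caches only; browsers still revalidate.
    return "Cache-Control: s-maxage={$sharedMaxAge}, must-revalidate, max-age=0";
}

echo cacheControlFor( true, false, 3600 ), "\n";
echo cacheControlFor( false, false, 3600 ), "\n";
```

A proxy setting that ignores these headers would store and replay the first kind of response to other visitors, which is why the origin's cache-control headers matter.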

CloudFlare may respect these, depending on how it has been configured.[3] From My Websites → settings → Page rules it is possible to create three rules per domain (more for paid users). Each matches a pattern (such as *example.org/wiki/* for all wiki pages on example.org and its subdomains, in standard view) and allows configuration of:

  • Custom caching - allows the cache control headers to be overridden. "Cache everything" is the most aggressive setting; it may be acceptable for static image files but is certainly not wanted for anything else. Special:RecentChanges and other dynamic content should use the least aggressive settings.
  • Edge cache expire TTL - in "cache everything" mode, overrides the time after which CloudFlare requests a new copy of a page or file from the origin server. If used on anything but static content (images), this will cache things which shouldn't be cached as it can override cache control directives from the origin server.[4]
  • Browser cache expire TTL - indicates the time after which the user's browser should request a new copy of a page from CloudFlare

It's best to cache static image files aggressively ("cache everything" would be valid for images, were a means provided to purge them when a new version is uploaded). Ordinary wiki pages should use settings which respect cache-control headers ("cache aggressively" is OK, "cache everything" is problematic), and Special: pages should not be cached, or cached only briefly, so that they remain current.

HTTP purge

When new content is uploaded to a MediaWiki, the wiki software will request that each cache server (as configured in $wgSquidServers) discard the outdated version of the page or image. This notification is not sent to third-party web proxies (such as those listed in Extension:TrustedXFF) and appears to be hard-coded in SquidPurgeClient.php in core MediaWiki code with no hooks to allow an extension to change this behaviour.

The message, as sent by MediaWiki, looks like: PURGE http://wiki.example.org/images/0/01/Some_image_recently_replaced.jpg

Squid will honour this request if it comes from a trusted IP address. CloudFlare likely will not.

There is a means to manually request one outdated file be discarded (from 'My Websites' settings for one domain → CloudFlare Settings → Cache Purge, click "purge single file" and enter URL of file to purge) but this is cumbersome and wildcards are not supported.[5]

CloudFlare's API provides a function which can purge an individual file,[6] but there is currently nothing available to invoke this API automatically from MediaWiki. As a result, a wiki behind CloudFlare will continue to display outdated versions of content until the data expires from cache normally. See bugzilla:62356.
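As a rough idea of what such a call might look like, the sketch below builds a single-file purge request. It assumes CloudFlare's v4 REST API (a POST to the zone's purge_cache endpoint with a bearer token), which may differ from the API described in [6]; the function names, zone ID and token are hypothetical placeholders, and nothing here is wired into MediaWiki.

```php
<?php
/**
 * Hypothetical helper: describe the HTTP request asking CloudFlare to
 * purge specific URLs from its cache (endpoint shape assumes the v4 API).
 */
function buildPurgeRequest( string $zoneId, string $apiToken, array $urls ): array {
    return [
        'method'  => 'POST',
        'url'     => 'https://api.cloudflare.com/client/v4/zones/'
            . rawurlencode( $zoneId ) . '/purge_cache',
        'headers' => [
            'Authorization: Bearer ' . $apiToken,
            'Content-Type: application/json',
        ],
        // The API expects a JSON object listing the files to discard.
        'body'    => json_encode( [ 'files' => array_values( $urls ) ], JSON_UNESCAPED_SLASHES ),
    ];
}

/** Send the request using PHP's curl extension; returns the response body or false. */
function sendPurgeRequest( array $request ) {
    $ch = curl_init( $request['url'] );
    curl_setopt_array( $ch, [
        CURLOPT_CUSTOMREQUEST  => $request['method'],
        CURLOPT_HTTPHEADER     => $request['headers'],
        CURLOPT_POSTFIELDS     => $request['body'],
        CURLOPT_RETURNTRANSFER => true,
    ] );
    return curl_exec( $ch );
}

// Build (but do not send) a purge for the example image used above.
$request = buildPurgeRequest(
    'ZONE_ID_PLACEHOLDER',
    'API_TOKEN_PLACEHOLDER',
    [ 'http://wiki.example.org/images/0/01/Some_image_recently_replaced.jpg' ]
);
echo $request['body'], "\n";
```

Building the request separately from sending it keeps the sketch easy to inspect without network access. Something would still need to call it whenever a page or file changes, which is the piece the paragraph above notes is missing.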

CloudFlare does not offer a direct equivalent to Squid's negative_ttl parameter, which controls how long an error code returned for a URL from an origin server remains in cache (negative caching) until the request is retried. If CloudFlare does "cache everything" with a long time-to-live, it may be necessary to manually purge URLs where CloudFlare has stored an error message but the error has since been corrected upstream.

Error 404 handling

By default, CloudFlare enables a "feature" which it brands as SmartErrors. This replaces the originating site's 'error 404' pages with a CloudFlare search page which lists other, related pages which do exist on the same site and allows the user to search the web. This "smart error" page, which displays in US English instead of your site's local language, may contain advertising or direct users to obscure, little-known external web search engines.

MediaWiki, when given a request for a page which doesn't exist (such as http://en.wikipedia.org/wiki/Like_this_one), will display a custom error page which has the 404 code but invites the user to create the page (or indicates the page was deleted, protected, or "no such special page"). MediaWiki sites need this page left as-is so that clicking on a red link invites a user to create a new article. SmartErrors on every 404 page breaks this.

SmartErrors may be disabled for an entire domain on initial import of the domain to CloudFlare or deactivated later on a per-domain basis from the list of domains by clicking 'Apps', scrolling down to "SmartErrors" and turning the app 'off'. They may also be turned off for specific paths using "page rules" in the settings for each domain.

Image hotlinking

Many web-based forums are infamous for encouraging users to link directly to images hosted on other sites. Their site appears to be displaying the image, but is using bandwidth that the 'hot-linked' target site's operators must pay for. A common defence is for Apache webmasters to deploy mod_rewrite to look at the "referer:" (sic) line of each request and reject any that hotlink images (.jpg, .png, .gif et al.) for use on pages on some other site.

With any form of caching server, the origin servers are no longer user-facing. Any anti-hotlinking code needs to be removed from the origin webserver (such as Apache); anti-hotlink protection is available on CloudFlare if needed.

HTML document comments

MediaWiki will normally add comments such as <!-- Served by wonky-server.example.org in 666 seconds --> to HTML output for troubleshooting purposes; if a site has multiple Apache servers delivering the same content, this is invaluable in determining which server is causing a specific issue. CloudFlare is prone to "optimising" delivered webpages by stripping these comments, a function which may need to be turned off at times for debugging purposes.

IPv6

All current MediaWiki versions are able to log an IPv6 user address to Special:RecentChanges if one is provided. No change to MediaWiki's LocalSettings.php is required, although MediaWiki (and any extensions which rely on user IP addresses, such as CheckUser) needs to be a currently-supported version, as some older (pre-MW 1.19) revisions were buggy in logging IPv6 user addresses to Special:RecentChanges.

IPv6 support is enabled on whichever server is facing the user: Apache for a stand-alone install, Squid/Varnish for sites using these as a reverse proxy, CloudFlare's server for sites where CloudFlare is user-facing. There is no requirement that the communication from the user-facing server back to Apache support both IPv4 and IPv6, and no benefit to enabling both for any back-end link.

My websites → settings → CloudFlare settings → Automatic IPv6 may be set to "full" for each domain to enable IPv6 for your site.

No other configuration is required for IPv6.

See also