Manual:CloudFlare

From MediaWiki.org

CloudFlare (cloudflare.com) is a commercial content delivery network with integrated distributed denial of service (DDoS) defence. As it acts as a reverse proxy and domain name server to your website, it can provide a useful IPv6 transition mechanism if your hosting provider doesn't provide native IPv6.

As with other reverse proxies, such as Squid and Varnish, there are configuration issues with any server that places itself in-line between the user and MediaWiki. CloudFlare needs to be told what to cache (and when to discard outdated content). Likewise, MediaWiki (or Apache) needs to be aware of any intervening trusted proxies to ensure that the user's IP address (and not that of CloudFlare's servers) is listed in Special:RecentChanges or logged by Extension:CheckUser.

Advantages and limitations

CloudFlare does not currently charge per megabit/second or per gigabyte or terabyte of data transferred; (as of 2014) the basic service, with some limitations, is free. For sites receiving high-volume requests for the same unchanged content (such as images, which typically account for more than 80% of bandwidth costs) placing a busy domain behind a service like CloudFlare can substantially reduce its cost of operation. As fewer requests are being made to the origin servers, in some cases the site may run faster.

Individual page view counters are broken under CloudFlare, as most requests never reach the origin servers and the individual wiki site owner does not have access to CloudFlare's logs. CloudFlare does provide analytics showing how many visitors reach your site, but not on a per-page basis.

The free version of CloudFlare is limited in its SSL capability. A paid version ($20/month per domain, no limit on number of subdomains) provides somewhat better SSL support, but content would still be decrypted and re-encrypted at CloudFlare's servers - a potential 'man in the middle' vulnerability.

CloudFlare takes control over DNS for your entire domain; this may be a problem if you use domains in which individual subdomains are shared between users with a service like freedns.afraid.org or if you use other services which expect to be your DNS provider. Cloudflare provides DDoS protection for web traffic, but not for e-mail or other services; unless you remove any MX records pointing to your own domain, the addresses in these records can still be used to find your underlying origin server and target it for DDoS or abuse.

CloudFlare places your site behind multiple anycast servers in multiple countries. This can be an advantage if being geographically closer to your users makes your site appear faster, but can be a legal disadvantage for sites which are targets of libel tourism or are dealing in politically-sensitive matters. While censorship of content by CloudFlare is uncommon, it is not completely insulated from its home country's political climate; incidents in which CloudFlare has removed sites because of their content include Switter (a microblogging site for sex workers) and the Daily Stormer (an extreme-right political site). Likewise, a Wikileaks-like site might not want a US server in the data path if discussing sensitive information about US intelligence agencies; as Cloudflare expands infrastructure into other countries (including Russia) the number of governments to which it is potentially exposed only increases. A site discussing activities lawful in its own country but illegal in one or more of the CloudFlare server locations might wish to avoid using the service and keep their content distribution network at home.

CloudFlare does not perform well in China as the Great Firewall often blocks traffic from CloudFlare servers.

Like any third-party free or paid service, there is also the risk that what's free today becomes an expensive paid service tomorrow, becomes slow or unstable or simply disappears. Be prepared to switch your domain registration's DNS entries back to your original service (or another provider) if the 'freemium' CloudFlare service goes away in the future.

Integration with MediaWiki

MediaWiki's caching strategy is designed to work with open-source reverse proxy servers such as Squid and Varnish. On a single-server configuration, this looked like:

Outside world <---> [ Server: Squid/Varnish accelerator (w.x.y.z:80) <---> Apache webserver (127.0.0.1:80) ]

When an anonymous user requested a page, Squid (or Varnish) checked whether it had a copy already stored. If one was available, it was served directly without even querying the origin server. If not, the request would pass through to Apache and MediaWiki.

This saved significant amounts of processor time (as MediaWiki was no longer repeatedly regenerating the same HTML for frequently-read pages) but had various impacts on MediaWiki server configuration:

  • The address of the Squid proxy had to be kept out of Special:RecentChanges, instead using the user's IP address from the X-Forwarded-For header.
  • The proxy servers had to be notified when a page was changed, so they could discard the outdated version. This was done from LocalSettings.php with entries like:
$wgUseSquid = true;
$wgSquidServers = array('<your IPv4 address>');
$wgSquidServersNoPurge = array('127.0.0.1');
  • The proxy had to avoid caching dynamic content which frequently or pseudo-randomly changes, such as special pages or output from some extensions such as Extension:DynamicPageList or Extension:RandomSelection.
  • The user's IP address had to be removed from the upper-right corner for anonymous IP users ($wgShowIPinHeader = false;).
  • Page view counters should be disabled in LocalSettings.php ($wgDisableCounters = true;), as caching means not all page views reach the MediaWiki/Apache server.
  • Pages displaying a "You have new messages" banner or similar flags could not be cached
  • Pages for logged-in users (as they contain login names or identifying info) could not be cached
  • Protection against image hotlinking had to move out of Apache and onto whichever server was facing the user
  • IPv6 and HTTPS support also moved onto whichever server was facing the user.

MediaWiki sends HTTP headers (such as "cache-control: private" if something should not be cached, or an expiry time if something can be cached for a limited time) which are recognised by Squid. Squid sends "X-forwarded-for:" to indicate the IP of a user to a MediaWiki installation behind a Squid proxy. MediaWiki also sends HTTP PURGE requests to notify Squid that an item has changed and needs to be discarded.
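As a rough illustration of how a shared cache might interpret these headers, the Python sketch below parses a Cache-Control value and returns how long a proxy may reuse the response. The function names and the sample header values are illustrative assumptions, not code from MediaWiki or Squid. Real caches apply many more rules (Expires, Vary, cookies), and treating "no-cache" as "do not reuse" is a deliberate simplification.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Directives without an argument (e.g. "private") map to None.
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def shared_cache_ttl(value):
    """Return the number of seconds a shared cache may reuse a response.

    0 means "do not serve from cache without asking the origin".
    """
    d = parse_cache_control(value)
    # Responses marked private or no-store must never sit in a shared cache;
    # no-cache is treated the same way here for simplicity.
    if "private" in d or "no-store" in d or "no-cache" in d:
        return 0
    # s-maxage applies to shared caches and takes priority over max-age.
    for name in ("s-maxage", "max-age"):
        arg = d.get(name)
        if arg is not None and arg.isdigit():
            return int(arg)
    return 0
```

With this sketch, a header such as "private, must-revalidate, max-age=0" yields 0 (never reused for other visitors), while a header carrying "s-maxage=18000" yields 18000 seconds.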

As MediaWiki was designed to work with Squid (and Varnish can be configured to act in the same manner), the MediaWiki installation and the cache are tightly integrated.

CloudFlare in some cases provides a function which could serve a similar role, but implements it in a non-standard or incompatible manner. A MediaWiki installation could operate behind CloudFlare, but there are some configuration issues to be addressed.

Anonymous IP user identification

If a user connects directly to a MediaWiki installation (on Apache), the user's IP address is reported by PHP in $_SERVER['REMOTE_ADDR'] and no further configuration is required to get the information into Special:RecentChanges.

If a MediaWiki installation is behind Squid or Varnish, MediaWiki must be configured to use the "X-Forwarded-For:" header and a list of trusted proxies configured in $wgSquidServers or $wgSquidServersNoPurge.
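The trusted-proxy idea can be sketched in a few lines of Python. This is an illustration of the general technique, not MediaWiki's actual implementation; the function name client_ip and the sample addresses are assumptions. The sketch walks the X-Forwarded-For chain from the nearest hop outward and stops at the first address that is not a trusted proxy, so a forged entry added by the client is ignored.

```python
def client_ip(remote_addr, xff_header, trusted_proxies):
    """Return the address a request should be attributed to.

    remote_addr     -- address of the directly connected peer
    xff_header      -- raw X-Forwarded-For value ('' if absent)
    trusted_proxies -- set of proxy addresses whose XFF entries are believed
    """
    # XFF lists hops oldest (client) first; the connected peer is the last hop.
    hops = [h.strip() for h in xff_header.split(",") if h.strip()]
    hops.append(remote_addr)

    # Walk from the nearest hop outward; stop at the first untrusted address.
    ip = hops.pop()
    while hops and ip in trusted_proxies:
        ip = hops.pop()
    return ip
```

For a request arriving from a local Squid at 127.0.0.1 with "X-Forwarded-For: 198.51.100.4", the sketch returns 198.51.100.4. If the peer is not in the trusted set, the header is ignored and the peer's own address is used.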

CloudFlare's proposal to patch MW 1.18 (not recommended)

If a MediaWiki installation is behind CloudFlare, that company's servers will present the user's IP in a non-standard format with "CF-Connecting-IP:" and "CF-IPCountry:" alongside the standard "X-Forwarded-For:".

CloudFlare at one point recommended changing core MediaWiki code to use $_SERVER['HTTP_CF_CONNECTING_IP'] but their proposed patch is, by their own admission, only suitable for "MediaWiki versions 1.18.0 and older". This is obsolete.

Installing mod_cloudflare in Apache

This leaves mod_cloudflare as the simplest currently-available option. This module, downloadable pre-compiled for various distributions from CloudFlare's site, is installed directly into Apache (not MediaWiki). It restores the original user's IPv4 or IPv6 address if CloudFlare is connecting to Apache directly. (If the MediaWiki/Apache server is sitting behind Squid, and that in turn is behind CloudFlare, the IP addresses in Special:RecentChanges will break.)

A list of (originally) fourteen IPv4 and IPv6 ranges used by CloudFlare's servers was hard-coded in the module's source code. Cloudflare added additional IPv4 blocks to this list in late 2016 for new servers; sites with the outdated list still installed are now seeing CloudFlare's address in place of the user's address on a very intermittent basis.

Two recent RHEL/CentOS versions (6,7) are supported, as are a couple of Debian versions (7,8) and a few versions of Ubuntu (12,14,16). For these, Cloudflare distributes an .rpm or .deb file which installs mod_cloudflare as a precompiled binary. CloudFlare is no longer updating the RHEL/CentOS 5 .rpm's and the server list in these .rpm's is outdated.

Older OS versions may no longer be receiving updates to the mod_cloudflare packages; new major releases of distributions (such as CentOS 7, when it was initially released in 2014) are sometimes slow to be supported. If an earlier version of mod_cloudflare is installed into the Apache server bundled with a newer OS distribution, the web server will likely fail to start and throw errors.

Cloudflare has no plans to offer .deb packages for Ubuntu 18/Debian 9, the current versions (as of 2018). Affected sites will need to compile mod_cloudflare from source files.

If you're running a compatible distribution, but the package manager refuses to install mod_cloudflare with "incorrect GPG key", reinstall the key and remove the old key using (for .rpm distributions):

rpm --import http://pkg.cloudflare.com/pubkey.gpg
rpm -e gpg-pubkey-8e5f9a5d-*

Using the TrustedXFF MediaWiki extension

The other option is to use Extension:TrustedXFF and list every one of CloudFlare's IP addresses manually. The list is a long one and may change with time. See [1] and [2] for more information on recovering the original user address.

Unfortunately, TrustedXFF hard-codes its IPv6 trusted proxy addresses (instead of putting them in configuration files) and TrustedXFF/generate.php will refuse an IPv4 range above a certain size (an /18 or wider will fail).

The IPv6 issue can be avoided by having CloudFlare connect to your server in IPv4 (the user still sees either IPv4 or IPv6, as CloudFlare is a usable IPv6 transition mechanism). The restriction on large IPv4 ranges is a bit more problematic.

One could add them to TrustedXFF.body.php in much the same way as IPv6 ranges were kludged/hard-coded into this file.

Drop the list of CloudFlare's IPv4 addresses into the code alongside the existing IPv6 list at the beginning of the class definition:

        // FIXME: IPv4 ranges of /18 or wider won't compile with generate.php, so list them unexpanded
        // Each range must be a quoted string; a bare 199.27.128.0/21 is a PHP syntax error.
        static $ipv4Ranges = array(
        '199.27.128.0/21',
        '173.245.48.0/20',
        '103.21.244.0/22',
        '103.22.200.0/22',
        '103.31.4.0/22',
        '141.101.64.0/18',
        '108.162.192.0/18',
        '190.93.240.0/20',
        '188.114.96.0/20',
        '197.234.240.0/22',
        '198.41.128.0/17',
        '162.158.0.0/15',
        '104.16.0.0/12',
        '172.64.0.0/13',
        '131.0.72.0/22');

        // FIXME: IPv6 ranges need to be put here for now...

Scroll down to function isTrusted ($ip) and add a check for the hard-coded IPv4's ahead of the IPv6 check, as:

                // try IPv4 ranges
                foreach ( self::$ipv4Ranges as $range ) {
                        if ( IP::isInRange( $ip, $range ) ) {
                                return true;
                        }
                }

                // Try IPv6 ranges...

Not a clean solution, but (until a proper CloudFlare extension is created for MediaWiki) this will get valid IPs into Special:RecentChanges and keep the addresses of the CloudFlare servers out of the logs... at least until the next time CloudFlare adds to its list of servers.
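To check which addresses such a range list actually covers (for example, when CloudFlare's address starts appearing in the logs and the list is suspected to be outdated), the range test can be reproduced outside MediaWiki. The Python sketch below uses the standard ipaddress module. The names CLOUDFLARE_RANGES and is_trusted_proxy are hypothetical, and only a few of the ranges listed above are included.

```python
import ipaddress

# A few of the IPv4 ranges from the list above; the full list changes over time.
CLOUDFLARE_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "173.245.48.0/20",
        "141.101.64.0/18",
        "104.16.0.0/12",
        "172.64.0.0/13",
    )
]


def is_trusted_proxy(ip):
    """Return True if ip falls inside any of the listed CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in CLOUDFLARE_RANGES)
```

An address seen in Special:RecentChanges that returns False here but belongs to CloudFlare suggests the hard-coded list needs updating.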

Removing old varnish

Any existing local caching servers (such as Squid or Varnish) will need to be removed when using mod_cloudflare; otherwise, mod_cloudflare will see Squid's address (which it doesn't recognise as a trusted proxy) and pass the request unchanged, at which point MediaWiki replaces the address of Squid with the next address upstream... CloudFlare. You can also try adding Squid to the list of trusted proxies by using the CloudFlareRemoteIPTrustedProxy directive.

As MediaWiki typically blocks abuse by IP address, showing CloudFlare as the address in Special:RecentChanges will interfere with efforts to selectively block spam and vandalism.

Cache control

To prevent CloudFlare from caching certain parts of your site, it is possible to use Page Rules.

By default, MediaWiki sets certain HTTP headers (such as "cache-control: private" for logged-in users, or "expires:" for cacheable pages) to control whether its dynamically-generated content is cached and when it should expire. For purely static content, such as image files, the headers are generated by the web server.

CloudFlare may respect these, depending on how it has been configured.[3] From My Websites → settings → Page rules it is possible to create three rules per domain (more for paid users). Each matches a pattern (such as *example.org/wiki/* for all wiki pages on example.org and its subdomains, in standard view) and allows configuration of:

  • Custom caching - allows the cache control headers to be overridden. "Cache everything" is the most aggressive and may be OK for static image files but is certainly not wanted for anything else. Special:RecentChanges or other dynamic content should use the least aggressive settings.
  • Edge cache expire TTL - in "cache everything" mode, overrides the time after which CloudFlare requests a new copy of a page or file from the origin server. If used on anything but static content (images), this will cache things which shouldn't be cached as it can override cache control directives from the origin server.[4]
  • Browser cache expire TTL - indicates the time after which the user's browser should request a new copy of a page from CloudFlare

There's also an "Always Online" setting (which causes Cloudflare to serve cached copies of your pages if the origin server is down) and an "Origin Cache Control" setting (which tells Cloudflare to honour cache control headers sent by the origin server, such as the "no-cache" on pages served to logged-in users).

It's best to cache static image files aggressively ("cache everything" would be valid for images, were a means provided to purge them when a new version is uploaded). Ordinary wiki pages should use settings which respect cache-control headers ("standard, origin cache control on" is OK, "cache everything" is problematic) and special: pages should not be cached or cached only briefly (so that they remain current).

For example, these settings:

  1. /load.php?*
    Always Online: On, Cache Level: Cache Everything, Browser Cache TTL: a year, Edge Cache TTL: a month
  2. /wiki/*:*
    Cache Level: Bypass
  3. /wiki/*
    Always Online: On, Cache Level: Standard, Origin Cache Control: On

would cause:

  1. load.php to always be cached (which could easily reduce load on the origin server by 25-50%, as load.php serves CSS/JS which is constantly loaded on every page). Default settings are overridden to aggressively keep these pages in cache for the maximum amount of time.
  2. wiki pages outside mainspace (such as Special: Talk: Project:) to never be cached, so that Special:RecentChanges and the like show current data
  3. lastly, wiki pages in mainspace (which weren't matched by the previous rule) are cached as directed by the origin server (MediaWiki gives them a long expiry time with a "must-revalidate" status; pages served to logged-in users or displaying "you have new messages" banners must honour "no-cache" so they won't be served to subsequent visitors).

These settings presume that the standard/default behaviour for Cloudflare will cache obvious static files (.jpg, .png, .gif) but that Cloudflare needs to be explicitly told to cache output from load.php (which it otherwise would have mistaken for highly dynamic, variable content).

They are fairly aggressive, and may exhibit a couple of flaws:

  • Logged-in users may still be served cached mainspace pages in normal view as if they were logged out. While Cloudflare mentions a Bypass Cache on Cookie setting which could detect MediaWiki's login cookies and serve a fresh page to all logged-in users, that setting is only available on the most expensive paid plans, which (at hundreds of dollars per domain per month) are not a reasonable alternative for most small sites.
  • Uploads which replace an image by overwriting with another of the same name will not be detected; Cloudflare presumes this to be static content and caches aggressively. The standard MediaWiki+Squid/Varnish setup allows MediaWiki to send a HTTP PURGE message to discard the outdated content; no equivalent extension currently exists to send a purge message to Cloudflare's proprietary API.

Effectively, it's possible to reduce load on origin servers by up to 75% by caching content with Cloudflare, but this comes with a risk of serving outdated pages or images.
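The ordering of the three example rules matters, because a request is handled by the first rule whose pattern matches. The Python sketch below models that behaviour for the path portion of the URL only. It assumes first-match-wins evaluation and that "*" is the only wildcard (so the "?" in /load.php?* is a literal character). The names PAGE_RULES and match_rule, and the settings keys, are hypothetical and do not correspond to any Cloudflare API.

```python
import re

# (pattern, settings) pairs in priority order, mirroring the example rules.
PAGE_RULES = [
    ("/load.php?*", {"cache_level": "cache_everything"}),
    ("/wiki/*:*", {"cache_level": "bypass"}),
    ("/wiki/*", {"cache_level": "standard", "origin_cache_control": True}),
]


def _to_regex(pattern):
    # Only '*' is a wildcard; every other character (including '?') is literal.
    return re.compile(".*".join(re.escape(piece) for piece in pattern.split("*")))


def match_rule(path_and_query):
    """Return the first (pattern, settings) pair that matches, or None."""
    for pattern, settings in PAGE_RULES:
        if _to_regex(pattern).fullmatch(path_and_query):
            return pattern, settings
    return None
```

Under these assumptions /wiki/Special:RecentChanges is caught by the second rule and bypasses the cache, while /wiki/Main_Page falls through to the third. Note that any mainspace title containing a colon also matches /wiki/*:* and would bypass the cache as well.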

HTTP purge

When new content is uploaded to a MediaWiki, the wiki software will request that each cache server (as configured in $wgSquidServers) discard the outdated version of the page or image. This notification is not sent to third-party web proxies (such as those listed in Extension:TrustedXFF) and appears to be hard-coded in SquidPurgeClient.php in core MediaWiki code with no hooks to allow an extension to change this behaviour.

The message, as sent by MediaWiki, looks like: PURGE http://wiki.example.org/images/0/01/Some_image_recently_replaced.jpg

Squid will honour this request if it comes from a trusted IP address. CloudFlare likely will not.

There is a means to manually request one outdated file be discarded (from 'My Websites' settings for one domain → CloudFlare Settings → Cache Purge, click "purge single file" and enter URL of file to purge) but this is cumbersome and wildcards are not supported.[5]

CloudFlare's API provides a function which can purge an individual file,[6] but there is currently nothing available to invoke this API automatically from MediaWiki. As a result, a wiki behind CloudFlare will continue to display outdated versions of content until the data expires from cache normally. See bugzilla:62356.
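For sites willing to script the purge themselves, a stand-alone call to the API might look like the Python sketch below. It targets the purge_cache endpoint of Cloudflare's version 4 API, which postdates the procedure described above, so verify the endpoint and authentication scheme against Cloudflare's current documentation before relying on it. The function name build_purge_request is hypothetical, and the zone ID and API token are placeholders the site owner must supply. The sketch only builds the request; it is not a MediaWiki extension and is not invoked automatically on upload.

```python
import json
import urllib.request


def build_purge_request(zone_id, api_token, urls):
    """Build (but do not send) a request asking Cloudflare to purge the given URLs."""
    body = json.dumps({"files": list(urls)}).encode("utf-8")
    return urllib.request.Request(
        url=f"https://api.cloudflare.com/client/v4/zones/{zone_id}/purge_cache",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to urllib.request.urlopen() would send the purge. A wiki operator could run such a script by hand after replacing an image, in place of the dashboard's "purge single file" form.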

CloudFlare does not offer a direct equivalent to Squid's negative_ttl parameter, which controls how long an error code returned for a URL from an origin server remains in cache (negative caching) until the request is retried. If CloudFlare does "cache everything" with a long time-to-live, it may be necessary to manually purge URLs where CloudFlare has stored an error message but the error has since been corrected upstream.

Mobile pages

Extension:MobileFrontend provides a configuration in which mobile browsers are autodetected. This can play havoc with external caching schemes; while Varnish may be configured to auto-detect a mobile browser and serve a different version of the page, the CloudFlare cache may be blissfully unaware that this is happening, or even serve the wrong version. It's best to turn off auto-detection and serve the mobile version from a different subdomain (e.g. mobile.www.example.org for the mobile version, if the desktop site is www.example.org).

To turn off MediaWiki's mobile browser auto-detection, try:

$wgServer = "//www.example.org";
$wgCanonicalServer = "https://www.example.org";
$wgMobileUrlTemplate = "mobile.www.example.org";
$wgMFAutodetectMobileView = false;

This puts the mobile and desktop versions on separate subdomains.

Cloudflare can then be configured to provide mobile browser autodetection on their servers. Unfortunately, Cloudflare only provides this autodetection for the base domain (example.org) and the www. subdomain (www.example.org); any handheld mobile devices which visit these two portions of the site will be automatically redirected to some other subdomain (such as mobile.www.example.org) of the same domain. Any other subdomains (such as wiki.example.org or en.wiki.example.org) can be given mobile versions (m.wiki.example.org or m.en.wiki.example.org) but mobile browsers will not be autodetected by Cloudflare on anything but the base domain or the www subdomain. Visitors to en.wiki.example.org will instead need to click on the "mobile version" or "desktop version" links at the bottom of each webpage to select the desired version (an awkward limitation if you're serving multiple language wikis, or an entire wiki family, as subdomains of one main domain).

From the performance ("speed") tab on Cloudflare's web dashboard for each individual site, go to "Mobile Redirect", select a subdomain which contains the mobile version (for example, mobile.www.example.org), select "keep path" and turn the feature "on".

In the DNS and the Apache webserver configurations, the desktop and mobile sites point to the same MediaWiki installation; MediaWiki then examines the URL to determine whether Cloudflare is requesting the mobile version.

If MediaWiki doesn't correctly force the $wgMobileUrlTemplate subdomain into always-mobile mode, modifying ./extensions/MobileFrontend/includes/MobileContext.php to insert this additional check at the beginning of the usingMobileDomain() function may help:

        public function usingMobileDomain() {
                if (isset($_SERVER['SERVER_NAME']))
                   if ((substr($_SERVER['SERVER_NAME'],0,2)=='m.') || (substr($_SERVER['SERVER_NAME'],0,7)=='mobile.'))
                        return true;

...with the rest of the code unchanged. This should only be needed if traffic is reaching your m. or mobile. subdomain from clients which aren't being detected as mobile browsers.

Error 404 handling

By default, CloudFlare used to enable a "feature" which it brands as SmartErrors. This replaced the originating site's 'error 404' pages with a CloudFlare search page which listed other, related pages which do exist on the same site and allowed the user to search the web. This "smart error" page, which displayed in US English instead of the site's local language, may have contained advertising or directed users to obscure, little-known external web search engines.

MediaWiki, when given a request for a page which doesn't exist (such as https://en.wikipedia.org/wiki/Like_this_one), will display a custom error page. This page returns the 404 status code but invites the user to create the page (or indicates that the page was deleted, protected, or "no such special page"). MediaWiki sites need this page left as-is so that clicking on a red link invites a user to create a new article. SmartErrors on every 404 page broke this.

Before 2017, SmartErrors could be disabled for an entire domain on initial import of the domain to CloudFlare or deactivated later on a per-domain basis from the list of domains by clicking 'Apps', scrolling down to "SmartErrors" and turning the app 'off'. They could also be turned off for specific paths using "page rules" in the settings for each domain. As of 2018, SmartErrors appears to have been removed from Cloudflare, resolving this issue.

Image hotlinking

Many web-based forums are infamous for encouraging users to link directly to images hosted on other sites. Their site appears to be displaying the image, but is using bandwidth that the 'hot-linked' target site's operators must pay for. A common defence is for Apache webmasters to deploy mod_rewrite to look at the "referer:" (sic) line of each request and reject any that hotlink images (.jpg, .png, .gif et al.) for use on pages on some other site.
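The referer test itself is simple, wherever it ends up being enforced. The Python sketch below shows the decision logic only; it is not Apache or CloudFlare configuration, and the names is_hotlink, IMAGE_SUFFIXES and the sample hostnames are assumptions made for illustration. Requests with no referer are allowed, since many browsers and privacy tools omit the header.

```python
from urllib.parse import urlsplit

IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png", ".gif")


def is_hotlink(path, referer, own_hosts):
    """Return True if an image request was referred from a foreign site."""
    # Only image requests are subject to the check.
    if not path.lower().endswith(IMAGE_SUFFIXES):
        return False
    # A missing referer is not treated as hotlinking.
    if not referer:
        return False
    host = urlsplit(referer).hostname or ""
    return host not in own_hosts
```

For example, a request for /images/a.png referred from http://forum.example.net/thread/1 is flagged when own_hosts contains only wiki.example.org, while the same request referred from the wiki's own pages is not.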

With any form of caching server, the origin servers are no longer user-facing. Any anti-hotlinking code needs to be removed from the origin webserver (such as Apache); anti-hotlink protection is available on CloudFlare if needed.

HTML document comments

MediaWiki will normally add comments such as <!-- Served by wonky-server.example.org in 666 seconds --> to HTML output for troubleshooting purposes; if a site has multiple Apache servers delivering the same content, this is invaluable in determining which server is causing a specific issue. CloudFlare is prone to "optimising" delivered webpages by stripping these comments, a function which may need to be turned off at times for debugging purposes.

HTTPS

CloudFlare may be used as a transition mechanism to convert between (insecure) http: and (secure) https: – while this does not provide end-to-end encryption (as it leaves CloudFlare's server in a "man-in-the-middle as a service" position) it's better than no https: at all.

All versions of MediaWiki from 1.18 onward support protocol-relative URLs, so that replacing a link like http://wiki.example.org with merely //wiki.example.org when loading images and resources will work properly in both HTTP and HTTPS. This change is not specific to CloudFlare, but affects any MediaWiki site available in both HTTP and HTTPS.

IPv6

All current MediaWiki versions are able to log an IPv6 user address to Special:RecentChanges if one is provided. No change to MediaWiki's LocalSettings.php is required, although MediaWiki (and any extensions which rely on user IPs, such as CheckUser) needs to be a currently-supported version, as some older (pre-MW 1.19) revisions were buggy in logging IPv6 user addresses to Special:RecentChanges.

IPv6 support is enabled on whichever server is facing the user; Apache for a stand-alone install, Squid/Varnish for sites using these as a reverse proxy, CloudFlare's server for sites where CloudFlare is user-facing. There is no requirement that the communication from the user-facing server back to Apache support both IPv4 and IPv6 and no benefit to enabling both for any back-end link.

My websites → settings → CloudFlare settings → Automatic IPv6 may be set to "full" for each domain to enable IPv6 for your site.

No other configuration is required for IPv6.

See also