Extension:WebCache/Security security settings

Security Overview
The WebCache extension operates in one of two distinct modes. The default mode is to link to an external cache and rely on that site for all administration and updates (spidering) of cached web pages. This mode is the default because it presents far fewer risks to both the server(s) and the users. This article deals with a second mode in which a cache is maintained locally on the wiki server. This mode presents more risk and should only be implemented if the risks are well understood and appropriate diligence is applied in the configuration and administration of the Wiki server.

Extension Design Notes
First, a bit of background. This extension was initially designed for kiosk mode and primarily for mobile kiosk mode. As such, the core wiki server was always available, but general Internet access was variable. Sometimes the mobile kiosk would be set up at a site with full Internet access and could link directly to any page available on the Internet. Other times the mobile kiosk would be set up at a site with no Internet access and the desire was to provide a similar, if not identical, user experience. It was for these disconnected scenarios that the local cache was implemented. For a standard Internet server, which will always be accessed over the public Internet, a local cache on the web server would be unnecessary, as many well-functioning web caches exist and serve this need better than this extension. In that scenario the goal of a web cache would be protection against link rot, which is better handled by dedicated software.

To continue with the background, the mobile kiosk has tight controls on updates. A small group of authors has update privileges to the articles. All authors are educated on the risks that this cache extension's local mode introduces into a common web link on an article. This education serves as a kind of informal supplement to the white list and black list, which are fully described below. A local cache would be wholly inappropriate for a publicly editable web site like Wikipedia. Some controls must exist on who can edit or contribute an article. So two requirements exist for this extension's local cache option to be contemplated: first, an uncertain connection to the Internet and, second, a controlled and disciplined set of authors.

The first implementation detail protects the server. While the files are stored locally on the server, they are never executed. This extension uses file copy APIs and APIs that parse files as text. As such, the directories storing the cache do not require, and thus should not have, the execute bit set. This is designed to protect the server from a complex, multi-pronged attack in which a web page contains some kind of malware that could be downloaded and then, by a separate attack, caused to run on the server. If execution is not enabled, hosting the file on the server presumably carries lower risk.
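As an illustration only, the "never executed" rule can be audited from outside the extension. The sketch below is not part of WebCache; the function name `find_executable_files` and the idea of a periodic audit are assumptions. Because the execute bit on a POSIX directory governs traversal rather than execution, the sketch checks the regular files inside the cache tree instead of the directories themselves.

```python
import os
import stat


def find_executable_files(cache_root):
    """Return sorted paths of regular files under cache_root that carry any execute bit."""
    exec_bits = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    flagged = []
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # lstat so that symbolic links are inspected, not followed.
            mode = os.lstat(path).st_mode
            # Only regular files are reported; links and special files are skipped.
            if stat.S_ISREG(mode) and mode & exec_bits:
                flagged.append(path)
    return sorted(flagged)
```

An administrator could run such a check against the cache directory after each spidering pass and treat any reported path as a configuration error.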

Another design detail intended to protect the server is that all cached files resembling HTML files are stored using the extension .htm. This is designed to prevent later processing of the file by the server. This extension will merely read any file and write it to the browser with no parsing. By reducing the parsing on the server, the risk of unintentional code execution is reduced. So ASP files will be stored as HTML files and not further processed. On the other hand, embedded JavaScript or PHP will be stored with its original file extension, so some exposure remains.

The next design detail is intended to protect both the server and the users. This cache extension provides a web site white list and black list. Further, file types are also strictly managed in a white list manner, with each allowed extension needing to be listed explicitly. Both these options are described in detail below. Again, a Wiki administrator should give great attention to both these lists before enabling a local cache.

The next design detail is intended to protect the users. There is nothing in MediaWiki, nor in this extension's implementation, that requires elevated browser privileges. A web site using this software therefore does not need to be in a higher security zone than any other server on the Internet. With the browser extending no elevated trust to the server, the risk to the browser is lower. If your web server hosts sites beyond the wiki, it would be worthwhile to evaluate deployment options such as port changes or DNS based options to isolate the cache on the wiki from the section of the web site requiring elevated privileges. Again, a Wiki administrator should evaluate browser security requirements before enabling a local cache.

The final design detail is intended to give administrators the maximum level of control over their implementation through easy extension. While not fully implemented in the beta version, several simple extension points exist to allow administrators to extend the operation of this extension. Options include integration with external web security software or more advanced text parsing for security threats. All of this is in addition to integration with local web caches managed by a more robust web cache solution.

This extension can introduce security risks if proper procedures and configuration are not followed. The relevant options are easily configurable and follow rather common sense rules. Enabling this cache will impose additional burdens on article authors, so appropriate controls and education are needed. For the small niche of Wikis where this extension's local cache option is relevant, such controls easily mitigate the security risks.

Site Settings
When enabling local caching, the crucial feature for a Wiki administrator to set is the blacklist. This is a list of servers that the Wiki will not cache under any circumstance. Any sites the administrator does not want cached should appear on this list.

The list is configured using:. This is an array of specific hosts. Note that blacklisting a parent domain (e.g. mediawiki.org) does not currently prevent hosts under that domain (e.g. www.mediawiki.org) from being cached. While this is a desired improvement to this extension, for now each individual host must be listed to be excluded. There are commercial services that can provide such lists. A small amount of PHP in your LocalSettings.php would allow this list to be read from an external file. Extension points in this extension could be provided to simplify integration with external solutions and are a desired area of enhancement.

While not enabled by default, the local cache mode of this extension can also require a site be on a white list (i.e. an approved list of sites from which caching can occur). This is the more secure mode of operation as it requires you know, evaluate, and permit individual servers. The first step to enabling this mode is to set  to   in your LocalSettings.php file. Once enabled, the local cache subsystem of this extension will only allow hosts listed on the array of  in your LocalSettings.php to be cached locally on your Wiki server. As with black lists, the white list operates on a server by server basis and does not support top level domain white listing. Again, this could be a desired feature in future releases. A white list is appropriate for the kiosk mode goals described above if specific goals and administrative processes allow for this level of specificity in linking to pages external to the Wiki.

A corner case for site lists is where a host is both black listed and white listed. In this case, the black list wins: no site on the black list will ever be cached locally on the Wiki server. Resolving such a conflict should be an easy configuration change in LocalSettings.php.
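The site list rules described in this section can be summarised in a short sketch. This is an illustration in Python, not the extension's PHP code; the function name `may_cache_host` and its parameters are hypothetical and do not correspond to actual WebCache settings. It assumes exact, case-insensitive host matching, with the black list always taking precedence.

```python
from urllib.parse import urlsplit


def may_cache_host(url, blacklist, whitelist=(), require_whitelist=False):
    """Decide whether the host of url may be cached locally (illustrative rules only)."""
    host = (urlsplit(url).hostname or "").lower()
    if not host:
        return False  # no recognisable host, nothing to cache
    blocked = {h.lower() for h in blacklist}
    allowed = {h.lower() for h in whitelist}
    # The black list always wins, even if the host is also white listed.
    if host in blocked:
        return False
    # In white list mode, only explicitly listed hosts are accepted.
    if require_whitelist:
        return host in allowed
    return True


# Hosts are matched exactly, so a parent domain does not cover its subdomains.
print(may_cache_host("http://mediawiki.org/wiki/X", ["mediawiki.org"]))
print(may_cache_host("http://www.mediawiki.org/wiki/X", ["mediawiki.org"]))
```

The first call is refused and the second is accepted, which mirrors the limitation that each individual host must be listed.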

Protocol Type Restrictions
While not specifically a security consideration, this extension supports only http and https links. This reduces some exposures, such as a spambot exploit that could use mailto: links. Generally, http based protocols are some of the best defended on most networks, whereas transfers via ftp tend to be more deliberate and protocols such as gopher are old enough that many networks no longer defend against them. This could be changed in a future release, and administrative configurability would be required to expand the supported protocols.
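A minimal sketch of such a protocol check follows. It is written in Python for illustration and is not the extension's own code; the names `SUPPORTED_SCHEMES` and `is_supported_protocol` are assumptions.

```python
from urllib.parse import urlsplit

# Only these two schemes are accepted, matching the restriction described above.
SUPPORTED_SCHEMES = frozenset({"http", "https"})


def is_supported_protocol(url):
    """Return True only for http and https links."""
    return urlsplit(url).scheme.lower() in SUPPORTED_SCHEMES
```

Under this check, links such as `mailto:someone@example.org` or `ftp://example.org/file` would simply be skipped by the cache.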

File Type Settings
Likely the most difficult decisions to make will be related to file types allowed to be downloaded. The local cache part of this extension basically requires a white list. The first level of decision made is based on the link. Since a link can go to any number of file types, the local cache system starts by recognizing one of two categories: a direct file copy, for instance, an image file that will just be copied to be considered cached, or an HTML like file copy, in which stylesheets, JavaScript, images or other supporting files could be included. Files not meeting either of these two categories are not copied.

The first category of direct copy files includes those files that will not be parsed for additional links or supporting files and will be copied directly to the cache. The file extensions included by default are:

This array can be replaced or modified in LocalSettings.php. As a default, it is intentionally very conservative (although some could debate PDF's risk level). Many other types of documents could be allowed here including word processing documents like .doc or spreadsheets like .xls. This list is easily configurable and each administrator can easily customize to fit their risk assessment and usability needs.

The next category of files is those deemed to be HTML like. This decision is based exclusively on the file extension of the specific URL linked to from the Wiki. Files placed in this category (regardless of their original extension) are saved as .htm files to prevent later processing by the Wiki server. Note that URLs with no extension (presumed to be the URL of a folder where an index.html type file is present) are allowed.

Any link possessing one of these extensions will be handed off to HTML DOM for processing. This array of file types can be replaced or amended in LocalSettings.php. Note that file extensions do not include the period (or full stop) that typically precedes them. It is just the extension as evaluated by PHP's pathinfo.
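The two-category decision described in this section might look roughly like the following Python sketch. It is illustrative only: the extension lists below are hypothetical placeholders rather than the extension's real defaults, and `classify_link` is an assumed name. As in the text, extensions are compared without the leading period, and an empty extension is treated as HTML like.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical placeholder lists; the real defaults are set by the extension.
DIRECT_COPY_EXTENSIONS = {"png", "jpg", "gif", "pdf"}
HTML_LIKE_EXTENSIONS = {"", "htm", "html", "asp"}


def classify_link(url):
    """Return (category, stored_extension), or (None, None) if the link is not cached."""
    path = urlsplit(url).path  # the query string and fragment are ignored
    ext = posixpath.splitext(path)[1].lstrip(".").lower()
    if ext in HTML_LIKE_EXTENSIONS:
        # HTML-like files are always stored as .htm, whatever their original extension.
        return ("html", "htm")
    if ext in DIRECT_COPY_EXTENSIONS:
        # Direct copy files keep their extension and are not parsed.
        return ("direct", ext)
    return (None, None)
```

With these placeholder lists, a link to `page.asp` would be stored as an .htm file, a link to `report.PDF` would be copied as-is, and a link to `tool.exe` would not be cached at all.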

The final category of files that will be cached by the local cache option of this extension is those somehow linked to by an HTML page. The caching algorithm downloads three types of files, all based on the operation of the DOM. The first is those linked to by IMG tags. The second is those linked to by. The third is those linked to by  tags. Note that none of the files downloaded by these three types of links are parsed in any way. They are just copied. This is a security measure, but does mean that some pages using all features of these formats will not display correctly when served from the cache. One scenario would be where a stylesheet includes a link to an image. The image would not be downloaded and, therefore, not served or displayed.
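To make the behaviour concrete, here is a hedged Python sketch of collecting supporting files from a page. It is not the extension's implementation; the class and function names are hypothetical. Only the IMG case is shown by default, and other tags would be supplied through the `tag_attrs` mapping. URLs are collected but the referenced files are never parsed, so an image referenced only from CSS is missed, as described above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SupportingFileCollector(HTMLParser):
    """Collect absolute URLs from the chosen attribute of the chosen tags."""

    def __init__(self, base_url, tag_attrs):
        super().__init__()
        self.base_url = base_url
        self.tag_attrs = tag_attrs  # maps lowercase tag name to attribute name
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = self.tag_attrs.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                url = urljoin(self.base_url, value)
                if url not in self.urls:  # keep first occurrence, skip duplicates
                    self.urls.append(url)


def collect_supporting_urls(html, base_url, tag_attrs=None):
    """Return supporting file URLs found in html, resolved against base_url."""
    collector = SupportingFileCollector(base_url, tag_attrs or {"img": "src"})
    collector.feed(html)
    collector.close()
    return collector.urls
```

Each returned URL would then still have to pass the site, protocol and file type checks before being copied into the cache.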