Manual:File cache

From MediaWiki.org

MediaWiki has an optional simplistic scheme for caching the rendered HTML of article pages.

Operation and usage

File cache is enabled by setting three variables in LocalSettings.php:

$wgUseFileCache = true;
$wgFileCacheDirectory = "$IP/cache";
$wgShowIPinHeader = false;

This causes the rendered HTML webpage for each page of the wiki to be stored in an individual file on the hard disc. Any subsequent requests from anonymous users are met not by rendering the page again, but by simply sending the stored HTML version which is on disc. This saves time.

The generated HTML web text is stored to disc in directories under the $wgFileCacheDirectory and looks somewhat like:

-rw-r--r-- 1 apache apache 57421 Jan 7 01:58 cache/0/04/P%C3%A1gina_principal.html
-rw-r--r-- 1 apache apache 29608 Jan 4 16:37 cache/1/17/P%C3%A1gina_Riscada.html
-rw-r--r-- 1 apache apache 21592 Jan 3 07:27 cache/1/1c/P%C3%A1gina_Duplicada.html
-rw-r--r-- 1 apache apache 36088 Jan 7 02:03 cache/2/24/P%C3%A1gina_principal_alternativa.html
-rw-r--r-- 1 apache apache 44205 Jan 7 06:10 cache/a/a4/P%C3%A1gina_linkada.html
-rw-r--r-- 1 apache apache 24686 Jan 3 07:27 cache/d/db/P%C3%A1gina_Invertida.html
-rw-r--r-- 1 apache apache 17222 Jan 3 06:28 cache/f/f0/P%C3%A1gina_n%C3%A3o_encontrada.html

(In this example, the file cache directory is cache/ and the wiki's pages are named Página ..., so each has a corresponding Página_....html file. One file is stored for each page on the wiki, excluding Special: pages and redirects.)
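
The two-level directory layout in the listing is consistent with hashing the page title and using the first one and two hex digits of the hash as directory names. A minimal sketch of that mapping follows. It assumes an MD5 hash of the title in database-key form; the function name fileCachePath is hypothetical, and the exact hashing and escaping used by a given MediaWiki version may differ.

```php
<?php
// Hypothetical helper: maps a page title (database-key form, with
// underscores instead of spaces) to a cache file path.
// Assumption: the directory names come from an MD5 hash of the title.
function fileCachePath( $cacheDir, $dbKey, $gzip = false ) {
    $hash = md5( $dbKey );
    $level1 = substr( $hash, 0, 1 );   // first hex digit of the hash
    $level2 = substr( $hash, 0, 2 );   // first two hex digits of the hash
    // URL-encode so that non-ASCII titles become safe file names.
    $name = urlencode( $dbKey );
    $path = "$cacheDir/$level1/$level2/$name.html";
    return $gzip ? "$path.gz" : $path;
}
```

Under these assumptions, fileCachePath( 'cache', 'Página_principal' ) yields a path of the form cache/x/xy/P%C3%A1gina_principal.html, where x and xy are taken from the hash.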

Enabling file cache does not affect logged-in users.

File cache tends to cache aggressively; there is no set expiry date for the cached pages and pages are cached unconditionally even if they contain variables, extensions and other changeable output.

In many cases, use of an external cache such as Squid is preferable to enabling file cache.

Domain and range

Caching is only done for users who:

  • are not logged in.
  • do not have their user_newtalk flag active.

This covers the vast majority of requests to the wiki!

Caching is only done for pages which:

  • are not special pages.
  • are not redirects.
  • are being viewed in current version, plain view, no diffs.

Validation

The modification time of the cached file is compared with the page_touched field of the page's row in the page table, and with the global $wgCacheEpoch timestamp set in LocalSettings.php.

If the file is at least as new as both of these, it is considered valid and is sent directly to the client. If it is older or does not exist, parsing and rendering continues and the results are saved for future use.

(The page_touched column was named cur.cur_touched in MediaWiki 1.4 and earlier.)
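
The validity rule can be sketched as a single comparison. This is an illustration only, not MediaWiki's actual code: the function name isCacheFileValid is hypothetical, and both timestamps are taken as Unix timestamps for simplicity, whereas MediaWiki stores them as 14-digit YYYYMMDDHHMMSS strings.

```php
<?php
// Hypothetical helper: returns true if the cached file may be sent as is.
// $pageTouched and $cacheEpoch are Unix timestamps in this sketch.
function isCacheFileValid( $file, $pageTouched, $cacheEpoch ) {
    if ( !file_exists( $file ) ) {
        return false; // nothing cached yet, so the page must be rendered
    }
    $cachedAt = filemtime( $file );
    // Valid only if at least as new as both the page and the global epoch.
    return $cachedAt >= $pageTouched && $cachedAt >= $cacheEpoch;
}
```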

Invalidation

The entire cache can be invalidated by setting $wgCacheEpoch to the current time, or of course one could delete all files in the cache.
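
For example, a line such as the following in LocalSettings.php marks every file cached before the given moment as stale. The timestamp is an arbitrary illustration, written in MediaWiki's 14-digit YYYYMMDDHHMMSS format and assumed here to be in UTC.

```php
# Treat every cache file written before 7 January 2008, 12:00 as invalid.
# (Illustrative value only; use the current time to invalidate everything.)
$wgCacheEpoch = '20080107120000';
```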

Individual pages are invalidated by updating their page_touched fields (cur_touched in MediaWiki 1.4 and earlier). This should be done on article creation, edit saves, renames, and creation and deletion of linked articles (in order to update edit links).

Some cases are not yet handled properly, which probably includes:

  • creation/deletion of talk pages
  • updating of images
  • templates and redlinks
  • browser 'reload' or 'refresh' (just reloads same cached page without updates)
  • output variables from extensions such as Extension:DynamicPageList or Extension:RandomSelection will not change on browser refresh
  • certain error messages, such as those shown when the database connection is lost during a query, are cached indefinitely as if they were valid page content; only ?action=purge will remove these from the file cache once stored.
  • pages where variables in the URL itself are passed to extensions will not work correctly, for instance http://example.org/wiki/DPLforum:whatever?offset=30
  • possibly other cases not yet identified

Items which are paginated (such as Category: pages if a category has more than 200 members) also display incorrectly. The category displays with (prev 200) (next 200) links but those links do not return additional category entries - they just return the page already being viewed.

Expiration

There should probably be some method of expiration of cache pages, particularly for pages containing variables (it is X date, we have X articles, etc).

In its present form, file cache in all MediaWiki versions (through to 1.12) ignores flags set by extensions to mark content as non-cacheable. There is no provision to set an expiry time, so all HTML for all pages is cached forever. An explicit ?action=purge command (or an edit to the page) will regenerate that one page, but neither the MediaWiki internal no-cache flags nor the browser refresh will remove outdated extension output once it has been stored as part of a file-cached page.

Refresh tab

It is possible to force one individual page to be invalidated and refreshed by using ?action=purge

Adding this code fragment to LocalSettings.php (or to a file included from there) will add a "refresh" tab to each page. Note that the page MediaWiki:Refresh must exist; its content is the tab label, i.e. the word for "refresh" in your wiki's local language:

#
# add page-refresh tab
#
$wgHooks['SkinTemplateContentActions'][] = 'wfContentRefreshHook';
 
function wfContentRefreshHook( &$content_actions ) {
    global $wgTitle;

    // Special pages are never file-cached, so they get no refresh tab.
    if ( $wgTitle->getNamespace() != NS_SPECIAL ) {
        $content_actions['purge'] = array(
            'class' => false,
            'text' => wfMsg( 'refresh' ),
            'href' => $wgTitle->getLocalUrl( 'action=purge' )
        );
    }
    return true;
}

Compression

Optionally, the cache may be compressed to save space and bandwidth. (This requires that zlib be enabled in the PHP config.)

If compression is enabled, the cache files are saved as .html.gz. Browsers that advertise support for gzip in their Accept-Encoding field will be given the gzipped version straight; for those browsers that don't, we unzip the data on the fly and send them the plaintext.

A "Vary: User-Agent" header will be sent to tell proxy caches to be more careful about who they resend data to. ("Vary: Accept-Encoding" would be more appropriate, but Internet Explorer refuses to cache pages so marked.)
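
The choice between sending the gzipped bytes straight and decompressing on the fly can be sketched as follows. This is a simplified illustration, not MediaWiki's actual code: the function name buildCachedResponse is hypothetical, the Accept-Encoding check is deliberately naive (it ignores quality values such as gzip;q=0), and gzdecode() requires PHP with zlib enabled.

```php
<?php
// Hypothetical helper: given the contents of a .html.gz cache file and the
// client's Accept-Encoding header, returns array( $headers, $body ).
function buildCachedResponse( $gzData, $acceptEncoding ) {
    $headers = array( 'Vary: User-Agent' );
    if ( preg_match( '/\bgzip\b/i', $acceptEncoding ) ) {
        // Client advertises gzip: pass the stored bytes through untouched.
        $headers[] = 'Content-Encoding: gzip';
        return array( $headers, $gzData );
    }
    // Otherwise decompress on the fly and send plain HTML.
    return array( $headers, gzdecode( $gzData ) );
}
```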

Emergency fallback

If the wiki can't contact the database server, it will try to show the cached version of whatever page was requested, regardless of whether it is current, with a "database is down" message tacked onto it.

This has some limitations:

  • special pages are not covered in any way; there's just a warning message
  • redirect pages are not cached, so clicking a link to a redirect doesn't go through to the final destination
  • attempts to use non-view actions result in a plain page view, which may be confusing
  • there may be issues with the MySQL connection timeout which make it take prohibitively long before giving up, particularly if using persistent connections and the db dies later.
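
The fallback behaviour described in this section can be sketched roughly as below. All names are hypothetical: $connect stands for any callable that throws when the database is unreachable, $render for normal page rendering, and the notice text is a placeholder rather than MediaWiki's actual message.

```php
<?php
// Hypothetical sketch of the emergency fallback, not MediaWiki's own code.
function servePage( $connect, $render, $cacheFile ) {
    $notice = "<p>database is down</p>\n";
    try {
        $connect();          // throws if the database cannot be reached
        return $render();    // normal path: render the page
    } catch ( Exception $e ) {
        if ( is_readable( $cacheFile ) ) {
            // Serve the stored copy without any freshness check.
            return $notice . file_get_contents( $cacheFile );
        }
        return $notice;      // nothing cached: only the warning is shown
    }
}
```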

