memcached

From MediaWiki.org
Jump to: navigation, search

memcached is a memory-based object store, developed originally to lighten the load on LiveJournal's database servers. memcached has been supported in MediaWiki since pre v1.1.

Memcached is likely more trouble than a small site will need (for a single server, consider storing both code and data in APC - CACHE_ACCEL), but for a larger site with heavy load, like Wikipedia, it should help lighten the load on the database servers by caching data and objects in memory.

Requirements[edit | edit source]

Memcached and libevent are under BSD-style licenses.

The server should run on Linux and other Unix-like systems. You can run multiple servers on one machine or on multiple machines on a network; storage can be distributed across multiple servers, and multiple web servers can use the same cache cluster.

Security[edit | edit source]

Memcached has no security or authentication. Please ensure that your server is appropriately firewalled, and that the port(s) used for memcached servers are not publicly accessible. Otherwise, anyone on the internet can put data into and read data from your cache.

An attacker familiar with MediaWiki internals could use this to give themselves developer access and delete all data from the wiki's database, as well as getting all users' password hashes and e-mail addresses.

Setup[edit | edit source]

If you want to start small, just run one memcached on your web server:

 memcached -d -l 127.0.0.1 -p 11211 -m 64

(to run in daemon mode, accessible only via loopback interface, on port 11211, using up to 64MB of memory)

In your LocalSettings.php file, set:

$wgMainCacheType = CACHE_MEMCACHED;
$wgParserCacheType = CACHE_MEMCACHED; # optional
$wgMessageCacheType = CACHE_MEMCACHED; # optional
$wgMemCachedServers = array( "127.0.0.1:11211" );
 
$wgSessionsInMemcached = true; # optional

The wiki should then use memcached to cache various data. To use multiple servers (physically separate boxes or multiple caches on one machine on a large-memory x86 box), just add more items to the array. To increase the weight of a server (say, because it has twice the memory of the others and you want to spread usage evenly), make its entry a subarray:

$wgMemCachedServers = array(
    "127.0.0.1:11211", # one gig on this box
    array("192.168.0.1:11211", 2 ) # two gigs on the other box
);

PHP client for memcached[edit | edit source]

As of this writing, MediaWiki includes version 1.0.10 of the PHP memcached client by Ryan Gilfether <hotrodder@rocketmail.com>. You'll find some documentation for it in the 'docs/php-memcached' subdirectory.

If you want to check for the latest released version, see http://code.google.com/p/memcached/wiki/Clients

MediaWiki use three objects for memcaching:

  • $wgMemc, controlled by $wgMainCacheType
  • $parserMemc, controlled by $wgParserCacheType
  • $messageMemc, controlled by $wgMessageCacheType

If you set one of the three control variables to CACHE_NONE (the default value for $wgMainCacheType), MediaWiki still creates a MemCacheClient, but requests to it are no-ops and will always fall through to the database. If the cache daemon can't be contacted, it should also disable itself fairly smoothly.

Troubleshooting[edit | edit source]

If you are using a version of MediaWiki prior to 1.12 and see a blank page after saving an edit with memcached, debugging, and display_errors enabled, you maybe be using an old version of the memcached MediaWiki client with a bug in it. See this fix by Tim Starling or update to MediaWiki version 1.13.

Using memcached in your code[edit | edit source]

If you're writing an extension that does expensive database queries, it might be useful to cache the data in memcached. The following code snippet demonstrates how to cache the results of a database query into memcached for 15 minutes and query memcached first for the results instead of the database.

class MyExtensionFooBars {
	public function getPopularTen() {
		global $wgMemc;
 
		// Try cache first
		$key = wfMemcKey( 'myextension', 'foobars', 'popular', '10' );  // This array is used to construct the key for the
				// cache in memcached. It must be unique to the query you are saving. The first value is normally
				// the extension or component name, and following values tell you what query you are saving.
		$data = $wgMemc->get( $key );
 
		if ( $data !== false ) {
			wfDebugLog( 'MyExtension', 'Got the ten most popular foobars from cache' );
		} else {
			wfDebugLog( 'MyExtension', 'Querying database for ten most popular foobars' );
			$dbr = wfGetDB( DB_SLAVE );
			$res = $dbr->select(
				// your database query goes here
				// see Database::select for docs
			);
 
			$data = array();
 
			foreach ( $res as $row ) {
				$data[] = array(
					// Do something with the data we
					// just got from the database
					// For example, if we looked up
					// page_id from the page table,
					// we could do this:
					'id' => $row->page_id
				);
			}
 
			// Cache the results in memcached for 15 minutes
			$wgMemc->set( $key, $data, 60 * 15 );
		}
		return $data;
	}
}

To delete a memcached key, pass the $key to $wgMemc->delete().

Old Development Notes[edit | edit source]

Broadly speaking, we'd like to be able to dump lots of data in the cache, use it whenever we can, and automatically expire it when changes are made.

Expiration model[edit | edit source]

  • explicit expiration times: memcached lets us set an expiration time on an object when we store it. After the time is up, another request for the object will find that it has expired and return nothing to us.
    • pro: last-ditch fallback to let data that could be updated badly eventually fall out of the cache
    • con: we have to know ahead of time when it will cease to be invalid. hard to do when we're dealing with user edits!
  • delete cached objects when we know we're doing something that will cause them to be invalid but are not in a position to update them while we're at it
    • pro: fairly simple; the item will be reloaded from the database and recached when it's next needed
    • con: if this will affect a large number of related items (for instance, creating or deleting a page invalidates the links/brokenlinks tables and rendered HTML cache of pages that link to that page) we may have to hunt them all down and do a lot of updating
  • include timestamps on cached objects and do our own expiries based on dependencies
    • pro: can expire many objects at once by updating a single node they depend on
    • con: more things to load; multiple dependencies could be trickier to work with

Questions & Answers[edit | edit source]

Q: The current plan is to deploy six load balanced Apaches, the likeliness that one of them renders the same page twice should be 1/6 of the current value, right?
A: Memcached is a shared cache between all Apaches, communication is done with TCP.

Q: Squid will cache the majority of content, reducing repetitions drastically. What's the point in memcached then?
A: The squid only replaces the anonymous cache. Memcached has far wider applicability, both currently implemented and potentially. -- Tim Starling 02:12, 16 Jan 2004 (UTC)

Q: Does Memcached have anything to do with your browser or your browser's cache?
A: NO!

Q: Can I have multiple clients written in different programming languages access the same Memcached server?
A: Of course.

Q: Can I search on part of a key or a regular expression on a Memcached server?
A: No, you can only search for an exact key if you need more information on what you could possibly do you can check out the Memcached protocol: http://cvs.danga.com/browse.cgi/wcmtools/memcached/doc/protocol.txt (or https://github.com/memcached/memcached/blob/master/doc/protocol.txt)

See also[edit | edit source]