Tugela Cache

Intro
Large MediaWiki deployments can gain performance by using Memcached, but beyond a certain size the cost of the RAM needed to store all objects becomes too high. To balance resource usage and make better use of the disks in our Apache servers, Tugela, a distributed, cached, on-disk hash database, has arrived.

The sources can be found in the MediaWiki CVS repository, as the module tugelacache.

Design
Tugela Cache is derived from Memcached. Much of the code remains the same; the notable changes are:


 * Internal slab allocator replaced by BerkeleyDB B-Tree database.
 * Expiry policy management moved to an external program, tugela-expire.
 * Much statistics code made obsolete.
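To make the first change more concrete, here is a minimal sketch of a cache whose storage is an on-disk B-tree instead of in-memory slabs. It is not Tugela's code: Python's standard sqlite3 module (whose tables and indexes are B-trees) stands in for BerkeleyDB, and the class name DiskCache, the schema and the key names are all hypothetical. Note that the sketch stores a timestamp but never expires anything itself, mirroring the idea that expiry is left to an external program.

```python
import os
import sqlite3
import tempfile
import time


class DiskCache:
    """Toy on-disk key/value cache; sqlite3 stands in for BerkeleyDB here."""

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        # The primary-key index is a B-tree stored in the file at `path`.
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "key TEXT PRIMARY KEY, value BLOB, stored REAL)"
        )

    def set(self, key, value, now=None):
        # Record when the entry was stored so an external tool could expire it.
        stored = time.time() if now is None else now
        self.db.execute(
            "INSERT OR REPLACE INTO cache (key, value, stored) VALUES (?, ?, ?)",
            (key, value, stored),
        )
        self.db.commit()

    def get(self, key):
        row = self.db.execute(
            "SELECT value FROM cache WHERE key = ?", (key,)
        ).fetchone()
        return None if row is None else row[0]

    def close(self):
        self.db.close()


def demo():
    path = os.path.join(tempfile.mkdtemp(), "cache.db")
    cache = DiskCache(path)
    cache.set("enwiki:page:1", b"rendered page")
    cache.close()
    # Reopen the file to show that the entry lives on disk, not in the process.
    reopened = DiskCache(path)
    value = reopened.get("enwiki:page:1")
    missing = reopened.get("enwiki:page:2")
    reopened.close()
    return value, missing
```

Unlike a slab allocator, nothing here is lost when the process exits; the trade-off is that a lookup may have to touch the disk.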

Build
Make sure you've got  (a memcached dependency as well) installed, and try running. In case of failure, take a look at the DBLIB variable in  and tune it to your system (RedHat-like systems may have just 'db').

Tugela
Command-line parameters are much the same as memcached's and can be listed with the -h switch. The options that differ are:
 * -m mbytes - specifies not the total store size, but the size of the in-memory cache used by BerkeleyDB
 * -f file   - specifies a database file
 * -s secs   - force a database sync this often

Tugela-expire
The cache expiration program does not have a network interface yet, but its output can be sent to the listening socket of the cache daemon with external commands such as telnet or netcat. Telnet might need some extra tricks, like:
 (tugela-expire;sleep 1) | telnet localhost 11211
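The same plumbing as the telnet pipeline can be sketched in Python. This is only an illustration, not part of Tugela: the function name pipe_to_cache and the linger parameter are hypothetical, and the sketch assumes nothing about what tugela-expire prints beyond forwarding it byte for byte.

```python
import socket
import subprocess


def pipe_to_cache(command, host="localhost", port=11211, linger=1.0):
    """Run `command` and forward its stdout to the cache daemon's socket."""
    output = subprocess.run(command, stdout=subprocess.PIPE, check=True).stdout
    with socket.create_connection((host, port)) as conn:
        conn.sendall(output)
        # Stop sending but keep reading, so replies are drained before closing.
        conn.shutdown(socket.SHUT_WR)
        conn.settimeout(linger)
        replies = b""
        try:
            while chunk := conn.recv(4096):
                replies += chunk
        except socket.timeout:
            # The daemon stayed silent for `linger` seconds; give up waiting.
            pass
    return replies
```

A call such as pipe_to_cache(["tugela-expire", "-f", "/path/to/cache.db", "-o", "30"]) would play the role of the shell pipeline; the database path is a placeholder. The short wait for replies serves the same purpose as the "sleep 1" in the telnet trick: it keeps the connection open long enough for the daemon to process what was sent.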

Available parameters are:
 * -f file   - database file
 * -o days   - purge all entries older than the specified number of days
 * -p prefix - touch only keys starting with prefix
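The selection rule implied by -o and -p can be sketched as follows. This is a guess at the logic, not the actual tugela-expire source: it assumes each key has a known storage timestamp and that the output takes the form of memcached text-protocol delete commands, neither of which is confirmed here. The function name expire_commands is hypothetical.

```python
import time

SECONDS_PER_DAY = 86400


def expire_commands(entries, older_than_days, prefix="", now=None):
    """Yield a memcached-style delete command for each stale key.

    `entries` maps key -> timestamp (seconds since the epoch) of when the
    key was stored. Both the mapping and the output format are assumptions.
    """
    now = time.time() if now is None else now
    cutoff = now - older_than_days * SECONDS_PER_DAY
    for key, stored in sorted(entries.items()):
        # Mirror -p (prefix filter) and -o (age threshold in days).
        if key.startswith(prefix) and stored < cutoff:
            yield f"delete {key}\r\n"
```

For example, with entries stored at day 0 and day 10, a current time of day 10 and older_than_days=5, only the day 0 entries whose keys match the prefix are selected.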

Database management
As the on-disk file is a regular BerkeleyDB database, the standard BerkeleyDB utility programs may be used for data management, statistics and analysis:
 * db_stat
 * db_verify
 * db_dump
 * db_restore

Questions & Answers
Q: Isn't there currently a MySQL cluster backing the existing memcached system? It seems that in both cases, hitting the MySQL cluster or the Tugela file store, you'd pretty much be doing a search on a b[+*]-tree index.

A: We can still afford to lose data on Tugelas, which means that every node can hold a different portion of the data. There's no replication, so the protocol is much more lightweight. The MySQL query cache can't be efficient at our rates of updates and use of transactions.

Q: Another option for having a file-backed Memcached is to simply set the Memcached server processes to a size *larger* than physical ram. The OS will swap out the infrequently used pages. How does this approach compare to BDB?

A: Bad, bad, bad approach. Swap overhead is too high compared to how it can be handled by a proper library. The issue with memcached is that it is not designed to be swapped out; its memory access patterns are quite different from BDB access patterns. [This doesn't seem true. A few memcached articles suggest 2G caches on 1G machines, and hash lookups are actually quite swap-friendly data access (O(1) lookups that directly find the disk block to be swapped in). In contrast, walking a B-tree (like BDB) requires O(log(n)) pages to be swapped in, so the current memcached's slab+hash is probably more swap-friendly than BDB.]
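To put rough numbers on the O(log(n)) side of the bracketed comment, the sketch below estimates how many B-tree levels a lookup walks. The fan-out of 100 keys per page is purely an illustrative assumption, not a measured BerkeleyDB figure, and the function name btree_levels is hypothetical.

```python
def btree_levels(num_keys, keys_per_page):
    """Smallest number of levels whose leaves can index `num_keys` keys."""
    levels, reachable = 1, keys_per_page
    while reachable < num_keys:
        # Each extra level multiplies the number of reachable keys by the fan-out.
        reachable *= keys_per_page
        levels += 1
    return levels
```

Under that assumption, one million keys need 3 levels and ten million need 4, so a completely cold lookup touches 3 to 4 pages where a hash lookup touches roughly one. The gap is likely smaller in practice, since the few pages near the root are requested constantly and tend to stay in the in-memory cache sized by -m.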

Q: What about an approach like the one Varnish uses, where it "allocate some virtual memory, it tells the operating system to back this memory with space from a disk file."?

A: ...
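For readers unfamiliar with the technique the Varnish question refers to, here is a minimal sketch of file-backed memory using Python's standard mmap module. It only illustrates the mechanism; it says nothing about how Varnish implements it or how it would perform for Tugela. The file name and the size are arbitrary.

```python
import mmap
import os
import tempfile


def demo(size=4096):
    path = os.path.join(tempfile.mkdtemp(), "backing.bin")
    # Pre-size the backing file; an empty file cannot be mapped.
    with open(path, "wb") as f:
        f.truncate(size)
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), size) as mem:
            mem[0:5] = b"hello"  # looks like an ordinary memory write
            mem.flush()          # ask the OS to write dirty pages to the file
    # Read the file back through normal I/O to show the write reached it.
    with open(path, "rb") as f:
        return f.read(5)
```

The program never issues explicit reads or writes against the mapped region; the operating system decides which pages stay in RAM and which live only in the file.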

Q: Above, you state that much of the statistics code that came in Memcached is made obsolete. Which statistics are obsolete and, more importantly, which statistics are still applicable?

A: ...

Q: Does placing Tugela's DB file in a Linux tmpfs partition improve, damage or make no difference to cache performance? (added on 20071120)

A: ...

Q: Does the -m switch affect the maximum size of the DB file? (added on 20071120)

A: ...

Q: Are you OK? Hey, guys, check if he has a MedAlert bracelet.

A: ...

Contact
All questions can be directed to Domas Mituzas or sent through the standard developer contact methods.