Redis
Redis is an open-source, networked, in-memory key-value data store with optional durability, written in ANSI C. It can be used as an object cache backend for MediaWiki sites to boost performance, speed up page loads and reduce database load. To set up Redis, follow these instructions:
Set up Redis caching
Set up a Redis instance
If you have not done so already, you'll need to configure a Redis instance and install a Redis client library for PHP. Most environments require the phpredis PHP extension. On Debian / Ubuntu, you can install the requirements with the following command:
$ apt-get install redis-server php-redis
Configure Redis cache in LocalSettings.php
In your "LocalSettings.php" file, set something like:
$wgObjectCaches['redis'] = [
    'class' => 'RedisBagOStuff',
    'servers' => [ '127.0.0.1:6379' ],
    // 'connectTimeout' => 1,
    // 'persistent' => false,
    // 'password' => 'secret',
    // 'automaticFailover' => true,
];
- Parameters explained
- servers: An array of server names. A server name may be a hostname, a hostname/port combination or the absolute path of a UNIX socket. If a hostname is specified but no port, the standard port number 6379 will be used. Array keys can be used to specify the tag to hash on in place of the host/port. Required.
- connectTimeout: The timeout for new connections, in seconds. Optional, default is 1 second.
- persistent: Set this to true to allow connections to persist across multiple web requests. False by default.
- password: The authentication password, which will be sent to Redis in clear text. Optional; if it is unspecified, no AUTH command will be sent.
- automaticFailover: If this is false, then each key will be mapped to a single server, and if that server is down, any requests for that key will fail. If this is true, a connection failure will cause the client to immediately try the next server in the list (as determined by a consistent hashing algorithm). This has the potential to create consistency issues if a server is slow enough to flap, for example if it is in swap death. True by default.
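The rules for a server name can be illustrated with a short sketch. The Python below is not MediaWiki's own parsing code: parse_redis_server is a hypothetical helper that simply follows the description above (absolute path means UNIX socket, otherwise hostname with an optional port, defaulting to 6379). It assumes hostnames or IPv4 addresses only, and the socket path mentioned afterwards is made up for illustration.

```python
DEFAULT_REDIS_PORT = 6379  # standard Redis port, used when no port is given


def parse_redis_server(name):
    """Classify a 'servers' entry as a UNIX socket or a host/port pair.

    Hypothetical helper for illustration; IPv6 literals are not handled.
    """
    if name.startswith("/"):
        # An absolute path is treated as a UNIX socket.
        return {"type": "unix", "path": name}
    host, sep, port = name.rpartition(":")
    if not sep:
        # No colon at all: the whole string is the hostname.
        return {"type": "tcp", "host": name, "port": DEFAULT_REDIS_PORT}
    return {"type": "tcp", "host": host, "port": int(port)}
```

Under these assumptions, '127.0.0.1:6379' and '127.0.0.1' describe the same TCP endpoint, while an entry such as '/var/run/redis/redis.sock' would be classified as a UNIX socket.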
You will now be able to obtain a Redis-backed object cache via ObjectCache::getInstance( 'redis' ). If you'd like to use Redis as the default cache for various data, you may set configuration options such as the following:
$wgMainCacheType = 'redis';
Set up the job runner
In addition to Redis, you also need a special job runner service that allows the Redis job storage queues to be processed in 'daemonized mode'. It is used to purge abandoned jobs from Redis and to reschedule failed and delayed jobs.
Install and configure the job runner service
See the mediawiki/services/jobrunner instructions for details on how to install and configure the job runner. What follows is a summary:
Clone the git repository https://github.com/wikimedia/mediawiki-services-jobrunner into an appropriate location on the server, outside the web root, and run composer install --no-dev.
Create a configuration file named config.json, for example:
{
    "groups": {
        "basic": {
            "runners": 0
        }
    },
    "limits": {
    },
    "redis": {
        "aggregators": [
            "127.0.0.1:6379"
        ],
        "queues": [
            "127.0.0.1:6379"
        ]
    },
    "dispatcher": "nothing"
}
- Parameters explained (full documentation is not yet available)
- groups: "basic" is the default group. The JSON sample in the repository has a number of additional groups for Parsoid, GWToolset, file uploads and the TimedMediaHandler extension.
  - basic
    - runners: number of runner processes in this group
    - include: job types to include ("*" means "all")
    - exclude: job types to exempt, useful when combined with "*". The JSON sample excludes some types from the "basic" group so that they can be managed by a dedicated group.
    - low-priority (array): jobs that should be de-prioritised (e.g. cirrusSearchLinksUpdate)
- limits
  - attempts: how many times to let jobs be recycled before abandoning
- redis
  - aggregators (array): ready queue trackers. This should match the 'servers' set in your $wgObjectCaches['redis'] config.
  - queues (array): main queue servers. Again, this should match the 'servers' set in your $wgObjectCaches['redis'] config.
- dispatcher: leave this unconfigured. We will leave that to MediaWiki's job queue handling (see below).
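Since the aggregators and queues entries are expected to match the 'servers' from $wgObjectCaches['redis'], a quick consistency check can catch typos before the services are started. The following Python is only a sketch: check_jobrunner_config and check_jobrunner_file are hypothetical helpers, not part of the job runner service, and they check nothing beyond the two points described above.

```python
import json


def check_jobrunner_config(config, cache_servers):
    """Return a list of problems found in the 'redis' section of a config dict.

    Hypothetical helper: it only compares the aggregators and queues
    arrays with the server list used for the object cache.
    """
    problems = []
    redis_section = config.get("redis", {})
    for key in ("aggregators", "queues"):
        servers = redis_section.get(key)
        if not isinstance(servers, list) or not servers:
            problems.append(f"redis.{key} must be a non-empty array")
        elif sorted(servers) != sorted(cache_servers):
            problems.append(f"redis.{key} does not match the cache servers")
    return problems


def check_jobrunner_file(path, cache_servers):
    """Load a JSON config file and run the same checks on it."""
    with open(path, encoding="utf-8") as handle:
        return check_jobrunner_config(json.load(handle), cache_servers)
```

For the example configuration above, check_jobrunner_file( 'config.json', [ '127.0.0.1:6379' ] ) would be expected to return an empty list.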
Make sure redisJobRunnerService and redisJobChronService are continuously running
This service provides two PHP-based background scripts, each of which runs an infinite loop to manage jobs from the Redis job queue:
- redisJobRunnerService: a worker script used to process jobs from the Redis queue.
- redisJobChronService: a scheduler script that takes care of any time-based logic, such as when to run, retry, delay or clear jobs.
Both redisJobRunnerService and redisJobChronService must run continuously in the background using the same config file location.
$ php redisJobRunnerService --config-file=config.json
$ php redisJobChronService --config-file=config.json
// Always write --config-file with an equals sign (no space) and avoid quotes around the file name or file path.
If you want to check in advance that they are running correctly, you can run these scripts with the --verbose flag. If you see the error message "Failed to do periodic tasks for some queues", your config file may refer to the wrong location for the Redis server.
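If the error above appears, it can help to confirm independently that something is answering at the configured address. The sketch below sends a plain PING to a Redis server over TCP and returns the first reply line. redis_ping is a hypothetical helper; the host, port and timeout defaults are assumptions, and the connect parameter exists only so that the function can be exercised without a live server.

```python
import socket


def redis_ping(host="127.0.0.1", port=6379, timeout=1.0,
               connect=socket.create_connection):
    """Send an inline PING and return the server's first reply line.

    Returns None if the connection cannot be established or times out.
    """
    try:
        with connect((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")
            with conn.makefile("rb") as reader:
                reply = reader.readline()
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return None
    return reply.decode("ascii", errors="replace").strip()
```

A reply of +PONG suggests the server is reachable. A reply starting with a minus sign is an error reply, which may for instance mean that the server requires a password. None means no connection could be made, in which case the address in config.json is worth double-checking.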
Option 1: set up a daemon (recommended)
If you have sufficient control over the server, configure a daemon to run them at server start.
Option 2: set up cronjobs
Not everyone will have the appropriate permissions on the host to set up a daemon. Even if you can execute a script, long-running scripts may get terminated. An alternative is to set up cronjobs instead. You would need to set up each cronjob as a watchdog that periodically checks whether the process is running and, if it isn't, re-executes the script. Here is an example for redisJobRunnerService that runs every 10 minutes and writes to a log file:
*/10 * * * * /usr/bin/pgrep -f "^/usr/local/bin/php /<path>/redisJobRunnerService" > /dev/null || /usr/local/bin/php /<path>/redisJobRunnerService --config-file=/<path>/config.json >> /<path>/jobrunner.log 2>&1 &
- Be aware that compared to CLI commands, cronjobs may come with slightly different requirements as to the syntax used and typically require full paths.
- Where the above uses /usr/bin/pgrep and /usr/local/bin/php, check your system for the appropriate paths.
- pgrep is used to check if the process is running. Do not omit the caret (^): it prevents pgrep from matching the command line of the cronjob itself. Alternatively, write your own check.
- The cronjob for redisJobChronService would look rather similar.
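For those who prefer to write their own check, the watchdog logic can be sketched in a few lines of Python. This is an illustration under assumptions, not a tested replacement for the cron line above: is_running and ensure_running are hypothetical helpers, pgrep is assumed to be on the PATH, and the run and popen parameters exist only so the logic can be tried without starting real processes.

```python
import subprocess


def is_running(pattern, run=subprocess.run):
    """Return True if 'pgrep -f' finds a process matching the pattern."""
    result = run(["pgrep", "-f", pattern], stdout=subprocess.DEVNULL)
    # pgrep exits with status 0 when at least one process matched.
    return result.returncode == 0


def ensure_running(pattern, command, log_path,
                   run=subprocess.run, popen=subprocess.Popen):
    """Start the command in the background unless a matching process exists.

    Returns True if a new process was started, False otherwise.
    """
    if is_running(pattern, run=run):
        return False
    with open(log_path, "ab") as log:
        # Append stdout and stderr to the log, like '>> log 2>&1 &' in cron.
        popen(command, stdout=log, stderr=subprocess.STDOUT,
              start_new_session=True)
    return True
```

A script calling ensure_running with the same pattern, command (as a list of arguments) and log file as in the cron example could then be scheduled in place of the one-liner.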
Configure job queue storage in LocalSettings.php
Configure $wgJobTypeConf.
$wgJobTypeConf['default'] = [
    'class' => 'JobQueueRedis',
    'redisServer' => '127.0.0.1:6379',
    'redisConfig' => [],
    'daemonized' => true
];
- Parameters explained
- redisConfig: An array of parameters to RedisConnectionPool::__construct(). Note that the serializer option is ignored, as "none" is always used. If the same Redis server is used as for $wgObjectCaches, the Redis password needs to be set here as well (see the $wgObjectCaches config above).
- redisServer: A hostname/port combination or the absolute path of a UNIX socket. If a hostname is specified but no port, the standard port number 6379 will be used. Required.
- compression: The type of compression to use; one of (none, gzip).
- daemonized: Must be true. Setting it to false is currently not supported.
If daemonized mode is working, jobs will be delivered to the Redis instance on the specified server.
Configure job queue
Finally, manually configure your job queue service to make sure jobs are actively running.
Use cases
- History of job queue runners at WMF on Wikitech.
- Nad's documentation on setting up the job queue (partially outdated).
- Example config found in Star Citizen Wiki's GitHub repo.
Further reading
General
- Official site (see esp. Introduction to Redis)
- The Redis article on the English Wikipedia.
- Redis/INCR
- Getting to Know Redis
- Redis, from the Ground Up
- Redis and Relational Data
- Interview with Salvatore Sanfilippo (code-oriented but still useful)
- Redis DB (Google Group)
Analytics
Tooling
[edit]- redis-py is the library of choice for Python
- Redis and Python (presentation slides)
- Resque for jobs
- Redisco, a Python ORM for Redis
- py-analytics (I haven't used this)
- redis-bitops Ruby gem for sparse bitmap operations
Informed Opinions
Miscellaneous
- Storing hundreds of millions of simple key-value pairs (how Instagram uses Redis)
- Key performance metrics to monitor for Redis