Manual:Job queue/For developers


Jobs are non-urgent tasks. For a general introduction and management of job queues, see Manual:Job queue.

Differences with DeferredUpdates

Deferred updates (also called deferrable updates) are functions executed at the end of a MediaWiki web request (or at the end of the execution of a job). For web requests, if supported by the web server, this work happens after the HTTP response has been closed (see fastcgi_finish_request(), register_postsend_function(), and MediaWiki::doPostOutputShutdown()). Deferred updates are a useful way to postpone time-consuming tasks in order to speed up the main MediaWiki response. See the DeferredUpdates class for details, as well as Database transactions.

Deferrable updates will be executed at the end of the current request. They are kept in memory only for the duration of the web request.

Jobs will be executed at a later time, possibly several hours after the original web request. The job queue storage backend is configurable ($wgJobTypeConf), but defaults to the job table in the wiki's main database.

Deferrable updates should be used for urgent things, and jobs for non-urgent things.
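For illustration, a deferrable update can be scheduled with DeferredUpdates::addCallableUpdate() (a minimal sketch; the callback body and log group are hypothetical):

// Schedule a callback to run at the end of the current request,
// after the HTTP response has been closed where the server supports it.
DeferredUpdates::addCallableUpdate( function () {
	// Non-urgent work that should not delay the response.
	wfDebugLog( 'myextension', 'Running post-send work' );
} );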

Some deferrable updates, implementing the EnqueueableDataUpdate interface, can be transformed into jobs. As of MediaWiki 1.29, any EnqueueableDataUpdate added during a web request is automatically transformed into a job. Any EnqueueableDataUpdate added during the execution of another job is initially stored as a deferrable update to be executed immediately after the current job is finished. However, if the job runner accumulates more than 100 deferred updates, any EnqueueableDataUpdate is converted into a job and queued for later.
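For illustration, a rough sketch of such an update (the class name, job type, and parameters are hypothetical, and the exact shape of the specification array varies between MediaWiki versions):

class MyExtensionCountUpdate extends DataUpdate implements EnqueueableDataUpdate {
	/** @var Title */
	private $title;

	public function __construct( Title $title ) {
		parent::__construct();
		$this->title = $title;
	}

	public function doUpdate() {
		// Immediate execution path, used when run as a plain deferred update.
	}

	/**
	 * Lets the framework enqueue this update as a job instead of running it.
	 * @return array
	 */
	public function getAsJobSpecification() {
		return [
			'domain' => WikiMap::getCurrentWikiId(), // older versions used a 'wiki' key
			'job' => new JobSpecification(
				'myExtensionCountUpdate', // hypothetical job type
				[ 'pageId' => $this->title->getArticleID() ],
				[],
				$this->title
			),
		];
	}
}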

Registering a job

To use the job queue to run your non-urgent jobs, you need to do the following:

Create a Job subclass

You need to create a class that, given a Title and parameters, will perform your deferred work:

<?php
class SynchroniseThreadArticleDataJob extends Job {
	/**
	 * @param Title $title
	 * @param array $params Job parameters (accessible via $this->params)
	 */
	public function __construct( $title, $params ) {
		// Replace synchroniseThreadArticleData with an identifier for your job.
		parent::__construct( 'synchroniseThreadArticleData', $title, $params );
	}

	/**
	 * Execute the job
	 *
	 * @return bool
	 */
	public function run() {
		// Load data from $this->params and $this->title
		$article = new Article( $this->title, 0 );
		$limit = $this->params['limit'];
		$cascade = $this->params['cascade'];

		// Perform your updates (the Article constructor always returns an
		// object, so no null check is needed here)
		Threads::synchroniseArticleData( $article, $limit, $cascade );

		return true;
	}
}

Add your Job class to the global list

Add the Job class to the global $wgJobClasses array. In extensions, this is usually done in the main extension file, e.g. /extensions/Echo/Echo.php. Make sure the key name is unique.

// The key is your job identifier (from the Job constructor), the value is your class name
$wgJobClasses['synchroniseThreadArticleData'] = 'SynchroniseThreadArticleDataJob';

If your extension uses an extension.json descriptor, you can use its JobClasses section:

"JobClasses": {
	"synchroniseThreadArticleData": "SynchroniseThreadArticleDataJob"
},

How to invoke a job

/**
 * 1. Set any job parameters you want to have available when your job runs
 *
 *    this can also be an empty array()
 *    these values will be available to your job via $this->params['param_name']
 */
$jobParams = array( 'limit' => $limit, 'cascade' => true );


/**
 * 2. Get the article title that the job will use when running
 *
 *    if you will not use the title to create/modify a new/existing page, you can use:
 *
 *    a vague, dummy title
 *    Title::newMainPage();
 *
 *    a more specific title
 *    Title::newFromText( 'User:UserName/SynchroniseThreadArticleData' )
 *
 *    a very specific title that includes a unique identifier. this can be useful
 *    when you create several batch jobs with the same base title
 *    Title::newFromText(
 *        $user->getName() . '/' .
 *        'MyExtension/' .
 *        'My Batch Job/' .
 *        uniqid(),
 *        NS_USER
 *    );
 *    
 *    the idea is for the database to have a title reference that your job
 *    will use to create/update a page, or that makes troubleshooting easier
 *    by being meaningful rather than vague
 */
$title = $article->getTitle();


/**
 * 3. Instantiate a Job object
 */
$job = new SynchroniseThreadArticleDataJob( $title, $jobParams );


/**
 * 4. Insert the job into the database
 *    note the differences between MediaWiki versions
 *
 *    for performance reasons, if you plan on inserting several jobs into the queue,
 *    it’s best to add them to a single array and then push them all at once into the queue
 *
 *    for example, earlier in your code you have built up an array of $jobs with different 
 *    titles and jobParams
 *
 *    $jobs[] = new SynchroniseThreadArticleDataJob( $title, $jobParams );
 *    JobQueueGroup::singleton()->push( $jobs );
 */
$job->insert();                           // MediaWiki < 1.21
JobQueueGroup::singleton()->push( $job ); // MediaWiki >= 1.21

There is another function to push jobs, JobQueueGroup::lazyPush(), which defers the actual push until the very end of the request; lazily-pushed jobs are therefore enqueued after jobs pushed with JobQueueGroup::push().
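For example, using the job built above:

// Keep the job in memory and enqueue it at the very end of the request,
// together with any other lazily-pushed jobs.
JobQueueGroup::singleton()->lazyPush( $job );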

Other

Job queue type

A job queue type is the command name you give to the parent::__construct() method of your job class; e.g., using the example above, that would be synchroniseThreadArticleData.

getQueueSizes()

JobQueueGroup::singleton()->getQueueSizes() will return an array of all job queue types and their sizes.

Array
(
    [refreshLinks] => 1
    [refreshLinks2] => 3
    [synchroniseThreadArticleData] => 10
)
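For illustration, a minimal sketch of iterating over the result (the log group is hypothetical):

foreach ( JobQueueGroup::singleton()->getQueueSizes() as $type => $size ) {
	wfDebugLog( 'myextension', "$type: $size job(s) queued" );
}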

getSize()

While getQueueSizes() is handy for analysing the entire job queue, for performance reasons it’s best to use JobQueueGroup::singleton()->get( <job type> )->getSize() when analysing a specific job type. Unlike getQueueSizes(), this returns the size of that one queue as a plain integer.
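A minimal sketch of the call:

// Returns the number of queued jobs of this one type, as an integer.
$size = JobQueueGroup::singleton()->get( 'synchroniseThreadArticleData' )->getSize();
// e.g. $size === 100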

Internals

Pushing jobs

The primary function is JobQueueGroup::push(). It selects the job queue corresponding to the job type and, depending on the job queue implementation (database or Redis), the job will be pushed either through a Redis connection (Redis case) or as a deferrable update (database case).
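For illustration, a Redis-backed queue is selected through $wgJobTypeConf in LocalSettings.php (a sketch; the exact keys depend on your MediaWiki version, so check the JobQueueRedis documentation):

$wgJobTypeConf['default'] = [
	'class' => 'JobQueueRedis',
	'redisServer' => '127.0.0.1:6379', // assumed Redis address
	'redisConfig' => [],
	'daemonized' => true,
];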

The lazy push function (JobQueueGroup::lazyPush()) keeps the jobs in memory. At the end of the current execution (end of the MediaWiki request or end of the current job execution), the jobs kept in memory are pushed as the last deferrable update (of type AutoCommitUpdate). As a deferrable update, the jobs are pushed at the end of the current execution, and as an AutoCommitUpdate, the jobs are pushed in a single database transaction. See JobQueueGroup::lazyPush() and JobQueueGroup::pushLazyJobs() for details.

In CLI, note that deferrable updates (whether from JobQueueGroup::push() (JobQueueDB implementation) or from JobQueueGroup::lazyPush()) are directly executed if no database transaction round is pending (LBFactory::hasTransactionRound()). See DeferredUpdates::addUpdate() and DeferredUpdates::tryOpportunisticExecute() for details.

When some jobs are pushed through JobQueueGroup::lazyPush() but never actually pushed (and hence lost), usually because an unhandled exception was thrown, the destructor of JobQueueGroup logs a warning in the debug log:

PHP Notice: JobQueueGroup::__destruct: 1 buffered job(s) never inserted

See task T100085 for an example of such a warning. This occurred before the MediaWiki 1.29 release for Web-executed jobs: when a job internally lazy-pushes another job and the former job is executed in the shutdown part of a MediaWiki request, the latter job is never pushed (because JobQueueGroup::pushLazyJobs() was already called). The fix for this specific bug was to call JobQueueGroup::pushLazyJobs() in JobRunner::executeJob() to always push lazily-pushed jobs after the execution of each job.

Execution of jobs

Jobs are executed through two methods, depending on the parameter $wgJobRunRate (two cases: zero, or greater than zero). If $wgJobRunRate > 0, MediaWiki executes some jobs at the end of Web requests; if $wgJobRunRate == 0, nothing happens at the end of Web requests. In all cases (but particularly important in the latter case), jobs can be executed from the command line with maintenance/runJobs.php.
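For example, to disable end-of-request execution entirely (a common setup when jobs are run from the command line, e.g. via cron):

// In LocalSettings.php: never run jobs at the end of web requests;
// rely on maintenance/runJobs.php instead.
$wgJobRunRate = 0;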

The jobs are run by the JobRunner class. Each job is given its own database transaction.

At the end of the job execution, deferrable updates are executed. Since MediaWiki 1.28.3/1.29, lazily-pushed jobs are pushed through a deferrable update in order to use a dedicated database transaction (with AutoCommitUpdate).