Manual:Job queue/For developers

From mediawiki.org

Jobs are non-urgent tasks. For a general introduction and management of job queues, see Manual:Job queue.

Deferred updates

Deferred updates (or deferrable updates) are a useful way to postpone time-consuming tasks in order to speed up the main MediaWiki response. Refer to DeferredUpdates class API and Database transactions for how to use these.

Deferred updates are represented as callable functions that we queue in an array and then call at the end of the MediaWiki PHP process. Typically, the call takes place after the response to a web request has been finished (i.e. everything has been echoed and flushed to the browser), but before we actually exit or return to the web server. This is internally powered by fastcgi_finish_request() in MediaWiki::doPostOutputShutdown().

Deferrable updates are executed at the end of the current process. They are only memorised within that same web request (or other process, such as CLI maintenance scripts).
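The queue-then-run idea can be illustrated with a small standalone sketch. This is a toy model, not MediaWiki's DeferredUpdates class: ToyDeferredQueue and its methods are hypothetical names chosen for illustration, and the real class additionally deals with stages, transactions and error handling.

```php
<?php
// Toy model only: this is NOT MediaWiki's DeferredUpdates class.
final class ToyDeferredQueue {
	/** @var callable[] Updates waiting to run, in the order they were added. */
	private array $pending = [];

	/** Queue a callable to run later in this same process. */
	public function add( callable $update ): void {
		$this->pending[] = $update;
	}

	/** Run everything queued so far, including updates added while running. */
	public function runAll(): void {
		while ( $this->pending ) {
			$update = array_shift( $this->pending );
			$update();
		}
	}
}

$log = [];
$queue = new ToyDeferredQueue();

// Main request work: queue the slow part instead of doing it now.
$queue->add( static function () use ( &$log ) {
	$log[] = 'update hit counter';
} );
$log[] = 'send response';

// End of the process: the response is out, so run the postponed work.
$queue->runAll();

echo implode( "\n", $log ), "\n";
```

The script prints "send response" before "update hit counter", even though the update was queued first. Because the queue is an ordinary in-memory array, anything still pending is lost when the process ends, which is why such updates are only remembered within the same request.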

This is unlike jobs, which are scheduled via a persistent storage backend and then run some minutes or hours in the future, independently of and after the original request that queued the job. The job queue in MediaWiki is a pluggable service. The default backend adds jobs to the job table in the wiki's main database. The default job runner executes up to one job at the end of random page views.

Which one to use?

Deferrable updates should be used for tasks that generally take only a few milliseconds to complete as a way to speed up the web response. By nature of being deferred, this means that failure is hidden from clients since the response has already been sent.

Examples of critical tasks that we don't run via deferred updates are listed below. For these, failure must be made known to users. More generally, people should know how and when their action was completed, so that they can act further knowing that the change is in place, e.g. make further edits that depend on previous ones, possibly scripted or batched through some automation.

  • Database write that creates a page or saves an edit.
  • Create account, change password.
  • Explicit "send email" feature.

Examples of "urgent" tasks that we run via post-response deferred updates after saving an edit. These small transactions are expected to be reflected if the client looks for it afterward, but the result of these is not needed to render the response to the edit itself.

Examples of "non-urgent" tasks that we run via the job queue:

  • After saving an edit to a template, iterate through potentially millions of affected pages to re-parse and purge (known as "Refresh links" or LinksUpdate).
  • Periodically prune old rows from the recent changes table.
  • After uploading a photo, pre-render common thumbnail sizes.
  • After saving an edit to an article, send emails to the accounts that watch this page with email notifications enabled.

Fallback

Deferrable updates can choose to implement the EnqueueableDataUpdate interface. Such updates can be automatically converted to a job as needed. For example, if the update fails, MediaWiki will convert it to a job and queue it to try again later. There are also other situations in which we improve reliability or optimise throughput by proactively converting updates to jobs where possible.

Since any MediaWiki code can queue deferred updates, it is also possible for a CLI maintenance script or job to implicitly build up a list of deferred updates. If these batch operations end up queuing a lot of updates, MediaWiki will proactively convert tasks to jobs where possible (handled by the DeferredUpdates class internally).
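The fallback behaviour can be pictured with the following standalone sketch. It is a toy model under stated assumptions: ToyEnqueueable, ToyCounterUpdate, toJobSpec() and runOrEnqueue() are hypothetical names, and the plain array stands in for a real job queue. In MediaWiki itself, the EnqueueableDataUpdate interface exposes the conversion through getAsJobSpecification().

```php
<?php
// Toy model only: all names below are hypothetical, not MediaWiki classes.
interface ToyEnqueueable {
	/** Describe this update as a job: a type name plus serialisable parameters. */
	public function toJobSpec(): array;
}

final class ToyCounterUpdate implements ToyEnqueueable {
	private int $pageId;
	private bool $shouldFail;

	public function __construct( int $pageId, bool $shouldFail ) {
		$this->pageId = $pageId;
		$this->shouldFail = $shouldFail;
	}

	/** Do the work now; throws to simulate a failing update. */
	public function doUpdate(): void {
		if ( $this->shouldFail ) {
			throw new RuntimeException( 'simulated failure' );
		}
	}

	public function toJobSpec(): array {
		return [ 'type' => 'toyCounter', 'params' => [ 'pageId' => $this->pageId ] ];
	}
}

/** Run an update now; if it throws and can be enqueued, queue it as a job instead. */
function runOrEnqueue( ToyCounterUpdate $update, array &$jobQueue ): bool {
	try {
		$update->doUpdate();
		return true;
	} catch ( RuntimeException $e ) {
		if ( $update instanceof ToyEnqueueable ) {
			$jobQueue[] = $update->toJobSpec();
		}
		return false;
	}
}

$jobQueue = [];
$okNow = runOrEnqueue( new ToyCounterUpdate( 7, false ), $jobQueue );
$okLater = runOrEnqueue( new ToyCounterUpdate( 8, true ), $jobQueue );
```

After this runs, the first update has completed immediately and only the failed one (page 8) has been turned into a job specification for a later retry. The key design point is that the update must be able to describe itself with serialisable parameters, since a job outlives the process that created it.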

Use jobs if you need to save data in the context of a GET request

For scalability and performance reasons, MediaWiki developers should generally not perform database writes during page views or other GET requests. If this becomes difficult to avoid, check the Backend performance guidelines first and consider seeking advice from other developers or the Performance Team for how to approach the problem in a different way.

Note that large wiki farms (such as Wikimedia) may operate from multiple data centers and thus run GET requests (which don't expect database writes) from a secondary data center, which should be able to respond to such requests without relying on communicating to the primary DC.

If you're reasonably certain that your feature will only rarely need a database write during a GET request, and the write is not urgent, then one option is to queue a job during the GET request. Job queues can be buffered and synced across data centers asynchronously and thus do not require immediate cross-DC communication. You can then rely on the job eventually being transmitted to the primary DC, where it will execute at some point in the future.

Deferred updates should not be used to perform database writes after a GET request. Attempting this will log a DBPerformance warning message.

Registering a job

To use the job queue to do your non-urgent jobs, you need to do these things:

Create a Job subclass

You need to create a class that extends Job and performs your non-urgent task in its run() method:

<?php
namespace MediaWiki\Extension\MyExt\Job;

use Job;
use MediaWiki\MediaWikiServices;

class SomeExpensiveOperationJob extends Job {
	public function __construct( array $params ) {
		// Replace someExpensiveOperation with an identifier for your job.
		parent::__construct( 'someExpensiveOperation', $params );
	}

	/**
	 * @inheritDoc
	 */
	public function run() {
		$lb = MediaWikiServices::getInstance()->getDBLoadBalancer();
		// getConnectionRef() is deprecated in newer releases; getConnection() is equivalent here.
		$dbw = $lb->getConnection( DB_PRIMARY );

		$dbw->update(
			'mytable',
			[ 'foo' => $this->params['foo'] ],
			[ 'bar' => $this->params['bar'] ],
			__METHOD__
		);

		return true;
	}
}

Add your Job class to the global list

Add the Job class to the global $wgJobClasses array. In extensions, this is done in the extension.json file, and in core it's done in DefaultSettings.php. The key must be unique and match the value in the job's constructor, and the value is the class name.
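For an extension, the registration might look like the following extension.json fragment. This is a sketch that assumes the job class lives in the namespace MediaWiki\Extension\MyExt\Job and that the extension's autoloader can already find it; adjust the key and class name to your own job.

```json
{
	"JobClasses": {
		"someExpensiveOperation": "MediaWiki\\Extension\\MyExt\\Job\\SomeExpensiveOperationJob"
	}
}
```

The key someExpensiveOperation is the same string that the job's constructor passes to parent::__construct().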

How to queue a job

/**
 * 1. Access the JobQueueGroup for the current wiki
 *
 * For MW 1.36 and earlier, call JobQueueGroup::singleton() instead.
 */
$jobQueueGroup = MediaWikiServices::getInstance()->getJobQueueGroupFactory()->makeJobQueueGroup();

/**
 * 2. Create a Job object
 *
 * Construct the subclass, and pass the relevant parameters.
 *
 * These will be available as $this->params in your Job class when
 * it executes later.
 *
 * Some older jobs require a $title parameter even if they internally ignore it.
 * In that case, you may pass Title::newMainPage().
 */
$title = Title::newFromText( 'User:Example/Foobar' ); // or $example->getTitle()
$job = new MyDataJob( $title, [
  'example' => true,
  'mydata' => [ 'x' ],
] );

/**
 * 3. Push the job into the queue
 */
$jobQueueGroup->lazyPush( $job );

Queuing via JobQueueGroup::lazyPush() allows MediaWiki to send the job in a batch together with any other queued jobs at the end of the web response. If queuing is inseparable from your request's main purpose, and you would like a queuing failure to cause (for example) any database writes to be rolled back and an error page to be presented to the user, then consider calling JobQueueGroup::push() instead.

Other

Job queue type

A job queue type is the command name you pass to parent::__construct() in your job class; using the example above, that would be someExpensiveOperation.

getQueueSizes()

Calling getQueueSizes() on a JobQueueGroup object (JobQueueGroup::singleton() in MW 1.36 and earlier) will return an array of all job queue types and their sizes.

Array
(
    [refreshLinks] => 1
    [refreshLinks2] => 3
    [synchroniseThreadArticleData] => 10
)

getSize()

While getQueueSizes() is handy for analysing the entire job queue, for performance reasons it’s best to call get( <job type> )->getSize() on the JobQueueGroup object when analysing a specific job type. This returns only the queue size of that specific job type, as an integer rather than an array:

100

Internals

Pushing jobs

The primary function is JobQueueGroup::push(). It selects the job queue corresponding to the job type. Depending on the job queue implementation (database or Redis), the job is then pushed either through a Redis connection (Redis case) or as a deferrable update (database case).

The lazy push function (JobQueueGroup::lazyPush()) keeps the jobs in memory. At the end of the current execution (the end of the MediaWiki request or of the current job execution), the jobs kept in memory are pushed as the last deferrable update (of type AutoCommitUpdate). Being a deferrable update, the push happens at the end of the current execution; being an AutoCommitUpdate, the jobs are pushed in a single database transaction. See JobQueueGroup::lazyPush() and JobQueueGroup::pushLazyJobs() for details.

In CLI, note that deferrable updates (whether from JobQueueGroup::push() with the JobQueueDB implementation or from JobQueueGroup::lazyPush()) are executed directly if no database transaction round is active (see LBFactory::hasTransactionRound()). See DeferredUpdates::addUpdates() and DeferredUpdates::tryOpportunisticExecute() for details.

When some jobs are pushed through JobQueueGroup::lazyPush() but never really pushed (and hence lost), usually because an unhandled exception is thrown, the destructor of JobQueueGroup shows a warning in the debug log:

PHP Notice: JobQueueGroup::__destruct: 1 buffered job(s) never inserted

See task T100085 for an example of such a warning. Before the MediaWiki 1.29 release, this affected jobs executed during web requests: when a job internally lazy-pushed another job and the former was executed in the shutdown part of a MediaWiki request, the latter was never pushed, because JobQueueGroup::pushLazyJobs() had already been called. The fix for this specific bug was to call JobQueueGroup::pushLazyJobs() in JobRunner::executeJob(), so that lazily-pushed jobs are always pushed after the execution of each job.
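The buffering behaviour described in this section can be modelled with a short standalone sketch. It is a toy, not the real JobQueueGroup: ToyJobQueueGroup is a hypothetical class, jobs are plain arrays, and the public $stored array stands in for the persistent backend. Only the method names lazyPush(), push() and pushLazyJobs() mirror the ones mentioned above.

```php
<?php
// Toy model only: this is NOT MediaWiki's JobQueueGroup.
final class ToyJobQueueGroup {
	/** @var array[] Jobs accepted by lazyPush() but not yet stored. */
	private array $buffer = [];

	/** @var array[] Stand-in for the persistent backend (job table, Redis, ...). */
	public array $stored = [];

	/** Store a job immediately. */
	public function push( array $job ): void {
		$this->stored[] = $job;
	}

	/** Remember a job in memory; it is only stored once pushLazyJobs() runs. */
	public function lazyPush( array $job ): void {
		$this->buffer[] = $job;
	}

	/** Flush the in-memory buffer to the backend in one go. */
	public function pushLazyJobs(): void {
		foreach ( $this->buffer as $job ) {
			$this->push( $job );
		}
		$this->buffer = [];
	}

	/** Warn if buffered jobs were never flushed, since they are now lost. */
	public function __destruct() {
		$n = count( $this->buffer );
		if ( $n > 0 ) {
			trigger_error( __METHOD__ . ": $n buffered job(s) never inserted", E_USER_NOTICE );
		}
	}
}

$group = new ToyJobQueueGroup();
$group->lazyPush( [ 'type' => 'someExpensiveOperation', 'params' => [ 'foo' => 1 ] ] );
$group->lazyPush( [ 'type' => 'someExpensiveOperation', 'params' => [ 'foo' => 2 ] ] );

$storedBeforeFlush = count( $group->stored );
$group->pushLazyJobs(); // normally triggered at the end of the request or job
$storedAfterFlush = count( $group->stored );
```

Nothing reaches the backend until pushLazyJobs() runs, which is why a job that lazy-pushes another job after the final flush would lose it. In this toy, skipping the pushLazyJobs() call makes the destructor raise a notice similar to the one quoted above.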

Execution of jobs

Jobs are ordinarily executed at the end of a web request, at the rate of $wgJobRunRate per request. If $wgJobRunRate == 0, no jobs are run at the end of a web request. The default value of $wgJobRunRate is 1.

All enqueued jobs can be executed at any time by running maintenance/runJobs.php. This is particularly important when $wgJobRunRate == 0.
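As a configuration sketch, a wiki that prefers to keep job execution out of web requests could put the following in LocalSettings.php and run maintenance/runJobs.php separately. How often and by what means the script is run (for example from a scheduler such as cron) is an assumption left to the site administrator.

```php
<?php
// LocalSettings.php fragment (sketch).
// Do not run any jobs at the end of web requests.
// Queued jobs then only execute when maintenance/runJobs.php is run.
$wgJobRunRate = 0;
```

With this setting, jobs accumulate in the queue until the maintenance script processes them, so the script needs to run regularly.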

The jobs are run by the JobRunner class. Each job is given its own database transaction.

At the end of the job execution, deferrable updates are executed. Since MediaWiki 1.28.3/1.29, lazily-pushed jobs are pushed through a deferrable update in order to use a dedicated database transaction (with AutoCommitUpdate).