Toolserver:Job scheduling


 * For the newtask command on Solaris, see batch project

Job scheduling is the primary method by which tools should be started on the Toolserver. Jobs (i.e., tools) are submitted to the scheduler, which then starts the job on an appropriate host, based on factors like current load. Using batch scheduling means you don't need to worry about where to start a job, or whether the job should be started during off-peak hours, etc. Job scheduling can be used for any sort of tool, whether it's a one-off job, a tool like a bot which needs to run permanently, or a regular job run from cron.

While it's possible to run jobs on a server directly, without using job scheduling, this is strongly discouraged, since it makes it harder for the Toolserver administrators to manage server resources and load.

Job scheduling works using queues. When jobs are submitted, they are placed in a queue. When there are sufficient free system resources to execute a job, it is removed from the queue and starts running. If the system is busy, there might be no free resources, and jobs will be queued until more resources become available. At present, it's very unlikely that jobs will be queued in this way, since we have plenty of free resources.

Submitting jobs
To submit a job, use the qsub command:

$ qsub $HOME/mytool.py
Your job 80570 ("mytool.py") has been submitted

The job ID is 80570, and the job name is "mytool.py". The scheduler will place the job in the default queue, and eventually run it on a suitable host. Once the job has finished, it will be removed from the system.

You can use qstat to see the job running:

willow% qstat
job-ID  prior    name        user    state  submit/start at      queue            slots  ja-task-ID
----------------------------------------------------------------------------------------------------
 80570  0.56000  mytool.py   rriver  r      11/17/2010 08:16:10  all.q@wolfsbane  1

If the job produced output, this will be saved in $HOME/mytool.py.o80570 for normal output (stdout), and in $HOME/mytool.py.e80570 for errors (stderr). Having two separate files with effectively random names is not always very helpful, so you can force the output to go to a single file when submitting the job:

$ qsub -j y -o $HOME/mytool.out $HOME/mytool.py

The -j y argument forces all output to go to a single file, and -o specifies the location of that file.

Rather than specifying arguments to qsub every time the job is run, you can instead put them in the script itself, using special directives starting with <tt>#$</tt>:

#! /usr/bin/python
#$ -j y
#$ -o $HOME/mytool.out
... rest of script ...

If you want to receive mail when a job finishes, use <tt>-m e</tt>. To receive mail when a job starts and when it finishes, use <tt>-m be</tt>.

By default, jobs are limited to 6 hours of runtime. If a job runs for longer than this, it will be killed. This is done to prevent runaway jobs from accidentally using a large amount of system resources. If you expect your job to need longer than 6 hours to complete, you can request more time using <tt>-l</tt>:

$ qsub -l h_rt=24:00:00 slowjob.py  # allow up to 24 hours runtime

You cannot request more than 120 hours (5 days).

Since jobs can be started on any host, it's possible that they will be started on either a Linux or Solaris server. If your tool can only run on Solaris, you can request that it only be started on a Solaris host:

$ qsub -l arch=sol-amd64 soljob.py

You can also request a Linux host using <tt>-l arch=lx24-amd64</tt>, but we will be converting the last Linux host to Solaris in January 2011, so this is not recommended.
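If your tool always needs the same options, these resource requests can be combined with the output and mail options above as in-script directives. A minimal sketch, assuming a hypothetical slowtool script (the name and output path are illustrative only):

#! /bin/sh
#$ -j y
#$ -o $HOME/slowtool.out
#$ -m e
#$ -l h_rt=24:00:00
#$ -l arch=sol-amd64
... rest of script ...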

Submitting jobs from cron
While it's sometimes useful to run a single job from the command line, most tools need to run regularly, using cron. To make it easier to run tools from cron, we provide a script called <tt>cronsub</tt>, which should be used like this:

$ cronsub <jobname> <command ...>

For example, if you wanted <tt>mytool.py</tt> to run at 0300h UTC every day, you could use an entry like this in your crontab:

0 3 * * * cronsub mytool $HOME/mytool.py

Among other things, <tt>cronsub</tt> will prevent a job from running if a job of the same name already exists. This means that if your job is queued, or takes longer to run than expected, a second duplicate job won't be started.

NB: on Linux (<tt>nightshade</tt>), you need to use <tt>/usr/local/bin/cronsub</tt> instead.
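So on <tt>nightshade</tt>, the crontab entry from the example above would look like this (the same hypothetical mytool job, just with the full path to <tt>cronsub</tt>):

0 3 * * * /usr/local/bin/cronsub mytool $HOME/mytool.py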

Submitting long-running jobs
Some tools, like bots, are meant to run continuously, and restart if they exit. These tools are not suitable for running in the default queue (<tt>all.q</tt>); instead, we provide a separate queue called <tt>longrun</tt>. To start a job in the <tt>longrun</tt> queue:

$ qsub -q longrun $HOME/longtool.py

However, a better way to start such tools is using <tt>cronsub</tt>. Since <tt>cronsub</tt> won't start duplicate jobs, you can try to start your long-running tools regularly (for example, every 10 minutes); if the job is already running, nothing will happen, but if it has exited for some reason, it will be restarted. An example of using <tt>cronsub</tt> this way might be:

0,10,20,30,40,50 * * * * cronsub -l longtool $HOME/longtool.py

This will run <tt>cronsub</tt> every 10 minutes. The <tt>-l</tt> argument instructs <tt>cronsub</tt> to start the job in the <tt>longrun</tt> queue.

Scheduling SQL queries
When writing batch jobs that perform SQL queries, the most important resource is often available SQL capacity rather than CPU or memory. In this case, it is possible to specify that your job needs to run an SQL query on one or more clusters:

#! /bin/sh
#$ -N sqltest
#$ -l sqlprocs-s1=1
mysql -h sql-s1 -BNe 'select count(*) from revision' enwiki_p

The line <tt>#$ -l sqlprocs-s1=1</tt> indicates that this script needs 1 execution slot on the sql-s1 cluster. If free slots are available, the job will run immediately; otherwise, it will wait for a slot to become available. You can also configure this on the <tt>qsub</tt> command line:

% qsub -l sqlprocs-s1=1 sql.sh

Currently, 10 SQL slots are configured for each server, and each query running for longer than 60 seconds counts as using a slot. Replication lag is currently not taken into account, but this will probably change soon.
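If a job queries more than one cluster, it can request a slot on each of them. The sketch below assumes a matching resource (e.g. <tt>sqlprocs-s3</tt>) is defined for every cluster you use; the job name, the second database and the queries are illustrative only:

#! /bin/sh
#$ -N crosswiki-sql
#$ -l sqlprocs-s1=1
#$ -l sqlprocs-s3=1
mysql -h sql-s1 -BNe 'select count(*) from revision' enwiki_p
mysql -h sql-s3 -BNe 'select count(*) from page' somewiki_p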

Note: For long-running jobs (as opposed to jobs which run once then exit), do not reserve any SQL slots; since the program runs continuously, it will take the slots forever and prevent other jobs from running.

Allowing jobs to be automatically restarted or migrated
By default, when a cluster node crashes or reboots, all jobs on it are terminated and will not be restarted, because it's not always safe to restart a job that was previously running. If you would like your job to be restarted when this happens, you can start it as a restartable job using <tt>-r y</tt>. There is no need to do this for jobs in the <tt>longrun</tt> queue, since jobs in that queue are restartable by default.
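For example, to submit the tool from earlier as a restartable job:

$ qsub -r y $HOME/mytool.py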

Migration allows jobs to be moved between nodes while they're running, which improves load distribution and results in better performance. Migration relies on checkpointing -- the ability of a job to save its state and resume when restarted.

We do not provide any automatic checkpointing system; if you wish your job to be migrated, you need to implement this yourself. Examples of jobs that are suitable for migration include:
 * Jobs which work by removing work items from a queue and processing them; when migrated, the job just starts from the top of the queue
 * Jobs which are event-based and wait for work to do, e.g. most IRC bots or recentchanges bots
 * Jobs which regularly save their working state and can resume from the saved state if they are restarted (as sketched below)

Most jobs in the <tt>longrun</tt> queue are probably suitable for migration, but it is not enabled by default. To mark a job as a checkpointing (migratable) job, start it with the <tt>-ckpt default</tt> argument.
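As a very rough sketch of the last pattern, the script below records how far it has got in a state file, so a restarted or migrated copy resumes where it left off instead of starting over. It assumes bash is available on the execution host; the job name, work list and state file are hypothetical:

#! /bin/bash
#$ -N ckpt-demo
#$ -ckpt default

# Hypothetical file names, for illustration only.
WORKLIST=$HOME/worklist.txt      # one work item per line
STATE=$HOME/ckpt-demo.state      # number of the last item processed

# Resume from the saved position, or start from the beginning.
DONE=0
[ -f "$STATE" ] && DONE=$(cat "$STATE")

TOTAL=$(wc -l < "$WORKLIST")
ITEM=$((DONE + 1))
while [ "$ITEM" -le "$TOTAL" ]; do
    LINE=$(sed -n "${ITEM}p" "$WORKLIST")
    # ... process "$LINE" here ...
    echo "$ITEM" > "$STATE"      # save progress after every item
    ITEM=$((ITEM + 1))
done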

Binaries
Jobs are assumed to be textual scripts of some sort, e.g. shell scripts, Python, Perl, etc. However, it is also possible to submit a binary executable as a job:

$ qsub -b y $HOME/mybinary

If you do this, however, you cannot specify arguments to <tt>qsub</tt> inside the executable. You might find it easier to create a shell script that wraps the executable.
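A minimal wrapper might look like this (the wrapper name, job name and binary path are illustrative, and the directives are just the ones discussed above):

#! /bin/sh
#$ -N mybinary
#$ -j y
#$ -o $HOME/mybinary.out
# Hand control over to the real (binary) tool.
exec $HOME/mybinary "$@"

The wrapper can then be submitted like any other script, e.g. $ qsub $HOME/mybinary-wrapper.sh.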

Managing jobs
To list all your running jobs, use <tt>qstat</tt>:

job-ID  prior    name        user    state  submit/start at      queue            slots  ja-task-ID
----------------------------------------------------------------------------------------------------
 80576  0.56000  mytool.py   rriver  r      11/17/2010 08:16:10  all.q@wolfsbane  1

This indicates that the job is running (r) in <tt>all.q</tt> (the default queue) on <tt>wolfsbane</tt>.

Deleting jobs
To delete jobs, use the <tt>qdel</tt> command:

% qdel <job-id>

If the job is currently running (rather than queued), this will terminate it.

Suspending and unsuspending jobs
Suspending a job allows it to be temporarily paused, and then resumed later. To suspend a job:

% qmod -sj <job-id>

The job will be paused by sending it SIGSTOP, and will have the 's' state in <tt>qstat</tt>.

To unsuspend the job and let it continue running:

% qmod -usj <job-id>

Advanced features
Sun Grid Engine has several more advanced features, such as array jobs (automatically submitting the same job many times with different arguments), and job dependencies (specifying that a job cannot run until a different job has completed). For more information on these, see the documentation.
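Both of these are standard Sun Grid Engine options rather than anything Toolserver-specific. As a rough illustration (the script names and job ID are hypothetical):

$ qsub -t 1-10 $HOME/mytool.py          # array job: 10 tasks, each sees its own $SGE_TASK_ID
$ qsub -hold_jid 80570 $HOME/report.py  # dependency: wait until job 80570 has completed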