Toolserver:Job scheduling

We are currently testing a new way to run tools on the Toolserver: batch job scheduling. Batch jobs will be familiar to many people who have used university, research or mainframe systems. The main difference between interactive jobs (which we currently use) and batch jobs is that when you run an interactive job, it starts immediately, runs to completion, then exits. A batch job is submitted to the job server; when sufficient system resources are available, the job server starts the job on a suitable (idle) server. The job might be suspended during execution if load is too high, and will resume when resources are available again. After submitting a batch job, you can log out and come back later, when the job has finished, to examine its output. If you like, you can ask to receive mail when the job starts or finishes.

Batch jobs are primarily suited to regular scheduled jobs (e.g. tools which run from cron), and tools which are run occasionally and are not time-critical. While a batch job will normally be scheduled for execution immediately, if no system resources are free, it could be delayed. Long-running tasks (which are meant to run continuously for days or weeks) are not suitable for running as batch jobs. For users, the main advantage of batch jobs is that you do not need to worry about where to start a job, or whether the job needs to use a particular nice level, and so on; the job server will handle that for you. As we add more job execution servers to the cluster, your jobs will automatically take advantage of the new resources with no changes needed from you.

For admins, batch jobs give us tighter control over resource allocation and usage, and allow us to see more clearly how the Toolserver is being used.

The batch job software we have chosen is Sun Grid Engine. Full documentation for users is available here; some common examples are described below.

Queues and jobs
The basic method by which jobs are scheduled is the queue. A queue is a list of jobs which have been submitted. Each queue has a certain number of execution slots; when free slots are available, jobs are moved into them and begin running. When no more slots are available, jobs are queued, and will start when an execution slot is available.

Submitting a job
To submit a job, use the qsub command:

% qsub -N test test.sh

This submits the shell script "test.sh" as a job with the name "test". Giving the job a name is not required, but is recommended so you can easily identify the job. If you want to run a binary instead of a shell script, use qsub -b y.
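As an illustration, a minimal test.sh could look like this (the script contents here are an assumption; any shell script will do):

```shell
#! /bin/sh
# A minimal batch job: report where and when it ran.
host=$(hostname)
echo "Job ran on $host at $(date)"
```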

If you want to receive mail when a job completes, use the -m e option:

% qsub -N test -m e test.sh

To receive mail when a job starts and when it finishes, use -m be.

When your job is finished, its output (stdout) will be written to the file test.oX, where test is the job name and X is the job ID. The job's standard error is written to test.eX. You can override this using the -o and -e options to qsub.
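For example, to collect output and errors in files of your choosing (the paths here are just an illustration):

```
% qsub -N test -o $HOME/test.out -e $HOME/test.err test.sh
```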

If you want the job to be scheduled immediately, rather than being queued, use the -now y option. If the job cannot be scheduled immediately (because there are no available resources), it will fail.

Displaying jobs
To display your jobs, use the qstat command:

% qstat
job-ID prior   name       user         state submit/start at     queue                          slots ja-task-ID
----------------------------------------------------------------------------------------------------------------
     8 0.55500 test       rriver       r     09/15/2009 16:15:51 all.q@willow.toolserver.org        1

Here, the job test is running (state r), in the all.q queue (the default) on willow.

Interactive jobs
An interactive job is a special kind of job which, instead of running a command, requests a shell on an idle system. To start an interactive job, use qlogin:

% qlogin
Your job 11 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 11 has been successfully scheduled.
Establishing builtin session to host willow ...
Sun Microsystems Inc.  SunOS 5.10      Generic January 2005
%

To exit the interactive session, exit the shell (e.g. by typing CTRL+D).

Special queues
The default queue, all.q, includes all login servers in the cluster. If your job can only run on a particular type of host, you can request that it be executed only on hosts with a particular operating system.

To run a job on Solaris only:

% qsub -l arch=sol-amd64 -N test test.sh

To run a job on Linux only:

% qsub -l arch=lx24-amd64 -N test test.sh

You can also specify that a job runs on a particular server:

% qsub -l hostname=willow -N test test.sh

We strongly recommend that you write jobs which can run on either kind of server, and use the default queue. This will provide the most flexibility when scheduling your job. (In particular, we are unlikely to add more Linux servers, so writing jobs which also run on Solaris will increase the resources available to your jobs.)

Embedding options in the script
Instead of specifying options on the qsub command line, it is possible to embed them in the script, using comment lines starting with #$. For example:

#! /bin/sh
# Name the job "testing".
#$ -N testing
# Send email when job finishes.
#$ -m e
# Store output in a different place.
#$ -o /home/jsmith/testing.out
# Send errors to the normal output file instead of a separate error file.
#$ -j y
<rest of script...>
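A script with embedded options can then be submitted with a plain qsub, with no extra options on the command line:

```
% qsub testing.sh
```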

Scheduling SQL queries
When writing batch jobs that perform SQL queries, the most important resource is often available SQL capacity rather than CPU or memory. In this case, it is possible to specify that your job needs to run an SQL query on one or more clusters:

#! /bin/sh
#$ -N sqltest
#$ -l sqlprocs-s1=1
mysql -h sql-s1 -BNe 'select count(*) from revision' enwiki_p

The line #$ -l sqlprocs-s1=1 indicates that this script needs 1 execution slot on the sql-s1 cluster. If free slots are available, the job will run immediately; otherwise, it will wait for a slot to become available. You can also configure this on the qsub command line:

% qsub -l sqlprocs-s1=1 sql.sh

Currently, 10 SQL slots are configured for each server, and each query running for longer than 60 seconds counts as using a slot. Replication lag is currently not taken into account, but this will probably change soon.

Running jobs from cron
It is possible to invoke qsub from cron in order to schedule jobs regularly; this is preferable to simply running the job directly from cron, as it will handle resource allocation and execute the job on an idle host.

However, if the job runs for a long time, or spends a long time in the queue, you need to avoid scheduling the job multiple times. (For example, if you run a job every 10 minutes, and it is queued for 15 minutes, another job will be scheduled before the first job has run.) Additionally, qsub needs some environment variables, such as $SGE_ROOT, which are not set in cron by default.

To avoid both these problems, we provide a script called cronsub, which you can run from cron like this:

0 3 * * * cronsub myjob $HOME/myjob.sh

On Linux, you need to specify the full path to cronsub:

0 3 * * * /usr/local/bin/cronsub myjob $HOME/myjob.sh

This will create a job called myjob which executes $HOME/myjob.sh; however, if a job with that name already exists, it will do nothing.
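The duplicate-name check can be sketched roughly as follows (this is an illustration of the idea only, not cronsub's actual implementation; it assumes qstat -j exits non-zero when no job matches the given name):

```sh
#! /bin/sh
# Rough sketch of a cronsub-style guard -- illustrative, not the real cronsub.
name="$1"; shift

# qstat -j prints details of a matching job and fails when none exists
# (assumed behaviour; check your Grid Engine version).
if qstat -j "$name" >/dev/null 2>&1; then
    exit 0      # a job with this name already exists: do nothing
fi

exec qsub -N "$name" "$@"
```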

As an alternative to providing a script file, you can embed the script file in crontab, using %:

0 3 * * * /usr/local/bin/cronsub myjob % $HOME/dosomething % echo "Done!"

Each % is treated as a newline; lines after the first (the cronsub command) are sent as input to qsub, which treats them as a script to execute.

cronsub does not allow you to pass any additional options to qsub, but you can specify these in the script itself using the #$ syntax described above.

Advanced features
Sun Grid Engine has several more advanced features, such as array jobs (automatically submitting the same job many times with different arguments), and job dependencies (specifying that a job cannot run until a different job has completed). For more information on these, see the documentation.