Toolserver:Jobserver

The jobserver is a centralised system that allows long-running jobs (i.e. tools) to be started easily, and restart on reboot, or if they crash.

The jobserver only runs on Solaris, so it's only available on willow (not nightshade).

Simple usage
Start a new job:

$ job -e add $HOME/myjob.pl New job ID is 3. $ job list ID NAME        USER             STATE      RSTATE     CMD 3 myjob.pl    jsmith           enabled    running    /home/jsmith/myjob.pl

Stop a running job:

$ job disable 3 $ job list ID NAME        USER             STATE      RSTATE     CMD 3 myjob.pl    jsmith           disabled   stopped    /home/jsmith/myjob.pl

Delete a job: $ job delete 3 $ job list No jobs defined.

By default, the job server will restart the job on reboot. If it exits or crashes, it will not be restarted. Currently this behaviour cannot be defined.

A crash is defined as the job, or any process it starts, dumping core or exiting with a crash signal (e.g. SIGSEGV). If the first job process exits, the job server will stop the entire job.

Stopping jobs
By default, the jobserver will stop a job by sending SIGTERM, waiting 30 seconds, then sending SIGKILL to any remaining processes. Most tools will exit sensibly when given SIGTERM, but some might require special handling. For these jobs, you can set a stop method:

$ job set 3 stop=$HOME/stop_tool.pl

The jobserver will execute your stop method instead of sending a signal. It will still send SIGKILL after 30 seconds if the job hasn't exited.

To remove a stop method:

$ job set 3 stop=

Logs
Output (stdout and stderr) from each job is sent to $HOME/.job/job_.log. The job server will automatically rotate the log file when it reaches 1MB in size.

TODO / feature requests
Add more features here, if you want.


 * Allow user to specify action on exit/crash/fail
 * Allow log rotation behaviour to be configured
 * Allow jobs to be scheduled (to replace cron)
 * Distributed Jobserver: start jobs across an array of machines

Bugs
Report any of those here.