Toolserver:Jobserver

The jobserver is a centralised system that makes it easy to start long-running jobs (i.e. tools) and restarts them after a reboot or a crash. It can also schedule jobs for regular execution, replacing 'cron' and 'at'.

The jobserver only runs on Solaris, so it's only available on willow (not nightshade).

Very quick start
For a proper introduction to the jobserver, please read the job_intro(1) manual page.

% job add $HOME/test.sh
New job FMRI is job:/rriver/test_sh.
% job list
STATE     RSTATE    FMRI
disabled  stopped   job:/rriver/test_sh
% job show test_sh
job:/rriver/test_sh:
    state: disabled
    rstate: stopped
    start method: /home/rriver/test.sh
    stop method:
    log rotation: size 1048576, keep 5
    on exit: disable,mail
    on fail: disable,mail
    on crash: disable,mail
    project: default
    schedule: -
    limits: -
% job enable test_sh
% job list
STATE    RSTATE    FMRI
enabled  running   job:/rriver/test_sh
% job disable test_sh
% job sched test_sh 'every monday at 03:00'
% job enable test_sh
% job list
STATE              RSTATE    FMRI
scheduled/enabled  stopped   job:/rriver/test_sh

TODO / feature requests
Add more features here, if you want.


 * ACLs (for MMTs) (github #11)
 * Distributed Jobserver: start jobs across an array of machines
 * A way to limit the max wall clock time of a scheduled job (github #12)
 * A way to see the upcoming jobs in a specified time period (list all jobs set to run in the next day or next week) (github #8)
 * A way to replicate the */N system in cron (e.g. to schedule a job to run every 5 minutes)
 * Run jobs monthly (some monthly statistics scripts should run on the first day of the month, if possible)
 * (minor) The help for 'job add' lists -n as a valid option, but it isn't
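
Until the jobserver offers an equivalent of cron's */N syntax, one possible workaround is to run a single long-running job that loops and sleeps between runs. The sketch below is only an illustration: run_every is a hypothetical helper, not part of the jobserver, and it assumes a POSIX-compatible shell (arithmetic expansion may not be available in an older Bourne shell).

```shell
# Sketch: emulate cron's "*/N" by looping inside one long-running job.
# run_every is a hypothetical helper, not a jobserver feature.
#
# Usage: run_every INTERVAL_SECONDS MAX_RUNS COMMAND [ARGS...]
# A MAX_RUNS of 0 means "loop until the job is stopped".
run_every() {
    interval=$1
    max_runs=$2
    shift 2
    runs=0
    while [ "$max_runs" -eq 0 ] || [ "$runs" -lt "$max_runs" ]; do
        # Run the command; report a failure but keep looping.
        "$@" || echo "command failed with status $?" >&2
        runs=$((runs + 1))
        # Skip the final sleep once the run limit has been reached.
        if [ "$max_runs" -ne 0 ] && [ "$runs" -ge "$max_runs" ]; then
            break
        fi
        sleep "$interval"
    done
}
```

A script that ends with, say, run_every 300 0 followed by the path of the tool to run could then be registered with 'job add' like any other long-running job. Note that the interval is measured from the end of one run to the start of the next, so the runs drift by the command's own running time rather than landing on exact 5-minute marks.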

Bugs
Report any bugs here.

Bug when setting the 'crash' exit action
Trying to modify the 'crash' exit action modifies the 'exit' property instead.
 * Edit: the same thing seems to happen when trying to set the 'fail' exit action.

stanlekub@willow:~$ job show job:/stanlekub/adqtable
job:/stanlekub/adqtable:
    state: scheduled/enabled
    rstate: stopped
    start method: /home/stanlekub/adqtable.sh
    stop method:
    schedule: every day at 06:45 (in 15h50m)
    project: batch
    log format: %h/.job/%f.log
    log rotation: size 1048576, keep 5
    on exit: restart
    on fail: restart,mail
    on crash: disable,mail
    limits: -
stanlekub@willow:~$ job set job:/stanlekub/adqtable crash=disable
stanlekub@willow:~$ job show job:/stanlekub/adqtable
job:/stanlekub/adqtable:
    state: scheduled/enabled
    rstate: stopped
    start method: /home/stanlekub/adqtable.sh
    stop method:
    schedule: every day at 06:45 (in 15h50m)
    project: batch
    log format: %h/.job/%f.log
    log rotation: size 1048576, keep 5
    on exit: disable
    on fail: restart,mail
    on crash: disable,mail
    limits: -
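
To make the changed property easier to spot when reproducing this, the comparison can be narrowed to the three exit-action lines. The following is a rough sketch, not a jobserver feature: exit_actions and check_set are hypothetical helpers, and the sed patterns assume that 'job show' prints each "on exit:", "on fail:" and "on crash:" property on its own line, optionally indented.

```shell
# Sketch: show which exit actions change across a "job set" call.
# exit_actions and check_set are hypothetical helpers.

# Read "job show" output on stdin and print only the exit actions,
# one per line, as exit=..., fail=..., crash=...
# Assumes one property per line, optionally indented.
exit_actions() {
    sed -n \
        -e 's/^[[:space:]]*on exit:[[:space:]]*/exit=/p' \
        -e 's/^[[:space:]]*on fail:[[:space:]]*/fail=/p' \
        -e 's/^[[:space:]]*on crash:[[:space:]]*/crash=/p'
}

# Usage: check_set FMRI PROPERTY=VALUE
# Prints the exit actions before and after the change.
check_set() {
    before=$(job show "$1" | exit_actions)
    job set "$1" "$2"
    after=$(job show "$1" | exit_actions)
    printf 'before:\n%s\nafter:\n%s\n' "$before" "$after"
}
```

For the report above, this would be called as check_set job:/stanlekub/adqtable crash=disable. In the two listings shown in the report, only the "on exit" value differs.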