Toolserver:Jobserver


This page was moved from the Toolserver wiki.
Toolserver has been replaced by Toolforge. As such, the instructions here may no longer work, but may still be of historical interest.
Please help by updating examples, links, template links, etc. If a page is still relevant, move it to a normal title and leave a redirect.

The jobserver is a centralised system that allows long-running jobs (i.e. tools) to be started easily and restarted automatically after a reboot or a crash. It also allows jobs to be scheduled for regular execution, replacing 'cron' and 'at'.

The jobserver only runs on Solaris, so it's only available on willow (not nightshade).

Veryquickstart

For a proper introduction to the jobserver, please read the job_intro(1) manual page.

% job add $HOME/test.sh
New job FMRI is job:/rriver/test_sh.
% job list
STATE      RSTATE    FMRI
disabled   stopped   job:/rriver/test_sh
% job show test_sh
job:/rriver/test_sh:
       state: disabled
      rstate: stopped
start method: /home/rriver/test.sh
 stop method: 
log rotation: size 1048576, keep 5
     on exit: disable,mail
     on fail: disable,mail
    on crash: disable,mail
     project: default
    schedule: -
      limits: -
% job enable test_sh
% job list
STATE     RSTATE    FMRI
enabled   running   job:/rriver/test_sh
% job disable test_sh
% job sched test_sh 'every monday at 03:00'
% job enable test_sh
% job list          
STATE               RSTATE    FMRI
scheduled/enabled   stopped   job:/rriver/test_sh
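
The STATE column can also be read from a script. The following is a minimal sketch, assuming that `job list` keeps the three-column layout shown above and that a POSIX awk is available (nawk on Solaris). The function name `job_state` is hypothetical and not part of the jobserver.

```shell
# job_state FMRI
# Read `job list` output on stdin and print the STATE column for FMRI.
# Assumes the STATE / RSTATE / FMRI layout from the transcript above.
# job_state is a hypothetical helper name, not a jobserver command.
job_state() {
    # NR > 1 skips the header line; $3 is the FMRI column.
    awk -v fmri="$1" 'NR > 1 && $3 == fmri { print $1 }'
}
```

It would be used as `job list | job_state job:/rriver/test_sh`, which prints nothing if the job is not listed.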

TODO / feature requests

Add more features here, if you want.

  • ACLs (for MMTs) (github #11)
  • Distributed Jobserver: start jobs across an array of machines
  • A way to limit the max wall clock time of a scheduled job (github #12)
  • A way to see the upcoming jobs in a specified time period (list all jobs set to run in the next day or next week) (github #8)
  • A way to replicate the */N system in cron (e.g. to schedule a job to run every 5 minutes)
  • A way to run jobs monthly (some monthly statistics scripts should run on the first day of the month, if possible)
  • (minor) The help for 'job add' indicates that -n is a valid option, but it isn't
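
Until something like cron's */N exists, one possible workaround for the "every 5 minutes" case is a long-running job that loops and sleeps. This is only a sketch under assumptions: `run_every` is a hypothetical helper, not a jobserver feature, it needs a POSIX-compatible shell (bash or ksh rather than the legacy Solaris /bin/sh), and unlike cron it waits after each run finishes instead of firing on fixed clock minutes.

```shell
# run_every SECONDS MAX_RUNS COMMAND [ARGS...]
# Run COMMAND repeatedly, sleeping SECONDS after each run.
# MAX_RUNS=0 means loop forever. A failing COMMAND does not stop the loop.
# Hypothetical helper for illustration; not part of the jobserver.
run_every() {
    interval=$1
    max_runs=$2
    shift 2
    runs=0
    while [ "$max_runs" -eq 0 ] || [ "$runs" -lt "$max_runs" ]; do
        "$@"
        runs=$((runs + 1))
        sleep "$interval"
    done
}
```

A wrapper script that defines this function and ends with a call such as `run_every 300 0 "$HOME/stats.sh"` (the script name is made up) could then be registered with job add and enabled like any other long-running job.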

Bugs

Report any bugs here.

bug when setting the 'crash' exit action

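When trying to modify the 'crash' exit action with job set, the 'exit' property is modified instead.

Update: the same thing seems to happen when trying to set the 'fail' exit action. In the transcript below, setting crash=disable changes 'on exit' from restart to disable and leaves 'on crash' untouched.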
stanlekub@willow:~$ job show job:/stanlekub/adqtable
job:/stanlekub/adqtable:
       state: scheduled/enabled
      rstate: stopped
start method: /home/stanlekub/adqtable.sh
 stop method:
    schedule: every day at 06:45
              (in 15h50m)
     project: batch
  log format: %h/.job/%f.log
log rotation: size 1048576, keep 5
     on exit: restart
     on fail: restart,mail
    on crash: disable,mail
      limits: -
stanlekub@willow:~$ job set job:/stanlekub/adqtable crash=disable
stanlekub@willow:~$ job show job:/stanlekub/adqtable
job:/stanlekub/adqtable:
       state: scheduled/enabled
      rstate: stopped
start method: /home/stanlekub/adqtable.sh
 stop method:
    schedule: every day at 06:45
              (in 15h50m)
     project: batch
  log format: %h/.job/%f.log
log rotation: size 1048576, keep 5
     on exit: disable
     on fail: restart,mail
    on crash: disable,mail
      limits: -
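
To check whether an installation is affected, the action lines from `job show` can be captured before and after the `job set` call and compared. A minimal sketch, assuming the `on exit:` / `on fail:` / `on crash:` layout shown above and a POSIX awk (nawk on Solaris); `exit_actions` is a hypothetical helper name, not a jobserver command.

```shell
# exit_actions
# Read `job show` output on stdin and print the three action properties
# as "name=value" lines, e.g. "exit=restart".
# Hypothetical helper; relies on the "on NAME: VALUE" layout shown above.
exit_actions() {
    awk '$1 == "on" && ($2 == "exit:" || $2 == "fail:" || $2 == "crash:") {
        name = $2
        sub(/:$/, "", name)      # drop the trailing colon from the label
        print name "=" $3        # $3 is the action list, e.g. disable,mail
    }'
}
```

Saving the output of `job show job:/stanlekub/adqtable | exit_actions` before and after `job set job:/stanlekub/adqtable crash=disable` and comparing the two files with diff would, for the transcript above, show `exit=restart` becoming `exit=disable` while the `crash=` line stays the same.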
