Facebook Open Academy/Cron

A common requirement in infrastructure maintenance is the ability to execute tasks at scheduled times and intervals. On Unix systems (and, by extension, Linux) this is traditionally handled by a cron daemon. A traditional cron daemon, however, runs on a single server, so it does not scale and is a single point of failure. While there are a few open source alternatives to cron that provide distributed scheduling, they either depend on a specific "cloud" management system or on other complex external dependencies, or are not generally compatible with cron.

Requirements
Wikimedia Labs needs a scheduler that:


 * Is configurable by traditional crontabs;
 * Can run on more than one server, distributing execution between them; and
 * Guarantees that scheduled events execute as long as at least one server is operational.

The ideal distributed cron replacement would have as few external dependencies as possible.

Research
Some interesting avenues of investigation have already been mentioned in the related Bugzilla entry (which see), along with possible alternatives and counterarguments.

What are the current solutions that exist and what lessons can be learned from them?
 * Chronos
 ** Supports dependencies between jobs
 ** Will retry failed jobs
 ** One of multiple nodes is elected master
 ** Has many dependencies, including Apache Mesos and ZooKeeper
 * Cronie
 ** If I understand the man page correctly:
 *** It only allows jobs to be executed on one chosen server at a time
 *** The chosen server must be switched manually if it goes down
 *** Requires a network-mounted share for the directory containing the shared crontabs
 **** (FYI, that requirement is met for Labs' particular use case, but would normally indeed be considered onerous) - Coren (talk)/(enwp) 06:22, 4 February 2014 (UTC)
 * Jenkins
 ** Meant for continuous integration, not job scheduling
 ** Only allows one master (single point of failure)
 * Gearman
 ** Framework for distributing tasks
 ** Has fault tolerance and job retries
 ** A scheduler and a worker application would still have to be written against its APIs
 ** Seems better suited to executing jobs at arbitrary times than to running them on a schedule

Language to use
Set to Python by fiat for expediency
 * Widely distributed, well known by a large development base
 * High availability of libraries
 * Many (most) Linux distributions default to it for system scripts
 * Python 2 or Python 3? GLM

How to store and distribute the schedule between servers

 * Perhaps using a standby-sparing technique with multiple computers acting as hot spares to a single leader. JT

How to decide which server actually runs the command when the time comes, and how to synchronize that information

 * Elect quorum leader? Some other method?
 * Generate a random permutation of servers that gets distributed to each node for any given task. Notification of completion of a task is done linearly in order of that permutation. GLM
 * The quorum leader, or the first server in the queue, runs the command unless it fails to respond to messages sent by the others. In that case the next server in the queue takes leadership and a new hot spare is readied.
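
The permutation idea above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a settled design: it assumes every node knows the same server list and runs the same Python version, and it derives the order from a hash of the job and its scheduled time so that each node can compute it locally (the note above instead suggests distributing the permutation). The names `server_order` and `pick_runner` and the `is_alive` callback are hypothetical.

```python
import hashlib
import random
from typing import Callable, List, Optional


def server_order(servers: List[str], job_id: str, run_time: str) -> List[str]:
    """Return a pseudo-random permutation of servers for one scheduled run.

    The shuffle is seeded from the job and its scheduled time, so every node
    that knows the same server list computes the same order independently.
    """
    digest = hashlib.sha256(f"{job_id}|{run_time}".encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    order = sorted(servers)  # sort first so the input order does not matter
    rng.shuffle(order)
    return order


def pick_runner(order: List[str], is_alive: Callable[[str], bool]) -> Optional[str]:
    """Return the first server in the permutation that responds, if any."""
    for server in order:
        if is_alive(server):
            return server
    return None


if __name__ == "__main__":
    nodes = ["megacron-one", "megacron-two", "megacron-three"]
    order = server_order(nodes, "user1:foo.py", "2014-02-04T06:22")
    # Pretend the first server in the order is down; the second takes over.
    print(order, pick_runner(order, lambda s: s != order[0]))
```

Seeding per job and per run time spreads the work across servers instead of always loading the same one, while still giving each run a single, agreed first choice.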

Are libraries available to solve subproblems?
Need to make a survey of:
 * Python's Distributed Wiki FC
 * dispy JT
 * Pyro JT
 * CronExpression FC
 * croniter FC
 * python-crontab FC
 * Celery JT

How bad is it if a job runs multiple times?

 * Even if a job's changes are idempotent, running it more than once should be avoided. Once one instance of a job finishes, all other instances of that task should abort. JT

How late can a job run?

 * It's going to take some time for the servers to communicate that they've done a job, and if a server is scheduled last and has to wait on timeouts, it can be a while. GLM
 * If a job runs into the next scheduled time for the same job, the previous job should be aborted and the new job should run. If this happens continuously, it will be up to an administrator to increase the interval between runs of that job, or to bound the maximum running time of the job. JT
 * Any troublesome job (one that runs into its own next scheduled time) can have a hash/handle stored (I understand hashing may be slow), and the handles can be dumped to a file along with the number of times each job has collided with itself. This assumes we can detect such self-collisions. FC
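
The last two points could be combined along the following lines. This is a single-node sketch only: it assumes jobs are started as local subprocesses, and detecting collisions across servers would need shared state that is not shown. `job_handle` and `OverlapGuard` are hypothetical names. Hashing three short strings per job is cheap, so the handle should not be a bottleneck.

```python
import hashlib
import json
import subprocess
from collections import Counter
from typing import Dict, List


def job_handle(user_id: str, interval: str, command: str) -> str:
    """Short, stable handle for a job, derived from its crontab fields."""
    key = "\0".join((user_id, interval, command)).encode("utf-8")
    return hashlib.sha1(key).hexdigest()[:12]


class OverlapGuard:
    """Start jobs, aborting a previous run of the same job if it is still going."""

    def __init__(self) -> None:
        self.running: Dict[str, subprocess.Popen] = {}
        self.collisions: Counter = Counter()

    def start(self, handle: str, argv: List[str]) -> subprocess.Popen:
        previous = self.running.get(handle)
        if previous is not None and previous.poll() is None:
            # The last run overlaps the new scheduled time: count it and abort it.
            self.collisions[handle] += 1
            previous.terminate()
            previous.wait()
        proc = subprocess.Popen(argv)
        self.running[handle] = proc
        return proc

    def dump(self, path: str) -> None:
        """Write the per-job self-collision counts to a JSON file."""
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(dict(self.collisions), fh, indent=2, sort_keys=True)
```

An administrator could inspect the dumped counts to decide which jobs need a longer interval or a bounded running time.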

How bad is it if a deleted job still gets run?

 * If a server gets isolated and the crontab is updated, it will keep running all the jobs in the old table for as long as it cannot connect to any of the other servers. GLM
 * If a job is deleted after its start time, it should run (unless "deleted" means "aborted"). Such cases would need to be handled manually by an administrator. JT
 * In the common case of deleting a job from the crontab during the job's idle period, it should not run on non-isolated servers (this definition depends on the technique we use for server selection). Otherwise, there is essentially no delete functionality provided. FC

Known TODO

 * 1) Parser API
 * 2) Parse crontab (using a Python library)
 * 3) Store in the database: time, command, time last attempted, owning user account, and which workers are good
 * 4) Scheduler API
 * 5) Scheduler/worker runs next job on heartbeat
 * 6) Organize worker-to-database "scoreboard" communication
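
For the parsing step, the TODO calls for an existing Python library (croniter and python-crontab are on the survey list above). Purely to illustrate the shape of the output, here is a standard-library sketch that splits user-crontab lines into the five time fields and the command. `Entry` and `parse_crontab` are hypothetical names, and the sketch deliberately ignores environment settings, `@daily`-style shortcuts and `%` handling.

```python
import re
from typing import Iterable, List, NamedTuple

# A minute field contains only digits, "*", "/", "," and "-".
MINUTE_FIELD = re.compile(r"^[\d*/,-]+$")


class Entry(NamedTuple):
    interval: str  # the five time fields, e.g. "*/5 * * * *"
    command: str


def parse_crontab(lines: Iterable[str]) -> List[Entry]:
    """Split user-crontab lines into (interval, command) pairs.

    Blank lines and comments are skipped. Lines whose first field is not a
    plain minute expression (NAME=value settings, @daily-style shortcuts)
    are ignored in this sketch.
    """
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 5)
        if len(parts) < 6 or not MINUTE_FIELD.match(parts[0]):
            continue
        entries.append(Entry(" ".join(parts[:5]), parts[5]))
    return entries
```

Each resulting pair carries the interval and command that item 3 says should be stored, alongside the owning user account.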

API

 * Functions
 ** getJobs [Obtains a list of Jobs that correspond 1:1 to the entries in every user's crontab]
 *** returns list of Jobs
 ** getJobs(userId) [Obtains a list of Jobs that correspond 1:1 to the entries in a single specified user's crontab]
 *** returns list of Jobs
 ** setJobs(Jobs) [Puts a list of Jobs into the datastore]
 ** getSchedules(worker) [Obtains the Schedules that are currently assigned to a worker]
 *** returns list of Schedules
 ** addSchedules(Schedules) [Puts a list of Schedule objects into the datastore]
 ** removeSchedule(schedule) [Removes a Schedule from the datastore]
 ** getHeartbeat(worker) [Obtains the heartbeat timestamp associated with the given worker]
 *** returns timestamp
 ** updateHeartbeat(worker) [Sets the heartbeat timestamp associated with the given worker to the current time, as taken from the datastore]
 ** getWorkers [Obtains a list of all workers]
 *** returns list of workers
 ** createWorker [Adds a new worker to the datastore]
 *** returns worker
 * Objects
 ** Job [A single parsed crontab row representing a single command to be run]
 *** String : interval [Ex: "* * * * *"]
 *** String : command [Ex: "foo.py"]
 *** String : userId
 *** Datetime : lastTimeRun
 ** Schedule [A single element extracted from a Job, in a format that a worker can run]
 *** Worker : worker
 *** Datetime : timeToRun
 *** Job : job
 ** Worker [A machine capable of performing crontab tasks; the Scheduler is a specific instance of one]
 *** Datetime : heartbeat [timestamp of last contact]
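
To make the list above concrete, here is a hedged in-memory sketch of the objects and functions. It is a stand-in for experimenting with the interface, not the actual datastore, which has not been chosen. Several details are assumptions: `MemoryStore` is a hypothetical name, `Worker.name` and the `name` argument to `createWorker` are added so workers can be told apart, `setJobs` is assumed to replace the stored list rather than append to it, and the two `getJobs` variants are merged into one method with an optional `userId`.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class Job:
    interval: str                      # e.g. "* * * * *"
    command: str                       # e.g. "foo.py"
    userId: str
    lastTimeRun: Optional[datetime] = None


@dataclass
class Worker:
    name: str                          # hypothetical identifier, not in the list above
    heartbeat: Optional[datetime] = None


@dataclass
class Schedule:
    worker: Worker
    timeToRun: datetime
    job: Job


class MemoryStore:
    """In-memory stand-in for the datastore; method names follow the list above."""

    def __init__(self) -> None:
        self._jobs: List[Job] = []
        self._schedules: List[Schedule] = []
        self._workers: Dict[str, Worker] = {}

    def getJobs(self, userId: Optional[str] = None) -> List[Job]:
        return [j for j in self._jobs if userId is None or j.userId == userId]

    def setJobs(self, jobs: List[Job]) -> None:
        self._jobs = list(jobs)        # assumption: replaces the stored list

    def getSchedules(self, worker: Worker) -> List[Schedule]:
        return [s for s in self._schedules if s.worker is worker]

    def addSchedules(self, schedules: List[Schedule]) -> None:
        self._schedules.extend(schedules)

    def removeSchedule(self, schedule: Schedule) -> None:
        self._schedules.remove(schedule)

    def getWorkers(self) -> List[Worker]:
        return list(self._workers.values())

    def createWorker(self, name: str) -> Worker:
        worker = self._workers[name] = Worker(name)
        return worker

    def getHeartbeat(self, worker: Worker) -> Optional[datetime]:
        return worker.heartbeat

    def updateHeartbeat(self, worker: Worker) -> None:
        # The store's clock is used so that all workers share one time source.
        worker.heartbeat = datetime.now(timezone.utc)
```

Taking the heartbeat time from the store rather than from each worker avoids comparing timestamps from clocks that may disagree.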

Where can the code be tested in an isolated environment?
With appropriate credentials, you can ssh into @tools-login.wmflabs.org. The command 'become megacron' switches to a shared account.

UPDATE: Instead of ssh'ing to tools-login.wmflabs.org, use the following server:

megacron-one.wmflabs.org

Then connect to: megacron-two.wmflabs.org, megacron-three.wmflabs.org