Talk:Requests for comment/Job queue redesign

If this redesign is carried out, please add a note saying that this is the new design as of version 1.xxx. Thanks.

Sumana Harihareswara, Wikimedia Foundation Volunteer Development Coordinator 20:49, 1 February 2012 (UTC)

Idea dump: HTTP-based lightweight jobs, async job runner

 * jobs can have a handful of priorities
 * each job has an insertion timestamp
 * the default job is just a method (GET/POST), a URL and a timestamp, as sketched below; it
   * is deduplicated
   * is run directly by a job runner
   * uses only minimal (HTTP client) resources on the job runner
 * uses normal HTTP status codes for error handling
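
A minimal sketch of such a job record, in Python (class and field names are illustrative, not existing MediaWiki code):

 from dataclasses import dataclass

 @dataclass(frozen=True)
 class HttpJob:
     method: str        # "GET" or "POST"
     url: str           # endpoint to request; fully identifies the work
     timestamp: float   # insertion time (Unix seconds)
     priority: int = 0  # one of a handful of priority levels

     def dedup_key(self) -> tuple:
         # Jobs sharing a method and URL describe the same work,
         # so only the newest of them needs to run.
         return (self.method, self.url)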

The low overhead might make it feasible to do full URL-based deduplication. HTTP requests are fairly generic, and a request automatically runs in the context of the wiki its URL points at. Jobs would basically be executed as GET or POST requests to API endpoints, which are assumed to be idempotent. Semantically, POST might be the better fit, but POST data stored in the job queue would need to be limited in size.
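
What full URL-based deduplication could look like on the queue side, as an in-memory toy (a real queue would keep this state in Redis or a database; names are illustrative):

 class DedupQueue:
     def __init__(self):
         self._jobs = {}  # (method, url) -> newest insertion timestamp

     def push(self, method: str, url: str, timestamp: float) -> None:
         # Collapse duplicates: keep only the newest job per method+URL.
         key = (method, url)
         if self._jobs.get(key, 0.0) < timestamp:
             self._jobs[key] = timestamp

     def drain(self) -> list:
         # Return pending jobs oldest-first and empty the queue.
         batch = sorted(self._jobs.items(), key=lambda item: item[1])
         self._jobs.clear()
         return batch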

For HTTP, the job runner would limit the number of concurrent requests per host to something like 50-100. SPDY would reduce the connection overhead by multiplexing many requests over a single connection.
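
A sketch of that per-host limit using asyncio semaphores, assuming aiohttp as the HTTP client (function names are illustrative):

 import asyncio
 from urllib.parse import urlparse

 import aiohttp

 PER_HOST_LIMIT = 50  # cap concurrent requests per host at 50-100

 _host_semaphores = {}  # host -> asyncio.Semaphore

 def _semaphore_for(url: str) -> asyncio.Semaphore:
     host = urlparse(url).netloc
     if host not in _host_semaphores:
         _host_semaphores[host] = asyncio.Semaphore(PER_HOST_LIMIT)
     return _host_semaphores[host]

 async def run_job(session: aiohttp.ClientSession,
                   method: str, url: str) -> bool:
     # Normal HTTP status codes drive error handling:
     # 2xx means done, anything else means retry or give up.
     async with _semaphore_for(url):
         async with session.request(method, url) as response:
             return 200 <= response.status < 300

(aiohttp can also cap connections per host directly via aiohttp.TCPConnector(limit_per_host=...), which covers the connection side without semaphores.)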
 * How does "URL-based" deduplication work? You mean include a hash in the URL? Why not have it in the payload? Aaron (talk) 00:09, 27 November 2013 (UTC)
 * On further reflection, it might actually be more efficient to make HTTP requests idempotent and cheap wherever possible at the endpoints. That way jobs would still be executed, but would usually not result in any serious duplicate work. Keeping that state at the endpoint frees the job queue from needing to track it, and also catches duplicate requests from other sources. -- Gabriel Wicke (GWicke) (talk) 22:16, 7 January 2014 (UTC)
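
A toy endpoint handler illustrating that idea: the "already done" state lives at the endpoint, so duplicate requests stay cheap no matter where they come from (hypothetical names, not existing MediaWiki code):

 _last_rendered = {}  # page title -> revision timestamp already processed

 def render_page(title: str) -> None:
     ...  # the expensive work (placeholder)

 def handle_rerender(title: str, revision_time: float) -> int:
     # Idempotent and cheap: if this revision (or a newer one) was
     # already rendered, answer 200 without repeating the work.
     if _last_rendered.get(title, 0.0) >= revision_time:
         return 200
     render_page(title)
     _last_rendered[title] = revision_time
     return 200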