Talk:Requests for comment/Job queue redesign

If this redesign is carried out, please add a note saying that this is the new design as of version 1.xxx. Thanks.

Sumana Harihareswara, Wikimedia Foundation Volunteer Development Coordinator 20:49, 1 February 2012 (UTC)

Idea dump: HTTP-based lightweight jobs, async job runner

 * jobs can have a handful of priorities
 * each job has an insertion timestamp
 * the default job is just a method (GET/POST), a URL and a timestamp, which:
   * is deduplicated
   * is run directly by a job runner
   * uses only minimal (HTTP client) resources on the job runner
   * uses normal HTTP status codes for error handling

The low overhead might make it feasible to do full URL-based deduplication. HTTP requests are fairly generic and can automatically be run in the context of a given wiki. Jobs would basically be executed as GET or POST requests to some API endpoints, which are assumed to be idempotent. Semantically POST might be better, but POST data in the job queue would need to be limited in size.
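As a rough illustration of what URL-based deduplication could look like, here is a minimal sketch: a job is reduced to its (method, URL) pair and collapsed against any earlier job with the same pair, keeping the newest insertion timestamp. The `Job` record and `deduplicate` function are hypothetical names for this demo, not anything in MediaWiki.

```python
from dataclasses import dataclass

# Hypothetical job record, per the list above: just a method, a URL
# and an insertion timestamp (names are illustrative only).
@dataclass(frozen=True)
class Job:
    method: str       # "GET" or "POST"
    url: str
    timestamp: float  # insertion time

def deduplicate(jobs):
    """Full URL-based deduplication: collapse jobs sharing the same
    (method, URL) pair, keeping only the most recent insertion."""
    latest = {}
    for job in jobs:
        key = (job.method, job.url)
        if key not in latest or job.timestamp > latest[key].timestamp:
            latest[key] = job
    return list(latest.values())
```

Because the whole job is just a method and a URL, the deduplication key is the job itself, which is what keeps the overhead low enough for this to be feasible at all.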

For HTTP, the job runner would limit the number of concurrent requests per host to something like 50-100. SPDY would reduce the per-connection overhead.
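A per-host concurrency cap like the one described could be sketched with one semaphore per hostname in an async runner. Everything here is an assumption for the demo: the `Runner` class, the limit of 50, and the stand-in `asyncio.sleep(0)` where a real runner would make the actual HTTP request and inspect the status code.

```python
import asyncio
from collections import defaultdict
from urllib.parse import urlsplit

MAX_PER_HOST = 50  # assumed cap, per the 50-100 range above

class Runner:
    """Illustrative async job runner that throttles per target host."""

    def __init__(self, limit=MAX_PER_HOST):
        self.semaphores = defaultdict(lambda: asyncio.Semaphore(limit))
        self.active = defaultdict(int)  # in-flight requests per host
        self.peak = defaultdict(int)    # highest concurrency observed

    async def run_job(self, method, url):
        host = urlsplit(url).hostname
        async with self.semaphores[host]:  # blocks past the per-host cap
            self.active[host] += 1
            self.peak[host] = max(self.peak[host], self.active[host])
            try:
                # Stand-in for the real HTTP request; the returned status
                # code would drive success/retry handling.
                await asyncio.sleep(0)
                return 200
            finally:
                self.active[host] -= 1

async def demo():
    runner = Runner(limit=5)  # small limit so the cap is visible
    urls = ["http://en.example.org/api?action=purge&page=%d" % i
            for i in range(20)]
    statuses = await asyncio.gather(
        *(runner.run_job("GET", u) for u in urls))
    return runner, statuses
```

Since each job is only an idle HTTP client awaiting a response, a single runner process can hold many such slots open cheaply, which is the point of the lightweight design.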
 * How does "url-based" deduplication work? You mean include a hash in the URL? Why not have it in the payload? Aaron (talk) 00:09, 27 November 2013 (UTC)