Offline content generator/Architecture

General Overview
Every render server runs a Node.js instance with several spawned children: a frontend, a client, and a garbage collector. Rendered files are stored locally on the server (they could instead go to a central file store, but local storage is easiest for now). Any render server can respond to a frontend request, redirecting the request to another server if required.
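As a rough sketch of the redirect behaviour described above, a frontend could decide how to answer a download request as follows. Everything here is an assumption made for illustration: the function name `resolveDownload`, the `status` and `host` field names, the 404 for unfinished jobs, and the layout of the redirect URL (including the `job_id` parameter) are not specified in this design.

```javascript
// Hypothetical helper: decide how a frontend answers a download request.
// `entry` is the job's status record; `entry.host` is assumed to name the
// render server that holds the rendered file.
function resolveDownload(entry, localHost, jobId) {
  // Assumption: unknown or unfinished jobs are reported as "not found".
  if (!entry || entry.status !== 'completed') return { status: 404 };
  // This server rendered the file, so it can serve it directly.
  if (entry.host === localHost) return { status: 200, file: jobId };
  // Another server has the file: redirect there. The URL layout is assumed.
  return {
    status: 302,
    location: `http://${entry.host}/?bookcmd=download&job_id=${jobId}`,
  };
}

const entry = { status: 'completed', host: 'render2.example' };
console.log(resolveDownload(entry, 'render1.example', 'abc123').status); // 302
```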

Long-running requests, such as render jobs, are pushed to a backend Redis queue that the clients pull from. Every job has a unique job ID, which exists as a persistent key on the Redis server. Once the frontend has created the job ID key, only clients (and the garbage collector, when deleting) may write to it.

Render Frontend

 * Responds to public HTTP requests (any render server can handle a request)
 * bookcmd=render: places new jobs (and the job metadata) into Redis
 * Creates the job IDs by preprocessing the render request into (format, [(title, revid), ...]) and SHA-ing that string
 * bookcmd=download: returns completed documents to MediaWiki
 * Or, if this server did not render the document, issues a 302 redirect to the server that has the content
 * bookcmd=render_status: queries the Redis server for the current status of the job
 * bookcmd=zip_post: pushes a render job to Redis with a special 'zip' format? The client would then also upload the results?
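The job ID derivation in the list above can be sketched as follows. This is a minimal illustration, not the actual implementation: the design only says the preprocessed request is "SHA-ed", so the choice of SHA-1, the JSON serialization of the (format, [(title, revid), ...]) tuple, and the function name `makeJobId` are all assumptions.

```javascript
const crypto = require('crypto');

// Hypothetical helper: derive a deterministic job ID from a render request.
// Assumptions: SHA-1 as the hash, and JSON as the string form of
// (format, [(title, revid), ...]). The design specifies neither.
function makeJobId(format, items) {
  // Reduce each item to a [title, revid] pair so that unrelated request
  // fields cannot change the ID. Item order is preserved.
  const pairs = items.map(({ title, revid }) => [title, revid]);
  const canonical = JSON.stringify([format, pairs]);
  return crypto.createHash('sha1').update(canonical, 'utf8').digest('hex');
}

const a = makeJobId('pdf', [{ title: 'Main Page', revid: 12345 }]);
const b = makeJobId('pdf', [{ title: 'Main Page', revid: 12345, extra: 'ignored' }]);
console.log(a === b); // true: equivalent requests map to the same job ID
```

With a scheme like this, the ID is a pure function of the request, so any frontend could recompute it and look up an existing job in Redis instead of queueing a duplicate.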

Render Client

 * Takes jobs from Redis when free
 * Pulls each title from Parsoid
 * Also pulls all additional resources, such as images
 * Formats the DOM
 * Runs pages through LaTeX/PhantomJS
 * Does final PDF compositing of all parts
 * Saves the final PDF to local or remote disk
 * Updates the Redis entry for the job while it is in progress and when it completes
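The client steps above could be arranged roughly as in the sketch below. It deliberately avoids any real Redis or Parsoid calls: a plain array stands in for the pending queue, a `Map` stands in for the per-job Redis keys, and each stage is a stub. The names `STAGES` and `runNextJob`, and the status field names, are hypothetical. Real stages would be asynchronous; they are synchronous here only to keep the example short.

```javascript
// Hypothetical stage names mirroring the list above. Each stage is a stub.
const STAGES = ['fetch from Parsoid', 'format DOM', 'render pages', 'composite PDF', 'save PDF'];

// Take one job from the queue, run every stage, and keep the job's status
// entry up to date. Returns the job ID, or null if the queue was empty.
function runNextJob(queue, statusById, hostname) {
  const jobId = queue.shift(); // oldest pending job, if any
  if (jobId === undefined) return null;

  // Merge new fields into the job's entry and refresh its timestamp.
  const update = (fields) => statusById.set(jobId, {
    ...statusById.get(jobId),
    ...fields,
    updated: new Date().toISOString(),
  });

  update({ status: 'running', host: hostname, percent: 0 });
  STAGES.forEach((stage, i) => {
    // The real work for this stage would happen here.
    update({ stage, percent: Math.round(((i + 1) / STAGES.length) * 100) });
  });
  update({ status: 'completed' });
  return jobId;
}

const queue = ['job-1'];
const statusById = new Map([['job-1', { status: 'pending' }]]);
runNextJob(queue, statusById, 'render1.example');
console.log(statusById.get('job-1').status); // completed
```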

Garbage Collector
Every so often
 * Goes through all keys in the Redis server and removes old jobs and their files (older than 7 days?)
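A minimal sketch of such a sweep is shown below, assuming each job entry carries an ISO 8601 `updated` timestamp. A `Map` stands in for the Redis keyspace and `deleteFile` is a caller-supplied callback; both, along with the function names, are assumptions for illustration. The 7-day threshold is the tentative value mentioned above.

```javascript
// 7 days in milliseconds. The design marks this value as tentative.
const MAX_AGE_MS = 7 * 24 * 60 * 60 * 1000;

// Hypothetical helper: IDs of jobs whose last update is older than maxAgeMs.
function findExpiredJobs(statusById, nowMs, maxAgeMs = MAX_AGE_MS) {
  const expired = [];
  for (const [jobId, entry] of statusById) {
    if (nowMs - Date.parse(entry.updated) > maxAgeMs) expired.push(jobId);
  }
  return expired;
}

// Remove the rendered file first, then drop the job entry itself.
function collectGarbage(statusById, nowMs, deleteFile) {
  for (const jobId of findExpiredJobs(statusById, nowMs)) {
    deleteFile(jobId);
    statusById.delete(jobId);
  }
}

const jobs = new Map([
  ['old', { updated: '2024-01-01T00:00:00Z' }],
  ['new', { updated: '2024-01-09T00:00:00Z' }],
]);
const removed = [];
collectGarbage(jobs, Date.parse('2024-01-10T00:00:00Z'), (id) => removed.push(id));
console.log([...jobs.keys()]); // [ 'new' ]
```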

Redis

 * Has a pending job queue: a list of job IDs awaiting a render client
 * The queue should also contain the render request


 * Each job ID has a persistent entry keyed on the ID, with a JSON blob as its contents. The blob contains:
 * Date last updated
 * The current job status (pending, running, completed...)
 * Some substructure containing 'percentage complete', 'current title', etc
 * The render server responsible for the job
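To make the layout above concrete, here is one possible shape for the job entry and the pending queue. The design lists what the blob contains but not its key names, so `updated`, `status`, `progress`, `host`, and the helper names are all hypothetical. A plain array and a `Map` stand in for the Redis list and keys.

```javascript
// Hypothetical blob layout for a newly created job.
function newJobEntry(now = new Date()) {
  return {
    updated: now.toISOString(),                   // date last updated
    status: 'pending',                            // pending, running, completed...
    progress: { percent: 0, currentTitle: null }, // progress substructure
    host: null, // assumption: filled in once a render server takes the job
  };
}

// Create the persistent entry and push the job onto the pending queue.
// The queue item carries the render request alongside the job ID.
function enqueue(pendingQueue, entries, jobId, request) {
  entries.set(jobId, JSON.stringify(newJobEntry()));
  pendingQueue.push(JSON.stringify({ jobId, request }));
}

const pendingQueue = [];
const entries = new Map();
enqueue(pendingQueue, entries, 'abc123', { format: 'pdf', items: [['Main Page', 12345]] });
console.log(JSON.parse(entries.get('abc123')).status); // pending
```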