Offline content generator/Architecture

General Overview
Every render server runs a Node.js instance with several spawned children: a frontend, a render client, and a garbage collector. Rendered files are stored locally on the server (they could instead be stored on a central file store, but local storage is easiest for now). Any render server can respond to a frontend request, redirecting the request to another server if required.
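
The redirect behaviour described above can be sketched as a small routing decision. This is only an illustration under stated assumptions: the function name routeDownload, the job fields status and renderServer, the 404 response for unfinished jobs, and the plain-http redirect target are all hypothetical and not taken from the actual service.

```javascript
// Hypothetical helper: decide how a frontend answers a download request.
// `job` is the job record (assumed fields: status, renderServer);
// `localHost` is the name of the server handling the request.
function routeDownload(job, localHost, requestPath) {
  if (!job || job.status !== "completed") {
    // Assumption: unknown or unfinished jobs are reported as not found.
    return { httpStatus: 404 };
  }
  if (job.renderServer !== localHost) {
    // Another server rendered the file: redirect the caller there.
    return {
      httpStatus: 302,
      location: `http://${job.renderServer}${requestPath}`,
    };
  }
  // This server rendered the file and can serve it from local disk.
  return { httpStatus: 200, serveLocalFile: true };
}
```

Keeping the decision in a pure function like this would make it testable without a running HTTP server or Redis instance.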

Long-running requests such as render jobs are pushed to a backend Redis queue that the render clients pull from. Every job has a unique job ID, which exists as a persistent key on the Redis server. Once the frontend has created the job ID key, only render clients (and the garbage collector, when deleting) may write to it.
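
To make the queue and key model concrete, here is a minimal in-memory stand-in for the two Redis structures. It does not talk to Redis; the class name JobStore, its methods, and the record fields are assumptions for illustration. Skipping the enqueue when the job ID already exists is also an assumption, made because identical requests produce identical IDs.

```javascript
// In-memory stand-in for the two Redis structures described above:
// a pending list of job IDs and one persistent record per job ID.
class JobStore {
  constructor() {
    this.pending = [];        // plays the role of the Redis list
    this.records = new Map(); // plays the role of the per-job keys
  }

  // Frontend side: create the record once, then enqueue the ID.
  submit(jobId, request) {
    if (!this.records.has(jobId)) {
      this.records.set(jobId, { status: "pending", request });
      this.pending.push(jobId);
    }
    return jobId;
  }

  // Render client side: take the oldest pending job, or null if none.
  take(serverName) {
    const jobId = this.pending.shift();
    if (jobId === undefined) return null;
    const record = this.records.get(jobId);
    record.status = "running";
    record.renderServer = serverName;
    return jobId;
  }
}
```

Note that only submit creates a record and only take modifies it afterwards, which mirrors the write rule stated above.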

Render Frontend

 * Responds to public HTTP requests (any render server can handle a request)
 * bookcmd=render Places new jobs (and the job metadata) into Redis
   * Creates the job ID by preprocessing the render request into (format, [(title, revid), ...]) and computing a SHA hash of that string
 * bookcmd=download Returns completed documents to MediaWiki
   * Or issues a 302 redirect to the server that has the content, if this server did not render it
 * bookcmd=render_status Queries the Redis server for the current status of the job
 * bookcmd=zip_post Pushes a render job to Redis with a special 'zip' format? The client would then also upload the results? This would be used if we needed this service to natively support the PediaPress intermediate format.
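
The job ID derivation for bookcmd=render could look roughly like the sketch below. The text only says "SHA", so the choice of SHA-1, the JSON canonical form, and the function name computeJobId are assumptions. Page order is kept as given, on the assumption that two books with the same pages in a different order are different jobs.

```javascript
const crypto = require("crypto");

// Hypothetical: reduce a render request to (format, [(title, revid), ...])
// and hash the canonical string. SHA-1 is an assumption, not confirmed.
function computeJobId(format, pages) {
  const canonical = JSON.stringify([
    format,
    pages.map((page) => [page.title, page.revid]),
  ]);
  return crypto.createHash("sha1").update(canonical, "utf8").digest("hex");
}

// Identical requests map to the same ID, so a repeated request can
// reuse an existing job instead of rendering again.
const exampleId = computeJobId("pdf", [
  { title: "Main Page", revid: 12345 },
  { title: "Help:Contents", revid: 67890 },
]);
console.log(exampleId.length); // 40 hex characters for SHA-1
```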

Render Client
The render pipeline has three broad stages: getting the job from Redis, spidering the site to produce an intermediate file with all resources, and then rendering the output.
 * Takes jobs from Redis when free
 * Spidering
   * Pulls each title from Parsoid
   * Processes all downloaded RDF for external resources like images
   * Downloads resources
   * Rewrites RDF to point to the local resource (I don't think rewriting is necessary; the renderer can do that if needed. cscott (talk) 16:36, 14 November 2013 (UTC))
 * Rendering
   * Processes the RDF as required for the output format
   * Runs pages through a compositor like LaTeX/PhantomJS, producing intermediate pages
   * Performs final compositing of all parts (adds title page, table of contents, page numbers, merges intermediates, etc.)
   * Saves the final file to local or remote disk
 * Updates the Redis entry for the job while in progress and on completion
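
The stages above can be sketched as one pass of a render client. This is a skeleton only: the stage functions spider, render, and save are injected stubs rather than real Parsoid, LaTeX, or PhantomJS calls, the progress percentages are arbitrary, and the "failed" status is an assumption (the status list elsewhere on this page is open-ended).

```javascript
// Hypothetical skeleton of one render-client pass over a single job.
// `record` is the job's status record; `stages` supplies the real work.
async function runJob(jobId, record, stages) {
  // Every status change also refreshes the last-updated timestamp.
  const update = (fields) =>
    Object.assign(record, fields, { updated: Date.now() });

  update({ status: "running", percent: 0 });
  try {
    // Spidering: fetch pages and resources into an intermediate bundle.
    const bundle = await stages.spider(record.request);
    update({ percent: 50 });

    // Rendering: turn the bundle into the requested output format.
    const output = await stages.render(bundle, record.request.format);
    update({ percent: 90 });

    // Saving: write the final file and remember where it went.
    const location = await stages.save(jobId, output);
    update({ status: "completed", percent: 100, location });
  } catch (err) {
    update({ status: "failed", error: String(err && err.message ? err.message : err) });
  }
  return record;
}
```

Passing the stages in as functions keeps the status bookkeeping separate from the spidering and rendering code, so either side can change independently.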

Garbage Collector
Every so often:
 * Goes through all keys in the Redis server and removes old jobs and files (older than 7 days?)
 * Also cleans up intermediate results and output PDFs?
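
A sweep of this kind could be sketched as follows. It works on an in-memory Map rather than Redis, uses the tentative 7-day figure from above as its default, and assumes each record carries an updated timestamp in milliseconds; the function name collectGarbage is hypothetical.

```javascript
// 7 days in milliseconds, the tentative retention period mentioned above.
const MAX_AGE_MS = 7 * 24 * 60 * 60 * 1000;

// Hypothetical sweep: remove records whose last update is older than the
// cutoff and return their job IDs so the matching files can be deleted too.
function collectGarbage(records, now = Date.now(), maxAgeMs = MAX_AGE_MS) {
  const expired = [];
  for (const [jobId, record] of records) {
    if (now - record.updated > maxAgeMs) {
      expired.push(jobId);
    }
  }
  for (const jobId of expired) {
    records.delete(jobId); // a real collector would also remove files on disk
  }
  return expired;
}
```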

Redis

 * Has a pending job queue that is a list of job IDs awaiting a render client
   * Should also contain the render request
 * Each job ID has a persistent entry keyed on the ID with a JSON blob for contents. The blob contains:
   * Date last updated
   * The current job status (pending, running, completed...)
   * Some substructure containing 'percentage complete', 'current title', etc.
   * The render server responsible for the job
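
One possible shape for the JSON blob is sketched below. The field names (updated, status, progress, renderServer) and both helper functions are illustrative assumptions, since the page does not fix a schema. The render server is left as null until a client claims the job.

```javascript
// Hypothetical initial record, as the frontend might create it.
function makeJobRecord(now = new Date()) {
  return {
    updated: now.toISOString(),                   // date last updated
    status: "pending",                            // pending, running, completed...
    progress: { percent: 0, currentTitle: null }, // progress substructure
    renderServer: null,                           // set once a client takes the job
  };
}

// Apply changes to a serialised blob and return the new serialised blob.
// Progress fields are merged so a partial update keeps the other fields.
function updateJobRecord(json, changes, now = new Date()) {
  const record = JSON.parse(json);
  const next = {
    ...record,
    ...changes,
    progress: { ...record.progress, ...(changes.progress || {}) },
    updated: now.toISOString(),
  };
  return JSON.stringify(next);
}
```

A client claiming a job might then call updateJobRecord(blob, { status: "running", renderServer: "render1" }) and write the result back to the job's key.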

Intermediate Format
csa: my current plan is to use PDF_rendering/print-on-demand_service for this.