Offline content generator/print-on-demand service

Order process
 * User's browser GETs ?title=Special:Book&bookcmd=order_collection&colltitle=???&partner=pediapress, or POSTs to Special:Book with bookcmd=post_zip, to order the current book.
 * The extension calls buildJSONCollection to build the collection metadata and POSTs it to mw-serve.
 * Example of the posted data (PHP var_dump style):
   array(9) {
     ["metabook"]=> string(2611) "{"type":"collection","licenses":[{"type":"license","name":"License","mw_rights_icon":"//creativecommons.org/images/public/somerights20.png","mw_rights_page":null,"mw_rights_url":"//creativecommons.org/licenses/by-sa/3.0/","mw_rights_text":"Creative Commons Attribution-Share Alike 3.0"}],"title":"1933 Atlantic hurricane season","subtitle":"","items":[{"type":"chapter","title":"Introduction","items":[{"type":"article","content_type":"text/x-wiki","title":"1933 Atlantic hurricane season","latest":579329188,"revision":579329188,"timestamp":"1383073948","url":"http://en.wikipedia.org/wiki/1933_Atlantic_hurricane_season","currentVersion":1}]},{"type":"chapter","title":"Storms","items":[{"type":"article","content_type":"text/x-wiki","title":"1933 Trinidad hurricane","latest":550867472,"revision":550867472,"timestamp":"1366230514","url":"http://en.wikipedia.org/wiki/1933_Trinidad_hurricane","currentVersion":1},{"type":"article","content_type":"text/x-wiki","title":"1933 Texas tropical storm","latest":572402424,"revision":572402424,"timestamp":"1378848251","url":"http://en.wikipedia.org/wiki/1933_Texas_tropical_storm","currentVersion":1},{"type":"article","content_type":"text/x-wiki","title":"1933 Chesapeake\u2013Potomac hurricane","latest":579390421,"revision":579390421,"timestamp":"1383099157","url":"http://en.wikipedia.org/wiki/1933_Chesapeake%E2%80%93Potomac_hurricane","currentVersion":1},{"type":"article","content_type":"text/x-wiki","title":"1933 Cuba\u2013Brownsville hurricane","latest":572402369,"revision":572402369,"timestamp":"1378848216","url":"http://en.wikipedia.org/wiki/1933_Cuba%E2%80%93Brownsville_hurricane","currentVersion":1},{"type":"article","content_type":"text/x-wiki","title":"1933 Treasure Coast hurricane","latest":575674769,"revision":575674769,"timestamp":"1380856363","url":"http://en.wikipedia.org/wiki/1933_Treasure_Coast_hurricane","currentVersion":1},{"type":"article","content_type":"text/x-wiki","title":"1933 Outer Banks hurricane","latest":576090149,"revision":576090149,"timestamp":"1381119599","url":"http://en.wikipedia.org/wiki/1933_Outer_Banks_hurricane","currentVersion":1},{"type":"article","content_type":"text/x-wiki","title":"1933 Tampico hurricane","latest":567480201,"revision":567480201,"timestamp":"1375840931","url":"http://en.wikipedia.org/wiki/1933_Tampico_hurricane","currentVersion":1},{"type":"article","content_type":"text/x-wiki","title":"1933 Cuba\u2013Bahamas hurricane","latest":578977949,"revision":578977949,"timestamp":"1382894555","url":"http://en.wikipedia.org/wiki/1933_Cuba%E2%80%93Bahamas_hurricane","currentVersion":1}]}]}"
     ["base_url"]=> string(25) "http://en.wikipedia.org/w"
     ["script_extension"]=> string(4) ".php"
     ["template_blacklist"]=> string(32) "MediaWiki:PDF Template Blacklist"
     ["template_exclusion_category"]=> string(16) "Exclude in print"
     ["print_template_prefix"]=> string(5) "Print"
     ["print_template_pattern"]=> string(8) "$1/Print"
     ["pod_api_url"]=> string(38) "http://pediapress.com/api/collections/"
     ["command"]=> string(8) "zip_post"
   }
 * mw-serve posts to pediapress to get an endpoint URL, then schedules a "post" job in its queue
 * The "post" job (grep for "rpc_post") itself queues a "makezip" job and then waits for it to complete
 * The "makezip" (grep for "rpc_makezip") writes the "metabook" data to a file and then shells out a mw-zip command to actually make the zip file.
 * The mw-zip command (implemented by buildzip.py) appears to create a temporary directory, put some files in it, then zip it up. The fetching of the files goes something like this:
 * If the book contains pages from multiple wikis, it will create a subdir for each wiki. Otherwise, it just uses the base directory.
 * For enwiki, it tries to find a "trusted" revision using wikitrust.
 * If it is a multi-wiki book, it writes another metabook.json with the "overall" data and an nfo.json whose format is "multi-nuwiki" (no other keys).
 * Now the "post" job resumes: it reads the zip file and POSTs it to pediapress (a rough sketch of this step follows the list).

Zip file contents
We're still working on documenting this.
 * metabook.json
 * Containing some version of the "metabook" data. For multi-wiki zips, the per-wiki file will contain just the "items" key, listing the articles for that wiki. This file (together with nfo.json) is basically the input to the spidering process.
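
As a rough sketch (derived from the article entries in the example above, not an exact dump), a per-wiki metabook.json inside a multi-wiki zip would look something like:

  {"items": [
      {"type": "article", "content_type": "text/x-wiki",
       "title": "1933 Atlantic hurricane season",
       "latest": 579329188, "revision": 579329188,
       "timestamp": "1383073948",
       "url": "http://en.wikipedia.org/wiki/1933_Atlantic_hurricane_season",
       "currentVersion": 1}
  ]}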


 * nfo.json
 * JSON object with three key-value pairs: "format" being "nuwiki", and "base_url" and "script_extension" copied from the data posted to mw-serve. This is the metadata information needed to allow the spider to make API requests to the appropriate wiki.
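
For the single-wiki example above, nfo.json would look roughly like:

  {"format": "nuwiki",
   "base_url": "http://en.wikipedia.org/w",
   "script_extension": ".php"}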


 * siteinfo.json
 * The output from the API's action=query&meta=siteinfo&siprop=general|namespaces|interwikimap|namespacealiases|magicwords|rightsinfo
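
A sketch of how a spider could produce this file from the nfo.json data, assuming the API endpoint is base_url + "/api" + script_extension (true for Wikipedia, not guaranteed for every wiki):

  import requests

  def fetch_siteinfo(base_url, script_extension):
      # Assumption: the api.php entry point lives directly under base_url.
      api = base_url + "/api" + script_extension
      params = {
          "action": "query",
          "meta": "siteinfo",
          "siprop": "general|namespaces|interwikimap|namespacealiases|magicwords|rightsinfo",
          "format": "json",
      }
      return requests.get(api, params=params).json()

  # siteinfo = fetch_siteinfo("http://en.wikipedia.org/w", ".php")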


 * licenses.json
 * Containing JSON license data for articles. Is this redundant with rightsinfo in siteinfo.json?


 * redirects.json
 * Containing information on redirects. Used to resolve internal links?


 * authors.db
 * sqlite database containing author info. Keys are MediaWiki titles and values are JSON-encoded arrays of MediaWiki usernames. Note the presence of "ANONIPEDITS:<number>" entries, which record how many anonymous editors' IP addresses have been elided from the list.
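
A sketch of reading an author list back out, relying only on the kv_table schema described at the bottom of this page (the title used in the usage comment is simply the first article of the example book):

  import json, sqlite3

  def authors_for(db_path, title):
      con = sqlite3.connect(db_path)
      try:
          row = con.execute("SELECT val FROM kv_table WHERE key = ?", (title,)).fetchone()
      finally:
          con.close()
      if row is None:
          return [], 0
      # Split out the "ANONIPEDITS:<number>" pseudo-entry that counts elided anonymous editors.
      users, anon = [], 0
      for name in json.loads(row[0]):
          if name.startswith("ANONIPEDITS:"):
              anon = int(name.split(":", 1)[1])
          else:
              users.append(name)
      return users, anon

  # users, anon_count = authors_for("authors.db", "1933 Atlantic hurricane season")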


 * html.db
 * sqlite database containing output from action=parse. Keys are revision ids.  Values are the output as a JSON structure.


 * parsoid.db
 * Experimental addition: parsoid parser output, equivalent to html.db


 * imageinfo.db
 * sqlite database containing image info. Keys are MediaWiki titles. Values are JSON-encoded objects (xxx: what API call generates this?).


 * revisions-1.txt
 * File containing multiple records of JSON data. Includes the output of action=expandtemplates for all pages in the book, some other API queries for pages in the book, and image pages for images in the book, possibly among other things. There appears to be no indication of the original queries, just the data.


 * images
 * Directory containing images. Filenames are the MediaWiki filenames (with the localized "File:" prefix), with tildes replaced with "" and all non-ASCII characters, plus slash and backslash, replaced with "~%d~" where %d is the Unicode codepoint of the character.
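
A sketch of the escaping rule described above. It covers only the documented "~%d~" rule; how tildes themselves are escaped is not specified here, so this is illustrative rather than a complete or authoritative implementation:

  def escape_image_filename(name):
      # Replace every non-ASCII character, plus "/" and "\", with "~<codepoint>~".
      # Tilde handling is deliberately omitted because it is not documented above.
      out = []
      for ch in name:
          if ord(ch) > 127 or ch in "/\\":
              out.append("~%d~" % ord(ch))
          else:
              out.append(ch)
      return "".join(out)

  # escape_image_filename("File:Å.jpg") -> "File:~197~.jpg"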

Sqlite DB format
All sqlite databases have a single table named kv_table with the following schema: CREATE TABLE kv_table (key TEXT PRIMARY KEY, val TEXT); That is, they are simple key/value maps. The keys and values are described above.
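
For example, a small sketch that dumps any of the above .db files, assuming only the schema shown here:

  import sqlite3

  def dump_kv(db_path):
      # Iterate over the single key/value table shared by all of the .db files.
      con = sqlite3.connect(db_path)
      try:
          for key, val in con.execute("SELECT key, val FROM kv_table ORDER BY key"):
              print(key, "->", val[:80])
      finally:
          con.close()

  # dump_kv("authors.db")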