Topic on Talk:Offline content generator/Architecture

What is the Round Robin (and why do we need a garbage collector)

Mwalker (WMF) (talk | contribs)

Gabriel has suggested we use a front end similar to Parsoid's, i.e. Varnish. This would also offload caching of bookcmd=download requests to the Varnish layer.

Basically it would look like:

                            +------------------------------+
                            | Two-layer Varnish boxes      |
MediaWiki --> LVS Boxes --> |  Frontend -> CARP -> Backend | --> LVS Boxes --> Render Servers
                            +------------------------------+

We could set a Cache-Control lifetime of a couple of days on rendered output, rely on Varnish's standard LRU eviction, and manually issue a purge when a forced render comes in from MediaWiki.
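
To make the three parts of that policy concrete, here is a toy in-memory model in Python. It is not Varnish configuration; the class name RenderCache, the capacity, and the two-day TTL are assumptions chosen for illustration (the post only says "a couple days").

```python
import time
from collections import OrderedDict

TWO_DAYS = 2 * 24 * 3600  # assumed TTL in seconds; the post says "a couple days"


class RenderCache:
    """Toy model of the proposed policy: TTL expiry, LRU eviction, manual purge."""

    def __init__(self, capacity, ttl=TWO_DAYS, clock=time.time):
        self.capacity = capacity
        self.ttl = ttl
        self.clock = clock
        self._items = OrderedDict()  # key -> (stored_at, body), least recently used first

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        stored_at, body = entry
        if self.clock() - stored_at >= self.ttl:
            del self._items[key]  # older than the cache lifetime
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return body

    def put(self, key, body):
        self._items[key] = (self.clock(), body)
        self._items.move_to_end(key)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry

    def purge(self, key):
        """Manual purge, e.g. when MediaWiki sends a forced render."""
        self._items.pop(key, None)
```

A forced render would call purge(key) before the new output is stored, so a stale document is never served in between.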

-- However --

Because we wish to be backwards compatible with the old setup, we must always issue a 'render' command to the backend first. The only way the backend can then know whether something has already been rendered is to look it up in Redis. So we'll still have to garbage collect the Redis entries...
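
A minimal sketch of why the garbage collector is still needed, using a plain dict as a stand-in for Redis. The JobStore class, its record layout, and the max_age parameter are all hypothetical; nothing here reflects the actual key schema.

```python
import time


class JobStore:
    """In-memory stand-in for the Redis job table (hypothetical layout)."""

    def __init__(self, max_age, clock=time.time):
        self.max_age = max_age
        self.clock = clock
        self._jobs = {}  # collection_id -> {"status": str, "updated": float}

    def render(self, collection_id, force=False):
        """Handle a 'render' command and return the job's current status."""
        job = self._jobs.get(collection_id)
        if job is None or force:
            # Unknown (or forced) job: record it as pending so later calls find it.
            job = {"status": "pending", "updated": self.clock()}
            self._jobs[collection_id] = job
        return job["status"]

    def finish(self, collection_id):
        """Mark a job as rendered."""
        self._jobs[collection_id] = {"status": "finished", "updated": self.clock()}

    def collect_garbage(self):
        """Drop job records older than max_age; return how many were removed."""
        cutoff = self.clock() - self.max_age
        stale = [cid for cid, job in self._jobs.items() if job["updated"] < cutoff]
        for cid in stale:
            del self._jobs[cid]
        return len(stale)
```

Without collect_garbage(), every collection ever rendered would keep a record forever. With real Redis, a per-key expiry (the EXPIRE command) might replace the explicit sweep, provided the record lifetime matches the cache lifetime of the rendered file.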

GWicke (talk | contribs)

Teaching the Collection extension to speak HTTP might not be that hard and would not break the existing interface.

For client-side 'async' rendering without PHP timeout concerns, do an AJAX HEAD request to kick off the render. Then poll with AJAX HEAD requests that have the Cache-Control: only-if-cached header set. Reveal the link once the response is available from the cache.
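
The kick-off-then-poll flow can be sketched as follows. This is Python rather than browser JavaScript, and it assumes the cache layer honours only-if-cached by answering 504 on a miss, as the HTTP caching specification describes; whether Varnish does so without custom VCL is not established here. The function names, polling interval, and poll limit are hypothetical.

```python
import threading
import time
import urllib.error
import urllib.request


def head_status(url, headers=None):
    """Send a HEAD request and return the HTTP status code."""
    req = urllib.request.Request(url, method="HEAD", headers=headers or {})
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 504 when the cache has no stored response


def wait_for_render(url, head=head_status, interval=2.0, max_polls=30,
                    sleep=time.sleep):
    """Kick off a render, then poll the cache until the output is available."""
    # The kick-off request may block until the render finishes, so it runs in
    # the background, like an asynchronous AJAX call would in the browser.
    threading.Thread(target=head, args=(url,), daemon=True).start()
    for _ in range(max_polls):
        status = head(url, {"Cache-Control": "only-if-cached"})
        if status == 200:
            return True  # cached: safe to reveal the download link
        sleep(interval)
    return False  # gave up; the caller decides whether to show an error
```

The head and sleep parameters exist only so the loop can be exercised without a network; a caller would normally pass just the URL.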

Alternatively, kick off the render as above and reveal a link to the file that uses an external IP for the PDF service (no short PHP timeout). Varnish will then collapse the duplicate requests for you. The spinner can still be shown in this case as well.

A nice thing about using Varnish is that you get

  • timeouts
  • bounded queuing
  • quick render start

basically for free.

phantomjs is also not as slow as mwlib. The HTML5 spec renders to 508 A4 pages in about 20 seconds. It has few graphics, and those are low-resolution, but it seems conceivable that even very large books with 150 dpi images would render within the PHP timeout.
