User:Bmansurov (WMF)/Alternative way of generating PDF books

From mediawiki.org

Extension:Collection allows users to create books from wiki pages. Proton PDF Generator (PPG) is a back-end for the extension that allows books to be downloaded in PDF format.

Generation of the PDF file is done via Extension:ElectronPdfService. Extension:Collection is the glue between these two services.

Before coming up with this solution, we also looked at `wkhtmltopdf` and `Vivliostyle`, neither of which fit our needs for one reason or another (see below for links).

Bird's eye view

Our plan for creating this back-end is as follows (the steps that led us here have been omitted; see the related links section below for context):

Proton PDF Generator

PPG is a yet-to-be-built service that does two things:

  • creates a single HTML file from a list of articles (we'll call this 'concatenation' for short)
  • adds the table of contents and page numbers to a PDF file (we'll call this 'PDF post-processing' for short).

See the sections below for more details.

HTML concatenation
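
As a rough illustration of the concatenation step described above (creating a single HTML file from a list of articles), here is a minimal Python sketch. It is not PPG's actual design: the function name `concatenate_articles`, the `(title, body_html)` input shape, and the `<section id="article-N">` wrapper are all assumptions made for the example, and it assumes each article body has already been fetched as an HTML fragment.

```python
from html import escape


def concatenate_articles(book_title, articles):
    """Join (title, body_html) pairs into one HTML document.

    Hypothetical helper: the input shape and the markup layout are
    assumptions for this sketch, not part of any existing service.
    """
    sections = []
    for index, (title, body_html) in enumerate(articles, start=1):
        # Wrap each article in its own section so that later steps
        # can tell where one article ends and the next begins.
        sections.append(
            f'<section id="article-{index}">\n'
            f"<h1>{escape(title)}</h1>\n"
            f"{body_html}\n"
            "</section>"
        )
    return (
        "<!DOCTYPE html>\n"
        '<html>\n<head>\n<meta charset="utf-8">\n'
        f"<title>{escape(book_title)}</title>\n"
        "</head>\n<body>\n"
        + "\n".join(sections)
        + "\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    book = concatenate_articles(
        "Solar System",
        [("Earth", "<p>Third planet.</p>"), ("Moon", "<p>Earth's satellite.</p>")],
    )
    print(book)
```

Titles are escaped because they are plain text, while article bodies are inserted unchanged on the assumption that they are already valid HTML fragments.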

PDF post-processing

Once the HTML is concatenated, it's fed to Extension:ElectronPdfService, which outputs a PDF file. We then modify that PDF file to add page numbers and a table of contents (with page numbers) using the Python library `pdfrw`. We also looked at PHP libraries but found nothing comparable to `pdfrw`. See https://phabricator.wikimedia.org/T168871 for more details.
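
The page-number bookkeeping behind such a table of contents can be sketched without `pdfrw`. The example below only computes which page each article starts on and formats the entries as text; writing them into the PDF is left out. The function names, the `(title, page_count)` input shape, and the assumption that the table of contents is placed at the front of the book (so it shifts every article's page number) are all hypothetical.

```python
def build_toc(articles, toc_pages=1):
    """Return (title, first_page) pairs for a list of (title, page_count).

    Pages are numbered from 1. Assumes the table of contents occupies
    the first `toc_pages` pages of the book (an assumption of this sketch).
    """
    entries = []
    next_page = toc_pages + 1
    for title, page_count in articles:
        if page_count < 1:
            raise ValueError(f"article {title!r} must have at least one page")
        entries.append((title, next_page))
        next_page += page_count
    return entries


def format_toc(entries, width=40):
    """Render entries as dot-leader lines, e.g. 'Earth ........ 2'."""
    lines = []
    for title, page in entries:
        label = f"{title} "
        number = f" {page}"
        # Always keep at least three dots, even for very long titles.
        dots = "." * max(width - len(label) - len(number), 3)
        lines.append(label + dots + number)
    return lines


if __name__ == "__main__":
    toc = build_toc([("Earth", 3), ("Moon", 2), ("Sun", 4)])
    print("\n".join(format_toc(toc)))
```

If the table of contents itself spans more than one page, passing a larger `toc_pages` shifts every entry accordingly.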

Related Links