Topic on Project:Support desk

Automated PDF export of certain categories/pages (pdfbook/wget/curl/?)

Reznuh (talk | contribs)

Hi all,

We run MW 1.22 with Pdfbook and LDAP authentication, and everything runs great. Recently a need came up to export a PDF of a certain category on a schedule (daily, weekly, or similar). I have tried many combinations of wget and curl commands found through Googling, with no luck so far. I am pretty sure it comes down to logging in and cookies, although some of the scripts I tried did successfully save cookies to a file, whether through Special:UserLogin or MW's API. Many of these scripts/commands looked correct but never succeeded: the result was always either a 0-byte file or a ~10 KB file containing just the wiki's heading/main menu as HTML. Has anyone successfully done something like this, or does anyone have other ideas?

TL;DR: Basically we want to request something like http://hostname/wiki/index.php?title=Category:People&action=pdfbook automatically from a script and save the generated PDF file elsewhere. The problem seems to be authenticating to MW.
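
For anyone attempting the scripted route, here is a minimal Python sketch of the idea: log in through the API with a cookie jar, then request the pdfbook URL with the same session. It is untested against this setup and rests on several assumptions: that api.php sits next to index.php, that the wiki uses the two-step `action=login` token exchange, and that an LDAP setup may need `lgdomain`. The function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def make_opener():
    """Build an opener that keeps session cookies between requests."""
    jar = CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def build_export_url(index_url, title):
    """Return the pdfbook export URL for a page or category title."""
    query = urllib.parse.urlencode({"title": title, "action": "pdfbook"})
    return f"{index_url}?{query}"


def looks_like_pdf(data):
    """A PDF starts with the bytes %PDF; an HTML login or menu page does not."""
    return data.startswith(b"%PDF")


def api_post(opener, api_url, params):
    """POST form parameters to api.php and decode the JSON reply."""
    body = urllib.parse.urlencode(params).encode("utf-8")
    with opener.open(api_url, data=body) as resp:
        return json.load(resp)


def login(opener, api_url, user, password, domain=None):
    """Two-step action=login: the first reply returns a token to send back."""
    params = {
        "action": "login",
        "format": "json",
        "lgname": user,
        "lgpassword": password,
    }
    if domain:
        # Assumption: LDAP-backed wikis may require a login domain.
        params["lgdomain"] = domain
    reply = api_post(opener, api_url, params)["login"]
    if reply["result"] == "NeedToken":
        params["lgtoken"] = reply["token"]
        reply = api_post(opener, api_url, params)["login"]
    if reply["result"] != "Success":
        raise RuntimeError(f"login failed: {reply['result']}")


def fetch_pdf(opener, index_url, title, out_path):
    """Download the export and refuse to save anything that is not a PDF."""
    with opener.open(build_export_url(index_url, title)) as resp:
        data = resp.read()
    if not looks_like_pdf(data):
        raise RuntimeError("response is not a PDF; the session is probably not logged in")
    with open(out_path, "wb") as fh:
        fh.write(data)
```

Usage would be roughly: create one opener with `make_opener()`, call `login(opener, "http://hostname/wiki/api.php", user, password)`, then `fetch_pdf(opener, "http://hostname/wiki/index.php", "Category:People", "people.pdf")` from cron. The `%PDF` check is there to catch the 0-byte and HTML-only results early instead of silently saving them.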

On a side note, has anyone successfully excluded certain pages from needing authentication when it is otherwise required site-wide?

THANKS!

Reznuh (talk | contribs)

An update: I ended up using DumpHTML to dump the entire site, pulled copies of the pages containing the vendor tag, sorted them alphabetically, and extracted the mw-body content using hxselect. I dumped all of this into one HTML file and then used wkhtmltopdf to convert it to a PDF file and generate a table of contents. A few other tweaks were needed (printing the file name between each div for correct section labeling, etc.), but all in all everyone is happy.
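
The merge step of that pipeline could be sketched in Python roughly as below. This is an illustration, not the actual script: it swaps hxselect for Python's built-in `html.parser`, and it assumes that each dumped page has a `<div>` with class `mw-body`, that the "vendor tag" can be detected as a plain marker string in the page HTML, and that wkhtmltopdf is invoked with its `toc` object. All function names are hypothetical.

```python
import html
from html.parser import HTMLParser


class BodyExtractor(HTMLParser):
    """Collect the inner HTML of the first <div> whose class list has mw-body."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.depth = 0      # <div> nesting depth inside the captured element
        self.done = False   # True once the first mw-body div has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.parts.append(self.get_starttag_text())
            if tag == "div":
                self.depth += 1
        elif not self.done and tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if "mw-body" in classes:
                self.depth = 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br /> are copied as-is.
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.depth:
            return
        if tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.done = True
                return
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth:
            self.parts.append(f"&#{name};")


def extract_body(page_html):
    """Return the inner HTML of the mw-body div, or '' if there is none."""
    parser = BodyExtractor()
    parser.feed(page_html)
    parser.close()
    return "".join(parser.parts)


def combine_pages(pages, marker):
    """Merge pages (file name -> HTML) that contain marker, sorted by name."""
    sections = []
    for name in sorted(pages, key=str.casefold):
        if marker not in pages[name]:
            continue
        # The file name is printed before each div so sections get labeled.
        heading = f"<h1>{html.escape(name)}</h1>"
        sections.append(f"{heading}\n<div>{extract_body(pages[name])}</div>")
    joined = "\n".join(sections)
    return f'<html><head><meta charset="utf-8"></head><body>\n{joined}\n</body></html>'


def wkhtmltopdf_command(html_path, pdf_path):
    """Argument list for wkhtmltopdf with a generated table of contents."""
    return ["wkhtmltopdf", "toc", html_path, pdf_path]
```

The combined string would be written to a file and the command list passed to `subprocess.run(..., check=True)`. Using `<h1>` for the file name is a guess at how the section labeling was done; wkhtmltopdf builds its table of contents from the headings it finds.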

88.130.112.219 (talk | contribs)

Woah, sounds like a huge task! Thanks for posting the update!
