Extension talk:PdfBook/Archive

PdfBook images fix for htmldoc
At about line 101 in  just before

// Write the HTML to a tmp file

insert the following:

$src_str = 'src="' . $wgServer . '/images/';
// Use this instead if not at the webroot
// $src_str = 'src="'. $wgServer. '/' . $wgScriptPath. '/images/';
$html = str_replace($src_str, 'src="', $html);

The src URLs should be relative to the images folder so that images display correctly in the generated PDF.

The following is a sample of the CLI command that gets generated and executed in the PdfBook extension:

This has been tested in MW v1.19.24 on Debian Jessie and htmldoc v1.18.27.

PdfBook seems not to be working
You can export a single article as a one-page PDF by setting format=single in the query-string. Example:

http://www.foo.bar/wiki/index.php?title=Main_Page&action=pdfbook&format=single

When I do this I get the message :

Main Page&action=pdfbook&format=single There is currently no text in this page. You can search for this page title in other pages, search the related logs, or edit this page.

Whereas Main Page has a lot of text. ;-) What am I overlooking? I installed the latest PdfBook (1.1.0, 2014-04-01 according to Special:Version) in MW 1.23.13. But that 1.1.0 comes from a PdfBook download dated Jan. 9th, 2016, and the corresponding file 'version' says:

[root@node PdfBook]# cat version
PdfBook: 17d1dfd8475ac21b81a60c3f82afe58fde9d47bb

2016-01-09T23:07:34

17d1dfd

htmldoc has been installed.

Missing Images in Https with authentication
--Johnp125 13:52, 11 June 2008 (UTC)

I have a problem: when I download a pdf file via the https location, I do not get the pictures. When I download via http, the pictures show up. Any idea why this would be the case?
 * Our SSL doesn't seem to be functional at the moment, but can you check the URL it's trying to load the images from? Maybe also check whether exporting as raw html instead of pdf has the same image problem? --Nad 07:11, 12 June 2008 (UTC)

--Johnp125 15:25, 12 June 2008 (UTC)

I can get the html file to show the pictures, but it wants to register again with the authentication server, or ie says allow blocked content, which I click on and try and sign on again. The server is not allowing me to sign on again, but the pictures are still showing up.

--Johnp125 14:57, 29 July 2008 (UTC)

I think the problem may be a security issue. Is there a way to generate the data without requesting authentication from the web server? I can get the html version to show the pictures just not the pdf version. If I go to the back door and access the site via http then the pictures show up via pdf.

Sdball 17:36, 6 November 2008 (UTC)

I had the same problem, so I tweaked the extension to:
 * use files in /tmp so image references can work
 * search the generated html for images
 * determine the actual path to the image from their url
 * i.e. https://server.com/wiki/images/a/b/image.jpg -> /www/wiki/images/a/b/image.jpg
 * use that path to copy the image file to /tmp
 * modify the generated html to point to the image file, not the absolute url
 * i.e. src= https://server.com/wiki/images/a/b/image.jpg -> src=image.jpg
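The steps listed above could be sketched roughly as follows. This is not Sdball's actual code, only a minimal illustration: the function name localizeImages, its parameters and the regular expression are all hypothetical, and it assumes the image URLs in the generated html start with a known prefix that maps onto a known directory on disk.

```php
<?php
// Hypothetical helper: the name, parameters and regex are illustrative assumptions.
// $imagesUrl  e.g. https://server.com/wiki/images
// $imagesDir  e.g. /www/wiki/images
// $tmpDir     directory that also holds the html file handed to htmldoc
function localizeImages(string $html, string $imagesUrl, string $imagesDir, string $tmpDir): string
{
    $pattern = '#src="' . preg_quote($imagesUrl, '#') . '/([^"]+)"#';
    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($imagesDir, $tmpDir): string {
            $relative = rawurldecode($m[1]);          // e.g. a/b/image.jpg
            if (strpos($relative, '..') !== false) {
                return $m[0];                         // refuse paths that climb out of the images dir
            }
            $source = $imagesDir . '/' . $relative;   // actual path of the image on disk
            $name = basename($relative);
            // Leave the tag untouched if the file is missing or cannot be copied.
            if (!is_file($source) || !copy($source, $tmpDir . '/' . $name)) {
                return $m[0];
            }
            return 'src="' . htmlspecialchars($name) . '"';
        },
        $html
    );
    return $result ?? $html; // preg_replace_callback returns null on a regex error
}
```

Note that two images with the same file name in different hash directories would overwrite each other in the tmp directory; a real implementation would need to handle that.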

Feel free to contact me if you'd like the code.

74.143.96.50 20:36, 12 February 2010 (UTC)

Same problem here, different solution. htmldoc uses a unique User-Agent string when hitting the web server. With Apache you can do something like this in your apache config:

SetEnvIf User-Agent ^HTMLDOC let_me_in
.. basic auth stuff ..
require valid-user
Order allow,deny
Allow from env=let_me_in
Satisfy Any

Note that this is a significant security hole, since anyone hitting your server with that User-Agent string can now get in. You may want to combine (or possibly replace) it with a filter based on the IP address that htmldoc is hitting your server from. If the requests always come from 127.0.0.1, for instance, you can Allow from 127.0.0.1 to let them pass. You could also change the htmldoc binary to use your own special User-Agent string. MediaWiki shouldn't care that the user is anonymous unless you've forced off anonymous access somehow besides the authentication.

Requests: page number to start with / link in the toolbox menu

 * is it possible to specify the page number to start with? This makes sense when you are going to use the exported PDF as appendix to another doc already with n pages.


 * is it possible to add a link in the toolbox menu section which is only visible on category pages?

Sure. Add something like this:

# Get i18n file
require_once( 'PdfBook.i18n.php' );

# Create toolbox link
$wgHooks['MonoBookTemplateToolboxEnd'][] = 'fnPDFBookLink';

function fnPDFBookLink( &$monobook ) {
    global $wgMessageCache, $wgPdfBookMessages;
    foreach( $wgPdfBookMessages as $lang => $messages ) {
        $wgMessageCache->addMessages( $messages, $lang );
    }
    $thispage = $monobook->data['thispage']; // e.g. "Category:Wiki"
    $nsnumber = $monobook->data['nsnumber']; // NS 14 is category
    if ( $nsnumber == 14 ) {
        echo "\n\t\t\t\t";
        $monobook->msg( 'pdf_book_link' );
        echo "\n";
    }
    return true;
}

And add an i18n file named PdfBook.i18n.php with the following contents:

<?php
$wgPdfBookMessages['de'] = array(
    'pdfbook' => 'Pdf-Druck',
    'pdf_book_link' => 'Kategorie als PDF ausgeben'
);
$wgPdfBookMessages['en'] = array(
    'pdfbook' => 'PdfPrint',
    'pdf_book_link' => 'Print category as PDF'
);
?>


 * Does anyone know what the code would be to add the link into the sidebar for the vector skin?
 * This worked brilliantly for the Monobook skin, but I want to use it in the toolbar of the Vector skin. If you have the code please let me know, thanks guys. Nali99.


 * Found the solution: basically use the above code exactly, but replace 'MonoBookTemplateToolboxEnd' with 'SkinTemplateToolboxEnd' and also replace '$monobook' with '$vector'. Your code should look like this:

# Create toolbox link
$wgHooks['SkinTemplateToolboxEnd'][] = 'fnPDFBookLink';

function fnPDFBookLink( &$vector ) {
    global $wgMessageCache, $wgPdfBookMessages;
    foreach( $wgPdfBookMessages as $lang => $messages ) {
        $wgMessageCache->addMessages( $messages, $lang );
    }
    $thispage = $vector->data['thispage']; // e.g. "Category:Wiki"
    $nsnumber = $vector->data['nsnumber']; // NS 14 is category
    if ( $nsnumber == 14 ) {
        echo "\n\t\t\t\t";
        $vector->msg( 'pdf_book_link' );
        echo "\n";
    }
    return true;
}

By Nali_99

Mediawiki 1.11.0
Version 0.0.3 didn't work anymore after an upgrade. I made a little fix around line 98 of PdfBook.php and it works again.

// while ($row = mysql_fetch_row($result)) {
while ($row = $db->fetchRow($result)) {

Disclaimer. I don't know PHP for real, don't know mediawiki, don't know how to program. Just got it by inserting debug statements into PdfBook.php. Looks like mysql_fetch is censored somewhere now ;)

PS: To insert debug statements:
 * In LocalSettings.php insert: $wgDebugLogFile = "/tmp/debug.log"; // file should be writable, can be anywhere
 * Anywhere in the code, insert: wfDebug (.....);

- Daniel (edutechwiki.unige.ch)
 * Thanks a lot for this, it's still not working for me in 1.11 (I've only just done my 1.11 upgrade), but I've made some changes based on your findings which have got it partially there ;-) --Nad 21:36, 21 September 2007 (UTC)
 * It seems that 1.11 is a bit more memory hungry and my large test books were killing it, after giving PHP 64MB it's working fine now! --Nad 21:41, 21 September 2007 (UTC)

Empty file downloaded
Greetings Nad,

I have been trying to use your PDFBook Mediawiki extension since it may be a great solution to an issue I have.

I have installed HTMLDoc under "c:\program files" and can use it on its own to create PDF Books. I have also included "PdfBook.php" in my "LocalSettings.php" file.

The issue I am having is that when I select the link to export my category as a book and select to save or open the pdf file it has 0 bytes. So, the file is created with the correct name but with no data.

Is there something else I must do to ensure HTMLDoc.exe is actually being called by your extension? Is there a required directory that it needs to be in?

Any help would be appreciated!

Thanks!
 * You have to make sure that htmldoc is in your executable PATH so that it can execute from just typing "htmldoc" without needing to supply the full pathname no matter what current directory you're in. Another thing to check would be to comment out the "@unlink($file)" line and after saving a pdf, check if it's left a tmp file in the root of your images directory, which is the data sent to htmldoc. --Nad 00:35, 6 September 2007 (UTC)


 * I'm experiencing the exact same problem: my files turn up empty. I run the server on a Windows machine using Apache. I've installed HTMLDoc and I'm able to create PDF files using the GUI. If I comment out "@unlink($file)" and then run the tmp file through the GUI I get my pdf, but all files I download are 0 bytes in size... What can be wrong? /Jesper 15:59, 23 October 2007 (UTC)
 * With some hacking of Pdf_Book.php I'm now able to create PDF:s, but only from categories, not from a single page. By commenting out "putenv("HTMLDOC_NOCGI=1");" on line 152 it now generates Category PDF:s. /Jesper 08:09, 25 October 2007 (UTC)
 * I can't even get this far. Did you make any changes other than commenting out that one line? Has anybody else gotten this to work on an Apache Server running on Windows? -Michelle 19:19, 1 May 2008 (UTC)
 * Works for me with WAMP and MediaWiki 1.3. I had to copy libeay32.dll and ssleay32.dll from C:\wamp\Apache2\bin to C:\Program Files\HTMLDOC in order to get HTMLDoc working. I also had to restart Apache to make it refresh the PATH environment variable. Before restart it couldn't find HTMLDoc.
 * I also had to copy the 7.1 C dll (msvcr71.dll/msvcp71.dll) to the HTMLDOC folder. You can find it here: http://support.microsoft.com/kb/326922 Antdos (talk) 10:11, 19 July 2012 (UTC)
 * Make sure your webserver user has write access to /var/tmp. On my setup, htmldoc uses this as a tmp directory. You can diagnose this sort of issue by changing the htmldoc command to something like strace htmldoc > $file.log
 * For macOS 10.12: HTMLDOC is installed to /usr/local/bin. If you are using the builtin apache server this directory won't be in the PATH, so /usr/local/bin/htmldoc could not be found by the pdf book extension. Follow the steps outlined in https://serverfault.com/a/827046/434690 --Frankhintsch (talk) 10:07, 8 September 2017 (UTC)

Greetings to anyone who finds this. I was having the same problem, and I'm a total newbie at wiki. After editing PdfBook.php and debugging the SQL statement that gets run, I found that my problem was simply that I had no categories: I edited a few pages, added categories to them, and then it worked for me. Hope someone finds this useful. Maybe as a patch sometime in the future there could be code that checks whether the $article[] array is empty before sending the headers for the pdf file, or checks whether the tmp file it writes is 0 bytes, and then echoes an error message instead of the pdf file. Just a thought.
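The check suggested above might look something like this. It is only a sketch under assumptions: pdfExportProblem is a hypothetical name, and $articles and $pdfFile stand in for whatever variables the extension really uses at that point.

```php
<?php
// Hypothetical guard, not part of PdfBook: returns an error message to show
// instead of the PDF, or null when the export looks fine.
function pdfExportProblem(array $articles, string $pdfFile): ?string
{
    if (count($articles) === 0) {
        return 'No articles found: is the page in a category that has members?';
    }
    clearstatcache(true, $pdfFile); // make sure filesize() is not a stale cached value
    if (!is_file($pdfFile) || filesize($pdfFile) === 0) {
        return 'htmldoc produced no output: check the PATH and the temp directory permissions.';
    }
    return null; // nothing wrong, safe to send the PDF headers
}
```

The caller would echo the returned message as a normal page when it is not null, and only send the download headers otherwise.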

Invalid PDF File
Nad,

Thanks for your quick response!

However, I am still having issues. The file is being created and has size to it, but Adobe Reader gives me the following error:

"There is an error opening this document. This file is damaged and cannot be repaired".

HTMLDoc seems to be quitting during the conversion job.

If I add the ".html" extension to the temp file and run HTMLDoc from the command line I can convert the temp html file manually to a PDF file.

I then compare in Notepad the one I generated and the one your script creates, and notice that the PDF your script creates quits after processing a certain number of lines.

I have your PDFExport Extension working just fine...so I was wondering what else it could be.

Any ideas?

Thanks!
 * How long is it taking to generate the PDF before quitting? 30 seconds? if that long it could be reaching max execution time? and how large is the PDF before it bails? --Nad 20:29, 6 September 2007 (UTC)

Nad,

It only writes about 18 lines to the .pdf file and takes a couple seconds for the file to generate. It doesn't appear to quit, it saves the file like it normally would however when I edit the file in notepad it is not complete (Stops after ~18 lines with Wordwrap on)

Like I stated before, I'm using your PDFExport Extension and it works great.

Let me know what you think      --136.182.158.153 21:29, 6 September 2007 (UTC)
 * When you run htmldoc manually passing the generated tmp file to it, are you using the exact same command and parameters that the extension uses? --Nad 21:51, 6 September 2007 (UTC)

continued...

If I change this line;

$cmd = "htmldoc -t pdf --charset iso-8859-1 $cmd $file";

to

$cmd = "htmldoc -t pdf --charset iso-8859-1 $cmd $file > test.pdf";

Then I get a test.pdf in my mediawiki root folder which works perfectly


 * You could try changing the htmldoc command to use passthru like Extension:Pdf Export - I had it like that on mine but had problems with the gzip encoding, but it may work better like that for you --Nad 21:55, 6 September 2007 (UTC)

images in the pdf Book?
Is there any possibility of getting images displayed in the pdf Book as well? It would be a fantastic improvement. Any workarounds? Martin
 * I'm working on it, I just can't get them to work currently. I'm checking out some of the solutions at Extension talk:Pdf Export too as that one uses htmldoc as well. --Nad 12:39, 12 September 2007 (UTC)

Nad, thanks for your great work. I made some fixes to your extension and got it to work correctly with images, even with secure server without modifying .htaccess.

The points are:
 * when generating html output only, links to images could stay absolute as currently.
 * when generating pdf output, links to images should be converted to relative links to the temp file (pdf-book-something in $IP/images)
 * --browserwidth could be a workaround when you have only large images, but would make your small images too small when your image sizes vary a lot. My solution is to rescale large images to fit in the page: pick up the image width and height from the html output and, if they are too big for the paper size, set width="x%", with x depending on the ratios width/maxWidth and height/maxHeight.
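One way to read the rescaling rule in the last point is sketched below. This is an interpretation, not Lechau's code: fitWidthPercent is a hypothetical name, and $maxWidth and $maxHeight are assumed to be the usable page size expressed in the same pixel units as the image dimensions found in the html.

```php
<?php
// Hypothetical helper: returns the value x for width="x%", or null when the
// image already fits and should be left alone.
function fitWidthPercent(int $width, int $height, int $maxWidth, int $maxHeight): ?int
{
    if ($width <= 0 || $height <= 0) {
        return null; // no usable dimensions in the html
    }
    if ($width <= $maxWidth && $height <= $maxHeight) {
        return null; // small images keep their natural size
    }
    // Shrink by whichever ratio is more restrictive, then express the
    // resulting width as a percentage of the page width.
    $scale = min($maxWidth / $width, $maxHeight / $height);
    return max(1, (int) floor(100 * $width * $scale / $maxWidth));
}
```

With a page of 680 x 900, an image of 1360 x 400 is limited by its width and gets 100, while an image of 680 x 1800 is limited by its height and gets 50.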

Hope this helps. Just tell me if you'd want me to send you my codes. Lechau 02:20, 6 June 2008 (UTC)

A hack
In file  around line 101 (I may have inserted other stuff) just before "#write the HTML to a tmp file" insert this:

$ori_string = 'src="';
$repl_string = 'src="'. $wgServer;
$html = str_replace ($ori_string, $repl_string, $html);
# Write the HTML to a tmp file

The problem is that the intermediary output file contains stuff like this: src="/mediawiki/images/thumb/pict.png but you want: httpee://your.server.org/mediawiki/images/thumb/pict.png

This is not the best solution; a regexp hacker should actually rip away most of the html picture markup and then maybe replace the thumb with the original pic. But the above is at least a minimal job. To see the intermediary file, as someone said, comment out the unlink at the end and then get it from the images directory: //@unlink($file);

Sorry, I'm not a real programmer and have too much workload to help for real. Just wanted to produce some handouts ;) - Daniel

only border and image link is displayed (mw 1.16.4, PHP 5.2.17 (cgi-fcgi))
I did not find the often mentioned ./images folder. Only the images folder in the wiki root. Any ideas?

Same problem as section 2
I'm on Ubuntu Linux with Mediawiki 1.10. Htmldoc is in /usr/bin. I commented out the unlink command, and the temp file is empty (0 length).

I checked to be sure that my Apache user can run htmldoc -- it can. Unsure what I should try next.

By the way, your single-page export plugin works perfectly (even for images). So I know that htmldoc is not at fault here.
 * I didn't write the single page one, but the code seems pretty similar. I'll just have to see what differences there are in the code between this one and the single-page one. --Nad 22:28, 14 September 2007 (UTC)

Upload filetype
What happens when pdf is not a valid file type when uploading? Does the wiki control this with this extension, if so do I need to add pdf file types to the type of files you can upload?
 * The upload filetype is unrelated to this since exported pdf's are downloaded not uploaded. If you want to add pdf to your allowed upload filetypes, use, you may also want to set $wgVerifyMimeType to false if it's giving you hassles when you try and upload exotic types of file. --Nad 04:11, 21 September 2007 (UTC)

More empty file downloads
--Johnp125 02:12, 25 September 2007 (UTC)

Sorry to be such a pain. I have set up a test wiki which is running fedora c 4. Please check out my test wiki and see if you can give me some direction. I have debug for the wiki turned on in LocalSettings.php. If you need admin access please email me at johnp125@yahoo.com and I'll hook you up.

http://wikitest.homelinux.net/wiki2/index.php/Main_Page
 * The output shows a bug due to 1.11 being more strict about hook return values. Try again now with the latest version, 0.0.4. Also note that even if it works, you will get just an empty document, since the point of this extension is to compose a book from the content of a category: if the page is not placed in a category, or the category contains no members, then the result will be empty. To export the content of a single page you should be using Extension:Pdf Export. --Nad 03:33, 25 September 2007 (UTC)
 * However, I'm working on version 0.5 now which can be used in non-category pages and will compose the book from the article links found in the page, so that books can then be composed from explicit lists or DPL queries. --Nad 03:33, 25 September 2007 (UTC)

--Johnp125 13:28, 25 September 2007 (UTC)

Hey, that sounds great. I'd love to help you with it.

You mentioned single page. I had 2 types of pdf downloads there.

http://wikitest.homelinux.net/wiki2/index.php?title=Category:test&action=pdfbook this one should be going after the demo page with the catageory:test and then creating a pdf book from that. Is this not the right way to use the code? I know if I created more pagese and put the catageory:test under them they would get put into the pdf file as well.
 * You had a typo in the word "category", link is working now ;-) --Nad 22:21, 25 September 2007 (UTC)

--Johnp125 17:30, 26 September 2007 (UTC)

Thanks a bunch. You're the greatest. Glad to have this working now.

Checked out your info about Images not showing in mediawiki 1.10.2---1.11. Nice work.
 * I just did another update yesterday which has images working now --Nad 21:06, 26 September 2007 (UTC)

--Johnp125 00:16, 27 September 2007 (UTC)

Is this the update that is going to work with DPL queries? I started to play around with that extension. I know it's working but right now it's too big to try and figure out.

--Johnp125 00:23, 27 September 2007 (UTC)

Hey by the way could you tell me how to make the pdfbook extension just make a big html file, so I could open it in word or openoffice in html format and let the office program convert it from the html file? Or is it easier to say and harder to do?
 * That feature is very easy to add because it simply requires not sending the file to HTMLDOC, I've added an option in a new version (0.0.7) which allows you to do this by adding format=html to the query-string. --Nad 02:06, 27 September 2007 (UTC)

--Johnp125 22:04, 30 September 2007 (UTC)

Wow, that sounds great, can't wait to try out the html export. I looked for the 0.0.7 version but only saw the 0.0.6 version when I went to the download section. Also, could you give me an example of how format=html is used?

http://www.foo.bar/wiki/index.php?title=Catgeory:Foo&action=pdfbook

Where would it go in this string?
 * Sorry about that I must have forgotten to update it, it's at 0.0.7 now. To change the URL above to produce html, append &format=html to it. We use a template which has a link for both, see OrganicDesign:Template:Book. --Nad 07:11, 1 October 2007 (UTC)

--Johnp125 01:55, 2 October 2007 (UTC)

The html export looks really good. I did notice that Microsoft Word gets confused by small html files. Maybe you could put the html header info at the top and bottom of the page to help Word out. OpenOffice did not seem to have a problem with it. However, Word is looking for the html tags on small exports. If it's a big export it gets the idea.

--Johnp125 02:08, 2 October 2007 (UTC)

Just tested it again with a small html download. Word tried to format it when opening. Then I added the to the beginning and then added the at the end. Then reopened the file with word and bingo it worked fine. Maybe something to add in 0.0.8? Openoffice worked either way.

Keep up the good work. This is the best extension for wiki out there right now.

If you have larger text, don't forget to change server settings. E.g. for a 2000 page document produced with a low-end 2CPU sparc box I use this in php.ini:
max_execution_time = 600
max_input_time = 600
memory_limit = 100M
and this in httpd.conf:
Timeout 600

Else you just get a blank page without any warning or error message - Daniel K. Schneider 11:00, 20 June 2008 (UTC)

Hacks to change PDF output (v. 0.6)

 * Images: If they don't fit your PDF page, you have to set pixel width of a virtual browser page (that's a "feature" of htmldoc). By default it is 680 pixels only and images larger than that will be rendered larger than your PDF page! Lots of my pictures are...
 * Titlepage: If you want a standardized titlepage before the TOC, create it in HTML and put it somewhere in your file system. I just put it in the images directory.

Then change PdfBook.php like this for example:
$cmdext = " --browserwidth 1000 --titlefile $wgUploadDirectory/PDFBook.html";
$cmd = "htmldoc -t pdf --charset iso-8859-1 $cmd $cmdext $file";
Basically, I found it a good idea to read the htmldoc manual. On my Unix system it sits in /usr/local/share/doc/htmldoc/htmldoc.pdf (see chapter 8). Made other changes too.

Now of course Nad may at some point add some more options, but changing a line in the php file does it too :) - Daniel (edutechwiki.unige.ch)

PdfBook Error Solution....for me at least
Nad,

I ended up creating an additional temp file which I had HTMLDoc redirect its output to. This was the only way I was able to keep it from quitting during the PDF conversion process. I then open the file and read its data back into $content. After doing that I am able to successfully download the complete pdf file.
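The workaround described above (redirect htmldoc's output to a second temp file, then read it back) could be sketched like this. It is a guess at the shape of Dan's fix, not his code: runToTempFile is a hypothetical name, and $commandPrefix is assumed to hold everything up to the input file, for example the "htmldoc -t pdf --charset iso-8859-1" part of the command quoted elsewhere on this page.

```php
<?php
// Hypothetical helper: runs "<prefix> <input> > <temp file>" and returns the
// contents of the temp file, or null if nothing was written.
function runToTempFile(string $commandPrefix, string $inputFile): ?string
{
    $outFile = tempnam(sys_get_temp_dir(), 'pdf-book-out-');
    if ($outFile === false) {
        return null;
    }
    $cmd = $commandPrefix . ' ' . escapeshellarg($inputFile) . ' > ' . escapeshellarg($outFile);
    exec($cmd); // the exit status is ignored here; only the written output is checked
    $content = file_get_contents($outFile);
    @unlink($outFile);
    if ($content === false || $content === '') {
        return null;
    }
    return $content;
}
```

The returned string would then take the place of $content before the download headers are sent.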

But I have another question for you.....I have seen a jspwiki which retrieves all the articles for a category and lists them on a page and uses a form to allow you to select which ones you want. It then retrieves the selected articles as one entire book. Is there a way to include a similar form in Mediawiki. Or do you know of a way to use an external html web page to retrieve/send commands like that to Mediawiki?

Thanks,

Dan --136.182.158.145 21:27, 7 September 2007 (UTC)


 * The PDf Book extension will allow exceptions so that not all items in the category are included. It would be possible to have it add items to the selection in the same way. A form could then be used to generate the list from which the book is made. I'll have a think about that though because it's an interesting point you make, that books could be generated from queries rather than just categories... --Nad 22:01, 7 September 2007 (UTC)

Just in case the anonymous above re-reads this: I had the same problem of PdfBook not generating any output, but the solution was simple: make sure that the upload directory (usually ./images) is writeable for the web server process. After I changed that, PdfBook worked okay. Cheers, Lexw 15:30, 5 October 2007 (UTC)

Missing Images in new version
I love this extension I think it is the best thing for wiki right now. However when I use the new pdfbook version 0.7 I am not getting any pictures. All I get is url links to the pictures. This is in the pdf format not the html format. Any Ideas? --Johnp125 20:29, 15 October 2007 (UTC)
 * Do you mean to say that your images were working on the previous version and have stopped working now? I had never had images working until I made some changes in the last version. Do you have a link to an example of a failing image export so I can check out what the problem may be? --Nad 19:32, 17 October 2007 (UTC)

--Johnp125 18:12, 19 October 2007 (UTC)

Sorry for the delayed post. Yes I had images working on the 0.6 version and then on the 0.7 version I am not getting any images in pdf format. I can go ahead and setup my test server real soon and make sure you and I can test both. I think I still have a copy of the 0.6 version I will try it again as well.

--Johnp125 18:23, 19 October 2007 (UTC)

Also, I noticed the links are not working quite right. For example, if I have a document in Category: Testing and it pulls that document, and that document links to another page that is in Category: Testing as well, should the link not take me to that page in the pdf doc? Right now it is referring to the html link, not the pdf link. I would think that it should realize that the linked page was pulled in by the category and then change the reference to the pdf location.


 * I have a problem with the pictures if the wiki needs http authentication. It seems that the pictures are imported from the webserver and not from the file system. Is this the reason for the problems? Proofy 07:54, 29 November 2007 (UTC)

Missing Images and hangs with larger categories
Mistral 13:28, 17 October 2007 (UTC)

We installed on Linux with 128 mb of memory allocated to php. Using the template idea referred to by Organic Design we have tested this and observe the following.

- Images are not uploaded. They are copied to the pdf as links to the wiki image.
- html and pdf output work fine on small categories (< 10 entries). Output is ready in less than 2 seconds and it looks nice.
- However, for pages with > 25 entries, when you press submit to get pdf output the browser hangs and never completes the operation. You need to close the browser to terminate the operation.
 * It should work for large books, our test book on organicdesign is over 250 pages/800KB and only takes a second or two with 64MB allocated. Have you tried saving it as html only then manually running it through htmldoc to see if that's working ok? --Nad 19:41, 17 October 2007 (UTC)

I looked at your book link and the translation to pdf worked great on IE6 with Acrobat. However I do notice that there is not a single image in the book. Is it possible having 2 or 3 images per page in 25 - 30 pages is the problem?

I looked at the translation into html code to see why the images were not showing. I believe this can be fixed easily.

Here is the html output http://wiki.fomportal.comhttp://wiki.fomportal.com/images/9/94/BERalex_Full.jpg

here is what it should read src="http://wiki.fomportal.com/images/9/94/BERalex_Full.jpg" width="262" height="207" />

Do you see the duplication of the site address? ((http://wiki.fomportal.com)) Maybe this a configuration issue?? Mistral 18:03, 19 October 2007 (UTC)
 * I'll check it out soon, your research into the problem should make it a lot easier for me to fix ;-) --Nad 20:40, 19 October 2007 (UTC)
 * I found a bug which was trying to make URL's absolute which were already absolute, see if 0.0.8 works any better --Nad 00:18, 29 October 2007 (UTC)
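For anyone patching an older copy by hand, the idea behind that fix can be sketched as follows. This is not the code from 0.0.8, just an illustration with a hypothetical function name: only src values that start with a single "/" get the server prepended, so URLs that are already absolute are left alone.

```php
<?php
// Hypothetical helper: $server is assumed to look like http://wiki.fomportal.com
// (scheme and host, no trailing slash), as $wgServer normally does.
function absolutizeImageUrls(string $html, string $server): string
{
    // Matches src="/..." but not src="//..." or src="http://...".
    $result = preg_replace_callback(
        '#src="(/(?!/)[^"]*)"#',
        function (array $m) use ($server): string {
            return 'src="' . $server . $m[1] . '"';
        },
        $html
    );
    return $result ?? $html; // fall back to the input on a regex error
}
```

Running it twice over the same html gives the same result as running it once, which is exactly what the duplicated-address bug was missing.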

Missing images due to apache .htaccess restriction
I've just encountered the problem that no images were displayed within the PDF - only their borders. In my case this was caused by .htaccess asking for a password in order to access the wiki folder.

The solution was to add " " and " " to the .htaccess file so htmldoc could access the images for embedding them into the PDF. --^Rooker 12:14, 05 December 2007 (UTC)
 * Be aware that the corresponding IP address is not always 127.0.0.1. It didn't work for me, so I spent some hours on debugging until I took a look in the Apache access.log, where I saw that accesses by the local machine were not logged with 127.0.0.1 but with the real IP address of our server. --Fydel 12:45, 9 January 2009 (UTC)

seblac 28-10-2010 : Other solution for the same problem : For me the problem comes with the inclusion of a NTLM device in APACHE 2 using SSPI module. The only solution was to encapsulate security rules for php files only :



Order Allow,Deny
Allow from All

AuthName "foo access"
AuthType SSPI
SSPIAuth On
SSPIAuthoritative Off
SSPIOfferBasic Off
SSPIOmitDomain On
require valid-user



SubCategories
I made a structure using categories and subcategories. My goal is to make a complete Quality Manual using MediaWiki. When using the PdfBook extension from a category page, no subcategories are included in the resulting PDF.

Is there any manner to use pdfbook extension to make a book covering sub and subsubcategories?

Regards, Antonio Todo Bom --Todobom 22:50, 28 October 2007 (UTC)
 * Unfortunately not sorry, currently it can only work on a list, deeper levels are only done from heading levels not sub-categories. You may be able to use DPL to create reports of the sub-category and sub-sub-category content which could then be printed as a book. --Nad 00:10, 29 October 2007 (UTC)
 * I'm facing the same problem with the Quality Manual I'm working on. Please let me know if someone solves this problem, and I'll do my best to find a solution to this myself.
 * /Jesper 85.89.79.106 12:43, 30 October 2007 (UTC)

Looks like a job for a recursive program call. When we installed this I thought I would be able to have one master category that contained all the other categories and then just go "Save as pdf" but it's not that easy yet. I hope you are able to add this functionality.

Mistral 16:30, 30 October 2007 (UTC)
 * It's not as simple as that - how do the categories and sub-categories names map to heading level? and then how do the headings and subheadings etc in the document map to pdf headings? --Nad 20:57, 30 October 2007 (UTC)
 * I understand the problem... Somehow the new category should have its own heading, and if that's the case, all other H1 would become H2 and so on... But let's ignore that factor and say that you only want to make a huge PDF book with all categories, with the same heading levels used today: how to do that? I tried to use DPL to make it print all articles in a couple of categories and then PDF the category that article was in, but it didn't work... //Jesper 85.89.79.106 08:46, 1 November 2007 (UTC)
 * I doubt I'll be adding the subcategory functionality for some time, if at all; I just have too much other stuff on. There's an example of using DPL to make books at Creating a PDF book from a DPL query. --Nad 20:40, 1 November 2007 (UTC)
 * Have a look at Extension:Book. The issue is addressed there. It should also be possible to merge both approaches.--Sh4k 12:23, 29 May 2008 (UTC)
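If you ignore the heading-level question raised above and only want a flat list of every page below a master category, the "recursive program call" could look roughly like this. It is a sketch, not something PdfBook does: collectPages is a hypothetical name, and the $members array is an in-memory stand-in for a real category lookup in the database.

```php
<?php
// $members maps a category name to its direct members; members whose name
// starts with "Category:" are treated as subcategories and descended into.
function collectPages(array $members, string $category, array &$visited = []): array
{
    if (isset($visited[$category])) {
        return []; // already handled; this also stops category loops
    }
    $visited[$category] = true;
    $pages = [];
    foreach ($members[$category] ?? [] as $member) {
        if (strpos($member, 'Category:') === 0) {
            $pages = array_merge($pages, collectPages($members, $member, $visited));
        } else {
            $pages[] = $member;
        }
    }
    // A page may sit in several categories; keep only its first occurrence.
    return array_values(array_unique($pages));
}
```

The resulting list could then be fed to the export in the same way as an explicit list of article links, leaving all heading levels as they are today.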

Hello there, Nad. Great work you have there. You mean by this last suggestion that we can draw a custom layout? It would be nice if a page describing the layout, like the following, could specify the heading layout:
* Page for Section 1
:* Page for Section 1.1
* Section 2

Is it possible? Nuno Tavares 20:45, 12 January 2008 (UTC)
 * I also take the chance to ask you to look at the last line: Section 2. The page is in fact called Page for Section 2, but what should be shown is Section 2, so I think that should also be the section name when building the PDF. This is especially useful if you are using namespaces. Nuno Tavares 21:15, 12 January 2008 (UTC)
 * OK, I found a way (a hack, actually) to allow this. In onUnknownAction, just use "$title->getText" instead of "basename($ttext)" Nuno Tavares 21:49, 12 January 2008 (UTC)

Subcategory article shows up in pdfbook export
When creating a subcategory (= assigning a category page to a category), that (sub)category's page also shows up in the export. Either I've overlooked these pages in previous exports, or this behavior was introduced with a more recent version of MediaWiki (I'm currently using v1.12.0, with pdfbook 0.0.9).

I was quite puzzled, so I thought I'd let someone know about this behavior. -- The rooker 09:51, 28 April 2008 (UTC)

Use CSS when exporting to PDF
Hi all. Is there some way to use CSS when exporting my PDFs? The thing is, I want to make id="toc" invisible instead of having another table of contents in my PDF books. //Jesper 85.89.79.106 12:57, 31 October 2007 (UTC)
 * I've been looking round for PDF converters which can handle CSS but I can't find any. You'll have to add __NOTOC__ to remove the toc. --Nad 20:42, 31 October 2007 (UTC)
 * Hmm... But adding __NOTOC__ removes the table of contents of the page, and as the page is pretty long, I think the users need that one... It would be great if I could make the TOC disappear only in the PDF. //Jesper 85.89.79.106 06:35, 1 November 2007 (UTC)
 * I've been testing a bit, and by adding the following after "# If format=html in query-string, return html content directly":

  $ori_string  = 'id="toc"';
  $repl_string = 'id="toc" style="visibility: collapse;"';
  $html = str_replace($ori_string, $repl_string, $html);
 * the TOC disappears in the HTML file, but I can't get the same thing to work with the PDF. //Jesper 85.89.79.106 07:00, 1 November 2007 (UTC)
 * Good point, it's not useful to have a TOC on each article when the book already has a TOC. I've updated it to add a __NOTOC__ before parsing each article --Nad 07:58, 1 November 2007 (UTC)
 * Ah, Thanks Nad! That was a fast reply and I really appreciate it! //Jesper 85.89.79.106 08:31, 1 November 2007 (UTC)
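
For anyone who wants to keep the TOC on the wiki page and drop it only from the exported HTML: since the converter ignores CSS, the element itself has to be removed. A minimal sketch follows, assuming the rendered TOC is an element with id="toc" as discussed above. The helper name stripToc is made up for illustration and is not part of PdfBook.

```php
<?php
// Sketch only: remove every element carrying id="toc" from rendered HTML.
// stripToc() is a hypothetical helper, not an existing PdfBook function.
function stripToc(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);   // keep parser warnings quiet
    // The XML encoding hint makes loadHTML() treat the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The XPath result is a snapshot, so removing nodes while looping is safe.
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@id="toc"]') as $node) {
        $node->parentNode->removeChild($node);
    }

    // Return only the body content, without the wrapper loadHTML() added.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$clean = stripToc('<h1>Title</h1><table id="toc"><tr><td>Contents</td></tr></table><p>Body text</p>');
```

This needs PHP's DOM extension. It would be applied to each article's HTML before the text is handed to htmldoc.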

no index pages
--Johnp125 16:59, 8 November 2007 (UTC)

Is there any way to run the query without creating any autogenerated index pages or putting the index number in the text?

--Johnp125 18:26, 8 November 2007 (UTC)

OK, I just checked out the new html version 0.0.9. This does what I would like it to do. Images work and everything.

I was having problems with the images because we have an alias for the wiki (/wiki/index.php). When you run pdfbook in pdf format, I think it looks for /wiki/picture.jpg instead of /picture.jpg and cannot find it. Anyway, the new html version works just fine.

Header info
--Johnp125 18:31, 8 November 2007 (UTC)

I know this question is out on a limb, but is there any way I could exclude certain headline text from being pulled in, based on its name (for example Image Header)?

Missing end tag in 0.0.9 source code
Just for the record: it seems that the page at Organic Design which lists the v0.0.9 source code is missing a php end tag at the bottom of the file. Cheers, Lexw 09:23, 13 November 2007 (UTC)
 * End delimiters are removed to avoid whitespace being sent to the output - unfortunately I can't find the link to the official bug report about it. --Nad 19:59, 13 November 2007 (UTC)

Additional functionality in PdfBook
Hi Nad, I have added some additional functionality into PdfBook that you might be interested in for a next version. Seems that you have switched off email (which I can understand), so I couldn't contact you that way. Please contact me by email via 'E-mail this user' if you are interested.

Other users: please don't contact me. I might come back to this topic later, first I want to discuss things with Nad.

Regards, Lexw 13:39, 15 November 2007 (UTC)

Added recursive follow functionality
Hi Nad, I'm using your PdfBook extension and I've added some functionality to recursively follow links to produce a PDF. With the follow request parameter set (follow=deep for depth-first, any other value for breadth-first), the created PDF will contain all pages that are referenced from the current page, and recursively all further referenced pages. Here are the relevant code snippets:

 if ($title->getNamespace() == NS_CATEGORY) {
     $cat    = $title->getDBkey();
     $db     = &wfGetDB(DB_SLAVE);
     $cl     = $db->tableName('categorylinks');
     $result = $db->query("SELECT cl_from FROM $cl WHERE cl_to = '$cat' ORDER BY cl_sortkey");
     if ($result instanceof ResultWrapper) $result = $result->result;
     while ($row = $db->fetchRow($result)) $articles[] = Title::newFromID($row[0]);
 }
 else if (isset($_REQUEST['follow'])) {
     $deep = $_REQUEST['follow'] == 'deep';
     wfDebug("PdfBook: following links - " . ($deep ? "depth first\n" : "breadth first\n"));
     $articles[] = $title;
     wfDebug("PdfBook: adding page '" . $title->getText() . "'\n");
     $this->getLinkedArticles($articles,$article,$opt,$deep);
 }
 else {
     $text = $article->fetchContent();
     $text = $wgParser->preprocess($text,$title,$opt);
     if (preg_match_all('/^\\*\\s*\\[{2}\\s*([^\\|\\]]+)\\s*.*?\\]{2}/m',$text,$links))
         foreach ($links[1] as $link) $articles[] = Title::newFromText($link);
 }

 function getLinkedArticles(&$articles,$article,$opt,$deep) {
     global $wgParser;
     $text = $article->fetchContent();
     $text = $wgParser->preprocess($text,$article->getTitle(),$opt);
     $linktitles = array();
     wfDebug("PdfBook: - processing article '" . $article->getTitle()->getText() . "' ($deep)\n");
     if (preg_match_all('/\\[{2}\\s*([^\\|\\]]+)\\s*.*?\\]{2}/m',$text,$links)) {
         foreach ($links[1] as $link) {
             $linktitles[] = Title::newFromText($link);
             wfDebug("PdfBook: found link '" . $link . "'\n");
         }
     }
     wfDebug("PdfBook: processing " . count($linktitles) . " links...\n");
     if ($deep) {
         foreach ($linktitles as $linktitle) {
             $exists = false;
             foreach ($articles as $el) {
                 if ($el->getText() == $linktitle->getText()) $exists = true;
             }
             if (!$exists) {
                 wfDebug("PdfBook: adding '" . $linktitle->getPrefixedText() . "'\n");
                 $articles[] = $linktitle;
                 wfDebug("PdfBook: adding subpages\n");
                 $art = new Article($linktitle);
                 $this->getLinkedArticles($articles,$art,$opt,$deep);
                 wfDebug("- <\n");
             }
         }
     } else {
         $newlinktitles = array();
         foreach ($linktitles as $linktitle) {
             $exists = false;
             foreach ($articles as $el) {
                 if ($el->getText() == $linktitle->getText()) $exists = true;
             }
             if (!$exists) {
                 wfDebug("PdfBook: adding '" . $linktitle->getText() . "'\n");
                 $articles[] = $linktitle;
                 $newlinktitles[] = $linktitle;
             }
         }
         foreach ($newlinktitles as $linktitle) {
             wfDebug("PdfBook: adding subpages of '" . $linktitle->getText() . "'\n");
             $art = new Article($linktitle);
             $this->getLinkedArticles($articles,$art,$opt,$deep);
         }
     }
 }

I can also send you the complete file if you want. Tbleier 2008-01-25
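
To make the difference between the two traversal orders easier to see, here is a stand-alone sketch that runs without MediaWiki. It is not Tbleier's code: the wiki is replaced by a plain array mapping each page to the pages it links to (an assumption for illustration), and the breadth-first branch uses a textbook queue, which may order pages slightly differently from the level-by-level recursion in the snippet above. The function name collectLinkedPages is hypothetical.

```php
<?php
// Sketch only: collect all pages reachable from $start, each exactly once.
// $links is a hypothetical stand-in for the wiki: title => linked titles.
function collectLinkedPages(array $links, string $start, bool $deep): array
{
    $seen  = [$start => true];   // titles already scheduled for the book
    $order = [$start];           // resulting page order

    if ($deep) {
        // Depth first: follow each new link to the bottom before moving on.
        $visit = function (string $page) use (&$visit, &$seen, &$order, $links): void {
            foreach ($links[$page] ?? [] as $next) {
                if (isset($seen[$next])) {
                    continue;
                }
                $seen[$next] = true;
                $order[] = $next;
                $visit($next);
            }
        };
        $visit($start);
        return $order;
    }

    // Breadth first: finish all links of one page before descending.
    $queue = [$start];
    while ($queue) {
        $page = array_shift($queue);
        foreach ($links[$page] ?? [] as $next) {
            if (isset($seen[$next])) {
                continue;
            }
            $seen[$next] = true;
            $order[] = $next;
            $queue[] = $next;
        }
    }
    return $order;
}

$links = [
    'Main' => ['A', 'B'],
    'A'    => ['C', 'Main'],   // the link back to Main is skipped as already seen
    'B'    => ['D'],
    'C'    => [],
];
$depthFirst   = collectLinkedPages($links, 'Main', true);   // Main, A, C, B, D
$breadthFirst = collectLinkedPages($links, 'Main', false);  // Main, A, B, C, D
```

Keeping the visited titles as array keys avoids the nested loop over $articles that the original uses for its duplicate check, which matters once a wiki has many cross-links.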

Added dynamic title page
In order to have a proper title page on the generated PDF, I've added a few lines of code that read a plain HTML file, replace some placeholders with values like "Category name" etc., and then use that file with htmldoc's otherwise static --titlefile option.

Additionally, I've added 2 new variables: $wgPdfBookTitleFile and $wgPdfBookLogoImage so one can easily select a title page and logo image (to display at the bottom of a page).

I'll make a small package and put it on some webserver instead of posting the code here (too messy already). :)   The rooker 14:00, 20 February 2008 (UTC)
 * That is exactly what I have done and wanted to discuss with Nad (see above), but he doesn't seem to react. I've gone a little further and now create the titlefile dynamically from the PdfBook extension, so there is no more external HTML file necessary for generating the title page. A logo file was included in my implementation too (only I added it to the header, not the footer, but that's a matter of configuration which can be overruled in the general wiki LocalSettings.php).
 * Since this implementation is not part of the "official" PdfBook extension, I will have to find a place to store it, if anyone is interested. Rooker, have you already stored your solution somewhere? Lexw 09:27, 8 April 2008 (UTC)


 * @Lexw: I've provided a quickly cleaned version including my modifications. See the "README.txt" inside for details: PdfBook-0.0.9-DynamicTitle.tar.bz2 The rooker 10:57, 17 April 2008 (UTC)


 * Thank you for this, I am using this part of the code. Has anyone got any ideas about how I can include headers and footers on every page of the PDF? --194.169.24.100 16:48, 19 June 2009 (UTC)
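
The placeholder idea described at the top of this thread can be sketched roughly as follows. This is not the code from the tarball: the {{NAME}} placeholder syntax, the function name renderTitlePage and the sample values are all assumptions made for illustration.

```php
<?php
// Sketch only: fill {{NAME}} placeholders in an HTML title-page template.
// Placeholder syntax and function name are hypothetical.
function renderTitlePage(string $template, array $values): string
{
    $pairs = [];
    foreach ($values as $name => $value) {
        // Escape values so a category name containing & or < stays valid HTML.
        $pairs['{{' . $name . '}}'] = htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
    }
    return strtr($template, $pairs);
}

$template = '<html><body><h1>{{CATEGORY}}</h1><p>Generated {{DATE}}</p></body></html>';
$page = renderTitlePage($template, [
    'CATEGORY' => 'Servers & Networks',
    'DATE'     => '2008-02-20',
]);
```

The result would then be written to a temporary file and that file's path handed to htmldoc as the title file.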

PHP compilation error
Hello,

I'm trying to install version 0.0.9 on Red Hat Enterprise Linux ES 4, on which MediaWiki 1.6.8 is running with PHP 4.3.9. pdf-book.php has been copied into the "extensions" directory and then included via LocalSettings.php: require_once( "extensions/pdf-book.php" );

and we get this error: Parse error: parse error, unexpected T_OBJECT_OPERATOR in /var/wwwwikitn/html/mediawiki-1.6.8/extensions/pdf-book.php on line 66, which is $msg = $wgUser->getUserPage()->getPrefixedText().' exported as a PDF book';

Any idea ? Thanks !


 * Just a wild guess: PHP5 needed? Lexw 07:49, 17 April 2008 (UTC)

Problem in pdfbook if only current page should be converted
I had the problem if I use

[ download as PDF]

no PDF was produced because temporary html file was empty.

I had to add the following line to the else block of if ($title->getNamespace() == NS_CATEGORY) { ... }:

 $articles[] = $title;

Now it works. The new code looks like this:

 if ($title->getNamespace() == NS_CATEGORY) {
     $cat    = $title->getDBkey();
     $db     = &wfGetDB(DB_SLAVE);
     $cl     = $db->tableName('categorylinks');
     $result = $db->query("SELECT cl_from FROM $cl WHERE cl_to = '$cat' ORDER BY cl_sortkey");
     if ($result instanceof ResultWrapper) $result = $result->result;
     while ($row = $db->fetchRow($result)) $articles[] = Title::newFromID($row[0]);
 }
 else {
     $text = $article->fetchContent();
     $text = $wgParser->preprocess($text,$title,$opt);
     if (preg_match_all('/^\\*\\s*\\[{2}\\s*([^\\|\\]]+)\\s*.*?\\]{2}/m',$text,$links))
         foreach ($links[1] as $link) $articles[] = Title::newFromText($link);
     $articles[] = $title;
 }
--Guenterg 11:31, 28 March 2008 (UTC)
 * I had the same problem, your fix works for me too. Thanks a lot, Guenter!
 * Now I still have to find a way how to avoid that the PdfBook template itself is included in the PDF document if I place that template on an article, not on a category. But that's a different matter... Lexw 09:16, 8 April 2008 (UTC)

-- This modification works pretty well, but it produces a funny numbering scheme for the page index. RHEL 5 / PHP 5.1.6 / LAMP / MW 1.12
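
For readers wondering why the temporary HTML file ends up empty on an ordinary page: the regular expression in the else block only picks up links that sit at the start of a bullet line, so a page without such a list yields no articles at all. The sketch below runs the same pattern outside MediaWiki. Note that it falls back to the current page only when no links are found, which is a variant of Guenter's fix (his version always appends the page). The function name selectBookPages is hypothetical.

```php
<?php
// Sketch only: extract "* [[Title]]" bullet links from wikitext and fall
// back to the current page when there are none.
// selectBookPages() is a hypothetical helper, not part of PdfBook.
function selectBookPages(string $wikitext, string $currentTitle): array
{
    // Same pattern as in the extension: a bullet, then [[Target]] or [[Target|label]].
    $pattern = '/^\*\s*\[{2}\s*([^\|\]]+)\s*.*?\]{2}/m';
    if (!preg_match_all($pattern, $wikitext, $matches)) {
        return [$currentTitle];   // no bullet links: export the page itself
    }
    return array_map('trim', $matches[1]);
}

$withList = selectBookPages("* [[Page One]]\n* [[Help:Two|label]]\nSee [[Not Listed]]\n", 'Book');
$noList   = selectBookPages("Just some text with a [[Link]] in it.\n", 'Main Page');
```

Links in running text, such as Not Listed above, are ignored on purpose; only the bulleted list defines the book's contents.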

Whole Namespace Export
The tweak below allows the export of a whole namespace (e.g. "Talk") through the additional action "nspdfbook", e.g.

http://localhost/wiki/index.php?title=Talk:Main_Page&action=nspdfbook

Note:
 * You may have to raise the limits in the "; Resource Limits ;" section of your php.ini if you use the mod to export all articles.
 * You may wish to alter the ORDER BY to sort on page name rather than page id.

 public static function onUnknownAction( $action, $article ) {
     global $wgOut, $wgUser, $wgParser, $wgRequest;
     global $wgServer, $wgArticlePath, $wgScriptPath, $wgUploadPath, $wgUploadDirectory, $wgScript;

     if( $action == 'pdfbook' || $action == 'nspdfbook' ) {

         $title = $article->getTitle();
         $opt = ParserOptions::newFromUser( $wgUser );

         // Log the export
         $msg = wfMsg( 'pdfbook-log', $wgUser->getUserPage()->getPrefixedText() );
         $log = new LogPage( 'pdf', false );
         $log->addEntry( 'book', $article->getTitle(), $msg );

         // Initialise PDF variables
         $format  = $wgRequest->getText( 'format' );
         $notitle = $wgRequest->getText( 'notitle' );
         $layout  = $format == 'single' ? '--webpage' : '--firstpage toc';
         $charset = self::setProperty( 'Charset',     'iso-8859-1' );
         $left    = self::setProperty( 'LeftMargin',  '1cm' );
         $right   = self::setProperty( 'RightMargin', '1cm' );
         $top     = self::setProperty( 'TopMargin',   '1cm' );
         $bottom  = self::setProperty( 'BottomMargin','1cm' );
         $font    = self::setProperty( 'Font',        'Arial' );
         $size    = self::setProperty( 'FontSize',    '8' );
         $ls      = self::setProperty( 'LineSpacing', 1 );
         $linkcol = self::setProperty( 'LinkColour',  '217A28' );
         $levels  = self::setProperty( 'TocLevels',   '2' );
         $exclude = self::setProperty( 'Exclude',     array() );
         $width   = self::setProperty( 'Width',       '' );
         $width   = $width ? "--browserwidth $width" : '';
         if( !is_array( $exclude ) ) $exclude = split( '\\s*,\\s*', $exclude );

         // Select articles from members if a category or links in content if not
         if( $format == 'single' ) $articles = array( $title );
         else {
             $articles = array();
             if( $title->getNamespace() == NS_CATEGORY ) {
                 $db     = wfGetDB( DB_SLAVE );
                 $cat    = $db->addQuotes( $title->getDBkey() );
                 $result = $db->select(
                     'categorylinks',
                     'cl_from',
                     "cl_to = $cat",
                     'PdfBook',
                     array( 'ORDER BY' => 'cl_sortkey' )
                 );
                 if( $result instanceof ResultWrapper ) $result = $result->result;
                 while ( $row = $db->fetchRow( $result ) ) $articles[] = Title::newFromID( $row[0] );
             }
             else {
                 if ($action == 'nspdfbook') {
                     $db     = &wfGetDB(DB_SLAVE);
                     $pl     = $db->tableName('page');
                     $ns     = $title->getNamespace();
                     $result = $db->query("SELECT page_id FROM $pl WHERE page_namespace = $ns ORDER BY page_id");
                     if ($result instanceof ResultWrapper) $result = $result->result;
                     while ($row = $db->fetchRow($result)) $articles[] = Title::newFromID($row[0]);
                     $book = "PDFBook_Namespace_Export-".MWNamespace::getCanonicalName($ns);
                 }
                 else {
                     $text = $article->fetchContent();
                     $text = $wgParser->preprocess( $text, $title, $opt );
                     if ( preg_match_all( "/^\\*\\s*\\[{2}\\s*([^\\|\\]]+)\\s*.*?\\]{2}/m", $text, $links ) )
                         foreach ( $links[1] as $link ) $articles[] = Title::newFromText( $link );
                 }
             }
         }

         // Format the article(s) as a single HTML document with absolute URL's
         $book = $title->getText();
         $html = '';

--Andy 13:55, 11 April 2008 (UTC)


 * I've updated this code to work (maybe) with newer versions. —Emufarmers(T 09:36, 20 February 2013 (UTC)

SpecialVersion Issue and PHP 5.1.4
After installing the PdfBook extension and displaying the version page I get:

Notice: Object class PdfBook could not be converted to int in ....\SpecialVersion.php on line 275.

The line in Specialversion.php is "sort ($list);"

I found a general discussion at http://www.webmasterworld.com/php/3586902.htm that talks about 5.1.4 vs 5.2.4. Any thoughts on how to make PdfBook work in PHP 5.1.4?

I am using Mediawiki 1.12.0 and php 5.1.4.

Link in to Hierarchy
Any ideas on how best to link into the Hierarchy extension? I think this would be very useful because the hierarchy is setup perfect for printing a book. I haven't quite figured out how to set this up though. You would have to use the extensions "hierarchy" table to pull information about where you are on the hierarchy, and what subordinate pages you would have to print. I think it would be nice to print where you are down, like you are on chapter 1, so it only prints chapter 1, but if you are at the title page it will print the whole book. It might also be nice to be able to setup a list of pages and then print that list in order. I am going to do what I can, but I am pretty new to PHP, and any advice is welcome.

--Greg 16:04, 1 May 2008 (UTC)

Exclusions
It would also be nice to have exclusion meta tags where you can specify what parts are included in the book and what parts are not (so if you have a header/footer you don't have to include that in the book)

--Greg 16:07, 1 May 2008 (UTC)

I have also run into this problem, wanting to include only one section of a page using the PageName markup to get just that section as part of the composite print. This would be a great feature.

--Abby621 14:04, 4 June 2008 (UTC)


 * After looking through the code, I discovered you can accomplish exclusions by placing parts of the article you wish not to include inside of tags (example: exclude this section  )

latest version gives syntax error
Parse error: syntax error, unexpected '}' in Pdf_Book.php on line 49

Is this expected?

FWIW, I'm using Ubuntu Hardy Heron with PHP 5.2.4-2ubuntu5.1 with Suhosin-Patch 0.9.6.2

Swaroopch 21:20, 21 June 2008 (UTC)
 * Sorry about that, fixed --Nad 21:59, 21 June 2008 (UTC)

"??????" instead of russian letters
We get "?" signs instead of all Russian letters. The encoding in the browser is UTF-8.

--Rius 16:15, 19 June 2009 (UTC)
 * Change the default charset from iso-8859-1 to cp-1251: $charset = $this->setProperty('Charset','cp-1251');
 * Replace the PHP function utf8_decode with another function that can convert UTF-8 to cp1251. For example, on line 89 of PdfBook.hooks.php: $html  .= iconv("utf-8", "windows-1251", "$h1$text\n"); //utf8_decode( "$h1$text\n" );
 * If no text is displayed in the PDF, replace the fonts used by htmldoc (/usr/share/htmldoc/fonts) with fonts that support Cyrillic.
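
A quick way to check that the iconv call from the second bullet does what is wanted, independent of the wiki: Windows-1251 stores each Cyrillic letter in a single byte, whereas ISO-8859-1 has no Cyrillic letters at all, which is where the "?" characters come from. The wrapper name toCp1251 below is hypothetical, and the sketch assumes PHP's iconv extension is available.

```php
<?php
// Sketch only: convert UTF-8 text to Windows-1251 and fail loudly when a
// character has no equivalent. toCp1251() is a hypothetical helper.
function toCp1251(string $utf8): string
{
    $converted = @iconv('UTF-8', 'WINDOWS-1251', $utf8);
    if ($converted === false) {
        throw new RuntimeException('Text contains characters outside Windows-1251');
    }
    return $converted;
}

$bytes = toCp1251('Привет');
// Six Cyrillic letters become six single bytes in Windows-1251.
$hex = bin2hex($bytes);   // "cff0e8e2e5f2"
```

Text that mixes Cyrillic with characters outside Windows-1251 would need a different target charset or some form of substitution. The thread does not cover that case.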

How would you modify the script to include the last date and time edited for each article?
I'm not a PHP wiz and am wondering what would be involved to output the last edit date/time for each article? Preferably, I would like to see this info directly under the article title. Any help would be excellent. Great extension! --Paul

No images

 * I still can't get images in. The image is in the PDF file, and links back to my wiki image, but the picture simply doesn't appear. Help?
 * Also, title page is empty. How to fill it?

Here is my Template: Template:Pdf_book [ Create a PDF Book]]

Updated bibtex_fields.php
Here is an updated bibtex_fields.php with complete Bibtex Entries and Fields.

bibtex_fields.php

Bibtex Required/Optional - for your wiki

 * Latex defines three types of fields:
 * Required - always displayed
 * Optional - usually not used
 * Ignored - never used, can be arbitrary

@article{citation_key,
  author = {}, title = {}, journal = {}, year = {},
  volume = {}, number = {}, pages = {}, month = {}, note = {},
  key = {}, url = {}, keywords = {}, abstract = {}
}

@book{citation_key,
  author = {}, editor = {},   % author OR editor required
  title = {}, publisher = {}, year = {},
  volume = {}, number = {},   % volume OR number
  series = {}, address = {}, edition = {}, month = {}, note = {},
  key = {}, url = {}, keywords = {}, abstract = {}
}

@conference{citation_key,
  author = {}, title = {}, booktitle = {}, year = {},
  editor = {}, pages = {}, organization = {}, publisher = {},
  address = {}, month = {}, note = {},
  key = {}, url = {}, keywords = {}, abstract = {}
}

@inbook{citation_key,
  author = {}, editor = {},   % author OR editor
  title = {},
  chapter = {}, pages = {},   % chapter AND/OR pages
  publisher = {}, year = {},
  volume = {}, number = {},   % volume OR number
  series = {}, type = {}, address = {}, edition = {}, month = {}, note = {},
  key = {}, url = {}, keywords = {}, abstract = {}
}

@incollection{citation_key,
  author = {}, title = {},
  booktitle = {},             % booktitle should be exactly the same as title? Not sure.
  publisher = {}, year = {}, editor = {},
  volume = {}, number = {},   % volume OR number
  series = {}, type = {}, chapter = {}, pages = {},
  address = {}, edition = {}, month = {}, note = {},
  key = {}, url = {}, keywords = {}, abstract = {}
}

@inproceedings{citation_key,
  author = {}, title = {},
  booktitle = {},             % booktitle should be exactly the same as title? Some kind of bug? Not sure.
  year = {}, editor = {},
  volume = {}, number = {},   % volume OR number
  series = {}, pages = {}, address = {}, month = {},
  organization = {}, publisher = {}, note = {},
  key = {}, url = {}, keywords = {}, abstract = {}
}

@manual{citation_key,
  title = {}, author = {}, organization = {}, address = {},
  edition = {}, month = {}, year = {}, note = {},
  key = {}, url = {}, keywords = {}, abstract = {}
}

@mastersthesis{citation_key,
  author = {}, title = {}, school = {}, year = {},
  type = {}, address = {}, month = {}, note = {},
  key = {}, url = {}, keywords = {}, abstract = {}
}

@misc{citation_key,
  author = {}, title = {}, howpublished = {},
  month = {}, year = {}, note = {},
  key = {}, url = {}, keywords = {}, abstract = {}
}

@phdthesis{citation_key,
  author = {}, title = {}, school = {}, year = {},
  type = {}, address = {}, month = {}, note = {},
  key = {}, url = {}, keywords = {}, abstract = {}
}

@proceedings{citation_key,
  title = {}, year = {}, editor = {},
  volume = {}, number = {},   % volume OR number
  series = {}, address = {}, month = {},
  organization = {}, publisher = {}, note = {},
  key = {}, url = {}, keywords = {}, abstract = {}
}

@techreport{citation_key,
  author = {}, title = {}, institution = {}, year = {},
  type = {}, number = {}, address = {}, month = {}, note = {},
  key = {}, url = {}, keywords = {}, abstract = {}
}

@unpublished{citation_key,
  author = {}, title = {}, note = {}, month = {}, year = {},
  key = {}, url = {}, keywords = {}, abstract = {}
}

Types of Bibtex entries = for your wiki

 * There are 14 available entry types.

Bibtex - standard fields - for your wiki

 * The available fields depend on which entry type is being used. Each entry type has required and optional arguments.

Bibtex Nonstandard / Optional Fields - for your wiki

 * The available fields depend on which entry type is being used. Each entry type has required and optional arguments.

Get rid of temporary files
using proc_open (read and write pipes connected to the htmldoc process) you can get rid of temporary files. This also fixes a variable conflict ($link): jhoetzel  'Pdf Book',	'author'      => 'User:Nad',	'description' => 'Composes a book from articles in a category and exports as a PDF book',	'url'	      => 'http://www.mediawiki.org/wiki/Extension:Pdf_Book',	'version'     => PDFBOOK_VERSION	); class PdfBook { # Constructor function PdfBook { global $wgHooks,$wgParser,$wgPdfBookMagic; $wgParser->setFunctionHook($wgPdfBookMagic,array($this,'magicBook')); $wgHooks['UnknownAction'][] = $this; # Add a new pdf log type global $wgLogTypes,$wgLogNames,$wgLogHeaders,$wgLogActions; $wgLogTypes[]            = 'pdf'; $wgLogNames ['pdf']      = 'pdflogpage'; $wgLogHeaders['pdf']     = 'pdflogpagetext'; $wgLogActions['pdf/book'] = 'pdflogentry'; }	# Expand the book-magic function magicBook(&$parser) { # Populate $argv with both named and numeric parameters $argv = array; foreach (func_get_args as $arg) if (!is_object($arg)) { if (preg_match('/^(.+?)\\s*=\\s*(.+)$/',$arg,$match)) $argv[$match[1]] = $match[2]; else $argv[] = $arg; }		return $text; }	function onUnknownAction($action,$article) { global $wgOut,$wgUser,$wgTitle,$wgParser; global $wgServer,$wgArticlePath,$wgScriptPath,$wgUploadPath,$wgUploadDirectory,$wgScript; if ($action == 'pdfbook') {
# Extension:PdfBook
# - Licenced under LGPL (http://www.gnu.org/copyleft/lesser.html)
# - Author: User:Nad
# - Started: 2007-08-08

# Log the export $msg = $wgUser->getUserPage->getPrefixedText.' exported as a PDF book'; $log = new LogPage('pdf',false); $log->addEntry('book',$wgTitle,$msg); # Initialise PDF variables $layout = '--firstpage toc'; $left   = $this->setProperty('LeftMargin',  '1cm'); $right  = $this->setProperty('RightMargin', '1cm'); $top    = $this->setProperty('TopMargin',   '1cm'); $bottom = $this->setProperty('BottomMargin','1cm'); $font   = $this->setProperty('Font',	'Arial'); $size   = $this->setProperty('FontSize',    '8'); $linkc  = $this->setProperty('LinkColour',  '217A28'); $levels = $this->setProperty('TocLevels',   '2'); $exclude = $this->setProperty('Exclude',    array); if (!is_array($exclude)) $exclude = split('\\s*,\\s*',$exclude); # Select articles from members if a category or links in content if not $articles = array; $title   = $article->getTitle; $opt     = ParserOptions::newFromUser($wgUser); if ($title->getNamespace == NS_CATEGORY) { $db    = &wfGetDB(DB_SLAVE); $cat   = $db->addQuotes($title->getDBkey); $result = $db->select(					'categorylinks',					'cl_from',					"cl_to = $cat",					'PdfBook',					array('ORDER BY' => 'cl_sortkey')				); if ($result instanceof ResultWrapper) $result = $result->result; while ($row = $db->fetchRow($result)) $articles[] = Title::newFromID($row[0]); }			else { $text = $article->fetchContent; $text = $wgParser->preprocess($text,$title,$opt); if (preg_match_all('/^\\*\\s*\\[{2}\\s*([^\\|\\]]+)\\s*.*?\\]{2}/m',$text,$links)) foreach ($links[1] as $link) $articles[] = Title::newFromText($link); }			# Format the article's as a single HTML document with absolute URL's			$book	 = $title->getText; $html	 = ''; $wgArticlePath = $wgServer.$wgArticlePath; $wgScriptPath = $wgServer.$wgScriptPath; $wgUploadPath = $wgServer.$wgUploadPath; $wgScript     = $wgServer.$wgScript; foreach ($articles as $title) { $ttext = $title->getPrefixedText; if (!in_array($ttext,$exclude)) { $article = new Article($title); $text   = $article->fetchContent; 
$text   = preg_replace('//s','@@'.'@@$1@@'.'@@',$text); # preserve HTML comments $text  .= ''; $opt->setEditSection(false);   # remove section-edit links $wgOut->setHTMLTitle($ttext);  # use this so DISPLAYTITLE magic works $out    = $wgParser->parse($text,$title,$opt,true,true); $ttext  = $wgOut->getHTMLTitle; $text   = $out->getText; $text   = preg_replace('|(]+?src=")(/.+?>)|',"$1$wgServer$2",$text);					$text    = preg_replace('|@{4}([^@]+?)@{4}|s','',$text); # HTML comments hack					$text    = preg_replace('|
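
Since the listing above breaks off before the proc_open part, here is a generic sketch of the technique it refers to: the HTML is written to the child process's standard input and the result is read back from its standard output, so no temporary file is needed. This is not jhoetzel's code. The function name pipeThrough is hypothetical, and cat is used as a stand-in command. The htmldoc arguments needed to read from standard input are not shown in this thread and should be checked against htmldoc's own documentation.

```php
<?php
// Sketch only: feed $input to a command's stdin and return its stdout.
// pipeThrough() is a hypothetical helper, not part of PdfBook.
function pipeThrough(string $command, string $input): string
{
    $spec = [
        0 => ['pipe', 'r'],   // child reads its stdin from us
        1 => ['pipe', 'w'],   // child writes its stdout to us
        2 => ['pipe', 'w'],   // child writes its stderr to us
    ];
    $process = proc_open($command, $spec, $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException("Could not start: $command");
    }

    // Write everything, then close stdin so the child sees end-of-file.
    // Caveat: a child that produces a lot of output before it has read all
    // of its input could stall this simple write-then-read sequence.
    fwrite($pipes[0], $input);
    fclose($pipes[0]);

    $output = stream_get_contents($pipes[1]);
    $errors = stream_get_contents($pipes[2]);
    fclose($pipes[1]);
    fclose($pipes[2]);

    $status = proc_close($process);
    if ($status !== 0) {
        throw new RuntimeException("Command failed ($status): $errors");
    }
    return $output;
}

// Stand-in usage: "cat" simply echoes its input back.
$echoed = pipeThrough('cat', "<html><body>Hello</body></html>\n");
```

In the extension, the returned bytes would be sent to the browser instead of reading the generated PDF back from disk.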

I checked the variables and found that all of them are blank except: wgServer wgArticlePath wgScriptPath

I could not find the others in the entire $GLOBALS variable...

I'm not SUPER familiar with MediaWiki's structure and backend, but I would imagine that many of those (especially $wgUser) should be set.

Any ideas?

--Greg 18:23, 22 October 2008 (UTC)

hiding numbering on headings and article title
Great extension! Is it possible to hide the numbered headings when printing as a book? I noticed that the extension disregards both the user preference and __NONUMBEREDHEADINGS__. We are trying to PDF-print a "book" of data entry forms, and the heading numbers are not required. Thanks --Erikvw 06:23, 18 November 2008 (UTC)

What i have done for now is to remove --numbered from the line

$cmd .= "$toc --format pdf14 --numbered $layout $width";

which seems to work fine.
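
Rather than deleting the flag by hand, the same line could be made switchable. The sketch below assembles the option string from the pieces used in that line. The function name buildLayoutOptions and the $numbered switch are hypothetical (no such setting exists in the extension as posted), and the sample values are for illustration only.

```php
<?php
// Sketch only: build the layout part of the htmldoc command line with the
// --numbered flag made optional. Function name and $numbered are hypothetical.
function buildLayoutOptions(string $toc, string $layout, string $width, bool $numbered): string
{
    $parts = [$toc, '--format pdf14'];
    if ($numbered) {
        $parts[] = '--numbered';   // number the headings only when asked to
    }
    $parts[] = $layout;
    $parts[] = $width;

    // Drop empty pieces so the result has no double spaces.
    $parts = array_filter($parts, static function (string $part): bool {
        return $part !== '';
    });
    return implode(' ', $parts);
}

$plain    = buildLayoutOptions('--toclevels 2', '--firstpage toc', '', false);
$numbered = buildLayoutOptions('--toclevels 2', '--firstpage toc', '--browserwidth 800', true);
```

The boolean could be read from a wiki setting, so that heading numbers can be turned off without editing the extension.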

revision id does not appear on pdf
we are tracking revision information for the printed document using - When printing the PDF, the REVISIONTIMESTAMP prints but REVISIONID does not. I noticed the same for Pdf_Export. Any ideas? thanks Erikvw 04:24, 19 November 2008 (UTC)

some special characters and german umlauts result in empty pdf files
When we try to receive categories with umlauts (e.g. "Übersicht") or special characters like "-" in the category name the generated pdf file is empty. Everything else runs real fine and smooth. Great extension. Any workaround or help regarding this problem would be appreciated. --Fydel 12:29, 15 December 2008 (UTC)
 * I found a simple workaround for that issue. I changed the line where htmldoc is called

escapeshellcmd($cmd); to passthru(escapeshellcmd($cmd));
 * --Fydel 09:10, 9 January 2009 (UTC)


 * Hello, I have the same problem. An umlaut in the middle of the heading is displayed correctly, but an umlaut at the beginning is not visible; there is nothing there.
 * --141.35.213.221 09:24, 19 August 2011 (UTC)

Page Limit in PdfBook
How many pages can be fetched using Extension:PdfBook??? Is there any limit for that??

Download snapshot
The download snapshot for PDFbook doesn't seem to work.

I found a copy that is hosted at sourceforge as part of the install for Flowchartwiki and it has extras like the checkPDFbook stuff.

Is there somewhere to get the latest version of the whole thing, not just the pdfbook.php file?

Cheers.

ASHighlight
The bug with ASHighlight is probably to do with the way that ASHighlight embeds the 'highlight' function's CSS stylesheet output. It's a while since I've done anything with ASHighlight, but I remember this part of it being a bit hacky. Hope this helps. Jdpipe 07:27, 24 March 2009 (UTC)

One way to provide "compatibility" between the two extensions is to drop the <style> ... </style> blocks before feeding the HTML to the htmldoc command. This could lead to some unexpected results, but it works for me:
 * In PdfBook.php, there is a main loop under the comment "# Format the article(s) as a single HTML document with absolute URL's".
 * Add the following line there, among the other existing preg_replace lines:

 $text   = preg_replace( '|<style(.+?)</style>|s', ' ', $text );                  # Style CSS hack
--Eric Salomé (@ctx.net) 22:49, 30 August 2010 (UTC)
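
The style-stripping hack can be tried outside the extension with a few lines. This is a sketch under the assumption that the highlighter emits ordinary style elements. The function name stripStyleBlocks is hypothetical, and the pattern is slightly stricter than the one-liner in this thread (case-insensitive, and it only matches a real style tag).

```php
<?php
// Sketch only: remove <style>...</style> blocks from HTML before conversion.
// stripStyleBlocks() is a hypothetical helper, not part of PdfBook.
function stripStyleBlocks(string $html): string
{
    // "s" lets "." span line breaks, "i" also catches <STYLE>.
    $result = preg_replace('|<style\b[^>]*>.*?</style>|is', '', $html);
    return $result === null ? $html : $result;   // keep the input on regex errors
}

$clean = stripStyleBlocks(
    "<p>before</p><style type=\"text/css\">\n.hl { color: red; }\n</style><p>after</p>"
);
```

As noted in the thread, dropping the stylesheet means highlighted code loses its colours in the PDF, which may or may not be acceptable.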

No such extension "PdfBook"
Try to download it and get "No such extension "PdfBook" ". How can I get this extension? --Robinson Weijman 10:39, 19 June 2009 (UTC)

You can download it from Subversion --Rius 14:20, 19 June 2009 (UTC)


 * Thanks for the tip. I cannot find it - do you have a link?  --Robinson Weijman 09:45, 22 June 2009 (UTC)


 * I'm sorry, there is a link on this article page. --Robinson Weijman 09:48, 22 June 2009 (UTC)

Link to htmldoc v1.8.24 for Windows is dead
"First Htmldoc needs to be installed [...]. Windows Binary can be found  here (v1.8.24) [...]."

This link is dead (404)...


 * Hi, I updated the link. Cheers --kgh 19:12, 5 December 2009 (UTC)

Italian charset
I have a wiki in Italian and I have tried many charsets, but I just can't find the correct one. The apostrophe gets turned into a question mark any time the PDF gets rendered: Modo D'Uso becomes Modo D?Uso. I also have a wiki in English, and when I type can't it comes out can't. It's not a special character, it's an apostrophe. I know it's something stupid, but I don't know which charset to use and don't know a workaround. I usually use iso8859-1 with no problems. I just can't wrap my head around it. Thank you in advance for your help.

I believe I have resolved the problem. I changed my charset to utf-8, and where the apostrophe is I am replacing it with & acute ; (but with no spacing between the characters). It now comes out as an apostrophe every time.
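
One possible explanation, which this thread does not confirm, is that the Italian text contains typographic apostrophes (U+2019) rather than plain ASCII ones. ISO-8859-1 has no such character, so it would be converted to "?". If that is the cause, normalising the quotes before conversion is an alternative to editing each page. The function name normalizeApostrophes is hypothetical.

```php
<?php
// Sketch only: replace typographic single quotes with the ASCII apostrophe.
// Assumes the "?" comes from U+2018/U+2019, which the thread does not confirm.
function normalizeApostrophes(string $text): string
{
    return str_replace(["\u{2018}", "\u{2019}"], "'", $text);
}

$fixed = normalizeApostrophes("Modo D\u{2019}Uso");   // "Modo D'Uso"
```

Plain ASCII apostrophes pass through unchanged, so running this over text that is already fine does no harm.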

Add new feature to disable links in PDF Book
I created a parameter that can be defined in the wiki settings that will disable the printing of links when creating PDF Books. The parameter is $wgPDFBookIgnoreLink. By default it is set to false.

J.saterfiel 14:51, 16 September 2009 (UTC)

The links (other than the TOC) in my PDFs refer back to the wiki, not to the location in the book. I want to preserve these internal links for online users. Is there a way to format the links to do that?

User:Dlpetry:DlPetry 05:41, 09 September 2011 (UTC)

Permission denied
I upgraded from MediaWiki 1.12 to 1.15 and now I get invalid PDF files, with this error inside when I open them with an editor (the Update Directory is set to images):

I set both the contents of images and the PdfBook folder as executable with chmod 755, and they both have the same owner (root). Laquestianne, 23. September

No Images
I have version 1.14, and since I upgraded from 1.12 I no longer have any images in PDFs. I already tried chmod 777 on ./images, but it didn't help.

Any help on this ?

Still broken?
I see a lot of people have problems with this extension, and I am one of them. Has anyone gotten a PDF that is longer than 3 bytes using MW 1.15 and PdfBook (Version 1.0.3, 2008-12-09)? I see no PHP errors in my error log.

Problems with a few categories
My problem is that PdfBook can't create PDFs from all categories. For a few categories it works without problems, and for others it doesn't. For example, I have a category Server; if I want to create a PDF of that category, only a blank browser page opens.

I don't know where the problem is: the categories are not very big (about 20 pages), there are no special characters in the category title, it's not a problem with htmldoc,...

As a test I created a new category with the name Servers and put the content of Server into it. Creating the PDF for Servers works fine. So it seems to me that it's a problem with the name of the category and not with the content. Thank you for your help!

Empty Pdf, PDFBook Problem
Hi, I try to use the mediawiki and the pdfbook extension. I have put the extension to the extension folder and included in the LocalSettings.php. I have installed the htmldoc as well. I am using IIS 5.1. When I put the &action=pdfbook to the URL it creates only an empty pdf. What can be wrong? I have only installed the htmldoc, should I make more settings with it? Br, Zsolt

--Nwessel 08:57, 19 October 2010 (UTC) Please notice that PDFBook works with &action=pdfbook only on categories. When you want to create a pdf for a single page you need to add "&format=single".

I am also having the same issue. Mediawiki 1.29 on Windows IIS and MySQL. pdf downloads as 0 byte blank file. I have tried adding &format=single but I continue to get the same result. Is there something that has to be configured with HTMLDOC that I am missing?

Valid fonts
Can anyone give me the complete list of fonts supported for the  setting? Is this dependent on htmldoc or on the system fonts?

I checked the fonts that htmldoc supports, and their FAQ said:
 * HTMLDOC 1.8.20 and higher support embedding of the base Type 1 fonts: Courier, Helvetica, Symbol, and Times. HTMLDOC does not currently allow embedding of arbitrary fonts specified by the HTML FONT element.

But then I see that the default setting for  is   so I'm confused. I'm running MediaWiki on Ubuntu and I have installed the ttf-mscorefonts-installer package, however when I set  to   I get Times instead :(

Altered version with new features
Some people have complained about not being able to use the diff code I provided in earlier notes. Because this project isn't being updated, I've put full docs on my user page (J.saterfiel) for the altered version. Here is the PdfBook.php I use on my MediaWiki installation (1.14). It's a little more advanced than the one currently available.

List of new features:
 * Ability to remove links in the documents
 * Ability for a printed Category (collection of articles) to have a cover page with its Category Name and date created printed on it
 * Ability to have a "Download as PDF" link in the tool bar on any page without needing to explicitly place a link on a page you want to create pdfs on.
 * Ability to change the date format used on the header page (http://us3.php.net/manual/en/function.date.php)
 * Ability to change the information printed on each page header and footer (you will need to look up htmldoc at http://www.htmldoc.org/ for more info on the options; once it is installed, run htmldoc -help, as the full options are not displayed on their website)

--J.saterfiel 15:05, 27 May 2010 (UTC)

Broken with MW 1.16
I recently upgraded from MW 1.11 to 1.16 and I'm having some trouble with this extension. The issue is that PDFs of categories do not work properly. The PDF is created fine, but it uses the name of the first page in the category for each entry in the PDF unless you put a Heading 1 in the page. If you do put a Heading 1 in the page, then it creates a page in between each page with the name of the first page in the category.

Expected behavior (worked in MW 1.11), Fruit.pdf (Fruit is the name of the category):
 * Apples
 * (stuff about apples)
 * Bananas
 * (stuff about bananas)
 * Cantaloupe
 * (Stuff about cantaloupe)

Actual behavior, Fruit.pdf:
 * Apples
 * (stuff about apples, first page in the category)
 * Apples
 * (stuff about bananas, since bananas doesn't have a heading 1 saying bananas at the top of the page)
 * Apples
 * (completely blank page, since cantaloupe page below has a heading 1)
 * Cantaloupe
 * (stuff about cantaloupe, has heading 1 saying Cantaloupe)

Can anybody help?

This worked for me

I commented line 129

//$ttext  = $wgOut->getHTMLTitle();

That worked, thanks!

line 124 for me

Math rendering is very ugly
In pdfs, I'm getting some ugly rendering of mathematical expressions. Symbols are abnormally large, 3 times larger than normal text and resolution is poor. Is this normal? Is Pdfbook causing this? In the wiki they are rendered fine, it's just in pdfs that the problem occurs. For example, try:
 * $$\langle T,\mu \rangle$$

Running Pdfbook Version 1.0.4, 2010-01-05 Pgr94 08:39, 12 September 2010 (UTC)

Nothing happens when action url entered
I am running MediaWiki 1.16.0 and PdfBook 1.0.4, and have htmldoc installed on the server. I installed the extension in the extensions directory and added the require line to LocalSettings.php. But when I enter the URL mywiki.com/wiki/index.php/Category:Software_Documentation&action=pdfbook, nothing happens except the standard wiki message "There is currently no text in this page." I tried a different browser, checked the Apache error logs, and made sure the /images directory is writable by the web server. None of this produced an error or a different response. Can someone please give me a push in the right direction here? Thanks, Melissa


 * That's not the right URL. Take another look at the instructions and follow the syntax there more closely. —Emufarmers(T 23:23, 3 February 2011 (UTC)

You were right...
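For anyone who lands here with the same "no text in this page" message: the URL in the question appends &action=pdfbook to a path-style title without a "?", so the wiki reads the whole string as the page name. Below is a minimal sketch of building the query-string form that is used elsewhere on this page (index.php?title=...&action=pdfbook). The helper name pdfbookUrl and the example host are made up for illustration; they are not part of the extension.

```php
<?php
// Hypothetical helper (not part of PdfBook): builds a query-string style
// export URL so that "action" is parsed as a parameter, not as part of the title.
function pdfbookUrl(string $indexPhp, string $title, ?string $format = null): string
{
    $params = ['title' => $title, 'action' => 'pdfbook'];
    if ($format !== null) {
        $params['format'] = $format; // e.g. "single" for a non-category page
    }
    // Pass the separator explicitly so the result does not depend on php.ini.
    return $indexPhp . '?' . http_build_query($params, '', '&');
}

// Example host is an assumption; substitute your own wiki's index.php.
echo pdfbookUrl('http://mywiki.example/wiki/index.php', 'Category:Software_Documentation'), "\n";
echo pdfbookUrl('http://mywiki.example/wiki/index.php', 'Main_Page', 'single'), "\n";
```

http_build_query percent-encodes the colon in the title as %3A, which is still a valid way to pass the title in a query string.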

How to export all articles to a single file
I can create PDF's containing a single category using the following url-call >> http://mywikibox/wikis/wiki_a/index.php?title=Category:CATNAME&action=pdfbook That’s fine so far in my MW 1.15.x setup using latest PdfBook trunk.

My question is simple: How to generate a PDF containing all articles of the entire wiki without creating a new category and adding all articles to that new category.

Problem exporting pdfbook: all category titles (chapters) are the same name
See here for a solution. Cheers --[[kgh]] 16:15, 26 February 2011 (UTC)

Patch filed: Error on single article with
Ahoy,

you probably have this problem often: you add the action=pdfbook GET parameter in your web browser, but you get an empty PDF file, because you have neither selected a category nor added the format=single GET parameter. Annoying.

I don't want the format=single GET parameter to be mandatory for single pages. This extension can figure out whether the selected page is a category page. So I added two lines, and it is not necessary any more.

# svn diff
Index: PdfBook.php
===================================================================
--- PdfBook.php (Revision 82953)
+++ PdfBook.php (Arbeitskopie)
@@ -114,6 +114,8 @@
              $text = $wgParser->preprocess( $text, $title, $opt );
              if ( preg_match_all( "/^\\*\\s*\\[{2}\\s*([^\\|\\]]+)\\s*.*?\\]{2}/m", $text, $links ) )
                 foreach ( $links[1] as $link ) $articles[] = Title::newFromText( $link );
+             else
+                $articles = array( $title );
           }
           }

Voilà, here we go. Best regards, --Mquintus 21:00, 28 February 2011 (UTC)
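The fallback in the patch above can be tried outside MediaWiki. The following is only a sketch of the selection logic, assuming plain strings in place of Title objects; the function name selectArticles is invented for this example.

```php
<?php
// Sketch of the selection logic only; titles are plain strings here,
// whereas the extension works with MediaWiki Title objects.
function selectArticles(string $wikitext, string $pageTitle): array
{
    // Same pattern as in the patch: bullet lines of the form "* [[Target|label]]".
    $pattern = '/^\*\s*\[{2}\s*([^\|\]]+)\s*.*?\]{2}/m';
    if (preg_match_all($pattern, $wikitext, $links)) {
        return array_map('trim', $links[1]);
    }
    // No bullet links found: export the page itself.
    return [$pageTitle];
}

print_r(selectArticles("* [[Apples]]\n* [[Bananas|yellow fruit]]\nSome text", 'Fruit'));
print_r(selectArticles('Just some prose without a link list.', 'Main Page'));
```

The first call yields the two linked titles, and the second falls back to the page title, which is the behaviour the two added lines are meant to provide.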

Vector Skin Sidebar Link for Pdf Print
Does anyone know how I can add the pdfbook link on the navigation/sidebar, on the vector skin same as on the monobook skin?


 * Has anyone got any ideas?

Found it from above under the section 'Requests'

$wgHooks['SkinTemplateToolboxEnd'][] = 'fnPDFBookLink';

# Create toolbox link
function fnPDFBookLink( &$vector ) {
	global $wgMessageCache, $wgPdfBookMessages;
	foreach( $wgPdfBookMessages as $lang => $messages ) {
		$wgMessageCache->addMessages( $messages, $lang );
	}
	$thispage = $vector->data['thispage']; // e.g. "Category:Wiki"
	$nsnumber = $vector->data['nsnumber']; // NS 14 is category

	if ( $nsnumber == 14 ){
		echo "\n\t\t\t\t<li><a href=\"./$thispage?action=pdfbook\">";
		$vector->msg( 'pdf_book_link' );
		echo "</a></li>\n";
	}
	return true;
}

Grabbing one more link level subpage to PDF (one more indent)
I needed to follow one more link level inside, to complete the product manual. It's working on the product page.

101a102
> 	$articles[] = $title;
103c104,113
< 		foreach ( $links[1] as $link ) $articles[] = Title::newFromText( $link );
---
> 		foreach ( $links[1] as $link ) {
> 			$articles[] = Title::newFromText( $link );
> 			$subarticle = new Article ( Title::newFromText( $link ) );
> 			$text2 = $subarticle->fetchContent();
> 			$text2 = $wgParser->preprocess( $text2, $title, $opt );
> 			if ( preg_match_all( '/^\\*\\s*\\[{2}\\s*([^\\|\\]]+)\\s*.*?\\]{2}/m', $text2, $links ) )
> 			foreach ( $links[1] as $link )  $articles[] = Title::newFromText( $link );
> 		}
>
>

Regards --Edilsonjr

How to make PDF of All Pages?
I tried adding code in MediaWiki v1.16.2 to the file languages/messages/MessagesEn.php for the All pages variable, and the link to download everything as a PDF appears. Unfortunately the All pages list consists only of tabled hyperlinks, not bulleted hyperlinks, so no PDF is generated. Is there any workaround to have All pages displayed as bulleted items and then made into a PDF book?

How to make output landscape instead of portrait?
Any ideas about how to make the output landscape instead of portrait?... there is no setting to change this, so not too sure if it is possible?... any help? Thanks, Alan


 * two small changes of PdfBook.hooks.php serve this purpose
 * add after line 25:
 * add after line 113:
 * These changes offer a new option for calling pdfBook:
 * Kappa (talk) 14:36, 11 December 2012 (UTC)
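The two code changes referred to above did not survive on this page, so the following is not Kappa's patch. It is only a sketch of one way such an option could work, assuming a request parameter named orientation (a hypothetical name) that is mapped onto htmldoc's --landscape and --portrait flags.

```php
<?php
// Hypothetical helper: maps a user-supplied value to one of htmldoc's
// orientation flags, defaulting to portrait for anything unexpected.
function orientationFlag(string $requested): string
{
    return strtolower(trim($requested)) === 'landscape' ? '--landscape' : '--portrait';
}

// Assumed parameter name "orientation"; inside the extension the value
// would come from the request object rather than from $_GET directly.
$flag = orientationFlag($_GET['orientation'] ?? '');
echo "htmldoc -t pdf --book $flag input.html\n";
```

Mapping onto a fixed pair of flags, instead of passing the request value through, keeps arbitrary user input out of the shell command.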

CSS for tables
Hello, can I add CSS for the table layout anywhere? The tables in the output have no border, no color, nothing. Is it possible to define a CSS file with the table layouts?

Icect


 * No. PdfBook uses HtmlDoc. The current stable version of HtmlDoc (1.8) does not support CSS. CSS support will be added to v1.9 (currently under development). Remco de Boer 11:05, 9 August 2011 (UTC)


 * Hey, thank you very much. I now use HTML attributes and it works. However, I miss the CSS "empty cells" property. Now I'm looking forward to the new version. --141.35.213.221 09:57, 17 August 2011 (UTC)

oldid of article is not taken into account
When exporting an article to a PDF, the extension always takes the newest revision instead of a specific one, e.g. an approved one, which is shown by default in my configuration. I've tried to submit the oldid in the URL as well, but it has no effect. Any suggestions?

Select namespaces are not being rendered
I have 4 namespaces that belong in a category, but only 1 of them can be "exported" to a PDF via the tool. When going into the other namespaces I do not get the option across the top to select it into the book. Any ideas?

Extension not available after installation
Hello!

I tried to install the extension. I completed all the steps of the installation guide as follows:

 * htmldoc installed, works properly
 * copied all files of the extension into the specified folder
 * LocalSettings.php edited

But there is no link into the navigation to create a pdf or to add pages to pdf-job.

The extension PdfExport works fine, but pdfBook doesn't though the system-entry mentions it as an installed extension?

I've found that the method to generate a PDF depends on what kind of PDF you're trying to create (single page vs multiple pages). The easiest for me was to enable the "Print as PDF" tab - set $wgPdfBookTab = true; to enable this feature. Otherwise, I've had to add text to pages or create new pages to generate PDFs. Hope this helps! Becky

FlaggedRevs and PDFBook
Hello, I have a problem: I want to print the current flagged version of my article, but PdfBook uses the last edited version (Article.php does not provide any other functions). So I tried to include the FlaggedRevs classes in PdfBook and got an error. I have no idea why it does not work.

Does anyone here know the problem? I want to use the FlaggedRevs classes to get the last flagged version.

--141.35.213.221 09:52, 16 September 2011 (UTC)

Slightly different version
Excuse my lack of knowledge in how to update Wiki correctly, but I'm editing Boldly, so...

Some of the comments above revolve around:
 * Not being able to use this extension on a historical document
 * Other extensions not resolving

I've done a lot of work on this extension to modify it for my needs, under v1.16. It also includes a lot of the earlier comments and solutions included in this version. And of course resolves the two issues I listed above.

My version has a lot of bespoke coding (strongly formatting documents based on their name, for example), so it's not reasonable to push all that into the mainstream.

So, for those who need it, the code is below.

<?php
/**
 * PdfBook extension
 * - Composes a book from articles in a category and exports as a PDF book
 *
 * See http://www.mediawiki.org/Extension:PdfBook for installation and usage details
 * See http://www.organicdesign.co.nz/Extension_talk:PdfBook for development notes and discussion
 *
 * Started: 2007-08-08
 *
 * @package MediaWiki
 * @subpackage Extensions
 * @author Aran Dunkley User:Nad
 * @copyright © 2007 Aran Dunkley
 * @licence GNU General Public Licence 2.0 or later
 *
 */
if (!defined('MEDIAWIKI')) die('Not an entry point.');

define('PDFBOOK_VERSION', '1.0.3, 2008-12-09');

$wgExtensionFunctions[]        = 'wfSetupPdfBook';
$wgHooks['LanguageGetMagic'][] = 'wfPdfBookLanguageGetMagic';

$wgExtensionCredits['parserhook'][] = array(
	'name'        => 'PdfBook',
	'author'      => 'User:Nad',
	'description' => 'Composes a book from articles in a category and exports as a PDF book',
	'url'         => 'http://www.mediawiki.org/wiki/Extension:PdfBook',
	'version'     => PDFBOOK_VERSION
);

class PdfBook {

	function PdfBook() {
		global $wgHooks, $wgParser, $wgPdfBookMagic;
		global $wgLogTypes, $wgLogNames, $wgLogHeaders, $wgLogActions;
		$wgHooks['UnknownAction'][] = $this;

		# Add a new pdf log type
		$wgLogTypes[]             = 'pdf';
		$wgLogNames ['pdf']       = 'pdflogpage';
		$wgLogHeaders['pdf']      = 'pdflogpagetext';
		$wgLogActions['pdf/book'] = 'pdflogentry';
	}

	/**
	 * Perform the export operation
	 */
	function onUnknownAction($action, $article) {
		global $wgOut, $wgUser, $wgTitle, $wgParser, $wgRequest;
		global $wgServer, $wgArticlePath, $wgScriptPath, $wgUploadPath, $wgUploadDirectory, $wgScript;

		if ($action == 'pdfbook') {

			$title   = $article->getTitle();
			$opt     = ParserOptions::newFromUser($wgUser);
			$oldpage = $wgRequest->getText('oldid');

			# Log the export
			$msg = $wgUser->getUserPage()->getPrefixedText().' exported as a PDF book';
			$log = new LogPage('pdf', false);
			$log->addEntry('book', $wgTitle, $msg);

			# Initialise PDF variables
			$format = $wgRequest->getText('format');
			# setting the format depending on the document title.
			if (substr($title->getText(), 0, 2) == "FS") $format = 'singlebook';
			if (substr($title->getText(), 0, 3) == "REQ") $format = 'singlebook';
			if (substr($title, 0, 9) == 'Category:') { $format = 'book'; $oldpage=0; }
			# EOC
			$notitle = $wgRequest->getText('notitle');
			$layout  = $format == 'single' ? '--webpage' : '--firstpage c1';
			if ($format == 'singlebook') $layout = ' ';
			//$layout = $format == 'single' ? ' ' : '--firstpage c1';
			$charset = $this->setProperty('Charset',     'iso-8859-1');
			$left    = $this->setProperty('LeftMargin',  '1cm');
			$right   = $this->setProperty('RightMargin', '1cm');
			$top     = $this->setProperty('TopMargin',   '1cm');
			$bottom  = $this->setProperty('BottomMargin','1cm');
			$font    = $this->setProperty('Font',        'Arial');
			$size    = $this->setProperty('FontSize',    '8');
			$linkcol = $this->setProperty('LinkColour',  '217A28');
			$levels  = $this->setProperty('TocLevels',   '2');
			$exclude = $this->setProperty('Exclude',     array());
			$width   = $this->setProperty('Width',       '');
			$width   = $width ? "--browserwidth $width" : '';
			if (!is_array($exclude)) $exclude = split('\\s*,\\s*', $exclude);

			# Select articles from members if a category or links in content if not
			if ($format == 'single' || $format == 'singlebook') $articles = array($title);
			else {
				$articles = array();
				if ($title->getNamespace() == NS_CATEGORY) {
					$db     = wfGetDB(DB_SLAVE);
					$cat    = $db->addQuotes($title->getDBkey());
					$result = $db->select(
						'categorylinks',
						'cl_from',
						"cl_to = $cat",
						'PdfBook',
						array('ORDER BY' => 'cl_sortkey')
					);
					if ($result instanceof ResultWrapper) $result = $result->result;
					while ($row = $db->fetchRow($result)) $articles[] = Title::newFromID($row[0]);
				}
				else {
					$text = $article->fetchContent();
					$text = $wgParser->preprocess($text, $title, $opt);
					if (preg_match_all('/^\\*\\s*\\[{2}\\s*([^\\|\\]]+)\\s*.*?\\]{2}/m', $text, $links))
						foreach ($links[1] as $link) $articles[] = Title::newFromText($link);
				}
			}

			# Format the article(s) as a single HTML document with absolute URL's
			$book = $title->getText();
			$html = '';
			$titlehtml = '';
			$titledone = 0;
			$wgArticlePath = $wgServer.$wgArticlePath;
			$wgScriptPath  = $wgServer.$wgScriptPath;
			$wgUploadPath  = $wgServer.$wgUploadPath;
			$wgScript      = $wgServer.$wgScript;
			# Output some basic metadata for HTMLDOC:
			$html = "  ";
			#if (substr($book, 0, 3) != "EST" && substr($book, 0, 2) != "FS") {
			#if (substr($book, 0, 3) != "EST") {
			$html .= " $book ";
			#}
			foreach ($articles as $title) {
				$ttext = $title->getPrefixedText();
				if (!in_array($ttext, $exclude)) {
					$article = new Article($title);
					$text    = $article->fetchContent(strlen($oldpage) == 0 ? 0 : $oldpage);
					$text    = preg_replace('/<!--([^@]+?)-->/s', '@@'.'@@$1@@'.'@@', $text); # preserve HTML comments
					if ($format != 'single' && $format != 'singlebook') $text .= '';
					$opt->setEditSection(false);   # remove section-edit links
					$wgOut->setHTMLTitle($ttext);  # use this so DISPLAYTITLE magic works
					$text    = $wgParser->preprocess($text, $title, $opt, strlen($oldpage) == 0 ? 0 : $oldpage);
					$out     = $wgParser->parse($text, $title, $opt, true, true);
					//$ttext  = $wgOut->getHTMLTitle();
					$text    = $out->getText();
					$text    = preg_replace('|(<img[^>]+?src=")(/.+?>)|', "$1$wgServer$2", $text);       # make image urls absolute
					$text    = preg_replace('|<div\s*class=[\'"]?noprint["\']?>.+?</div>|s', '', $text); # non-printable areas
					$text    = preg_replace('|@{4}([^@]+?)@{4}|s', '<!--$1-->', $text);                  # HTML comments hack
					#$text    = preg_replace('|<table|', '<table border borderwidth=2 cellpadding=3 cellspacing=0', $text);
					// Ignore Links code
					$text    = preg_replace('|<a|','<span',$text);
					$text    = preg_replace('|</a>|','</span>',$text);
					// EOC
					// $ttext   = basename($ttext);
					$h1      = $notitle ? '' : " $ttext ";
					if (strpos($text,) !== FALSE) {
						$titlehtml = utf8_decode(substr($text, 0, strpos($text,'')));
						$text = utf8_decode(substr($text, strpos($text,'') + 12));
						$h1 = '';
					}
					if ($format != 'single' && $format != 'singlebook' && $titledone == 0) {
						$titlehtml = utf8_decode("$text\n");
						$titledone = 1;
					} else {
						if (stripos($ttext, "appendix") == true) {
							$html .= utf8_decode("$text\n");
						} else {
							if (substr($book, 0, 3) == "EST" || substr($book, 0, 2) == "FS" || substr($book, 0, 3) == "REQ") {
								$html .= utf8_decode("$text\n");
							} else {
								$html .= utf8_decode("$h1$text\n");
							}
						}
					}
				}
			}
			# Finish off the basic HTML for the production
			$html .= " ";

			# If format=html in query-string, return html content directly
			if ($format == 'html') {
				$wgOut->disable();
				header("Content-Type: text/html");
				header("Content-Disposition: attachment; filename=\"$book.html\"");
				print $titlehtml.$html;
			}
			else {
				# Write the HTML to a tmp file
				$titlefile = "$wgUploadDirectory/".uniqid('pdf-book');
				$tfh = fopen($titlefile, 'w+');
				fwrite($tfh, $titlehtml);
				fclose($tfh);
				$file = "$wgUploadDirectory/".uniqid('pdf-book');
				$fh = fopen($file, 'w+');
				fwrite($fh, $html);
				fclose($fh);

				$footer = $format == 'single' ? '...' : '../';
				//$footer = '../';
				$header = $format == 'single' ? '...' : '..t';
				$toc    = $format == 'single' ? '' : " --toclevels $levels --toctitle \"Contents\" --tocheader $header --tocfooter ..i";
				//$toc   = " --toclevels $levels --toctitle \"Contents\" --tocheader $header --tocfooter ..i";

				$cmd  = " --book";
				$cmd .= " --links --linkstyle plain --linkcolor $linkcol";
				$cmd .= " --title --titlefile $titlefile";
				$cmd .= " --size A4 --numbered";
				$cmd .= " --left $left --right $right --top $top --bottom $bottom ";
				$cmd .= " --header $header --header1 $header --footer $footer --nup 1";
				$cmd .= "$toc";
				$cmd .= " --portrait --color --no-pscommands --no-xrxcomments --compression=9";
				$cmd .= " --jpeg=75 --fontsize $size --fontspacing 1.1 --headingfont $font --bodyfont $font";
				$cmd .= " --headfootsize $size --headfootfont $font --charset $charset";
				$cmd .= " --no-embedfonts --pagemode document --pagelayout single $layout";
				$cmd .= " --permissions all";
				$cmd .= " --browserwidth 680 --no-strict --no-overflow";
				$cmd  = "htmldoc -t pdf14 $cmd $file";

				# Send the file to the client via htmldoc converter
				$wgOut->disable();
				header("Content-Type: application/pdf");
				header("Content-Disposition: attachment; filename=\"$book.pdf\"");
				putenv("HTMLDOC_NOCGI=1");
				passthru($cmd);
				@unlink($file);
				@unlink($titlefile);
			}
			return false;
		}
		return true;
	}

	/**
	 * Return a property for htmldoc using global, request or passed default
	 */
	function setProperty($name, $default) {
		global $wgRequest;
		if ($wgRequest->getText("pdf$name"))   return $wgRequest->getText("pdf$name");
		if (isset($GLOBALS["wgPdfBook$name"])) return $GLOBALS["wgPdfBook$name"];
		return $default;
	}

	/**
	 * Needed in some versions to prevent Special:Version from breaking
	 */
	function __toString() { return 'PdfBook'; }
}

/**
 * Called from $wgExtensionFunctions array when initialising extensions
 */
function wfSetupPdfBook() {
	global $wgPdfBook;
	$wgPdfBook = new PdfBook();
}

/**
 * Needed in MediaWiki >1.8.0 for magic word hooks to work properly
 */
function wfPdfBookLanguageGetMagic(&$magicWords, $langCode = 0) {
	global $wgPdfBookMagic;
	$magicWords[$wgPdfBookMagic] = array($langCode, $wgPdfBookMagic);
	return true;
}

//Add on for link to print on the tool bar menu
$wgHooks['SkinTemplateBuildNavUrlsNav_urlsAfterPermalink'][] = 'wfSpecialPdfNav';
$wgHooks['SkinTemplateToolboxEnd'][] = 'wfSpecialPdfToolbox';

function wfSpecialPdfNav( &$skintemplate, &$nav_urls, &$oldid, &$revid ) {
	$nav_urls['pdfprint'] = array(
		'text' => 'Download as PDF',
		'href' => $nav_urls['href'].'?action=pdfbook&format=single&notitle&oldid='.$oldid
	);
	return true;
}

function wfSpecialPdfToolbox( &$monobook ) {
	if ( isset( $monobook->data['nav_urls']['pdfprint'] ) )
		if ( $monobook->data['nav_urls']['pdfprint']['href'] == '' ) {
			?><li><?php htmlspecialchars( $monobook->data['nav_urls']['pdfprint']['text'] ); ?></li><?php
		} else {
			?><li><?php
			?><a href="<?php echo htmlspecialchars( $monobook->data['nav_urls']['pdfprint']['href'] ) ?>"><?php echo htmlspecialchars( $monobook->data['nav_urls']['pdfprint']['text'] ); ?></a><?php
			?></li><?php
		}
	return true;
}

--217.158.90.2 12:56, 31 January 2012 (UTC)

Corrupt PDF file when opened contains error message
The PDF file created from a page actually contains this:

HTMLDOC Version 1.8.27 Copyright 1997-2006 Easy Software Products, All Rights Reserved.
This software is based in part on the work of the Independent JPEG Group.

ERROR: No HTML files!

Usage: htmldoc [options] filename1.html [ ... filenameN.html ]

Options:

 --batch filename.book
 --bodycolor color
 --bodyfont {courier,helvetica,monospace,sans,serif,times}
 --bodyimage filename.{bmp,gif,jpg,png}
 --book
 …

Any ideas how to proceed with this?

Thanks,

Gareth.

Fix variable quoting
I am running mediawiki on Windows (via a Bitnami installation) and the default path contains a space (the "Program Files" part). PDFBook does not quote the path when executing the htmldoc command and as such the command line is not parsed correctly and the above error is what appears in the PDF file. The solution is to make line 73 in PDFBook.php:

$cmd = "htmldoc -t pdf --charset $charset $cmd $file";

Look like this:

$cmd = "htmldoc -t pdf --charset $charset $cmd \"$file\"";
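As an alternative to adding the quotes by hand, PHP's escapeshellarg() can do the quoting. The following is a minimal sketch, not the extension's actual code: the function name buildHtmldocCommand and the example path are assumptions made for illustration.

```php
<?php
// Sketch: let PHP quote the temp-file path instead of adding quotes by hand.
// $options stands for the option string the extension has already assembled.
function buildHtmldocCommand(string $charset, string $options, string $file): string
{
    return 'htmldoc -t pdf --charset ' . escapeshellarg($charset)
        . ' ' . $options . ' ' . escapeshellarg($file);
}

// Example path with a space in it (hypothetical).
echo buildHtmldocCommand('iso-8859-1', '--book', 'C:/Program Files/wiki/images/pdf-book123'), "\n";
```

escapeshellarg() picks the quoting style for the platform PHP runs on (single quotes on Unix-like systems, double quotes on Windows), so the same line covers both the Bitnami setup described above and a Linux server.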

Fix SELinux labels
I have just installed PdfBook on a Centos 7 machine, and got the same error. In my case the reason was SELinux preventing httpd from writing to the images directory. Fixing the labels for /path/to/mediawiki/install/images(/.*)? as described in SELinux fixed the issue for me. --217.253.60.186 23:52, 19 January 2016 (UTC)

$wgPdfBookFormat
I wanted to control in LocalSettings.php whether to have format=single or not. This is what I came up with (add $wgPdfBookFormat = "single"; to your LocalSettings.php or not), but it leaks memory, maybe someone with actual PHP knowledge has a better idea. Thanks, 67.164.57.135 04:18, 8 June 2012 (UTC)

htmldoc binaries location
I installed htmldoc on an up-to-date Debian 6, but the binaries were not in /usr/local/bin as shown in the settings; they were in /usr/bin. It works since I fixed the path, but I'm not sure I did the right thing.

Shimegi (talk) 07:03, 3 July 2012 (UTC)


 * This is correct. Different distributions use different paths --Pastakhov (talk) 07:26, 3 July 2012 (UTC)

Header level incorrect
When I'm exporting a category, I have a problem:

There are pages A and B in the category.

Pages A and B each have two levels of headers.

What I get in the resulting PDF is:

 1) Title of Page A
 2) Header Level 1 of Page A
 2.1) Header Level 2 of Page A
 3) Another Header Level 1 of Page A
 4) Title of Page B
 5) Header Level 1 of Page B
 6) Another Header Level 1 of Page B
 6.1) Header Level 2 of Page B

whereas the more correct result IMHO would be:

 1) Title of Page A
 1.1) Header Level 1 of Page A
 1.1.1) Header Level 2 of Page A
 1.2) Another Header Level 1 of Page A
 2) Title of Page B
 2.1) Header Level 1 of Page B
 2.2) Another Header Level 1 of Page B ...
 2.2.1) Header Level 2 of Page B

You get the idea...

Attempt to fix
This fix increments the level of each header in a document by one (in a plain way).

<pre class="brush:php">
foreach ( $articles as $title ) {
	$ttext = $title->getPrefixedText();
	if ( !in_array( $ttext, $exclude ) ) {
		$article = new Article( $title );
		$text    = $article->fetchContent();
		$text    = preg_replace( '/<!--([^@]+?)-->/s', '@@'.'@@$1@@'.'@@', $text ); # preserve HTML comments
		if ( $format != 'single' ) $text .= '';
		$opt->setEditSection( false );   # remove section-edit links
		$wgOut->setHTMLTitle( $ttext );  # use this so DISPLAYTITLE magic works
		$out     = $wgParser->parse( $text, $title, $opt, true, true );
		$ttext   = $wgOut->getHTMLTitle();
		$text    = $out->getText();
		$text    = preg_replace( '|(<img[^>]+?src=")(/.+?>)|', "$1$wgServer$2", $text );       # make image urls absolute
		$text    = preg_replace( '|<div\s*class=[\'"]?noprint["\']?>.+?</div>|s', '', $text ); # non-printable areas
		$text    = preg_replace( '|@{4}([^@]+?)@{4}|s', '<!--$1-->', $text );                  # HTML comments hack
		$text    = preg_replace('|<table|', '<table border borderwidth=2 cellpadding=3 cellspacing=0', $text);        # JM 2012-07-26
		$text = preg_replace('/<h5/', '<h6', $text);
		$text = preg_replace('/<h4/', '<h5', $text);
		$text = preg_replace('/<h3/', '<h4', $text);
		$text = preg_replace('/<h2/', '<h3', $text);
		$text = preg_replace('/<h1/', '<h2', $text);
		$text = preg_replace('|</h5|', '</h6', $text);
		$text = preg_replace('|</h4|', '</h5', $text);
		$text = preg_replace('|</h3|', '</h4', $text);
		$text = preg_replace('|</h2|', '</h3', $text);
		$text = preg_replace('|</h1|', '</h2', $text);
		}       # end JM
		$ttext   = basename($ttext);
		$h1      = $notitle ? '' : " $ttext ";
		$html   .= utf8_decode("$h1$text\n");
	}
}
</pre>

Note the preg_replaces in-between the comments.

BTW if you'd like to keep the page breaks you can replace

$text = preg_replace('/<h1/', '<h2', $text);

by $text = preg_replace('/<h1/', '<h2', $text);
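The ten replacements in the fix have to run from h5 down to h1, otherwise a heading that was just demoted would be demoted again by the next call. A possible single-pass variant is sketched below. It is an alternative to the fix above, not what was posted, and the function name demoteHeadings is invented for this example.

```php
<?php
// Sketch: shift h1..h5 (opening and closing tags) down by one level in a
// single pass, so the order of replacements no longer matters.
function demoteHeadings(string $html): string
{
    return preg_replace_callback(
        '~<(/?)h([1-5])\b~i',
        static function (array $m): string {
            // $m[1] is "/" for a closing tag, $m[2] is the current level.
            return '<' . $m[1] . 'h' . ((int) $m[2] + 1);
        },
        $html
    );
}

echo demoteHeadings('<h1>Page A</h1><h2 id="x">Section</h2><h6>Deep</h6>'), "\n";
```

As in the original fix, h6 is left alone because there is no h7 to move it to.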

Problems with Htmldoc (1.9.0)
Hello,

With new version of htmldoc (as the one used by archlinux), I got all my content on one single Line (I mean all content of all pages on only one line !). To fix this, I've added, in phpbook.hook.php a " " tag for the html content.

$html = " ".$html." ";

I hope it may help other users !

Mathieu

Is this extension actively maintained?
I'm interested in porting this extension to use the wkhtmltopdf library, which does a superior job of HTML>PDF than HtmlDoc. Has anybody already done any work on this? Is the extension being actively maintained and developed? Andrujhon (talk)

Maintenance
Aran Dunkley seems to be off to Brazil - I just sent him a Facebook message to find out whether he's going to maintain the pdfbook extension. There seem to be a few modified versions out there already e.g.
 * https://github.com/geops/PdfBook

Please add any other link. I'm volunteering to setup a new maintained version on github if Aran doesn't want to continue to work on the svn version. -- Seppl2013 (talk) 15:54, 22 September 2013 (UTC)

Got it working!
I somehow got it working..

I'm not exactly sure which steps are the ones that helped, so here are some things I did:


 * 1) I did not use the .php files listed here.  Instead, I used the ones User:Seppl20 pointed to at github.
 * 2) Installed HTMLDOC into the Apache (I'm using XAMPP) cgi-bin folder.
 * 3) I don't know anything about Apache PATH whatevers, so I tried to follow the instructions here to open Apache's path.
 * 4) Finally, I tried following HTMLDOC manual instructions for a little bit but I kept getting stuck with an empty .pdf.  I went to my host computer and tried running the same change to the url (?title=Category:blah&action=pdfbook) and found that I was getting the missing LIBEAY32.DLL error.
 * 5) Blindly copied/pasted the htmldoc.exe, LIBEAY32.DLL, MSVCR.dll, SSLEAY32.DLL files into every possible directory from the server down to the wiki.  I can't be certain which folder was the one that needed it.  I want to say it was in /htdocs/mywiki, but again... blind pasting everywhere.

I read a lot of comments about commenting out lines or adding in small amounts of text, but my final product did not entail any of that.

Good luck!

Hollymollybobolly (talk) 22:35, 20 December 2013 (UTC)

Set href format=single if page is not a category (wgPdfBookTab = true)
I modified the code so that format=single is used if the page is not a category.

+++ PdfBook/PdfBook.hooks.php  2014-03-19 11:19:05.000000000 +0100
@@ -162,11 +162,20 @@
 		global $wgPdfBookTab;
 
 		if ( $wgPdfBookTab ) {
-			$actions['views']['pdfbook'] = array(
-				'class' => false,
-				'text' => wfMsg( 'pdfbook-action' ),
-				'href' => $skin->getTitle()->getLocalURL( "action=pdfbook&format=single" ),
-			);
+			if ( $skin->getTitle()->isContentPage() ) {
+				$actions['views']['pdfbook'] = array(
+					'class' => false,
+					'text' => wfMsg( 'pdfbook-action' ),
+					'href' => $skin->getTitle()->getLocalURL( "action=pdfbook&format=single" ),
+				);
+			}
+			else {
+				$actions['views']['pdfbook'] = array(
+					'class' => false,
+					'text' => wfMsg( 'pdfbook-action' ),
+					'href' => $skin->getTitle()->getLocalURL( "action=pdfbook" ),
+				);
+			}
 		}
 		return true;
 	}

Compatibility with MediaWiki v1.23
I'm currently on branch wmf/1.23wmf19 and I found the headings (h2, h3, etc) wouldn't show up correctly in the PDF. It turns out that the headings now contain some extra span tags that don't go well together with HTMLDOC. I wrote a hack in Perl to remove the unwanted elements. My PHP isn't up to par to implement it well in PdfBook so I hope someone will find this useful and implement it properly. Here are the Perl regexp's I used.

$content =~ s/<span class="mw-headline".*?>(.*?)<\/span>/$1/g; $content =~ s/ (.*?)<\/span>//g; $content =~ s/ (.*?)<\/span>//g;
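For readers who want to try the idea in PHP, here is a rough rendering of the first Perl substitution only (the other two are incomplete on this page, so they are not reproduced). It is a sketch with an invented function name, unwrapHeadlineSpans, and it has not been tested inside PdfBook itself.

```php
<?php
// PHP rendering of the first Perl substitution above: unwrap the
// mw-headline span and keep only its inner text.
function unwrapHeadlineSpans(string $html): string
{
    return preg_replace('/<span class="mw-headline".*?>(.*?)<\/span>/', '$1', $html);
}

echo unwrapHeadlineSpans('<h2><span class="mw-headline" id="Setup">Setup</span></h2>'), "\n";
```

Like the Perl original, the lazy match stops at the first closing span tag, so a headline that itself contains nested span elements would need a more careful approach.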

I`ve added in phpbook.hooks.php $text = preg_replace( '/<span class="mw-headline" id="(.*?)">(.*?)<\/span>/', "$2", $text );  to  ? See the configuration options. --Remco de Boer 11:42, 1 July 2019 (UTC)

Yes this is already in the localsettings file. still can't find the pdf book links. any other ideas?
 * Are you logged in? (See the issue reported above)? --Remco de Boer 07:05, 2 July 2019 (UTC)
 * Additionally: which version of MediaWiki are you using? --Remco de Boer 07:06, 2 July 2019 (UTC)
 * Yes, I'm logged in as Admin. Version 1.29.2
 * Not sure why you don't see the tab. Have you tried to manually enter a PDF creation URL? E.g. . --Remco de Boer 18:54, 2 July 2019 (UTC)
 * With this link a PDF gets created, it works. What is the thing with HTMLDOC? I installed it on my server (I'm using a Windows Webserver IIS). Do I need to add something to the Path in Control Panel?

Blank PDFs being generated
Hi, I am trying to export a whole Wiki via the categories.

In total there are around 700 pages. When I run PdfBook I get blank PDFs. I have had it briefly working on a docker installation, but have failed to replicate this.

I have tried on a local server, on a local server using docker, but I just can't get it working.

I have been unable to install it on my production server due to dependency issues with Windows and htmldoc.

My local server is on Fedora, the docker one, docker is running on Windows 10, then it is just the standard MediaWiki image/container. I have installed htmldoc using the command line in both Fedora and Docker.

Any help is appreciated. I note this has been discussed before, but can't find an actual answer

--Squeak24 (talk) 21:27, 23 July 2019 (UTC)