Extension talk:PdfBook/Archive

Missing Images in Https with authentication
--Johnp125 13:52, 11 June 2008 (UTC)

I have a problem when exporting to a PDF file: I do not get the pictures when I download via an https location. When I download via http the pictures show up. Any idea why this would be the case?
 * Our SSL doesn't seem to be functional at the moment, but can you check the URL it's trying to load the images from? Maybe also check whether exporting as raw HTML instead of PDF has the same problem with images? --Nad 07:11, 12 June 2008 (UTC)

--Johnp125 15:25, 12 June 2008 (UTC)

I can get the HTML file to show the pictures, but it wants me to authenticate again with the authentication server, or IE says "allow blocked content", which I click on and then try to sign on again. The server does not let me sign on again, but the pictures still show up.

--Johnp125 14:57, 29 July 2008 (UTC)

I think the problem may be a security issue. Is there a way to generate the data without requesting authentication from the web server? I can get the html version to show the pictures just not the pdf version. If I go to the back door and access the site via http then the pictures show up via pdf.

Sdball 17:36, 6 November 2008 (UTC)

I had the same problem, so I tweaked the extension to:
 * use files in /tmp so image references can work
 * search the generated html for images
 * determine the actual path to the image from its URL
 * i.e. https://server.com/wiki/images/a/b/image.jpg -> /www/wiki/images/a/b/image.jpg
 * use that path to copy the image file to /tmp
 * modify the generated html to point to the image file, not the absolute url
 * i.e. src= https://server.com/wiki/images/a/b/image.jpg -> src=image.jpg

Feel free to contact me if you'd like the code.
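The steps above could be sketched roughly as follows. This is not Sdball's code (it was only offered privately); it assumes the public images URL prefix (e.g. https://server.com/wiki/images/) maps one-to-one onto a directory on disk (e.g. /www/wiki/images/), and the function name localiseImages is hypothetical.

```php
<?php
// Hypothetical sketch of the steps above (not Sdball's code). $urlPrefix is the
// public URL of the images directory, $fsPrefix the same directory on disk, and
// $workDir the directory holding the temporary HTML file that htmldoc reads.
function localiseImages(string $html, string $urlPrefix, string $fsPrefix, string $workDir): string
{
    $pattern = '/src="' . preg_quote($urlPrefix, '/') . '([^"]+)"/';
    return preg_replace_callback($pattern, function (array $m) use ($fsPrefix, $workDir) {
        $relative = $m[1];                   // e.g. "a/b/image.jpg"
        $source   = $fsPrefix . $relative;   // e.g. "/www/wiki/images/a/b/image.jpg"
        $name     = basename($relative);     // e.g. "image.jpg"
        if (!is_file($source) || !copy($source, $workDir . '/' . $name)) {
            return $m[0];                    // leave the absolute URL if the copy fails
        }
        return 'src="' . $name . '"';        // now relative to the tmp HTML file
    }, $html);
}
```

Because the images are read straight from disk, htmldoc never has to go through the web server's https or authentication layer.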

Requests

 * is it possible to add a link in the toolbox menu section which is only viewable on categories pages?

Mediawiki 1.11.0
Version 0.0.3 stopped working after an upgrade. I made a little fix around line 98 of PdfBook.php and it works again.

// while ($row = mysql_fetch_row($result)) {
while ($row = $db->fetchRow($result)) {

Disclaimer: I don't really know PHP, don't know MediaWiki, don't know how to program. I just got it working by inserting debug statements into PdfBook.php. Looks like mysql_fetch is censored somewhere now ;)

PS: To insert debug statements:
 * In LocalSettings.php insert: $wgDebugLogFile = "/tmp/debug.log"; // the file must be writable and can be anywhere
 * Anywhere in the code, insert: wfDebug (.....);

- Daniel (edutechwiki.unige.ch)
 * Thanks a lot for this, it's still not working for me in 1.11 (I've only just done my 1.11 upgrade), but I've made some changes based on your findings which have got it partially there ;-) --Nad 21:36, 21 September 2007 (UTC)
 * It seems that 1.11 is a bit more memory hungry and my large test books were killing it, after giving PHP 64MB it's working fine now! --Nad 21:41, 21 September 2007 (UTC)

Empty file downloaded
Greetings Nad,

I have been trying to use your PDFBook Mediawiki extension since it may be a great solution to an issue I have.

I have installed HTMLDoc under "C:\Program Files" and can use it on its own to create PDF books. I have also included "PdfBook.php" in my "LocalSettings.php" file.

The issue I am having is that when I select the link to export my category as a book and select to save or open the pdf file it has 0 bytes. So, the file is created with the correct name but with no data.

Is there something else I must do to ensure HTMLDoc.exe is actually being called by your extension? Is there a required directory that it needs to be in?

Any help would be appreciated!

Thanks!
 * You have to make sure that htmldoc is in your executable PATH so that it can execute from just typing "htmldoc" without needing to supply the full pathname no matter what current directory you're in. Another thing to check would be to comment out the "@unlink($file)" line and after saving a pdf, check if it's left a tmp file in the root of your images directory, which is the data sent to htmldoc. --Nad 00:35, 6 September 2007 (UTC)
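A quick way to test Nad's first point is to search the PATH that the PHP process itself sees, since the web server's PATH can differ from the one in your own shell. This is a minimal sketch; the function name findOnPath is hypothetical, and the Windows extension list is an assumption.

```php
<?php
// Sketch: look for an executable such as "htmldoc" on the PATH seen by the
// PHP process (the web server's PATH may differ from your login shell's).
function findOnPath(string $name): ?string
{
    $dirs = explode(PATH_SEPARATOR, (string) getenv('PATH'));
    // On Windows, executables usually carry an extension such as .exe
    $exts = DIRECTORY_SEPARATOR === '\\' ? ['.exe', '.bat', '.cmd', ''] : [''];
    foreach ($dirs as $dir) {
        if ($dir === '') {
            continue;
        }
        foreach ($exts as $ext) {
            $candidate = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $name . $ext;
            if (is_file($candidate) && is_executable($candidate)) {
                return $candidate;
            }
        }
    }
    return null; // not found on this process's PATH
}
```

Running findOnPath('htmldoc') from a page served by the wiki (not from the command line) shows whether the extension can reach htmldoc by name.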


 * I'm experiencing the exact same problem: my files turn up empty. I run the server on a Windows machine using Apache. I've installed HTMLDoc and I'm able to create PDF files using the GUI. If I comment out "@unlink($file)" and then convert the tmp file with the GUI I get my PDF, but all files I download are 0 bytes in size... What can be wrong? /Jesper 15:59, 23 October 2007 (UTC)
 * With some hacking of Pdf_Book.php I'm now able to create PDF:s, but only from categories, not from a single page. By commenting out "putenv("HTMLDOC_NOCGI=1");" on line 152 it now generates Category PDF:s. /Jesper 08:09, 25 October 2007 (UTC)
 * I can't even get this far. Did you make any changes other than commenting out that one line? Has anybody else gotten this to work on an Apache Server running on Windows? -Michelle 19:19, 1 May 2008 (UTC)

Invalid PDF File
Nad,

Thanks for your quick response!

However, I am still having issues. The file is being created and has some size to it... but Adobe Reader gives me the following error:

"There is an error opening this document. This file is damaged and cannot be repaired".

HTMLDoc seems to be quitting during the conversion job.

If I add the ".html" extension to the temp file and run HTMLDoc from the command line, I can convert the temp HTML file manually to a PDF file.

I then compare in Notepad the one I generated with the one your script creates, and notice that the PDF your script creates stops after processing a certain number of lines.

I have your PDFExport Extension working just fine...so I was wondering what else it could be.

Any ideas?

Thanks!
 * How long is it taking to generate the PDF before quitting? 30 seconds? if that long it could be reaching max execution time? and how large is the PDF before it bails? --Nad 20:29, 6 September 2007 (UTC)

Nad,

It only writes about 18 lines to the .pdf file and takes a couple seconds for the file to generate. It doesn't appear to quit, it saves the file like it normally would however when I edit the file in notepad it is not complete (Stops after ~18 lines with Wordwrap on)

Like I stated before, I'm using your PDFExport Extension and it works great.

Let me know what you think      --136.182.158.153 21:29, 6 September 2007 (UTC)
 * When you run htmldoc manually passing the generated tmp file to it, are you using the exact same command and parameters that the extension uses? --Nad 21:51, 6 September 2007 (UTC)

continued...

Nad,

If I change this line:

$cmd = "htmldoc -t pdf --charset iso-8859-1 $cmd $file";

to

$cmd = "htmldoc -t pdf --charset iso-8859-1 $cmd $file > test.pdf";

Then I get a test.pdf in my MediaWiki root folder, which works perfectly.


 * You could try changing the htmldoc command to use passthru like Extension:Pdf Export - I had it like that on mine but had problems with the gzip encoding, but it may work better like that for you --Nad 21:55, 6 September 2007 (UTC)

images
Is there any possibility of getting images displayed in the PDF book as well? That would be a fantastic improvement. Any workarounds? Martin
 * I'm working on it, I just can't get them to work currently. I'm checking out some of the solutions at Extension talk:Pdf Export too as that one uses htmldoc as well. --Nad 12:39, 12 September 2007 (UTC)

Nad, thanks for your great work. I made some fixes to your extension and got it to work correctly with images, even with secure server without modifying .htaccess.

The points are:
 * when generating html output only, links to images could stay absolute as currently.
 * when generating pdf output, links to images should be converted to relative links to the temp file (pdf-book-something in $IP/images)
 * --browserwidth could be a workaround when you have only large images, but it would make your small images too small when your image sizes vary a lot. My solution is to rescale large images to fit on the page: pick up the image width and height from the HTML output, and if they are too big for the paper size, set width="x%", with x depending on the ratios width/maxWidth and height/maxHeight.

Hope this helps. Just tell me if you'd like me to send you my code. Lechau 02:20, 6 June 2008 (UTC)
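Lechau's rescaling rule can be sketched as a small function. This is not Lechau's code: it assumes $maxWidth and $maxHeight are the printable area in the same pixel units as the image's width/height attributes, and that the percentage is relative to the page width. The name fitWidthPercent is hypothetical.

```php
<?php
// Sketch of the rescaling rule described above: if an image is larger than
// the printable area, return a width percentage (relative to the page width)
// that makes it fit while keeping its aspect ratio; otherwise return null.
function fitWidthPercent(int $width, int $height, int $maxWidth, int $maxHeight): ?int
{
    if ($width <= $maxWidth && $height <= $maxHeight) {
        return null; // fits already: keep the original size
    }
    // Largest uniform scale that keeps both dimensions inside the page
    $scale = min($maxWidth / $width, $maxHeight / $height);
    return (int) floor(100 * $scale * $width / $maxWidth);
}
```

A too-wide image always comes out at 100%, while a tall, narrow image gets a smaller percentage so that its height fits.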

A hack
In file PdfBook.php around line 118 (I may have inserted other stuff) just before "#write the HTML to a tmp file" insert this:

$ori_string = 'src="';
$repl_string = 'src="' . $wgServer;
$html = str_replace($ori_string, $repl_string, $html);
# Write the HTML to a tmp file

The problem is that the intermediary output file contains stuff like this: src="/mediawiki/images/thumb/pict.png but you want: http://your.server.org/mediawiki/images/thumb/pict.png

This is not the best solution: a regexp hacker should really strip most of the HTML picture markup and maybe replace the thumbnail with the original picture. But the above is at least a minimal fix. To see the intermediary file, as someone said, comment out the unlink at the end and then get the file from the images directory: //@unlink($file);

Sorry, I'm not a real programmer and have too much workload to help for real. Just wanted to produce some handouts ;) - Daniel
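A slightly safer variant of the hack is to prefix only root-relative src values (those starting with a single "/"), so URLs that are already absolute are not turned into http://serverhttp://server/... This is a sketch; $server stands in for MediaWiki's $wgServer and the function name absolutiseImageSrc is hypothetical.

```php
<?php
// Sketch: make root-relative image URLs absolute without touching URLs that
// are already absolute. $server plays the role of MediaWiki's $wgServer.
function absolutiseImageSrc(string $html, string $server): string
{
    // Match src="/..." but not protocol-relative src="//host/..."
    return preg_replace_callback('#src="(/(?!/)[^"]*)"#', function (array $m) use ($server) {
        return 'src="' . rtrim($server, '/') . $m[1] . '"';
    }, $html);
}
```

Using a callback rather than a replacement string avoids surprises if the server URL ever contains "$" or "\".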

Same problem as section 2
I'm on Ubuntu Linux with Mediawiki 1.10. Htmldoc is in /usr/bin. I commented out the unlink command, and the temp file is empty (0 length).

I checked to be sure that my Apache user can run htmldoc -- it can. Unsure what I should try next.

By the way, your single-page export plugin works perfectly (even for images). So I know that htmldoc is not at fault here.
 * I didn't write the single page one, but the code seems pretty similar. I'll just have to see what differences there is in the code between this one and the single-page one. --Nad 22:28, 14 September 2007 (UTC)

Upload filetype
What happens when PDF is not a valid file type for uploading? Does the wiki control this for this extension, and if so, do I need to add PDF to the file types that can be uploaded?
 * The upload filetype is unrelated to this since exported pdf's are downloaded not uploaded. If you want to add pdf to your allowed upload filetypes, use $wgFileExtensions[] = 'pdf', you may also want to set $wgVerifyMimeType to false if it's giving you hassles when you try and upload exotic types of file. --Nad 04:11, 21 September 2007 (UTC)

More empty file downloads
--Johnp125 02:12, 25 September 2007 (UTC)

Sorry to be such a pain. I have set up a test wiki which is running Fedora Core 4. Please check out my test wiki and see if you can give me some direction. I have debugging for the wiki turned on in LocalSettings.php. If you need admin access please email me at johnp125@yahoo.com and I'll hook you up.

http://wikitest.homelinux.net/wiki2/index.php/Main_Page
 * The output shows a bug due to 1.11 being stricter about hook return values. Try again now with the latest version, 0.0.4. Also note that even if it works, you will get an empty document: the point of this extension is to compose a book from the content of a category, so if it is not placed in a category, or the category contains no members, the result will be empty. To export the content of a single page you should be using Extension:Pdf Export. --Nad 03:33, 25 September 2007 (UTC)
 * However, I'm working on version 0.5 now which can be used in non-category pages and will compose the book from the article links found in the page, so that books can then be composed from explicit lists or DPL queries. --Nad 03:33, 25 September 2007 (UTC)

--Johnp125 13:28, 25 September 2007 (UTC)

Hey, that sounds great. I'd love to help you with it.

You mentioned single page. I had 2 types of pdf downloads there.

http://wikitest.homelinux.net/wiki2/index.php?title=Category:test&action=pdfbook this one should be going after the demo page with the catageory:test and then creating a PDF book from that. Is this not the right way to use the code? I know if I created more pages and put the catageory:test under them they would get put into the PDF file as well.
 * You had a typo in the word "category", link is working now ;-) --Nad 22:21, 25 September 2007 (UTC)

--Johnp125 17:30, 26 September 2007 (UTC)

Thanks a bunch. You're the greatest. Glad to have this working now.

Checked out your info about Images not showing in mediawiki 1.10.2---1.11. Nice work.
 * I just did another update yesterday which has images working now --Nad 21:06, 26 September 2007 (UTC)

--Johnp125 00:16, 27 September 2007 (UTC)

Is this the update that is going to work with DPL queries? I started to play around with that extension. I know it's working but right now it's too big to try and figure out.

--Johnp125 00:23, 27 September 2007 (UTC)

Hey, by the way, could you tell me how to make the pdfbook extension just produce one big HTML file, so I could open it in Word or OpenOffice in HTML format and let the office program convert it? Or is that easier said than done?
 * That feature is very easy to add because it simply requires not sending the file to HTMLDOC, I've added an option in a new version (0.0.7) which allows you to do this by adding format=html to the query-string. --Nad 02:06, 27 September 2007 (UTC)

--Johnp125 22:04, 30 September 2007 (UTC)

Wow, that sounds great, can't wait to try out the HTML export. I looked for the 0.0.7 version but only saw the 0.0.6 version when I went to the download section. Also, could you give me an example of how format=html is used?

http://www.foo.bar/wiki/index.php?title=Category:Foo&action=pdfbook

Where would it go in this string?
 * Sorry about that I must have forgotten to update it, it's at 0.0.7 now. To change the URL above to produce html, append &format=html to it. We use a template which has a link for both, see OrganicDesign:Template:Book. --Nad 07:11, 1 October 2007 (UTC)
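If you generate the two links from code (for example in a custom skin or tool), they could be built like this. This is a sketch; the script URL and the function name pdfBookUrls are assumptions, while action=pdfbook and format=html come from the discussion above.

```php
<?php
// Sketch: build the PDF and HTML export links for a category page.
// $scriptUrl is an assumed path to the wiki's index.php.
function pdfBookUrls(string $scriptUrl, string $title): array
{
    $base = ['title' => $title, 'action' => 'pdfbook'];
    return [
        'pdf'  => $scriptUrl . '?' . http_build_query($base),
        // format=html returns the composed HTML instead of sending it to htmldoc
        'html' => $scriptUrl . '?' . http_build_query($base + ['format' => 'html']),
    ];
}
```

http_build_query URL-encodes the title, so "Category:Foo" becomes "Category%3AFoo", which MediaWiki decodes normally.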

--Johnp125 01:55, 2 October 2007 (UTC)

The HTML export looks really good. I did notice that Microsoft Word gets confused by small HTML files. Maybe you could put the HTML header info at the top and bottom of the page to help Word out. OpenOffice did not seem to have a problem with it. However, Word looks for the HTML tags on small exports; on a big export it gets the idea.

--Johnp125 02:08, 2 October 2007 (UTC)

Just tested it again with a small HTML download. Word tried to format it when opening. Then I added the opening HTML tags at the beginning and the closing HTML tags at the end, reopened the file with Word, and bingo, it worked fine. Maybe something to add in 0.0.8? OpenOffice worked either way.

Keep up the good work. This is the best extension for wiki out there right now.
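The fix described above amounts to wrapping the exported fragment in a minimal complete HTML document before sending it. A sketch follows; the exact wrapper markup and the function name wrapHtmlDocument are assumptions, not what any PdfBook version does.

```php
<?php
// Sketch: wrap an HTML fragment in a minimal complete document so that
// programs like Word recognise it as HTML; full documents are left as-is.
function wrapHtmlDocument(string $body, string $title = ''): string
{
    if (stripos(ltrim($body), '<html') === 0) {
        return $body; // already a full document
    }
    return "<html>\n<head><title>" . htmlspecialchars($title) . "</title></head>\n<body>\n"
        . $body . "\n</body>\n</html>\n";
}
```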

If you have larger texts, don't forget to change the server settings. E.g. for a 2000-page document produced on a low-end 2-CPU SPARC box I use this in php.ini:
  max_execution_time = 600
  max_input_time = 600
  memory_limit = 100M
and this in httpd.conf:
  Timeout 600

Otherwise you just get a blank page without any warning or error message - Daniel K. Schneider 11:00, 20 June 2008 (UTC)
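If you cannot edit php.ini, the PHP-side limits can often be raised for a single request from code instead. This is a minimal sketch assuming your host allows set_time_limit() and ini_set('memory_limit'); the function name raiseExportLimits is hypothetical. Apache's Timeout directive cannot be changed from PHP.

```php
<?php
// Sketch: raise PHP limits for one long export request instead of (or in
// addition to) php.ini. Values mirror the ones above. Apache's "Timeout"
// directive is outside PHP's control and still has to be set in httpd.conf.
function raiseExportLimits(int $seconds = 600, string $memory = '100M'): bool
{
    $timeOk = set_time_limit($seconds);                  // restarts the execution timer
    $memOk  = ini_set('memory_limit', $memory) !== false; // false means it was refused
    return $timeOk && $memOk;
}
```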

Hacks to change PDF output (v. 0.6)

 * Images: If they don't fit your PDF page, you have to set pixel width of a virtual browser page (that's a "feature" of htmldoc). By default it is 680 pixels only and images larger than that will be rendered larger than your PDF page! Lots of my pictures are...
 * Titlepage: If you want a standardized titlepage before the TOC, create it in HTML and put it somewhere in your file system. I just put it in the images directory.

Then change PdfBook.php like this, for example:
$cmdext = " --browserwidth 1000 --titlefile $wgUploadDirectory/PDFBook.html";
$cmd = "htmldoc -t pdf --charset iso-8859-1 $cmd $cmdext $file";
Basically, I found it a good idea to read the htmldoc manual. On my Unix system it sits in /usr/local/share/doc/htmldoc/htmldoc.pdf (see chapter 8). I made other changes too.

Now of course Nad may at some point add some more options, but changing a line in the php file does it too :) - Daniel (edutechwiki.unige.ch)
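Daniel's extra options can also be added with each argument shell-escaped, which matters once paths contain spaces. This is a sketch only: --browserwidth and --titlefile are the htmldoc options used above, the extension's own options (held in $cmd) are left out, and the function name buildHtmldocCommand is hypothetical.

```php
<?php
// Sketch: build an htmldoc command with optional --browserwidth and
// --titlefile, escaping each file argument for the shell.
function buildHtmldocCommand(string $htmlFile, array $options = []): string
{
    $parts = ['htmldoc', '-t', 'pdf', '--charset', 'iso-8859-1'];
    if (isset($options['browserwidth'])) {
        $parts[] = '--browserwidth';
        $parts[] = (string) (int) $options['browserwidth']; // pixels of the virtual browser page
    }
    if (isset($options['titlefile'])) {
        $parts[] = '--titlefile';
        $parts[] = escapeshellarg($options['titlefile']);   // HTML title page
    }
    $parts[] = escapeshellarg($htmlFile);
    return implode(' ', $parts);
}
```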

PdfBook Error Solution....for me at least
Nad,

I ended up creating an additional temp file to which I had HTMLDoc redirect its output. This was the only way I could stop it from quitting during the PDF conversion process. I then open the file and read its data back into $content. After doing that I can successfully download the complete PDF file.
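Dan's workaround can be sketched as: create a temporary output file, let the converter write into it, read it back, and always delete it. The callable receives the already shell-escaped output path; with htmldoc the -f option names the output file. The function name convertViaTempFile is hypothetical, and this is not Dan's actual code.

```php
<?php
// Sketch: run a converter that writes to a temporary file, then read the file
// back instead of capturing the converter's standard output.
// $buildCommand gets the shell-escaped output path and returns the command.
function convertViaTempFile(callable $buildCommand): ?string
{
    $out = tempnam(sys_get_temp_dir(), 'pdfbook');
    if ($out === false) {
        return null;
    }
    exec($buildCommand(escapeshellarg($out)), $ignored, $status);
    $content = ($status === 0) ? file_get_contents($out) : false;
    @unlink($out); // always clean up the temporary file
    return ($content === false || $content === '') ? null : $content;
}
```

A usage sketch: convertViaTempFile(fn($out) => "htmldoc -t pdf --charset iso-8859-1 -f $out " . escapeshellarg($file)), after which the returned string can be sent to the browser as $content.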

But I have another question for you.....I have seen a jspwiki which retrieves all the articles for a category and lists them on a page and uses a form to allow you to select which ones you want. It then retrieves the selected articles as one entire book. Is there a way to include a similar form in Mediawiki. Or do you know of a way to use an external html web page to retrieve/send commands like that to Mediawiki?

Thanks,

Dan --136.182.158.145 21:27, 7 September 2007 (UTC)


 * The PDF Book extension will allow exceptions so that not all items in the category are included. It would be possible to have it add items to the selection in the same way. A form could then be used to generate the list from which the book is made. I'll have a think about that though, because it's an interesting point you make, that books could be generated from queries rather than just categories... --Nad 22:01, 7 September 2007 (UTC)

Just in case the anonymous above re-reads this: I had the same problem of PdfBook not generating any output, but the solution was simple: make sure that the upload directory (usually ./images) is writeable for the web server process. After I changed that, PdfBook worked okay. Cheers, Lexw 15:30, 5 October 2007 (UTC)
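Lexw's fix can be checked from PHP itself. PdfBook leaves its temporary HTML file in the upload (images) directory, so that directory must be writable by the web server user. This is a sketch; the function name checkWritableDir is hypothetical, and it only reflects the web server's permissions when run from a page the server executes, not from the command line.

```php
<?php
// Sketch: report whether a directory exists and is writable by the current
// process. In MediaWiki you would pass $wgUploadDirectory (usually ./images).
function checkWritableDir(string $dir): string
{
    if (!is_dir($dir)) {
        return "missing: $dir";
    }
    return is_writable($dir) ? "ok: $dir" : "not writable by this process: $dir";
}
```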

Missing Images in new version
I love this extension I think it is the best thing for wiki right now. However when I use the new pdfbook version 0.7 I am not getting any pictures. All I get is url links to the pictures. This is in the pdf format not the html format. Any Ideas? --Johnp125 20:29, 15 October 2007 (UTC)
 * Do you mean to say that your images were working on the previous version and have stopped working now? I had never had images working until I made some changes in the last version. Do you have a link to an example of a failing image export so I can check out what the problem may be? --Nad 19:32, 17 October 2007 (UTC)

--Johnp125 18:12, 19 October 2007 (UTC)

Sorry for the delayed post. Yes I had images working on the 0.6 version and then on the 0.7 version I am not getting any images in pdf format. I can go ahead and setup my test server real soon and make sure you and I can test both. I think I still have a copy of the 0.6 version I will try it again as well.

--Johnp125 18:23, 19 October 2007 (UTC)

Also I noticed the links are not working quite right. For example, if I have a document in Category: Testing and it pulls that document, and that document links to another page that is also in Category: Testing, shouldn't the link take me to that page within the PDF? Right now it refers to the HTML link, not the PDF location. I would think it should recognise that the linked page was pulled in by the category and change the reference to its location in the PDF.


 * I have a problem with the pictures if the wiki requires HTTP authentication. It seems that the pictures are imported from the web server and not from the file system. Is this the reason for the problems? Proofy 07:54, 29 November 2007 (UTC)

Missing Images and hangs with larger categories
Mistral 13:28, 17 October 2007 (UTC)

We installed it on Linux with 128 MB of memory allocated to PHP. Using the template idea referred to by Organic Design, we have tested this and observed the following:

- Images are not included. They are copied to the PDF as links to the wiki image.
- HTML and PDF output work fine on small categories (< 10 entries). Output is ready in less than 2 seconds and it looks nice.
- However, for pages with > 25 entries, when you press submit to get PDF output the browser hangs and never completes the operation. You need to close the browser to terminate the operation.
 * It should work for large books, our test book on organicdesign is over 250 pages/800KB and only takes a second or two with 64MB allocated. Have you tried saving it as html only then manually running it through htmldoc to see if that's working ok? --Nad 19:41, 17 October 2007 (UTC)

I looked at your book link and the translation to pdf worked great on IE6 with Acrobat. However I do notice that there is not a single image in the book. Is it possible having 2 or 3 images per page in 25 - 30 pages is the problem?

I looked at the translation into html code to see why the images were not showing. I believe this can be fixed easily.

Here is the html output http://wiki.fomportal.comhttp://wiki.fomportal.com/images/9/94/BERalex_Full.jpg

Here is what it should read: src="http://wiki.fomportal.com/images/9/94/BERalex_Full.jpg" width="262" height="207" />

Do you see the duplication of the site address? ((http://wiki.fomportal.com)) Maybe this a configuration issue?? Mistral 18:03, 19 October 2007 (UTC)
 * I'll check it out soon, your research into the problem should make it a lot easier for me to fix ;-) --Nad 20:40, 19 October 2007 (UTC)
 * I found a bug which was trying to make URL's absolute which were already absolute, see if 0.0.8 works any better --Nad 00:18, 29 October 2007 (UTC)

Missing images due to apache .htaccess restriction
I've just encountered the problem that no images were displayed within the PDF - only their borders. In my case this was caused by .htaccess asking for a password in order to access the wiki folder.

The solution was to add two directives to the .htaccess file allowing access from the local machine (127.0.0.1), so htmldoc could access the images and embed them in the PDF. --^Rooker 12:14, 05 December 2007 (UTC)
 * Be aware that the corresponding IP address is not always 127.0.0.1. It didn't work for me, so I spent some hours debugging until I looked in the Apache access.log, where I saw that accesses by the local machine were not logged with 127.0.0.1 but with the real IP address of our server. --Fydel 12:45, 9 January 2009 (UTC)

SubCategories
I made a structure using categories and subcategories. My goal is to make a complete Quality Manual using MediaWiki. When using the PdfBook extension from a category page, no subcategories are included in the resulting PDF.

Is there any manner to use pdfbook extension to make a book covering sub and subsubcategories?

Regards, Antonio Todo Bom --Todobom 22:50, 28 October 2007 (UTC)
 * Unfortunately not sorry, currently it can only work on a list, deeper levels are only done from heading levels not sub-categories. You may be able to use DPL to create reports of the sub-category and sub-sub-category content which could then be printed as a book. --Nad 00:10, 29 October 2007 (UTC)
 * I'm facing the same problem with the Quality Manual I'm working on. Please let me know if someone solves this problem, and I'll do my best to find a solution myself.
 * /Jesper 85.89.79.106 12:43, 30 October 2007 (UTC)

Looks like a job for a recursive program call. When we installed this I thought I would be able to have one master category that contained all the other categories and then just go "Save as pdf" but it's not that easy yet. I hope you are able to add this functionality.

Mistral 16:30, 30 October 2007 (UTC)
 * It's not as simple as that - how do the categories and sub-categories names map to heading level? and then how do the headings and subheadings etc in the document map to pdf headings? --Nad 20:57, 30 October 2007 (UTC)
 * I understand the problem... Somehow the new category should have its own heading, and if that's the case, all other H1 would become H2 and so on... But let's ignore that factor and say that you only want to make a huge PDF book with all categories, with the same heading levels used today. How would you do that? I tried to use DPL to make it print all articles in a couple of categories and then PDF the category that article was in, but it didn't work... //Jesper 85.89.79.106 08:46, 1 November 2007 (UTC)
 * I doubt I'll be adding the subcategory functionality for some time, if at all, I just have too much other stuff on. There's an example of using DPL to make books from at Creating a PDF book from a DPL query. --Nad 20:40, 1 November 2007 (UTC)
 * Have a look at Extension:Book. The issue is addressed there. It should also be possible to merge both approaches.--Sh4k 12:23, 29 May 2008 (UTC)
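One way to approach Nad's question about mapping subcategories to heading levels is to walk the category tree depth-first and record each entry's depth, which a book builder could then turn into a heading level. This is a sketch under assumptions: the tree is an in-memory array here (a real version would query the categorylinks table), subcategories are emitted as headings at their parent's depth, and the function name flattenCategoryTree is hypothetical.

```php
<?php
// Sketch: flatten a category tree into [depth, name] entries, depth-first.
// $tree maps a category to ['pages' => [...], 'subcats' => [...]].
function flattenCategoryTree(array $tree, string $category, int $depth = 0, array &$seen = []): array
{
    $seen[$category] = true;
    $result = [];
    foreach ($tree[$category]['pages'] ?? [] as $page) {
        $result[] = [$depth, $page];
    }
    foreach ($tree[$category]['subcats'] ?? [] as $sub) {
        if (isset($seen[$sub])) {
            continue; // category loops are possible in MediaWiki, so skip repeats
        }
        $result[] = [$depth, $sub]; // the subcategory itself becomes a heading
        foreach (flattenCategoryTree($tree, $sub, $depth + 1, $seen) as $entry) {
            $result[] = $entry;
        }
    }
    return $result;
}
```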

Hello there, Nad. Great work you have there. Do you mean by this last suggestion that we can define a custom layout? It would be nice if a page describing the layout, like the following, could specify the heading structure:
* Page for Section 1
:* Page for Section 1.1
* Section 2

Is it possible? Nuno Tavares 20:45, 12 January 2008 (UTC)
 * I also take the chance to ask you to look at the last line: Section 2. The page is in fact called Page for Section 2, but what should be shown is Section 2, so I think that should also be the section name when building the PDF. This is especially useful if you are using namespaces. Nuno Tavares 21:15, 12 January 2008 (UTC)
 * OK, I found a way (a hack, actually) to allow this. In onUnknownAction, just use "$title->getText()" instead of "basename($ttext)". Nuno Tavares 21:49, 12 January 2008 (UTC)
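A layout page like the one Nuno describes could be read by parsing each linked list item into a level, a target page, and the heading text, using the piped label when there is one. This is a sketch of the idea, not PdfBook behaviour: the accepted syntax (lines starting with *, : or # followed by a [[link]]) and the function name parseOutline are assumptions.

```php
<?php
// Sketch: turn outline lines such as "* [[Page for Section 1]]" or
// ":* [[Page for Section 2|Section 2]]" into [level, target page, heading].
function parseOutline(string $wikitext): array
{
    $entries = [];
    foreach (preg_split('/\R/', $wikitext) as $line) {
        if (!preg_match('/^([*:#]+)\s*\[\[([^|\]]+)(?:\|([^\]]+))?\]\]/', $line, $m)) {
            continue; // not a linked list item
        }
        $target = trim($m[2]);
        $label  = isset($m[3]) ? trim($m[3]) : $target; // piped text wins
        $entries[] = [strlen($m[1]), $target, $label];    // level = number of list markers
    }
    return $entries;
}
```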

Subcategory article shows up in pdfbook export
When creating a subcategory (= assigning a category page to a category), that (sub)category's page also shows up in the export. Either I've overlooked these pages in previous exports, or this behavior was introduced with a more recent version of MediaWiki (I'm currently using v1.12.0, with pdfbook 0.0.9).

I was quite puzzled, so I thought I'd let someone know about this behavior. -- The rooker 09:51, 28 April 2008 (UTC)

Use CSS when exporting to PDF
Hi all. I want to know if there is some way to use CSS when I'm exporting my PDF:s? The thing is, I want to make id="toc" invisible instead of having another table of contents in my PDF books. //Jesper 85.89.79.106 12:57, 31 October 2007 (UTC)
 * I've been looking around for PDF converters which can handle CSS but I can't find any. You'll have to add __NOTOC__ to remove the TOC. --Nad 20:42, 31 October 2007 (UTC)
 * Hmm... But adding __NOTOC__ removes the table of contents from the page, and as the page is pretty long, I think users need that one... It would be great if I could make the TOC disappear only in the PDF. //Jesper 85.89.79.106 06:35, 1 November 2007 (UTC)
 * I've been testing some now and by adding:

$ori_string = 'id="toc"'; $repl_string = 'id="toc" style="visibility: collapse;"'; $html = str_replace ($ori_string, $repl_string, $html);
 * After "# If format=html in query-string, return html content directly" the TOC disappears in the HTML file, but I can't get the same thing to work with the PDF. //Jesper 85.89.79.106 07:00, 1 November 2007 (UTC)
 * Good point, it's not useful to have a TOC when it's a book which already has a TOC - I've updated it to add a __NOTOC__ before parsing each article --Nad 07:58, 1 November 2007 (UTC)
 * Ah, Thanks Nad! That was a fast reply and I really appreciate it! //Jesper 85.89.79.106 08:31, 1 November 2007 (UTC)
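The idea behind Nad's fix, assuming it works by putting the __NOTOC__ magic word in front of each article's wikitext before parsing, can be sketched as below. Because only the copy of the text used for the book is changed, the wiki pages keep their own TOC. The function name suppressToc is hypothetical.

```php
<?php
// Sketch: prefix an article's wikitext with __NOTOC__ (unless already present)
// before parsing it for the book, so the per-page TOC is suppressed only there.
function suppressToc(string $wikitext): string
{
    return strpos($wikitext, '__NOTOC__') === false ? "__NOTOC__\n" . $wikitext : $wikitext;
}
```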

no index pages
--Johnp125 16:59, 8 November 2007 (UTC)

Is there anyway to run the query and not create any autogenerated index pages or put the index number in the text?

--Johnp125 18:26, 8 November 2007 (UTC)

ok just checked out the new html version .9. This does what I would like it to do. Images work and everything.

I was having problems with the images because we have an alias for the wiki (/wiki/index.php). When you run pdfbook in PDF format, I think it cannot find the images because of the /wiki/picture.jpg vs. /picture.jpg path difference. Anyway, the new HTML version works just fine.

Header info
--Johnp125 18:31, 8 November 2007 (UTC)

I know this question is off on a limb but, is there anyway I could select certain Headline text from not being pulled based on the name like Image Header?

Missing end tag in 0.0.9 source code
Just for the record: it seems that the page at Organic Design which lists the v0.0.9 source code is missing a php end tag at the bottom of the file. Cheers, Lexw 09:23, 13 November 2007 (UTC)
 * End delimiters are removed to avoid whitespace being sent to the output - unfortunately I can't find the link to the official bug report about it. --Nad 19:59, 13 November 2007 (UTC)

Additional functionality in PdfBook
Hi Nad, I have added some additional functionality into PdfBook that you might be interested in for a next version. Seems that you have switched off email (which I can understand), so I couldn't contact you that way. Please contact me by email via 'E-mail this user' if you are interested.

Other users: please don't contact me. I might come back to this topic later, first I want to discuss things with Nad.

Regards, Lexw 13:39, 15 November 2007 (UTC)

Added recursive follow functionality
Hi Nad, I'm using your PdfBook Extension and I've added some functionality to recursively follow links to produce a PDF. With the parameter follow=deep or follow=broad the created PDF will contain all pages that are referenced from the current page, and recursively all further referenced pages, in a depth-first or breadth-first manner. Here are the relevant code snippets:

if ($title->getNamespace() == NS_CATEGORY) {
    $cat    = $title->getDBkey();
    $db     = wfGetDB(DB_SLAVE);
    $cl     = $db->tableName('categorylinks');
    // addQuotes() escapes the category name (e.g. titles containing an apostrophe)
    $result = $db->query("SELECT cl_from FROM $cl WHERE cl_to = " . $db->addQuotes($cat) . " ORDER BY cl_sortkey");
    if ($result instanceof ResultWrapper) $result = $result->result;
    while ($row = $db->fetchRow($result)) $articles[] = Title::newFromID($row[0]);
}
else if (isset($_REQUEST['follow'])) {
    // follow=deep: depth-first; any other value: breadth-first
    $deep = $_REQUEST['follow'] == 'deep';
    wfDebug("PdfBook: following links - " . ($deep ? "depth first\n" : "breadth first\n"));
    $articles[] = $title;
    wfDebug("PdfBook: adding page '" . $title->getText() . "'\n");
    $this->getLinkedArticles($articles, $article, $opt, $deep);
}
else {
    // default: take the articles from a bulleted list of links on the page
    $text = $article->fetchContent();
    $text = $wgParser->preprocess($text, $title, $opt);
    if (preg_match_all('/^\\*\\s*\\[{2}\\s*([^\\|\\]]+)\\s*.*?\\]{2}/m', $text, $links))
        foreach ($links[1] as $link) $articles[] = Title::newFromText($link);
}

function getLinkedArticles(&$articles, $article, $opt, $deep) {
    global $wgParser;
    $text = $article->fetchContent();
    $text = $wgParser->preprocess($text, $article->getTitle(), $opt);
    $linktitles = array();
    wfDebug("PdfBook: - processing article '" . $article->getTitle()->getText() . "' ($deep)\n");
    if (preg_match_all('/\\[{2}\\s*([^\\|\\]]+)\\s*.*?\\]{2}/m', $text, $links)) {
        foreach ($links[1] as $link) {
            $t = Title::newFromText($link);
            if ($t) $linktitles[] = $t; // skip links that are not valid titles
            wfDebug("PdfBook: found link '" . $link . "'\n");
        }
    }
    wfDebug("PdfBook: processing " . count($linktitles) . " links...\n");
    if ($deep) {
        // depth-first: recurse into each new link as soon as it is added
        foreach ($linktitles as $linktitle) {
            $exists = false;
            foreach ($articles as $el) if ($el->getText() == $linktitle->getText()) $exists = true;
            if (!$exists) {
                wfDebug("PdfBook: adding '" . $linktitle->getPrefixedText() . "'\n");
                $articles[] = $linktitle;
                wfDebug("PdfBook: adding subpages\n");
                $art = new Article($linktitle);
                $this->getLinkedArticles($articles, $art, $opt, $deep);
                wfDebug("- <\n");
            }
        }
    } else {
        // breadth-first: add all new links of this page, then recurse into each
        $newlinktitles = array();
        foreach ($linktitles as $linktitle) {
            $exists = false;
            foreach ($articles as $el) if ($el->getText() == $linktitle->getText()) $exists = true;
            if (!$exists) {
                wfDebug("PdfBook: adding '" . $linktitle->getText() . "'\n");
                $articles[] = $linktitle;
                $newlinktitles[] = $linktitle;
            }
        }
        foreach ($newlinktitles as $linktitle) {
            wfDebug("PdfBook: adding subpages of '" . $linktitle->getText() . "'\n");
            $art = new Article($linktitle);
            $this->getLinkedArticles($articles, $art, $opt, $deep);
        }
    }
}

I can also send you the complete file if you want. Tbleier 2008-01-25
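The recursive "broad" mode adds all of a page's links before descending, but it still finishes the first child's whole subtree before the second child's links are expanded, so on deeper link graphs the order differs from a strict breadth-first walk. A queue gives the strict order. This is a sketch: $links is an in-memory stand-in for "the links found on a page" (a real version would fetch and preprocess each article as above), and collectBreadthFirst is a hypothetical name.

```php
<?php
// Sketch: collect pages reachable by links in strict breadth-first order.
// $links maps a page name to the page names it links to.
function collectBreadthFirst(array $links, string $start): array
{
    $order = [$start];
    $seen  = [$start => true];
    $queue = new SplQueue();
    $queue->enqueue($start);
    while (!$queue->isEmpty()) {
        $page = $queue->dequeue();
        foreach ($links[$page] ?? [] as $next) {
            if (!isset($seen[$next])) { // each page goes into the book only once
                $seen[$next] = true;
                $order[] = $next;
                $queue->enqueue($next);
            }
        }
    }
    return $order;
}
```

For Start linking to A and B, A to C, C to E and B to D, this yields Start, A, B, C, D, E, whereas the recursive version adds E before D.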

Added dynamic title page
In order to have a proper title page on the generated PDF, I've added a few lines of code that read a plain HTML file and replace some placeholders with values like "Category name", etc... and then use that file with htmldoc's otherwise static  option.

Additionally, I've added 2 new variables: $wgPdfBookTitleFile and $wgPdfBookLogoImage so one can easily select a title page and logo image (to display at the bottom of a page).

I'll make a small package and put it on some webserver instead of posting the code here (too messy already). :)   The rooker 14:00, 20 February 2008 (UTC)
 * That is exactly what I have done and wanted to discuss with Nad (see above), but he doesn't seem to respond. I've gone a little further and now create the title file dynamically from the PdfBook extension, so no external HTML file is needed to generate the title page. My implementation includes a logo file too (only I added it to the header, not the footer, but that's a matter of configuration which can be overridden in the wiki's LocalSettings.php).
 * Since this implementation is not part of the "official" PdfBook extension, I will have to find a place to store it, if anyone is interested. Rooker, have you already stored your solution somewhere? Lexw 09:27, 8 April 2008 (UTC)


 * @Lexw: I've provided a quickly cleaned version including my modifications. See the "README.txt" inside for details: PdfBook-0.0.9-DynamicTitle.tar.bz2 The rooker 10:57, 17 April 2008 (UTC)
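A minimal sketch of the placeholder approach described above, assuming placeholders written as %NAME% in the HTML template (the placeholder syntax actually used in these modifications is not shown here, and fillTitleTemplate/writeTitleFile are hypothetical names). The filled page is written to a temporary file whose path can then be given to htmldoc as the title page.

```php
<?php
// Hypothetical helpers; the %NAME% placeholder syntax is an assumption.
function fillTitleTemplate($template, array $values) {
    $pairs = array();
    foreach ($values as $name => $value) {
        // escape values so a category such as "R&D" stays valid HTML
        $pairs['%' . $name . '%'] = htmlspecialchars($value, ENT_QUOTES);
    }
    return strtr($template, $pairs);   // replace all placeholders in one pass
}

function writeTitleFile($html) {
    $file = tempnam(sys_get_temp_dir(), 'pdfbook-title');
    if ($file === false || file_put_contents($file, $html) === false) {
        throw new RuntimeException('could not write the title page');
    }
    return $file;   // path to give to htmldoc as the title page
}

$template = '<html><body><h1>%TITLE%</h1><p>%DATE%</p></body></html>';
echo fillTitleTemplate($template, array('TITLE' => 'R&D Handbook', 'DATE' => '2008-04-17')), "\n";
```

Depending on your htmldoc version, the temporary file may need an .html suffix for htmldoc to treat it as HTML.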

PHP compilation error
Hello,

I'm trying to install version 0.0.9 on Red Hat Enterprise Linux ES 4, on which MediaWiki 1.6.8 is running with PHP 4.3.9. pdf-book.php has been copied into the "extensions" directory, then included via LocalSettings.php: require_once( "extensions/pdf-book.php" );

and we get this error: Parse error: parse error, unexpected T_OBJECT_OPERATOR in /var/wwwwikitn/html/mediawiki-1.6.8/extensions/pdf-book.php on line 66, which is $msg = $wgUser->getUserPage()->getPrefixedText().' exported as a PDF book';

Any ideas? Thanks!


 * Just a wild guess: PHP5 needed? Lexw 07:49, 17 April 2008 (UTC)
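That guess is most likely right: calling a method on the object returned by another method ($a->b()->c()) is PHP 5 syntax, and PHP 4 rejects it with exactly this "unexpected T_OBJECT_OPERATOR" error. A sketch of the PHP 4-compatible form, using stand-in classes (hypothetical, not MediaWiki's User and Title) so that it runs outside a wiki:

```php
<?php
// Hypothetical stand-ins for MediaWiki's User and Title, only so the example runs.
class StubTitle {
    var $text = 'User:Example';
    function getPrefixedText() { return $this->text; }
}
class StubUser {
    function getUserPage() { return new StubTitle(); }
}

// PHP 4 cannot dereference a method's return value directly, so
// $wgUser->getUserPage()->getPrefixedText() is a parse error there.
// Splitting the chain into two statements works on PHP 4 and PHP 5+.
function exportLogMessage($user) {
    $userPage = $user->getUserPage();
    return $userPage->getPrefixedText() . ' exported as a PDF book';
}

echo exportLogMessage(new StubUser()), "\n";
```

Splitting this line only gets past line 66, though: other parts of the extension use PHP 5-only features such as instanceof, so upgrading to PHP 5 is the more realistic fix.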

Problem in pdfbook if only current page should be converted
I had a problem when I used

[ download as PDF]

no PDF was produced because the temporary HTML file was empty.

I had to add the line $articles[] = $title; to the else block of the if ($title->getNamespace() == NS_CATEGORY) test. Now it works. The new code looks like this:

if ($title->getNamespace() == NS_CATEGORY) {
    $db     = &wfGetDB(DB_SLAVE);
    $cat    = $db->addQuotes($title->getDBkey());   # quote/escape the category key for SQL
    $cl     = $db->tableName('categorylinks');
    $result = $db->query("SELECT cl_from FROM $cl WHERE cl_to = $cat ORDER BY cl_sortkey");
    if ($result instanceof ResultWrapper) $result = $result->result;
    while ($row = $db->fetchRow($result)) $articles[] = Title::newFromID($row[0]);
}
else {
    $text = $article->fetchContent();
    $text = $wgParser->preprocess($text, $title, $opt);
    if (preg_match_all('/^\\*\\s*\\[{2}\\s*([^\\|\\]]+)\\s*.*?\\]{2}/m', $text, $links))
        foreach ($links[1] as $link) $articles[] = Title::newFromText($link);
    $articles[] = $title;   # the added line: always include the current page
}
--Guenterg 11:31, 28 March 2008 (UTC)
 * I had the same problem, your fix works for me too. Thanks a lot, Guenter!
 * Now I still have to find a way to prevent the PdfBook template itself from being included in the PDF document when I place that template on an article rather than on a category. But that's a different matter... Lexw 09:16, 8 April 2008 (UTC)

-- This modification works pretty well, but it produces an odd numbering scheme for the page index. RHEL 5 / PHP 5.1.6 / LAMP / MW 1.12
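To see why the temporary HTML file ended up empty, here is the link-selection regular expression from the code above in isolation, as a small sketch (bulletedLinks is a hypothetical wrapper, and the trim() is added for readability). Only lines that start with a single "*" followed by a [[link]] are picked up, so a page without such a list yields no articles at all unless the current title is appended as in the fix above.

```php
<?php
// Hypothetical wrapper around the extension's pattern: collect the targets of
// [[links]] on lines that begin with "*".
function bulletedLinks($wikitext) {
    $titles = array();
    if (preg_match_all('/^\*\s*\[{2}\s*([^\|\]]+)\s*.*?\]{2}/m', $wikitext, $m)) {
        foreach ($m[1] as $t) $titles[] = trim($t);
    }
    return $titles;
}

$text = "Intro with an inline [[Not Collected]] link.\n"
      . "* [[Chapter One]]\n"
      . "* [[Chapter Two|the second chapter]]\n"
      . "** [[Nested Page]]\n";   // "**" lines are skipped: [[ must follow a single *
print_r(bulletedLinks($text));    // Chapter One, Chapter Two
```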

Whole Namespace Export
The tweak below allows exporting a whole namespace (e.g. "Talk") through the additional action "nspdfbook", e.g.:

http://localhost/wiki/index.php?title=Talk:Main_Page&action=nspdfbook

Note:
 * You may have to raise the limits in the "; Resource Limits ;" section of your php.ini (such as max_execution_time and memory_limit) if you use the mod to export all articles.
 * You may wish to alter the ORDER BY clause to sort by page name rather than by page ID.

function onUnknownAction($action, $article) {
    global $wgOut, $wgUser, $wgTitle, $wgParser;
    global $wgServer, $wgArticlePath, $wgScriptPath, $wgUploadPath, $wgUploadDirectory, $wgScript;

    if ($action == 'pdfbook' || $action == 'nspdfbook') {

        # Log the export
        $msg = $wgUser->getUserPage()->getPrefixedText() . ' exported as a PDF book';
        $log = new LogPage('pdf', false);
        $log->addEntry('book', $wgTitle, $msg);

        # Initialise PDF variables
        $layout  = '--firstpage toc';
        $left    = $this->setProperty('LeftMargin',   '1cm');
        $right   = $this->setProperty('RightMargin',  '1cm');
        $top     = $this->setProperty('TopMargin',    '1cm');
        $bottom  = $this->setProperty('BottomMargin', '1cm');
        $font    = $this->setProperty('Font',         'Arial');
        $size    = $this->setProperty('FontSize',     '8');
        $link    = $this->setProperty('LinkColour',   '217A28');
        $levels  = $this->setProperty('TocLevels',    '2');
        $exclude = $this->setProperty('Exclude',      array());
        if (!is_array($exclude)) $exclude = preg_split('/\\s*,\\s*/', $exclude);

        # Select articles from members if a category, all pages if a namespace export, or links in content if not
        $articles = array();
        $title    = $article->getTitle();
        $opt      = ParserOptions::newFromUser($wgUser);
        if ($title->getNamespace() == NS_CATEGORY) {
            $db     = &wfGetDB(DB_SLAVE);
            $cat    = $db->addQuotes($title->getDBkey());
            $cl     = $db->tableName('categorylinks');
            $result = $db->query("SELECT cl_from FROM $cl WHERE cl_to = $cat ORDER BY cl_sortkey");
            if ($result instanceof ResultWrapper) $result = $result->result;
            while ($row = $db->fetchRow($result)) $articles[] = Title::newFromID($row[0]);
            $book = $title->getText();
        }
        else {
            if ($action == 'nspdfbook') {
                $db     = &wfGetDB(DB_SLAVE);
                $pl     = $db->tableName('page');
                $ns     = $title->getNamespace();
                $result = $db->query("SELECT page_id FROM $pl WHERE page_namespace = $ns ORDER BY page_id");
                if ($result instanceof ResultWrapper) $result = $result->result;
                while ($row = $db->fetchRow($result)) $articles[] = Title::newFromID($row[0]);
                $book = "PDFBook_Namespace_Export-" . Namespace::getCanonicalName($ns);
            }
            else {
                $text = $article->fetchContent();
                $text = $wgParser->preprocess($text, $title, $opt);
                if (preg_match_all('/^\\*\\s*\\[{2}\\s*([^\\|\\]]+)\\s*.*?\\]{2}/m', $text, $links))
                    foreach ($links[1] as $link) $articles[] = Title::newFromText($link);
                $book = $title->getText();
            }
        }

        # Format the articles as a single HTML document with absolute URLs
        $html = '';

--Andy 13:55, 11 April 2008 (UTC)
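A sketch of the namespace query with the ORDER BY made switchable, as suggested in the note above. page_id, page_title and page_namespace are columns of MediaWiki's page table; namespacePagesQuery is a hypothetical helper (in the extension, $pageTable would be $db->tableName('page')), and the intval() cast is added so the namespace value can never carry SQL into the query.

```php
<?php
// Hypothetical helper building the nspdfbook query.
function namespacePagesQuery($pageTable, $ns, $byTitle) {
    $ns    = intval($ns);                          // namespace numbers are integers
    $order = $byTitle ? 'page_title' : 'page_id';  // by name, or roughly by creation
    return "SELECT page_id FROM $pageTable WHERE page_namespace = $ns ORDER BY $order";
}

echo namespacePagesQuery('page', 1, true), "\n";
```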

SpecialVersion Issue and PHP 5.1.4
After installing the PdfBook extension and displaying the version page I get:

Notice: Object class PdfBook could not be converted to int in ....\SpecialVersion.php on line 275.

The line in SpecialVersion.php is "sort ($list);"

I found a general discussion at http://www.webmasterworld.com/php/3586902.htm that talks about 5.1.4 vs 5.2.4. Any thoughts on how to make PdfBook work in PHP 5.1.4?

I am using Mediawiki 1.12.0 and php 5.1.4.

Link in to Hierarchy
Any ideas on how best to link into the Hierarchy extension? I think this would be very useful because the hierarchy is set up perfectly for printing a book, but I haven't quite figured out how to set this up. You would have to use the extension's "hierarchy" table to find where you are in the hierarchy and which subordinate pages need to be printed. It would be nice to print from where you are downward: if you are on chapter 1, only chapter 1 is printed, but if you are on the title page, the whole book is printed. It might also be nice to be able to set up a list of pages and then print that list in order. I am going to do what I can, but I am pretty new to PHP, and any advice is welcome.

--Greg 16:04, 1 May 2008 (UTC)

Exclusions
It would also be nice to have exclusion meta tags where you can specify what parts are included in the book and what parts are not (so if you have a header/footer you don't have to include that in the book)

--Greg 16:07, 1 May 2008 (UTC)

I have also run into this problem, wanting to include only one section of a page using the PageName markup to get just that section as part of the composite print. This would be a great feature.

--Abby621 14:04, 4 June 2008 (UTC)
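One possible way to do this, sketched under assumptions: wrap the parts to leave out in hypothetical <!--nopdf--> ... <!--/nopdf--> comment markers and strip them from the wikitext right after fetchContent(), before the extension's own HTML-comment handling runs. Neither the markers nor stripNoPdf exist in PdfBook.

```php
<?php
// Hypothetical: remove every <!--nopdf--> ... <!--/nopdf--> block from the
// wikitext. Non-greedy .*? stops at the nearest closing marker; /s lets a
// block span several lines.
function stripNoPdf($wikitext) {
    return preg_replace('/<!--\s*nopdf\s*-->.*?<!--\s*\/nopdf\s*-->/s', '', $wikitext);
}

echo stripNoPdf("Header<!--nopdf-->navigation bar<!--/nopdf-->Body"), "\n";
```

Because the markers are HTML comments, they are invisible when the page is viewed normally.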

latest version gives syntax error
Parse error: syntax error, unexpected '}' in Pdf_Book.php on line 49

Is this expected?

FWIW, I'm using Ubuntu Hardy Heron with PHP 5.2.4-2ubuntu5.1 with Suhosin-Patch 0.9.6.2

Swaroopch 21:20, 21 June 2008 (UTC)
 * Sorry about that, fixed --Nad 21:59, 21 June 2008 (UTC)

"??????" instead of russian letters
We get "?" signs instead of all Russian letters. The encoding in the browser is UTF-8.
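A likely cause, offered as an assumption to check: htmldoc 1.8.x does not read UTF-8, so every Cyrillic character comes out as "?". One workaround is to convert the HTML to a single-byte Cyrillic charset before passing it to htmldoc and add the matching option (e.g. --charset cp-1251) to the htmldoc command; confirm the charset names your htmldoc build accepts with htmldoc --help. toHtmldocCharset is a hypothetical helper.

```php
<?php
// Hypothetical helper: convert UTF-8 HTML to Windows-1251 for htmldoc.
// The htmldoc command would then also need the matching --charset option.
function toHtmldocCharset($utf8Html, $charset = 'CP1251') {
    $converted = iconv('UTF-8', $charset, $utf8Html);
    if ($converted === false) {
        throw new RuntimeException("could not convert the HTML to $charset");
    }
    return $converted;
}

$bytes = toHtmldocCharset('<p>Привет</p>');
echo strlen($bytes), "\n";   // one byte per Cyrillic letter after conversion
```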

How would you modify the script to include the last date and time edited for each article?
I'm not a PHP whiz and am wondering what would be involved in outputting the last edit date/time for each article. Preferably, I would like to see this info directly under the article title. Any help would be excellent. Great extension! --Paul
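A sketch of the formatting part, assuming the revision timestamp is available in MediaWiki's 14-digit YYYYMMDDHHMMSS (UTC) form. Inside the extension's article loop it might come from $article->getTimestamp(), which should be checked against your MediaWiki version. lastEditedLine is a hypothetical helper whose output would be appended to each article's HTML right after its title; DateTime::createFromFormat needs PHP 5.3 or later.

```php
<?php
// Hypothetical helper: turn a MediaWiki-style timestamp (YYYYMMDDHHMMSS, UTC)
// into a small "last edited" line to place under an article's title.
function lastEditedLine($mwTimestamp) {
    $dt = DateTime::createFromFormat('YmdHis', $mwTimestamp, new DateTimeZone('UTC'));
    if ($dt === false) {
        return '';   // unparseable timestamp: print nothing rather than a wrong date
    }
    return '<p><small>Last edited: ' . $dt->format('Y-m-d H:i') . ' UTC</small></p>';
}

// Inside the article loop this might become (assumption, untested):
//   $html .= lastEditedLine($article->getTimestamp());
echo lastEditedLine('20081022182300'), "\n";
```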

No images

 * I still can't get images in. The image is in the PDF file, and links back to my wiki image, but the picture simply doesn't appear. Help?
 * Also, title page is empty. How to fill it?

Here is my Template: Template:Pdf_book [ Create a PDF Book]
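One approach, following the tweak described earlier on this page for images behind HTTPS: rewrite <img src> URLs that point into the wiki's upload path to local file paths, so htmldoc reads the images from disk instead of fetching them over HTTP(S). A minimal sketch, assuming $uploadUrl is $wgServer.$wgUploadPath and $uploadDir is $wgUploadDirectory; localiseImages is a hypothetical name, and the closure needs PHP 5.3+.

```php
<?php
// Hypothetical helper: point <img src="UPLOAD_URL/..."> at files under UPLOAD_DIR.
function localiseImages($html, $uploadUrl, $uploadDir) {
    $prefix = preg_quote(rtrim($uploadUrl, '/'), '|');
    return preg_replace_callback(
        '|(<img[^>]+?src=")' . $prefix . '(/[^"]+)"|',
        function ($m) use ($uploadDir) {
            // URL-encoded names (e.g. %20) are decoded to the names on disk
            $path = rtrim($uploadDir, '/') . rawurldecode($m[2]);
            return $m[1] . htmlspecialchars($path, ENT_QUOTES) . '"';
        },
        $html
    );
}

echo localiseImages(
    '<img src="https://server.com/wiki/images/a/b/image.jpg" />',
    'https://server.com/wiki/images',
    '/www/wiki/images'
), "\n";
```

Images hosted elsewhere are left untouched, since only URLs starting with the upload prefix match.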

Updated bibtex_fields.php
Here is an updated bibtex_fields.php with complete Bibtex Entries and Fields.

bibtex_fields.php

Bibtex Required/Optional - for your wiki

 * BibTeX defines three types of fields:
 * Required - always displayed
 * Optional - usually not used
 * Ignored - never used, can be arbitrary

@article{citation_key, author = {}, title = {}, journal = {}, year = {}, volume = {}, number = {}, pages = {}, month = {}, note = {}, key = {}, url = {}, keywords = {}, abstract = {} }

@book{citation_key, author = {}, editor = {}, % author OR editor required
  title = {}, publisher = {}, year = {}, volume = {}, number = {}, % volume OR number
  series = {}, address = {}, edition = {}, month = {}, note = {}, key = {}, url = {}, keywords = {}, abstract = {} }

@conference{citation_key, author = {}, title = {}, booktitle = {}, year = {}, editor = {}, pages = {}, organization = {}, publisher = {}, address = {}, month = {}, note = {}, key = {}, url = {}, keywords = {}, abstract = {} }

@inbook{citation_key, author = {}, editor = {}, % author OR editor
  title = {}, chapter = {}, pages = {}, % chapter AND/OR pages
  publisher = {}, year = {}, volume = {}, number = {}, % volume OR number
  series = {}, type = {}, address = {}, edition = {}, month = {}, note = {}, key = {}, url = {}, keywords = {}, abstract = {} }

@incollection{citation_key, author = {}, title = {}, booktitle = {}, % booktitle should be exactly the same as title? Not sure.
  publisher = {}, year = {}, editor = {}, volume = {}, number = {}, % volume OR number
  series = {}, type = {}, chapter = {}, pages = {}, address = {}, edition = {}, month = {}, note = {}, key = {}, url = {}, keywords = {}, abstract = {} }

@inproceedings{citation_key, author = {}, title = {}, booktitle = {}, % booktitle should be exactly the same as title? Some kind of bug? Not sure.
  year = {}, editor = {}, volume = {}, number = {}, % volume OR number
  series = {}, pages = {}, address = {}, month = {}, organization = {}, publisher = {}, note = {}, key = {}, url = {}, keywords = {}, abstract = {} }

@manual{citation_key, title = {}, author = {}, organization = {}, address = {}, edition = {}, month = {}, year = {}, note = {}, key = {}, url = {}, keywords = {}, abstract = {} }

@mastersthesis{citation_key, author = {}, title = {}, school = {}, year = {}, type = {}, address = {}, month = {}, note = {}, key = {}, url = {}, keywords = {}, abstract = {} }

@misc{citation_key, author = {}, title = {}, howpublished = {}, month = {}, year = {}, note = {}, key = {}, url = {}, keywords = {}, abstract = {} }

@phdthesis{citation_key, author = {}, title = {}, school = {}, year = {}, type = {}, address = {}, month = {}, note = {}, key = {}, url = {}, keywords = {}, abstract = {} }

@proceedings{citation_key, title = {}, year = {}, editor = {}, volume = {}, number = {}, % volume OR number
  series = {}, address = {}, month = {}, organization = {}, publisher = {}, note = {}, key = {}, url = {}, keywords = {}, abstract = {} }

@techreport{citation_key, author = {}, title = {}, institution = {}, year = {}, type = {}, number = {}, address = {}, month = {}, note = {}, key = {}, url = {}, keywords = {}, abstract = {} }

@unpublished{citation_key, author = {}, title = {}, note = {}, month = {}, year = {}, key = {}, url = {}, keywords = {}, abstract = {} }

Types of Bibtex entries - for your wiki

 * There are 14 available entry types.

Bibtex - standard fields - for your wiki

 * The available fields depend on which entry type is being used. Each entry type has required and optional arguments.

Bibtex Nonstandard / Optional Fields - for your wiki

 * The available fields depend on which entry type is being used. Each entry type has required and optional arguments.

Get rid of temporary files
Using proc_open (with read and write pipes connected to the htmldoc process) you can get rid of the temporary files. This also fixes a variable conflict ($link). jhoetzel

    'Pdf Book',
    'author'      => 'User:Nad',
    'description' => 'Composes a book from articles in a category and exports as a PDF book',
    'url'         => 'http://www.mediawiki.org/wiki/Extension:Pdf_Book',
    'version'     => PDFBOOK_VERSION
    );

class PdfBook {

    # Constructor
    function PdfBook() {
        global $wgHooks, $wgParser, $wgPdfBookMagic;
        $wgParser->setFunctionHook($wgPdfBookMagic, array($this, 'magicBook'));
        $wgHooks['UnknownAction'][] = $this;

        # Add a new pdf log type
        global $wgLogTypes, $wgLogNames, $wgLogHeaders, $wgLogActions;
        $wgLogTypes[]             = 'pdf';
        $wgLogNames  ['pdf']      = 'pdflogpage';
        $wgLogHeaders['pdf']      = 'pdflogpagetext';
        $wgLogActions['pdf/book'] = 'pdflogentry';
    }

    # Expand the book-magic
    function magicBook(&$parser) {
        # Populate $argv with both named and numeric parameters
        $argv = array();
        foreach (func_get_args() as $arg) if (!is_object($arg)) {
            if (preg_match('/^(.+?)\\s*=\\s*(.+)$/', $arg, $match)) $argv[$match[1]] = $match[2];
            else $argv[] = $arg;
        }
        return $text;
    }

    function onUnknownAction($action, $article) {
        global $wgOut, $wgUser, $wgTitle, $wgParser;
        global $wgServer, $wgArticlePath, $wgScriptPath, $wgUploadPath, $wgUploadDirectory, $wgScript;
        if ($action == 'pdfbook') {
# Extension:PdfBook
# - Licenced under LGPL (http://www.gnu.org/copyleft/lesser.html)
# - Author: User:Nad
# - Started: 2007-08-08

            # Log the export
            $msg = $wgUser->getUserPage()->getPrefixedText() . ' exported as a PDF book';
            $log = new LogPage('pdf', false);
            $log->addEntry('book', $wgTitle, $msg);

            # Initialise PDF variables
            $layout  = '--firstpage toc';
            $left    = $this->setProperty('LeftMargin',   '1cm');
            $right   = $this->setProperty('RightMargin',  '1cm');
            $top     = $this->setProperty('TopMargin',    '1cm');
            $bottom  = $this->setProperty('BottomMargin', '1cm');
            $font    = $this->setProperty('Font',         'Arial');
            $size    = $this->setProperty('FontSize',     '8');
            $linkc   = $this->setProperty('LinkColour',   '217A28');
            $levels  = $this->setProperty('TocLevels',    '2');
            $exclude = $this->setProperty('Exclude',      array());
            if (!is_array($exclude)) $exclude = preg_split('/\\s*,\\s*/', $exclude);

            # Select articles from members if a category or links in content if not
            $articles = array();
            $title    = $article->getTitle();
            $opt      = ParserOptions::newFromUser($wgUser);
            if ($title->getNamespace() == NS_CATEGORY) {
                $db     = &wfGetDB(DB_SLAVE);
                $cat    = $db->addQuotes($title->getDBkey());
                $result = $db->select(
                    'categorylinks',
                    'cl_from',
                    "cl_to = $cat",
                    'PdfBook',
                    array('ORDER BY' => 'cl_sortkey')
                );
                if ($result instanceof ResultWrapper) $result = $result->result;
                while ($row = $db->fetchRow($result)) $articles[] = Title::newFromID($row[0]);
            }
            else {
                $text = $article->fetchContent();
                $text = $wgParser->preprocess($text, $title, $opt);
                if (preg_match_all('/^\\*\\s*\\[{2}\\s*([^\\|\\]]+)\\s*.*?\\]{2}/m', $text, $links))
                    foreach ($links[1] as $link) $articles[] = Title::newFromText($link);
            }

            # Format the articles as a single HTML document with absolute URLs
            $book          = $title->getText();
            $html          = '';
            $wgArticlePath = $wgServer . $wgArticlePath;
            $wgScriptPath  = $wgServer . $wgScriptPath;
            $wgUploadPath  = $wgServer . $wgUploadPath;
            $wgScript      = $wgServer . $wgScript;
            foreach ($articles as $title) {
                $ttext = $title->getPrefixedText();
                if (!in_array($ttext, $exclude)) {
                    $article = new Article($title);
                    $text    = $article->fetchContent();
                    $text    = preg_replace('//s', '@@'.'@@$1@@'.'@@', $text);   # preserve HTML comments
                    $text   .= '';
                    $opt->setEditSection(false);    # remove section-edit links
                    $wgOut->setHTMLTitle($ttext);   # use this so DISPLAYTITLE magic works
                    $out     = $wgParser->parse($text, $title, $opt, true, true);
                    $ttext   = $wgOut->getHTMLTitle();
                    $text    = $out->getText();
                    $text    = preg_replace('|(]+?src=")(/.+?>)|', "$1$wgServer$2", $text);
                    $text    = preg_replace('|@{4}([^@]+?)@{4}|s', '', $text);   # HTML comments hack
                    $text    = preg_replace('|
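The posted code is cut off before the htmldoc call itself, so here is a hedged sketch of the proc_open idea on its own: write a string to a command's stdin and collect its stdout, with no temporary files. pipeThrough is a hypothetical helper; for htmldoc you would pass a command line that reads HTML from stdin and writes the PDF to stdout, and the exact options for that should be checked in your htmldoc version's documentation.

```php
<?php
// Hypothetical helper: run $cmd, feed it $input on stdin, return its stdout.
function pipeThrough($cmd, $input) {
    $spec = array(
        0 => array('pipe', 'r'),   // child's stdin: we write to it
        1 => array('pipe', 'w'),   // child's stdout: we read from it
        2 => array('pipe', 'w'),   // child's stderr: kept for error messages
    );
    $proc = proc_open($cmd, $spec, $pipes);
    if (!is_resource($proc)) {
        throw new RuntimeException("could not start: $cmd");
    }
    fwrite($pipes[0], $input);
    fclose($pipes[0]);                         // EOF tells the child the input is complete
    $output = stream_get_contents($pipes[1]);
    $errors = stream_get_contents($pipes[2]);
    fclose($pipes[1]);
    fclose($pipes[2]);
    $status = proc_close($proc);               // exit code of the child
    if ($status !== 0) {
        throw new RuntimeException("$cmd exited with $status: $errors");
    }
    return $output;
}

echo pipeThrough('tr a-z A-Z', "pdf book\n");
```

The simple write-then-read order is fine for a program that consumes all of its input before producing output, as a PDF converter normally does. A filter that streams large output while still reading, or that writes a lot to stderr, could block on a full pipe; that case needs a stream_select() loop.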

I checked the variables and found that all of them are blank except: wgServer wgArticlePath wgScriptPath

I could not find the others in the entire $GLOBALS variable...

I'm not SUPER familiar with MediaWiki's structure and backend, but I would imagine that many of those (especially $wgUser) should be set.

Any ideas? --Greg 18:23, 22 October 2008 (UTC)

hiding numbering on headings and article title
Great extension! Is it possible to hide the numbered headings when printing as a book? I noticed that the extension disregards the user preference and __NONUMBEREDHEADINGS__. We are trying to PDF-print a "book" of data entry forms, and the heading numbers are not required. Thanks --Erikvw 06:23, 18 November 2008 (UTC)

What I have done for now is remove --numbered from the line

$cmd .= "$toc --format pdf14 --numbered $layout $width";

which seems to work fine.
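Instead of deleting --numbered outright, it could be made switchable. A sketch, where formatOptions is hypothetical and $numbered could be read through the extension's setProperty() like the other settings, under a hypothetical name such as 'Numbered':

```php
<?php
// Hypothetical helper: same options as the quoted line, with --numbered optional.
function formatOptions($toc, $layout, $width, $numbered) {
    $opts = "$toc --format pdf14";
    if ($numbered) {
        $opts .= ' --numbered';
    }
    return "$opts $layout $width";
}

// In the extension this might read (assumption):
//   $cmd .= formatOptions($toc, $layout, $width, $numbered);
echo formatOptions('TOC', 'LAYOUT', 'WIDTH', false), "\n";
```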

revision id does not appear on pdf
We are tracking revision information for the printed document using magic words. When printing the PDF, REVISIONTIMESTAMP prints but REVISIONID does not. I noticed the same with Pdf_Export. Any ideas? Thanks Erikvw 04:24, 19 November 2008 (UTC)

some special characters and german umlauts result in empty pdf files
When we try to export categories with umlauts (e.g. "Übersicht") or special characters like "-" in the category name, the generated PDF file is empty. Everything else runs fine and smoothly. Great extension. Any workaround or help regarding this problem would be appreciated. --Fydel 12:29, 15 December 2008 (UTC)
 * I found a simple workaround for that issue. I changed the line where htmldoc is called

from escapeshellcmd($cmd); to passthru(escapeshellcmd($cmd));
 * --Fydel 09:10, 9 January 2009 (UTC)
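A further option, offered as a guess at the cause: keep the category name out of the shell command entirely. If the files htmldoc reads and writes get ASCII-only temporary names (from tempnam()), umlauts and other special characters never reach the shell, and each path can be quoted with escapeshellarg() instead of running escapeshellcmd() over the whole command. htmldocCommand is a hypothetical helper; -f is htmldoc's output-file option.

```php
<?php
// Hypothetical helper: quote each path separately; $options is assumed to be
// a fixed, trusted set of flags.
function htmldocCommand($options, $pdfFile, $htmlFile) {
    return 'htmldoc ' . $options
        . ' -f ' . escapeshellarg($pdfFile)
        . ' ' . escapeshellarg($htmlFile);
}

// ASCII-only temporary names, independent of the category title
$htmlFile = tempnam(sys_get_temp_dir(), 'pdfbook');
$pdfFile  = tempnam(sys_get_temp_dir(), 'pdfbook');
echo htmldocCommand('--webpage --format pdf14', $pdfFile, $htmlFile), "\n";
unlink($htmlFile);   // demo only: the real code would run htmldoc before cleaning up
unlink($pdfFile);
```

The category title is then only needed for the file name offered to the user when the PDF is downloaded.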

Page Limit in PdfBook
How many pages can be fetched using Extension:PdfBook? Is there a limit?