Extension talk:PdfBook/Archive

Mediawiki 1.11.0
Version 0.0.3 didn't work anymore after an upgrade. I made a little fix to PdfBook.php around line 98 of PdfBook.php and it works again.

// while ($row = mysql_fetch_row($result)) { while ($row = $db->fetchRow($result)) {

Disclaimer. I don't know PHP for real, don't know mediawiki, don't know how to program. Just got it by inserting debug statements into PdfBook.php. Looks like mysql_fetch is censored somewhere now ;)

PS: To insert debug statements: $wgDebugLogFile = "/tmp/debug.log"; // file should be writable can be anywhere. wfDebug (.....);
 * In LocalSettings.php insert:
 * Anywhere in the code, insert

- Daniel (edutechwiki.unige.ch)
 * Thanks a lot for this, it's still not working for me in 1.11 (I've only just done my 1.11 upgrade), but I've made some changes based on your findings which have got it partially there ;-) --Nad 21:36, 21 September 2007 (UTC)
 * It seems that 1.11 is a bit more memory hungry and my large test books were killing it, after giving PHP 64MB it's working fine now! --Nad 21:41, 21 September 2007 (UTC)

Empty file downloaded
Greetings Nad,

I have been trying to use your PDFBook Mediawiki extension since it may be a great solution to an issue I have.

I have installed HTMLDoc under "c:\pogram files" and can use it on its own to create PDF Books. I have also included the "PdfBook.php" in my "Local Settings.php" file.

The issue I am having is that when I select the link to export my category as a book and select to save or open the pdf file it has 0 bytes. So, the file is created with the correct name but with no data.

Is there something else I must do to ensure HTMLDoc.exe is actually being called by your extension? Is there a required directory that it needs to be in?

Any help would be appreciated!

Thanks!
 * You have to make sure that htmldoc is in your executable PATH so that it can execute from just typing "htmldoc" without needing to supply the full pathname no matter what current directory you're in. Another thing to check would be to comment out the "@unlink($file)" line and after saving a pdf, check if it's left a tmp file in the root of your images directory, which is the data sent to htmldoc. --Nad 00:35, 6 September 2007 (UTC)


 * I'm experiencing the exact same problem, my files turns up empty. I run the server on a windows machine using Apache. I've installed HTMLDoc and I'm able to create PDF-files using the GUI. If I comment out "@unlink($file)" and then generates the tmp-file through the GUI I'll get my pdf, but all files I download are 0 byte in size... What can be wrong? /Jesper 15:59, 23 October 2007 (UTC)
 * With some hacking of Pdf_Book.php I'm now able to create PDF:s, but only from categories, not from a single page. By commenting out "putenv("HTMLDOC_NOCGI=1");" on line 152 it now generates Category PDF:s. /Jesper 08:09, 25 October 2007 (UTC)

Invalid PDF File
Nad,

Thanks for your quick response!

However, I am still having issues. The File is being created and has size to it....but Adobe Reader gives me the following error."

"There is an error opening this document. This file is damaged and cannot be repaired".

HTMLDoc seems to be quitting during the conversion job.

If I add the ".html" extension to the temp file and run HTMLDoc from the command line I can convert the temp html file manully over to a PDF file.

I then compare in Notepad the one I generated and the one your script creates and notice the PDF your script creates quites after pocessing a certain amount of lines.

I have your PDFExport Extension working just fine...so I was wondering what else it could be.

Any ideas?

Thanks!
 * How long is it taking to generate the PDF before quitting? 30 seconds? if that long it could be reaching max execution time? and how large is the PDF before it bails? --Nad 20:29, 6 September 2007 (UTC)

Nad,

It only writes about 18 lines to the .pdf file and takes a couple seconds for the file to generate. It doesn't appear to quit, it saves the file like it normally would however when I edit the file in notepad it is not complete (Stops after ~18 lines with Wordwrap on)

Like I stated before, I'm using your PDFExport Extension and it works great.

Let me know what you think      --136.182.158.153 21:29, 6 September 2007 (UTC)
 * When you run htmldoc manually passing the generated tmp file to it, are you using the exact same command and parameters that the extension uses? --Nad 21:51, 6 September 2007 (UTC)

continued...

Nad,

It only writes about 18 lines to the .pdf file and takes a couple seconds for the file to generate. It doesn't appear to quit, it saves the file like it normally would however when I edit the file in notepad it is not complete (Stops after ~18 lines with Wordwrap on)

If I change this line;

$cmd = "htmldoc -t pdf --charset iso-8859-1 $cmd $file";

to

$cmd = "htmldoc -t pdf --charset iso-8859-1 $cmd $file > test.pdf";

Then I get a test.pdf in my mediawiki root folder which works perfectly


 * You could try changing the htmldoc command to use passthru like Extension:Pdf Export - I had it like that on mine but had problems with the gzip encoding, but it may work better like that for you --Nad 21:55, 6 September 2007 (UTC)

images
Is there any possibility of getting images displayed in the pdf Book as well?. would be a fantastic improvement. Any workarounds? Martin
 * I'm working on it, I just can't get them to work currently. I'm checking out some of the solutions at Extension talk:Pdf Export too as that one uses htmldoc as well. --Nad 12:39, 12 September 2007 (UTC)

A hack
In file PdfBook.php around line 118 (I may have inserted other stuff) just before "#write the HTML to a tmp file" insert this:

$ori_string = 'src="'; $repl_string = 'src="'. $wgServer; $html = str_replace ($ori_string, $repl_string, $html);

The problem is that the intermediary output file got stuff like this: src="/mediawiki/images/thumb/pict.png but you want: httpee://your.server.org/mediawiki/images/thumb/pict.png
 * 1) Write the HTML to a tmp file

This is not the best solution, a regexp hacker should actually rip away most of the html picture markup and then replace the thumb by the original pic maybe. But above is at least a minimal job. To see the intermediary file as someone said, comment the unlink at the end and the get it from the images file. //@unlink($file);

Sorry, I'm not a real programmer and have too much workload to help for real. Just wanted to produce some handouts ;) - Daniel

Same problem as section 2
I'm on Ubuntu Linux with Mediawiki 1.10. Htmldoc is in /usr/bin. I commented out the unlink command, and the temp file is empty (0 length).

I checked to be sure that my Apache user can run htmldoc -- it can. Unsure what I should try next.

By the way, your single-page export plugin works perfectly (even for images). So I know that htmldoc is not at fault here.
 * I didn't write the single page one, but the code seems pretty similar. I'll just have to see what differences there is in the code between this one and the single-page one. --Nad 22:28, 14 September 2007 (UTC)

Upload filetype
What happens when pdf is not a valid file type when uploading? Does the wiki control this with this extension, if so do I need to add pdf file types to the type of files you can upload?
 * The upload filetype is unrelated to this since exported pdf's are downloaded not uploaded. If you want to add pdf to your allowed upload filetypes, use $wgFileExtensions[] = 'pdf', you may also want to set $wgVerifyMimeType to false if it's giving you hassles when you try and upload exotic types of file. --Nad 04:11, 21 September 2007 (UTC)

More empty file downloads
--Johnp125 02:12, 25 September 2007 (UTC)

Sorry to be such a pain. I have setup a test wiki which is running fedora --Johnp125 00:23, 27 September 2007 (UTC)c 4. Please check out my test wiki and see if you can give me some direction. I have debug for the wiki in localsettings.php on. If you need admin access please email me at johnp125@yahoo.com and I'll hook you up.

http://wikitest.homelinux.net/wiki2/index.php/Main_Page
 * The output shows a bug due to 1.11 being more strict about hook return values. Try again now with the latest version, 0.0.4. Also note that even if it works, you will get just an empty document since the point of this extension is to compose a book from the content of a category, if it not placed in a category or the category contains no members then the result will be empty. To export the content of a single page you should be using Extension:Pdf Export. --Nad 03:33, 25 September 2007 (UTC)
 * However, I'm working on version 0.5 now which can be used in non-category pages and will compose the book from the article links found in the page, so that books can then be composed from explicit lists or DPL queries. --Nad 03:33, 25 September 2007 (UTC)

--Johnp125 13:28, 25 September 2007 (UTC)

Hey that sound great I'd love to help you with it.

You mentioned single page. I had 2 types of pdf downloads there.

http://wikitest.homelinux.net/wiki2/index.php?title=Category:test&action=pdfbook this one should be going after the demo page with the catageory:test and then creating a pdf book from that. Is this not the right way to use the code? I know if I created more pagese and put the catageory:test under them they would get put into the pdf file as well.
 * You had a typo in the word "category", link is working now ;-) --Nad 22:21, 25 September 2007 (UTC)

--Johnp125 17:30, 26 September 2007 (UTC)

Thanks a bunch. Your the greatest. Glad to have this working now.

Checked out your info about Images not showing in mediawiki 1.10.2---1.11. Nice work.
 * I just did another update yesterday which has images working now --Nad 21:06, 26 September 2007 (UTC)

--Johnp125 00:16, 27 September 2007 (UTC)

Is this the update that is going to work with DPL queries? I started to play around with that extension. I know it's working but right now it's too big to try and figure out.

--Johnp125 00:23, 27 September 2007 (UTC)

Hey by the way could you tell me how to make the pdfbook extension just make a big html file, so I could open it in word or openoffice in html format and let the office program convert it from the html file? Or is it easier to say and harder to do?
 * That feature is very easy to add because it simply requires not sending the file to HTMLDOC, I've added an option in a new version (0.0.7) which allows you to do this by adding format=html to the query-string. --Nad 02:06, 27 September 2007 (UTC)

--Johnp125 22:04, 30 September 2007 (UTC)

Wow that sounds great can't wait to try out the html export. I looked for the 0.0.7 version but only saw the 0.0.6 version when I went to the download section. Also could you give me a example of how the format=html is used.

http://www.foo.bar/wiki/index.php?title=Catgeory:Foo&action=pdfbook

Where would it go in this string?
 * Sorry about that I must have forgotten to update it, it's at 0.0.7 now. To change the URL above to produce html, append &format=html to it. We use a template which has a link for both, see OrganicDesign:Template:Book. --Nad 07:11, 1 October 2007 (UTC)

--Johnp125 01:55, 2 October 2007 (UTC)

The html export looks really good. I Did notice on small html files Microsoft word gets confused about it. Maybe if you put the html header info at the top and bottom of page to help microsoft word out. Openoffice did not seem to have a problem with it. However word is looking for the html tags on small exports. If it's a big export it gets the idea.

--Johnp125 02:08, 2 October 2007 (UTC)

Just tested it again with a small html download. Word tried to format it when opening. Then I added the to the beginning and then added the at the end. Then reopened the file with word and bingo it worked fine. Maybe something to add in 0.0.8? Openoffice worked either way.

Keep up the good work. This is the best extension for wiki out there right now.

Hacks to change PDF output (v. 0.6)

 * Images: If they don't fit your PDF page, you have to set pixel width of a virtual browser page (that's a "feature" of htmldoc). By default it is 680 pixels only and images larger than that will be rendered larger than your PDF page! Lots of my pictures are...
 * Titlepage: If you want a standardized titlepage before the TOC, create it in HTML and put it somewhere in your file system. I just put it in the images directory.

Then change PdfBook.php like this for example: $cmdext = " --browserwidth 1000 --titlefile $wgUploadDirectory/PDFBook.html"; $cmd = "htmldoc -t pdf --charset iso-8859-1 $cmd $cmdext $file"; Basically, I found it a good idea to read the htmldoc manual. In my Unix system it sits in /usr/local/share/doc/htmldoc/htmldoc.pdf. (see chapter 8). Made other changes too.

Now of course Nad may at some point add some more options, but changing a line in the php file does it too :) - Daniel (edutechwiki.unige.ch)

PdfBook Error Solution....for me at least
Nad,

I ended up creating an additional temp file which I had HTMLDoc redirect the output to. This was the only way I was able to have it not quit during the process PDF conversion process. I then open the file and read its data back into $content. After doing that I am able succsefully download the complete pdf file.

But I have another question for you.....I have seen a jspwiki which retrieves all the articles for a category and lists them on a page and uses a form to allow you to select which ones you want. It then retrieves the selected articles as one entire book. Is there a way to include a similar form in Mediawiki. Or do you know of a way to use an external html web page to retrieve/send commands like that to Mediawiki?

Thanks,

Dan --136.182.158.145 21:27, 7 September 2007 (UTC)


 * The PDf Book extension will allow exceptions so that not all items in the category are included. It would be possible to have it add items to the selection in the same way. A form could then be used to generate the list from which the book is made. I'll have a think about that though because it's an interesting point you make, that books could be generated from queries rather than just categories... --Nad 22:01, 7 September 2007 (UTC)

Just in case the anonymous above re-reads this: I had the same problem of PdfBook not generating any output, but the solution was simple: make sure that the upload directory (usually ./images) is writeable for the web server process. After I changed that, PdfBook worked okay. Cheers, Lexw 15:30, 5 October 2007 (UTC)

Missing Images in new version
I love this extension I think it is the best thing for wiki right now. However when I use the new pdfbook version 0.7 I am not getting any pictures. All I get is url links to the pictures. This is in the pdf format not the html format. Any Ideas? --Johnp125 20:29, 15 October 2007 (UTC)
 * Do you mean to say that your images were working on the previous version and have stopped working now? I had never had images working until I made some changes in the last version. Do you have a link to an example of a failing image export so I can check out what the problem may be? --Nad 19:32, 17 October 2007 (UTC)

--Johnp125 18:12, 19 October 2007 (UTC)

Sorry for the delayed post. Yes I had images working on the 0.6 version and then on the 0.7 version I am not getting any images in pdf format. I can go ahead and setup my test server real soon and make sure you and I can test both. I think I still have a copy of the 0.6 version I will try it again as well.

--Johnp125 18:23, 19 October 2007 (UTC)

Also I noticed the links are not working just right. For example if I have a document in Category: Testing and it pulls that document, and in that document it has another page that is in Category: Testing as well should the link not take me to the page in the pdf doc? Right now it is refering to the html link not the pdf link. I would think that it should realize that link was pulled by the category and then change the refrence to the pdf location.

Missing Images and hangs with larger categories
Mistral 13:28, 17 October 2007 (UTC)

We installed on Linux with 128 mb of memory allocated to php. Using the template idea referred to by Organic Design we have tested this and observe the following.

-images are not uploaded. They are copied to the pdf as links to the wiki image -html and pdf output work fine on small categories ( < 10 entries) Output is ready in less than 2 seconds and it looks nice -however for pages with > 25 entries when you press submit to get pdf output the browser hangs and never completes the operation. You need to close the browser to terminate the operation.
 * It should work for large books, our test book on organicdesign is over 250 pages/800KB and only takes a second or two with 64MB allocated. Have you tried saving it as html only then manually running it through htmldoc to see if that's working ok? --Nad 19:41, 17 October 2007 (UTC)

I looked at your book link and the translation to pdf worked great on IE6 with Acrobat. However I do notice that there is not a single image in the book. Is it possible having 2 or 3 images per page in 25 - 30 pages is the problem?

I looked at the translation into html code to see why the images were not showing. I believe this can be fixed easily.

Here is the html output http://wiki.fomportal.comhttp://wiki.fomportal.com/images/9/94/BERalex_Full.jpg

here is what it should read src="http://wiki.fomportal.com/images/9/94/BERalex_Full.jpg" width="262" height="207" />

Do you see the duplication of the site address? ((http://wiki.fomportal.com)) Maybe this a configuration issue?? Mistral 18:03, 19 October 2007 (UTC)
 * I'll check it out soon, your research into the problem should make it a lot easier for me to fix ;-) --Nad 20:40, 19 October 2007 (UTC)
 * I found a bug which was trying to make URL's absolute which were already absolute, see if 0.0.8 works any better --Nad 00:18, 29 October 2007 (UTC)

SubCategories
I made a structure using categories and subcategories. My goal is to make a complete Quality Manual using MediaWiki. Using PdfBook extension from a categorie page no sub categories are included in the PDF resulted.

Is there any manner to use pdfbook extension to make a book covering sub and subsubcategories?

Regards, Antonio Todo Bom --Todobom 22:50, 28 October 2007 (UTC)
 * Unfortunately not sorry, currently it can only work on a list, deeper levels are only done from heading levels not sub-categories. You may be able to use DPL to create reports of the sub-category and sub-sub-category content which could then be printed as a book. --Nad 00:10, 29 October 2007 (UTC)
 * I'm facing the same problem with the Quality Manual I'm working on. Please let me know if someone solve this problem and I'll do my best to find a solution to this myself.
 * /Jesper 85.89.79.106 12:43, 30 October 2007 (UTC)

Looks like a job for a recursive program call. When we installed this I thought I would be able to have one master category that contained all the other categories and then just go "Save as pdf" but it's not that easy yet. I hope you are able to add this functionality.

Mistral 16:30, 30 October 2007 (UTC)
 * It's not as simple as that - how do the categories and sub-categories names map to heading level? and then how do the headings and subheadings etc in the document map to pdf headings? --Nad 20:57, 30 October 2007 (UTC)
 * I understand the problem... Somehow the new Category should have it's own heading, and if that's the case, all other H1 would become H2 and so on... But, let's ignore that factor and say that you only wants to make a huge PDF Book with all categories, with the same heading levels used today, how to do that?... I tried to use GPL to make it print all articles in a couple of categories and then PDF the category that article was in, but it didn't work... //Jesper 85.89.79.106 08:46, 1 November 2007 (UTC)

Use CSS when exporting to PDF
Hi all. I want to know if there are some way to use CSS when I'm exporting my PDF:s?.. The thing is, I want to make id="toc" invisible instead of having another table of contents in my PDF Books. //Jesper 85.89.79.106 12:57, 31 October 2007 (UTC)
 * I've been looking round for PDF converters which can handle CSS but I can't find any. You'll have to add to remove the toc. --Nad 20:42, 31 October 2007 (UTC)
 * Hmm... But adding removes the table of contents of the page, and as the page is pretty long, I think the users need that one... It would be great if I could make the TOC disappear only in the PDF. //Jesper 85.89.79.106 06:35, 1 November 2007 (UTC)
 * I've been testing some now and by adding:

$ori_string = 'id="toc"'; $repl_string = 'id="toc" style="visibility: collapse;"'; $html = str_replace ($ori_string, $repl_string, $html);
 * After "# If format=html in query-string, return html content directly" the TOC disappears in the HTML file, but I can't get the same thing to work with the PDF. //Jesper 85.89.79.106 07:00, 1 November 2007 (UTC)
 * Good point, it's not useful to have TOC when it's a book which already has a TOC - I've updated it to add a before parsing each article --Nad 07:58, 1 November 2007 (UTC)
 * Ah, Thanks Nad! That was a fast reply and I really appreciate it! //Jesper 85.89.79.106 08:31, 1 November 2007 (UTC)