Extension talk:PdfBook/Archive

MediaWiki 1.11.0
Version 0.0.3 stopped working after an upgrade. I made a little fix to PdfBook.php around line 98 and it works again:

// while ($row = mysql_fetch_row($result)) {
while ($row = $db->fetchRow($result)) {

Disclaimer: I don't really know PHP, don't know MediaWiki, and don't know how to program. I just found it by inserting debug statements into PdfBook.php. Looks like mysql_fetch is censored somewhere now ;)

PS: To insert debug statements:
 * In LocalSettings.php insert:
   $wgDebugLogFile = "/tmp/debug.log"; // the file must be writable; it can be anywhere
 * Anywhere in the code, insert:
   wfDebug (.....);

- Daniel (edutechwiki.unige.ch)
 * Thanks a lot for this, it's still not working for me in 1.11 (I've only just done my 1.11 upgrade), but I've made some changes based on your findings which have got it partially there ;-) --Nad 21:36, 21 September 2007 (UTC)
 * It seems that 1.11 is a bit more memory hungry and my large test books were killing it, after giving PHP 64MB it's working fine now! --Nad 21:41, 21 September 2007 (UTC)
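If large books run PHP out of memory, the limit can be raised from LocalSettings.php. A minimal sketch, assuming the host lets scripts change memory_limit with ini_set() (if it does not, set memory_limit = 64M in php.ini instead):

```php
<?php
// In LocalSettings.php: raise PHP's per-request memory ceiling so that
// composing a large book does not exhaust the default limit.
// 64M is the figure reported to work above; adjust for your own books.
ini_set('memory_limit', '64M');
```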

Empty file downloaded
Greetings Nad,

I have been trying to use your PDFBook Mediawiki extension since it may be a great solution to an issue I have.

I have installed HTMLDoc under "C:\Program Files" and can use it on its own to create PDF books. I have also included "PdfBook.php" in my "LocalSettings.php" file.

The issue I am having is that when I click the link to export my category as a book and choose to save or open the PDF file, it has 0 bytes. So the file is created with the correct name but with no data.

Is there something else I must do to ensure HTMLDoc.exe is actually being called by your extension? Is there a required directory that it needs to be in?

Any help would be appreciated!

Thanks!
 * You have to make sure that htmldoc is in your executable PATH so that it can execute from just typing "htmldoc" without needing to supply the full pathname no matter what current directory you're in. Another thing to check would be to comment out the "@unlink($file)" line and after saving a pdf, check if it's left a tmp file in the root of your images directory, which is the data sent to htmldoc. --Nad 00:35, 6 September 2007 (UTC)
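One way to confirm that a bare "htmldoc" will be found is to search PATH the way a shell does, from PHP itself. The sketch below is a hypothetical helper (pdfbook_find_in_path is not part of PdfBook). The PATH a web-server process sees can differ from the one in your login shell, so run it through the web server rather than from the command line.

```php
<?php
// Hypothetical helper (not part of PdfBook): return the full path that a bare
// command name resolves to via the PATH this PHP process sees, or null.
function pdfbook_find_in_path($name) {
    $dirs = explode(PATH_SEPARATOR, (string) getenv('PATH'));
    // On Windows the program is htmldoc.exe; elsewhere there is no suffix.
    $suffixes = (DIRECTORY_SEPARATOR === '\\') ? array('.exe', '') : array('');
    foreach ($dirs as $dir) {
        if ($dir === '') {
            continue; // skip empty PATH entries (a simplification)
        }
        foreach ($suffixes as $suffix) {
            $candidate = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $name . $suffix;
            if (is_file($candidate) && is_executable($candidate)) {
                return $candidate;
            }
        }
    }
    return null;
}

$found = pdfbook_find_in_path('htmldoc');
echo $found === null ? "htmldoc is not on PATH\n" : "htmldoc found at $found\n";
```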

Invalid PDF File
Nad,

Thanks for your quick response!

However, I am still having issues. The file is being created and has some size to it, but Adobe Reader gives me the following error:

"There is an error opening this document. This file is damaged and cannot be repaired".

HTMLDoc seems to be quitting during the conversion job.

If I add the ".html" extension to the temp file and run HTMLDoc from the command line, I can manually convert the temp HTML file to a PDF file.

I then compared, in Notepad, the PDF I generated with the one your script creates, and noticed that the PDF your script creates stops after processing a certain number of lines.

I have your PDFExport Extension working just fine...so I was wondering what else it could be.

Any ideas?

Thanks!
 * How long is it taking to generate the PDF before quitting? 30 seconds? If it's that long, it could be reaching the max execution time. And how large is the PDF before it bails? --Nad 20:29, 6 September 2007 (UTC)

Nad,

It only writes about 18 lines to the .pdf file and takes a couple of seconds to generate. It doesn't appear to quit: it saves the file as it normally would, but when I open the file in Notepad it is not complete (it stops after ~18 lines with word wrap on).

Like I stated before, I'm using your PDFExport Extension and it works great.

Let me know what you think      --136.182.158.153 21:29, 6 September 2007 (UTC)
 * When you run htmldoc manually passing the generated tmp file to it, are you using the exact same command and parameters that the extension uses? --Nad 21:51, 6 September 2007 (UTC)

continued...

Nad,


If I change this line:

$cmd = "htmldoc -t pdf --charset iso-8859-1 $cmd $file";

to

$cmd = "htmldoc -t pdf --charset iso-8859-1 $cmd $file > test.pdf";

Then I get a test.pdf in my MediaWiki root folder, which works perfectly.


 * You could try changing the htmldoc command to use passthru like Extension:Pdf Export - I had it like that on mine but had problems with the gzip encoding, but it may work better like that for you --Nad 21:55, 6 September 2007 (UTC)
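Nad's passthru suggestion can be sketched as follows. Both functions are hypothetical (they are not part of PdfBook); the htmldoc flags are copied from the command line quoted above, and escapeshellarg() is added so that a temp-file path containing spaces stays a single argument. This relies on htmldoc writing the PDF to standard output when no output file is given, which is what the "> test.pdf" redirect above captures.

```php
<?php
// Hypothetical helper: the command line the extension builds, with the
// temp-file path quoted. $options is assumed to be the option string the
// extension already assembles from its settings (it is not escaped here).
function pdfbook_htmldoc_command($options, $file) {
    return 'htmldoc -t pdf --charset iso-8859-1 ' . $options . ' ' . escapeshellarg($file);
}

// Hypothetical: send the download headers, then let passthru() copy
// htmldoc's raw stdout (the PDF bytes) straight to the browser.
function pdfbook_stream_pdf($options, $file, $title) {
    header('Content-Type: application/pdf');
    header('Content-Disposition: attachment; filename="' . str_replace('"', '', $title) . '.pdf"');
    passthru(pdfbook_htmldoc_command($options, $file));
}
```

Nad reports gzip encoding problems with the passthru() approach, so it may not suit every setup. The variant that worked in the report above, letting htmldoc write to a file first (by shell redirect, or with htmldoc's -f option) and then sending that file with readfile(), is another option.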

images
Is there any possibility of getting images displayed in the PDF book as well? That would be a fantastic improvement. Any workarounds? Martin
 * I'm working on it, I just can't get them to work currently. I'm checking out some of the solutions at Extension talk:Pdf Export too as that one uses htmldoc as well. --Nad 12:39, 12 September 2007 (UTC)

A hack
In file PdfBook.php around line 118 (I may have inserted other stuff) just before "#write the HTML to a tmp file" insert this:

$ori_string = 'src="';
$repl_string = 'src="' . $wgServer;
$html = str_replace($ori_string, $repl_string, $html);
#write the HTML to a tmp file   (existing line)

The problem is that the intermediate output file contains markup like src="/mediawiki/images/thumb/pict.png, but you want: http://your.server.org/mediawiki/images/thumb/pict.png

This is not the best solution; a regexp hacker should really strip away most of the HTML image markup and then maybe replace the thumbnail with the original picture. But the above at least does a minimal job. To see the intermediate file, as someone said, comment out the unlink at the end and then get the file from the images directory:
 //@unlink($file);

Sorry, I'm not a real programmer and have too much workload to help for real. Just wanted to produce some handouts ;) - Daniel
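Along the lines Daniel suggests, a regular expression can restrict the rewrite to root-relative image URLs, so URLs that are already absolute are not prefixed twice. A minimal sketch: pdfbook_absolute_src is a hypothetical name, it assumes MediaWiki's double-quoted src attributes, and in the extension you would call it on $html with $wgServer as the server. It does not swap thumbnails for the original images.

```php
<?php
// Hypothetical alternative to the str_replace() hack: prefix only root-relative
// src attributes (src="/...") with the server URL, leaving absolute
// (src="http://...") and protocol-relative (src="//...") URLs untouched.
function pdfbook_absolute_src($html, $server) {
    $prefix = 'src="' . rtrim($server, '/') . '/';
    // A callback avoids any special meaning of "$" or "\" in the server string.
    return preg_replace_callback('#\bsrc="/(?!/)#', function ($m) use ($prefix) {
        return $prefix;
    }, $html);
}

$html = '<img src="/mediawiki/images/thumb/pict.png"> <img src="http://other.org/a.png">';
echo pdfbook_absolute_src($html, 'http://your.server.org'), "\n";
// prints: <img src="http://your.server.org/mediawiki/images/thumb/pict.png"> <img src="http://other.org/a.png">
```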

Same problem as section 2
I'm on Ubuntu Linux with MediaWiki 1.10. Htmldoc is in /usr/bin. I commented out the unlink command, and the temp file is empty (0 length).

I checked to be sure that my Apache user can run htmldoc -- it can. Unsure what I should try next.

By the way, your single-page export plugin works perfectly (even for images). So I know that htmldoc is not at fault here.
 * I didn't write the single-page one, but the code seems pretty similar. I'll just have to see what differences there are in the code between this one and the single-page one. --Nad 22:28, 14 September 2007 (UTC)

Media wiki blank screen fix
I have installed htmldoc on Fedora Core 4 (yum install htmldoc). Then I installed the Pdf Export extension, and now MediaWiki fails to run at all; I just get a blank screen. Which direction should I take?
 * What are the details from the Special:Version page, and is there any content at all in the HTML source of the blank page? Also, did you install this extension (Pdf Book) or Extension:Pdf Export? --Nad 22:34, 15 September 2007 (UTC)

--Johnp125 03:17, 16 September 2007 (UTC)

I installed the PdfBook.php extension: require_once("extensions/PdfBook.php");

If I take this out the wiki works fine. I have installed several older versions of wiki thinking this might be the issue.

http://www.website.com/wiki/

http://www.website.com/wiki2/

http://www.website.com/wiki3/

none of them work with this extension.

I created a symbolic link to the /usr/bin/htmldoc program in /var/www/cgi-bin/, and I made sure Apache had access to the htmldoc program and the symbolic link.

I just get a blank screen in wiki.

I would love to get this program to work, as it is the best program for exporting by category.
 * It could be something to do with it being CGI rather than mod_php. I don't have much experience with CGI, but you might want to check specifically whether PHP is able to execute htmldoc: it needs to be able to run it directly from the naked "htmldoc" shell command, with no preceding path information. You can check this by putting the following one-line PHP file in your wiki directory and requesting it (ensure it's executable by the web server):
   <?print `htmldoc --help`?>

If that doesn't output a whole bunch of information about htmldoc, then you need to get that working first. You might find the information at http://www.easysw.com/htmldoc/docfiles/5-cgi.html relevant, as it covers using htmldoc with CGI. --Nad 03:23, 16 September 2007 (UTC)
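A slightly more informative variant of the one-line test, as a sketch: it uses a full <?php open tag (which works whether or not short_open_tag is enabled) and exec() with 2>&1, so the exit status and any error text appear in the page. Backticks capture only standard output, so an error such as "command not found" typically ends up in the web server's error log instead. pdfbook_try_command is a hypothetical helper name.

```php
<?php
// Hypothetical diagnostic helper: run a command through the shell the way
// PHP does, returning its exit status and combined stdout/stderr text.
function pdfbook_try_command($cmd) {
    $output = array();
    $status = -1;
    exec($cmd . ' 2>&1', $output, $status);
    return array($status, implode("\n", $output));
}

list($status, $text) = pdfbook_try_command('htmldoc --version');
echo '<pre>';
// The PATH this PHP process sees, which can differ from your login shell's.
echo 'PATH: ', htmlspecialchars((string) getenv('PATH')), "\n";
// A POSIX shell uses exit status 127 for "command not found".
echo 'exit status: ', $status, "\n", htmlspecialchars($text), "\n";
echo '</pre>';
```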

--Johnp125 23:35, 16 September 2007 (UTC)

I know PHP is working. Are we using htmldoc from PHP or from cgi-bin? Maybe I don't need to get it working as a CGI script.

Also, in the code <?print `htmldoc --help`?>, how are you referencing htmldoc?

<?print `/usr/bin/htmldoc --help`?> Maybe like this? Would this help the test work? I'm getting just "htmldoc --help" after I run this, so maybe it can't find it.

My htmldoc folder on Fedora Core 4 is /usr/bin. My web folder is /var/www/html/. My cgi-bin folder is /var/www/cgi-bin, with an alias of www.website.com/cgi-bin/
 * That's exactly what I'm talking about, you need to add htmldoc to your environment's PATH so that it is accessible without preceding it by any specific path information. Although /usr/bin should already be in path, so maybe there's another problem. The key issue is that you need to have htmldoc work simply by typing eg. "htmldoc --help" with no preceding path information. --Nad 00:44, 17 September 2007 (UTC)

--Johnp125 03:08, 17 September 2007 (UTC)

I'm not sure what I need to do to get it working. I have created symbolic links for the cgi-bin directory and the html directory, and I have added the suggested settings to Apache according to the htmldoc info. I know it's my httpd.conf file, but I'm not sure what I need to do to fix it. How did you get htmldoc working on your Apache server? Maybe that would point me in the right direction. The htmldoc program works from any location, but for Apache you need a symbolic link.
 * This may be due to your CGI setup, which I'm not familiar with, but I don't understand what Apache has to do with this. In my setup Apache has absolutely nothing to do with it: PHP is just executing a shell command, so as long as it can execute from the shell with no problem, it should be fine from PHP. --Nad 03:37, 17 September 2007 (UTC)

More Blank PDF files
--Johnp125 17:56, 17 September 2007 (UTC)

MediaWiki: 1.10.0

PHP: 5.1.6 (apache2handler)

MySQL: 5.0.22

OK, the htmldoc test works now, but I'm getting blank PDF files like the ones described above.

Also, I can't use the variables. As soon as I set one of them, like $wgPdfBookLeftMargin = 1cm, I get a blank page in MediaWiki again.


 * I'm running out of ideas on this one - what version of htmldoc is it? --Nad 21:17, 17 September 2007 (UTC)
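One thing worth ruling out for the blank page after setting a variable, stated as an assumption since it was not confirmed in this thread: as written, $wgPdfBookLeftMargin = 1cm is not valid PHP, because 1cm is neither a number nor a quoted string. A parse error in LocalSettings.php usually shows up as a blank page when display_errors is off, and running php -l LocalSettings.php from the shell reports such errors. A sketch of the quoted form, assuming the extension expects a string such as '1cm':

```php
<?php
// In LocalSettings.php: a value with a unit must be a quoted string.
// Writing it unquoted (1cm) is a PHP syntax error, not a setting.
$wgPdfBookLeftMargin = '1cm';
```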

--Johnp125 12:48, 19 September 2007 (UTC) htmldoc version 1.8.27

After creating a PDF document, I looked in the /var/www/html/wiki/images folder and did not find a tmp file. This is a test wiki, so there was only one other file in there. Is there anything I need to add to LocalSettings.php to be able to upload files? Maybe it is not able to create the file in the first place?
 * You should get normal file uploading tested and working first with $wgEnableUploads etc - see Manual/config settings. --Nad 21:43, 19 September 2007 (UTC)

--Johnp125 12:25, 20 September 2007 (UTC)
 * Try again now I've done some fixes based on Daniels patch above and it's working for me in 1.11 now. If your book is a hundred pages or more you may need to increase PHP's memory to 64MB or so. --Nad 21:44, 21 September 2007 (UTC)

Upload filetype
What happens when PDF is not a valid file type for uploading? Does the wiki control this for this extension, and if so, do I need to add PDF to the file types that can be uploaded?
 * The upload filetype is unrelated to this since exported pdf's are downloaded not uploaded. If you want to add pdf to your allowed upload filetypes, use $wgFileExtensions[] = 'pdf', you may also want to set $wgVerifyMimeType to false if it's giving you hassles when you try and upload exotic types of file. --Nad 04:11, 21 September 2007 (UTC)
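The upload settings mentioned in this thread, collected as a LocalSettings.php fragment (a sketch; keep $wgVerifyMimeType = false only if MIME verification is actually rejecting files you trust):

```php
<?php
// LocalSettings.php fragment: enable uploads and allow PDF files.
$wgEnableUploads = true;      // uploads must be enabled at all
$wgFileExtensions[] = 'pdf';  // add pdf to the allowed upload extensions
$wgVerifyMimeType = false;    // only if MIME checks reject files you trust
```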

More empty file downloads
The demo links on your main page are no longer working either; they also show blank pages.

This link can then be added to a template which can be transcluded into any category suitable for downloading as a book. For an example of such a template, see OrganicDesign:Category:I am that, which uses OrganicDesign:Template:Book to display the message and download link.
 * Try now as new 1.11 patch may solve it. --Nad 21:44, 21 September 2007 (UTC)

--Johnp125 02:00, 22 September 2007 (UTC)

OK, your download demos seem to be working now. Which patch do I need to get: the wiki or PdfBook?

Also, could this program be made to export the documents to HTML or Word format with minimal changes?
 * By "1.11 patch" I just meant the new version of pdf-book (0.0.3) which has been patched to fix the problems it's been having with MW1.11. I think it would be easy to get it to export to html since it already composes a single large html page to feed into HtmlDoc, but exporting to MS-Word would be a totally different kettle of fish. Before I changed to pdf-exporting I had started work on Extension:Open Office Export, but there was too much work to do on it to get it to a practical state and HtmlDoc made the pdf option a far more attractive prospect. --Nad 09:58, 22 September 2007 (UTC)
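Nad's point that the extension already composes one large HTML page suggests the HTML-export idea could be as simple as sending that page as a download instead of passing it to htmldoc. A hypothetical sketch: pdfbook_send_html is not part of PdfBook, and the UTF-8 charset is an assumption (MediaWiki renders UTF-8, while the htmldoc command quoted above uses iso-8859-1).

```php
<?php
// Hypothetical: send the composed book HTML as a download instead of
// piping it through htmldoc. $html is assumed to hold the combined page
// and $title the category title used for the file name.
function pdfbook_send_html($html, $title) {
    header('Content-Type: text/html; charset=utf-8');
    header('Content-Disposition: attachment; filename="' . str_replace('"', '', $title) . '.html"');
    echo $html;
}
```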

--Johnp125 12:16, 22 September 2007 (UTC)

Well, you would know better than me, but doesn't Word open HTML files? I wonder what it would look like to open the HTML file before htmldoc gets hold of it.

--Johnp125 12:44, 22 September 2007 (UTC)

I think I have the new version from the link on the main page but now the link just goes to a blank page. Also I noticed that your test link does not work again. OrganicDesign:Category:I am that. It was working last night.
 * The I am that test seems fine today, what was it doing when it failed? it could have been due to server maintenance as we've had a fair bit of trouble that needs fixing since the 1.11 upgrade. If you want to check the html output, just remove the line that says @unlink($file) and make note of where it puts it because that's the html that gets processed by htmldoc. --Nad 21:19, 22 September 2007 (UTC)

--Johnp125 23:32, 22 September 2007 (UTC)

The error is:

Internet Explorer cannot download index.php from www.organicdesign.co.nz

Internet Explorer was not able to open this Internet site. The requested site is either unavailable or cannot be found. Please try again later.

This error comes up after the download starts, at 0%.

Getting File information Index.php from www.organicdesign.co.nz --Johnp125 23:34, 22 September 2007 (UTC)

What would it take to modify the script to convert to a big html file instead of a pdf? --Johnp125 00:01, 23 September 2007 (UTC)

I think one of the issues I am having with the blank files is security. Where does it dump the HTML file? /tmp? If so, it never seems to make it there. Maybe I need to change security for Apache? --Johnp125 02:16, 23 September 2007 (UTC)
 * It puts the HTML file into the root of the wiki file upload directory, so if you can upload files to your wiki successfully, then it should have no security problems dumping its HTML there. --Nad 04:15, 23 September 2007 (UTC)
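To check the permission side of this directly, a hypothetical helper can try to create and delete a file in the upload directory from PHP, running as the web-server user. The "pdf-book" prefix plus uniqid() naming is an assumption based on the file name pdf-book46ed0e444c7eb reported in this thread, and the example path is the one from that report.

```php
<?php
// Hypothetical check (not part of PdfBook): can this PHP process create and
// remove a temp file in $dir, as the extension needs to do?
function pdfbook_can_write_tmp($dir) {
    if (!is_dir($dir) || !is_writable($dir)) {
        return false;
    }
    // Name modelled on the "pdf-book" + uniqid() file seen in this thread (assumption).
    $file = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . 'pdf-book' . uniqid();
    if (@file_put_contents($file, 'test') === false) {
        return false;
    }
    unlink($file);
    return true;
}

// Point this at your wiki's upload directory ($wgUploadDirectory).
var_dump(pdfbook_can_write_tmp('/var/www/html/wiki/images'));
```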

I made the @unlink($file) change in the pdfbook.php code, and it created a file in /var/www/html/wiki/images called pdf-book46ed0e444c7eb, with no data in it at all. I have a test category at the bottom of a page, and I can pull the test category from Special:Export with no problem. Any idea why I'm getting no data?
 * Sorry, I just don't know why it may be giving you an empty file. You could try printing $ttext in the main loop to see if it's looping through the category titles correctly, then printing $text to see if it's getting each article's content etc., but it's too hard for me to know without being able to see it for myself... --Nad 04:23, 23 September 2007 (UTC)

--Johnp125 04:33, 23 September 2007 (UTC)

I was able to capture the error in apache error_log.

[client 192.168.1.102] PHP Notice: Undefined variable: links in /var/www/html/wiki/extensions/pdfbook.php on line 132, referer: http://192.168.1.99/wiki/index.php/Main_Page
[client 192.168.1.102] PHP Notice: Undefined variable: links in /var/www/html/wiki/extensions/pdfbook.php on line 132, referer: http://192.168.1.99/wiki/index.php/Main_Page
 * Unfortunately that still doesn't help much since line 132 only accesses $cmd and $file which are both unconditionally defined - unless your line numbers have changed due to adding debugging etc - what exactly is the content of line 132 in your script? --Nad 06:06, 23 September 2007 (UTC)

--Johnp125 13:22, 23 September 2007 (UTC)

Let me count them. Do I count the blank lines? Also, debug was off.