Extension talk:Pdf Export/LQT Archive 1

There seems to be a licensing problem with HTMLDOC for Windows installations?

I could not open the PDF file, so I had to add a header line that makes the browser download the PDF.

Furthermore, HTMLDOC does not support Unicode. I started a UTF-8 to Latin-1 translation so that French characters display correctly; it may need enhancements.
 * I filled in the gaps for german umlauts based on the site you linked to. ~gandm
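As an alternative to maintaining the character table by hand, the same UTF-8 to entity translation can be sketched with PHP's built-in entity table. This is only an illustration: `utf8_to_entities()` is a hypothetical name, and whether HTMLDOC renders every named entity is an assumption that would need checking against your HTMLDOC version.

```php
<?php
// Sketch only: build the UTF-8 -> HTML entity table from PHP's own data
// instead of typing it by hand. utf8_to_entities() is a hypothetical name.
function utf8_to_entities(string $html): string
{
    // Map of UTF-8 characters to named HTML 4.01 entities.
    $map = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES | ENT_HTML401, 'UTF-8');

    // Leave the markup characters alone, otherwise the tags themselves
    // would be escaped.
    unset($map['&'], $map['<'], $map['>']);

    return strtr($html, $map);
}

echo utf8_to_entities("<p>fran\xC3\xA7ais, Stra\xC3\x9Fe</p>"), "\n";
```

Because the table comes from PHP, accented capitals and Spanish characters such as ñ are covered without extra entries.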

mailto:sancelot@free.fr Here is my working file for Windows:

<?php

if (!defined('MEDIAWIKI')) die();
require_once ("$IP/includes/SpecialPage.php");

$wgExtensionFunctions[] = 'wfSpecialPdf';
$wgExtensionCredits['specialpage'][] = array(
    'name' => 'Pdf',
    'author' => ' Thomas Hempel',
    'description' => 'prints a page as pdf',
    'url' => 'http://www.netapp.com'
);

$wgHooks['SkinTemplateBuildNavUrlsNav_urlsAfterPermalink'][] = 'wfSpecialPdfNav';
$wgHooks['MonoBookTemplateToolboxEnd'][] = 'wfSpecialPdfToolbox';

// thanks to the interesting http://klaus.e175.net/code/latin_utf8.phps link
// only French characters are done
function utf8_latin1($text) {
    return strtr($text, array(
        "\xC3\x9F"=>"&szlig;",
        "\xC3\xA4"=>"&auml;", "\xC3\xAB"=>"&euml;", "\xC3\xAF"=>"&iuml;", "\xC3\xBC"=>"&uuml;", "\xC3\xB6"=>"&ouml;",
        "\xC3\x84"=>"&Auml;", "\xC3\x8B"=>"&Euml;", "\xC3\x8F"=>"&Iuml;", "\xC3\x9C"=>"&Uuml;", "\xC3\x96"=>"&Ouml;",
        "\xC3\xA2"=>"&acirc;", "\xC3\xAA"=>"&ecirc;", "\xC3\xAE"=>"&icirc;", "\xC3\xB4"=>"&ocirc;", "\xC3\xBB"=>"&ucirc;",
        "\xC3\x82"=>"&Acirc;", "\xC3\x8A"=>"&Ecirc;", "\xC3\x8E"=>"&Icirc;", "\xC3\x94"=>"&Ocirc;", "\xC3\x9B"=>"&Ucirc;",
        "\xC3\xA0"=>"&agrave;", "\xC3\xA8"=>"&egrave;", "\xC3\xB9"=>"&ugrave;", "\xC3\xA9"=>"&eacute;",
        "\xC3\x80"=>"&Agrave;", "\xC3\x88"=>"&Egrave;", "\xC3\x99"=>"&Ugrave;", "\xC3\x89"=>"&Eacute;",
        "\xC3\xA7"=>"&ccedil;"
        // "%3A"=>"/"
    ));
}

function wfSpecialPdf() {
    global $IP, $wgMessageCache;

    $wgMessageCache->addMessages(
        array( 'pdfprint' => 'PdfPrint' , 'pdf_print_link' => 'Sauvegarder en PDF'));

    class SpecialPdf extends SpecialPage {
        var $title;
        var $article;
        var $html;
        var $parserOptions;
        var $bhtml;

        function SpecialPdf() {
            SpecialPage::SpecialPage( 'PdfPrint' );
        }

        function execute( $par ) {
            global $wgRequest;
            global $wgOut;
            global $wgUser;
            global $wgParser;
            global $wgScriptPath;
            global $wgServer;

            $page = isset( $par ) ? $par : $wgRequest->getText( 'page' );
            $title = Title::newFromText( $page );
            $article = new Article ($title);
            $wgOut->setPrintable();
            $wgOut->disable();
            $parserOptions = ParserOptions::newFromUser( $wgUser );
            $parserOptions->setEditSection( false );
            $parserOptions->setTidy(true);
            $wgParser->mShowToc = false;
            $parserOutput = $wgParser->parse( $article->preSaveTransform( $article->getContent() ) ."\n\n",
                $title, $parserOptions );

            $bhtml = $parserOutput->getText();

            $bhtml = str_replace ($wgScriptPath, $wgServer . $wgScriptPath, $bhtml);
            $bhtml = str_replace ('/w/',$wgServer . '/w/', $bhtml);
            $bhtml = str_replace ('href="#', 'href="' . $wgServer . '/' . $page . '#', $bhtml);
            $bhtml = utf8_latin1($bhtml);
            #$html = "  ". $page. "  " . $bhtml. " ";
            # thanks to mediawiki AT sandeman.freesurf.fr :
            $html = "  ". utf8_decode($page). "  " . $bhtml. " ";

            // make a temporary file with a unique name
            $mytemp = "c:\\temp\\f" .time(). "-" .rand(). ".html";
            $article_f = fopen($mytemp,'w');
            fwrite($article_f, $html);
            fclose($article_f);
            putenv("HTMLDOC_NOCGI=1");

            # Write the content type to the client...
            header("Content-Type:application/pdf");
            header('Content-Disposition: attachment;filename="'.$page.'.pdf"');
            flush();

            # Run HTMLDOC to provide the PDF file to the user...
            passthru("htmldoc -t pdf14 --color  --quiet --jpeg --webpage $mytemp ");
            unlink ($mytemp);
        }
    }
    SpecialPage::addPage (new SpecialPdf());
}

function wfSpecialPdfNav( &$skintemplate, &$nav_urls, &$oldid, &$revid ) {
    $nav_urls['pdfprint'] = array(
        'text' => wfMsg( 'pdf_print_link' ),
        'href' => $skintemplate->makeSpecialUrl( 'PdfPrint', "page=". wfUrlencode( "{$skintemplate->thispage}" ) )
    );
    return true;
}

function wfSpecialPdfToolbox( &$monobook ) {
    if ( isset( $monobook->data['nav_urls']['pdfprint'] ) )
        if ( $monobook->data['nav_urls']['pdfprint']['href'] == '' ) {
            ?><li><?php echo $monobook->msg( 'pdf_print_link' ); ?></li><?php
        } else {
            ?><li><a href="<?php echo htmlspecialchars( $monobook->data['nav_urls']['pdfprint']['href'] ) ?>"><?php echo $monobook->msg( 'pdf_print_link' ); ?></a></li><?php
        }
    return true;
}
?>

Is it possible to integrate multiple pages with the Windows version
Is it possible to integrate multiple pages with the Windows version?

PDF in Spanish
With the diff below I could solve the ñ and accent problem

diff

61c61
<
---
>                        $bhtml = utf8_decode($bhtml);
76c76
<                        passthru("htmldoc -t pdf --quiet --jpeg --webpage '$mytemp'");
---
>                        passthru("htmldoc -t pdf --charset 8859-1 --quiet --jpeg --webpage '$mytemp'");

Save the diff to patch.txt, then execute:

patch SpecialPdf.php patch.txt

--Esacchi 20:12, 2 August 2006 (UTC)

SpecialPDF and MimeTeX
This is a really cool and useful extension. However, it ignores an extension we added to support MimeTeX, so the output does not include any math created through that extension.

Our MimeTeX extension replaces Any LaTeX formula

with 

Is there another way we should be generating this so that SpecialPDF can capture the image? DavidJameson 20:32, 4 August 2006 (UTC)

Updated for unicode, multiple articles, and images
Searched around for a way to do multiple articles to PDF, had to combine what was listed here and what was contained in wiki2pdf. Works the same way as SpecialPDF.php, put it in your extensions folder. HTML files are created (for processing) in your /webroot/wikiroot/pdfs folder (so create it if you don't have it) or another folder of your choice. It still uses HTMLDOC with some switches to format headers and footers, and there are string substitutions for the images exported out of the Wiki...

<?php

if (!defined('MEDIAWIKI')) die();
require_once ("$IP/includes/SpecialPage.php");

$wgExtensionFunctions[] = 'wfmyPDF';
$wgExtensionCredits['specialpage'][] = array(
    'name' => 'myPdf',
    'author' => ' Thomas Hempel, Simon Wheatley, and others',
    'description' => 'prints a collection of articles as a pdf book',
    'url' => 'http://www.netapp.com'
);

$wgHooks['SkinTemplateBuildNavUrlsNav_urlsAfterPermalink'][] = 'wfmyPDFNav';
$wgHooks['MonoBookTemplateToolboxEnd'][] = 'wfmyPDFToolbox';

function wfmyPDF() {
    global $IP, $wgMessageCache;

    $wgMessageCache->addMessages(
        array( 'pdfprint2' => 'PdfPrint2' , 'pdf_print_link2' => 'Export PDF book'));

    class myPDF extends SpecialPage {
        var $title;
        var $article;
        var $html;
        var $parserOptions;
        var $bhtml;

        function myPDF() {
            SpecialPage::SpecialPage( 'PdfPrint2' );
        }

        function execute( $par ) {
            global $wgRequest;
            global $wgOut;
            global $wgUser;
            global $wgParser;
            global $wgScriptPath;
            global $wgServer;

            //Get the name of the main article from which this routine was called
            // - this will be used for the book/file name
            $page=isset($par) ? $par:$wgRequest->getText('page');
            $title=Title::newFromText($page);
            $article=new Article($title);

            //write a header file with the title HTML tab for the book - header.html
            //all pdfs will be written to /webroot/wikiroot/pdfs
            $doctitle=str_replace("_", " ", $page);
            //write the header file:
            $mytemp = $_SERVER["DOCUMENT_ROOT"].$wgScriptPath."/pdfs/header.html";
            $article_f = fopen($mytemp, 'w');
            fwrite($article_f, "  ".$doctitle."     ");
            fclose($article_f);

            //add this header file to the list of files that htmldoc will process
            $filelist=$mytemp;
            $c=1;

            //get the article content, i.e. a list of articles to print to pdf
            //each one is denoted by curly braces
            $SaveText=$article->getContent();
            $i = strpos($SaveText,"{");
            //strpos() returns false when nothing is found, so test for that explicitly
            while ($i !== false) {
                $j = strpos($SaveText,"}");
                if ($j === false || $j <= $i) break;
                $art = trim(substr($SaveText, $i+1,$j-$i-1));
                $SaveText=substr($SaveText, $j+1);
                //Go fetch the article that was listed
                $title1 = Title::newFromURL( $art );
                $article1 = new Article($title1);
                $wgOut->setPrintable();
                $wgOut->disable();
                $parserOptions = ParserOptions::newFromUser( $wgUser );
                $parserOptions->setEditSection( false );
                $parserOptions->setTidy(true);
                $wgParser->mShowToc = true;
                //parse the article into HTML
                $parserOutput = $wgParser->parse( $article1->preSaveTransform( $article1->getContent() ) ."\n\n",
                    $title1, $parserOptions );
                //get the html content, then format it to remove any wiki escape chars
                $bhtml = $parserOutput->getText();
                $bhtml = utf8_decode($bhtml);
                //make sure all links are absolute
                $bhtml = str_replace ($wgScriptPath, $wgServer . $wgScriptPath, $bhtml);
                $bhtml = str_replace ('/w/',$wgServer . '/w/', $bhtml);
                //make sure all image tags are true
                $bhtml = str_replace ('&lt;img', '<img', $bhtml);
                //write a new title and H1 heading - used for the chapter in the pdf book
                $html = "  ".$art."   ".$art." \n".$bhtml."  ";
                //output article to next html file in list:
                $mytemp = $_SERVER["DOCUMENT_ROOT"].$wgScriptPath."/pdfs/file".$c.".html";
                $article_f = fopen($mytemp, 'w');
                fwrite($article_f, $html);
                fclose($article_f);
                $c=$c+1;
                $filelist=$filelist." ".$mytemp;
                $i = strpos($SaveText,"{");
                //limit output files to 100 - used in testing in case things get out of hand
                if ($c > 100) break;
            }
            putenv("HTMLDOC_NOCGI=1");

            # Write the content type to the client...
            header("Content-Type: application/pdf");
            header("Content-Disposition: attachment; filename=\"$page.pdf\"");
            flush();

            # Run HTMLDOC to provide the PDF file to the user...
            passthru("htmldoc --book -t pdf14 --bodyfont Helvetica --header t.1 --footer c.1 --no-links --linkstyle plain --charset 8859-1 --color --quiet --jpeg --webpage ".$filelist);
            //$filelist holds several space-separated file names, so remove them one by one
            foreach (explode(" ", $filelist) as $tempfile) unlink ($tempfile);
        }
    }
    SpecialPage::addPage (new myPDF());
}

function wfmyPDFNav( &$skintemplate, &$nav_urls, &$oldid, &$revid ) {
    $nav_urls['pdfprint2'] = array(
        'text' => wfMsg( 'pdf_print_link2' ),
        'href' => $skintemplate->makeSpecialUrl( 'PdfPrint2', "page=". wfUrlencode( "{$skintemplate->thispage}" ) )
    );
    return true;
}

function wfmyPDFToolbox( &$monobook ) {
    if ( isset( $monobook->data['nav_urls']['pdfprint2'] ) )
        if ( $monobook->data['nav_urls']['pdfprint2']['href'] == '' ) {
            ?><li><?php echo $monobook->msg( 'pdf_print_link2' ); ?></li><?php
        } else {
            ?><li><a href="<?php echo htmlspecialchars( $monobook->data['nav_urls']['pdfprint2']['href'] ) ?>"><?php echo $monobook->msg( 'pdf_print_link2' ); ?></a></li><?php
        }
    return true;
}
?>
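The listing above describes its input as a page whose text lists the articles to export, each one enclosed in curly braces. A minimal sketch of that extraction step on its own, assuming the same single-brace convention; `extract_article_names()` is a hypothetical helper and not part of the posted file:

```php
<?php
// Sketch only: pull "{Article name}" entries out of a book page's wikitext.
// extract_article_names() is a hypothetical helper, not part of myPDF.php.
function extract_article_names(string $wikitext, int $limit = 100): array
{
    // Match the text between a "{" and the next "}", without nesting.
    preg_match_all('/\{([^{}]+)\}/', $wikitext, $matches);

    $names = array();
    foreach ($matches[1] as $raw) {
        $name = trim($raw);
        if ($name !== '') {
            $names[] = $name;
        }
    }
    // Same safety cap of 100 files as the listing above.
    return array_slice($names, 0, $limit);
}

print_r(extract_article_names("Intro text {Main Page} and { Help:Contents }"));
```

A page with no braces yields an empty list, which would explain HTMLDOC being called with nothing but the header file. Note that the inner braces of a template call such as {{Foo}} would also match.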

MP 09:45, 30 August 2006 (UTC)
This doesn't work for me. I've added the PHP file to the extensions folder and altered LocalSettings.php to include it. When I navigate to my site I just get a blank page rather than the login. I'm obviously missing something blindingly obvious...


 * Windows XP
 * Apache 2
 * php 5.1.4
 * MySQL 4.1.16
 * Mediawiki 1.7.1

---CheShA Says: "You haven't set permissions on the SpecialPdf.php file; Apache can't access it"

SCW 3:14pm, 2nd Sept, 2006 (CT)


require_once("extensions/myPDF.php");

Bad argument to HTMLDOC
myPDF is not working for me. It's generating a bogus arg to HTMLDOC and so instead of getting a PDF file, I get a file with the error message:

HTMLDOC Version 1.8.27 Copyright 1997-2006 Easy Software Products, All Rights Reserved. This software is based in part on the work of the Independent JPEG Group.

ERROR: Bad option argument "--charse "!

Note that there should be a 't' at the end of the argument, but instead there is an embedded CR control character.

DavidJameson 14:22, 5 September 2006 (UTC)

SCW, 05sept06, 10:09CT
A line break got into the cut and paste on the htmldoc passthru line. Take out the line break to join the lines together so that the argument to htmldoc reads '--charset', and all should be okay.

I'll be posting a newer version that has the option of creating PDF books with a title page, TOC and nice nesting of articles...

Simon.

No HTML files found
I must be missing something else....after fixing the linebreak problem, I opened a wikipage and then clicked on Export PDF book.

This time I got a file with the error:

HTMLDOC Version 1.8.27 Copyright 1997-2006 Easy Software Products, All Rights Reserved. This software is based in part on the work of the Independent JPEG Group.

ERROR: No HTML files!

Usage: htmldoc [options] filename1.html [ ... filenameN.html ] htmldoc filename.book

What's this notion about a "collection" of articles? Is there something I'm supposed to do to "collect" some articles together before I can print them? How would I do that, and why would I need to? The "Export PDF Book" link shows up in the Toolbox when I'm viewing a particular page. So how can I generate THAT page as a PDF, including all images?

DavidJameson 15:36, 5 September 2006 (UTC)

Okay, the following script is based on the original; temporary files are saved to /tmp, as they were in the original. This script will not put multiple articles into one PDF file (see above for that). It will handle image files.

<?php

if (!defined('MEDIAWIKI')) die();
require_once ("$IP/includes/SpecialPage.php");

$wgExtensionFunctions[] = 'wfSpecialPdf';
$wgExtensionCredits['specialpage'][] = array(
    'name' => 'Pdf',
    'author' => ' Thomas Hempel',
    'description' => 'prints a page as pdf',
    'url' => 'http://www.netapp.com'
);

$wgHooks['SkinTemplateBuildNavUrlsNav_urlsAfterPermalink'][] = 'wfSpecialPdfNav';
$wgHooks['MonoBookTemplateToolboxEnd'][] = 'wfSpecialPdfToolbox';

function wfSpecialPdf() {
    global $IP, $wgMessageCache;

    $wgMessageCache->addMessages(
        array( 'pdfprint' => 'PdfPrint' , 'pdf_print_link' => 'Print as PDF'));

    class SpecialPdf extends SpecialPage {
        var $title;
        var $article;
        var $html;
        var $parserOptions;
        var $bhtml;

        function SpecialPdf() {
            SpecialPage::SpecialPage( 'PdfPrint' );
        }

        function execute( $par ) {
            global $wgRequest;
            global $wgOut;
            global $wgUser;
            global $wgParser;
            global $wgScriptPath;
            global $wgServer;

            $page = isset( $par ) ? $par : $wgRequest->getText( 'page' );
            $title = Title::newFromText( $page );
            $article = new Article ($title);
            $wgOut->setPrintable();
            $wgOut->disable();
            $parserOptions = ParserOptions::newFromUser( $wgUser );
            $parserOptions->setEditSection( false );
            $parserOptions->setTidy(true);
            $wgParser->mShowToc = false;
            $parserOutput = $wgParser->parse( $article->preSaveTransform( $article->getContent() ) ."\n\n",
                $title, $parserOptions );

            $bhtml = $parserOutput->getText();
            $bhtml = utf8_decode($bhtml);

            $bhtml = str_replace ($wgScriptPath, $wgServer . $wgScriptPath, $bhtml);
            $bhtml = str_replace ('/w/',$wgServer . '/w/', $bhtml);
            $bhtml = str_replace ('&lt;img', '<img', $bhtml);
            $bhtml = str_replace ('/&gt;', '/>', $bhtml);
            $html = "  ". $page. "  " . $bhtml. " ";

            // make a temporary file with a unique name
            $mytemp = "/tmp/f" .time(). "-" .rand(). ".html";
            $article_f = fopen($mytemp,'w');
            fwrite($article_f, $html);
            fclose($article_f);
            putenv("HTMLDOC_NOCGI=1");

            # Write the content type to the client...
            header("Content-Type: application/pdf");
            header("Content-Disposition: attachment; filename=\"$page.pdf\"");
            flush();

            # Run HTMLDOC to provide the PDF file to the user...
            passthru("htmldoc -t pdf14 --bodyfont Helvetica --no-links --linkstyle plain --footer c.1 --header c.1 --tocheader ... --charset 8859-1 --color --quiet --jpeg --webpage '$mytemp'");
            unlink ($mytemp);
        }
    }
    SpecialPage::addPage (new SpecialPdf());
}

function wfSpecialPdfNav( &$skintemplate, &$nav_urls, &$oldid, &$revid ) {
    $nav_urls['pdfprint'] = array(
        'text' => wfMsg( 'pdf_print_link' ),
        'href' => $skintemplate->makeSpecialUrl( 'PdfPrint', "page=". wfUrlencode( "{$skintemplate->thispage}" ) )
    );
    return true;
}

function wfSpecialPdfToolbox( &$monobook ) {
    if ( isset( $monobook->data['nav_urls']['pdfprint'] ) )
        if ( $monobook->data['nav_urls']['pdfprint']['href'] == '' ) {
            ?><li><?php echo $monobook->msg( 'pdf_print_link' ); ?></li><?php
        } else {
            ?><li><a href="<?php echo htmlspecialchars( $monobook->data['nav_urls']['pdfprint']['href'] ) ?>"><?php echo $monobook->msg( 'pdf_print_link' ); ?></a></li><?php
        }
    return true;
}
?>

Almost there (grin)
Well, this version almost works perfectly - it gave me a nice PDF file with explicit images referenced in the wiki page.

However, it crashes when trying to handle one of the image tags produced by our MimeTex math generator.

E.g.

<img src="/cgi-bin/mimetex.cgi?\green f(\xi)=\int_{-\infty}^\xi e^{-\tau^2}d\tau { {x \atop y } }" align="absmiddle" border="0" alt="TeX Formula">

seems to cause a crash.

I wonder if the code that tweaks the IMG tag itself is getting confused with the more sophisticated stuff in this particular image reference.

DavidJameson 12:10, 6 September 2006 (UTC)

PHP error
Just for grins, I ran the process through a PHP debugger. The debugger barfed with the error below.

Error: E_ERROR Call to a member function getNamespace() on a non-object at /var/www/html/wikiroot/riskit/includes/Article.php line 155

The line it's complaining about is

if ( $this->mTitle->getNamespace() == NS_MEDIAWIKI ) {

which is found in the function getContent() in Article.php.

(P.S.....someone needs to install the GESHI syntax highlighting extension on wikimedia.org)

DavidJameson 12:18, 6 September 2006 (UTC)

PHP error
OK - I've figured out both problems (to an acceptable extent)

1) The PHP bug is due to the fact that sometimes (and I don't know why it's only sometimes) the URL associated with the Print As PDF command in the toolbox has %0D%0A tacked on at the end. No idea where this is coming from, but it can be removed by modifying the echo statement on line 103, adding a str_replace call to remove the extra characters:

<?php echo htmlspecialchars( str_replace('%0D%0A', '', $monobook->data['nav_urls']['pdfprint']['href'] )) ?>

2) The reason the math images weren't being processed was because the URL in the SRC attribute of the IMG tag did not include the server. I modified my mimetex extension to include the server but this problem will arise again for anyone else who references an image without using a server in the URL. The full solution will require the text to be searched for all image urls, examine them to see if there's a server part and if not, insert the $wgServer into the string. Probably can be done quickly with a regex. DavidJameson 20:31, 6 September 2006 (UTC)


 * What's wrong with just doing this: "$bhtml = str_replace ('/images/',$wgServer . '/images/', $bhtml);" instead of the $bhtml = str_replace ('/w/',$wgServer . '/w/', $bhtml); that you have there? Made the math images all work for me. --dgrant 18:30, 25 October 2006 (UTC)
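The regex idea mentioned above (find every image or link URL that lacks a server part and prepend $wgServer) could look roughly like the following. This is a sketch under assumptions: `absolutize_urls()` is a hypothetical helper, it only looks at quoted `src` and `href` attributes that start with a single slash, and the server name in the example is made up.

```php
<?php
// Sketch only: prefix root-relative src/href attributes with the server.
// absolutize_urls() is a hypothetical helper, not part of the extension.
function absolutize_urls(string $html, string $server): string
{
    $server = rtrim($server, '/');

    // Matches src="/..." or href='/...' but not "//host/..." and not
    // URLs that already start with a scheme such as http://.
    return preg_replace_callback(
        '~\b(src|href)\s*=\s*(["\'])/(?!/)~i',
        function (array $m) use ($server): string {
            return $m[1] . '=' . $m[2] . $server . '/';
        },
        $html
    );
}

echo absolutize_urls('<img src="/cgi-bin/mimetex.cgi?x">', 'http://wiki.example.org'), "\n";
```

Unlike a str_replace on '/w/' or '/images/', this does not depend on the directory layout of a particular wiki, and it leaves URLs that are already absolute untouched.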

Error
On some pages I get this error. I'm using DavidJameson's code above. --dgrant 18:34, 25 October 2006 (UTC)

Fatal error: Call to a member function getNamespace on a non-object in /var/www/mediawiki-checkout/includes/Article.php on line 150


 * Ok, now I'm getting this on all pages for some reason. 216.13.217.231 01:30, 7 November 2006 (UTC)

Does not work with 1.8.2 of mediawiki.
No way no how. Neither does any of the code on this page.

Error: Fatal error: Call to a member function getNamespace on a non-object in /var/www/includes/Article.php on line 150


 * I have this Error in Mediawiki 1.8.2 and 1.9.0 when calling Special:PdfPrint. It works for me when clicking on the link in the toolbox. --Ikiwaner 23:57, 15 January 2007 (UTC)


 * I'm running 1.6.7 on BluWiki and getting the exact same error... If someone figures this out can they email me? --SamOdio 14:29, 20 March 2007 (UTC)
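The reports above fit a pattern: the error appears when Special:PdfPrint is opened directly (no page name) or when stray characters such as %0D%0A end up in the page name. In both cases Title::newFromText() would likely return null, and Article then fails on getNamespace(). A minimal sketch of a guard, assuming that diagnosis is right; `clean_page_param()` is a hypothetical helper:

```php
<?php
// Sketch only: normalise the requested page name before building a Title.
// clean_page_param() is a hypothetical helper.
function clean_page_param(?string $raw): ?string
{
    if ($raw === null) {
        return null;
    }
    // Drop URL-encoded CR/LF, then literal whitespace and line breaks.
    $page = trim(str_ireplace(array('%0D', '%0A'), '', $raw));

    return $page === '' ? null : $page;
}

// Inside execute() one would then bail out early (MediaWiki calls, not run here):
//   $page  = clean_page_param( isset( $par ) ? $par : $wgRequest->getText( 'page' ) );
//   $title = $page === null ? null : Title::newFromText( $page );
//   if ( $title === null ) { /* show a message instead of calling new Article( $title ) */ return; }

var_dump(clean_page_param("Main_Page%0D%0A"));
```

With such a check the special page can report "no page given" instead of producing a fatal error or an empty PDF.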

24.10.2007 by Dirk:

Hi, I have the same problem; my PDF export won't work at all. System: SQL 5.0.45 and PHP 5.2.3

Host: all-inkl.com (htmldoc installed)

Mediawiki: 1.11.0

When I start a PDF print with the page attribute (http://www.carpc-wiki.info/index.php?title=Spezial:PdfPrint&page=Hauptseite) as the URL, I get an empty/corrupt PDF back. If I go to the special pages and click PDF Export, I get this error too: Fatal error: Call to a member function getNamespace on a non-object in /var/www/includes/Article.php on line 150

What may be wrong? I corrected the temp path and the htmldoc path, and the temp directory has chmod 777 (it must have that, right?).

The wiki is: http://www.carpc-wiki.info

Thanks for any information. Email: info(at)carpc-wiki.info

No Images
Great extension, works really well except it doesn't appear to include images in the export. Can anyone please confirm that this is normal and that I haven't done anything wrong?

Thanks, CheShA.

Still No Images
Hi, can anyone tell me what to do so that my exported PDF file includes the pictures from the original article?

I've heard there's a PDF hack to fix that problem. If anyone has an idea, please let me know.

THX [mailto:DarkManX1@gmx.net X-Cident]

Still No Images
Hi there, can someone help us with this problem? The PDF export is great, but the link for the image is incorrect: it points to "Image:Image_Name.gif" and not to the image file itself.

Help please ! th3_gooroo@hotmail.com

---

Here was my solution:

After adding:

I replaced

with

Currently running at http://kumu.brocku.ca

---

Possible Image Fix
This solution almost works for me - but it led me to the right answer: Add these two lines where the other globals are defined:

and then change

to

For me this fixed the problem with images, but internal links to wiki pages still didn't work. These links are relative, just like the images, and need to be made absolute. --justinm

Images With Windows
I managed to get this working by inserting the following:

I don't really know if this is the best way to do it, but it fixed my problem with the html being generated as:

Images When Wiki Is Root of Domain
The string replacement

doesn't do anything if  is the empty string. This is the case if the wiki is the root of the domain.

I've fixed this by replacing the above line with

so both the  and the   attributes are fixed up.

It seems to me that the root of these problems is that there's no way to pass a default server prefix to.
 * You could try Extension:PdfBook, it can also export single pages now and images work fine on it. --Nad 21:13, 2 November 2008 (UTC)
 * Thanks Nad. this worked. Using a redirect and virtual host in Apache, server.oursite.org/wiki becomes wiki.oursite.org/. With your edit, images appear again on the PDFs. --Erikvw 11:11, 17 December 2008 (UTC)

More generic fix
Thanks to all for pointing me down the right path to fix this on my wiki. After looking at a few wiki installations, it appears that some problems show up when $wgScript is set to an empty string, while other installations do not use the default path to articles. Taken together, however, $wgScript and $wgScriptPath may serve as a generic way of finding the absolute path to an article. The following should help images display and fix relative links, while leaving all absolute links intact. Please comment/improve as needed:

Replace the following two lines in PdfExport_body.php (same as others have pointed out):

with the following: --Biyer 16:05, 25 January 2010 (UTC)

what files should be changed ?
These solutions are very nice, but which files should be edited?

Possible Fix
On UNIX: check DNS resolution on the web server that hosts the wiki (nslookup or dig). If your wiki domain does not resolve there, htmldoc cannot read the images from the server (and reports no errors either), so the generated PDF contains no images.

Still Still no Images
Hello, after some problems with htmldoc this extension works on my wiki, but there are no images in the PDFs. I tried all of these solutions, but none of them works on my (Windows) system. Are there other solutions/fixes?

Thanks. Johannes741 14:35, 25 May 2010 (UTC)

Empty Files
Help, I only get empty files. But when I execute this line on the console I get a working PDF:

htmldoc -t pdf14 --bodyfont Helvetica --no-links --linkstyle plain --footer c.1 --header c.1 --tocheader ... --charset 8859-1 --color --quiet --jpeg --webpage '$mytemp' > test.pdf

What's wrong? Maybe an Apache issue or something?


 * I had the same problem. I changed this line to  and it worked. The path before   is usually the same as   in LocalSettings.php.   is another usual path to try. --Ikiwaner 20:06, 20 December 2006 (UTC)


 * I had problems with the permissions on  which I fixed by running   147.209.216.245 01:31, 15 June 2007 (UTC)


 * I had this problem on Windows XP. Took me hours to figure out.  I moved the C:\Program Files\HTMLDoc to C:\HTMLDoc, added HTMLDoc as a virtual directory, then in the PDFExport.php I use the line:


 * The problem occurred because of the space in "Program Files", and if I remember correctly it WOULD NOT WORK unless you append .exe to htmldoc.
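Problems with spaces in paths and with quoting that differs between Windows and Unix can be sidestepped by letting PHP quote the arguments. A minimal sketch, not tested on Windows here; `build_htmldoc_command()` and its default binary name are assumptions, and the options are simply the ones used elsewhere on this page:

```php
<?php
// Sketch only: assemble the htmldoc command line with quoting handled by PHP.
// build_htmldoc_command() and its default binary name are assumptions.
function build_htmldoc_command(string $htmlFile, string $binary = 'htmldoc'): string
{
    $options = array(
        '-t', 'pdf14',
        '--charset', 'iso-8859-1',
        '--color', '--quiet', '--jpeg', '--webpage',
    );

    // escapeshellarg() applies the quoting style of the current platform,
    // so the same code can be used whether or not the path contains spaces.
    return escapeshellarg($binary) . ' ' . implode(' ', $options) . ' ' . escapeshellarg($htmlFile);
}

echo build_htmldoc_command('/tmp/my page.html'), "\n";
```

The result would be handed to passthru() in place of the hand-written string. On Windows the first argument would be the full path to htmldoc.exe.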

Clean code
Hi there, would be nice if you could post the clean code with all above corrections included. I tried them all with mediawiki 1.8.2 but I can't get it to work. Cheers Florian

Test and working on 1.9
This was tried on Linux/Apache/PHP5 with 1.9, in a software configuration very similar to Wikipedia's, and works fine. tom

Working code for Windows with MediaWiki v1.9
The PDF export worked for me on Windows after I fixed the path AND, more importantly, changed '$mytemp' to $mytemp on line 80, i.e. removed the single quotes around $mytemp.

-Vivek Agarwal

Here is the complete source:

<?php

if (!defined('MEDIAWIKI')) die();
require_once ("$IP/includes/SpecialPage.php");

$wgExtensionFunctions[] = 'wfSpecialPdf';
$wgExtensionCredits['specialpage'][] = array(
    'name' => 'Pdf',
    'author' => ' Thomas Hempel',
    'description' => 'prints a page as pdf',
    'url' => 'http://www.netapp.com'
);

$wgHooks['SkinTemplateBuildNavUrlsNav_urlsAfterPermalink'][] = 'wfSpecialPdfNav';
$wgHooks['MonoBookTemplateToolboxEnd'][] = 'wfSpecialPdfToolbox';

function wfSpecialPdf() {
    global $IP, $wgMessageCache;

    $wgMessageCache->addMessages(
        array( 'pdfprint' => 'PdfPrint' , 'pdf_print_link' => 'Print as PDF'));

    class SpecialPdf extends SpecialPage {
        var $title;
        var $article;
        var $html;
        var $parserOptions;
        var $bhtml;

        function SpecialPdf() {
            SpecialPage::SpecialPage( 'PdfPrint' );
        }

        function execute( $par ) {
            global $wgRequest;
            global $wgOut;
            global $wgUser;
            global $wgParser;
            global $wgScriptPath;
            global $wgServer;

            $page = isset( $par ) ? $par : $wgRequest->getText( 'page' );
            $title = Title::newFromText( $page );
            $article = new Article ($title);
            $wgOut->setPrintable();
            $wgOut->disable();
            $parserOptions = ParserOptions::newFromUser( $wgUser );
            $parserOptions->setEditSection( false );
            $parserOptions->setTidy(true);
            $wgParser->mShowToc = false;
            $parserOutput = $wgParser->parse( $article->preSaveTransform( $article->getContent() ) ."\n\n",
                $title, $parserOptions );

            $bhtml = $parserOutput->getText();
            $bhtml = utf8_decode($bhtml);

            $bhtml = str_replace ($wgScriptPath, $wgServer . $wgScriptPath, $bhtml);
            $bhtml = str_replace ('/w/',$wgServer . '/w/', $bhtml);
            $bhtml = str_replace ('&lt;img', '<img', $bhtml);
            $bhtml = str_replace ('/&gt;', '/>', $bhtml);

            $html = "  ". $page. "  " . $bhtml. " ";

            // make a temporary file with a unique name
            $mytemp = "d:/tmp/f" .time(). "-" .rand(). ".html";
            $article_f = fopen($mytemp,'w');
            fwrite($article_f, $html);
            fclose($article_f);
            putenv("HTMLDOC_NOCGI=1");

            # Write the content type to the client...
            header("Content-Type: application/pdf");
            header("Content-Disposition: attachment; filename=\"$page.pdf\"");
            flush();

            # Run HTMLDOC to provide the PDF file to the user...
            passthru("htmldoc -t pdf14 --bodyfont Helvetica --no-links --linkstyle plain --footer c.1 --header c.1 --tocheader ... --charset 8859-1 --color --quiet --jpeg --webpage $mytemp");
            unlink ($mytemp);
        }
    }
    SpecialPage::addPage (new SpecialPdf());
}

function wfSpecialPdfNav( &$skintemplate, &$nav_urls, &$oldid, &$revid ) {
    $nav_urls['pdfprint'] = array(
        'text' => wfMsg( 'pdf_print_link' ),
        'href' => $skintemplate->makeSpecialUrl( 'PdfPrint', "page=". wfUrlencode( "{$skintemplate->thispage}" ) )
    );

    return true;
}

function wfSpecialPdfToolbox( &$monobook ) {
    if ( isset( $monobook->data['nav_urls']['pdfprint'] ) )
        if ( $monobook->data['nav_urls']['pdfprint']['href'] == '' ) {
            ?><li><?php echo $monobook->msg( 'pdf_print_link' ); ?></li><?php
        } else {
            ?><li><a href="<?php echo htmlspecialchars( str_replace('%0D%0A', '', $monobook->data['nav_urls']['pdfprint']['href'] )) ?>"><?php echo $monobook->msg( 'pdf_print_link' ); ?></a></li><?php
        }
    return true;
}
?>


 * I used your code. It gave me the Print as PDF link, but when I try to open the PDF it says that it cannot be opened, and the size of the PDF is only 5 KB. You were talking about fixing the "path". Do we need to set a path? Please help.

Page rendering
Hello, I'd like to see some improvement to this useful extension. While it works technically, the pages actually look better when printed to a PDF printer from the web browser. To be an improvement over web browser PDFs, the output should look more LaTeX-style. --Ikiwaner 00:00, 16 January 2007 (UTC)
 * Ever tried Extension:Wiki2LaTeX? --Flominator 10:30, 14 August 2007 (UTC)

Errors in the last version in discussion
Using the very last iteration of the code posted in the discussions, I get the following error when I click the Print as PDF link:

Fatal error: Call to a member function getNamespace on a non-object in /srv/www/htdocs/wiki/includes/Article.php on line 150

Working with sites starting with http://www...not with sites http://...
It seems to work only with sites that include www in their address. Is this possible?--87.2.110.219 21:53, 23 January 2007 (UTC)

A problem with htmldoc encoding
htmldoc is very sensitive about the encoding.

In file SpecialPdf.php, line 89 reads passthru("/usr/bin/htmldoc -t pdf14 --charset 8859-1 --color --quiet --jpeg --webpage... For the new version of htmldoc, the --charset value should be iso-8859-1. Ivan

What is the purpose of HTMLDOC, if it's Windows app?
Hi all

I don't quite understand: if HTMLDOC is a Windows application, how will this help me if my web server is a Linux server?
 * There is also a Linux version!

A problem with PHP passthru
Thanks for the extension! I'm using MediaWiki 1.8.2 on Windows 2003 and it works. I had a problem with the passthru function that I solved by copying cmd.exe into the PHP installation folder.

Where do I download extension
Can't seem to find SpecialPage.php, where do I download the file?


 * Just cut and paste the code above into a text file with that name and extension. Jschroe

Scaling Images to fit paper width
I found an htmldoc argument that lets you specify the 'width' of the page in pixels. This is sort of the opposite of scaling: it sets the viewable resolution for the images. So, if I have an image that is 900 pixels wide, I'd want to set the browser width to something greater than 900 to see the whole image at once.

passthru("htmldoc -t pdf14 --charset iso-8859-1 --color --quiet --jpeg --webpage '$mytemp'");

would become:

passthru("htmldoc -t pdf14 --browserwidth 950 --charset iso-8859-1 --color --quiet --jpeg --webpage '$mytemp'");

--DavidSeymore 18:48, 14 May 2007 (UTC)

Title of Wiki-Article in PDF
Hi,

I was wondering whether it is possible to display the title of the wiki article in the generated PDF. Thanks.

Change $html = "  ". $page. "  " . $bhtml. " ";   to something like $html = "    <H1>". $title. "</H1>". $bhtml. " ";  Suspect
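The suggestion above amounts to putting the article name into a heading ahead of the parsed body. A minimal sketch of such a wrapper follows; the surrounding html/head/body tags are an assumption (the original tags are not preserved on this page), and `wrap_article_html()` is a hypothetical helper:

```php
<?php
// Sketch only: wrap the parsed article body in a minimal HTML page whose
// <title> and <h1> both carry the article name. wrap_article_html() is a
// hypothetical helper; the exact tags of the original extension are unknown.
function wrap_article_html(string $page, string $bodyHtml): string
{
    // Underscores in wiki page names stand for spaces.
    $heading = htmlspecialchars(str_replace('_', ' ', $page), ENT_QUOTES);

    return "<html><head><title>$heading</title></head><body>"
         . "<h1>$heading</h1>\n"
         . $bodyHtml
         . "</body></html>";
}

echo wrap_article_html('Main_Page', '<p>Hello</p>'), "\n";
```

The page name is escaped before it is placed in the markup, so a title containing characters such as & or < cannot break the generated HTML.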

Name of the generated file
The extension works fine, except that it outputs a file called index.php. Renaming this to something.pdf works, as it is in fact a PDF file. But I'm wondering how I could fix it so that it outputs <article_name.pdf>. Any suggestions?

Solution
I needed to uncomment this line to get <article_name.pdf> instead of <index.php>:

# header(sprintf('Content-Disposition: attachment; filename="%s.pdf"', $page));
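Since the page name ends up inside a quoted header value, it may be worth cleaning it first. A minimal sketch, assuming that replacing awkward characters with underscores is acceptable; `pdf_disposition_header()` is a hypothetical helper:

```php
<?php
// Sketch only: build the Content-Disposition header for the download.
// pdf_disposition_header() is a hypothetical helper.
function pdf_disposition_header(string $page): string
{
    // Replace characters that would break the quoted filename or the header
    // line itself (line breaks, quotes, backslashes, slashes, colons).
    $safe = preg_replace('~[\r\n"\\\\/:]+~', '_', $page);

    return sprintf('Content-Disposition: attachment; filename="%s.pdf"', $safe);
}

echo pdf_disposition_header('Main_Page'), "\n";
```

The returned string would be passed to header() before HTMLDOC's output is sent.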

Generated PDF is 1kb and corrupted
I am testing MW 1.10.1 on a LAMP system (upgraded from 1.6.7). I then followed the instructions on the main article page. After clicking on the Print to PDF link, though, I get a 1kb PDF. Any ideas as to what could be wrong? How do I go about fixing this? SellFone 07:23, 3 August 2007 (UTC)

Solution
I downloaded the HTMLDOC binary and didn't realize that you had to pay for it. When I ran it, it asked for a license, so I went ahead and installed gcc & gcc-c++ so I could compile from source, and now it's working.

Patch to Export Multiple Pages to PDF
This patch creates a Special Page form similar to Special:Export for specifying multiple pages to export to PDF. This patch was created against the 1.1 (19-July-2007) version of PdfExport and tested in MediaWiki 1.11.

'''Please post the whole files instead of the patches. Thx'''

Patch for PdfExport.php:

Patch for PdfExport.i18n.php:

--Johnp125 18:09, 25 September 2007 (UTC)

Tried out the extra code. I get an htmldoc error when going to pdfprint. The regular print as pdf seems to work just fine, just not the additional items.

HTMLDOC Version 1.8.27 Copyright 1997-2006 Easy Software Products, All Rights Reserved. This software is based in part on the work of the Independent JPEG Group. ERROR: No HTML files! Usage: htmldoc [options] filename1.html [ ... filenameN.html ] htmldoc filename.book Options: --batch filename.book --bodycolor color --bodyfont {courier,helvetica,monospace,sans,serif,times} --bodyimage filename.{bmp,gif,jpg,png} --book --bottom margin{in,cm,mm} --browserwidth pixels --charset {cp-874...1258,iso-8859-1...8859-15,koi8-r} --color --compression[=level] --continuous --cookies 'name="value with space"; name=value' --datadir directory --duplex --effectduration {0.1..10.0} --embedfonts --encryption --firstpage {p1,toc,c1} --fontsize {4.0..24.0} --fontspacing {1.0..3.0} --footer fff {--format, -t} {ps1,ps2,ps3,pdf11,pdf12,pdf13,pdf14,html,htmlsep} --gray --header fff --header1 fff --headfootfont {courier{-bold,-oblique,-boldoblique}, helvetica{-bold,-oblique,-boldoblique}, monospace{-bold,-oblique,-boldoblique}, sans{-bold,-oblique,-boldoblique}, serif{-bold,-italic,-bolditalic}, times{-roman,-bold,-italic,-bolditalic}} --headfootsize {6.0..24.0} --headingfont {courier,helvetica,monospace,sans,serif,times} --help --helpdir directory --hfimage0 filename.{bmp,gif,jpg,png} --hfimage1 filename.{bmp,gif,jpg,png} --hfimage2 filename.{bmp,gif,jpg,png} --hfimage3 filename.{bmp,gif,jpg,png} --hfimage4 filename.{bmp,gif,jpg,png} --hfimage5 filename.{bmp,gif,jpg,png} --hfimage6 filename.{bmp,gif,jpg,png} --hfimage7 filename.{bmp,gif,jpg,png} --hfimage8 filename.{bmp,gif,jpg,png} --hfimage9 filename.{bmp,gif,jpg,png} --jpeg[=quality] --landscape --left margin{in,cm,mm} --linkcolor color --links --linkstyle {plain,underline} --logoimage filename.{bmp,gif,jpg,png} --no-compression --no-duplex --no-embedfonts --no-encryption --no-links --no-localfiles --no-numbered --no-overflow --no-pscommands --no-strict --no-title --no-toc --numbered --nup {1,2,4,6,9,16} {--outdir, -d} dirname 
{--outfile, -f} filename.{ps,pdf,html} --overflow --owner-password password --pageduration {1.0..60.0} --pageeffect {none,bi,bo,d,gd,gdr,gr,hb,hsi,hso,vb,vsi,vso,wd,wl,wr,wu} --pagelayout {single,one,twoleft,tworight} --pagemode {document,outline,fullscreen} --path "dir1;dir2;dir3;...;dirN" --permissions {all,annotate,copy,modify,print,no-annotate,no-copy,no-modify,no-print,none} --portrait --proxy http://host:port --pscommands --quiet --referer url --right margin{in,cm,mm} --size {letter,a4,WxH{in,cm,mm},etc} --strict --textcolor color --textfont {courier,times,helvetica} --title --titlefile filename.{htm,html,shtml} --titleimage filename.{bmp,gif,jpg,png} --tocfooter fff --tocheader fff --toclevels levels --toctitle string --top margin{in,cm,mm} --user-password password {--verbose, -v} --version --webpage fff = heading format string; each 'f' can be one of:. = blank / = n/N arabic page numbers (1/3, 2/3, 3/3) : = c/C arabic chapter page numbers (1/2, 2/2, 1/4, 2/4, ...) 1 = arabic numbers (1, 2, 3, ...) a = lowercase letters A = uppercase letters c = current chapter heading C = current chapter page number (arabic) d = current date D = current date and time h = current heading i = lowercase roman numerals I = uppercase roman numerals l = logo image t = title text T = current time

This shows at the top of the page.

Below that is the options to convert different documents to pdf, but it does not work.

Fedora C 4 fix
I have installed the pdf export extension and added the code in LocalSettings.php, and my wiki just shows a blank screen when I install this extension. I have installed htmldoc and it works from the command prompt.

Has anyone a solution to get this running under Windows?
Please post here. Export works already fine, but the pdf file is empty.

ThX


 * The code as posted above under Working Code for Windows with MediaWiki v1.9 works.. you just need to make sure the temp file path exists and is writable, and that IIS anonymous user has access to cmd and anonymous access is possible for images.. Suspect

Fatal Error
Call to undefined method: specialpdf->__construct in /usr/home/admin/domains/< >/public_html/mediawiki-1.6.10/extensions/PdfExport/PdfExport.php on line 51

This occurs when opening any page in MediaWiki. HTMLDoc installed

To fix this problem, replace parent::__construct( 'PdfPrint' );

with SpecialPage::SpecialPage ('PdfPrint');

Got it working on Win2k3 and MediaWiki 1.10.1 (Finally)
Here's the solution. Copy and paste the code from above, and make the following modifications:


 * Download the latest version of HTMLDoc (which, despite the claim, does not have an installer)
 * Extract the contents of the HTMLDoc zip file to C:\Program Files\HTMLDoc\
 * Add "C:\Program Files\HTMLDoc\" to the PATH environment variable
 * Set IUSR_<MACHINE-NAME> "Read" and "Read & Execute" permissions on C:\Windows\System32\cmd.exe
 * Set IUSR_<MACHINE-NAME> "Full Control" on C:\Windows\Temp\
 * Copy and paste the new PdfExport.php code from Working Code for Windows with MediaWiki v1.9
 * Change the value of $mytemp to $mytemp = "C:\\Windows\\Temp\\f" .time(). "-" .rand() . ".html";
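The temp-file step can be sketched without hard-coding the directory. This is only an illustration: make_temp_html_path is a hypothetical name, and using sys_get_temp_dir() instead of a fixed C:\Windows\Temp path is an assumption, not what the posted code does.

```php
<?php
// Hypothetical helper: returns a unique .html path inside $dir,
// following the "f<time>-<rand>.html" naming used in the extension.
function make_temp_html_path(string $dir): string
{
    // Strip a trailing slash or backslash so we do not double it.
    $dir = rtrim($dir, "/\\");
    return $dir . DIRECTORY_SEPARATOR . 'f' . time() . '-' . rand() . '.html';
}

$mytemp = make_temp_html_path(sys_get_temp_dir());
echo $mytemp, "\n";
```

Whatever directory is used, the web server account (IUSR_<MACHINE-NAME> on IIS) still needs write access to it.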

That was enough to do it for me - hope this helps some of you!

~JT

Got it working with Win2k3 and MediaWiki 1.11.0
~Mark E.
 * I used the above instructions for 1.10, except the value of $mytemp indicated by JT was wrong - I have changed it to include double backslashes.
 * If it still doesn't work, try copying your CMD.exe to your \PHP folder. Make sure your PHP folder has IUSR_MACHINENAME read and read & execute permissions.


 * To make images work, I had to change line 59 from:

$bhtml = str_replace ($wgScriptPath, $wgServer . $wgScriptPath, $bhtml);
 * to

$bhtml = str_replace ($wgScriptPath, $wgScriptPath, $bhtml); because it was doubling the servername; and in line 81, I changed  to. --Maiden taiwan 20:05, 6 March 2008 (UTC)

Could you explain for beginners what we are supposed to do with HTMLDoc? How do I use this extension on my website? Thanks for your help. Marcjb 22:32, 27 August 2007 (UTC)

Working on Debian unstable with Mediawiki 1.7
But I'm not getting any images either.

Datakid 01:55, 5 September 2007 (UTC)

More robust way of ensuring URL's are absolute
I've had to make URL's absolute for a couple of other extensions and found a more robust way than doing a replacement on the parsed text. Instead just set the following three globals before the parser is called and it will make them all absolute for you: --Nad 04:00, 5 September 2007 (UTC)

Nad, Is the parser that you refer to already in the php script that is on the front page? I've added those lines at the top of the file, after $wgHooks and before function wfSpecialPdf, and still no images?
 * I also tried putting those lines in SpecialPdf.execute after $wgServer; and before  $page = isset( $par ) ? $par : $wgRequest->getText( 'page' ); Still no joy. Datakid 01:31, 6 September 2007 (UTC)
 * Ideally they should be defined just before the $wgParser->parse is called in the execute function. But I've just grep'd the html output of one of my exported PDF's from Extension:Pdf Book which uses this method of "absoluterising", but it hasn't worked for the images, maybe best to stick with the current text-replacement until I sort out that problem. The replacement used to make the url's absolute is:


 * Just realised that you also need to modify $wgUploadPath to make image src attributes absolute too, but I also have a problem with images showing up even with the src attributes being absolute... --Nad 04:13, 6 September 2007 (UTC)

 * Holla,
 * Keep in mind that  above is highly installation dependent, and may be quite different between installations; actually, it can be any string from   upwards.
 * Keep in mind that not every image resides in the $wgUploadPath or a subdirectory thereof. Images might be in a shared repository, like Wikimedia Commons.
 * The newest development is to allow an arbitrary number of such repositories per installation. Since they are likely to have different individual algorithms to 'absolutize' their path names, it might be a good idea to rely on an image-related function for the task. Images 'know' which repository they are in, including the local file system, or the upload directory. The philosophy of image-related code is to always address Image objects, and usually not deal with repositories directly. So when you find an image in the source text, get a  object, and you probably have a function returning an absolute URL right with it. If it's not there, add it, or file a bug. If you cannot do it yourself, you could also ask me to add it, and I might even do so (-; I'm dealing with image-related functions anyways atm. ;-)
 * --Purodha Blissenbach 07:55, 8 September 2007 (UTC)


 * Having had a glance at the code, I am pretty certain that the structurally correct way is to let the parser object take care of asking image objects for absolute URLs. That means, add a  or similar, which is likely not yet possible, but imho pretty easily added to the parser. Investigate! It may be already there. Else, see above.
 * --Purodha Blissenbach 08:13, 8 September 2007 (UTC)

Bug: /tmp/ not writable due to open_basedir restriction.
We did not get a pdf file, but got a series of php output of the type:

(paths truncated for brevity)

instead. We suggest, not to send these downstream (with wrong http headers, btw.) but rather display a decent error message on the special page. Likely, using  before writing into the file would do. --Purodha Blissenbach 16:48, 8 September 2007 (UTC)
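A minimal sketch of such a check, returning an error message instead of letting PHP warnings reach the PDF download. The name write_temp_html and the message wording are hypothetical; note that under an open_basedir restriction the file-system checks themselves may still emit warnings, so this is an illustration rather than a complete fix.

```php
<?php
// Hypothetical helper: writes $html to $path.
// Returns null on success, or an error message to show on the special page.
function write_temp_html(string $path, string $html): ?string
{
    $dir = dirname($path);
    if (!is_dir($dir) || !is_writable($dir)) {
        return "Cannot write temporary file: directory '$dir' is not writable.";
    }
    if (file_put_contents($path, $html) === false) {
        return "Cannot write temporary file '$path'.";
    }
    return null; // success
}

$demo  = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'pdfexport-demo.html';
$error = write_temp_html($demo, '<p>test</p>');
echo $error === null ? "written\n" : $error . "\n";
```

The caller would only send the PDF headers and run htmldoc when the return value is null.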

Bug and Fix: Pdf Export blocks Semantic MediaWiki
We have several extensions installed, and included Pdf Export before including SemanticMediaWiki (SMW). The outcome was SMW not working:
 * Special:Version looked as expected, but
 * Special:Specialpages did not show any of SMW's special pages,
 * URL-calling the page Special:SMWAdmin, or entering it via the search box, yielded a "nonexisting special page" error.

Thus the installation of SMW could not be completed. It requires the Special:SMWAdmin page to be accessed.

Pdf Export was not working, see above. When we removed it from LocalSettings.php, we could use SMW. When we placed its call after the inclusion and activation of SMW in LocalSettings.php, SMW continued to work. See also bug 11238.

--Purodha Blissenbach 16:48, 8 September 2007 (UTC)


 * A fix for this bug is described in bug 11238. --Markus Krötzsch 10:26, 2 October 2007 (UTC)

Bug and Fix: Pdf Export forces all special pages to load during init, thus slowing down the wiki

 * A fix for this bug is described in bug 11238. --Markus Krötzsch 10:26, 2 October 2007 (UTC)

Table and image flow / Image alignment
Tables and images have all of their text forced below, as if they were followed by: Also, images that are right aligned: end up on the left side of the page and all text is forced to appear after the image. Is this a limitation of htmldoc or something that can be fixed in the extension? -- 66.83.143.246 18:53, 13 September 2007 (UTC)
 * I've temporarily fixed the image problem by wrapping them into a table, but this does not solve the general problem why htmldoc is forcing breaks after floats. 66.83.143.246 13:28, 14 September 2007 (UTC)


 * It's most likely not a limitation of htmldoc but actually the extension, probably to do with the parser sanitisation process. Comment out the line that removes the temp file and check the html in that.. Is not pretty.  Suspect

Error page
--Johnp125 03:10, 23 September 2007 (UTC)

When I click on the print to pdf button all I get is this.

HTMLDOC Version 1.8.27 Copyright 1997-2006 Easy Software Products, All Rights Reserved.

This software is based in part on the work of the Independent JPEG Group.

ERROR: No HTML files!

checked error log in apache and I'm getting errors on line 55,57,58.

[client 192.168.1.102] PHP Warning: fopen(/var/www/html/wiki/pdfs/header.html) [<a href='function.fopen'>function.fopen</a>]: failed to open stream: No such file or directory in /var/www/html/wiki/extensions/PdfExport/PdfExport.php on line 55, referer: http://192.168.1.99/wiki/index.php/Main_Page

[client 192.168.1.102] PHP Warning: fwrite: supplied argument is not a valid stream resource in /var/www/html/wiki/extensions/PdfExport/PdfExport.php on line 57, referer: http://192.168.1.99/wiki/index.php/Main_Page

[client 192.168.1.102] PHP Warning: fclose: supplied argument is not a valid stream resource in /var/www/html/wiki/extensions/PdfExport/PdfExport.php on line 58, referer: http://192.168.1.99/wiki/index.php/Main_Page

[client 192.168.1.102] PHP Fatal error: Call to a member function getNamespace on a non-object in /var/www/html/wiki/includes/Article.php on line 160, referer: http://192.168.1.99/wiki/index.php/Main_Page

[client 192.168.1.102] PHP Warning: fopen(/var/www/html/wiki/pdfs/header.html) [<a href='function.fopen'>function.fopen</a>]: failed to open stream: No such file or directory in /var/www/html/wiki/extensions/PdfExport/PdfExport.php on line 55, referer: http://192.168.1.99/wiki/index.php/Main_Page

[client 192.168.1.102] PHP Warning: fwrite: supplied argument is not a valid stream resource in /var/www/html/wiki/extensions/PdfExport/PdfExport.php on line 57, referer: http://192.168.1.99/wiki/index.php/Main_Page

[client 192.168.1.102] PHP Warning: fclose: supplied argument is not a valid stream resource in /var/www/html/wiki/extensions/PdfExport/PdfExport.php on line 58, referer: http://192.168.1.99/wiki/index.php/Main_Page

I'm running fedora c4. the htmldoc is in the /usr/bin folder.

Blue boxes on images and empty table of contents entry
htmldoc was adding blue borders on images that didn't have the frame attribute since they all had anchor tags around them and a slot in the table of contents for the mediawiki generated table of contents. I removed these with the following regular expressions in the execute function:

// Remove the table of contents completely
$bhtml = preg_replace( '/ [...]

[...], where there should be a simple form, allowing users to try another page name, like some other special pages do, too.

We have that implemented already and are currently testing it on http://krefeldwiki.de/wiki/Spezial:PdfPrint

--Purodha Blissenbach 15:58, 16 October 2007 (UTC)


 * Do you mean it's a standard feature, or did you manually hack it to do so?
 * I would be very interested in the code :)


 * --Kaspera 13:05, 22 October 2007 (UTC)


 * He hacked it, and it seems to be working on that Krefeld wiki (warning: the site is t-e-r-r-i-b-l-y s---l---o---w).
 * I would be interested in the code too.
 * Lexw 11:33, 20 November 2007 (UTC)


 * Yes, we had to make few changes to make it work. We shall publish the modified code as soon as time permits together with the  enhancement code, see next level 2 section. -- Purodha Blissenbach 12:46, 1 December 2007 (UTC)

Enhancement: allow call via
It would be desirable to make the extension callable via  parameter, that means, the following calls should produce identical results:
 * Special:PdfPrint/Georg_von_Rheinbaben
 * Special:PdfPrint?page=Georg_von_Rheinbaben
 * Georg_von_Rheinbaben?action=pdf

We are currently implementing this. --Purodha Blissenbach 15:58, 16 October 2007 (UTC)
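One way the three call styles could be reduced to a single page name is sketched here. This is not the implementation mentioned in this thread; resolve_pdf_page and its parameters are hypothetical, and the sketch uses plain arrays rather than MediaWiki's request objects.

```php
<?php
// Hypothetical helper: picks the page name from the three call styles.
//   $subpage: text after "Special:PdfPrint/", or null
//   $query:   request parameters, such as $_GET
//   $title:   title of the page being viewed (used with action=pdf)
function resolve_pdf_page(?string $subpage, array $query, ?string $title): ?string
{
    if ($subpage !== null && $subpage !== '') {
        return $subpage;                       // Special:PdfPrint/Page
    }
    if (!empty($query['page'])) {
        return $query['page'];                 // Special:PdfPrint?page=Page
    }
    if (($query['action'] ?? '') === 'pdf' && $title !== null) {
        return $title;                         // Page?action=pdf
    }
    return null; // nothing requested: the special page could show a form
}

var_dump(resolve_pdf_page('Georg_von_Rheinbaben', [], null));
```

Returning null for the empty case leaves room for displaying an input form instead of failing.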

Prince : A better alternative to HTMLDOC?
Is it possible to use Prince instead of HTMLDOC?

Prince seems to work the same way, but it also supports CSS.

It seems to be working great with MW 1.11.0 after modifying PdfExport.php.

 * Download and install Prince


 * Download the PHP5 Prince Accessories


 * Move prince.php into the PdfExport folder

 * In PdfExport.php, add the line:

require_once( 'prince.php' );

 * Remove the following section in PdfExport.php:

// make a temporary directory with an unique name
// NOTE: If no PDF file is created and you get message "ERROR: No HTML files!",
// try using a temporary directory that is within web server space.
// For example (assuming the web server root directory is /var/www/html):
// $mytemp = "/var/www/html/tmp/f" .time(). "-" .rand(). ".html";
$mytemp = "/tmp/f" .time(). "-" .rand(). ".html";
$article_f = fopen($mytemp,'w');
fwrite($article_f, $html);
fclose($article_f);
putenv("HTMLDOC_NOCGI=1");

# Write the content type to the client...
header("Content-Type: application/pdf");

# uncomment this line if you wish acrobat to launch in a separate window
# header(sprintf('Content-Disposition: attachment; filename="%s.pdf"', $page));

flush();

# if the page is on a HTTPS server and contains images that are on the HTTPS server AND also reachable with HTTP
# uncomment the next line
# system("perl -pi -e 's/img src=\"https:\/\//img src=\"http:\/\//g' '$mytemp'");

# Run HTMLDOC to provide the PDF file to the user...
passthru("htmldoc -t pdf14 --charset iso-8859-1 --color --quiet --jpeg --webpage '$mytemp'");

unlink ($mytemp);

 * Replace with: (Don't forget to put in the actual path to the Prince executable)

# Path to Prince executable
$prince = new Prince('actual path to Prince executable');

# Convert an XML or HTML string to a PDF file,
# which will be passed through to the output buffer of the current PHP page.
$prince->convert3($html);

# Write the content type to the client...
header("Content-Type: application/pdf");

# uncomment this line if you wish acrobat to launch in a separate window
header(sprintf('Content-Disposition: attachment; filename="%s.pdf"', $page));


 * Then sit back and enjoy the magic :)
 * Images work !!!!!
 * You can even tell Prince to use the main.css from your skin and make the pdf export look just like the actual wiki.
 * It is also easy to use your own css to make the pdf look very professional. Check out the samples section for some great inspiration.
 * The only major drawback is the license cost for running on a server. It is only free for personal use.


 * yes... USD 1900 for academic license... too much for me :-) And HTMLDOC isn't HTML4.0 compliant (i.e. no CSS!). Thus, no PDF from wiki pages, at the moment... --GB 22:13, 26 November 2007 (UTC)

PdfExport error if user rights are set
I tried to use my own MW to provide PDF files to another program. The external software passes the appropriate URL to explorer and the PDF is ready.

With the default user rights there is no problem. After I changed the user rights (see below), the generated PDF file has no content except the navigation menu from the left-hand side of the wiki page.

I set the user rights:

$wgGroupPermissions['*'   ]['read'] = false;
$wgGroupPermissions['user' ]['read'] = true;
$wgWhitelistRead = array("Main Page", 'Special:PdfPrint', 'Special:Userlogin',);

How can I get the whole content of the page? Can I somehow change the user ID in PdfExport.php? Is there any other solution?

Anonymous Access for Images to Work
For htmldoc to render the images it requires anonymous access

Since the file needs to be saved locally in order to remove the wiki escape characters that would otherwise cause htmldoc to fail…and the file contains absolute urls to the images.. it is in fact the server that is requesting the page.. not the user.

This is of particular importance for windows users who have enabled remote auth or alike, since anonymous access has to be disabled for these to work.

My current work around is to setup a second website under iis that allows anonymous access however only by the server's ip address, then setup a host entry that uses the anonymous site ip and the live site hostname. Then when the server goes to connect to the URL it gets anonymous access to the images and they are included in the PDF.

An alternative MAY be to setup an application pool that uses a user credential that you have setup on the wiki.. but don’t know if this will work as htmldoc is run at commandline so most likely the connection will be servername$ Suspect 06:29, 7 December 2007 (UTC)


 * Seems complicated. What I did on my site is simply put a condition in the LocalSettings file to enable authentication only if the request comes from anything else than localhost:

if ($_SERVER['REMOTE_ADDR']!='127.0.0.1') { require_once( "includes/LdapAuthentication.php" ); ... authentication code continues ... } --Guimou 03:58, 12 December 2007 (UTC)


 * Hmm, dont think that is going to work in a Windows installation, since the image references are absolute, the PHP code isnt even executed..IIS is blocking anonymous access to the images.. BTW, I couldnt get the application pool setup to work -- Suspect

Working code for MW 1.11 on Fedora 8
The code published on the "extension" page works most of the time but does not output images.

Also, on our server (Fedora 8, MW 1.11.0) the URL associated with the Print As PDF command in the toolbox sometimes has %0A tacked on at the end, and not %0D%0A as DavidJameson reported on his server.

I therefore ended up using the code in Extension_talk:Pdf_Export above as PdfExport.php, corrected line 103 to read ?><a href="<?php echo htmlspecialchars( str_replace('%0A', '', $monobook->data['nav_urls']['pdfprint']['href'] )) ?>"><?php and changed line 60 to read $bhtml = str_replace ('/images/',$wgServer . '/images/', $bhtml); as per dgrant 18:30, 25 October 2006

I can now export PDF files with images. Hvdeynde 11:04, 19 December 2007 (UTC)

PDF Link is not here
I searched Google and MediaWiki, but found no hint. I am using MW 1.9.3 on WAMP.


 * I tried manually typing wiki/index.php?title=Special:PdfPrint/Accueil, and it worked, even though we do not use rewritten URLs. The direct links are still not there.

Has someone a public demo so I can see what I should have in the sidebar ?

What could I do wrong so it does not work ?

Any help welcomed. I tried hacking the toolbox filling, but it was quite ... complex.

212.157.112.26 15:17, 20 December 2007 (UTC) // [mailto:mathiasm@laposte.net Mathias]

Solution: it comes from the template hook. If your template doesn't use the standard call from monobook, you must change it accordingly.

Passthru unable to fork
I got a php warning saying that Passthru was unable to fork. I am running PHP and MediaWiki on Windows and IIS, and the solution seems to be giving read and read execute access to cmd.exe, which is found in the system32 folder. I also needed to make sure my web user could write to the temp folder location specified in the script.

Hope this helps someone!

Don't like this extension
CGI Error: The specified CGI application misbehaved by not returning a complete set of HTTP headers.

Like others have said, there IS NO INSTALLER despite claims to the contrary. This thing is a MESS to set up. I did not get it working and I'm probably going to give up. Thanks for nothing.

Special:PdfPrint should not crash when clicked
From Special:Specialpages, it's possible to click the "PdfPrint" special page. Since it has been passed no parameters, it crashes:

PHP Fatal error: Call to a member function getNamespace on a non-object in D:\\mediawiki\\w\\includes\\Article.php on line 160, referer: http://test-techwiki.vistaprint.net/wiki/Special:Specialpages

PHP Stack trace:, referer: http://test-techwiki.vistaprint.net/wiki/Special:Specialpages

PHP  1. {main} D:\\mediawiki\\w\\index.php:0, referer: http://test-techwiki.vistaprint.net/wiki/Special:Specialpages

PHP  2. MediaWiki->initialize D:\\mediawiki\\w\\index.php:89, referer: http://test-techwiki.vistaprint.net/wiki/Special:Specialpages

PHP  3. MediaWiki->initializeSpecialCases D:\\mediawiki\\w\\includes\\Wiki.php:45, referer: http://test-techwiki.vistaprint.net/wiki/Special:Specialpages

PHP  4. SpecialPage::executePath D:\\mediawiki\\w\\includes\\Wiki.php:201, referer: http://test-techwiki.vistaprint.net/wiki/Special:Specialpages

PHP  5. SpecialPdf->execute D:\\mediawiki\\w\\includes\\SpecialPage.php:459, referer: http://test-techwiki.vistaprint.net/wiki/Special:Specialpages

PHP  6. Article->getContent D:\\mediawiki\\w\\extensions\\PdfExport\\PdfExport.php:64, referer: http://test-techwiki.vistaprint.net/wiki/Special:Specialpages

With no parameter, PdfPrint should display a form where the user can enter an article name.

--Maiden taiwan 17:17, 28 February 2008 (UTC)

String Replacement Error
So, I found a couple of problems with this plugin. First off, if we have /w in the text of the article, it gets replaced with the website address, which is not proper behavior. I changed the str_replace line to:

$bhtml = str_replace ('src="'.$wgScriptPath, 'src="'.$wgServer . $wgScriptPath, $bhtml);

which seems like a far more logical thing to do. I also added

$pageFile = str_replace(":", "", $page);

and changed the output file name in

header(sprintf('Content-Disposition: attachment; filename="%s.pdf"', $page));

to

header(sprintf('Content-Disposition: attachment; filename="%s.pdf"', $pageFile));

This removes the colons from the file name. Colons are illegal in file names on Windows for sure, and I had some trouble on OS X as well.
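The two changes from this post can be tried in isolation with a small sketch. The function names absolutize_src and pdf_file_name are hypothetical wrappers added for illustration; the str_replace calls inside them follow the post.

```php
<?php
// Hypothetical wrappers around the two str_replace calls described above.

// Prefix only src="..." attributes, so "/w" in the article text is untouched.
function absolutize_src(string $html, string $server, string $scriptPath): string
{
    return str_replace('src="' . $scriptPath, 'src="' . $server . $scriptPath, $html);
}

// Drop colons (as in "Help:Contents"), which Windows rejects in file names.
function pdf_file_name(string $page): string
{
    return str_replace(':', '', $page) . '.pdf';
}

$html = '<p>see /w for details</p><img src="/w/images/a.png">';
echo absolutize_src($html, 'http://example.org', '/w'), "\n";
echo pdf_file_name('Help:Contents'), "\n";
```

Here http://example.org and /w stand in for $wgServer and $wgScriptPath.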

How to set user permissions to access this extension
Hi there, I was thinking that one possible solution to decrease the amount of bandwidth used by this extension could be to limit access to it to certain users, for example anonymous or other types of users (custom users even). Is there an easy way to do this, just like when you make an extension work only for sysops (merge and delete, for instance), and avoid touching the whitelist? For example, it should be like this -> $wgGroupPermissions['*']['pdfexport'] = false;

Thanks! --Juanan 16:47, 13 March 2008 (UTC)

Fatal Error when i click on Special-Pages on "PDF-Druck" (MW 1.12.0)
When I click on the "PDF-Druck" link under Special-Pages, I get the following message:

Fatal error: Argument 1 passed to Article::__construct must not be null, called in /srv/wiki/techwiki/extensions/PdfExport/PdfExport.php on line 57 and defined in /srv/wiki/techwiki/includes/Article.php on line 44

When I click on the Main-Page under the Tools-Menu on "Print as PDF" then it works fine.

What's wrong?

Thanks! --Hampa 09:42, 13 May 2008 (UTC)

parsing page breaks?
HTMLdoc seems to support page breaks by inserting  but pdfExport doesnt seem to parse it or pass it to HTMLdoc.

forcing the HTML comment to show in the final rendered page either with code pre or nowiki includes the comment inline but still doesnt render properly on PDF output

...

Does anyone else have an update on this? Updating some documentation into the Wiki, and it would be useful to have some page breaks for dumping pdfs out for users

Scott.harman 14:00, 28 May 2008 (UTC)

...

Quick and dirty way of getting this to work... After the code $bhtml = str_replace ('/w/',$wgServer . '/w/', $bhtml); add the line $bhtml = str_replace ('&l_t;!-- PAGE BREAK --\&g_t;','', $bhtml);"

Remove the underscores in &l_t and &g_t. I just couldn't get them to show in the viewed page otherwise.

07:18, 7 July 2008 (UTC)
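A sketch of the replacement described above. It assumes that the parser leaves the comment HTML-escaped in the page HTML and that the intended replacement is HTMLDOC's real page-break comment (the replacement string is not visible in the post above). restore_page_breaks is a hypothetical name.

```php
<?php
// Sketch: turn the escaped comment back into a real HTML comment so that
// htmldoc can act on it. Assumes the page HTML contains the escaped form.
function restore_page_breaks(string $html): string
{
    return str_replace('&lt;!-- PAGE BREAK --&gt;', '<!-- PAGE BREAK -->', $html);
}

$bhtml = '<p>Chapter 1</p>&lt;!-- PAGE BREAK --&gt;<p>Chapter 2</p>';
echo restore_page_breaks($bhtml), "\n";
```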

"??????" signs instead of russian letters
When exporting Russian-language pages, we get "?" signs instead of all Russian letters! :( The pages are in UTF-8 encoding.

Me too. How to fix it?

And me. I tried converting the HTML body to cp1251 and passing a different encoding as an htmldoc parameter.

-- So, any way to make it work?

Solution
 * Edit file PdfExport_body.php
 * change two appearances of utf8_decode($...) to iconv("UTF-8", "cp1251", $...)
 * change passthru("htmldoc -t pdf14 --charset iso-8859-1 --color ... to passthru("htmldoc -t pdf14 --charset cp-1251 --color ...
 * Download Cyrillic fonts from here (direct link) and put them instead of ones in /usr/share/htmldoc/fonts/
 * PROFIT!
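The conversion step from the list above can be checked on its own with a short sketch. to_cp1251 is a hypothetical wrapper; it assumes PHP's iconv extension is available, and falling back to the unconverted text on failure is an added design choice, not part of the posted solution.

```php
<?php
// Sketch: convert UTF-8 page text to Windows-1251 before handing it to htmldoc.
function to_cp1251(string $utf8): string
{
    // iconv() returns false if the input cannot be converted.
    $converted = iconv('UTF-8', 'CP1251', $utf8);
    return $converted === false ? $utf8 : $converted;
}

// Each Cyrillic letter becomes a single byte in cp1251.
echo bin2hex(to_cp1251('Привет')), "\n";
```

The htmldoc call then needs the matching --charset cp-1251 option, as described in the list.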

Not solved
Links to fonts are both broken. Will this solution work for Windows?

Solved for me
Cyr Fonts found with Google at upload.com.ua Ubuntu server LAMP, UTF-8, htmldoc 1.8.27 < Kod.connect 13:13, 16 April 2010 (UTC) >

Blank Page
I'm just getting a blank page.


 * RHEL 5
 * HTMLDOC 1.8.27-4.99
 * Linux/Firefox

'''Check what $wgTmpDirectory is set to; the default is "{$wgUploadDirectory}/tmp". Quick fix: create a tmp folder in the images directory.'''

How can I make the TOC to be Bookmarks in the PDF?
Is it possible to write a simple code that converts article headlines to PDF bookmarks? Any idea?

Possible to change extension to use SkinTemplateToolboxEnd hook in 1.13
Has anyone tried rewriting this extension to use the new SkinTemplateToolboxEnd hook in MW 1.13, rather than the MonoBookTemplateToolboxEnd hook that it currently uses?

When using this extension with the Modern template, the URL never appears because Modern doesn't use MonoBookTemplateToolboxEnd or anything similar.

Italian translation
This is the snippet of code to be added to the localization file for the Italian translation:


 * I have added your translation to the code. Best regards --Marbot 19:56, 17 August 2009 (UTC)

Exported File Name
For some reason, setting $wgPdfExportAttach to true didn't help me with the download file name. Even then, I was still getting "index.php". So I commented out the if statement, but then I got ".pdf" because $page doesn't seem to be defined inside outputpdf. So I made the following changes to function outputpdf inside PdfExport_body.php:

becomes:

Then, a little futher down:

becomes:

Problems with path definitions
It took me a long while to realize why it wasn't working. I am using Windows Server 2003 + XAMPP, and it turned out that the path needed for creating the temporary file is different from the one needed for HTMLDoc. The one for the file creation needed C:\\xampp\\htdocs\\mediawiki\\images\\, while the one for HTMLDoc needed C:/xampp/htdocs/mediawiki/images/. I hope this works for those of you who experienced the same problem!
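A small sketch of deriving the forward-slash form from the backslash form, so that only one path has to be configured. to_forward_slashes is a hypothetical helper; whether HTMLDoc needs this on other Windows setups is not confirmed here.

```php
<?php
// Hypothetical helper: converts a Windows path to forward slashes.
function to_forward_slashes(string $path): string
{
    return str_replace('\\', '/', $path);
}

// In a PHP string literal, "\\" is a single backslash.
$tempDir = 'C:\\xampp\\htdocs\\mediawiki\\images\\';
echo to_forward_slashes($tempDir), "\n";
```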

Adding Line to LocalSettings.php Results in Blank Page
Crazy problem here... I am missing something.
 * Redhat 4
 * Running MediaWiki 1.7.1
 * Standard Template
 * LDAP authentication to Domino
 * HTMLDOC has been installed
 * Source pages have been created as per the instructions

When I add the following to the LocalSettings.php:

require_once("extensions/PdfExport/PdfExport.php");

All I get is a blank Main Page.

As soon as I remove the line the wiki works again. I am unable to get any of this extension to work within the wiki, never see any buttons regarding the ability to export to PDF from within a post or via the special pages.

Any suggestions?

Thanks

Bernard

I had the same problem
It was a cut and paste error

In my error logs i found: [Mon Oct 19 11:07:06 2009] [error] [client ::1] PHP Parse error: syntax error, unexpected T_STRING, expecting ')' in /usr/share/mediawiki/extensions/PdfExport/PdfExport.php on line 9 and when i looked @ line 8 a ' was missing.

Cut and paste is a terrible way to distribute an extension.

Problems with Long Line rendering
Has anyone had any luck getting this extension to not cut off long lines of preformatted text. I have some pages with generated output that is longer than can fit in typical 80-characters. If I click on the Mediawiki "Printable" page, these get conveniently resized to fit on a standard letter size paper. However, if I send the page to PDFExport, these lines just get cut off. Any suggestions would be appreciated.

Thanks, Dan

Adding Logo and current date to pdf output
Hi all,

I want a logo in the header of each page that is converted to PDF. I would also like the current date and time to be printed in the footer. This seems to be possible if you start the conversion from the HTMLDOC GUI, but is it also possible when HTMLDOC is started via MediaWiki?

Thankful for any kind of help!

Greetings, Stefan

Re : Adding Logo and current date
You can customize the htmldoc command as follows:

--logoimage PATH_TO_YOUR_IMAGE --header l d

More info is available via man htmldoc.
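As a sketch of how such options might be assembled from PHP: the function name buildHeaderFooterArgs and the logo path are hypothetical, and where these arguments get spliced into the extension's own htmldoc call is not shown here. The format strings follow the htmldoc man page as I understand it, where --header and --footer each take three characters for left, center and right, with `l` for the logo image, `.` for blank, `d` for the current date and `D` for the current date and time; please verify against `man htmldoc` for your version.

```php
<?php
// Hypothetical helper (not part of PdfExport): returns extra htmldoc
// arguments for a logo in the header and a date/time stamp in the footer.
function buildHeaderFooterArgs(string $logoPath): array
{
    return [
        '--logoimage', $logoPath, // image used wherever the format says "l"
        '--header', 'l.d',        // left: logo, center: blank, right: date
        '--footer', '..D',        // left/center: blank, right: date and time
    ];
}

// Illustrative path only; quote every argument before handing it to a shell.
$args = buildHeaderFooterArgs('/path/to/logo.png');
$commandFragment = implode(' ', array_map('escapeshellarg', $args));
echo $commandFragment, PHP_EOL;
```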

Re : Adding Logo and current date to pdf output
Yes, it would be useful to have a way to customize the PDF header with any content (wiki tags? HTML code? PHP code to request Article object data such as owner, category, etc.?)

Fabrice

Problem with generating images
I use PdfExport 2.0 with MW 1.13.0. The generated document contains no images; dots appear in place of the images.

Any suggestions would be appreciated.

Thanks, Markus

I have fixed the problem, the same as in 13.1.4

http://www.mediawiki.org/wiki/Extension_talk:Pdf_Export#Images_When_Wiki_Is_Root_of_Domain

1k PDF on 1.15.0
Installed extension on 1.15.0 today. I get a 1k file. We downloaded HTMLDoc and compiled from source. Set path to it. We verified that HTMLDoc is generating valid PDFs. Any ideas? Chanur 20:10, 24 June 2009 (UTC)

Complex diagrams and Thumb-nails
Readers often expect to be able to zoom in on complex diagrams in a PDF to see the details. I would like to recreate this effect in the PDF files I generate from MediaWiki. I can achieve the desired effect by manually changing the references in the intermediate HTML that is used as input to HTMLDoc. The change is in the IMG tag:

replace /mediawiki/images/thumb/9/99/foo.jpg/680px-foo.jpg width=680 height=477 with /mediawiki/images/9/99/foo.jpg width=100%

As a result, HTMLDoc picks up the original full-size image rather than a thumbnail and then scales it to 100% of the page width. It displays correctly, and the PDF reader can zoom in for more detail.

I would like to see this included, either as a default or as a configurable option, in the PdfExport extension.
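The manual replacement described above could be automated with a regular expression over the intermediate HTML. A rough sketch, assuming thumbnail URLs follow the /thumb/9/99/foo.jpg/680px-foo.jpg pattern from the example; the function name useFullSizeImages is hypothetical and this is not code from the extension:

```php
<?php
// Hypothetical helper (not part of PdfExport): point <img> tags at the
// original upload instead of the thumbnail and let the renderer scale it.
function useFullSizeImages(string $html): string
{
    // Matches ".../thumb/<h>/<hh>/<name>/<NNNpx-name>" inside a tag.
    $thumb = '#/thumb/([0-9a-f]/[0-9a-f]{2}/[^/"\'\s]+)/[^/"\'\s>]+#i';

    return preg_replace_callback(
        '#<img\b[^>]*>#i',
        function (array $m) use ($thumb): string {
            $tag = $m[0];
            if (!preg_match($thumb, $tag)) {
                return $tag; // not a thumbnail, leave untouched
            }
            // Drop the "thumb" segment and the sized file name.
            $tag = preg_replace($thumb, '/$1', $tag);
            // Remove the fixed pixel dimensions of the thumbnail.
            $tag = preg_replace(
                '#\s(?:width|height)\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+)#i',
                '',
                $tag
            );
            // Ask for the full page width instead.
            return preg_replace('#^<img\b#i', '<img width="100%"', $tag);
        },
        $html
    );
}

echo useFullSizeImages(
    '<img src="/mediawiki/images/thumb/9/99/foo.jpg/680px-foo.jpg" width="680" height="477" />'
), PHP_EOL;
```

Images that are not thumbnails pass through unchanged, so small icons keep their original size.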

Niall Teskey 27 July 2009

IIS7 and Mediawiki 1.15.1 does not work
I only receive Error 500 when I try to generate a PDF.

Need usage instructions
Okay, so we've got installation instructions, but what about usage? Is there content or a link that could be provided on how users (even noobs) could invoke this extension? 140.80.199.91 14:34, 27 August 2009 (UTC)

--

I have put in all four source files, and included the file in LocalSettings. I can verify that PdfExport.php IS being included, but nothing is happening. I have no link to "Export a Pdf" in my toolbox. Nothing new in special pages. No changes, anywhere. MediaWiki 1.15

Sections of pages possible?
I'd like to be able to export just a portion of a page. Is there any way to amend the code that writes the page to the temp dir so that only a specified section gets created as a PDF?

for example everything under a certain heading...

UTF-8
What prevents this extension from working entirely in UTF-8? 91.155.182.79 17:03, 21 November 2009 (UTC)

Datei beginnt nicht mit "%PDF-" / File does not start with "%PDF-"
Hi, I have just installed this extension. When trying to print a page Acrobat provides the error message as stated above. Is there someone out there to help? Thank you and cheers --kgh 20:14, 5 December 2009 (UTC)
 * My mistake. Please do not bother about this. Cheers --kgh 22:08, 5 December 2009 (UTC)
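For anyone else who runs into this message: a PDF file begins with the bytes %PDF-, so a quick check of the downloaded file shows whether it is a real PDF or something else, such as PHP notices or an HTML error page. A small sketch; the function name looksLikePdf is made up for illustration:

```php
<?php
// Hypothetical helper: returns true when the file starts with the PDF
// signature "%PDF-", false for anything else (including unreadable files).
function looksLikePdf(string $path): bool
{
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }
    $magic = fread($handle, 5); // the signature is five bytes long
    fclose($handle);
    return $magic === '%PDF-';
}
```

If the check fails, opening the file in a text editor often reveals the actual error message that was written in place of the PDF data.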

Specify custom CSS?
Hi. I have installed PdfExport 2.2.2 in MediaWiki 1.15.1 running on a Windows 2000 Server, IIS server with PHP 5.2.10 and HTMLDOC 1.8.27. Everything works beautifully! I just had to add the HTMLDOC folder to the system PATH, and voilà! This plugin looks very nice, but the resulting PDF could be "spiffed up" a bit. Is it possible to specify a custom CSS file to be used? I'd like to create a separate CSS file for PdfExport instead of using the existing CSS files in MediaWiki. Thanks for sharing the code, Thomas! -- Tor Arne

CSS Support in Prince
HTMLDOC does not support CSS in the 1.8 versions. Apparently this is planned for 1.9, but that remains to be seen. Someone above recommended Prince, and I did some substantial rewriting of the extension source to allow me to use Prince. It works like a charm: much better quality output, and you can use CSS files. It's a little less reliable, however, and tends to throw errors on some unexpected HTML characters. I'm troubleshooting right now, but if there's interest I'll post my source once it's a little more solid. -- Ram

php-errors inside pdf
Hi!

I'm running Apache 2.2, PHP 5.3.1 and the newest MediaWiki. When I try to generate PDFs with your extension, I get a 500-byte PDF with the following output:

Notice: Undefined variable: f in D:\Program Files\Apache Software Foundation\Apache2.2\htdocs\wiki\extensions\PdfExport\PdfExport_body.php on line 83

Notice: Undefined variable: process in D:\Program Files\Apache Software Foundation\Apache2.2\htdocs\wiki\extensions\PdfExport\PdfExport_body.php on line 112

Warning: proc_close expects parameter 1 to be resource, null given in D:\Program Files\Apache Software Foundation\Apache2.2\htdocs\wiki\extensions\PdfExport\PdfExport_body.php on line 112

What can I do?


 * Hi Suven, I suppose this problem is related to PHP. See Download. Cheers --kgh 15:34, 3 February 2010 (UTC)

I get the same error. I see that the process is opened with $htmldoc_process = proc_open(blah blah) on line 103, but then line 112 says proc_close($process). How can this not be a bug in what I downloaded? I'm going to try giving proc_close the same variable that is used with proc_open and see what happens.

$f on line 83 is not mentioned anywhere else in the file. Where should it have been set?

foreach ($pages as $pg) { $pagestring .= $this->save1page ($pg); if ($f == null) { continue; } }

Same here! Using:
 * PHP 5.2.0-8+etch16 (cli) (built: Nov 24 2009 11:14:47), Copyright (c) 1997-2006 The PHP Group
 * Zend Engine v2.2.0, Copyright (c) 1998-2006 Zend Technologies
 * Wiki 1.15.1

Getting errors:
 * [01-Mar-2010 13:22:56] PHP Notice: Undefined variable: f in /var/www/help9/extensions/PdfExport/PdfExport_body.php on line 92
 * [01-Mar-2010 13:29:15] PHP Notice: Undefined variable: process in /var/www/help9/extensions/PdfExport/PdfExport_body.php on line 126
 * [01-Mar-2010 13:29:15] PHP Warning: proc_close expects parameter 1 to be resource, null given in /var/www/help9/extension/PdfExport/PdfExport_body.php on line 126

It hangs for a very long time (~5 min) on fpassthru, and when it has finished there are no images in the PDF file. -> Solved: the server is in a DMZ, so it could not resolve the HTML links.

I have the same problem, a file of 191 bytes: Warning: proc_close expects parameter 1 to be resource, null given in /homez.318/forumims/www/wiki/extensions/PdfExport/PdfExport_body.php on line 112

I changed the variable process to htmldoc_process, but now the file is 3 bytes and empty...

When I run htmldoc directly on an HTML file, it works... :( Help please

I was getting these same errors in my httpd error log, but the extension was happily producing the file (with images, after I commented out the str_replace on line 57 of PdfExport_body.php, since I have a virtual Apache host). So I don't think these errors are the source of any real problem. To make them go away, I simply wrapped the offending code in PdfExport_body.php in an if block using the isset function. In the else branch, I execute the same thing the code would have done had the variable existed but been null or 0, i.e.:

at line 83 if (isset($f)) { if ($f == null) { continue; } } else { continue; }

at line 114 if (isset($process)) { $returnStatus = proc_close($process); } else { $returnStatus = 0; } --jcw 21:15, 15 April 2010 (UTC)
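The variable mismatch discussed in this thread comes down to passing proc_close a different variable from the one proc_open returned. A generic sketch of the usual pattern, not the extension's actual code; the function name runAndCapture is hypothetical, and a real htmldoc call would replace the example command:

```php
<?php
// Hypothetical helper showing the proc_open / proc_close pairing: the same
// handle returned by proc_open is the one handed to proc_close.
function runAndCapture(string $command): array
{
    $descriptors = [
        1 => ['pipe', 'w'], // child's stdout
        2 => ['pipe', 'w'], // child's stderr
    ];

    $process = proc_open($command, $descriptors, $pipes);
    if ($process === false) {
        return [-1, '', 'could not start process'];
    }

    // Read everything the child wrote. For very chatty programs the two
    // pipes would need to be drained together to avoid blocking.
    $stdout = stream_get_contents($pipes[1]);
    $stderr = stream_get_contents($pipes[2]);
    fclose($pipes[1]);
    fclose($pipes[2]);

    // Close the SAME handle that proc_open returned; this yields the exit code.
    $status = proc_close($process);

    return [$status, $stdout, $stderr];
}
```

Checking the returned exit code and stderr text is also a convenient way to see why a generated file ends up only a few bytes long.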

Getting article HTML
Hi.
 * MediaWiki version: 1.15.2
 * PHP version: 5.3.0
 * MySQL version: 5.1.36
 * URL:

I'm writing a new MediaWiki extension based on Extension:Pdf Export. I'm trying to export an article.

The PDF Export extension includes the following function:

At the end of this function,  is empty. I tried to print out (wrote to a log file)  after line 24, but it's empty too.

What could be the problem? Is there another way to get the HTML of an article?

Thanks!

Nahsh 11:10, 11 April 2010 (UTC)

Why does PdfExport ignore several style definitions of commonPrint.css ?
Netzschrauber: I have installed PdfExport (Version 2.3 (2010-01-28)) and HTMLDOC on an Ubuntu 8.04 LTS server, using it with MediaWiki 1.15.1. Everything works fine, but the output seems to ignore some (important) CSS style definitions made in commonPrint.css:
 * No borders (tables, headlines, images)
 * Text alignment of td elements in . No <tt> vertical-align: top; </tt>
 * Alignment of thumbnails thumb|right . All images are left-aligned.
 * Font-style of <dd> elements
 * Color of elements. For example <tt> color: #999; </tt>
Thanks!

Documents are only 3 bytes big.
Hi,

The documents I create are only 3 bytes in size. Of course the file is corrupt, but what am I doing wrong?

MediaWiki 1.15.0 PHP 5.2.10 (isapi)

Hi, did you restart after installing htmldoc? That did the trick on my system.

Can't open PDFs
Hello,

Sorry for my bad English, but I need help, please.

I have installed this extension on Windows with XAMPP (PHP 5.2.3 (apache2handler), MySQL 5.0.45-community-nt) and MediaWiki 1.15.3. My wiki now has a "download as PDF" button. When I click this button, I can download the page as a PDF. But when I try to open it, Foxit Reader says "format error: not a PDF or corrupted". Adobe Reader can't open it either.

I don't understand what the manual means by "put htmldoc in Path environment variable". HTMLDOC for Windows was installed under C:\Programme\HTMLDOC\

Is this right? I don't see any other configuration options.

What is wrong with my installation? I'm very confused.

Johannes741 13:30, 25 May 2010 (UTC)

Restart the system :-) See above.
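A restart helps because the web server process only picks up a changed PATH when it is started. To see what PHP itself finds, a small script run through the web server can search the PATH it actually has. A sketch under that assumption; the function name findOnPath is hypothetical:

```php
<?php
// Hypothetical helper: look for a program in a PATH-style list of
// directories and return the first full path found, or null if none.
function findOnPath(string $program, string $pathList): ?string
{
    // On Windows the executable normally carries an .exe suffix.
    $candidates = [$program, $program . '.exe'];

    foreach (explode(PATH_SEPARATOR, $pathList) as $dir) {
        if ($dir === '') {
            continue;
        }
        foreach ($candidates as $name) {
            $full = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $name;
            if (is_file($full)) {
                return $full;
            }
        }
    }
    return null;
}

// Usage: check the PATH seen by the current PHP process.
$found = findOnPath('htmldoc', (string) getenv('PATH'));
echo $found === null ? "htmldoc not found on PATH\n" : "htmldoc found at $found\n";
```

If this prints "not found" from the web server but htmldoc works in a command prompt, the server is most likely still running with the old PATH.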