Extension talk:Pdf Export/LQT Archive 1

ERROR PDFExport if user rights are set
I set the user rights:
$wgGroupPermissions['*']['read'] = false;
$wgGroupPermissions['user']['read'] = true;
$wgWhitelistRead = array("Main Page", 'Special:PdfPrint', 'Special:Userlogin');
Since I changed the user rights, if I give the URL to HTMLDOC the generated PDF file has no content except the navigation menu on the left-hand side of the wiki page. How can I get the whole content of the page? Can I somehow change the user ID when I generate the PDF file?

There seems to be a licensing problem with htmldoc for Windows installations?

I did not manage to open the PDF file, so I had to add a header line to download the PDF.

Furthermore, htmldoc does not support Unicode. I began a UTF-8 to Latin-1 translation to display French characters correctly; this may need enhancements.
 * I filled in the gaps for german umlauts based on the site you linked to. ~gandm
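As an alternative to maintaining a hand-written table of named entities, every non-ASCII character can be turned into a numeric character reference. The sketch below is only an illustration under assumptions, not part of the extension: utf8ToNumericEntities is a hypothetical helper name, and whether HTMLDOC actually renders a given reference still depends on the character set it is run with.

```php
<?php
// Hypothetical helper: replace each non-ASCII UTF-8 character with a
// numeric character reference such as &#233; so the HTML is pure ASCII.
function utf8ToNumericEntities($html) {
    return preg_replace_callback('/[^\x00-\x7F]/u', function ($m) {
        $bytes = array_values(unpack('C*', $m[0]));
        $n = count($bytes);
        // The leading byte carries the length marker; mask it off.
        $cp = $bytes[0] & (0xFF >> ($n + 1));
        // Each continuation byte contributes its low six bits.
        for ($i = 1; $i < $n; $i++) {
            $cp = ($cp << 6) | ($bytes[$i] & 0x3F);
        }
        return '&#' . $cp . ';';
    }, $html);
}
```

Markup is left untouched because only bytes outside the ASCII range are rewritten.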

sancelot@free.fr: here is my working file for Windows:

<?php
if (!defined('MEDIAWIKI')) die();
require_once ("$IP/includes/SpecialPage.php");

$wgExtensionFunctions[] = 'wfSpecialPdf';
$wgExtensionCredits['specialpage'][] = array(
    'name' => 'Pdf',
    'author' => 'Thomas Hempel',
    'description' => 'prints a page as pdf',
    'url' => 'http://www.netapp.com'
);

$wgHooks['SkinTemplateBuildNavUrlsNav_urlsAfterPermalink'][] = 'wfSpecialPdfNav';
$wgHooks['MonoBookTemplateToolboxEnd'][] = 'wfSpecialPdfToolbox';

// thanks to the interesting http://klaus.e175.net/code/latin_utf8.phps link
// French characters and German umlauts are covered
function utf8_latin1($text) {
    return strtr($text, array(
        "\xC3\x9F" => "&szlig;",
        "\xC3\xA4" => "&auml;",
        "\xC3\xAB" => "&euml;",
        "\xC3\xAF" => "&iuml;",
        "\xC3\xBC" => "&uuml;",
        "\xC3\xB6" => "&ouml;",
        "\xC3\x84" => "&Auml;",
        "\xC3\x8B" => "&Euml;",
        "\xC3\x8F" => "&Iuml;",
        "\xC3\x9C" => "&Uuml;",
        "\xC3\x96" => "&Ouml;",
        "\xC3\xA2" => "&acirc;",
        "\xC3\xAA" => "&ecirc;",
        "\xC3\xAE" => "&icirc;",
        "\xC3\xB4" => "&ocirc;",
        "\xC3\xBB" => "&ucirc;",
        "\xC3\x82" => "&Acirc;",
        "\xC3\x8A" => "&Ecirc;",
        "\xC3\x8E" => "&Icirc;",
        "\xC3\x94" => "&Ocirc;",
        "\xC3\x9B" => "&Ucirc;",
        "\xC3\xA0" => "&agrave;",
        "\xC3\xA8" => "&egrave;",
        "\xC3\xB9" => "&ugrave;",
        "\xC3\xA9" => "&eacute;",
        "\xC3\x80" => "&Agrave;",
        "\xC3\x88" => "&Egrave;",
        "\xC3\x99" => "&Ugrave;",
        "\xC3\x89" => "&Eacute;",
        "\xC3\xA7" => "&ccedil;"
        // "%3A" => "/"
    ));
}

function wfSpecialPdf() {
    global $IP, $wgMessageCache;

    // 'Sauvegarder en PDF' is the French label for "Save as PDF"
    $wgMessageCache->addMessages(array(
        'pdfprint' => 'PdfPrint',
        'pdf_print_link' => 'Sauvegarder en PDF'));

    class SpecialPdf extends SpecialPage {
        var $title;
        var $article;
        var $html;
        var $parserOptions;
        var $bhtml;

        function SpecialPdf() {
            SpecialPage::SpecialPage( 'PdfPrint' );
        }

        function execute( $par ) {
            global $wgRequest, $wgOut, $wgUser, $wgParser, $wgScriptPath, $wgServer;

            $page = isset( $par ) ? $par : $wgRequest->getText( 'page' );
            $title = Title::newFromText( $page );
            $article = new Article ($title);
            $wgOut->setPrintable();
            $wgOut->disable();
            $parserOptions = ParserOptions::newFromUser( $wgUser );
            $parserOptions->setEditSection( false );
            $parserOptions->setTidy(true);
            $wgParser->mShowToc = false;
            $parserOutput = $wgParser->parse(
                $article->preSaveTransform( $article->getContent() ) . "\n\n",
                $title, $parserOptions );

            $bhtml = $parserOutput->getText();

            $bhtml = str_replace ($wgScriptPath, $wgServer . $wgScriptPath, $bhtml);
            $bhtml = str_replace ('/w/', $wgServer . '/w/', $bhtml);
            $bhtml = str_replace ('href="#', 'href="' . $wgServer . '/' . $page . '#', $bhtml);
            $bhtml = utf8_latin1($bhtml);
            #$html = "<html><head><title>" . $page . "</title></head><body>" . $bhtml . "</body></html>";
            # thanks to mediawiki AT sandeman.freesurf.fr :
            $html = "<html><head><title>" . utf8_decode($page) . "</title></head><body>" . $bhtml . "</body></html>";

            // make a temporary file with a unique name
            $mytemp = "c:\\temp\\f" . time() . "-" . rand() . ".html";
            $article_f = fopen($mytemp, 'w');
            fwrite($article_f, $html);
            fclose($article_f);
            putenv("HTMLDOC_NOCGI=1");

            # Write the content type to the client...
            header("Content-Type:application/pdf");
            header('Content-Disposition: attachment;filename="' . $page . '.pdf"');
            flush();

            # Run HTMLDOC to provide the PDF file to the user...
            passthru("htmldoc -t pdf14 --color --quiet --jpeg --webpage $mytemp ");
            unlink ($mytemp);
        }
    }
    SpecialPage::addPage (new SpecialPdf());
}

function wfSpecialPdfNav( &$skintemplate, &$nav_urls, &$oldid, &$revid ) {
    $nav_urls['pdfprint'] = array(
        'text' => wfMsg( 'pdf_print_link' ),
        'href' => $skintemplate->makeSpecialUrl( 'PdfPrint', "page=" . wfUrlencode( "{$skintemplate->thispage}" ) )
    );
    return true;
}

function wfSpecialPdfToolbox( &$monobook ) {
    if ( isset( $monobook->data['nav_urls']['pdfprint'] ) )
        if ( $monobook->data['nav_urls']['pdfprint']['href'] == '' ) {
            ?><li><?php echo $monobook->msg( 'pdf_print_link' ); ?></li><?php
        } else {
            ?><li><a href="<?php echo htmlspecialchars( $monobook->data['nav_urls']['pdfprint']['href'] ) ?>"><?php echo $monobook->msg( 'pdf_print_link' ); ?></a></li><?php
        }
    return true;
}
?>

Is it possible to integrate multiple pages with the Windows version
Is it possible to integrate multiple pages with the Windows version?

PDF in Spanish
With the diff below I could solve the ñ and accent problem.

diff

61c61
<
---
>                        $bhtml = utf8_decode($bhtml);
76c76
<                        passthru("htmldoc -t pdf --quiet --jpeg --webpage '$mytemp'");
---
>                        passthru("htmldoc -t pdf --charset 8859-1 --quiet --jpeg --webpage '$mytemp'");

Save the diff to patch.txt, then execute:

patch SpecialPdf.php patch.txt

--Esacchi 20:12, 2 August 2006 (UTC)

SpecialPDF and MimeTeX
This is a really cool and useful extension. However, it ignores an extension we added to support MimeTeX, so the output does not include any math created through that extension.

Our MimeTeX extension replaces Any LaTeX formula

with 

Is there another way we should be generating this so that SpecialPDF can capture the image? DavidJameson 20:32, 4 August 2006 (UTC)

Updated for unicode, multiple articles, and images
Searched around for a way to do multiple articles to PDF, had to combine what was listed here and what was contained in wiki2pdf. Works the same way as SpecialPDF.php, put it in your extensions folder. HTML files are created (for processing) in your /webroot/wikiroot/pdfs folder (so create it if you don't have it) or another folder of your choice. It still uses HTMLDOC with some switches to format headers and footers, and there are string substitutions for the images exported out of the Wiki...

<?php
if (!defined('MEDIAWIKI')) die();
require_once ("$IP/includes/SpecialPage.php");

$wgExtensionFunctions[] = 'wfmyPDF';
$wgExtensionCredits['specialpage'][] = array(
    'name' => 'myPdf',
    'author' => 'Thomas Hempel, Simon Wheatley, and others',
    'description' => 'prints a collection of articles as a pdf book',
    'url' => 'http://www.netapp.com'
);

$wgHooks['SkinTemplateBuildNavUrlsNav_urlsAfterPermalink'][] = 'wfmyPDFNav';
$wgHooks['MonoBookTemplateToolboxEnd'][] = 'wfmyPDFToolbox';

function wfmyPDF() {
    global $IP, $wgMessageCache;

    $wgMessageCache->addMessages(array(
        'pdfprint2' => 'PdfPrint2',
        'pdf_print_link2' => 'Export PDF book'));

    class myPDF extends SpecialPage {
        var $title;
        var $article;
        var $html;
        var $parserOptions;
        var $bhtml;

        function myPDF() {
            SpecialPage::SpecialPage( 'PdfPrint2' );
        }

        function execute( $par ) {
            global $wgRequest, $wgOut, $wgUser, $wgParser, $wgScriptPath, $wgServer;

            // Get the name of the main article from which this routine was called
            // - this will be used for the book/file name
            $page = isset($par) ? $par : $wgRequest->getText('page');
            $title = Title::newFromText($page);
            $article = new Article($title);

            // write a header file with the title HTML tag for the book - header.html
            // all pdfs will be written to /webroot/wikiroot/pdfs
            $doctitle = str_replace("_", " ", $page);
            $mytemp = $_SERVER["DOCUMENT_ROOT"] . $wgScriptPath . "/pdfs/header.html";
            $article_f = fopen($mytemp, 'w');
            fwrite($article_f, "<html><head><title>" . $doctitle . "</title></head><body></body></html>");
            fclose($article_f);

            // add this header file to the list of files that htmldoc will process
            $filelist = $mytemp;
            $c = 1;

            // get the article content, i.e. a list of articles to print to pdf
            // each one is denoted by curly braces
            $SaveText = $article->getContent();
            $i = strpos($SaveText, "{");
            // strpos() returns false when nothing is found, so test for that explicitly
            while ($i !== false) {
                $j = strpos($SaveText, "}");
                if ($j === false || $j <= $i) break;
                $art = trim(substr($SaveText, $i + 1, $j - $i - 1));
                $SaveText = substr($SaveText, $j + 1);

                // Go fetch the article that was listed
                $title1 = Title::newFromURL( $art );
                $article1 = new Article($title1);
                $wgOut->setPrintable();
                $wgOut->disable();
                $parserOptions = ParserOptions::newFromUser( $wgUser );
                $parserOptions->setEditSection( false );
                $parserOptions->setTidy(true);
                $wgParser->mShowToc = true;

                // parse the article into HTML
                $parserOutput = $wgParser->parse(
                    $article1->preSaveTransform( $article1->getContent() ) . "\n\n",
                    $title1, $parserOptions );

                // get the html content, then format it to remove any wiki escape chars
                $bhtml = $parserOutput->getText();
                $bhtml = utf8_decode($bhtml);
                // make sure all links are absolute
                $bhtml = str_replace ($wgScriptPath, $wgServer . $wgScriptPath, $bhtml);
                $bhtml = str_replace ('/w/', $wgServer . '/w/', $bhtml);
                // make sure all image tags are true
                $bhtml = str_replace ('&lt;img', '<img', $bhtml);

                // write a new title and H1 heading - used for the chapter in the pdf book
                $html = "<html><head><title>" . $art . "</title></head><body><h1>" . $art . "</h1>\n" . $bhtml . "</body></html>";

                // output article to next html file in list
                $mytemp = $_SERVER["DOCUMENT_ROOT"] . $wgScriptPath . "/pdfs/file" . $c . ".html";
                $article_f = fopen($mytemp, 'w');
                fwrite($article_f, $html);
                fclose($article_f);
                $c = $c + 1;
                $filelist = $filelist . " " . $mytemp;
                $i = strpos($SaveText, "{");
                // limit output files to 100 - used in testing in case things get out of hand
                if ($c > 100) break;
            }
            putenv("HTMLDOC_NOCGI=1");

            # Write the content type to the client...
            header("Content-Type: application/pdf");
            header("Content-Disposition: attachment; filename=\"$page.pdf\"");
            flush();

            # Run HTMLDOC to provide the PDF file to the user...
            passthru("htmldoc --book -t pdf14 --bodyfont Helvetica --header t.1 --footer c.1 --no-links --linkstyle plain --charset 8859-1 --color --quiet --jpeg --webpage " . $filelist);

            // $filelist is a space-separated list, so remove the files one by one
            foreach (explode(" ", $filelist) as $f) {
                unlink ($f);
            }
        }
    }
    SpecialPage::addPage (new myPDF());
}

function wfmyPDFNav( &$skintemplate, &$nav_urls, &$oldid, &$revid ) {
    $nav_urls['pdfprint2'] = array(
        'text' => wfMsg( 'pdf_print_link2' ),
        'href' => $skintemplate->makeSpecialUrl( 'PdfPrint2', "page=" . wfUrlencode( "{$skintemplate->thispage}" ) )
    );
    return true;
}

function wfmyPDFToolbox( &$monobook ) {
    if ( isset( $monobook->data['nav_urls']['pdfprint2'] ) )
        if ( $monobook->data['nav_urls']['pdfprint2']['href'] == '' ) {
            ?><li><?php echo $monobook->msg( 'pdf_print_link2' ); ?></li><?php
        } else {
            ?><li><a href="<?php echo htmlspecialchars( $monobook->data['nav_urls']['pdfprint2']['href'] ) ?>"><?php echo $monobook->msg( 'pdf_print_link2' ); ?></a></li><?php
        }
    return true;
}
?>
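The "collection" in the listing above is simply the wikitext of the page the link is clicked on: each article to include is named between curly braces. The standalone sketch below illustrates that extraction step in isolation; extractBracedTitles is a hypothetical helper name, and using a regular expression instead of the original strpos() loop is a substitution made here for brevity.

```php
<?php
// Hypothetical helper: collect the article names listed as {Name} in a
// page's wikitext, capped at $limit entries like the loop in myPDF.
function extractBracedTitles($wikitext, $limit = 100) {
    preg_match_all('/\{([^{}]*)\}/', $wikitext, $matches);
    $titles = array();
    foreach ($matches[1] as $raw) {
        $title = trim($raw);
        if ($title !== '') {
            $titles[] = $title;
        }
        if (count($titles) >= $limit) {
            break;
        }
    }
    return $titles;
}
```

A book page containing the lines {Main Page} and {Installation} would therefore yield two chapters, in that order.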

MP 09:45, 30 August 2006 (UTC)
This doesn't work for me. I've added the PHP file to the extensions folder and altered LocalSettings.php to include it. When I navigate to my site I just get a blank page rather than the logon page. I'm obviously missing something blindingly obvious...


 * Windows XP
 * Apache 2
 * php 5.1.4
 * MySQL 4.1.16
 * Mediawiki 1.7.1

---CheShA Says: "You haven't set permissions on the SpecialPdf.php file; Apache can't access it"

SCW 3:14pm, 2nd Sept, 2006 (CT)


require_once("extensions/myPDF.php");

Bad argument to HTMLDOC
myPDF is not working for me. It's generating a bogus arg to HTMLDOC and so instead of getting a PDF file, I get a file with the error message:

HTMLDOC Version 1.8.27 Copyright 1997-2006 Easy Software Products, All Rights Reserved. This software is based in part on the work of the Independent JPEG Group.

ERROR: Bad option argument "--charse "!

Note that there should be a 't' at the end of the argument but instead there is an embedded CR control character

DavidJameson 14:22, 5 September 2006 (UTC)

SCW, 05sept06, 10:09CT
A line break got into the cut and paste on the htmldoc passthru line. Take out the line break to join the lines together so that the argument to htmldoc reads '--charset', and all should be okay.

I'll be posting a newer version that has the option of creating PDF books with a title page, TOC and nice nesting of articles...

Simon.
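One way to make the command line less fragile against pasted line breaks is to assemble it from a list of separate arguments. The following is a minimal sketch under assumptions, not the extension's code: buildHtmldocCommand is a hypothetical helper, and the options in the usage comment are the ones already quoted on this page.

```php
<?php
// Hypothetical helper: build the htmldoc command from separate arguments.
// Stray CR/LF characters are removed from each option, and every argument
// is quoted with escapeshellarg() in the way the current platform expects.
function buildHtmldocCommand(array $options, array $files) {
    $parts = array('htmldoc');
    foreach ($options as $opt) {
        $parts[] = escapeshellarg(str_replace(array("\r", "\n"), '', $opt));
    }
    foreach ($files as $file) {
        $parts[] = escapeshellarg($file);
    }
    return implode(' ', $parts);
}

// Usage inside execute(), with $mytemp as in the listings on this page:
// passthru(buildHtmldocCommand(
//     array('-t', 'pdf14', '--charset', '8859-1', '--quiet', '--jpeg', '--webpage'),
//     array($mytemp)));
```

Because the quoting is delegated to escapeshellarg(), the same call should also avoid hand-written single quotes around the file name.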

No HTML files found
I must be missing something else....after fixing the linebreak problem, I opened a wikipage and then clicked on Export PDF book.

This time I got a file with the error:

HTMLDOC Version 1.8.27 Copyright 1997-2006 Easy Software Products, All Rights Reserved. This software is based in part on the work of the Independent JPEG Group.

ERROR: No HTML files!

Usage: htmldoc [options] filename1.html [ ... filenameN.html ] htmldoc filename.book

What's this notion about a "collection" of articles? Is there something I'm supposed to do to "collect" some articles together before I can print them? How would I do that, and why would I need to? The "Export PDF Book" link shows up in the Toolbox when I'm viewing a particular page, so how can I generate THAT page as PDF, including all images?

DavidJameson 15:36, 5 September 2006 (UTC)

Okay, the following script is based on the original; temporary HTML files are saved to /tmp, as they were in the original. This script will not deal with putting multiple articles into one PDF file (see above for that). It will handle image files.

<?php

if (!defined('MEDIAWIKI')) die();
require_once ("$IP/includes/SpecialPage.php");

$wgExtensionFunctions[] = 'wfSpecialPdf';
$wgExtensionCredits['specialpage'][] = array(
    'name' => 'Pdf',
    'author' => 'Thomas Hempel',
    'description' => 'prints a page as pdf',
    'url' => 'http://www.netapp.com'
);

$wgHooks['SkinTemplateBuildNavUrlsNav_urlsAfterPermalink'][] = 'wfSpecialPdfNav';
$wgHooks['MonoBookTemplateToolboxEnd'][] = 'wfSpecialPdfToolbox';

function wfSpecialPdf() {
    global $IP, $wgMessageCache;

    $wgMessageCache->addMessages(array(
        'pdfprint' => 'PdfPrint',
        'pdf_print_link' => 'Print as PDF'));

    class SpecialPdf extends SpecialPage {
        var $title;
        var $article;
        var $html;
        var $parserOptions;
        var $bhtml;

        function SpecialPdf() {
            SpecialPage::SpecialPage( 'PdfPrint' );
        }

        function execute( $par ) {
            global $wgRequest, $wgOut, $wgUser, $wgParser, $wgScriptPath, $wgServer;

            $page = isset( $par ) ? $par : $wgRequest->getText( 'page' );
            $title = Title::newFromText( $page );
            $article = new Article ($title);
            $wgOut->setPrintable();
            $wgOut->disable();
            $parserOptions = ParserOptions::newFromUser( $wgUser );
            $parserOptions->setEditSection( false );
            $parserOptions->setTidy(true);
            $wgParser->mShowToc = false;
            $parserOutput = $wgParser->parse(
                $article->preSaveTransform( $article->getContent() ) . "\n\n",
                $title, $parserOptions );

            $bhtml = $parserOutput->getText();
            $bhtml = utf8_decode($bhtml);

            // make all links absolute
            $bhtml = str_replace ($wgScriptPath, $wgServer . $wgScriptPath, $bhtml);
            $bhtml = str_replace ('/w/', $wgServer . '/w/', $bhtml);
            // restore escaped image tags
            $bhtml = str_replace ('&lt;img', '<img', $bhtml);
            $bhtml = str_replace ('/&gt;', '/>', $bhtml);
            $html = "<html><head><title>" . $page . "</title></head><body>" . $bhtml . "</body></html>";

            // make a temporary file with a unique name
            $mytemp = "/tmp/f" . time() . "-" . rand() . ".html";
            $article_f = fopen($mytemp, 'w');
            fwrite($article_f, $html);
            fclose($article_f);
            putenv("HTMLDOC_NOCGI=1");

            # Write the content type to the client...
            header("Content-Type: application/pdf");
            header("Content-Disposition: attachment; filename=\"$page.pdf\"");
            flush();

            # Run HTMLDOC to provide the PDF file to the user...
            passthru("htmldoc -t pdf14 --bodyfont Helvetica --no-links --linkstyle plain --footer c.1 --header c.1 --tocheader ... --charset 8859-1 --color --quiet --jpeg --webpage '$mytemp'");
            unlink ($mytemp);
        }
    }
    SpecialPage::addPage (new SpecialPdf());
}

function wfSpecialPdfNav( &$skintemplate, &$nav_urls, &$oldid, &$revid ) {
    $nav_urls['pdfprint'] = array(
        'text' => wfMsg( 'pdf_print_link' ),
        'href' => $skintemplate->makeSpecialUrl( 'PdfPrint', "page=" . wfUrlencode( "{$skintemplate->thispage}" ) )
    );
    return true;
}

function wfSpecialPdfToolbox( &$monobook ) {
    if ( isset( $monobook->data['nav_urls']['pdfprint'] ) )
        if ( $monobook->data['nav_urls']['pdfprint']['href'] == '' ) {
            ?><li><?php echo $monobook->msg( 'pdf_print_link' ); ?></li><?php
        } else {
            ?><li><a href="<?php echo htmlspecialchars( $monobook->data['nav_urls']['pdfprint']['href'] ) ?>"><?php echo $monobook->msg( 'pdf_print_link' ); ?></a></li><?php
        }
    return true;
}
?>

Almost there (grin)
Well, this version almost works perfectly - it gave me a nice PDF file with explicit images referenced in the wiki page.

However, it crashes when trying to handle one of the image tags produced by our MimeTex math generator.

E.g.

<img src="/cgi-bin/mimetex.cgi?\green f(\xi)=\int_{-\infty}^\xi e^{-\tau^2}d\tau { {x \atop y } }" align="absmiddle" border="0" alt="TeX Formula">

seems to cause a crash.

I wonder if the code that tweaks the IMG tag itself is getting confused with the more sophisticated stuff in this particular image reference.

DavidJameson 12:10, 6 September 2006 (UTC)

PHP error
Just for grins, I ran the process through a PHP debugger. The debugger barfed with the error below.
Error: E_ERROR Call to a member function getNamespace on a non-object at /var/www/html/wikiroot/riskit/includes/Article.php line 155

The line it's complaining about is
if ( $this->mTitle->getNamespace() == NS_MEDIAWIKI ) {
which is found in the function getContent in Article.php.

(P.S.....someone needs to install the GESHI syntax highlighting extension on wikimedia.org)

DavidJameson 12:18, 6 September 2006 (UTC)

PHP error
OK - I've figured out both problems (to an acceptable extent)

1) The PHP bug is due to the fact that sometimes (and I don't know why it's only sometimes), the URL associated with the Print As PDF command in the toolbox has %0D%0A tacked on at the end. No idea where this is coming from, but it can be removed by modifying the echo statement on line 103, adding a str_replace call to remove the extra characters:
<?php echo htmlspecialchars( str_replace('%0D%0A', '', $monobook->data['nav_urls']['pdfprint']['href'] )) ?>

2) The reason the math images weren't being processed was because the URL in the SRC attribute of the IMG tag did not include the server. I modified my mimetex extension to include the server but this problem will arise again for anyone else who references an image without using a server in the URL. The full solution will require the text to be searched for all image urls, examine them to see if there's a server part and if not, insert the $wgServer into the string. Probably can be done quickly with a regex. DavidJameson 20:31, 6 September 2006 (UTC)


 * What's wrong with just doing this: "$bhtml = str_replace ('/images/',$wgServer . '/images/', $bhtml);" instead of the $bhtml = str_replace ('/w/',$wgServer . '/w/', $bhtml); that you have there? Made the math images all work for me. --dgrant 18:30, 25 October 2006 (UTC)
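The regex approach suggested in point 2 could look roughly like the sketch below. It is an illustration under assumptions rather than tested extension code: absolutizeImageSrc is a hypothetical helper, $server stands in for $wgServer, and only root-relative src values (those starting with a single slash) are rewritten.

```php
<?php
// Hypothetical helper: prefix the server to every <img> whose src is a
// root-relative URL, leaving absolute and protocol-relative URLs alone.
function absolutizeImageSrc($html, $server) {
    return preg_replace_callback(
        '/(<img\b[^>]*?\bsrc\s*=\s*)(["\'])(.*?)\2/i',
        function ($m) use ($server) {
            $src = $m[3];
            if (strlen($src) > 0 && $src[0] === '/' && substr($src, 0, 2) !== '//') {
                $src = rtrim($server, '/') . $src;
            }
            // Reassemble the tag prefix, opening quote, URL and closing quote.
            return $m[1] . $m[2] . $src . $m[2];
        },
        $html
    );
}
```

Unlike a plain str_replace on '/images/' or '/w/', this touches only the src attribute of image tags, so a URL that already includes the server is not prefixed twice.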

Error
On some pages I get this error. I'm using DavidJameson's code above. --dgrant 18:34, 25 October 2006 (UTC)

Fatal error: Call to a member function getNamespace on a non-object in /var/www/mediawiki-checkout/includes/Article.php on line 150


 * Ok, now I'm getting this on all pages for some reason. 216.13.217.231 01:30, 7 November 2006 (UTC)

Does not work with 1.8.2 of mediawiki.
No way no how. Neither does any of the code on this page.

Error: Fatal error: Call to a member function getNamespace on a non-object in /var/www/includes/Article.php on line 150


 * I have this Error in Mediawiki 1.8.2 and 1.9.0 when calling Special:PdfPrint. It works for me when clicking on the link in the toolbox. --Ikiwaner 23:57, 15 January 2007 (UTC)


 * I'm running 1.6.7 on BluWiki and getting the exact same error... If someone figures this out can they email me? --SamOdio 14:29, 20 March 2007 (UTC)
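A plausible cause of this error (an assumption, not confirmed anywhere in this thread) is that Title::newFromText() returns null when the page parameter is empty, as when Special:PdfPrint is opened directly, or when it carries stray line-break characters. The sketch below shows one way to clean the parameter first; normalizePageParam is a hypothetical helper name.

```php
<?php
// Hypothetical helper: clean the requested page name before it is passed
// to Title::newFromText(). Returns null when nothing usable remains.
function normalizePageParam($raw) {
    if ($raw === null) {
        return null;
    }
    // Drop URL-encoded CR/LF that may have been appended to the link.
    $page = str_replace(array('%0D', '%0A', '%0d', '%0a'), '', $raw);
    // trim() removes surrounding spaces, tabs, CR and LF.
    $page = trim($page);
    return $page === '' ? null : $page;
}

// Possible use inside execute() (MediaWiki calls shown as comments only):
// $page = normalizePageParam( isset( $par ) ? $par : $wgRequest->getText( 'page' ) );
// if ( $page === null ) { /* show a message instead of building the PDF */ }
// $title = Title::newFromText( $page );
// if ( $title === null ) { /* invalid title: stop here as well */ }
```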

No Images
Great extension, works really well except it doesn't appear to include images in the export. Can anyone please confirm that this is normal and that I haven't done anything wrong?

Thanks, CheShA.

Still No Images
Hi, can anyone tell me what to do so that my exported PDF file includes the pictures from the original article?

I've heard there's a PDF hack to fix that problem. If anyone has an idea, please let me know.

THX [mailto:DarkManX1@gmx.net X-Cident]

Still No Images
Hi there, can someone help us with this problem? The PDF export is great, but the link for an image is incorrect: it points to "Image:Image_Name.gif" and not to the direct link to the file itself.

Help please ! th3_gooroo@hotmail.com

Empty Files
Help, I only get empty files. But when I execute this line on the console, I get a working PDF:
htmldoc -t pdf14 --bodyfont Helvetica --no-links --linkstyle plain --footer c.1 --header c.1 --tocheader ... --charset 8859-1 --color --quiet --jpeg --webpage '$mytemp' > test.pdf
What's wrong? Maybe an Apache issue or something?


 * I had the same problem. I changed this line to  and it worked. The path before   is usually the same as   in LocalSettings.php.   is another usual path to try. --Ikiwaner 20:06, 20 December 2006 (UTC)


 * I had problems with the permissions on  which I fixed by running   147.209.216.245 01:31, 15 June 2007 (UTC)
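Both replies point at the temporary file location. A minimal sketch of creating that file in a way that reports path and permission problems, instead of silently producing an empty PDF, might look like this; writeTempHtml is a hypothetical helper, and falling back to sys_get_temp_dir() is an assumption rather than what the extension does.

```php
<?php
// Hypothetical helper: write the HTML to a uniquely named .html file in a
// writable directory. Returns the file path, or false on any failure.
function writeTempHtml($html, $dir = null) {
    if ($dir === null) {
        $dir = sys_get_temp_dir();
    }
    if (!is_dir($dir) || !is_writable($dir)) {
        return false; // wrong path or missing permissions
    }
    $base = tempnam($dir, 'pdf');
    if ($base === false) {
        return false;
    }
    // Give the file an .html extension, as in the listings on this page.
    $path = $base . '.html';
    if (!rename($base, $path)) {
        unlink($base);
        return false;
    }
    if (file_put_contents($path, $html) === false) {
        unlink($path);
        return false;
    }
    return $path;
}
```

When the function returns false, the web server user most likely cannot write to the chosen directory, which matches the symptoms described above.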

Clean code
Hi there, would be nice if you could post the clean code with all above corrections included. I tried them all with mediawiki 1.8.2 but I can't get it to work. Cheers Florian

Test and working on 1.9
This was tried on Linux/Apache/PHP5 with 1.9, in a software configuration very similar to Wikipedia's, and it works fine. tom

Working code for Windows with MediaWiki v1.9
The PDF export worked for me on Windows after I fixed the path AND, more importantly, on line 80 I had to change '$mytemp' to $mytemp, i.e. remove the single quotes around $mytemp.

-Vivek Agarwal

Here is the complete source:

<?php

if (!defined('MEDIAWIKI')) die();
require_once ("$IP/includes/SpecialPage.php");

$wgExtensionFunctions[] = 'wfSpecialPdf';
$wgExtensionCredits['specialpage'][] = array(
    'name' => 'Pdf',
    'author' => 'Thomas Hempel',
    'description' => 'prints a page as pdf',
    'url' => 'http://www.netapp.com'
);

$wgHooks['SkinTemplateBuildNavUrlsNav_urlsAfterPermalink'][] = 'wfSpecialPdfNav';
$wgHooks['MonoBookTemplateToolboxEnd'][] = 'wfSpecialPdfToolbox';

function wfSpecialPdf() {
    global $IP, $wgMessageCache;

    $wgMessageCache->addMessages(array(
        'pdfprint' => 'PdfPrint',
        'pdf_print_link' => 'Print as PDF'));

    class SpecialPdf extends SpecialPage {
        var $title;
        var $article;
        var $html;
        var $parserOptions;
        var $bhtml;

        function SpecialPdf() {
            SpecialPage::SpecialPage( 'PdfPrint' );
        }

        function execute( $par ) {
            global $wgRequest, $wgOut, $wgUser, $wgParser, $wgScriptPath, $wgServer;

            $page = isset( $par ) ? $par : $wgRequest->getText( 'page' );
            $title = Title::newFromText( $page );
            $article = new Article ($title);
            $wgOut->setPrintable();
            $wgOut->disable();
            $parserOptions = ParserOptions::newFromUser( $wgUser );
            $parserOptions->setEditSection( false );
            $parserOptions->setTidy(true);
            $wgParser->mShowToc = false;
            $parserOutput = $wgParser->parse(
                $article->preSaveTransform( $article->getContent() ) . "\n\n",
                $title, $parserOptions );

            $bhtml = $parserOutput->getText();
            $bhtml = utf8_decode($bhtml);

            $bhtml = str_replace ($wgScriptPath, $wgServer . $wgScriptPath, $bhtml);
            $bhtml = str_replace ('/w/', $wgServer . '/w/', $bhtml);
            $bhtml = str_replace ('&lt;img', '<img', $bhtml);
            $bhtml = str_replace ('/&gt;', '/>', $bhtml);

            $html = "<html><head><title>" . $page . "</title></head><body>" . $bhtml . "</body></html>";

            // make a temporary file with a unique name
            $mytemp = "d:/tmp/f" . time() . "-" . rand() . ".html";
            $article_f = fopen($mytemp, 'w');
            fwrite($article_f, $html);
            fclose($article_f);
            putenv("HTMLDOC_NOCGI=1");

            # Write the content type to the client...
            header("Content-Type: application/pdf");
            header("Content-Disposition: attachment; filename=\"$page.pdf\"");
            flush();

            # Run HTMLDOC to provide the PDF file to the user...
            passthru("htmldoc -t pdf14 --bodyfont Helvetica --no-links --linkstyle plain --footer c.1 --header c.1 --tocheader ... --charset 8859-1 --color --quiet --jpeg --webpage $mytemp");
            unlink ($mytemp);
        }
    }
    SpecialPage::addPage (new SpecialPdf());
}

function wfSpecialPdfNav( &$skintemplate, &$nav_urls, &$oldid, &$revid ) {
    $nav_urls['pdfprint'] = array(
        'text' => wfMsg( 'pdf_print_link' ),
        'href' => $skintemplate->makeSpecialUrl( 'PdfPrint', "page=" . wfUrlencode( "{$skintemplate->thispage}" ) )
    );
    return true;
}

function wfSpecialPdfToolbox( &$monobook ) {
    if ( isset( $monobook->data['nav_urls']['pdfprint'] ) )
        if ( $monobook->data['nav_urls']['pdfprint']['href'] == '' ) {
            ?><li><?php echo $monobook->msg( 'pdf_print_link' ); ?></li><?php
        } else {
            ?><li><a href="<?php echo htmlspecialchars( str_replace('%0D%0A', '', $monobook->data['nav_urls']['pdfprint']['href'] )) ?>"><?php echo $monobook->msg( 'pdf_print_link' ); ?></a></li><?php
        }
    return true;
}
?>

Page rendering
Hello, I'd like to see some improvement to this useful extension. While it works technically, the pages actually look better when printed to a PDF printer from your web browser. To be an improvement over web browser PDFs, the output should look more LaTeX-style. --Ikiwaner 00:00, 16 January 2007 (UTC)
 * Ever tried Extension:Wiki2LaTeX? --Flominator 10:30, 14 August 2007 (UTC)

Errors in the last version in discussion
Using the very last iteration of the code posted in the discussions, I get the following error when I click the Print as PDF link:

Fatal error: Call to a member function getNamespace on a non-object in /srv/www/htdocs/wiki/includes/Article.php on line 150

Working with sites starting with http://www...not with sites http://...
It seems to work only with sites that include www in their address. Is this possible? --87.2.110.219 21:53, 23 January 2007 (UTC)

A problem with htmldoc encoding
htmldoc is very sensitive about the encoding.

In file SpecialPdf.php, line 89:
passthru("/usr/bin/htmldoc -t pdf14 --charset 8859-1 --color --quiet --jpeg --webpage...
For the new version of htmldoc, --charset should be iso-8859-1. Ivan

What is the purpose of HTMLDOC, if it's Windows app?
Hi all

I don't quite understand: if HTMLDOC is a Windows application, how will this help me if my web server is a Linux server?
 * There is also a Linux version!

A problem with PHP passthru
Thanks for the extension! I'm using MediaWiki 1.8.2 on Windows 2003 and it works. I had a problem with the passthru function that I solved by copying cmd.exe into the PHP installation folder.

Where do I download extension
Can't seem to find SpecialPage.php, where do I download the file?


 * Just cut and paste the code above into a text file with that name and extension. Jschroe

Scaling Images to fit paper width
I found an argument to htmldoc that allows you to specify the 'width' of the page in pixels, which is sort of the opposite of scaling: it sets the viewable resolution for the images. So, if I have an image that is 900 pixels wide, I'd want to set my browser width to something greater than 900 to see the whole image at once.
passthru("htmldoc -t pdf14 --charset iso-8859-1 --color --quiet --jpeg --webpage '$mytemp'");
would become:
passthru("htmldoc -t pdf14 --browserwidth 950 --charset iso-8859-1 --color --quiet --jpeg --webpage '$mytemp'");
--DavidSeymore 18:48, 14 May 2007 (UTC)
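Rather than hard-coding 950, the width could be derived from the widest image on the page. This is only a sketch: browserWidthFor is a hypothetical helper, the 50 pixel margin simply reproduces the 900 to 950 example above, and 680 is assumed here to be HTMLDOC's default browser width.

```php
<?php
// Hypothetical helper: choose a --browserwidth value slightly larger than
// the widest image, never going below the assumed HTMLDOC default of 680.
function browserWidthFor(array $imageWidths, $margin = 50, $default = 680) {
    if (count($imageWidths) === 0) {
        return $default;
    }
    return max($default, max($imageWidths) + $margin);
}

// Builds the option string to splice into the htmldoc command line.
function browserWidthOption(array $imageWidths) {
    return '--browserwidth ' . browserWidthFor($imageWidths);
}
```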

Title of Wiki-Article in PDF
Hi,

I was wondering if it is possible to display the title of the wiki article in the generated PDF? Thanks.

Name of the generated file
The extension works fine except for the fact that it outputs a file called index.php. Renaming this to something.pdf works, as it is in fact a PDF file. But I'm wondering how I could fix it so that it outputs <article_name.pdf>. Any suggestions?

Solution
I needed to uncomment this line to get <article_name.pdf> instead of <index.php>.
# header(sprintf('Content-Disposition: attachment; filename="%s.pdf"', $page));
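Page names can contain spaces, colons or slashes, which some browsers handle poorly in a download file name. A cautious variant of the header line is sketched below; pdfDispositionHeader is a hypothetical helper, and replacing every character outside a small safe set with an underscore is a choice made here, not something the extension does.

```php
<?php
// Hypothetical helper: build the Content-Disposition header with a file
// name restricted to letters, digits, dot, underscore and hyphen.
function pdfDispositionHeader($page) {
    $name = preg_replace('/[^A-Za-z0-9._-]+/', '_', $page);
    $name = trim($name, '_');
    if ($name === '') {
        $name = 'export'; // fallback when nothing usable is left
    }
    return sprintf('Content-Disposition: attachment; filename="%s.pdf"', $name);
}

// Usage: header(pdfDispositionHeader($page));
```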

Generated PDF is 1kb and corrupted
I am testing MW 1.10.1 on a LAMP system (upgraded from 1.6.7). Then tried the instructions on the main article page. After clicking on the Print to PDF link though I get a 1kb PDF. Any ideas as to what could be wrong? How do i go about fixing this? SellFone 07:23, 3 August 2007 (UTC)

Solution
I downloaded the HTMLDOC binary and didn't realize that you have to pay for it. When I ran it, it asked for a license, so I went ahead and installed gcc and gcc-c++ so that I could compile it from source, and now it's working.

Patch to Export Multiple Pages to PDF
This patch creates a Special Page form similar to Special:Export for specifying multiple pages to export to PDF. This patch was created against the 1.1 (19-July-2007) version of PdfExport and tested in MediaWiki 1.11.

'''Please post the whole files instead of the patches. Thanks.'''

Patch for PdfExport.php:

Patch for PdfExport.i18n.php:

--Johnp125 18:09, 25 September 2007 (UTC)

Tried out the extra code. I get an htmldoc error when going to PdfPrint. The regular Print as PDF seems to work just fine, just not the additional items.

HTMLDOC Version 1.8.27 Copyright 1997-2006 Easy Software Products, All Rights Reserved.
This software is based in part on the work of the Independent JPEG Group.

ERROR: No HTML files!

Usage:
  htmldoc [options] filename1.html [ ... filenameN.html ]
  htmldoc filename.book

Options:
  --batch filename.book
  --bodycolor color
  --bodyfont {courier,helvetica,monospace,sans,serif,times}
  --bodyimage filename.{bmp,gif,jpg,png}
  --book
  --bottom margin{in,cm,mm}
  --browserwidth pixels
  --charset {cp-874...1258,iso-8859-1...8859-15,koi8-r}
  --color
  --compression[=level]
  --continuous
  --cookies 'name="value with space"; name=value'
  --datadir directory
  --duplex
  --effectduration {0.1..10.0}
  --embedfonts
  --encryption
  --firstpage {p1,toc,c1}
  --fontsize {4.0..24.0}
  --fontspacing {1.0..3.0}
  --footer fff
  {--format, -t} {ps1,ps2,ps3,pdf11,pdf12,pdf13,pdf14,html,htmlsep}
  --gray
  --header fff
  --header1 fff
  --headfootfont {courier{-bold,-oblique,-boldoblique}, helvetica{-bold,-oblique,-boldoblique}, monospace{-bold,-oblique,-boldoblique}, sans{-bold,-oblique,-boldoblique}, serif{-bold,-italic,-bolditalic}, times{-roman,-bold,-italic,-bolditalic}}
  --headfootsize {6.0..24.0}
  --headingfont {courier,helvetica,monospace,sans,serif,times}
  --help
  --helpdir directory
  --hfimage0 filename.{bmp,gif,jpg,png}
  --hfimage1 filename.{bmp,gif,jpg,png}
  --hfimage2 filename.{bmp,gif,jpg,png}
  --hfimage3 filename.{bmp,gif,jpg,png}
  --hfimage4 filename.{bmp,gif,jpg,png}
  --hfimage5 filename.{bmp,gif,jpg,png}
  --hfimage6 filename.{bmp,gif,jpg,png}
  --hfimage7 filename.{bmp,gif,jpg,png}
  --hfimage8 filename.{bmp,gif,jpg,png}
  --hfimage9 filename.{bmp,gif,jpg,png}
  --jpeg[=quality]
  --landscape
  --left margin{in,cm,mm}
  --linkcolor color
  --links
  --linkstyle {plain,underline}
  --logoimage filename.{bmp,gif,jpg,png}
  --no-compression
  --no-duplex
  --no-embedfonts
  --no-encryption
  --no-links
  --no-localfiles
  --no-numbered
  --no-overflow
  --no-pscommands
  --no-strict
  --no-title
  --no-toc
  --numbered
  --nup {1,2,4,6,9,16}
  {--outdir, -d} dirname
  {--outfile, -f} filename.{ps,pdf,html}
  --overflow
  --owner-password password
  --pageduration {1.0..60.0}
  --pageeffect {none,bi,bo,d,gd,gdr,gr,hb,hsi,hso,vb,vsi,vso,wd,wl,wr,wu}
  --pagelayout {single,one,twoleft,tworight}
  --pagemode {document,outline,fullscreen}
  --path "dir1;dir2;dir3;...;dirN"
  --permissions {all,annotate,copy,modify,print,no-annotate,no-copy,no-modify,no-print,none}
  --portrait
  --proxy http://host:port
  --pscommands
  --quiet
  --referer url
  --right margin{in,cm,mm}
  --size {letter,a4,WxH{in,cm,mm},etc}
  --strict
  --textcolor color
  --textfont {courier,times,helvetica}
  --title
  --titlefile filename.{htm,html,shtml}
  --titleimage filename.{bmp,gif,jpg,png}
  --tocfooter fff
  --tocheader fff
  --toclevels levels
  --toctitle string
  --top margin{in,cm,mm}
  --user-password password
  {--verbose, -v}
  --version
  --webpage

  fff = heading format string; each 'f' can be one of:
    . = blank
    / = n/N arabic page numbers (1/3, 2/3, 3/3)
    : = c/C arabic chapter page numbers (1/2, 2/2, 1/4, 2/4, ...)
    1 = arabic numbers (1, 2, 3, ...)
    a = lowercase letters
    A = uppercase letters
    c = current chapter heading
    C = current chapter page number (arabic)
    d = current date
    D = current date and time
    h = current heading
    i = lowercase roman numerals
    I = uppercase roman numerals
    l = logo image
    t = title text
    T = current time

This output is shown at the top of the page.

Below it are the options for converting different documents to PDF, but the conversion does not work.

Fedora C 4 fix
I have installed the Pdf Export extension and added the code to LocalSettings.php, but my wiki just shows a blank screen once the extension is installed. I have installed htmldoc and it works from the command prompt.

Does anyone have a solution for getting this to run under Windows?
Please post it here. The export already works fine, but the PDF file is empty.

Thanks

Fatal Error
Call to undefined method: specialpdf->__construct in /usr/home/admin/domains/< >/public_html/mediawiki-1.6.10/extensions/PdfExport/PdfExport.php on line 51

This occurs when opening any page in MediaWiki. HTMLDoc is installed.

To fix this problem, replace parent::__construct( 'PdfPrint' );

with SpecialPage::SpecialPage ('PdfPrint');

Got it working on Win2k3 and MediaWiki 1.10.1 (Finally)
Here's the solution. Copy and paste the code from above, and make the following modifications:


 * Download the latest version of HTMLDoc (which, despite the claim, does not come with an installer)
 * Extract the contents of the HTMLDoc zip file to C:\Program Files\HTMLDoc\
 * Add "C:\Program Files\HTMLDoc\" to the PATH environment variable
 * Set IUSR_<MACHINE-NAME> "Read" and "Read & Execute" permissions on C:\Windows\System32\cmd.exe
 * Set IUSR_<MACHINE-NAME> "Full Control" on C:\Windows\Temp\
 * Copy and paste the new PdfExport.php code from Working Code for Windows with MediaWiki v1.9
 * Change the value of $mytemp to $mytemp = "C:\\Windows\\Temp\\f" . time() . "-" . rand() . ".html";

That was enough to do it for me - hope this helps some of you!
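
The $mytemp line from the last step can be sketched as a small helper. This is only an illustration: the function name makeTempHtmlPath is made up here and is not part of the extension, and the directory is passed in so that a different temp folder can be tried without editing the string.

```php
<?php
// Hypothetical helper (not part of PdfExport): builds a unique temporary
// HTML file name in $dir, following the "f<time>-<rand>.html" pattern.
function makeTempHtmlPath($dir)
{
    // Strip a trailing slash or backslash so the separator is not doubled.
    $dir = rtrim($dir, "/\\");
    return $dir . DIRECTORY_SEPARATOR . 'f' . time() . '-' . rand() . '.html';
}

// In a double-quoted PHP string every literal backslash must be written twice.
$mytemp = makeTempHtmlPath("C:\\Windows\\Temp\\");
echo $mytemp, "\n";
```

Note that time() and rand() are function calls and need their parentheses; without them PHP treats the bare words as constants.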

~JT

Got it working with Win2k3 and MediaWiki 1.11.0
~Mark E.
 * I used the above instructions for 1.10, except the value of $mytemp indicated by JT was wrong - I have changed it to include double backslashes.
 * If it still doesn't work, try copying your CMD.exe to your \PHP folder. Make sure your PHP folder has IUSR_MACHINENAME read and read & execute permissions.

Could you explain for beginners what we are supposed to do with HTMLDoc? How do I use this extension on my website? Thanks for your help. Marcjb 22:32, 27 August 2007 (UTC)

Working on Debian unstable with Mediawiki 1.7
But I'm not getting any images either.

Datakid 01:55, 5 September 2007 (UTC)

More robust way of ensuring URL's are absolute
I've had to make URL's absolute for a couple of other extensions and found a more robust way than doing a replacement on the parsed text. Instead just set the following three globals before the parser is called and it will make them all absolute for you: --Nad 04:00, 5 September 2007 (UTC)
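
The code that originally followed this comment is not preserved in the archive. As a rough sketch of the idea only, and assuming the three globals meant are $wgScriptPath, $wgScript and $wgArticlePath (a guess, not confirmed by the thread), prefixing them with $wgServer could look like this. The sample values and the helper name makeAbsolute are made up for the example.

```php
<?php
// Hypothetical example values; on a real wiki they come from LocalSettings.php.
$wgServer      = 'http://wiki.example.org';
$wgScriptPath  = '/w';
$wgScript      = $wgScriptPath . '/index.php';
$wgArticlePath = $wgScript . '/$1';

// Prefix a server-relative path with the server, unless it already has a scheme.
function makeAbsolute($server, $path)
{
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $path)) {
        return $path;
    }
    return $server . $path;
}

// Assumption: these are the three globals referred to above.
$wgScriptPath  = makeAbsolute($wgServer, $wgScriptPath);
$wgScript      = makeAbsolute($wgServer, $wgScript);
$wgArticlePath = makeAbsolute($wgServer, $wgArticlePath);

echo $wgArticlePath, "\n";
```

The check for an existing scheme keeps the assignment safe if it runs twice in one request.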

Nad, Is the parser that you refer to already in the php script that is on the front page? I've added those lines at the top of the file, after $wgHooks and before function wfSpecialPdf, and still no images?
 * I also tried putting those lines in SpecialPdf.execute after $wgServer; and before  $page = isset( $par ) ? $par : $wgRequest->getText( 'page' ); Still no joy. Datakid 01:31, 6 September 2007 (UTC)
 * Ideally they should be defined just before the $wgParser->parse is called in the execute function. But I've just grep'd the html output of one of my exported PDF's from Extension:Pdf Book which uses this method of "absoluterising", but it hasn't worked for the images, maybe best to stick with the current text-replacement until I sort out that problem. The replacement used to make the url's absolute is:


 * Just realised that you also need to modify $wgUploadPath to make image src attributes absolute too, but I also have a problem with images showing up even with the src attributes being absolute... --Nad 04:13, 6 September 2007 (UTC)

 * Holla,
 * Keep in mind that  above is highly installation dependent and may be quite different between installations; in fact, it can be any string from   upwards.
 * Keep in mind that not every image resides in $wgUploadPath or a subdirectory thereof. Images might be in a shared repository, like Wikimedia Commons.
 * The newest development is to allow an arbitrary number of such repositories per installation. Since they are likely to have different individual algorithms to 'absolutize' their path names, it might be a good idea to rely on an image-related function for the task. Images 'know' which repository they are in, including the local file system or the upload directory. The philosophy of image-related code is to always address Image objects, and usually not to deal with repositories directly. So when you find an image in the source text, get a  object, and you probably have a function returning an absolute URL right with it. If it's not there, add it, or file a bug. If you cannot do it yourself, you could also ask me to add it, and I might even do so (-; I'm dealing with image-related functions anyway at the moment. ;-)
 * --Purodha Blissenbach 07:55, 8 September 2007 (UTC)


 * Having had a glance at the code, I am pretty certain that the structurally correct way is to let the parser object take care of asking image objects for absolute URLs. That means adding a  or similar, which is likely not yet possible, but imho pretty easily added to the parser. Investigate! It may already be there. Otherwise, see above.
 * --Purodha Blissenbach 08:13, 8 September 2007 (UTC)

Bug: /tmp/ not writable due to open_basedir restriction.
We did not get a PDF file, but instead a series of PHP messages of the following type:

(paths truncated for brevity)

We suggest not sending these downstream (with wrong HTTP headers, by the way) but rather displaying a decent error message on the special page. Likely, using  before writing into the file would do. --Purodha Blissenbach 16:48, 8 September 2007 (UTC)
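
The name of the check suggested above was lost from the archive. One possible version, assuming it was something along the lines of PHP's is_writable(), is sketched below; the helper name checkWritableTarget and the message texts are invented for illustration.

```php
<?php
// Hypothetical helper: returns an error message if $file cannot be created,
// or null if its directory exists and is writable.
function checkWritableTarget($file)
{
    $dir = dirname($file);
    // "@" hides warnings PHP may raise itself, e.g. under open_basedir.
    if (!@is_dir($dir)) {
        return "Directory $dir does not exist or is not accessible.";
    }
    if (!@is_writable($dir)) {
        return "Directory $dir is not writable by the web server.";
    }
    return null;
}

$error = checkWritableTarget(sys_get_temp_dir() . '/pdfexport-test.html');
echo $error === null ? "ok\n" : "Cannot create PDF: $error\n";
```

A special page could show the returned message instead of streaming PHP warnings to the browser as if they were a PDF.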

Bug and Fix: Pdf Export blocks Semantic MediaWiki
We have several extensions installed, and included Pdf Export before including Semantic MediaWiki (SMW). As a result, SMW did not work, and its installation could not be completed, because that requires access to the Special:SMWAdmin page:
 * Special:Version looked as expected, but
 * Special:Specialpages did not show any of SMW's special pages, and
 * calling the page Special:SMWAdmin by URL, or entering it via the search box, yielded a "nonexisting special page" error.

Pdf Export was not working, see above. When we removed it from LocalSettings.php, we could use SMW. When we placed its call after the inclusion and activation of SMW in LocalSettings.php, SMW continued to work. See also bug 11238.

--Purodha Blissenbach 16:48, 8 September 2007 (UTC)
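
The load order that worked can be sketched as a LocalSettings.php fragment. The file paths and the 'example.org' argument are placeholders that will differ per installation, and the SMW lines assume the include-and-enableSemantics setup that SMW documented at the time.

```php
# LocalSettings.php (fragment, not a complete file)
# Load and activate Semantic MediaWiki first ...
include_once("$IP/extensions/SemanticMediaWiki/includes/SMW_Settings.php");
enableSemantics('example.org');

# ... and include Pdf Export only afterwards.
require_once("$IP/extensions/PdfExport/PdfExport.php");
```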


 * A fix for this bug is described in bug 11238. --Markus Krötzsch 10:26, 2 October 2007 (UTC)

Bug and Fix: Pdf Export forces all special pages to load during init, thus slowing down the wiki

 * A fix for this bug is described in bug 11238. --Markus Krötzsch 10:26, 2 October 2007 (UTC)

Table and image flow / Image alignment
Tables and images have all of their text forced below, as if they were followed by: Also, images that are right aligned: end up on the left side of the page and all text is forced to appear after the image. Is this a limitation of htmldoc or something that can be fixed in the extension? -- 66.83.143.246 18:53, 13 September 2007 (UTC)
 * I've temporarily fixed the image problem by wrapping them into a table, but this does not solve the general problem why htmldoc is forcing breaks after floats. 66.83.143.246 13:28, 14 September 2007 (UTC)

Error page
--Johnp125 03:10, 23 September 2007 (UTC)

When I click on the print to pdf button all I get is this.

HTMLDOC Version 1.8.27 Copyright 1997-2006 Easy Software Products, All Rights Reserved.

This software is based in part on the work of the Independent JPEG Group.

ERROR: No HTML files!

I checked the Apache error log and I'm getting errors on lines 55, 57 and 58.

[client 192.168.1.102] PHP Warning: fopen(/var/www/html/wiki/pdfs/header.html) [<a href='function.fopen'>function.fopen</a>]: failed to open stream: No such file or directory in /var/www/html/wiki/extensions/PdfExport/PdfExport.php on line 55, referer: http://192.168.1.99/wiki/index.php/Main_Page
[client 192.168.1.102] PHP Warning: fwrite: supplied argument is not a valid stream resource in /var/www/html/wiki/extensions/PdfExport/PdfExport.php on line 57, referer: http://192.168.1.99/wiki/index.php/Main_Page
[client 192.168.1.102] PHP Warning: fclose: supplied argument is not a valid stream resource in /var/www/html/wiki/extensions/PdfExport/PdfExport.php on line 58, referer: http://192.168.1.99/wiki/index.php/Main_Page
[client 192.168.1.102] PHP Fatal error: Call to a member function getNamespace on a non-object in /var/www/html/wiki/includes/Article.php on line 160, referer: http://192.168.1.99/wiki/index.php/Main_Page
[client 192.168.1.102] PHP Warning: fopen(/var/www/html/wiki/pdfs/header.html) [<a href='function.fopen'>function.fopen</a>]: failed to open stream: No such file or directory in /var/www/html/wiki/extensions/PdfExport/PdfExport.php on line 55, referer: http://192.168.1.99/wiki/index.php/Main_Page
[client 192.168.1.102] PHP Warning: fwrite: supplied argument is not a valid stream resource in /var/www/html/wiki/extensions/PdfExport/PdfExport.php on line 57, referer: http://192.168.1.99/wiki/index.php/Main_Page
[client 192.168.1.102] PHP Warning: fclose: supplied argument is not a valid stream resource in /var/www/html/wiki/extensions/PdfExport/PdfExport.php on line 58, referer: http://192.168.1.99/wiki/index.php/Main_Page

I'm running Fedora Core 4. htmldoc is in the /usr/bin folder.
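
The fopen warning says "No such file or directory", which suggests that the pdfs directory under the wiki root does not exist (or cannot be reached by the web server); the fwrite and fclose warnings then follow from the failed fopen. A minimal sketch of a more defensive write is shown below. The helper name writeFileChecked is hypothetical and the code is not taken from PdfExport.php.

```php
<?php
// Hypothetical helper: writes $content to $file, creating the directory first
// and reporting failure instead of passing an invalid handle on to fwrite().
function writeFileChecked($file, $content)
{
    $dir = dirname($file);
    // Create the directory (and any missing parents) if it is not there yet.
    if (!is_dir($dir) && !@mkdir($dir, 0755, true)) {
        return false;
    }
    $handle = @fopen($file, 'w');
    if ($handle === false) {
        return false;
    }
    $ok = fwrite($handle, $content) !== false;
    fclose($handle);
    return $ok;
}

// Example call with a made-up target below the system temp directory.
$target = sys_get_temp_dir() . '/pdfexport-demo/header.html';
echo writeFileChecked($target, '<html></html>') ? "written\n" : "failed\n";
```

Creating the directory by hand and making it writable for the web server user should have the same effect for the header.html case above.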

Blue boxes on images and empty table of contents entry
htmldoc was adding blue borders to images that didn't have the frame attribute, since they all had anchor tags around them, and it was adding an entry to the PDF's table of contents for the MediaWiki-generated table of contents. I removed both with the following regular expressions in the execute function:

// Remove the table of contents completely
$bhtml = preg_replace(
    '/<table id="toc".*?\/script>/ms',
    '',
    $bhtml
);

// Remove any links from images to avoid blue boxes in the PDF output.
$bhtml = preg_replace(
    '/<a href="[^"]*Image:.*?>(<.*?>)<\/a>/',
    "$1",
    $bhtml
);

The usual caveats about parsing HTML with regular expressions apply -- it will fail if the alt text or caption includes a closing > or if any other number of things change.

66.83.143.246 12:45, 26 September 2007 (UTC)
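
To see what the two expressions do, they can be tried outside the wiki on a small piece of markup. The sample HTML below is made up and only loosely modelled on MediaWiki output, and the wrapper function cleanForHtmldoc is a name chosen for this sketch.

```php
<?php
// Applies the two replacements from the post above to a chunk of HTML.
function cleanForHtmldoc($bhtml)
{
    // Drop everything from the TOC table to the end of the script after it.
    $bhtml = preg_replace('/<table id="toc".*?\/script>/ms', '', $bhtml);
    // Replace each image link by the single tag it wraps.
    $bhtml = preg_replace('/<a href="[^"]*Image:.*?>(<.*?>)<\/a>/', '$1', $bhtml);
    return $bhtml;
}

// Made-up sample input.
$sample = '<p>Intro</p>'
    . '<table id="toc"><tr><td>Contents</td></tr></table>'
    . '<script type="text/javascript">showTocToggle();</script>'
    . '<a href="/wiki/Image:Foo.png" class="image">'
    . '<img src="/images/Foo.png" alt="Foo" /></a>';

echo cleanForHtmldoc($sample), "\n";
```

For this input only the paragraph and the bare img tag remain.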

PdfExport on mediawiki 1.11.0rc1
I'm not quite sure whether it's a bug or just strange behaviour due to the MediaWiki version I'm using, but I'm encountering an error. On the special page overview (list of special pages), the PdfExport link causes an error (obviously because no page is given to render as PDF).


 * Question 1: Is the PdfExport link meant to exist on the list of special pages (Special:SpecialPages)?
 * Question 2: Is the returned error (Fatal error: Call to a member function getNamespace on a non-object in...) a feature when the link mentioned above is clicked?
 * Question 3: If it is not a feature but a bug, is there a neat solution to this?

I have made a quick fix that suppresses the error and gives the user an existing page as PDF, by adding the line if($page==''){$page='PdfAbout';} after $page = isset( $par ) ? $par : $wgRequest->getText( 'page' );
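
The logic of this quick fix can be written as a stand-alone function for testing. The function name resolvePdfPage is made up, and $requestedPage stands in for the value that $wgRequest->getText( 'page' ) would return inside the extension.

```php
<?php
// Hypothetical helper mirroring the quick fix: use the subpage parameter if
// set, else the requested page, else fall back to a page known to exist.
function resolvePdfPage($par, $requestedPage, $fallback = 'PdfAbout')
{
    $page = isset($par) ? $par : $requestedPage;
    if ($page == '') {
        $page = $fallback;
    }
    return $page;
}

echo resolvePdfPage(null, ''), "\n";
echo resolvePdfPage('Main_Page', ''), "\n";
```

The fallback page has to exist on the wiki, otherwise the same fatal error is likely to come back.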

Fix? (request for confirmation)
I found a way to remove the 'PdfExport' link from the list of SpecialPages.

change parent::__construct( 'PdfPrint' ); to parent::__construct( 'PdfPrint', '', false );

I assume that the __construct function refers to function SpecialPage( $name = '', $restriction = '', $listed = true, $function = false, $file = 'default', $includable = false ) { (found somewhere near line 555 in includes/SpecialPage.php) and therefore the third parameter sets listed to false.

It seems to work, but I cannot really grasp the __construct concept.
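
On the __construct question: in PHP 5, __construct is the name of a class's constructor, and parent::__construct(...) runs the constructor of the parent class with the given arguments. The sketch below uses a stand-in class, SpecialPageStub, reduced to the first three parameters quoted above. It is not MediaWiki's real SpecialPage, and it assumes the real class handles these arguments in the same order.

```php
<?php
// Stand-in for MediaWiki's SpecialPage, for illustration only.
class SpecialPageStub
{
    public $name;
    public $restriction;
    public $listed;

    public function __construct($name = '', $restriction = '', $listed = true)
    {
        $this->name = $name;
        $this->restriction = $restriction;
        $this->listed = $listed;
    }
}

class SpecialPdfStub extends SpecialPageStub
{
    public function __construct()
    {
        // Runs the parent's constructor; the third argument turns listing off.
        parent::__construct('PdfPrint', '', false);
    }
}

$page = new SpecialPdfStub();
var_dump($page->listed);
```

With only the name passed, the third parameter keeps its default of true, which is why the page showed up in the list before the change.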

ERROR PDFExport if set User rights
I set the following user rights:

$wgGroupPermissions['*']['read'] = false;
$wgGroupPermissions['user']['read'] = true;
$wgWhitelistRead = array("Main Page", 'Special:PdfPrint', 'Special:Userlogin');

After this change, when I pass the URL to HTMLDOC, the generated PDF contains none of the page content, only the navigation menu from the left-hand side of the wiki page. How can I get the whole content of the page? Can I somehow change the user ID under which the PDF file is generated?