Extension talk:Pdf Export/archive 1

There seems to be a licensing problem with htmldoc for windows installations ?

I did not manage to open pdf file, so I had to add a header line to download the pdf

Furthermore, htmldoc does not support unicode, I began a translation utf8 to latin1 for displaying french characters correctly , this may need enhancements.
 * I filled in the gaps for german umlauts based on the site you linked to. ~gandm

mailto:sancelot@free.fr here is my working file with windows  'Pdf',        'author' =>' Thomas Hempel',        'description' => 'prints a page as pdf',        'url' => 'http://www.netapp.com' );

$wgHooks['SkinTemplateBuildNavUrlsNav_urlsAfterPermalink'][] = 'wfSpecialPdfNav'; $wgHooks['MonoBookTemplateToolboxEnd'][] = 'wfSpecialPdfToolbox';

// thanks to interesting http://klaus.e175.net/code/latin_utf8.phps link // only french cars are done function utf8_latin1($text){ return strtr($text,array( "\xC3\x9F"=>"&amp;szlig;", "\xC3\xA4"=>"&amp;auml;", "\xC3\xAB"=>"&amp;euml;", "\xC3\xAF"=>"&amp;iuml;", "\xC3\xBC"=>"&amp;uuml;", "\xC3\xB6"=>"&amp;ouml;", "\xC3\x84"=>"&amp;Auml;", "\xC3\x8B"=>"&amp;Euml;", "\xC3\x8E"=>"&amp;Iuml;", "\xC3\x9C"=>"&amp;Uuml;", "\xC3\x96"=>"&amp;Ouml;", "\xC3\xA2"=>"&amp;acirc;", "\xC3\xAA"=>"&amp;ecirc;", "\xC3\xAE"=>"&amp;icirc;", "\xC3\xB4"=>"&amp;ocirc;", "\xC3\xBB"=>"&amp;ucirc;", "\xC3\x82"=>"&amp;Acirc;", "\xC3\x8A"=>"&amp;Ecirc;", "\xC3\x8E"=>"&amp;Icirc;", "\xC3\x94"=>"&amp;Ocirc;", "\xC3\x9B"=>"&amp;Ucirc;", "\xC3\xA0"=>"&amp;agrave;", "\xC3\xA8"=>"&amp;egrave;", "\xC3\xB9"=>"&amp;ugrave;", "\xC3\xA9"=>"&amp;eacute;", "\xC3\x80"=>"&amp;Agrave;", "\xC3\x88"=>"&amp;Egrave;", "\xC3\x99"=>"&amp;Ugrave;", "\xC3\x89"=>"&amp;Eacute;", "\xC3\xA7"=>"&amp;ccedil;" //     "%3A"=>"/" )); } function wfSpecialPdf { global $IP, $wgMessageCache;

$wgMessageCache->addMessages(               array( 'pdfprint' => 'PdfPrint' , 'pdf_print_link' => 'Sauvegarder en PDF'));

class SpecialPdf extends SpecialPage { var $title; var $article; var $html; var $parserOptions; var $bhtml;

function SpecialPdf { SpecialPage::SpecialPage( 'PdfPrint' ); }               function execute( $par ) { global $wgRequest; global $wgOut; global $wgUser; global $wgParser; global $wgScriptPath; global $wgServer;

$page = isset( $par ) ? $par : $wgRequest->getText( 'page' ); $title = Title::newFromText( $page ); $article = new Article ($title); $wgOut->setPrintable; $wgOut->disable; $parserOptions = ParserOptions::newFromUser( $wgUser ); $parserOptions->setEditSection( false ); $parserOptions->setTidy(true); $wgParser->mShowToc = false; $parserOutput = $wgParser->parse( $article->preSaveTransform( $article->getContent ) ."\n\n",                                       $title, $parserOptions );

$bhtml = $parserOutput->getText;

$bhtml = str_replace ($wgScriptPath, $wgServer . $wgScriptPath, $bhtml); $bhtml = str_replace ('/w/',$wgServer . '/w/', $bhtml); $bhtml = str_replace ('href="#', 'href="' . $wgServer . '/' . $page . '#', $bhtml); $bhtml=utf8_latin1($bhtml); #$html = "  ". $page. "  " . $bhtml. " ";                        # thanks to mediawiki AT sandeman.freesurf.fr : $html = "  ". utf8_decode($page). "  " . $bhtml. " ";                        // make a temporary directory with an unique name $mytemp = "c:\\temp\\f" .time. "-" .rand. ".html"; $article_f = fopen($mytemp,'w'); fwrite($article_f, $html); fclose($article_f); putenv("HTMLDOC_NOCGI=1"); # Write the content type to the client... header("Content-Type:application/pdf"); header('Content-Disposition: attachment;filename="'.$page.'.pdf"'); flush;

# Run HTMLDOC to provide the PDF file to the user... passthru("htmldoc -t pdf14 --color  --quiet --jpeg --webpage $mytemp "); unlink ($mytemp);

}       }        SpecialPage::addPage (new SpecialPdf); } function wfSpecialPdfNav( &$skintemplate, &$nav_urls, &$oldid, &$revid ) { $nav_urls['pdfprint'] = array(                       'text' => wfMsg( 'pdf_print_link' ),                        'href' => $skintemplate->makeSpecialUrl( 'PdfPrint', "page=". wfUrlencode( "{$skintemplate->thispage}" ) )                ); return true; }

function wfSpecialPdfToolbox( &$monobook ) { if ( isset( $monobook->data['nav_urls']['pdfprint'] ) ) if ( $monobook->data['nav_urls']['pdfprint']['href'] == '' ) { ?>msg( 'pdf_print_link' ); ?>data['nav_urls']['pdfprint']['href'] ) ?>">msg( 'pdf_print_link' ); ?>

Patch for version 2.3
I've extended version 2.3 (2010-01-28) with some additional features:


 * special page
 * allow fontface to be picked
 * allow fontsize to be set
 * setting pdf-permissions to either "all" or "none"
 * allow margins (top/sides/bottom) to be defined


 * other
 * defined "$wgPdfExportBackground" which can be used to set an image which is then applied as background to each and every site of the resulting pdf

'What's the process for a review of the patch? How should I provide that new version? Post it here?'

Is it possible to intergrate multiple page with windows version
Is it possible to intergrate multiple page with windows version?

PDF in Spanish
With the diff below I could solve the ñ and accent problem

diff

61c61 < --- >                        $bhtml = utf8_decode($bhtml); 76c76 <                        passthru("htmldoc -t pdf --quiet --jpeg --webpage '$mytemp'"); --- >                        passthru("htmldoc -t pdf --charset 8859-1 --quiet --jpeg --webpage '$mytemp'");

save diff to patch.txt next execute

patch SpecialPdf.php patch.txt

--Esacchi 20:12, 2 August 2006 (UTC)

SpecialPDF and MimeTeX
This is a really cool and useful extension. HOwever, it is ignoring an extension we added to support MimeTeX so that the output does not include any math created through that extension.

Our MimeTeX extension replaces Any LaTeX formula

with 

Is there another way we should be generating this so that SpecialPDF can capture the image? DavidJameson 20:32, 4 August 2006 (UTC)

Updated for unicode, multiple articles, and images
Searched around for a way to do multiple articles to PDF, had to combine what was listed here and what was contained in wiki2pdf. Works the same way as SpecialPDF.php, put it in your extensions folder. HTML files are created (for processing) in your /webroot/wikiroot/pdfs folder (so create it if you don't have it) or another folder of your choice. It still uses HTMLDOC with some switches to format headers and footers, and there are string substitutions for the images exported out of the Wiki...

 'myPdf',        'author' =>' Thomas Hempel, Simon Wheatley, and others',        'description' => 'prints a collection of articles as a pdf book',        'url' => 'http://www.netapp.com' );

$wgHooks['SkinTemplateBuildNavUrlsNav_urlsAfterPermalink'][] = 'wfmyPDFNav'; $wgHooks['MonoBookTemplateToolboxEnd'][] = 'wfmyPDFToolbox';

function wfmyPDF { global $IP, $wgMessageCache;

$wgMessageCache->addMessages(               array( 'pdfprint2' => 'PdfPrint2' , 'pdf_print_link2' => 'Export PDF book'));

class myPDF extends SpecialPage { var $title; var $article; var $html; var $parserOptions; var $bhtml;

function myPDF { SpecialPage::SpecialPage( 'PdfPrint2' ); }               function execute( $par ) { global $wgRequest; global $wgOut; global $wgUser; global $wgParser; global $wgScriptPath; global $wgServer; //Get the name of the main article from which this routine was called // - this will be used for the book/file name $page=isset($par) ? $par:$wgRequest->getText('page'); $title=Title::newFromText($page); $article=new Article($title); //write a header file with the title HTML tab for the book - header.html //all pdfs will be written to /webroot/wikiroot/pdfs $doctitle=str_replace("_", " ", $page); //write the header file: $mytemp = $_SERVER["DOCUMENT_ROOT"].$wgScriptPath."/pdfs/header.html"; $article_f = fopen($mytemp, 'w'); $doctitle=str_replace("_", " ", $page); fwrite($article_f, "  ".$doctitle."     "); fclose($article_f);

//add this header file to the list of files that htmldoc will process $filelist=$mytemp; $c=1;

//get the article content, i.e. a list of articles to print to pdf //each one is denoted by curly braces $SaveText=$article->getContent; $i = strpos($SaveText,"{"); while ($i >= 0) { $j = strpos($SaveText,"}"); if ($j <= $i) break; $art = trim(substr($SaveText, $i+1,$j-$i-1)); $SaveText=substr($SaveText, $j+1); //Go fetch the article that was listed $title1 = Title::newFromURL( $art ); $article1 = new Article($title1); $wgOut->setPrintable; $wgOut->disable; $parserOptions = ParserOptions::newFromUser( $wgUser ); $parserOptions->setEditSection( false ); $parserOptions->setTidy(true); $wgParser->mShowToc = true; //parse the article into HTML $parserOutput = $wgParser->parse( $article1->preSaveTransform( $article1->getContent ) ."\n\n",                               $title1, $parserOptions ); //get the html content, then format it to remove any wiki escape chars $bhtml = $parserOutput->getText; $bhtml = utf8_decode($bhtml); //make sure all links are absolute $bhtml = str_replace ($wgScriptPath, $wgServer . $wgScriptPath, $bhtml); $bhtml = str_replace ('/w/',$wgServer . '/w/', $bhtml); //make sure all image tags are true $bhtml = str_replace ('&lt;img', '', $bhtml); //write a new title and H1 heading - used for the chapter in the pdf book $html = "  ".$art."   ".$art." \n".$bhtml."  "; //output article to next html file in list: $mytemp = $_SERVER["DOCUMENT_ROOT"].$wgScriptPath."/pdfs/file".$c.".html"; $article_f = fopen($mytemp, 'w'); fwrite($article_f, $html); fclose($article_f); $c=$c+1; $filelist=$filelist." ".$mytemp; $i = strpos($SaveText,"{"); //limit output files to 100 - used in testing in case things get out of hand if ($c > 100) break; }                       putenv("HTMLDOC_NOCGI=1");

# Write the content type to the client... header("Content-Type: application/pdf"); header("Content-Disposition: attachment; filename=\"$page.pdf\""); flush;

# Run HTMLDOC to provide the PDF file to the user... passthru("htmldoc --book -t pdf14 --bodyfont Helvetica --header t.1 --footer c.1 --no-links --linkstyle plain --charset 8859-1 --color --quiet --jpeg --webpage ".$filelist); unlink ($filelist); }       }        SpecialPage::addPage (new myPDF); }

function wfmyPDFNav( &$skintemplate, &$nav_urls, &$oldid, &$revid ) { $nav_urls['pdfprint2'] = array(                       'text' => wfMsg( 'pdf_print_link2' ),                        'href' => $skintemplate->makeSpecialUrl( 'PdfPrint2', "page=". wfUrlencode( "{$skintemplate->thispage}" ) )                ); return true; }

function wfmyPDFToolbox( &$monobook ) { if ( isset( $monobook->data['nav_urls']['pdfprint2'] ) ) if ( $monobook->data['nav_urls']['pdfprint2']['href'] == '' ) { ?><?php echo $monobook->msg( 'pdf_print_link2' ); ?></li><?php } else { ?><?php ?><a href="<?php echo htmlspecialchars( $monobook->data['nav_urls']['pdfprint2']['href'] ) ?>"><?php echo $monobook->msg( 'pdf_print_link2' ); ?></a><?php ?></li><?php }       return true; } ?>

MP 09:45, 30 August 2006 (UTC)
This doesn't work for me - I've added the php file to the extensions folder and altered Localsettings.php to include it. When I navigate to my site I just get a blank page rather than logon. I'm obviously missing something blindingly obvious.....


 * Windows XP
 * Apache 2
 * php 5.1.4
 * MySQL 4.1.16
 * Mediawiki 1.7.1

---CheShA Says: "You haven't set permissions on the SpecialPdf.php file; Apache can't access it"

SCW 3:14pm, 2nd Sept, 2006 (CT)


require_once("extensions/myPDF.php"); hghg

Hello Editing the Page

Bad argument to HTMLDOC
myPDF is not working for me. It's generating a bogus arg to HTMLDOC and so instead of getting a PDF file, I get a file with the error message:

HTMLDOC Version 1.8.27 Copyright 1997-2006 Easy Software Products, All Rights Reserved. This software is based in part on the work of the Independent JPEG Group.

ERROR: Bad option argument "--charse "!

Note that there should be a 't' at the end of the argument but instead there is an embedded CR control character

DavidJameson 14:22, 5 September 2006 (UTC)

SCW, 05sept06, 10:09CT
A line break got into the cut and paste on that line for htmldoc passthru. take out the linebreak to join the lines together to make the argument to htmldoc '--charset' and all should be okay.

I'll be posting a newer version that has th option for creating PDF books with a title page, TOC and nice nesting of articles...

Simon.

No HTML files found
I must be missing something else....after fixing the linebreak problem, I opened a wikipage and then clicked on Export PDF book.

This time I got a file with the error:

HTMLDOC Version 1.8.27 Copyright 1997-2006 Easy Software Products, All Rights Reserved. This software is based in part on the work of the Independent JPEG Group.

ERROR: No HTML files!

Usage: htmldoc [options] filename1.html [ ... filenameN.html ] htmldoc filename.book

What's this notion about a "collection" of articles? Is there something I'm supposed to do to "collect" some articles together before I can print them? How would I do that, and why would I need to? The "Export PDF Book" link shows up in Toolbox when I'm viewing a particular page - So how can generate THAT page as PDF but including all images?

DavidJameson 15:36, 5 September 2006 (UTC)

okay, the following script is based of the original, temporary PDF files are saved to /tmp, like they were in the original. This script will not deal with putting multiple articles into one PDF file (see above for that). It will handle image files.

<?php

if (!defined('MEDIAWIKI')) die; require_once ("$IP/includes/SpecialPage.php");

$wgExtensionFunctions[] = 'wfSpecialPdf'; $wgExtensionCredits['specialpage'][] = array(       'name' => 'Pdf',        'author' =>' Thomas Hempel',        'description' => 'prints a page as pdf',        'url' => 'http://www.netapp.com' );

$wgHooks['SkinTemplateBuildNavUrlsNav_urlsAfterPermalink'][] = 'wfSpecialPdfNav'; $wgHooks['MonoBookTemplateToolboxEnd'][] = 'wfSpecialPdfToolbox';

function wfSpecialPdf { global $IP, $wgMessageCache;

$wgMessageCache->addMessages(               array( 'pdfprint' => 'PdfPrint' , 'pdf_print_link' => 'Print as PDF'));

class SpecialPdf extends SpecialPage { var $title; var $article; var $html; var $parserOptions; var $bhtml;

function SpecialPdf { SpecialPage::SpecialPage( 'PdfPrint' ); }               function execute( $par ) { global $wgRequest; global $wgOut; global $wgUser; global $wgParser; global $wgScriptPath; global $wgServer;

$page = isset( $par ) ? $par : $wgRequest->getText( 'page' ); $title = Title::newFromText( $page ); $article = new Article ($title); $wgOut->setPrintable; $wgOut->disable; $parserOptions = ParserOptions::newFromUser( $wgUser ); $parserOptions->setEditSection( false ); $parserOptions->setTidy(true); $wgParser->mShowToc = false; $parserOutput = $wgParser->parse( $article->preSaveTransform( $article->getContent ) ."\n\n",                                       $title, $parserOptions );

$bhtml = $parserOutput->getText; $bhtml = utf8_decode($bhtml);

$bhtml = str_replace ($wgScriptPath, $wgServer . $wgScriptPath, $bhtml); $bhtml = str_replace ('/w/',$wgServer . '/w/', $bhtml); $bhtml = str_replace ('&lt;img', '<img', $bhtml); $bhtml = str_replace ('/&gt;', '/>', $bhtml); $html = "  ". $page. "  " . $bhtml. " ";                        // make a temporary directory with an unique name $mytemp = "/tmp/f" .time. "-" .rand. ".html"; $article_f = fopen($mytemp,'w'); fwrite($article_f, $html); fclose($article_f); putenv("HTMLDOC_NOCGI=1"); # Write the content type to the client... header("Content-Type: application/pdf"); header("Content-Disposition: attachment; filename=\"$page.pdf\""); flush;

# Run HTMLDOC to provide the PDF file to the user...                       passthru("htmldoc -t pdf14 --bodyfont Helvetica --no-links --linkstyle plain --footer c.1 --header c.1 --tocheader ... --charset 8859-1 --color --quiet --jpeg --webpage '$mytemp'"); unlink ($mytemp);

}       }        SpecialPage::addPage (new SpecialPdf); }

function wfSpecialPdfNav( &$skintemplate, &$nav_urls, &$oldid, &$revid ) { $nav_urls['pdfprint'] = array(                       'text' => wfMsg( 'pdf_print_link' ),                        'href' => $skintemplate->makeSpecialUrl( 'PdfPrint', "page=". wfUrlencode( "{$skintemplate->thispage} " ) )                ); return true; }

function wfSpecialPdfToolbox( &$monobook ) { if ( isset( $monobook->data['nav_urls']['pdfprint'] ) ) if ( $monobook->data['nav_urls']['pdfprint']['href'] == '' ) { ?><?php echo $monobook->msg( 'pdf_print_link' ); ?></li><?php } else { ?><?php ?><a href="<?php echo htmlspecialchars( $monobook->data['nav_urls']['pdfprint']['href'] ) ?>"> <?php echo $monobook->msg( 'pdf_print_link' ); ?></a><?php ?></li><?php }       return true; } ?>

Almost there (grin)
Well, this version almost works perfectly - it gave me a nice PDF file with explicit images referenced in the wiki page.

However, it crashes when trying to handle one of the image tags produced by our MimeTex math generator.

E.g.

<img src="/cgi-bin/mimetex.cgi?\green f(\xi)=\int_{-\infty}^\xi e^{-\tau^2}d\tau { {x \atop y } }" align="absmiddle" border="0" alt="TeX Formula">

seems to cause a crash.

I wonder if the code that tweaks the IMG tag itself is getting confused with the more sophisticated stuff in this particular image reference.

DavidJameson 12:10, 6 September 2006 (UTC)

PHP error
Just for grins, I ran the process through a PHP debugger. The debugger barfed with the error below. Error: E_ERROR Call to a member function getNamespace on a non-object at /var/www/html/wikiroot/riskit/includes/Article.php line 155

The line it's complaining about is			if ( $this->mTitle->getNamespace == NS_MEDIAWIKI ) { which is found in the function getContent in Article.php

(P.S.....someone needs to install the GESHI syntax highlighting extension on wikimedia.org)

DavidJameson 12:18, 6 September 2006 (UTC)

PHP error
OK - I've figured out both problems (to an acceptable extent)

1) The PHP bug is due to the fact that sometimes (and I don't know why it's only sometimes), the URL associated with the Print As PDF command in the toolbox has %0D%0A tacked on at the end. No idea where this is coming from but it can be removed by modifying the echo statement on line 103, adding a str_replace function to remove the extra characters. php echo htmlspecialchars( str_replace('%0D%0A', '', $monobook->data['nav_urls']['pdfprint']['href'] ))

2) The reason the math images weren't being processed was because the URL in the SRC attribute of the IMG tag did not include the server. I modified my mimetex extension to include the server but this problem will arise again for anyone else who references an image without using a server in the URL. The full solution will require the text to be searched for all image urls, examine them to see if there's a server part and if not, insert the $wgServer into the string. Probably can be done quickly with a regex. DavidJameson 20:31, 6 September 2006 (UTC)


 * What's wrong with just doing this: "$bhtml = str_replace ('/images/',$wgServer . '/images/', $bhtml);" instead of the $bhtml = str_replace ('/w/',$wgServer . '/w/', $bhtml); that you have there? Made the math images all work for me. --dgrant 18:30, 25 October 2006 (UTC)

Error
On some pages I get this error. I'm using DavidJameson's code above. --dgrant 18:34, 25 October 2006 (UTC)

Fatal error: Call to a member function getNamespace on a non-object in /var/www/mediawiki-checkout/includes/Article.php on line 150


 * Ok, now I'm getting this on all pages for some reason. 216.13.217.231 01:30, 7 November 2006 (UTC)

Does not work with 1.8.2 of mediawiki.
No way no how. Neither does any of the code on this page.

Error: Fatal error: Call to a member function getNamespace on a non-object in /var/www/includes/Article.php on line 150


 * I have this Error in Mediawiki 1.8.2 and 1.9.0 when calling Special:PdfPrint. It works for me when clicking on the link in the toolbox. --Ikiwaner 23:57, 15 January 2007 (UTC)


 * I'm running 1.6.7 on BluWiki and getting the exact same error... If someone figures this out can they email me? --SamOdio 14:29, 20 March 2007 (UTC)

24.10.2007 by Dirk:

Hi, I have the same Problem, my pdf Export wont work @ all. System:SQL 5.0.45 & PHP 5.2.3

Host: all-inkl.com (htmldoc installed)

Mediawiki: 1.11.0

I've started an PDF print with the attribute page, print (http://www.carpc-wiki.info/index.php?title=Spezial:PdfPrint&page=Hauptseite) as url I get an empty/corrupt pdf back. If I go to the Spezialpages and klick PDF Export, I get this error: Error: Fatal error: Call to a member function getNamespace on a non-object in /var/www/includes/Article.php on line 150

too. what may be wrong? I corrected the Temp path, the htmldoc path too, the temp does have chmod 777 (must have it right?)

the Wiki ist: http://www.carpc-wiki.info

thanks for information. email: info(at)carpc-wiki.info

No Images
Great extension, works really well except it doesn't appear to include images in the export. Can anyone please confirm that this is normal and that I haven't done anything wrong?

Thanks, CheShA.

Still No Images
Hi, can anyone tell me what to do that my PDF Exported File includes the Pictures from the Original article?

I´ve heard there´s a PDF Hack to fix that problem. If anyone had an idea,... please let me know

THX [mailto:DarkManX1@gmx.net X-Cident]