Extension:PdfHandler

This is the README file for the PdfHandler extension for MediaWiki software. The extension is only useful if you've got a MediaWiki installation; it can only be installed by the administrator of the site.

The extension shows uploaded PDF files in a multipage preview layout. With WebStore enabled, the extension automatically generates images of the specified page.

This is the first version of the extension and it's almost sure to have bugs. See the "Bugs and enhancements" section below for info on how to report problems.

License
Copyright 2007 xarax (s:de:Benutzer:Xarax)

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307  USA

Pre-requisites
This software was tested with MediaWiki 1.11alpha (r25574) – 1.12alpha (r28001) – 1.13.0rc1, and also 1.16wmf. It may or may not work with earlier or later versions, so please test it on yours.
 * It is currently live on Wikipedia, running 1.16wmf4 (r71190).

It requires the following packages:


 * gs-gpl for gs to render the page images (http://www.ghostscript.com/)
 * imagemagick for convert to resize the extracted images (http://www.imagemagick.org/script/index.php)
 * xpdf-utils for pdfinfo to extract metadata from PDF files (http://www.foolabs.com/xpdf/download.html). The poppler-utils package may be substituted for xpdf-utils on Ubuntu and Debian systems.

It needs at least PHP 5.1.3 to work (dependency on SimpleXMLElement::addChild).

Installation
To install, copy all the files in the archive you downloaded to the PdfHandler subdirectory of the extensions subdirectory of your MediaWiki installation.

In your MediaWiki LocalSettings.php, add the following line some place towards the bottom of the file:
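
The line itself appears to be missing from this copy of the README. As a sketch, assuming the extension's entry point is a file named PdfHandler.php inside the directory described above (the usual convention for MediaWiki extensions of this era; check the file name in your download), it would look like this:

```php
# Fragment for LocalSettings.php.
# $IP is MediaWiki's installation path. The file name PdfHandler.php is an
# assumption based on the usual extension layout, not taken from this README.
require_once( "$IP/extensions/PdfHandler/PdfHandler.php" );
```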

Then you can set some variables in your LocalSettings.php (if you wish to override one of the defaults):


 * $wgPdfProcessor – path to your Ghostscript executable (defaults to gs)
 * $wgPdfPostProcessor – path to your ImageMagick convert executable (defaults to convert)
 * $wgPdfInfo – path to your pdfinfo executable (defaults to pdfinfo)
 * $wgPdfOutputExtension – defaults to jpg
 * $wgPdfHandlerDpi – defaults to 150 dpi. The extension extracts a bitmap image for each page of the PDF at this resolution (dpi = dots per inch). For example, a PDF page in the European A4 size is 210 mm wide, corresponding to 595 points (at 72 points per inch). At 150 dpi this yields an image 1240 pixels wide. If the parameter is set to 300 dpi instead, the width will be 2480 pixels.
 * $wgPdfCreateThumbnailsInJobQueue – defaults to false. Setting it to true moves the creation of page thumbnails into the job queue, so they are generated during normal wiki browsing rather than while someone is viewing the file page. Be advised that this may significantly increase the CPU load of your web server on a high-traffic website. The job queue was designed to perform quick tasks on page views, and thumbnail creation fits that definition, but converting PDF pages to another format still costs CPU time. To limit the impact, either set $wgJobRunRate to a rather small value (e.g. 0.05), or disable the job queue on the high-traffic wiki and set up another server that only runs the job queue (as Wikimedia did).
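
As an illustration of overriding these defaults, a LocalSettings.php fragment might look like the following. The values are examples chosen for this sketch, not recommendations, and the lines are assumed to come after the line that loads the extension.

```php
# Fragment for LocalSettings.php; all values are illustrative.
$wgPdfHandlerDpi = 300;                    # A4 page: 595 pt / 72 * 300 is about 2480 px wide
$wgPdfOutputExtension = 'jpg';             # the default, shown for completeness
$wgPdfCreateThumbnailsInJobQueue = true;   # create page thumbnails via the job queue
$wgJobRunRate = 0.05;                      # on average, run one job per 20 requests
```

Lowering $wgJobRunRate spreads the thumbnail work over more page views, so thumbnails take longer to appear but each request is less likely to pay for a conversion.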

Variables below are not specific to this extension:
 * Enable PDF uploads, if you haven't already: $wgFileExtensions[] = 'pdf';
 * $wgMaxShellMemory – memory limit for gs, convert and pdfinfo. The default value of 102400 kB (100 MB) might be too low.
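
A minimal sketch of these two settings together. The memory figure is an arbitrary value picked for illustration; a suitable limit depends on the size and complexity of the PDFs being rendered.

```php
# Fragment for LocalSettings.php.
$wgFileExtensions[] = 'pdf';   # allow PDF uploads
$wgMaxShellMemory = 204800;    # in kB (200 MB); illustrative value, double the default
```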

Example for a Windows XP installation:
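
The original example is missing from this copy of the README. The sketch below shows what such a configuration could look like; every path in it is hypothetical and must be adjusted to where Ghostscript and xpdf are actually installed. It assumes $wgImageMagickConvertCommand already points to a working convert.exe.

```php
# Fragment for LocalSettings.php on Windows. All paths are hypothetical.
$wgPdfProcessor = 'C:/Program Files/gs/bin/gswin32c.exe';   # Ghostscript console binary
$wgPdfPostProcessor = $wgImageMagickConvertCommand;         # reuse MediaWiki's ImageMagick setting
$wgPdfInfo = 'C:/Program Files/xpdf/pdfinfo.exe';           # pdfinfo from xpdf
```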

If $wgImageMagickConvertCommand is not already defined, define it in a similar way, pointing it at ImageMagick's convert executable.
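
A sketch of such a definition, again with a hypothetical path; the second line shows how the extension's setting can then reuse it.

```php
# Fragment for LocalSettings.php on Windows. The path is hypothetical.
$wgImageMagickConvertCommand = 'C:/Program Files/ImageMagick/convert.exe';
$wgPdfPostProcessor = $wgImageMagickConvertCommand;
```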

Example for an Ubuntu installation:
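
The original example is missing from this copy of the README. A minimal sketch, assuming the three tools were installed from the distribution's packages and are therefore on the web server's PATH:

```php
# Fragment for LocalSettings.php on Ubuntu.
# The bare command names match the extension's documented defaults.
$wgPdfProcessor = 'gs';
$wgPdfPostProcessor = $wgImageMagickConvertCommand;   # MediaWiki's default is /usr/bin/convert
$wgPdfInfo = 'pdfinfo';
```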

Ubuntu note: the following three packages must already be installed: imagemagick, ghostscript, and xpdf-utils.

Usage
 * The main usage of the PdfHandler extension is without user interaction. If you upload a new PDF file, the metadata will be stored in the database, and the file can then be shown in a multipage preview layout, as the DjVu handler does. The extension does not render PDF files that were uploaded prior to installation; they will be displayed in the same way as before.
 * Note: the last point is apparently no longer true; existing files also get a preview. I tested it on my wiki, and also recall this stated in bugzilla/mailinglist/wherever, although I can't seem to locate the reference --Dror Snir 00:40, 25 August 2010 (UTC)
 * Another option, introduced quite long ago (r25575), is to display a PDF file as an image, showing a single page at a time, like so: [[File:myPdfFile.pdf|page=1|600px]]. The page and size parameters are optional; the default page is page 1.
 * If you would like to present a 2-page PDF, for example, do the following: [[File:myPdfFile.pdf|page=1]] [[File:myPdfFile.pdf|page=2]]

Bugs and enhancements
Bugs can be reported at Bugzilla.

Bug list:
 * PdfHandler bug reports, all of them, old and new

Enhancements

 * There is a patch that creates thumbnails in a background job. With it, thumbnails are created during normal browsing of wiki pages instead of while the PDF file page is being viewed. Because the job creates one thumbnail at a time, it avoids the load spike caused by rendering all of them on the PDF file page.