Extension:PdfHandler

The extension shows uploaded PDF files in a multipage preview layout. With WebStore enabled, the extension automatically generates images from the specified page. With the Proofread Page extension enabled, PDFs can be displayed side-by-side with text for transcribing books and other documents, as is commonly done with DjVu files (particularly on Wikisource).

This is the first version of the extension, and it is almost certain to have bugs. See the "Bugs and enhancements" section below for how to report problems.

License
Copyright 2007 xarax (s:de:Benutzer:Xarax)

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

Pre-requisites
This software was tested with MediaWiki 1.11alpha (r25574), 1.12alpha (r28001) and 1.13.0rc1, and also with 1.16wmf onward. It may or may not work with earlier versions; please test it if you rely on one.

It requires the following packages:

 * gs-gpl, which provides gs to render the page images (http://www.ghostscript.com/)
 * imagemagick, which provides convert to resize the extracted images (http://www.imagemagick.org/script/index.php)
 * xpdf-utils, which provides pdfinfo to extract metadata from PDFs (http://www.foolabs.com/xpdf/download.html). The poppler-utils package may be substituted for xpdf-utils on Ubuntu and Debian systems.

If you are not sure whether these are installed, run "which gs convert pdfinfo" in a shell to check.

It needs at least PHP 5.1.3 to work (dependency on SimpleXMLElement::addChild).

Installation
To install, copy all the files in the archive you downloaded to the PdfHandler subdirectory of the extensions subdirectory of your MediaWiki installation.

In your MediaWiki LocalSettings.php, add the following line some place towards the bottom of the file:
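A minimal sketch of that line, assuming the extension's entry point is a file named PdfHandler.php inside the extensions/PdfHandler directory (the file name is an assumption, so check the archive you downloaded):

```php
// Load the PdfHandler extension (old-style entry point).
// Assumption: the entry file is extensions/PdfHandler/PdfHandler.php.
require_once "$IP/extensions/PdfHandler/PdfHandler.php";

// On MediaWiki 1.25 and later, extensions that ship an extension.json
// are normally loaded like this instead:
// wfLoadExtension( 'PdfHandler' );
```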

Then you can set some variables in your LocalSettings.php (if you wish to override one of the defaults):


 * $wgPdfProcessor – path to your ghostscript implementation (default to gs)
 * $wgPdfPostProcessor – path to your imagemagick convert (default to convert)
 * $wgPdfInfo – path to your pdfinfo (default to pdfinfo)
 * $wgPdfOutputExtension – defaults to jpg
 * $wgPdfHandlerDpi – defaults to 150 dpi. The extension extracts a bitmap image for each page of the PDF at this resolution (dpi = dots per inch). For example, a European A4 page is 210 mm wide, which corresponds to 595 points (at 72 points per inch). At 150 dpi this yields an image 1240 pixels wide; at 300 dpi the width is 2480 pixels.
 * $wgPdfCreateThumbnailsInJobQueue – defaults to false. When set to true, page thumbnails are created through the job queue during normal wiki browsing, rather than while a file page is being viewed. Be advised that this may significantly increase the CPU load of your web server on a high-traffic site: the job queue was designed to run quick tasks on page views, and although creating a thumbnail is fairly quick, converting PDF pages to another format still costs CPU time. To limit the impact, either set $wgJobRunRate to a small value (e.g. 0.05), or disable the job queue on the high-traffic wiki and set up another server that only runs jobs (as Wikimedia did).
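As a rough sketch, the overrides described above might look like this in LocalSettings.php. The values are the documented defaults, except for $wgPdfCreateThumbnailsInJobQueue and $wgJobRunRate, which illustrate the job queue setup and use the small example rate mentioned above.

```php
// Optional PdfHandler overrides; place them after the line that loads the extension.
$wgPdfProcessor       = 'gs';       // Ghostscript binary (default)
$wgPdfPostProcessor   = 'convert';  // ImageMagick convert (default)
$wgPdfInfo            = 'pdfinfo';  // pdfinfo from xpdf-utils or poppler-utils (default)
$wgPdfOutputExtension = 'jpg';      // format of the rendered page images (default)

// Rendering resolution. An A4 page is 210 mm wide, so:
//   150 dpi: 210 / 25.4 * 150 = about 1240 pixels wide
//   300 dpi: 210 / 25.4 * 300 = about 2480 pixels wide
$wgPdfHandlerDpi = 150;

// Illustrative, not a default: create page thumbnails through the job queue.
$wgPdfCreateThumbnailsInJobQueue = true;

// Illustrative: run a job on roughly 1 in 20 requests to limit CPU load.
$wgJobRunRate = 0.05;
```

Any variable you leave out keeps its default, so in practice only the lines that differ from the defaults are needed.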

The variables below are not specific to this extension:
 * Enable PDF uploads, if you haven't already: $wgFileExtensions[] = 'pdf';
 * $wgMaxShellMemory – memory limit for gs, convert and pdfinfo. The default value of 102400 kB (100 MB) might be too low.
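For illustration, these two general settings could be written as follows. The 204800 kB figure is only an assumed example of a value above the default, not a recommendation made by the extension.

```php
// Allow PDF uploads (general MediaWiki setting).
$wgFileExtensions[] = 'pdf';

// Memory limit, in kB, for shell commands such as gs, convert and pdfinfo.
// Assumption: 204800 kB (200 MB) is an arbitrary example above the 102400 kB default.
$wgMaxShellMemory = 204800;
```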

Example for a Windows XP installation:

If $wgImageMagickConvertCommand is not already defined, define it as well, pointing it at your ImageMagick convert executable.
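A sketch of what such a Windows configuration could look like. Every path below is hypothetical and depends on where Ghostscript, xpdf and ImageMagick were installed; gswin32c.exe is the name of the 32-bit Ghostscript console executable.

```php
// Hypothetical install locations; adjust them to your system.
$wgPdfProcessor = 'C:\Program Files\gs\bin\gswin32c.exe';
$wgPdfInfo      = 'C:\Program Files\xpdf\pdfinfo.exe';

// Skip this line if LocalSettings.php already sets the ImageMagick command.
$wgImageMagickConvertCommand = 'C:\Program Files\ImageMagick\convert.exe';

// Reuse the same convert executable for resizing the extracted pages.
$wgPdfPostProcessor = $wgImageMagickConvertCommand;
```

Single-quoted PHP strings are used so that the backslashes in the paths are kept literally.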

Example for an Ubuntu installation:

Ubuntu note: you must already have installed the following three packages: imagemagick, ghostscript and xpdf-utils.
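A sketch for Ubuntu, assuming the packages installed their binaries under /usr/bin (the usual location, which "which gs convert pdfinfo" will confirm):

```php
// Assumed locations of the packaged binaries; verify them with `which`.
$wgPdfProcessor     = '/usr/bin/gs';       // from ghostscript
$wgPdfPostProcessor = '/usr/bin/convert';  // from imagemagick
$wgPdfInfo          = '/usr/bin/pdfinfo';  // from xpdf-utils or poppler-utils
```

If the binaries are already on the web server's PATH, the defaults (gs, convert, pdfinfo) should work without these lines.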

Usage

 * The main use of the PdfHandler extension requires no user interaction. When you upload a new PDF file, its metadata is stored in the database, and the file can then be shown in a multipage preview layout, as the DjVu handler does. Without this extension, PDFs will not display properly when uploaded.
 * Additionally, this extension allows Extension:ProofreadPage to handle PDFs in side-by-side view for transcribing and proofreading, as is done on Wikisource.
 * Another option, available since r25575, is to display a PDF file as an image, showing a single page at a time, for example: [[Image:myPdfFile.pdf|page=2|300px]]. The page and size parameters are optional; the default page is page 1. Instead of a size parameter, you can also use the thumb parameter, with or without a caption, for example: [[Image:myPdfFile.pdf|thumb|page=2|My caption]].
 * Because PdfHandler extends ImageHandler, you can use all the arguments that you would for an image, for example: thumb, right/left, caption, border, link, etc.
 * To present a 2-page PDF, for example, use one image link per page: [[Image:myPdfFile.pdf|page=1]] [[Image:myPdfFile.pdf|page=2]]

Bugs and enhancements
Bugs can be reported at Bugzilla

Bug list:
 * PdfHandler bug reports, all of them, old and new

Enhancements

 * There is a patch that creates thumbnails in a background job. With it, thumbnails are created during normal browsing of wiki pages instead of while viewing the PDF file page. The patch creates one thumbnail at a time, so it avoids the load spikes caused by rendering them all on the PDF file page.