Release status: stable
Description: Allows handling PDF files like multipage DjVu files
Author(s): Martin Seidel (xarax) <jodeldi at gmx dot de>
Example: usability.wikimedia.org
The PdfHandler extension displays uploaded PDF files in a multipage preview layout. With the Proofread Page extension enabled, PDFs can be displayed side by side with their text for transcribing books and other documents, as is commonly done with DjVu files (particularly on Wikisource).
License
Copyright 2007 xarax (s:de:Benutzer:Xarax)
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
Pre-requisites
This software has been tested with MediaWiki 1.11alpha (r25574), 1.12alpha (r28001), 1.13.0rc1, and 1.16wmf onward. It may or may not work with earlier versions; please test it.
It requires the following packages:
- gs-gpl (Ghostscript), which provides gs to render the page images (http://www.ghostscript.com/)
- imagemagick, which provides convert to resize the extracted images (http://www.imagemagick.org/script/index.php)
- xpdf-utils, which provides pdfinfo to extract metadata from PDF files (http://www.foolabs.com/xpdf/download.html). The poppler-utils package may be substituted for xpdf-utils on Ubuntu and Debian systems.
- To check whether these are installed, run "which gs convert pdfinfo" in a shell.
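The same check can be scripted. The sketch below defines a POSIX shell function with the hypothetical name check_pdf_tools; it uses command -v rather than which (command -v is specified by POSIX, which is not) and reports each tool as found or missing, returning a non-zero status if any is absent.

```shell
#!/bin/sh
# Sketch: report which external tools used by PdfHandler are on $PATH.
# check_pdf_tools is a hypothetical helper name, not part of the extension.
check_pdf_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # 0 if every tool was found, 1 otherwise
    return "$missing"
}

# gs (Ghostscript), convert (ImageMagick), pdfinfo (xpdf-utils or poppler-utils)
check_pdf_tools gs convert pdfinfo || echo "Install the missing packages before enabling PdfHandler."
```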
It needs at least PHP 5.1.3 to work (dependency on SimpleXMLElement::addChild).
Download
You can download the extension directly from the MediaWiki source code repository. You can get:
- One of the extension's tags
Not all extensions have tags. Some extensions have tags for each release, in which case those tags have the same stability as the release. To download a tag:
- Go to the tags list
- Click the name of the tag you want to download
- Click "snapshot"
- The latest version of one of the extension's branches
Each extension has a master branch containing the latest code (which might be unstable). Extensions can have further branches as well. To download a branch:
- Go to the branches list
- Click the branch name
- Click "snapshot"
- A snapshot made during the release of a MediaWiki version.
This might be unstable and is not guaranteed to work with the associated MediaWiki version.
After you've got the code, save it into the extensions/PdfHandler directory of your wiki.
If you are familiar with git and have shell access to your server, you can obtain the extension, with all its tags and branches, as follows:
cd extensions
git clone
Installation
To install, copy all the files from the downloaded archive to the extensions/PdfHandler directory of your MediaWiki installation.
In your MediaWiki LocalSettings.php, add the following line some place towards the bottom of the file:
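For MediaWiki versions in the range this page was tested with (1.11 to 1.16), this is typically a require_once of the extension's entry file, assuming the extension lives in extensions/PdfHandler. On MediaWiki 1.25 and later, extensions that ship an extension.json are loaded with wfLoadExtension instead. A minimal sketch of both forms:

```php
// MediaWiki 1.11 to 1.24 (assumed default layout): load the entry file.
require_once "$IP/extensions/PdfHandler/PdfHandler.php";

// MediaWiki 1.25 and later: use this instead of the require_once line above.
// wfLoadExtension( 'PdfHandler' );
```

Use only one of the two lines; loading the extension twice will fail.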
You can then set the following variables in LocalSettings.php to override their defaults:
- $wgPdfProcessor – path to your Ghostscript executable (defaults to gs)
- $wgPdfPostProcessor – path to your ImageMagick convert (defaults to convert)
- $wgPdfInfo – path to your pdfinfo (defaults to pdfinfo)
- $wgPdfOutputExtension – file extension of the generated page images (defaults to jpg)
- $wgPdfHandlerDpi – defaults to 150 dpi
- The extension extracts a bitmap image for each page of the PDF at this resolution (dpi = dots per inch). For example, an A4-size PDF page is 210 mm wide, corresponding to 595 points (at 72 dpi). At 150 dpi this yields an image 1240 pixels wide; if the parameter is set to 300 dpi, the width is 2480 pixels.
- $wgPdfCreateThumbnailsInJobQueue – defaults to false
- If set to true, page thumbnails are created through the job queue, so they are generated during normal wiki browsing rather than while someone is viewing the file page. Be aware that this may significantly increase the CPU load of your web server on a high-traffic site. The job queue was designed to perform quick tasks on page views, and thumbnail creation fits that definition; nevertheless, converting PDF pages to another format still costs CPU time. To mitigate this, either set $wgJobRunRate to a small value (e.g. 0.05), or disable job execution on page requests on the high-traffic wiki and run the job queue on a separate server (as Wikimedia does).
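The fragment below is a sketch of overriding some of these defaults in LocalSettings.php. The values are illustrative assumptions, not recommendations from the extension's authors; the DPI comment repeats the A4 arithmetic from the list above.

```php
// Render pages at 300 dpi instead of the default 150. An A4 page is
// 210 mm / 25.4 mm per inch * 72 ≈ 595 points wide, so each page image is
// about 595 / 72 * 300 ≈ 2480 pixels wide (about 1240 pixels at 150 dpi).
$wgPdfHandlerDpi = 300;

// Create page thumbnails via the job queue instead of while the file page
// is being viewed.
$wgPdfCreateThumbnailsInJobQueue = true;

// On a busy wiki, run on average one job per 20 requests to limit CPU load...
$wgJobRunRate = 0.05;
// ...or set $wgJobRunRate = 0 and run maintenance/runJobs.php from cron or on
// a separate job-runner server.
```

Higher DPI values give sharper page images at the cost of more CPU time and disk space per page, which matters most when thumbnails are rendered on page views.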
Variables below are not specific to this extension:
- Enable PDF uploads, if you haven't already: $wgFileExtensions[] = 'pdf'; (this appends pdf to the list of allowed extensions instead of replacing it)
- $wgMaxShellMemory - memory limit for gs, convert and pdfinfo. The default value of 102400 kB (100 MB) might be too low.
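$wgMaxShellMemory is given in kilobytes, so the 100 MB default corresponds to 102400. The sketch below raises it to 300 MB (300 × 1024 = 307200 KB); the figure is an illustrative assumption, so size it to the PDFs your wiki actually handles.

```php
// Memory limit for shelled-out commands such as gs, convert and pdfinfo, in KB.
// 300 MB * 1024 = 307200 KB; an illustrative value, not a tested recommendation.
$wgMaxShellMemory = 307200;
```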
Example for a Windows XP installation:
$wgPdfProcessor = 'C:\Programme\gs\gs8.60\bin\gswin32.exe';
$wgPdfPostProcessor = $wgImageMagickConvertCommand;
$wgPdfInfo = 'C:\Programme\xpdf-3.02pl1-win32\pdfinfo.exe';
$wgPdftoText = 'C:\Programme\xpdf-3.02pl1-win32\pdftotext.exe';
If $wgImageMagickConvertCommand is not already defined, set the path to convert directly, for example:
$wgPdfPostProcessor = 'C:\Programme\ImageMagick-6.6.2-Q16\convert.exe';
Example for an Ubuntu installation:
$wgPdfProcessor = 'gs';
$wgPdfPostProcessor = $wgImageMagickConvertCommand;
$wgPdfInfo = 'pdfinfo';
Ubuntu note: the packages imagemagick, ghostscript and xpdf-utils (or poppler-utils) must already be installed.
Usage
- The main use of the PdfHandler extension requires no user interaction. When you upload a new PDF file, its metadata is stored in the database, and the file can then be shown in a multipage preview layout, as the DjVu handler does. Without this extension, uploaded PDFs are not displayed properly.
- Additionally, this extension allows Extension:ProofreadPage to show PDFs in a side-by-side view for transcribing and proofreading, as is done on Wikisource.
- Another option, introduced in r25575, is to display a PDF file as an image, one page at a time, like so: [[File:myPdfFile.pdf|page=1|600px]]. The page and size parameters are optional; the default page is page 1. Instead of a size parameter, you can also use the thumb parameter, with or without a caption: [[File:myPdfFile.pdf|page=1|thumb|My PDF]].
- Because PdfHandler extends ImageHandler, you can use all the parameters you would use for an image, for example thumb, right/left, caption, border, link, etc.
- To display both pages of a 2-page PDF, for example, use: [[File:myPdfFile.pdf|page=1]] [[File:myPdfFile.pdf|page=2]]
Bugs and enhancements
Bugs can be reported at Bugzilla:
- PdfHandler bug reports (all of them, old and new)
Enhancements
- A patch exists that creates thumbnails in a background job, so they are generated during normal browsing of wiki pages instead of when the PDF file page is viewed. It saves time by creating one thumbnail at a time, which avoids the load problems caused by rendering them all on the PDF file page.
See also
This extension is being used on one or more Wikimedia projects. This probably means that the extension is stable and works well enough to be used by such high-traffic websites. Look for this extension's name in Wikimedia's CommonSettings.php and InitialiseSettings.php configuration files to see where it's installed. A full list of the extensions installed on a particular wiki can be seen on the wiki's Special:Version page.