Extension:PdfHandler

MediaWiki extensions manual: PdfHandler

Release status: stable
Implementation: Media
Description: Handles PDF files like multipage DjVu files
Author(s): Martin Seidel (xarax) <jodeldi at gmx dot de>
MediaWiki: 1.11+
Database changes: No
License: GPL
Examples: usability.wikimedia.org, j-crew.de
Parameters:
  • $wgPdfProcessor
  • $wgPdfPostProcessor
  • $wgPdfInfo
  • $wgPdftoText
  • $wgPdfOutputExtension
  • $wgPdfHandlerDpi
  • $wgPdfCreateThumbnailsInJobQueue
Hooks used:
  • UploadVerifyFile


The PdfHandler extension shows uploaded PDF files in a multipage preview layout. With the Proofread Page extension enabled, PDFs can be displayed side by side with text for transcribing books and other documents, as is commonly done with DjVu files (particularly on Wikisource).

See the "Bugs and enhancements" section below for how to report problems.

License

Copyright 2007 xarax (s:de:Benutzer:Xarax)

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

Pre-requisites

This software was tested with MediaWiki 1.11alpha (r25574), 1.12alpha (r28001), and 1.13.0rc1, and from 1.16wmf onward. It may or may not work with earlier versions; please test it before relying on it.

To check whether the programs listed below are already installed, run "which gs convert pdfinfo" in a shell. It prints the path of each program it finds.

PdfHandler requires the following packages:

Package (command) | Description | Link
gs-gpl (gs) | Renders the page images | http://www.ghostscript.com
imagemagick (convert) | Dynamic resizing and thumbnailing of images | http://www.imagemagick.org/script/install-source.php (installation instructions)
xpdf-utils (pdfinfo) | Extracts metadata from PDF files | http://www.foolabs.com/xpdf/download.html

The poppler-utils package may be substituted for xpdf-utils on Ubuntu and Debian systems.

PdfHandler needs at least PHP 5.1.3 to work (dependency on SimpleXMLElement::addChild).

Download

You can download the extension directly from the MediaWiki source code repository. You can get:

  • One of the extension's tags. Not all extensions have tags. Some extensions have a tag for each release, in which case the tag has the same stability as that release. To download a tag:
    • Go to the tags list
    • Click the name of the tag you want to download
    • Click "snapshot"
  • The latest version of one of the extension's branches. Each extension has a master branch containing the latest code (which might be unstable). Extensions can have further branches as well. To download a branch:
    • Go to the branches list
    • Click the branch name
    • Click "snapshot"
  • A snapshot made during the release of a MediaWiki version. This might be unstable and is not guaranteed to work with the associated MediaWiki version.

After you've got the code, save it into the extensions/PdfHandler directory of your wiki.

If you are familiar with git and have shell access to your server, you can obtain the extension, with all its tags and branches, as follows:

cd extensions
git clone https://gerrit.wikimedia.org/r/p/mediawiki/extensions/PdfHandler.git

Installation

To install, copy all the files in the archive you downloaded to the PdfHandler subdirectory of the extensions subdirectory of your MediaWiki installation.

In your MediaWiki LocalSettings.php, add the following line somewhere towards the bottom of the file:

require_once("$IP/extensions/PdfHandler/PdfHandler.php");

Then you can set some variables in your LocalSettings.php (if you wish to override one of the defaults):

  • $wgPdfProcessor – path to your Ghostscript executable (defaults to gs)
  • $wgPdfPostProcessor – path to your ImageMagick convert executable (defaults to convert)
  • $wgPdfInfo – path to your pdfinfo executable (defaults to pdfinfo)
  • $wgPdfOutputExtension – defaults to jpg
  • $wgPdfHandlerDpi – defaults to 150 dpi
    • The extension extracts a bitmap image for each page of the PDF at this resolution (dpi = dots per inch). For example, a PDF page in the European A4 size is 210 mm wide, which corresponds to 595 points (at 72 points per inch). At 150 dpi this yields an image 1240 pixels wide. If the parameter is set to 300 dpi instead, the width is 2480 pixels.
  • $wgPdfCreateThumbnailsInJobQueue – defaults to false
    • If set to true, the creation of page thumbnails is put into the job queue, so thumbnails are generated during normal wiki browsing rather than while a file page is being viewed. Be advised that this may significantly increase the CPU load of your web server on a high-traffic site. The job queue was designed to perform quick tasks on page views, and thumbnail creation fits that definition, but converting PDF pages to another format still costs CPU time. To limit the impact, either set $wgJobRunRate to a small value (e.g. 0.05), or disable the job queue on the high-traffic wiki and set up a separate server that only runs jobs (as Wikimedia did).
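
The pixel widths quoted for $wgPdfHandlerDpi can be checked with a few lines of PHP. This is only a sketch of the arithmetic; pageWidthInPixels() is a hypothetical helper written for this illustration and is not part of PdfHandler.

```php
<?php
// Hypothetical helper (not part of PdfHandler): converts a page width in
// millimetres to pixels at a given resolution. One inch is 25.4 mm.
function pageWidthInPixels( $widthInMm, $dpi ) {
    return (int) round( $widthInMm / 25.4 * $dpi );
}

$a4WidthMm = 210; // width of an A4 page

echo pageWidthInPixels( $a4WidthMm, 150 ), "\n"; // prints 1240
echo pageWidthInPixels( $a4WidthMm, 300 ), "\n"; // prints 2480
```

Doubling the resolution doubles the width and height of each page image, so it roughly quadruples the number of pixels to render.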

The variables below are not specific to this extension:

  • Enable PDF uploads, if you haven't already: $wgFileExtensions[] = 'pdf';
  • $wgMaxShellMemory - memory limit for gs, convert and pdfinfo. The default value of 102400 kB (100 MB) might be too low.
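
As a rough illustration, the general settings above could be combined with the job-queue advice in a LocalSettings.php fragment like the one below. The 262144 kB memory limit is an arbitrary example value, not a recommendation from the extension's authors. The value 0.05 for $wgJobRunRate is the one suggested above. The sketch assumes the lines are placed after the require_once line for PdfHandler, so that the extension's own defaults are already in place.

```php
// Fragment for LocalSettings.php; all values are illustrative assumptions.

// Allow PDF files to be uploaded.
$wgFileExtensions[] = 'pdf';

// Memory limit (in kB) for shell commands such as gs, convert and pdfinfo.
// 262144 kB (256 MB) is an arbitrary example; the default is 102400 kB.
$wgMaxShellMemory = 262144;

// Optional: create page thumbnails through the job queue ...
$wgPdfCreateThumbnailsInJobQueue = true;
// ... and keep the job run rate low to limit CPU load on a busy wiki.
$wgJobRunRate = 0.05;
```

If thumbnails still fail to render for large PDFs, the memory limit is the first value worth raising further.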


Example for a Windows XP installation:

$wgPdfProcessor = 'C:\Programme\gs\gs8.60\bin\gswin32.exe';
$wgPdfPostProcessor = $wgImageMagickConvertCommand;
$wgPdfInfo = 'C:\Programme\xpdf-3.02pl1-win32\pdfinfo.exe';
$wgPdftoText = 'C:\Programme\xpdf-3.02pl1-win32\pdftotext.exe';

If $wgImageMagickConvertCommand is not already defined, set the path explicitly, for example:

$wgPdfPostProcessor = 'C:\Programme\ImageMagick-6.6.2-Q16\convert.exe';

Example for an Ubuntu installation:

$wgPdfProcessor = 'gs';
$wgPdfPostProcessor = $wgImageMagickConvertCommand;
$wgPdfInfo = 'pdfinfo';

Ubuntu note: the following three packages must already be installed: imagemagick, ghostscript, and xpdf-utils.

Usage

  • The main use of the PdfHandler extension requires no user interaction. When you upload a new PDF file, its metadata is stored in the database, and the file can then be shown in a multipage preview layout, as the DjVu handler does. Without this extension, PDFs will not display properly when uploaded.
  • Additionally, this extension allows Extension:ProofreadPage to handle PDFs in a side-by-side view for transcribing and proofreading, as is done on Wikisource.
  • Another option, available since r25575, is to display a PDF file as an image, showing a single page at a time, like so: [[File:myPdfFile.pdf|page=1|600px]]. The page and size parameters are optional; the default page is page 1. Instead of a size parameter, you can also use the thumb parameter, with or without a caption: [[File:myPdfFile.pdf|page=1|thumb|My PDF]].
  • Because PdfHandler extends ImageHandler, you can use all the arguments that you would use for an image, for example thumb, right/left, caption, border, and link.
  • To present a 2-page PDF, for example, use one link per page: [[File:myPdfFile.pdf|page=1]] [[File:myPdfFile.pdf|page=2]]

Bugs and enhancements

Bugs can be reported at Bugzilla.

Enhancements

  • There is a patch that creates thumbnails in a background job. With it, thumbnails are created during normal browsing of wiki pages instead of while the PDF file page is being viewed. Because the job creates one thumbnail at a time, it avoids the load problems caused by rendering all of them on the PDF file page.

See also