Extension talk:PdfHandler/Archive

A More Detailed Walkthrough?
I can't find half the files that are required as prerequisites. Perhaps an update is needed, or at least some clarification. I found at least 10 different versions of ImageMagick, none of which was called exactly "ImageMagick".

It would help if you told us which operating system you're using, and which versions of MW, the SQL server and the web server. --Teststudent (talk) 00:38, 27 April 2012 (UTC)

I'll step in here. I'm running MediaWiki 1.22.0, PHP 5.4.24-nfs1, MySQL 5.3.12-MariaDB with InnoDB, and I'm on NearlyFreeSpeech. Any suggestions?

More info Please
I would really love to use this extension, but it's hard with some of the documentation missing :(

What I have tracked down so far is that someone apparently wants it installed on Wikisource.

Here is the bug report https://bugzilla.wikimedia.org/show_bug.cgi?id=11215

An example of the extension is here http://www.xarax.eu/wiki/Datei:110.pdf which looks very cool!

I'm trying to gather more data on this extension. Anyone out there get it to work? How? Thanks!

--63.229.58.2 18:46, 27 May 2009 (UTC)


 * I got it to work, but only with PDF files uploaded after installation. Usage now clarified in the article. -- Jcpren 12:07, 16 July 2009 (UTC)

Error when uploading
When uploading a PDF (uploading normal images works fine), I get this error message on MediaWiki 1.10.0:

Fatal error: Call to undefined method Image::getPath in D:\www\mediawiki\extensions\PdfHandler\PdfHandler_body.php on line 101

In case it helps, this is the end of my LocalSettings.php:

require_once('extensions/PdfHandler/PdfHandler.php');
$wgPdfProcessor = "D:\gs\gs8.60\bin\gswin32.exe";
$wgPdfPostProcessor = $wgImageMagickConvertCommand;
$wgPdfInfo = "D:\xpdf\pdfinfo.exe";

Is this because this extension requires MediaWiki 1.12.0?

--Flavien 15:01, 14 September 2007 (UTC)


 * This extension needs at least 1.11; see the infobox in the upper right corner. --Raymond


 * For version 1.10 you can replace $image->getPath with $image->getImagePath at line 101 --62.117.121.56 11:38, 26 September 2007 (UTC)


 * You should replace your double quotes with single quotes (e.g. change $wgPdfProcessor = "D:\gs\gs8.60\bin\gswin32.exe"; to $wgPdfProcessor = 'D:\gs\gs8.60\bin\gswin32.exe';), otherwise PHP will do funny things to the backslashes in the path. --Tbleher 13:17, 27 November 2007 (UTC)
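A minimal LocalSettings.php sketch of the single-quote advice, reusing the paths from the original post (adjust them to your install). Inside double quotes PHP treats sequences such as \t, \n and \$ as escapes, so a path like "C:\temp" would silently end up containing a tab character; inside single quotes only \' and \\ are special. Forward slashes, which Windows also accepts (they are used in a later post on this page), sidestep the issue entirely.

```php
// LocalSettings.php fragment (sketch; paths from the post above).
require_once( 'extensions/PdfHandler/PdfHandler.php' );

// Single quotes: each backslash is kept literally.
$wgPdfProcessor     = 'D:\gs\gs8.60\bin\gswin32.exe';
$wgPdfPostProcessor = $wgImageMagickConvertCommand; // must already be set above this line
$wgPdfInfo          = 'D:\xpdf\pdfinfo.exe';

// Equivalent alternative with forward slashes:
// $wgPdfProcessor = 'D:/gs/gs8.60/bin/gswin32.exe';
// $wgPdfInfo      = 'D:/xpdf/pdfinfo.exe';
```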

Demo Page
Can I see a demo of this before I spend time installing it? I'd be happy to test, but I'm curious if it will suit my needs.

--216.226.127.136 15:08, 12 October 2007 (UTC)


 * http://www.xarax.eu/wiki/Datei:110.pdf -- Jcpren 12:08, 16 July 2009 (UTC)

There is no xpdf
Hi, first of all thank you for this code. I was wondering: will it still work if xpdf is not available?

Cool!
Hello, I just wanted to let you know that I saw this feature for the first time and think that it is really, really cool! --Langec 10:57, 26 November 2009 (UTC)

Other image size?
Hello, first of all thanks for the great extension. One question: is it possible to get larger preview images? Do I need to set a variable, or do I need to change the code for this?

Thanks for support! --Filburt 16:48, 4 February 2010 (UTC)


 * Ok, I figured it out: Just set the $wgImageLimits in your LocalSettings.php.


 * Greetings, --Filburt 18:41, 5 May 2010 (UTC)
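A LocalSettings.php sketch of the $wgImageLimits tip. $wgImageLimits is a list of [maximum width, maximum height] pairs in pixels; this appends one larger size instead of replacing the whole list, and the 2048x1536 value is only an example. The second line rests on an assumption: that the "imagesize" user preference is an index into $wgImageLimits, which is how (as far as I understand it) file description pages choose the preview size. Without it, users can still pick the new size in their own preferences.

```php
// LocalSettings.php fragment (sketch).
// Append a larger preview size; 2048x1536 is an illustrative value.
$wgImageLimits[] = array( 2048, 1536 );

// Assumption: 'imagesize' is an index into $wgImageLimits.
// Make the newly added (last) entry the default for all users.
$wgDefaultUserOptions['imagesize'] = count( $wgImageLimits ) - 1;
```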

Apache error
First of all, thank you for this extension. Everything works, but when I checked the Apache error logs I found this:

'pdftotext' is not recognized as an internal or external command, operable program or batch file.

I don't know what that means.

thanks

Msevero 11:12, 23 March 2010 (UTC)

This means that xpdf-utils are not (fully) installed on your system, so it can't find that program. Install them to get rid of the error.
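One way to confirm which helper program is missing before uploading anything is a small standalone PHP check. This is a sketch: checkPdfTools() is a hypothetical helper, not part of PdfHandler or MediaWiki, and it expects full paths. A bare name such as 'pdftotext' is only looked up in PATH when a shell runs it, so is_executable() would report it as missing.

```php
<?php
// checkPdfTools() is a hypothetical helper, not part of PdfHandler or MediaWiki.
// For each setting it reports whether PHP sees an executable file at the given path.
function checkPdfTools( array $tools ) {
    $report = array();
    foreach ( $tools as $setting => $path ) {
        // false for a missing file or one without execute permission
        $report[$setting] = is_executable( $path );
    }
    return $report;
}

// Example paths; replace them with the values from your LocalSettings.php.
$report = checkPdfTools( array(
    'wgPdfProcessor' => '/usr/bin/gs',
    'wgPdfInfo'      => '/usr/bin/pdfinfo',
    'wgPdftoText'    => '/usr/bin/pdftotext',
) );
foreach ( $report as $setting => $ok ) {
    echo $setting . ': ' . ( $ok ? 'found' : 'NOT found or not executable' ) . "\n";
}
```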

Can't get this to work under Ubuntu 9.10
Hi.

I'm running MW 1.15.1. I have gs, pdfinfo and ImageMagick installed. My settings are:

$wgImageMagickConvertCommand = "/usr/bin/convert";
require_once( "$IP/extensions/PdfHandler/PdfHandler.php");
$wgPdfProcessor = '/usr/bin/gs';
$wgPdfPostProcessor = $wgImageMagickConvertCommand;
$wgPdfInfo = '/usr/bin/pdfinfo';

However, when I upload a PDF I get blank grey panels with the following error in them:

Error creating thumbnail: convert: no decode delegate for this image format `/tmp/magick-XX1umJ0i' @ magick/constitute.c/ReadImage/526.
convert: missing an image filename `/home/webapps/wiki/images/thumb/2/23/Issue_2_amended.pdf/page1-424px-Issue_2_amended.pdf.jpg' @ wand/convert.c/ConvertImageCommand/2710.

Any ideas what might be wrong? Thanks! User:Mitchelln 17:12, 6th May 2010 (UTC)


 * I got it working on Kubuntu 9.10. Here's the relevant section from my LocalSettings.php:

require_once("$IP/extensions/PdfHandler/PdfHandler.php");
$wgPdfProcessor = 'gs';
$wgPdfPostProcessor = $wgImageMagickConvertCommand;
$wgPdfInfo = 'pdfinfo';

$wgImageMagickConvertCommand = "/usr/bin/convert";
 * It sounds like your problem is with ImageMagick. Try reinstalling that.--Rsberzerker 12:09, 8 May 2010 (UTC)
 * Solution: you have to increase the value of $wgMaxShellMemory in LocalSettings.php (in my case it was fixed by increasing it to 1024000; the default is 102400). -- Kirrmann 13:38, 18 August 2010 (UTC)

$wgMaxShellMemory = 1024000;


 * This solution worked for me when encountering the same problem. Thanks for posting it. Mike Peel 08:22, 28 August 2010 (UTC)
 * Also worked for me on Debian 5.0.4 and ImageMagick 6.3.7. Thanks.
 * Note that you may need to purge the cache (add ?action=purge to the end of your URL) to ensure the images are re-created after making this change.
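A combined LocalSettings.php sketch of the suggestions in this thread, using the Linux paths from the original question (adjust as needed). One detail differs from the Kubuntu example: `$wgPdfPostProcessor = $wgImageMagickConvertCommand;` copies the value at that moment, so the convert path should be set first. In that example the default value was presumably already correct, which would explain why it worked anyway.

```php
// LocalSettings.php fragment (sketch combining this thread's suggestions).
// Set the convert path first: the assignment to $wgPdfPostProcessor below
// copies whatever value it has at that point.
$wgImageMagickConvertCommand = '/usr/bin/convert';

require_once( "$IP/extensions/PdfHandler/PdfHandler.php" );
$wgPdfProcessor     = '/usr/bin/gs';
$wgPdfPostProcessor = $wgImageMagickConvertCommand;
$wgPdfInfo          = '/usr/bin/pdfinfo';

// Virtual memory limit for shell commands, in KB.
// 1024000 is the value reported to work above (default quoted there: 102400).
$wgMaxShellMemory = 1024000;
```

After changing these values, existing thumbnails may need a purge (?action=purge) before they are regenerated.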


 * I have the same problem. On the command line, both commands joined by a pipe (|) fail, but running the  process first and saving its output to a file, then processing that image with  , works fine. It turned out that   printed messages …   … and that, I guess, even though the quiet option   is set, these conflict with , which reads from the stdout of  . Unfortunately I found no way to suppress those warnings, because they are printed to stdout and   reads exactly from that :-/ --Andreas P. 12:39, 6 January 2015 (UTC)
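The temp-file workaround described above can be sketched in PHP: run Ghostscript and ImageMagick as two separate commands instead of one pipe, so anything Ghostscript prints on stdout cannot end up in convert's input. buildTwoStepCommands() and runCommands() are hypothetical helpers, not PdfHandler code; the gs and convert options mirror the PdfHandler command quoted in the "Maximum PDF size" section below.

```php
<?php
// Hypothetical helper: build the Ghostscript and convert commands separately,
// with an intermediate JPEG file instead of a pipe.
function buildTwoStepCommands( $gs, $convert, $pdf, $page, $width, $tmpJpg, $outJpg ) {
    $page = (int)$page;
    $gsCmd = escapeshellarg( $gs )
        . ' -sDEVICE=jpeg -sOutputFile=' . escapeshellarg( $tmpJpg )
        . " -dFirstPage=$page -dLastPage=$page -r150 -dBATCH -dNOPAUSE -q "
        . escapeshellarg( $pdf );
    $convertCmd = escapeshellarg( $convert ) . ' ' . escapeshellarg( $tmpJpg )
        . ' -depth 8 -resize ' . (int)$width . ' ' . escapeshellarg( $outJpg );
    return array( $gsCmd, $convertCmd );
}

// Hypothetical helper: run commands in order and stop at the first failure.
// Returns array( exit status, combined output of the failing command ).
function runCommands( array $commands ) {
    foreach ( $commands as $cmd ) {
        $output = array();
        exec( $cmd . ' 2>&1', $output, $status );
        if ( $status !== 0 ) {
            return array( $status, implode( "\n", $output ) );
        }
    }
    return array( 0, '' );
}

// Example: build the commands for page 1 at 180px (not executed here).
list( $gsCmd, $convertCmd ) = buildTwoStepCommands(
    '/usr/bin/gs', '/usr/bin/convert', 'test.pdf', 1, 180, '/tmp/page1.jpg', 'thumb.jpg'
);
```

Passing both commands to runCommands() reproduces the manual two-step test. The cost is one temporary file per thumbnail, which then needs to be deleted.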

Landscape
Hi. PDFs in landscape orientation look crushed. Emijrp 19:59, 6 May 2010 (UTC)
 * See here for solution. --Danroa 13:35, 8 July 2010 (UTC)

Include "command not found" errors
Hey, maybe it would be a good idea to log an error message when wfShellExec returns retval == 127, which means that the command (e.g. pdfinfo) was not found. The error message could say, for example: xpdf-tools are not installed.
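A sketch of what such a message could look like. describeShellFailure() is hypothetical: it would be called with the $retval that wfShellExec() fills in, and its result could then be written to a debug log, e.g. with wfDebugLog( 'PdfHandler', $message ) inside MediaWiki. 127 is the status POSIX shells return for "command not found"; other shells may use different codes.

```php
<?php
// Hypothetical helper: map a shell exit status to a readable message.
// Returns null on success.
function describeShellFailure( $command, $retval ) {
    if ( $retval === 0 ) {
        return null; // success, nothing to report
    }
    if ( $retval === 127 ) {
        // POSIX shells use 127 for "command not found"
        return "Command not found: $command (are the xpdf tools / Ghostscript installed at the configured path?)";
    }
    return "Command failed with exit status $retval: $command";
}
```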

Grey bars in most first page thumbnails
Everything seems to be installed correctly and working, but with some PDFs (not all) the first page thumbnail is just rows of evenly spaced thin grey bars. The subsequent pages seem to be fine. When you view the PDF file itself, there are no errors and everything looks fine, including the first page.

Alternatively, sometimes the thumbnail contains this error message:

Error creating thumbnail: jasper (code 0) jpc_dec_decodepkts failed jasper (code 0) error: cannot decode code stream unable to decode JPX image data.

GT gaidengt@gmail.com August 19, 2010

Update September 21, 2010: I had several problems that needed to be corrected.

1) I had to update to Ghostscript 8.71; it seems the previous versions could not handle some of the newer PDF versions.
2) I had to increase the MAX UPLOAD and MAX POST TIME variables in php.ini; I believe larger PDFs were timing out before the thumbnail was created.

The grey bar problem is gone now.
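A standalone diagnostic sketch for checking both fixes on your own server. upload_max_filesize, post_max_size, max_execution_time and max_input_time are standard php.ini directives; which of them the poster meant by "MAX UPLOAD" and "MAX POST TIME" is an assumption. The 8.71 threshold is simply the Ghostscript version reported to work here, not an official minimum.

```php
<?php
// Collect the php.ini limits that can affect large PDF uploads.
function uploadLimits() {
    $limits = array();
    $keys = array( 'upload_max_filesize', 'post_max_size', 'max_execution_time', 'max_input_time' );
    foreach ( $keys as $key ) {
        $limits[$key] = ini_get( $key );
    }
    return $limits;
}

// Return the output of "gs --version" (e.g. "8.71"), or null if the command fails.
function ghostscriptVersion( $gs ) {
    $output = array();
    exec( escapeshellarg( $gs ) . ' --version 2>&1', $output, $status );
    return ( $status === 0 && isset( $output[0] ) ) ? trim( $output[0] ) : null;
}

print_r( uploadLimits() );
$version = ghostscriptVersion( 'gs' ); // adjust to your $wgPdfProcessor
if ( $version === null ) {
    echo "Ghostscript not found\n";
} elseif ( version_compare( $version, '8.71', '<' ) ) {
    echo "Ghostscript $version is older than 8.71, the version reported to fix this\n";
} else {
    echo "Ghostscript $version\n";
}
```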

Problems Uploading PDFs
MW 1.16.0 with your extension added. I can upload .pdf files from Wikipedia and the IRS (US Govt), but when I try to upload from HUD (http://www.hud.gov/offices/hsg/sfh/nsc/mcm.cfm) I get this:

Error creating thumbnail: **** Warning: An error occurred while reading an XREF table. The file has been damaged. This may have been caused by a problem while converting or transfering the file. Ghostscript will attempt to recover the data.
ESP Ghostscript 815.02: Unrecoverable error, exit code 1
convert: no decode delegate for this image format `/tmp/magick-XXao3jiE' @ constitute.c/ReadImage/526.
convert: missing an image filename `/home/pedia10/public_html/pmw/images/thumb/6/68/Ml1018qa.pdf/page2-463px-Ml1018qa.pdf.jpg' @ convert.c/ConvertImageCommand/2756.

The error seems to be specific to HUD. It is a pure Adobe .pdf file, since it's from the US Government. Every .pdf from HUD is messed up.

I thought I ran across a bug on this, but I'm not a programmer, and the folks in the #mediawiki IRC channel really had no idea. Any information would be great. BTW, this is a new wiki install with 50 or so pages and only Semantic Bundle installed; Monobook theme. No bells and whistles. Thanx --Foreclosurepedia org 21:33, 3 November 2010 (UTC)


 * It's probably your version of Ghostscript, which doesn't handle certain PDFs well. One of my servers has GPL Ghostscript 8.62, which works just fine, while the other has ESP Ghostscript 815.02, just like you, and I'm getting the same error as you (it's the same database, so the same files on both). An upgrade is probably in order... --Dror Snir 12:03, 4 November 2010 (UTC)


 * Forgive my lack of understanding: I am on a hosted server; I pay for my space.  Is there some way to upgrade this from my end or do I contact them and ask them to upgrade this Ghostscript or am I way off and if so can you give me some detailed info on how to do such.  If, though, the upgrade is Extension-specific I understand that things take time and will eagerly await!  Thanx, also, for the blindingly fast reply!  Gives me encouragement!  :)  --Foreclosurepedia org 13:16, 4 November 2010 (UTC)


 * It is probably best for you to ask your provider to upgrade Ghostscript. Some host providers (I'm only acquainted with Dreamhost) allow you to install programs on your own, but that is less desirable and requires some knowledge on your part. --Dror Snir 13:59, 4 November 2010 (UTC)


 * Contacted them. Will respond back accordingly so that we can tick solved to this if appropriate.  --Foreclosurepedia org 16:39, 4 November 2010 (UTC)


 * My provider said that their version was the current, stable version 5.5. Not sure what that means, but they wouldn't change it. Is there anything you might suggest, or is there perhaps a way to take a .pdf from HUD and convert it so that it's readable by this extension? Man, I love how it works! --Foreclosurepedia org 20:14, 5 November 2010 (UTC)


 * They probably meant they're running Linux CentOS 5.5 (it's an operating system), the same as me. It contains the older version of Ghostscript you're apparently stuck with. I can't find a solution that would fit your level of knowledge and especially your nonexistent server access, so I assume converting the PDFs (if there are not too many of them) will probably do the trick for you. However, I'm not familiar with PDF creation tools, so I can only wish you good luck... and hope that somebody here can help more.


 * All good! The extension is still incredible, in my opinion!  I don't have but a couple of pages anyway.  Thanks for all the help! --Foreclosurepedia org 00:22, 6 November 2010 (UTC)

Slow after migration to Ubuntu 10.04 LTS
PdfHandler is a very nice tool. But after migrating to a server running Ubuntu 10.04 LTS, it works very slowly. I don't get any error message, but generating a preview takes about 20 seconds or longer. Any ideas?--Frickelpiet 20:30, 25 May 2011 (UTC)

Metadata contains entire text?
It looks to me like the entire text of the pdf is put into the img_metadata field in the mw_image table. A long document will result in an attempt to insert a large amount of data in this field. Is this really intended and if so why? --Obo 19:41, 18 December 2011 (UTC)
 * I was wondering if it has something to do with being able to search the full text. This is actually a problem I've been trying to solve (without much luck): how can you index and search within PDFs, so that when a match comes up you can go right to the file? Here's a link describing the process in the abstract: https://bugzilla.wikimedia.org/show_bug.cgi?id=6422
 * Thanks for the link. I've gone ahead and commented out the relevant lines in my own installed version of PdfHandler, as the feature appears to serve no purpose at present and causes large PDFs to trigger database insert errors (for me, anyway) and a significant increase in database size. --Obo 03:03, 3 January 2012 (UTC)

Using Thumbnails in Gallery
Is there a way to use pages of the rendered PDF in a gallery? Using the standard syntax

it is possible to show the first page of the document. How can I specify which page to display?

194.156.135.246 10:23, 16 March 2012 (UTC)

PdfHandler on Windows 2008 server
On a Windows 2008 R2 server, I have installed:
 * MediaWiki v1.18,
 * ImageMagick v6.7.6 Q16,
 * Ghostscript v9.05 (64 bit) and
 * Xpdf v3.03

ImageMagick is rendering thumbnails for MW successfully using the following settings:

$wgUseImageMagick = true;
$wgImageMagickConvertCommand = 'C:/ImageMagick-6.7.6-Q16/convert.exe';
$wgUploadPath = 'images';
$wgUploadDirectory = 'images';
$wgTmpDirectory = '{$wgUploadDirectory}/temp';
$wgImageMagickTempDir = $wgTmpDirectory;

However, the PdfHandler extension, for which I used the following settings, doesn't work at all:

require_once("$IP/extensions/PdfHandler/PdfHandler.php");
$wgPdfProcessor = 'C:/gs/gs9.05/bin/gswin64c.exe';
$wgPdfPostProcessor = $wgImageMagickConvertCommand;
$wgPdfInfo = 'C:/xpdfbin-win-3.03/bin64/pdfinfo.exe';
$wgPdftoText = 'C:/xpdfbin-win-3.03/bin64/pdftotext.exe';

There is no error, but also no thumbnails for PDF files (at least, not for existing ones).

So far I have tried debugging using MW's  and some additional   in almost all functions of the extension, but no luck. For an existing PDF file, neither is the metadata fetched nor a thumbnail rendered.

In a test PHP file, I have split up PdfHandler's thumbnailing command into the commands for Ghostscript (GS) and for ImageMagick (IM):

<?php
$array = array();
echo "<pre>";
// call Ghostscript to extract the JPG
exec( 'C:/gs/gs9.05/bin/gswin64c.exe -sDEVICE=jpeg -sOutputFile=temporary.jpg -dFirstPage=1 -dLastPage=1 -r150 -dBATCH -dNOPAUSE -q test.pdf', $array );
// call ImageMagick to resize
exec( 'C:/ImageMagick-6.7.6-Q16/convert.exe temporary.jpg -depth 8 -resize 300px test.jpg', $array );
// print_r( ..., true ) returns the dump as a string instead of printing it immediately
echo "<br/>" . print_r( $array, true ) . "<br/>";
echo "</pre>";
?>

Using the code posted above, both transforming the PDF file to a JPG file (GS) and resizing the image (IM) work as expected. But when I try to avoid the temporary file and recombine the calls using a pipe ("|"), just like PdfHandler does, I get the same result as with PdfHandler: no thumbnail file, no error.

Did I miss an important setting? Is it an issue with pipes on 64 bit Windows? Any ideas?

--tom.dlh (talk) 15:49, 19 March 2012 (UTC)
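One detail in the settings above may be worth checking (an observation, not a confirmed cause of the missing thumbnails): PHP only interpolates variables inside double-quoted strings, so `$wgTmpDirectory = '{$wgUploadDirectory}/temp';` sets the temporary directory to that literal text rather than to images/temp. A standalone demonstration, not a LocalSettings.php excerpt:

```php
<?php
// Standalone demonstration of PHP string interpolation.
$wgUploadDirectory = 'images';

$singleQuoted = '{$wgUploadDirectory}/temp'; // no interpolation: the literal text {$wgUploadDirectory}/temp
$doubleQuoted = "{$wgUploadDirectory}/temp"; // interpolated: images/temp

echo $singleQuoted, "\n", $doubleQuoted, "\n";
```

In LocalSettings.php the double-quoted form would be `$wgTmpDirectory = "{$wgUploadDirectory}/temp";`.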

Update: Behaviour on PDF uploads
Today I uploaded a PDF file to the wiki and got an “Internal Server Error (500)” from IIS (v7.5):

Module: FastCgiModule
Notification: ExecuteRequestHandler
Handler: PHP_via_FastCGI
Error Code: 0x00000000

--tom.dlh (talk) 11:44, 20 March 2012 (UTC)

Maximum PDF size
This is a great extension. At first I had success with small PDF files by following the instructions on the extension page. However, my site has single-page PDF files (high-resolution maps) in the 10 to 15 MB size range. Here is the error I was getting:

Error creating thumbnail: [local path]/w/bin/ulimit4.sh: line 4: 3218 Done 'gs' -sDEVICE=jpeg -sOutputFile=- -dFirstPage=1 -dLastPage=1 -r150 -dBATCH -dNOPAUSE -q [input filename] ' 3219 File size limit exceeded| 'convert' -depth 8 -resize 180 - [output filename]

I did some debugging and was able to run the command lines separately with no errors: gs to convert the PDF to a JPEG, then convert (part of ImageMagick) to resize the JPEG. I found a post that suggested adding this line to LocalSettings.php: $wgMaxShellFileSize = unlimited;

This has changed the error now to be:

Error creating thumbnail: convert: no decode delegate for this image format `/tmp/magick-XXNuz8A8' @ error/constitute.c/ReadImage/533. convert: missing an image filename `[my input file path and name].pdf/page1-180px-[file name].pdf.jpg' @ error/convert.c/ConvertImageCommand/2940.

To resolve the problem completely I had success by adding the following lines to LocalSettings.php:

$wgMaxShellMemory = unlimited;
$wgUseImageResize = true;
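A hedged alternative to the bare `unlimited` above. In PHP an unquoted `unlimited` is an undefined constant: PHP 5 falls back to the string "unlimited" and emits a notice, while PHP 8 throws an error. Explicit numeric limits, which MediaWiki documents in kilobytes, may therefore be more robust. The numbers below are illustrative, not tested recommendations; size them to your largest PDFs.

```php
// LocalSettings.php fragment (sketch; values are examples, in kilobytes).
$wgMaxShellMemory   = 2048000; // virtual memory for shell commands such as gs and convert
$wgMaxShellFileSize = 512000;  // largest file a shell command may write (about 500 MB)
$wgUseImageResize   = true;    // as in the post above
```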

Exception with 1.21
Please note bug 48834. This extension currently breaks uploads in MW 1.21. Do not use until this bug is resolved. 🐝 thingles (talk) 11:46, 26 May 2013 (UTC)

How to make PDFs searchable? (Oct. 2013)
Is it possible to make uploaded PDFs searchable in the Mediawiki search function, together with page content? The description mentions that text is extracted, but it does not seem to be indexed. Has anyone solved this (with a recent mediawiki version)? --Vigilius (talk) 09:20, 3 October 2013 (UTC)

not exactly compatible with 1.16
Version info:
MediaWiki: 1.16.1 (patched to enable page access restriction)
PHP: 5.2.17 (apache2handler)
MySQL: 5.1.58

Name    : ImageMagick
Arch    : x86_64
Version : 6.2.8.0
Release : 15.el5_8

Name    : xpdf
Arch    : x86_64
Epoch   : 1
Version : 3.03
Release : 8.el5.1

Installing the above ImageMagick package also installed Ghostscript.

Was getting the following error: PHP Fatal error: Class 'BitmapMetadataHandler' not found in /var/www/html/mediawiki/extensions/PdfHandler/PdfHandler.image.php on line 207, referer: http://10.1.1.22/wiki/Special:Upload

Fixed this by copying the BitmapMetadataHandler.php from the NEW (1.22) version of mediawiki into the includes/media directory and manually adding a require_once into PdfHandler.image.php at the top.

So that error has been fixed; HOWEVER, now I'm getting the following: PHP Fatal error: Class 'XMPReader' not found in /var/www/html/mediawiki/extensions/PdfHandler/PdfHandler.image.php on line 300, referer: http://x.x.x.x/wiki/Special:Upload

Any ideas?

Thumbs quality
Please add in git: Or option.

Asian Fonts
It seems that the extension is not able to render PDFs that use Asian fonts into images. While it works with other PDFs, I got this message on pages with Japanese PDFs:

GPL Ghostscript 8.70: Unrecoverable error, exit code 1
convert: no decode delegate for this image format `/tmp/magick-XXVId71G' @ constitute.c/ReadImage/503.
convert: missing an image filename `/tmp/transform_d3bd9d4b6b1f-1.jpg' @ convert.c/ConvertImageCommand/2800.

Since this is quite annoying, it would be good to fix the bug, or at least to come up with a more graceful error message. --193.171.198.6 11:01, 30 July 2014 (UTC)