Extension talk:PdfHandler/Archive

From mediawiki.org

A More Detailed Walkthrough?

I can't find half the files that are required as prerequisites. Perhaps an update is needed? Or at least some clarification. I found at least 10 different versions for ImageMagick, none of which was called ImageMagick exactly.


It would help if you told us which operating system and which versions of MW, SQL and web server you're using. --Teststudent (talk) 00:38, 27 April 2012 (UTC)

I'll step in here. I'm running MediaWiki 1.22.0, PHP 5.4.24-nfs1, MySQL 5.3.12-MariaDB with InnoDB, and I'm on NearlyFreeSpeech. Any suggestions?

More info Please

I would really love to use this extension, but it's hard with some of the documentation missing :(

What I have tracked down is that it looks like someone wants it installed on Wikisource.

Here is the bug report https://bugzilla.wikimedia.org/show_bug.cgi?id=11215

An example of the extension is here http://www.xarax.eu/wiki/Datei:110.pdf which looks very cool!

I'm trying to gather more data on this extension. Anyone out there get it to work? How? Thanks!

--63.229.58.2 18:46, 27 May 2009 (UTC)

I got it to work, but only with PDF files uploaded after installation. Usage now clarified in the article. -- Jcpren 12:07, 16 July 2009 (UTC)


Error when uploading

When uploading a PDF (but no problem uploading normal images), I've got this error message on MediaWiki 1.10.0:

Fatal error: Call to undefined method Image::getPath() in D:\www\mediawiki\extensions\PdfHandler\PdfHandler_body.php on line 101

If that helps, this is the end of LocalSettings.php:

require_once('extensions/PdfHandler/PdfHandler.php');
$wgPdfProcessor = "D:\gs\gs8.60\bin\gswin32.exe";
$wgPdfPostProcessor = $wgImageMagickConvertCommand;
$wgPdfInfo = "D:\xpdf\pdfinfo.exe";

Is this because this extension requires MediaWiki 1.12.0?

--Flavien 15:01, 14 September 2007 (UTC)

This extension needs 1.11 at least, see the infobox in the upper right corner --Raymond
For version 1.10 you can replace $image->getPath() with $image->getImagePath() at line 101 --62.117.121.56 11:38, 26 September 2007 (UTC)
You should replace your double quotes with single quotes (e.g. change $wgPdfProcessor = "D:\gs\gs8.60\bin\gswin32.exe"; to $wgPdfProcessor = 'D:\gs\gs8.60\bin\gswin32.exe';), otherwise PHP may do funny things to the backslashes in your path. --Tbleher 13:17, 27 November 2007 (UTC)
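
To illustrate the quoting advice: a minimal sketch of how PHP treats backslashes in the two string types. The path C:\tools\new\pdfinfo.exe is made up for this example (it is not from this thread) and was chosen because it contains \t and \n, which double-quoted strings turn into a tab and a newline. Backslash sequences that PHP does not recognise as escapes are left unchanged even in double quotes, so not every Windows path is affected.

```php
<?php
// Hypothetical path, chosen only because it contains \t and \n.
$double  = "C:\tools\new\pdfinfo.exe";  // \t becomes a tab, \n a newline
$single  = 'C:\tools\new\pdfinfo.exe';  // backslashes are kept literally
$slashes = 'C:/tools/new/pdfinfo.exe';  // forward slashes sidestep the issue

// The double-quoted version is shorter because two escapes collapsed.
var_dump( strlen( $double ) < strlen( $single ) );
```

Forward slashes are generally accepted by PHP on Windows as well, which is the style used in a later thread on this page.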

Demo Page

Can I see a demo of this before I spend time installing it? I'd be happy to test, but I'm curious if it will suit my needs.

--216.226.127.136 15:08, 12 October 2007 (UTC)

http://www.xarax.eu/wiki/Datei:110.pdf -- Jcpren 12:08, 16 July 2009 (UTC)

There is no xpdf

Hi, first of all, thank you for this code. I was wondering: will it still work if xpdf is not available?

Cool!

Hello, I just wanted to let you know that I saw this feature for the first time and think that it is really, really cool! --Langec 10:57, 26 November 2009 (UTC)


Other image size?

Hello, at first thanks for the great extension. One question: Is it possible to get larger preview images? Do I need to set a variable or do I need to change code for this?

Thanks for support! --Filburt 16:48, 4 February 2010 (UTC)

Ok, I figured it out: Just set the $wgImageLimits in your LocalSettings.php.
Greetings, --Filburt 18:41, 5 May 2010 (UTC)
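
As a rough sketch of what that setting can look like: $wgImageLimits is a list of width/height pairs that cap the preview size on file description pages. The 1600x1200 entry below is an illustrative addition, not a MediaWiki default, and the exact defaults vary between MediaWiki versions, so treat the values as assumptions.

```php
// LocalSettings.php
// Each entry is array( max width, max height ) in pixels.
// The last entry (1600x1200) is an illustrative addition for larger previews.
$wgImageLimits = array(
	array( 320, 240 ),
	array( 640, 480 ),
	array( 800, 600 ),
	array( 1024, 768 ),
	array( 1280, 1024 ),
	array( 1600, 1200 ),
);

// Index into $wgImageLimits used for users who have not chosen a size
// in their preferences (5 selects the 1600x1200 entry above).
$wgDefaultUserOptions['imagesize'] = 5;
```

Logged-in users can still pick a different entry from the list in their preferences.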

Apache error

First of all, thank you for this extension. Everything works, but when I checked the Apache error logs I got this:

'"pdftotext"' is not recognized as an internal or external command, operable program or batch file.

I don't know what that means.

Thanks

Msevero 11:12, 23 March 2010 (UTC)

This means that xpdf-utils are not (fully) installed on your system, so it can't find that program. Install them to get rid of the error.

Can't get this to work under Ubuntu 9.10

Hi.

I'm running MW 1.15.1. I have gs, pdfinfo and imagemagick installed. My settings are:
$wgImageMagickConvertCommand = "/usr/bin/convert";

require_once( "$IP/extensions/PdfHandler/PdfHandler.php");
$wgPdfProcessor = '/usr/bin/gs';
$wgPdfPostProcessor = $wgImageMagickConvertCommand;
$wgPdfInfo = '/usr/bin/pdfinfo'

However, when I upload a pdf I get blank grey panels with the following error in them:

Error creating thumbnail: convert: no decode delegate for this image format
`/tmp/magick-XX1umJ0i' @ magick/constitute.c/ReadImage/526.
convert: missing an image filename `/home/webapps/wiki/images/thumb/2/23/Issue_2_amended.pdf
/page1-424px-Issue_2_amended.pdf.jpg' @ wand/convert.c/ConvertImageCommand/2710.

Any ideas what might be wrong?
Thanks! User:Mitchelln 17:12, 6th May 2010 (UTC)

I got it working on Kubuntu 9.10. Here's my relevant section from LocalSettings.php:
require_once("$IP/extensions/PdfHandler/PdfHandler.php");
$wgPdfProcessor = 'gs';
$wgPdfPostProcessor = $wgImageMagickConvertCommand;
$wgPdfInfo = 'pdfinfo';

$wgImageMagickConvertCommand = "/usr/bin/convert";
It sounds like your problem is with ImageMagick. Try reinstalling that. --Rsberzerker 12:09, 8 May 2010 (UTC)
Solution: You have to increase your value for $wgMaxShellMemory in LocalSettings.php (in my case it was fixed by increasing to 1024000, default is 102400) -- Kirrmann 13:38, 18 August 2010 (UTC)
$wgMaxShellMemory = 1024000;
This solution worked for me when encountering the same problem. Thanks for posting it. Mike Peel 08:22, 28 August 2010 (UTC)
Also worked for me on Debian 5.0.4 and ImageMagick 6.3.7. Thanks.
Note that you may need to purge the cache (add ?action=purge to the end of your URL) to ensure the images are re-created after making this change.
I have the same problem. On the command line, running both commands together (joined by a |) fails, but running them in two steps works fine: first let gs write its output to a file, then process that image with convert. It turned out that gs printed messages …
**** Warning: glyf overlaps cmap, truncating.
**** Warning: glyf overlaps cmap, truncating.
… and that, I guess, conflicts with convert, which reads from the stdout of gs, even though the quiet option -q is set. Unfortunately I found no way to suppress those warnings, because they are printed to stdout and convert reads from exactly that :-/ --Andreas P. 12:39, 6 January 2015 (UTC)
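
The two-step workaround described above can be sketched in PHP roughly as follows. This is not how PdfHandler itself works (it uses a pipe); the function name buildTwoStepCommands and the file names are hypothetical, and the gs and convert flags are simply the ones quoted in other threads on this page.

```php
<?php
/**
 * Build the two shell commands for rendering one PDF page through a
 * temporary file instead of a pipe, so that anything gs prints to stdout
 * cannot end up in the image data that convert reads.
 * (Hypothetical helper for illustration only.)
 */
function buildTwoStepCommands( $gs, $convert, $pdf, $page, $width, $tmp, $out ) {
	// Step 1: gs writes the rendered page to $tmp rather than to stdout.
	$gsCmd = escapeshellarg( $gs )
		. ' -sDEVICE=jpeg'
		. ' -sOutputFile=' . escapeshellarg( $tmp )
		. ' -dFirstPage=' . (int)$page
		. ' -dLastPage=' . (int)$page
		. ' -r150 -dBATCH -dNOPAUSE -q '
		. escapeshellarg( $pdf );

	// Step 2: convert reads the temporary file and writes the thumbnail.
	$convertCmd = escapeshellarg( $convert ) . ' ' . escapeshellarg( $tmp )
		. ' -depth 8 -resize ' . (int)$width . ' ' . escapeshellarg( $out );

	return array( $gsCmd, $convertCmd );
}

list( $gsCmd, $convertCmd ) = buildTwoStepCommands(
	'gs', 'convert', 'test.pdf', 1, 300, '/tmp/page1.jpg', 'test.jpg'
);
echo $gsCmd, "\n", $convertCmd, "\n";
```

Each command can then be run separately (for example with exec()) and its return value checked before moving on to the next step.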

Landscape

Hi. PDFs in landscape mode look squashed. Emijrp 19:59, 6 May 2010 (UTC)

See here for solution. --Danroa 13:35, 8 July 2010 (UTC)

Include "command not found" errors

Hey, maybe it would be a good idea to create an error log message when wfShellExec returns retval==127, which means that the command (e.g. pdfinfo) was not found. An error message could say, for example: xpdf-tools are not installed.
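
A minimal standalone sketch of that idea, using PHP's own exec() rather than MediaWiki's wfShellExec so that it runs on its own. The function name runAndExplain and the message wording are made up for illustration. Exit status 127 is the convention of POSIX shells for a missing command; Windows reports missing commands differently, so this check is an assumption that only holds on Unix-like servers.

```php
<?php
/**
 * Run a shell command and return a readable description of the result.
 * (Hypothetical helper; not part of PdfHandler or MediaWiki.)
 */
function runAndExplain( $cmd ) {
	$output = array();
	$retval = 0;
	// Redirect stderr so the shell's own complaint does not leak to the page.
	exec( $cmd . ' 2>&1', $output, $retval );

	if ( $retval === 127 ) {
		// POSIX shells return 127 when the command could not be found.
		return "Command not found (exit 127): $cmd. Is the tool installed and on the PATH?";
	}
	return $retval === 0 ? 'ok' : "Command failed with exit code $retval";
}

echo runAndExplain( 'definitely-not-a-real-command-xyz' ), "\n";
```

Inside the extension, the same comparison could be made on the return value that wfShellExec hands back, with the message written to the debug log.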

Grey bars in most first page thumbnails

Everything seems to be installed correctly and working, but with some PDFs (not all) the first page thumbnail is just rows of evenly spaced thin grey bars. The subsequent pages seem to be fine. When you view the PDF file itself, there are no errors and everything looks fine, including the first page.

Alternatively, sometimes the thumbnail contains this error message:

Error creating thumbnail: jasper (code 0) jpc_dec_decodepkts failed jasper (code 0) error: cannot decode code stream unable to decode JPX image data.

GT gaidengt@gmail.com August 19, 2010

Update, September 21, 2010: I had several problems that needed to be corrected.

1) Had to update to ghostscript 8.71 -- it seems the previous versions could not handle some of the newer PDF versions
2) Had to update MAX UPLOAD and MAX POST TIME variables in php.ini -- I believe larger PDFs were timing out before the thumbnail was created

The grey bar problem is gone now.

Problems Uploading PDFs

MW 1.16.0, your extension added. So, I can upload .pdf files from wikipedia and the IRS (US Govt). I am trying to upload from HUD (http://www.hud.gov/offices/hsg/sfh/nsc/mcm.cfm) and I get this:

Error creating thumbnail: **** Warning: An error occurred while reading an XREF table. The file has been damaged. This may have been caused by a problem while converting or transfering the file. Ghostscript will attempt to recover the data. ESP Ghostscript 815.02: Unrecoverable error, exit code 1 convert: no decode delegate for this image format `/tmp/magick-XXao3jiE' @ constitute.c/ReadImage/526. convert: missing an image filename `/home/pedia10/public_html/pmw/images/thumb/6/68/Ml1018qa.pdf/page2-463px-Ml1018qa.pdf.jpg' @ convert.c/ConvertImageCommand/2756.

The error seems to only be specific to HUD. It is a pure Adobe .pdf file as it's the US Government. Any .pdf from HUD is messed up.

I thought I ran across a bug on this, but I'm not a programmer and the folks in #mediawiki IRC really had no idea. Any information would be great. BTW this is a new wiki install w/50 or so pages and only semantic-bundle installed; monobook theme. No bells and whistles. Thanx --Foreclosurepedia org 21:33, 3 November 2010 (UTC)

It's probably your version of Ghostscript, which doesn't handle certain PDFs well. One of my servers has GPL Ghostscript 8.62, which works just fine - while the other has ESP Ghostscript 815.02, just like you, and I'm getting the same error as you (it's the same database = same files on both). Upgrade is probably in order... --Dror Snir 12:03, 4 November 2010 (UTC)
Forgive my lack of understanding: I am on a hosted server; I pay for my space. Is there some way to upgrade this from my end, or do I contact them and ask them to upgrade Ghostscript? Or am I way off, and if so, can you give me some detailed info on how to do this? If, though, the upgrade is extension-specific, I understand that things take time and will eagerly await! Thanx, also, for the blindingly fast reply! Gives me encouragement! :) --Foreclosurepedia org 13:16, 4 November 2010 (UTC)
It is probably best for you to ask your provider to upgrade Ghostscript. Some host providers (I'm only acquainted with Dreamhost) allow you to install programs on your own, but that is less desirable and requires some knowledge on your part. --Dror Snir 13:59, 4 November 2010 (UTC)
Contacted them. Will respond back accordingly so that we can tick solved to this if appropriate. --Foreclosurepedia org 16:39, 4 November 2010 (UTC)
My provider said that their version was the current, stable version 5.5. Not sure what that means, but they wouldn't change it. Is there anything you might suggest, or is there perhaps a way to take a .pdf from HUD and convert it somehow so that it's readable by this extension? Man, I love how it works! --Foreclosurepedia org 20:14, 5 November 2010 (UTC)
They probably meant they're running Linux CentOS 5.5 (it's an operating system), the same as me. It contains the older version of Ghostscript you're apparently stuck with. I can't find a solution that would fit your level of knowledge and especially your nonexistent server access, so I assume converting the PDFs (if there are not too many of them) will probably do the trick for you. However, I'm not familiar with PDF creation tools, so I can only wish you good luck... and hope that somebody here can help more.
All good! The extension is still incredible, in my opinion! I don't have but a couple of pages anyway. Thanks for all the help! --Foreclosurepedia org 00:22, 6 November 2010 (UTC)

Slow after migration to Ubuntu 10.04 LTS

PdfHandler is a very nice tool. But after migration to a server running Ubuntu 10.04 LTS it works very slowly. I don't get any error message, but generating a preview takes about 20 seconds or longer. Any ideas? --Frickelpiet 20:30, 25 May 2011 (UTC)

Metadata contains entire text?

It looks to me like the entire text of the PDF is put into the img_metadata field in the mw_image table. A long document will result in an attempt to insert a large amount of data in this field. Is this really intended, and if so, why? --Obo 19:41, 18 December 2011 (UTC)

I was wondering if it has something to do with being able to search the full text. This is actually a problem I've been trying to learn how to solve (without much luck): how can you index and search within PDFs so that, when a match comes up, you can go right to the file? Here's a link describing the process in the abstract. https://bugzilla.wikimedia.org/show_bug.cgi?id=6422
Thanks for the link. I've gone ahead and commented out the appropriate lines in my own installed version of PdfHandler, as the feature appears to serve no purpose at present and results in large PDFs causing database insert errors (for me anyway) and a significant increase in database size. --Obo 03:03, 3 January 2012 (UTC)


Using Thumbnails in Gallery

Is there a way to use pages of the rendered PDF in a gallery? Using the standard syntax

 <gallery>
 Image:filename.pdf
 </gallery>

it is possible to show the first page of the document. How can I specify which page to display?

194.156.135.246 10:23, 16 March 2012 (UTC)

PdfHandler on Windows 2008 server

On a Windows 2008 R2 server, I have

  • MediaWiki v1.18,
  • ImageMagick v6.7.6 Q16,
  • Ghostscript v9.05 (64 bit) and
  • Xpdf v3.03

installed.

ImageMagick is rendering thumbnails for MW successfully using the following settings:

$wgUseImageMagick = true;
$wgImageMagickConvertCommand = 'C:/ImageMagick-6.7.6-Q16/convert.exe';
$wgUploadPath = 'images'; 
$wgUploadDirectory = 'images';
$wgTmpDirectory = '{$wgUploadDirectory}/temp';
$wgImageMagickTempDir = $wgTmpDirectory;

However, the PdfHandler extension, for which I used these settings

require_once("$IP/extensions/PdfHandler/PdfHandler.php");
$wgPdfProcessor = 'C:/gs/gs9.05/bin/gswin64c.exe';
$wgPdfPostProcessor = $wgImageMagickConvertCommand;
$wgPdfInfo = 'C:/xpdfbin-win-3.03/bin64/pdfinfo.exe';
$wgPdftoText = 'C:/xpdfbin-win-3.03/bin64/pdftotext.exe';

doesn't work at all. No error, but also no thumbnails for PDF files (at least, not for existing ones).

So far, I have tried debugging using MW's $wgDebugLogFile and some additional wfDebug( __METHOD__ ); calls in almost all functions of the extension, but no luck. For an existing PDF file, neither is metadata fetched nor is a thumbnail rendered.

In a test PHP file, I have split up PdfHandler's thumbnailing command into the commands for Ghostscript (GS) and for ImageMagick (IM).

<?php
$array = array();
echo "<pre>";

# call Ghostscript to extract the JPG
exec( 'C:/gs/gs9.05/bin/gswin64c.exe -sDEVICE=jpeg -sOutputFile=temporary.jpg -dFirstPage=1 -dLastPage=1 -r150 -dBATCH -dNOPAUSE -q test.pdf', $array );
# call ImageMagick to resize
exec( 'C:/ImageMagick-6.7.6-Q16/convert.exe temporary.jpg -depth 8 -resize 300px test.jpg', $array );

# print_r() needs its second argument set to true to return the text instead of printing it
echo "<br/>".print_r($array, true)."<br/>";
echo "</pre>";
?>

Using the code posted above, both transforming the PDF file to a JPG file (GS) and resizing the image (IM) works as expected. But when I try to avoid the temporary file and recombine the calls again using a pipe (“|”), just like PdfHandler does, I have the same result as with PdfHandler: no thumbnail file, no error.

Did I miss an important setting? Is it an issue with pipes on 64 bit Windows? Any ideas?

--tom.dlh (talk) 15:49, 19 March 2012 (UTC)

Update: Behaviour on PDF uploads

Today I uploaded a PDF file to the MediaWiki and got an “Internal Server Error (500)” from IIS (v7.5):
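
One detail in the settings quoted in this post: PHP does not interpolate variables inside single-quoted strings, so '{$wgUploadDirectory}/temp' is stored literally, braces and dollar sign included. Whether that has anything to do with the missing PDF thumbnails is not established in this thread; the fragment below is only a sketch of the double-quoted form, reusing the values from the post.

```php
// LocalSettings.php (values taken from the post above)
$wgUploadDirectory = 'images';

// Double quotes are required for {$...} interpolation to happen.
$wgTmpDirectory = "{$wgUploadDirectory}/temp";
$wgImageMagickTempDir = $wgTmpDirectory;
```

With double quotes, $wgTmpDirectory ends up as images/temp rather than a path containing the variable name itself.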

Module       : FastCgiModule
Notification : ExecuteRequestHandler
Handler      : PHP_via_FastCGI
Error Code   : 0x00000000

--tom.dlh (talk) 11:44, 20 March 2012 (UTC)

Maximum PDF size

This is a great extension. I had success at first with small PDF files by following the instructions as described on the extension page. However, my site has single-page PDF files (high-resolution maps) in the 10-15 MB size range. Here is the error I was getting:

Error creating thumbnail: [local path]/w/bin/ulimit4.sh: line 4: 3218 Done 'gs' -sDEVICE=jpeg -sOutputFile=- -dFirstPage=1 -dLastPage=1 -r150 -dBATCH -dNOPAUSE -q [input filename] ' 3219 File size limit exceeded| 'convert' -depth 8 -resize 180 - [output filename]

I have done some debugging and was able to execute the command lines separately with no errors: converting the PDF to a JPEG using gs, and then resizing the JPEG using convert (part of ImageMagick). I found a post that suggested including in LocalSettings.php the line: $wgMaxShellFileSize = unlimited;

This has changed the error now to be:

Error creating thumbnail: convert: no decode delegate for this image format `/tmp/magick-XXNuz8A8' @ error/constitute.c/ReadImage/533. convert: missing an image filename `[my input file path and name].pdf/page1-180px-[file name].pdf.jpg' @ error/convert.c/ConvertImageCommand/2940.

To resolve the problem completely I had success by adding the following lines to LocalSettings.php:

$wgMaxShellMemory = unlimited; $wgUseImageResize = true;
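
A note on the syntax: a bare unlimited is not a PHP keyword. Older PHP versions treat an undefined bare word as the string 'unlimited' and emit a notice, while PHP 8 raises an error instead. How MediaWiki then interprets that string depends on the version, so the fragment below is only a sketch of an alternative that uses explicit numbers. The figures are assumptions to adapt: 1024000 is ten times the 102400 default for $wgMaxShellMemory mentioned in an earlier thread on this page.

```php
// LocalSettings.php
// Limits for external commands such as gs and convert, in kilobytes.
// 1024000 is an example value; choose what your server can afford.
$wgMaxShellMemory   = 1024000;
$wgMaxShellFileSize = 1024000;
```

Explicit numbers keep the configuration valid on any PHP version and still leave a cap in place for runaway processes.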

Exception with 1.21

Please note bug 48834. This extension currently breaks uploads in MW 1.21. Do not use until this bug is resolved. 🐝 thingles (talk) 11:46, 26 May 2013 (UTC)

How to make PDFs searchable? (Oct. 2013)

Is it possible to make uploaded PDFs searchable in the MediaWiki search function, together with page content? The description mentions that text is extracted, but it does not seem to be indexed. Has anyone solved this (with a recent MediaWiki version)? --Vigilius (talk) 09:20, 3 October 2013 (UTC)

Not exactly compatible with 1.16

Version info:

  • MediaWiki: 1.16.1 (patched to enable page access restriction)
  • PHP: 5.2.17 (apache2handler)
  • MySQL: 5.1.58
  • ImageMagick: arch x86_64, version 6.2.8.0, release 15.el5_8
  • xpdf: arch x86_64, epoch 1, version 3.03, release 8.el5.1

Installing the ImageMagick package above also installed Ghostscript.

I was getting the following error: PHP Fatal error: Class 'BitmapMetadataHandler' not found in /var/www/html/mediawiki/extensions/PdfHandler/PdfHandler.image.php on line 207, referer: http://10.1.1.22/wiki/Special:Upload

I fixed this by copying BitmapMetadataHandler.php from the newer (1.22) version of MediaWiki into the includes/media directory and manually adding a require_once at the top of PdfHandler.image.php.

So that error has been fixed. However, now I'm getting the following: PHP Fatal error: Class 'XMPReader' not found in /var/www/html/mediawiki/extensions/PdfHandler/PdfHandler.image.php on line 300, referer: http://x.x.x.x/wiki/Special:Upload


Any ideas?

Thumbs quality

Please add in git:

		$cmd .= " | " . wfEscapeShellArg(
			$wgPdfPostProcessor,
			"-depth",
			"8",
			"-quality",
			"100",

Or make it a configurable option.

Asian Fonts

It seems that the extension is not able to render PDFs that use Asian fonts into images. While it works with other PDFs, I get this message on pages with Japanese PDFs:

GPL Ghostscript 8.70: Unrecoverable error, exit code 1 convert: no decode delegate for this image format `/tmp/magick-XXVId71G' @ constitute.c/ReadImage/503. convert: missing an image filename `/tmp/transform_d3bd9d4b6b1f-1.jpg' @ convert.c/ConvertImageCommand/2800.

Since this is quite annoying, it would be good to fix the bug, or at least to show a more graceful error message.

--193.171.198.6 11:01, 30 July 2014 (UTC)