Extension talk:PdfHandler

PdfHandler Talk Archive

"0 × 0 pixel"

1


School4schools (talkcontribs)

I have all the required packages installed (on AlmaLinux v8.9.0), including poppler-utils.

Software:

MediaWiki  1.41.0
PHP        8.1.27 (fpm-fcgi)
ICU        69.1
MariaDB    10.6.17-MariaDB
Lua        5.1.5
Pygments   2.16.1

Running "which gs convert pdfinfo pdftotext" returns:

/bin/gs
/bin/convert
/bin/pdfinfo
/bin/pdftotext

Uploaded PDFs still show "0 x 0 px". However, on the server, running

/pdfinfo "path/filename.pdf"

shows the page size ("612 x 792 pts (letter)").

I have run the suggested maintenance scripts ("refreshImageMetadata.php -f" and "rebuildImages.php").

What am I doing wrong?
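
One thing that may be worth ruling out here is whether the account PHP runs under can find the same binaries as an interactive login shell. The following is only a minimal sketch, assuming a POSIX shell; the function name check_tools is made up for illustration and is not part of PdfHandler.

```shell
#!/bin/sh
# check_tools: report whether each named program is on the current PATH.
# Returns 0 only if every program was found. (Hypothetical helper name.)
check_tools() {
    missing=0
    for tool in "$@"; do
        if path=$(command -v "$tool" 2>/dev/null); then
            printf '%s: %s\n' "$tool" "$path"
        else
            printf '%s: NOT FOUND\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example: check_tools gs convert pdfinfo pdftotext
```

Saved as a script and run under the web server's account rather than a personal login, it would show what MediaWiki itself can see; which account that is depends on the setup.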

Reply to ""0 × 0 pixel""

/bin/bash: Permission denied

1
Bcmpinc (talkcontribs)

I'm getting an error during thumbnail generation. This is the error when viewing the file page: Error creating thumbnail: sh: 1: /bin/bash: Permission denied

Something goes wrong in doTransform in PdfHandler.php, but I don't understand what. I suspect the command line gets mangled somewhere. I've replaced $err = wfShellExecWithStderr( $cmd, $retval ); in that function with exec($cmd, $err, $retval);. This seems to be a functional workaround.

Reply to "/bin/bash: Permission denied"

Installing xpdf-utils

3
TelePointHistory (talkcontribs)

I am trying to use PdfHandler to show thumbnails, but I know I don't have the prerequisites.

Running the command in SSH: which gs convert pdfinfo pdftotext

shows that I have ghostscript but no packages for xpdf-utils

/bin/gs

/bin/convert

/usr/bin/which: no pdfinfo etc

/usr/bin/which: no pdftotext etc


Looking at this page https://www.xpdfreader.com/download.html I don't know which file to download or where to put it once it's downloaded. I am on Siteground hosting with MW 1.35.3, if that makes a difference. I'm hoping someone can give me a basic rundown of how it's meant to work. Thanks.

School4schools (talkcontribs)

Did you ever get this extension working with an Xpdf-utils / XpdfReader installation?

Kghbln (talkcontribs)

I guess Siteground has to install this for you on the server. The best way is to contact their support.

Reply to "Installing xpdf-utils"

No PDF/thumbnail, issue executing pdfinfo/pdftotext, Windows Server 2012 R2, IIS 8.5, MW 1.31

6
Tommyheyser (talkcontribs)

MW 1.31.1 running on Windows Server 2012 R2 IIS 8.5

I'm getting the following error (from the $wgDebugLogFile output log file) for every execution of pdfinfo and pdftotext.

[exec] Error running "pdfinfo" "-enc" "UTF-8" "-meta" "C:/inetpub/wwwroot/w/images/f/f4/Phone_List.pdf": 'pdfinfo" "-enc" "UTF-8" "-meta" "C:' is not recognized as an internal or external command, operable program or batch file.

I'm not sure if this is the result of the new Shell framework introduced in 1.30, Manual:Shell framework, which replaces wfShellExec(). The debug log line before the error is:

[exec] MediaWiki\Shell\Command::execute: "pdfinfo" "-enc" "UTF-8" "-meta" "C:/inetpub/wwwroot/w/images/f/f4/Phone_List.pdf"

Tommyheyser (talkcontribs)
Tommyheyser (talkcontribs)

In case someone else is not seeing PDFs while running MW 1.31 on Windows Server 2012 R2, this is what I did:

  1. I added the path to pdfinfo.exe and pdftotext.exe to the system Path variable (mine was C:\Program Files\xpdf-tools-win-4.00\bin64).
  2. Then I edited the function retrieveMetaData in {mediawiki install path}/extensions/PdfHandler/includes/PdfImage.php:

a. Replacing:

$cmdMeta = [
	$wgPdfInfo,
	'-enc', 'UTF-8', # Report metadata as UTF-8 text...
	'-meta',         # Report XMP metadata
	$this->mFilename,
];

with

$cmdMeta = "pdfinfo.exe -enc UTF-8 -meta " . $this->mFilename;

b. Replacing

$cmdPages = [
	$wgPdfInfo,
	'-enc', 'UTF-8', # Report metadata as UTF-8 text...
	'-l', '9999999', # Report page sizes for all pages
	$this->mFilename,
];

with

$cmdPages = "pdfinfo.exe -enc UTF-8 -l 9999999 " . $this->mFilename;

c. Replacing

$cmd = [ $wgPdftoText,  $this->mFilename, '-' ];

with

$cmd = "pdftotext.exe " . $this->mFilename;


It's a bit of a hack, but it works. This should last until the issue is properly fixed.

173.77.3.157 (talkcontribs)
TomRamm (talkcontribs)

Since the source code has changed considerably in the meantime, this approach no longer works. I did the following to make it work for me.

I created a new file in the scripts subfolder:

scripts/retrieveMetaData.cmd

@echo off

if NOT "%PDFHANDLER_INFO%" == "" call:runInfo
if NOT "%PDFHANDLER_TOTEXT%" == "" call:runToText

EXIT /B %ERRORLEVEL%

:runInfo
	call "%PDFHANDLER_INFO%" -enc UTF-8 -meta file.pdf > meta
	call "%PDFHANDLER_INFO%" -enc UTF-8 -l 9999999 file.pdf > pages
EXIT /B 0

:runToText
	call "%PDFHANDLER_TOTEXT%" file.pdf - > text
	echo %ERRORLEVEL% > text_exit_code

EXIT /B 0

In includes/PdfImage.php, in the function retrieveMetaData, I changed the script call depending on the operating system. Under Linux the original code is used; under Windows the .cmd script is called instead of the .sh script, and the script is invoked directly rather than passed as a parameter to a shell.

if (strtoupper(substr(PHP_OS, 0, 3)) === 'WIN') {
	# 'This is a server using Windows!'
	$result = $command
		->params( 'scripts/retrieveMetaData.cmd' )
		->inputFileFromFile(
			'scripts/retrieveMetaData.cmd',
			__DIR__ . '/../scripts/retrieveMetaData.cmd' )
		->inputFileFromFile( 'file.pdf', $this->mFilename )
		->outputFileToString( 'meta' )
		->outputFileToString( 'pages' )
		->outputFileToString( 'text' )
		->outputFileToString( 'text_exit_code' )
		->environment( [
			'PDFHANDLER_INFO' => $wgPdfInfo,
			'PDFHANDLER_TOTEXT' => $wgPdftoText,
		] )
		->execute();
} else {
	# 'This is a server not using Windows!'
	$result = $command
		->params( $wgPdfHandlerShell, 'scripts/retrieveMetaData.sh' )
		->inputFileFromFile(
			'scripts/retrieveMetaData.sh',
			__DIR__ . '/../scripts/retrieveMetaData.sh' )
		->inputFileFromFile( 'file.pdf', $this->mFilename )
		->outputFileToString( 'meta' )
		->outputFileToString( 'pages' )
		->outputFileToString( 'text' )
		->outputFileToString( 'text_exit_code' )
		->environment( [
			'PDFHANDLER_INFO' => $wgPdfInfo,
			'PDFHANDLER_TOTEXT' => $wgPdftoText,
		] )
		->execute();
}		


Mwgbell (talkcontribs)

I had a similar problem using ImageMagick 7.1.0-19 Q16-HDRI with MediaWiki 1.37.1 on Windows 11. To fix it, edit extensions\PdfHandler\includes\PdfHandler.php.

Change these lines:

$cmd .= " | " . wfEscapeShellArg(
	$wgPdfPostProcessor,
	"-depth",
	"8",
	"-quality",
	$wgPdfHandlerJpegQuality,
	"-resize",
	$width,
	"-",
	$dstPath
);

To this (i.e. move the "-" so that it comes directly after the $wgPdfPostProcessor line):

$cmd .= " | " . wfEscapeShellArg(
	$wgPdfPostProcessor,
	"-",
	"-depth",
	"8",
	"-quality",
	$wgPdfHandlerJpegQuality,
	"-resize",
	$width,
	$dstPath
);

Reply to "No PDF/thumbnail, issue executing pdfinfo/pdftotext, Windows Server 2012 R2, IIS 8.5, MW 1.31"

Previous/Next Page functionality

1
47.186.29.164 (talkcontribs)

On the file upload page for the PDF I see navigation buttons for next/previous pages, but I see no such navigation buttons on the page where the file is displayed. What am I doing wrong?

Reply to "Previous/Next Page functionality"

Direct linking to PDF page, When clicking to direct media

2
Gmillerd (talkcontribs)

Does anyone have a modification of the extension so that, when a page is specified, clicking the PDF opens the file at that page? That is, from

/mediawiki/index.php?title=File:Filename.pdf&page=25

the link should lead to the following, so that the browser skips to the specified page:

/images/0/0b/Filename.pdf#page=25

I am able to do it in JavaScript, but the PHP evades me.

var page = new URLSearchParams(window.location.search).get("page"); // stands in for the undefined getUrlParameter helper
if (page) {
    $("#file.fullImageLink").find("a:first").attr("href", function(i, href) {
        return href + "#page=" + encodeURIComponent(page);
    });
}

212.59.13.226 (talkcontribs)

Use # instead of ...&page=25

No PDF images displayed

4
Darlig Gitarist (talkcontribs)

PDFHandler extension is supposed to allow viewing of pdf files. However, this does not appear to be working as advertised.

We've gone through the troubleshooting area of MediaWiki for this extension and double-checked the paths to the PDF converters. We re-ran the maintenance scripts for images and image metadata. We checked the logs.

There is no indication of errors other than the images not showing up.

MediaWiki 1.35.5
PHP 7.4.27 (fpm-fcgi)
MySQL 5.7.37-0ubuntu0.18.04.1-log
ICU 60.2
Lua 5.1.5
PDF Handler – (16eda4b) 20:58, 2022 January 23

Any help or suggestions would be appreciated.

Cboltz (talkcontribs)

Wild guess: Some Linux distributions (for example openSUSE) have disabled rendering of PDF files in their default ImageMagick config because it has been a steady source of security issues (for example "ImageTragick"). In openSUSE, you'd need to install the ImageMagick-config-7-upstream package to enable rendering of PDF files.

Note: I don't know if Ubuntu did something similar with the ImageMagick config.

If unsure, test if converting a PDF to an image in the shell works: convert foo.pdf foo.png
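
The shell test suggested above could be wrapped up roughly as follows. This is only a sketch under assumptions: the helper name try_pdf_render and the default output path are made up for illustration, and it renders just the first page via ImageMagick's "[0]" frame selector to keep the test quick.

```shell
#!/bin/sh
# try_pdf_render: attempt to render page 1 of a PDF to PNG with ImageMagick.
# Returns 2 if the input is missing, 3 if convert is not installed,
# otherwise convert's own exit status. (Hypothetical helper name.)
try_pdf_render() {
    pdf=$1
    out=${2:-/tmp/pdf_render_test.png}
    [ -f "$pdf" ] || { echo "no such file: $pdf" >&2; return 2; }
    command -v convert >/dev/null 2>&1 || { echo "convert not found" >&2; return 3; }
    # "[0]" selects the first page only.
    convert "${pdf}[0]" "$out"
}

# Example: try_pdf_render foo.pdf foo.png
```

If convert fails here with a message about a security policy, that would point towards the ImageMagick configuration rather than PdfHandler.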

Drewsaur (talkcontribs)

I have done this, and still can't get the extension to work. Any other ideas?

Michele.Fella (talkcontribs)

Check /etc/ImageMagick-<your_version>/policy.xml.

If it contains <policy domain="coder" rights="none" pattern="PDF" />, convert is not allowed to process PDFs.

You might change this to rights="read | write",

but be aware that you are responsible for the security risks this might bring (as Cboltz mentioned).
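
A quick way to look for such a policy entry might be the sketch below. It is a line-based grep rather than a real XML parse, it assumes each policy element sits on a single line, and the function name pdf_policy_blocked is made up for illustration.

```shell
#!/bin/sh
# pdf_policy_blocked: succeed (exit 0) if the given ImageMagick policy file
# has a coder policy covering PDF with rights="none".
# Lines containing an XML comment opener are skipped; multi-line comments
# are NOT handled. (Hypothetical helper name.)
pdf_policy_blocked() {
    grep -v '<!--' "$1" \
        | grep 'domain="coder"' \
        | grep 'rights="none"' \
        | grep -q 'pattern="[^"]*PDF'
}

# Example (the path depends on the installed ImageMagick version):
# pdf_policy_blocked /etc/ImageMagick-<your_version>/policy.xml && echo "PDF is blocked"
```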

Reply to "No PDF images displayed"

PDFHandler not working. Still displays File Link

3
199.27.199.51 (talkcontribs)

PdfHandler is confirmed installed on Special:Version, which returns all the required directories, and no extensions could be interfering with the install. What's the issue?

Drewsaur (talkcontribs)

I am having this issue too. I have changed the settings in ImageMagick so that PDFs are able to be converted; verified this at the command line; verified that all 4 related utilities are working at the command line; run all the maintenance scripts; and...nothing.

Michele.Fella (talkcontribs)

Check /etc/ImageMagick-<your_version>/policy.xml.

If it contains <policy domain="coder" rights="none" pattern="PDF" />, convert is not allowed to process PDFs.

You might change this to rights="read | write",

but be aware that you are responsible for the security risks this might bring (see the post from Cboltz).

Reply to "PDFHandler not working. Still displays File Link"

Timeout

1
87.165.252.36 (talkcontribs)

Error creating thumbnail: limit.sh: timed out executing command "('/usr/bin/gs' '-sDEVICE=jpeg' '-sOutputFile=-' '-sstdout=%stderr' '-dFirstPage=1' '-dLastPage=1' '-dSAFER' '-r150' '-dBATCH' '-dNOPAUSE' '-q' '/opt/mediawiki/images/7/7e/A.K.2023.pdf' | '/usr/bin/convert' '-depth' '8' '-quality' '95' '-resize' '800' '-' '/tmp/transform_96151e2ec90e.jpg')"

I have already changed some settings, but obviously not the right ones. What do I need to do to get it to work on a PDF file with a lot of pixels in each direction?
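
To see whether the Ghostscript step alone exceeds the limit, the first-page render could be reproduced outside MediaWiki along these lines. This is a sketch, not what PdfHandler runs: the helper name time_first_page is made up, the output is discarded instead of being piped to convert, and the 180-second default is meant to mirror what I believe is MediaWiki's default for $wgMaxShellTime.

```shell
#!/bin/sh
# time_first_page: render page 1 of a PDF with Ghostscript at a given
# resolution under a time limit, reusing the gs flags from the error above.
# Returns 2 if the input is missing; 124 means the limit was hit
# (that is how coreutils `timeout` reports it). (Hypothetical helper name.)
time_first_page() {
    pdf=$1
    dpi=${2:-150}
    limit=${3:-180}
    [ -f "$pdf" ] || { echo "no such file: $pdf" >&2; return 2; }
    timeout "$limit" gs -sDEVICE=jpeg -sOutputFile=/dev/null \
        -dFirstPage=1 -dLastPage=1 -dSAFER "-r$dpi" \
        -dBATCH -dNOPAUSE -q "$pdf"
}

# Example: time time_first_page /opt/mediawiki/images/7/7e/A.K.2023.pdf 150 180
```

If the render finishes at a lower resolution but not at 150 dpi, the page size is probably the bottleneck; if it finishes but only slowly, the shell time limits in LocalSettings.php ($wgMaxShellTime and related settings) may be the ones to look at.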

Reply to "Timeout"

Any workaround for phabricator T220680/T211754

1
Pspviwki (talkcontribs)

I tried to achieve the same functionality as, for example, the Commons page File:PDF metadata.pdf: a PDF browser within a single wiki page, where one can browse a PDF file page by page by entering a page number, using the gallery tag and PdfHandler. Unfortunately it does not work; it always shows only the first page. The end result was T220680 (for PdfHandler and gallery), which went to T211754. The hack suggested on the Russian wiki does not work. This is really blocking: the functionality works on the file page but not with the gallery tag. Since it works on the file page, is there any workaround to achieve it elsewhere? The various PDFembed extensions are unusable. Thanks.

Reply to "Any workaround for phabricator T220680/T211754"