Topic on Extension talk:PdfHandler

No PDF/thumbnail, issue executing pdfinfo/pdftotext, Windows Server 2012 R2, IIS 8.5, MW 1.31

Tommyheyser (talkcontribs)

MW 1.31.1 running on Windows Server 2012 R2 IIS 8.5

I'm getting the following error (from the $wgDebugLogFile log) for every execution of pdfinfo and pdftotext:

[exec] Error running "pdfinfo" "-enc" "UTF-8" "-meta" "C:/inetpub/wwwroot/w/images/f/f4/Phone_List.pdf": 'pdfinfo" "-enc" "UTF-8" "-meta" "C:' is not recognized as an internal or external command, operable program or batch file.

I'm not sure whether this is caused by the new shell framework introduced in MW 1.30 (see Manual:Shell framework), which replaces wfShellExec(). The debug log line just before the error is:

[exec] MediaWiki\Shell\Command::execute: "pdfinfo" "-enc" "UTF-8" "-meta" "C:/inetpub/wwwroot/w/images/f/f4/Phone_List.pdf"
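The program name in the error ('pdfinfo" "-enc" "UTF-8" "-meta" "C:') is consistent with cmd.exe's documented /C quote handling (see `cmd /?`): when the command line starts with a quote, cmd.exe removes that first quote and the last quote on the line, and then reads the program name up to the first space or "/" outside quotes. Which layer (PHP or MediaWiki) hands cmd.exe the command in this form is not established in this thread. The PHP sketch below is a simplified model of that rule; cmdStripQuotes() and cmdProgramToken() are hypothetical illustrations, not PHP or MediaWiki functions, and the model ignores cmd.exe's special case for exactly two quotes.

```php
<?php
// Simplified model of cmd.exe's "/C" quote handling (the "old behavior"
// described by `cmd /?`): if the line starts with a quote, remove that
// first quote and the last quote on the line, keeping everything else.
function cmdStripQuotes(string $line): string
{
	if ($line === '' || $line[0] !== '"') {
		return $line; // no leading quote: nothing is stripped
	}
	$rest = substr($line, 1); // drop the leading quote
	$last = strrpos($rest, '"');
	if ($last !== false) {
		// drop the last quote on the line
		$rest = substr($rest, 0, $last) . substr($rest, $last + 1);
	}
	return $rest;
}

// Rough model of how cmd.exe picks the program name: read until a space,
// tab or "/" that is outside quotes; each quote toggles the quoted state.
function cmdProgramToken(string $line): string
{
	$inQuotes = false;
	$token = '';
	for ($i = 0, $n = strlen($line); $i < $n; $i++) {
		$c = $line[$i];
		if ($c === '"') {
			$inQuotes = !$inQuotes;
		} elseif (!$inQuotes && ($c === ' ' || $c === "\t" || $c === '/')) {
			break;
		}
		$token .= $c;
	}
	return $token;
}

// The command line MediaWiki logged before the failure.
$logged = '"pdfinfo" "-enc" "UTF-8" "-meta" "C:/inetpub/wwwroot/w/images/f/f4/Phone_List.pdf"';
echo cmdProgramToken(cmdStripQuotes($logged)), "\n";
// prints: pdfinfo" "-enc" "UTF-8" "-meta" "C:
```

A command line that does not begin with a quote, such as pdfinfo.exe -enc UTF-8 ..., passes through cmdStripQuotes() unchanged in this model.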

Tommyheyser (talkcontribs)

In case someone else running MW 1.31 on Windows Server 2012 R2 has this issue of PDFs not showing, here is what I did:

  1. I added the folder containing pdfinfo.exe and pdftotext.exe to the system PATH environment variable (mine was C:\Program Files\xpdf-tools-win-4.00\bin64).
  2. Then I edited the function retrieveMetaData() in {mediawiki install path}/extensions/PdfHandler/includes/PdfImage.php:

a. Replacing:

$cmdMeta = [
	$wgPdfInfo,
	'-enc', 'UTF-8', # Report metadata as UTF-8 text...
	'-meta',         # Report XMP metadata
	$this->mFilename,
];

with

$cmdMeta = "pdfinfo.exe -enc UTF-8 -meta " . $this->mFilename;

b. Replacing

$cmdPages = [
	$wgPdfInfo,
	'-enc', 'UTF-8', # Report metadata as UTF-8 text...
	'-l', '9999999', # Report page sizes for all pages
	$this->mFilename,
];

with

$cmdPages = "pdfinfo.exe -enc UTF-8 -l 9999999 " . $this->mFilename;

c. Replacing

$cmd = [ $wgPdftoText,  $this->mFilename, '-' ];

with

$cmd = "pdftotext.exe " . $this->mFilename . " -"; # "-" sends the text to stdout, as in the original array


It's a bit of a hack, but it works. This should last until the issue is properly fixed.
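The same idea can be written as a small helper that turns the original argument arrays into a plain command string that does not begin with a quote. That a leading quote is what triggers the error is an inference from the error message, not something verified in this thread. This is a minimal sketch, assuming the executable names contain no spaces and are found via PATH, and that no argument itself contains a double quote; plainCommandLine() is a hypothetical name, not a MediaWiki function, and the path is the one from the log above.

```php
<?php
// Hypothetical helper: build "exe arg1 arg2 ..." with the executable left
// unquoted, so the string never starts with a quote character.
// Assumes $exe has no spaces and no argument contains a double quote.
function plainCommandLine(string $exe, array $args): string
{
	$parts = [$exe];
	foreach ($args as $arg) {
		// Quote only arguments containing whitespace (e.g. paths with spaces).
		$parts[] = preg_match('/\s/', $arg) ? '"' . $arg . '"' : $arg;
	}
	return implode(' ', $parts);
}

$file = 'C:/inetpub/wwwroot/w/images/f/f4/Phone_List.pdf';
$cmdMeta  = plainCommandLine('pdfinfo.exe', ['-enc', 'UTF-8', '-meta', $file]);
$cmdPages = plainCommandLine('pdfinfo.exe', ['-enc', 'UTF-8', '-l', '9999999', $file]);
$cmd      = plainCommandLine('pdftotext.exe', [$file, '-']); // "-" = write text to stdout
echo $cmdMeta, "\n", $cmdPages, "\n", $cmd, "\n";
```

Unlike plain string concatenation, this keeps working if an upload path ever contains spaces.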

TomRamm (talkcontribs)

Since the source code has changed considerably in the meantime, this approach no longer works. I have done the following to make it work for me:

First, I created a new file, scripts/retrieveMetaData.cmd, in the extension's scripts subfolder:

@echo off

if NOT "%PDFHANDLER_INFO%" == "" call:runInfo
if NOT "%PDFHANDLER_TOTEXT%" == "" call:runToText

EXIT /B %ERRORLEVEL%

:runInfo
	call "%PDFHANDLER_INFO%" -enc UTF-8 -meta file.pdf > meta
	call "%PDFHANDLER_INFO%" -enc UTF-8 -l 9999999 file.pdf > pages
EXIT /B 0

:runToText
	call "%PDFHANDLER_TOTEXT%" file.pdf - > text
	echo %ERRORLEVEL% > text_exit_code

EXIT /B 0

Then, in includes/PdfImage.php, function retrieveMetaData(), I made the script call depend on the operating system. On Linux the original code is used; on Windows the .cmd script is called instead of the .sh script, and it is run directly rather than being passed as a parameter to $wgPdfHandlerShell.

if (strtoupper(substr(PHP_OS, 0, 3)) === 'WIN') {
	# 'This is a server using Windows!'
	$result = $command
		->params( 'scripts/retrieveMetaData.cmd' )
		->inputFileFromFile(
			'scripts/retrieveMetaData.cmd',
			__DIR__ . '/../scripts/retrieveMetaData.cmd' )
		->inputFileFromFile( 'file.pdf', $this->mFilename )
		->outputFileToString( 'meta' )
		->outputFileToString( 'pages' )
		->outputFileToString( 'text' )
		->outputFileToString( 'text_exit_code' )
		->environment( [
			'PDFHANDLER_INFO' => $wgPdfInfo,
			'PDFHANDLER_TOTEXT' => $wgPdftoText,
		] )
		->execute();
} else {
	# 'This is a server not using Windows!'
	$result = $command
		->params( $wgPdfHandlerShell, 'scripts/retrieveMetaData.sh' )
		->inputFileFromFile(
			'scripts/retrieveMetaData.sh',
			__DIR__ . '/../scripts/retrieveMetaData.sh' )
		->inputFileFromFile( 'file.pdf', $this->mFilename )
		->outputFileToString( 'meta' )
		->outputFileToString( 'pages' )
		->outputFileToString( 'text' )
		->outputFileToString( 'text_exit_code' )
		->environment( [
			'PDFHANDLER_INFO' => $wgPdfInfo,
			'PDFHANDLER_TOTEXT' => $wgPdftoText,
		] )
		->execute();
}
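The two branches above differ only in the script file and in whether it is handed to $wgPdfHandlerShell or run directly. A minimal sketch of factoring that difference out, assuming a hypothetical helper pdfHandlerScriptParams() (not part of PdfHandler) and using '/bin/bash' purely as an example shell value. PHP_OS_FAMILY (PHP 7.2+) is shown as an alternative to the substr() check.

```php
<?php
// Hypothetical helper: returns [arguments for ->params(), script path
// relative to the extension directory] for the current platform.
function pdfHandlerScriptParams(bool $isWindows, string $shell): array
{
	$script = $isWindows
		? 'scripts/retrieveMetaData.cmd'  // run the .cmd script directly
		: 'scripts/retrieveMetaData.sh';  // pass the .sh script to the shell
	$params = $isWindows ? [$script] : [$shell, $script];
	return [$params, $script];
}

[$params, $script] = pdfHandlerScriptParams(PHP_OS_FAMILY === 'Windows', '/bin/bash');
print_r($params);
```

Inside retrieveMetaData(), the if/else could then become a single chain beginning with `$command->params( ...$params )->inputFileFromFile( $script, __DIR__ . '/../' . $script )`, followed by the unchanged calls; this is untested against PdfHandler.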


Mwgbell (talkcontribs)

I had a similar problem using ImageMagick 7.1.0-19 Q16-HDRI with MediaWiki 1.37.1 on Windows 11. To fix it, I edited extensions\PdfHandler\includes\PdfHandler.php.

Change this statement:

$cmd .= " | " . wfEscapeShellArg(
	$wgPdfPostProcessor,
	"-depth",
	"8",
	"-quality",
	$wgPdfHandlerJpegQuality,
	"-resize",
	$width,
	"-",
	$dstPath
);

to this (i.e. move the "-" so that it comes directly after $wgPdfPostProcessor):

$cmd .= " | " . wfEscapeShellArg(
	$wgPdfPostProcessor,
	"-",
	"-depth",
	"8",
	"-quality",
	$wgPdfHandlerJpegQuality,
	"-resize",
	$width,
	$dstPath
);
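The change puts the input image ("-", i.e. standard input from the pipe) before the options. A plausible reason, not confirmed in this thread, is that ImageMagick 7 applies command-line options in order, so an operator such as -resize needs an image to have been read already. The sketch below only shows the resulting argument order; postProcessorArgs() is a hypothetical helper, and 'magick', 95, 180 and 'thumb.jpg' are example values, not PdfHandler defaults.

```php
<?php
// Hypothetical helper: post-processor arguments in the order used by the
// fix above: program, input from stdin, settings/operators, output file.
function postProcessorArgs(string $postProcessor, int $quality, int $width, string $dstPath): array
{
	return [
		$postProcessor,
		'-',                        // read the input image from the pipe first
		'-depth', '8',
		'-quality', (string)$quality,
		'-resize', (string)$width,
		$dstPath,                   // output file last
	];
}

$args = postProcessorArgs('magick', 95, 180, 'thumb.jpg');
echo implode(' ', $args), "\n";
```

In PdfHandler.php these values are passed to wfEscapeShellArg() in this same order.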
