Extension talk:PdfHandler


Thumbnails not appearing

Huwmanbeing (talkcontribs)

I run MediaWiki 1.33 on Ubuntu with the latest version of PdfHandler, but thumbnails aren't being shown for PDFs. There's no obvious error being generated, the thumbnail images just don't appear. (Embedding a link like [[File:Sample.pdf|page=1|thumb|Test]] produces only the text link with no image, and the file page itself shows only the generic PDF file type icon.)

I've verified I have all the necessary prerequisites:

  • /usr/bin/gs
  • /usr/bin/convert
  • /usr/bin/pdfinfo
  • /usr/bin/pdftotext

I've run the recommended maintenance scripts, but still no luck. I've also made sure it's appropriately configured in my LocalSettings.php — the settings are present and identical to the default settings. Does anyone have any ideas/suggestions as to what might be causing thumbnails not to appear? Any assistance would be much appreciated! Huwmanbeing (talk) 14:33, 7 September 2019 (UTC)
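A quick way to repeat the prerequisite check is sketched below. It only confirms that each tool resolves on the current user's PATH. The check_tools helper name is made up for this example, and the web server user (often www-data) may see a different PATH, so treat the result as a hint rather than proof.

```shell
#!/bin/sh
# Hypothetical helper: report which of the named tools resolve on PATH.
# Returns the number of missing tools (0 means all were found).
check_tools() {
    missing=0
    for tool in "$@"; do
        if path=$(command -v "$tool" 2>/dev/null); then
            printf 'found    %s -> %s\n' "$tool" "$path"
        else
            printf 'MISSING  %s\n' "$tool"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# The four binaries listed in the post above.
check_tools gs convert pdfinfo pdftotext || echo "At least one tool is not on PATH."
```

Running the same script through sudo -u www-data (or whichever account your web server uses) should show whether that account sees the same binaries.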

Kghbln (talkcontribs)

Did you set $wgGenerateThumbnailOnParse = false;? If yes, then you also need to set $wgThumbnailScriptPath = "{$wgScriptPath}/thumb.php";. This assumes that the rewrite rules in your virtual host / .htaccess are correct. If not, then I do not know.

Huwmanbeing (talkcontribs)

Thanks very much for the suggestions — unfortunately no luck. No thumbnails in links, and the file page still just shows the PDF icon and the 0 x 0 size.

I seem to have hit a wall here: everything's apparently set correctly as far as I can tell, and there are no obvious errors, but it just doesn't work. Could anyone suggest ways to debug this?

Kghbln (talkcontribs)

Try this:

  1. Check whether you have a "temp" directory at path/to/images/temp
  2. If not, create this directory: mkdir path/to/images/temp
  3. Make it writable for the web server: chown www-data:www-data path/to/images/temp
  4. Run php path/to/maintenance/refreshImageMetadata.php -f
  5. Run php path/to/maintenance/rebuildImages.php

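Steps 1 to 3 can be wrapped into one small function, sketched here under a few assumptions: ensure_temp_dir is a hypothetical name, the path is whatever your wiki uses for images/temp, and the chown step (which needs root) is only attempted when an owner such as www-data:www-data is passed. The final check tests writability for the user running the script, not for the web server.

```shell
#!/bin/sh
# Hypothetical helper: create the temp directory if needed, optionally hand it
# to the web server user, and confirm the current user can write to it.
ensure_temp_dir() {
    dir=$1
    owner=${2:-}                      # e.g. www-data:www-data; empty skips chown
    mkdir -p "$dir" || return 1
    if [ -n "$owner" ]; then
        chown "$owner" "$dir" || return 1
    fi
    [ -d "$dir" ] && [ -w "$dir" ]
}

# Demonstration against a scratch directory rather than a live wiki.
scratch=$(mktemp -d)
if ensure_temp_dir "$scratch/images/temp"; then
    echo "temp directory ready: $scratch/images/temp"
fi
```

To test writability as the web server itself, something like sudo -u www-data test -w path/to/images/temp is closer to what MediaWiki will experience.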
Huwmanbeing (talkcontribs)

I've confirmed that "images/temp" is present and chown-ed properly to www-data, and have run the maintenance scripts, but unfortunately still no luck.

I'm curious, though — what should one normally see when running rebuildImages.php? Here's what it's showing me:

Processing image... [...] Finished image... 0 of 852 rows updated Processing oldimage... Finished oldimage... 0 of 13 rows updated

Is the fact that it's reporting no rows updated in any way significant?

Kghbln (talkcontribs)

0 just means that there was nothing to update. The more interesting script is the one from step 4. If it ran without throwing errors you should be fine, i.e. no more PDF files with a 0 x 0 size. I cannot tell why it is not working for you. :( I guess I am really out of ideas now. Are the permissions of "temp" OK, i.e. 755?

Huwmanbeing (talkcontribs)

A breakthrough — I decided to start disabling all installed extensions just to see if there might be something there that's interfering, and I found that turning off TimedMediaHandler allows PDFs to be displayed and thumbnailed correctly. Now when refreshImageMetadata runs it does so successfully (well, mostly — it gives some warnings about some metadata inconsistencies in some of my .ogg files). I'll investigate further and see if I can find out more specifically what's happening, but at least it's working now.

Huwmanbeing (talkcontribs)

Here's the fault:

PHP Notice: Undefined offset: 0 in [...]/includes/libs/mime/MimeAnalyzer.php on line 811
[5cd1086a420507df361326be] [no req]   Error from line 15 of [...]/extensions/TimedMediaHandler/includes/handlers/ID3Handler/ID3Handler.php: Class 'getID3' not found
Backtrace:
#0 [...]/extensions/TimedMediaHandler/includes/handlers/ID3Handler/ID3Handler.php(51): ID3Handler->getID3(string)
#1 [...]/includes/utils/MWFileProps.php(84): ID3Handler->getMetadata(FSFile, string)
#2 [...]/includes/filerepo/FileRepo.php(1560): MWFileProps->getPropsFromPath(string, string)
#3 [...]/includes/filerepo/file/LocalFile.php(402): FileRepo->getFileProps(string)
#4 [...]/includes/filerepo/file/LocalFile.php(710): LocalFile->loadFromFile()
#5 [...]/maintenance/refreshImageMetadata.php(173): LocalFile->upgradeRow()
#6 [...]/maintenance/doMaintenance.php(96): RefreshImageMetadata->execute()
#7 [...]/maintenance/refreshImageMetadata.php(264): require_once(string)
#8 {main}

Kghbln (talkcontribs)

Oh, that's interesting. Good find. Are you also on REL1_33 for TimedMediaHandler? If yes, this is a bug which should be reported on Phabricator for TimedMediaHandler.

Huwmanbeing (talkcontribs)

Yep, my version of TimedMediaHandler is indeed the latest — REL1_33.

Looks like there's already a thread about it here and that other users have experienced something similar. It sounds like those (like me) who download the TimedMediaHandler extension as a file via the ExtensionDistributor will miss out on certain necessary dependencies, and that instead one must install it through composer. I'm afraid I'm not familiar with Phabricator, but if the problem hasn't already been raised there then it seems like it should be.
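The missing getID3 class suggests the extension's Composer dependencies were never installed. A rough check is sketched below, assuming the extension ships a composer.json and that its dependencies land in a vendor/ directory next to it. needs_composer_install is a made-up helper name, and the scratch directory merely stands in for extensions/TimedMediaHandler.

```shell
#!/bin/sh
# Hypothetical helper: succeed (return 0) when an extension directory declares
# Composer dependencies but has no vendor/autoload.php yet.
needs_composer_install() {
    ext_dir=$1
    [ -f "$ext_dir/composer.json" ] && [ ! -f "$ext_dir/vendor/autoload.php" ]
}

# Demonstration on a scratch directory standing in for the extension directory.
demo=$(mktemp -d)
echo '{}' > "$demo/composer.json"
if needs_composer_install "$demo"; then
    echo "Dependencies look uninstalled; try: composer install --no-dev"
fi
rm -rf "$demo"
```

On a real wiki you would point the function at the extension directory and, if it reports missing dependencies, run composer install --no-dev from inside that directory.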

Thank you again for your help!

Reply to "Thumbnails not appearing"

PDFHandler doesn't show images

213.211.236.242 (talkcontribs)

I'm running MediaWiki 1.30.0 (PHP 5.6.35) and have installed the PdfHandler version from 26 April 2018.

I embedded a PDF in one of my wiki pages with: [[File:test.pdf|page=1|thumb|My PDF]]

But only the link to the file is shown, and no image of the first page.

In the files overview of the wiki, only the default PDF icon for the document is shown.

I'm using the wiki on Windows 10 and these are the lines in my LocalSettings.php:

wfLoadExtension( 'PdfHandler' );

$wgGenerateThumbnailOnParse = true;


$wgUseImageMagick = true;

$wgImageMagickConvertCommand = 'C:\wamp64\ImageMagick-7.0.7-Q16\convert.exe';

$wgPdfProcessor = 'C:\wamp64\gs\gs9.23\bin\gswin64.exe';

$wgPdfPostProcessor = $wgImageMagickConvertCommand;

$wgPdfInfo = 'C:\wamp64\xpdf-tools-win-4.00\bin64\pdfinfo.exe';

$wgPdftoText = 'C:\wamp64\xpdf-tools-win-4.00\bin64\pdftotext.exe';

$wgPdfCreateThumbnailsInJobQueue = "false";

There are no error-logs generated and running these maintenance scripts also doesn't help:

php C:\wamp64\www\mediawiki\maintenance\refreshImageMetadata.php 
php C:\wamp64\www\mediawiki\maintenance\rebuildImages.php 
php C:\wamp64\www\mediawiki\maintenance\runjobs.php

Any idea what I can try or how I can test or debug the PDFHandler?

Huwmanbeing (talkcontribs)

I'm encountering the very same phenomenon and am not sure how to proceed. Did you ever figure out a solution? Huwmanbeing (talk) 11:42, 7 September 2019 (UTC)

Reply to "PDFHandler doesn't show images"

Cannot upload PDF files

Bttfvgo (talkcontribs)

I keep receiving the following error when trying to upload PDF files while running MW 1.33:

[74c2d3075bfe95ae81e19be4] /Special:Upload MWException from line 497 of /var/www/html/includes/filerepo/file/LocalFile.php: Could not find data for image 'Flynn_statement_of_offense.pdf'.

Backtrace:
#0 /var/www/html/includes/filerepo/file/LocalFile.php(654): LocalFile->loadExtraFromDB()
#1 /var/www/html/includes/filerepo/file/LocalFile.php(922): LocalFile->load(integer)
#2 /var/www/html/extensions/PdfHandler/includes/CreatePdfThumbnailsJob.php(112): LocalFile->getMetadata()
#3 /var/www/html/includes/Hooks.php(174): CreatePdfThumbnailsJob::insertJobs(UploadFromFile, string, boolean)
#4 /var/www/html/includes/Hooks.php(202): Hooks::callHook(string, array, array, NULL)
#5 /var/www/html/includes/upload/UploadBase.php(478): Hooks::run(string, array)
#6 /var/www/html/includes/upload/UploadBase.php(344): UploadBase->verifyFile()
#7 /var/www/html/includes/upload/UploadFromFile.php(95): UploadBase->verifyUpload()
#8 /var/www/html/includes/specials/SpecialUpload.php(506): UploadFromFile->verifyUpload()
#9 /var/www/html/includes/specials/SpecialUpload.php(204): SpecialUpload->processUpload()
#10 /var/www/html/includes/specialpage/SpecialPage.php(569): SpecialUpload->execute(NULL)
#11 /var/www/html/includes/specialpage/SpecialPageFactory.php(558): SpecialPage->run(NULL)
#12 /var/www/html/includes/MediaWiki.php(288): MediaWiki\Special\SpecialPageFactory->executePath(Title, RequestContext)
#13 /var/www/html/includes/MediaWiki.php(865): MediaWiki->performRequest()
#14 /var/www/html/includes/MediaWiki.php(515): MediaWiki->main()
#15 /var/www/html/index.php(42): MediaWiki->run()
#16 {main}

It cannot find the data during upload, but I can import the file via maintenance/importImages.php (and then just edit the description page) and it finds the data then; it only fails when uploading using Upload Wizard. Is there anything I might be doing that keeps this from working? It's worked for years, so this is frustrating! Thanks!

TieMichael (talkcontribs)

Same here!

Adding "$wgPdfCreateThumbnailsInJobQueue = false;" to LocalSettings.php seems to help.

Reply to "Cannot upload PDF files"

Generating 1:1 size images from File: page?

Scarred Sun (talkcontribs)

Hi there,

I am interested in getting native-size PDF->JPG images available for reference in the File: display of a PDF but seem to be unable to get the raw size of the PDF page extracted. For example, when looking at https://necretro.org/File:KeithCourageinAlphaZones_PCE_HuCard_JP_Manual.pdf, you'll notice that the PDF's pages themselves are 1,410 x 1,390 pts (1880px x 1853px), but the thumbnails generated under the preview are simply "Size of this JPG preview of this PDF file: 608 × 599 pixels. Other resolution: 243 × 240 pixels". How would I go about setting things up to get a full-size preview? I know it must be feasible because, by comparison, https://en.wikipedia.org/wiki/File:JUA0680291.pdf lists the full-size JPG preview as an option.

Dinoguy1000 (talkcontribs)

Have you looked at Extension:PdfHandler? One of its features is to provide thumbnail previews of a PDF file.

Scarred Sun (talkcontribs)

...this is the talk page for Extension:PdfHandler. I have it in use and yes, that would be the only way to provide the thumbnails. The issue is the specific size of thumbnail renders when in use.

Dinoguy1000 (talkcontribs)

Aah, sorry about that. I got here through notifications and lost track of what page this was on. =X

Reply to "Generating 1:1 size images from File: page?"

Searching content of PDF files

Tommyheyser (talkcontribs)

I'm sure this topic has come up many times before, but the answers I found through searching were usually along the lines of "just use PdfHandler", without much detail. I've gotten PdfHandler to work: it's showing the thumbnails on the File pages and is creating text files of the PDFs in the images folder. How does the MW built-in search engine, or another search engine (I have CirrusSearch/Elastica/ElasticSearch running), make use of these text files?

Is there a configuration setting I need to turn on for MW to recognise the generated text files when indexing contents?

I'm asking because I still don't see the content of the PDF in the search results, either using MW built-in search engine or CirrusSearch.

I hope it's alright that I'm posting this here. I've posted a similar question to this one in the Extension talk:CirrusSearch page as well.

Tommyheyser (talkcontribs)

Okay, not sure what happened, but since I'm running MW on Windows Server (sorry, forgot to mention this before), the standard PdfHandler extension with my "workaround" wasn't working 100%. Thumbnail creation was okay and I thought the pdftotext was working fine, but apparently not.

I tried using the SeongMoon version of PdfHandler, then ran maintenance/update.php, refreshImageMetadata.php, rebuildImages.php as well as extensions/CirrusSearch/maintenance/forceSearchIndex.php as per the https://phabricator.wikimedia.org/source/extension-cirrussearch/browse/master/README file, and now it seems to work: PDF contents are showing up in search results.
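The sequence described above could be scripted roughly as follows. The script paths are the ones named in the post. The reindex_pdfs function, its arguments, and the idea of passing echo as a dry-run runner are assumptions made for this sketch.

```shell
#!/bin/sh
# Hypothetical helper: run the maintenance scripts from the post in order,
# stopping at the first failure. $1 = wiki root, $2 = runner (default: php).
reindex_pdfs() {
    wiki_root=$1
    runner=${2:-php}
    for script in \
        maintenance/update.php \
        maintenance/refreshImageMetadata.php \
        maintenance/rebuildImages.php \
        extensions/CirrusSearch/maintenance/forceSearchIndex.php
    do
        "$runner" "$wiki_root/$script" || return 1
    done
}

# Dry run: 'echo' stands in for php, so the script paths are only printed.
reindex_pdfs /path/to/wiki echo
```

Dropping the second argument runs the scripts for real with php. Stopping at the first failure keeps a broken metadata refresh from being hidden by a later script that happens to succeed.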

Reply to "Searching content of PDF files"

No PDF/thumbnail, issue executing pdfinfo/pdftotext, Windows Server 2012 R2, IIS 8.5, MW 1.31

Tommyheyser (talkcontribs)

MW 1.31.1 running on Windows Server 2012 R2 IIS 8.5

I'm getting the following error (from $wgDebugLogFile output log file) for all execution of pdfinfo and pdftotext.

[exec] Error running "pdfinfo" "-enc" "UTF-8" "-meta" "C:/inetpub/wwwroot/w/images/f/f4/Phone_List.pdf": 'pdfinfo" "-enc" "UTF-8" "-meta" "C:' is not recognized as an internal or external command, operable program or batch file.

I'm not sure if this is the result of the new Shell framework introduced in 1.30, Manual:Shell framework, which replaces wfShellExec(). The debug log line before the error is:

[exec] MediaWiki\Shell\Command::execute: "pdfinfo" "-enc" "UTF-8" "-meta" "C:/inetpub/wwwroot/w/images/f/f4/Phone_List.pdf"

Tommyheyser (talkcontribs)

In case someone else is having this issue of PDFs not being displayed and is running MW 1.31 on Windows Server 2012 R2, here is what I did:

  1. I added the path to pdfinfo.exe and pdftotext.exe to the system PATH variable (mine was C:\Program Files\xpdf-tools-win-4.00\bin64).
  2. Then I edited the function retrieveMetaData in {mediawiki install path}/extensions/PdfHandler/includes/PdfImage.php:

a. Replacing:

$cmdMeta = [
$wgPdfInfo,
'-enc', 'UTF-8', # Report metadata as UTF-8 text...
'-meta',         # Report XMP metadata
$this->mFilename,
];

with

$cmdMeta = "pdfinfo.exe -enc UTF-8 -meta " . $this->mFilename;

b. Replacing

$cmdPages = [
$wgPdfInfo,
'-enc', 'UTF-8', # Report metadata as UTF-8 text...
'-l', '9999999', # Report page sizes for all pages
$this->mFilename,
];

with

$cmdPages = "pdfinfo.exe -enc UTF-8 -l 9999999 " . $this->mFilename;

c. Replacing

$cmd = [ $wgPdftoText,  $this->mFilename, '-' ];

with

$cmd = "pdftotext.exe " . $this->mFilename;


It's a bit of a hack, but it works. This should last until the issue is properly fixed.

173.77.3.157 (talkcontribs)
Reply to "No PDF/thumbnail, issue executing pdfinfo/pdftotext, Windows Server 2012 R2, IIS 8.5, MW 1.31"

Direct linking to PDF page, When clicking to direct media

Gmillerd (talkcontribs)

Does anyone have a modification of the extension so that clicking the PDF, when a page is specified, goes to that page? That is, turning

/mediawiki/index.php?title=File:Filename.pdf&page=25

into the following, so that the browser skips to the specified page?

/images/0/0b/Filename.pdf#page=25

I am able to do it in JavaScript, but the PHP evades me.

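var page = mw.util.getParamValue("page"); // from mediawiki.util; stands in for getUrlParameter, which the snippet never defines
$("#file.fullImageLink").find("a:first").each(function() {
    if (page) $(this).attr("href", $(this).attr("href") + "#page=" + encodeURIComponent(page));
});
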
212.59.13.226 (talkcontribs)

Use # instead of ...&page=25

Reply to "Direct linking to PDF page, When clicking to direct media"

Error creating thumbnail images

151.61.39.181 (talkcontribs)

Using MediaWiki 1.30 and the extension for this version (PdfHandler-REL1_30-53d9884.tar.gz), I could not get thumbnails generated on the image page. Instead of images I got a text error like:

Error creating thumbnail: convert: no decode delegate for this image format `' @ error/constitute.c/ReadImage/504. convert: no images defined `/var/www/fountainpen.it/mediawiki/images/tmp/transform_7d1af7cbffc4.jpg' @ error/convert.c/ConvertImageCommand/32

Looking at the debug log, I found the command used to create the thumbnail. It is built around line 188 of PdfHandler_body.php as a pipe between gs and convert. The problem is reported by convert, but it is caused by Ghostscript, which, for the PDF files I was using, wrote to standard output (although the -q option is present) some lines like:

  **** Warning: considering '0000000000 XXXXX n' as a free entry.
  **** Warning: considering '0000000000 XXXXX n' as a free entry.
  **** Warning: considering '0000000000 XXXXX n' as a free entry.
  **** Warning: considering '0000000000 XXXXX n' as a free entry.

Those lines ended up at the start of the JPEG image sent over the pipe to convert, which then failed the conversion. Saving the image and processing it manually gave no error. I could solve the issue by adding the line:

"-sstdout=/dev/null",

to the parameters passed to ghostscript inside PdfHandler_body.php, with a patch like this:

--- a/PdfHandler/PdfHandler_body.php    2018-04-30 23:14:14.000000000 +0200
+++ b/PdfHandler/PdfHandler_body.php    2018-11-01 13:02:12.744146598 +0100
@@ -195,6 +195,7 @@
            "-r{$wgPdfHandlerDpi}",
            "-dBATCH",
            "-dNOPAUSE",
+            "-sstdout=/dev/null",
            "-q",
            $srcPath
        );
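Why stray warning lines break the pipe can be shown without Ghostscript at all. In this sketch noisy_producer and quiet_producer are made-up stand-ins for gs, and consumer stands in for convert checking the first bytes of its input. None of them is real PdfHandler code, and the "JPEG" marker is just a placeholder for an image header.

```shell
#!/bin/sh
# Stand-in for gs: prints a warning on stdout, then the "image" payload.
noisy_producer() {
    echo "   **** Warning: considering '0000000000 XXXXX n' as a free entry."
    printf 'JPEGDATA'
}

# Same payload, but the warning goes to stderr and stays out of the pipe.
quiet_producer() {
    echo "   **** Warning: considering '0000000000 XXXXX n' as a free entry." >&2
    printf 'JPEGDATA'
}

# Stand-in for convert: accepts the stream only if it starts with the marker.
consumer() {
    if [ "$(head -c 4)" = "JPEG" ]; then
        echo "decoded"
    else
        echo "no decode delegate"
    fi
}

noisy_producer | consumer               # prints: no decode delegate
quiet_producer 2>/dev/null | consumer   # prints: decoded
```

The -sstdout=/dev/null option in the patch above plays the role of quiet_producer: it keeps Ghostscript's text output away from the stream that carries the image data.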

Reply to "Error creating thumbnail images"

Thumbnail creation exits with code '134'

Octfx (talkcontribs)

Trying to create thumbnails results in error code 134. Output from the Debug-Log:

PdfHandler::doTransform: called wfMkdirParents(/tmp)

MediaWikiShellCommand::execute: /bin/bash /var/www/<path>/includes/shell/limit.sh (/usr/bin/gs -sDEVICE=jpeg -sOutputFile=- -dFirstPage=1 -dLastPage=1 -dSAFER -r150 -dBATCH -dNOPAUSE -q <pathToPDF> | /usr/bin/convert -depth 8 -quality 95 -resize 120 - /tmp/transform_9f856aed71d9.jpg) MW_INCLUDE_STDERR=1;MW_CPU_LIMIT=0; MW_CGROUP=; MW_MEM_LIMIT=307200; MW_FILE_SIZE_LIMIT=102400; MW_WALL_CLOCK_LIMIT=180; MW_USE_LOG_PIPE=yes

[exec] Probably exited with signal 6: /bin/bash /var/www/<path>/includes/shell/limit.sh (/usr/bin/gs -sDEVICE=jpeg -sOutputFile=- -dFirstPage=1 -dLastPage=1 -dSAFER -r150 -dBATCH -dNOPAUSE -q <pathToPDF> | /usr/bin/convert -depth 8 -quality 95 -resize 120 - /tmp/transform_9f856aed71d9.jpg) MW_INCLUDE_STDERR=1;MW_CPU_LIMIT=0; MW_CGROUP=; MW_MEM_LIMIT=307200; MW_FILE_SIZE_LIMIT=102400; MW_WALL_CLOCK_LIMIT=180; MW_USE_LOG_PIPE=yes

RETURN CODE: 134

ERROR: /bin/bash: line 1: 27183 Done                    /usr/bin/gs -sDEVICE=jpeg -sOutputFile=- -dFirstPage=1 -dLastPage=1 -dSAFER -r150 -dBATCH -dNOPAUSE -q <pathToPDF>
27184 Aborted                 | /usr/bin/convert -depth 8 -quality 95 -resize 120 - /tmp/transform_9f856aed71d9.jpg

[thumbnail] Removing bad 0-byte thumbnail "/tmp/transform_9f856aed71d9.jpg". unlink() succeeded

The extension was set up following Extension:PdfHandler#Debian.

MW Version: 1.30

PHP: 7.1.2

Ghostscript / Poppler-Utils / Imagick are installed and functioning
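The return code itself carries a clue that matches the "Probably exited with signal 6" line in the log: by shell convention, a status above 128 means the process was terminated by signal (status minus 128). The small helper below, with the made-up name describe_exit, decodes such a status. It is a sketch of that convention only and says nothing about why convert aborted.

```shell
#!/bin/sh
# Hypothetical helper: explain a shell exit status. Values above 128
# conventionally mean "terminated by signal (status - 128)".
describe_exit() {
    status=$1
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        printf 'exit %s = 128 + %s (SIG%s)\n' "$status" "$sig" "$(kill -l "$sig")"
    else
        printf 'exit %s (ordinary exit status, no signal)\n' "$status"
    fi
}

describe_exit 134
```

On Linux, signal 6 is SIGABRT, which agrees with the "Aborted" entry that the log shows for the convert half of the pipe.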

151.61.39.181 (talkcontribs)

I had a similar problem with thumbnails not being created. The error message I got was different (I'm reporting it separately), but I solved it by adding "-sstdout=/dev/null", to the parameters used for the Ghostscript command invocation.

Reply to "Thumbnail creation exits with code '134'"

Wrong font

Brunodapei (talkcontribs)
Reply to "Wrong font"