Topic on Talk:Quarry

Number of pages in DjVu files

Jarekt (talkcontribs)

To count one-page PDF files, one can use select count(*) from image where img_minor_mime='pdf' and img_metadata like '%\"Pages\";s:1:\"1";%'. I am looking for something similar for DjVu files. Are there better ways of counting pages in multipage files?
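For multipage files, the same serialized field could be read instead of being matched against a fixed value. Here is a minimal Python sketch, assuming the metadata is a PHP-serialized string that contains an entry of the form "Pages";s:<length>:"<count>"; as in the query above. The helper name pdf_page_count is hypothetical.

```python
import re

# Matches the PHP-serialized entry  "Pages";s:<len>:"<digits>";
# This layout is an assumption taken from the LIKE pattern in the query above.
PAGES_RE = re.compile(r'"Pages";s:(\d+):"(\d+)";')


def pdf_page_count(metadata):
    """Return the page count found in a serialized metadata string, or None."""
    match = PAGES_RE.search(metadata)
    if match is None:
        return None
    length, digits = int(match.group(1)), match.group(2)
    # The declared string length must agree with the digits that follow it.
    if length != len(digits):
        return None
    return int(digits)
```

This only helps when the serialized string is actually at hand, for example when processing rows from a dump; it does not replace the SQL filter.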

TheDJ (talkcontribs)

It seems that the DjVu handler retrieves the page count from img_metadata as well. It does so by parsing the XML, finding the 'metatree' element, and then counting the number of objects contained within it.

$count = count( $metatree->xpath( '//OBJECT' ) );

https://github.com/wikimedia/mediawiki/blob/master/includes/media/DjVuHandler.php#L292
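The same counting idea can be sketched outside MediaWiki. This is a rough Python equivalent of the XPath count above, not the handler's actual code; it assumes the DjVu metadata is available as a well-formed XML string, and djvu_page_count is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def djvu_page_count(xml_text):
    """Count <OBJECT> elements anywhere in a DjVu metadata XML string."""
    root = ET.fromstring(xml_text)
    # iter() visits the root and every descendant, similar to XPath '//OBJECT'.
    return sum(1 for _ in root.iter("OBJECT"))
```

A single-page file would then be one for which this function returns 1.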

A better way might be to use the API here, which has defined accessors for this (it is in the dimensions properties of images/media):

https://commons.wikimedia.org/w/api.php?action=query&titles=File:H.M.S.%20Pinafore.djvu&prop=imageinfo&iiprop=timestamp%7Cuser%7Curl%7Cdimensions
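A sketch of how such a request could be built and read in Python follows. It assumes the default JSON layout, with pages keyed by page ID, and that multipage files carry a pagecount field in their imageinfo entry; both should be checked against a real response. The function names are hypothetical.

```python
from urllib.parse import urlencode

API_URL = "https://commons.wikimedia.org/w/api.php"


def build_query_url(title):
    """Build an imageinfo query URL asking for the dimensions of one file."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "imageinfo",
        "iiprop": "dimensions",
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def page_count_from_response(response):
    """Return the first 'pagecount' found in a decoded API response, or None."""
    # Assumption: pages is a dict keyed by page ID (default JSON format).
    for page in response.get("query", {}).get("pages", {}).values():
        for info in page.get("imageinfo", []):
            if "pagecount" in info:
                return info["pagecount"]
    return None
```

Fetching the URL and decoding the JSON body is left out here so the sketch stays independent of any particular HTTP client.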


Jarekt (talkcontribs)

I know I can access the number of pages from Lua, and I see I can also use the API, but I was looking for a way to build an SQL query to detect single-page DjVu files. I think I got it: quarry:query/32028 seems to work.

Mitar (talkcontribs)

This does not seem to work for all DjVu files. Many seem to have only the following as metadata:


{
  "blobs": {
    "data": "tt:609532023",
    "text": "tt:609532024"
  },
  "data": []
}


These are pointers into the text table, but it does not seem possible to access the text table. At least not for me: I am working from the SQL dumps, and the table does not appear to be included there.
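For anyone processing the dumps, rows of this kind can at least be detected and their pointers extracted. Here is a minimal Python sketch, assuming the metadata column holds JSON shaped like the example above and that each pointer has the form tt:<numeric id>; text_table_ids is a hypothetical helper name.

```python
import json


def text_table_ids(metadata_json):
    """Return {blob name: text-table id} for 'tt:<id>' pointers in metadata."""
    blobs = json.loads(metadata_json).get("blobs", {})
    if not isinstance(blobs, dict):
        # No usable blob map, e.g. an empty list.
        return {}
    ids = {}
    for name, address in blobs.items():
        scheme, _, ident = str(address).partition(":")
        # Assumption: 'tt' marks a text-table pointer followed by a numeric id.
        if scheme == "tt" and ident.isdigit():
            ids[name] = int(ident)
    return ids
```

This only identifies which rows point into blob storage; it does not recover the page count, which still needs the referenced blob.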

TheDJ (talkcontribs)
Mitar (talkcontribs)

Yes, I have noticed this. As I said, the metadata has now been moved to blob storage. But there are no dumps of blob storage? So how does one access the metadata offline now?
