To count single-page PDF files, one can use
select count(*) from image where img_minor_mime='pdf' and img_metadata like '%"Pages";s:1:"1";%'
I am looking for something similar for DjVu files. Are there better ways of counting pages in multipage files?
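For multipage PDFs, the same serialized "Pages" field that the LIKE pattern matches could be read out with a regular expression when scanning rows outside the database. This is only a sketch: the function name and the sample string are hypothetical, and it assumes the metadata is still the PHP-serialized text that the query above relies on.

```python
import re

# Matches the PHP-serialized pair "Pages";s:<length>:"<digits>"; that the
# LIKE pattern in the query looks for (there with length 1 and value 1).
PAGES_RE = re.compile(r'"Pages";s:(\d+):"(\d+)";')


def pdf_page_count(img_metadata):
    """Return the page count found in serialized PDF metadata, or None."""
    match = PAGES_RE.search(img_metadata)
    if match is None:
        return None
    declared_len = int(match.group(1))
    digits = match.group(2)
    # The declared string length must agree with the digits that follow.
    if declared_len != len(digits):
        return None
    return int(digits)


# Hypothetical sample value, not taken from a real image row.
sample = 'a:2:{s:5:"Pages";s:2:"12";s:5:"Title";s:4:"Test";}'
print(pdf_page_count(sample))
```

Rows where the function returns 1 correspond to what the LIKE query counts.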
Topic on Talk:Quarry
It seems that the DjVu handler retrieves the page count from img_metadata as well. It does so by parsing the XML, finding the 'metatree' element, and then counting the OBJECT elements contained within it.
$count = count( $metatree->xpath( '//OBJECT' ) );
https://github.com/wikimedia/mediawiki/blob/master/includes/media/DjVuHandler.php#L292
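The same counting idea can be sketched in Python for metadata taken from a dump. This is not the handler's code: the function name is hypothetical, and the sample XML is a simplified assumption about the layout, with one OBJECT element per page.

```python
import xml.etree.ElementTree as ET


def djvu_page_count(xml_text):
    """Count OBJECT elements anywhere in the XML, like xpath('//OBJECT')."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, including the root element itself.
    return sum(1 for _ in root.iter("OBJECT"))


# Simplified, hypothetical sample; real metadata carries more attributes.
sample_xml = """<mw-djvu><DjVuXML><BODY>
<OBJECT height="100" width="80"><PARAM name="DPI" value="300"/></OBJECT>
<OBJECT height="100" width="80"><PARAM name="DPI" value="300"/></OBJECT>
</BODY></DjVuXML></mw-djvu>"""

print(djvu_page_count(sample_xml))
```

A single-page file would be one where this count equals 1.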
A better way might be to use the API, which has defined accessors for this (the page count is in the dimensions properties of images/media).
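A rough sketch of such an API lookup, assuming the imageinfo module with iiprop=size (for which dimensions is an alias, as far as I know) and a pagecount field that is returned for multipage files. The wiki URL, the function names and the sample response are assumptions for illustration; the code only builds the request URL and reads an already decoded JSON response, it does not perform the HTTP request.

```python
from urllib.parse import urlencode

# Assumed endpoint; replace with the wiki you are querying.
API_URL = "https://commons.wikimedia.org/w/api.php"


def build_query_url(title):
    """Build an imageinfo query URL for one file title."""
    params = {
        "action": "query",
        "prop": "imageinfo",
        "iiprop": "size",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    return API_URL + "?" + urlencode(params)


def page_count_from_response(response):
    """Return the first pagecount found in a decoded response, or None."""
    for page in response.get("query", {}).get("pages", []):
        for info in page.get("imageinfo", []):
            if "pagecount" in info:
                return info["pagecount"]
    return None


# Hypothetical response shape for formatversion=2.
sample_response = {
    "query": {
        "pages": [
            {
                "title": "File:Example.djvu",
                "imageinfo": [{"width": 80, "height": 100, "pagecount": 5}],
            }
        ]
    }
}

print(build_query_url("File:Example.djvu"))
print(page_count_from_response(sample_response))
```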
I know I can access the number of pages from Lua, and I see I can also use the API, but I was looking for a way to build an SQL query to detect single-page DjVu files. I think I got it: quarry:query/32028 seems to work.
This does not seem to work for all DjVu files. Many seem to have only this as metadata:
{
    "blobs": {
        "data": "tt:609532023",
        "text": "tt:609532024"
    },
    "data": []
}
These are pointers into the text table, but is it really not possible to access the text table? At least not for me, as I am working from SQL dumps and it does not seem to be available there.
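While scanning a dump, rows like this can at least be separated from rows that still carry their metadata inline. A minimal sketch, assuming img_metadata is available as a string and that externalized rows are JSON with a non-empty "blobs" object as shown above; the function names are hypothetical. It only detects the pointers, it cannot resolve them.

```python
import json


def blob_pointers(img_metadata):
    """Return the 'blobs' mapping of JSON metadata, or an empty dict."""
    text = img_metadata.strip()
    if not text.startswith("{"):
        return {}  # not a JSON object, e.g. older serialized or XML metadata
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return {}
    blobs = parsed.get("blobs", {})
    return blobs if isinstance(blobs, dict) else {}


def is_externalized(img_metadata):
    """True when the row only points at metadata stored elsewhere."""
    return bool(blob_pointers(img_metadata))


# The metadata value quoted in this thread.
sample = '{"blobs": {"data": "tt:609532023", "text": "tt:609532024"}, "data": []}'

print(blob_pointers(sample))
print(is_externalized(sample))
```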
Note that there was recently a major change to how DjVu metadata is stored and parsed in MediaWiki. See also the recent changes in https://phabricator.wikimedia.org/T275268 and https://phabricator.wikimedia.org/T192866
Depending on when your SQL dump originates, this might affect what you see.
Yes, I have noticed this. As I said, the metadata has now been moved to blob storage. But there are no dumps of blob storage? So how does one access the metadata offline now?