Extension:CommonsMetadata

MediaWiki extensions manual
CommonsMetadata

Release status: stable
Implementation: API
Description: Attempts at extracting metadata from Commons pages
Author(s): Brian Wolff (bawolff)
MediaWiki: 1.22+
PHP: 5.3+
Database changes: No
License: GPLv2
Hooks used: GetExtendedMetadata, ValidateExtendedMetadataCache, UnitTestsList


The CommonsMetadata extension is an attempt at extracting metadata from Wikimedia Commons pages. It adds some extra information to the imageinfo API, based on templates and categories in the image description.

The extension in its current form is intended as a temporary solution, to be replaced eventually by Wikidata on Commons.

Motivation & design choices

See http://lists.wikimedia.org/pipermail/wikitech-l/2013-August/071593.html

The extension is built on the following assumptions:

  • At some point in the future, Wikidata will take over handling metadata on Commons. To avoid disruptive changes that would soon have to be changed again, the extension should work with Commons metadata as it currently is (so no new parser functions are introduced). Hence screen scraping.
  • The content of many fields on a Commons description page includes rich formatting (in particular links, italics and bold; in some cases more complex things such as embedded images).
    • As a result, the extension outputs parsed HTML (wikitext is impractical for clients to process, and plain text doesn't capture the data).
    • Furthermore, the data tends to be formatted for human display rather than for machines (dates, for example, are not machine-formatted). When the date field says something like "circa 1600s", it's hard to convert that to a precise date (on the other hand, many dates can be converted).
    • To carry that forward, formatting is also applied to EXIF metadata, which is controlled on-wiki (for example, Commons links the camera name to a Wikipedia article).
  • If info can't be extracted from the description page, but the file has the author tagged in EXIF/XMP/IPTC metadata, that should be used as a fallback.
  • Ideally such a system would be as independent of Commons as possible, with the Commons-specific and generic parts separated.
  • Commons description pages have multilingual descriptions. Many users probably want just one language.
    • This implementation applies per-language conventions to dates and similar values. Additionally, for explicitly multilingual fields (such as the description), there is an option to return all languages or just a single one. Even in single-language mode, some things are still language-specific (like the thousands separator in numbers).
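The single-language versus all-languages choice can also be handled on the client side. The following is a minimal sketch, assuming that a multilingual field arrives as a mapping from language code to HTML (possibly with bookkeeping keys that start with an underscore) and that a single-language field arrives as a plain string. The function name pick_language and that response shape are assumptions made for illustration, not something this page specifies.

```python
def pick_language(value, preferred, fallback="en"):
    """Return one translation from a possibly multilingual field value.

    Assumption: a multilingual value is a dict such as
    {"en": "<p>...</p>", "de": "<p>...</p>", "_type": "lang"};
    anything that is not a dict is treated as already single-language.
    """
    if not isinstance(value, dict):
        return value

    # Ignore bookkeeping keys such as "_type" (assumed convention).
    translations = {k: v for k, v in value.items() if not k.startswith("_")}

    # Try the preferred language first, then the fallback language.
    for code in (preferred, fallback):
        if code in translations:
            return translations[code]

    # Otherwise return any available translation, or None if there is none.
    return next(iter(translations.values()), None)
```

For example, pick_language(description, "de") would return the German text when present and fall back to English otherwise.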

Installation

  • Download and extract the file(s) in a directory called CommonsMetadata in your extensions/ folder. If you're a developer and this extension is in a Git repository, then instead you should clone the repository using:
git clone https://gerrit.wikimedia.org/r/p/mediawiki/extensions/CommonsMetadata.git
  • Add the following code at the bottom of your LocalSettings.php:
require_once "$IP/extensions/CommonsMetadata/CommonsMetadata.php";
  • Done! Navigate to "Special:Version" on your wiki to verify that the extension is successfully installed.

In a setup where there is a local wiki and a remote image repository, for optimal results CommonsMetadata should be installed on the remote (or both). When installed on the local wiki only, it will still provide some extra information about remotely hosted images, but not as much as it would the other way around.

Warning: If you're developing or testing this extension, we do NOT recommend copying the Commons templates for image metadata, as they take extremely long to compile and have complicated dependencies such as Scribunto. Instead, get an expanded version that contains only wikitext/HTML and put in the various parameter references manually (or don't). You can find an example (to be used with Special:Import) here.

Usage

Use the imageinfo API, and include extmetadata as an image info property specified via iiprop.

Example usage:

https://commons.wikimedia.org/w/api.php?action=query&prop=imageinfo&format=xml&iiprop=extmetadata&iilimit=10&titles=File%3ACommon%20Kingfisher%20Alcedo%20atthis.jpg

View this example in the API sandbox:

https://www.mediawiki.org/wiki/Special:ApiSandbox#action=query&prop=imageinfo&format=xml&iiprop=extmetadata&iilimit=10&titles=File%3ACommon%20Kingfisher%20Alcedo%20atthis.jpg
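The same query can be built and read from a script. The sketch below uses only the Python standard library: it assembles the query URL with the parameters from the example above (switching to format=json for easier parsing) and pulls the extmetadata block out of an already decoded JSON response. The helper names build_extmetadata_url and extract_extmetadata are hypothetical, and the response layout (query, then pages keyed by page ID, then imageinfo) is assumed to be the default JSON output of the imageinfo API.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"


def build_extmetadata_url(title, endpoint=API_ENDPOINT):
    """Build an imageinfo query URL that requests the extmetadata property."""
    params = {
        "action": "query",
        "prop": "imageinfo",
        "format": "json",          # the example URL above uses xml
        "iiprop": "extmetadata",
        "titles": title,
    }
    return endpoint + "?" + urlencode(params)


def extract_extmetadata(response):
    """Map each page title to its extmetadata dict.

    `response` is the decoded JSON body (a dict). Pages without
    imageinfo, or without extmetadata, are skipped or mapped to {}.
    """
    result = {}
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        for info in page.get("imageinfo", []):
            result[page.get("title")] = info.get("extmetadata", {})
    return result
```

To fetch live data, the URL can be passed to urllib.request.urlopen and the body decoded with json.load before calling extract_extmetadata.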

Returned data

The extension currently provides the following items in the extmetadata field of the response (the field names were chosen, where possible, to follow the IPTC-IIM format used in EXIF headers):

Data based on machine-readable data in the Information template:

  • ImageDescription - image description
  • Artist/Credit - authorship information
  • DateTimeOriginal - time of creation
  • ObjectName - title (for a book/painting)

Data based on machine-readable data in the Location template:

  • GPSLatitude - latitude
  • GPSLongitude - longitude

Data based on machine-readable data in the license template:

  • LicenseShortName - short human-readable license name
  • LicenseUrl
  • UsageTerms
  • Copyrighted - True or False (False for public domain images)

For multi-licensed images these values are currently unreliable.

Other data:

  • CommonsMetadataExtension - this is just a convenient way of testing that the extension is installed
  • License - a best guess at the license of the image (mostly for internal use by MediaViewer, might change; LicenseShortName is probably more reliable)
  • Categories - a |-separated list of the categories of the image. (this is mostly broken at the moment)
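A client typically reduces these fields to a few plain values. The following is a minimal sketch of that step, assuming each extmetadata entry is a dict with a "value" key; that shape and the helper names field_value and summarize are assumptions for illustration. It prefers LicenseShortName over License, as suggested above, and splits Categories on the | separator.

```python
def field_value(extmetadata, name, default=None):
    """Return the "value" of one extmetadata entry, or `default` if absent."""
    entry = extmetadata.get(name)
    if isinstance(entry, dict) and "value" in entry:
        return entry["value"]
    return default


def summarize(extmetadata):
    """Reduce an extmetadata dict to license, categories and coordinates."""
    categories = field_value(extmetadata, "Categories", "")
    lat = field_value(extmetadata, "GPSLatitude")
    lon = field_value(extmetadata, "GPSLongitude")
    copyrighted = field_value(extmetadata, "Copyrighted")

    return {
        # LicenseShortName is described as the more reliable field.
        "license": field_value(extmetadata, "LicenseShortName")
        or field_value(extmetadata, "License"),
        # Categories is a |-separated list; drop empty pieces.
        "categories": [c for c in categories.split("|") if c],
        # Coordinates are only reported when both parts are present.
        "coordinates": (float(lat), float(lon))
        if lat is not None and lon is not None
        else None,
        # None means the field was missing, not that the file is free.
        "copyrighted": None
        if copyrighted is None
        else str(copyrighted).lower() == "true",
    }
```

Because the values are unreliable for multi-licensed images, the "license" entry here should be treated as a hint rather than a definitive answer.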

See also