Extension:CommonsMetadata/ja

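The CommonsMetadata extension is an attempt to extract metadata from Wikimedia Commons pages so that the same data is available to any other Wikimedia project. It adds information to the imageinfo API based on the templates and categories in the image description. Various extensions and tools use this metadata to improve lightboxes and image selection dialogs (e.g. MultimediaViewer, VisualEditor, MobileFrontend, Mobile-Content-Service (MCS)).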

The extension in its current form was intended as a temporary solution, to be replaced later by media information in Wikidata (Wikidata on Commons).

Motivations and design choices
See also the archives of the Wikitech mailing list

The extension has the following goals.


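 * In the future, responsibility for handling Commons metadata will move to Wikidata. To avoid a breaking change that would soon have to be reworked, the extension needs to handle Commons metadata as it exists today (without introducing new parser functions). It therefore relies on screen scraping.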


 * The content of many fields on a Commons description page includes rich formatting (in particular links, italics and bold; in some cases more complex things such as embedded images).
 * As a result, the extension outputs parsed HTML (wikitext is awkward for clients to consume, and plain text does not capture the data).
 * Furthermore, the data tends to be formatted for human display rather than as, for example, machine-readable dates. When the date field says something like "circa 1600s", it is hard to convert that to a precise date (although many other values can be converted).
 * To carry that forward, the same formatting is applied to Exif metadata, which is controlled on-wiki (for example, Commons links the camera name to a Wikipedia article).


 * If we can't extract info from the description page but the file has the author tagged in Exif/XMP/IPTC metadata, that should be used as a fallback.
 * Ideally such a system would be as independent of Commons as possible, with the Commons-specific and generic parts separated.


 * Commons description pages have multilingual descriptions. Many users probably want just one language.
 * This implementation applies per-language conventions to dates and similar values. Additionally, for explicitly multilingual fields (such as the description), there is an option to return all languages or just a single one. Even in single-language mode, some things are still language-specific (like the thousands separator in numbers).
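
The single-language versus all-languages choice described above can also be approximated on the client side. The following is a minimal sketch, assuming a multilingual field arrives as a mapping from language code to text; the helper name `pick_language`, the treatment of keys starting with `_` as bookkeeping, and the sample data are all hypothetical and not taken from the extension.

```python
def pick_language(values, preferred, fallback="en"):
    """Return the text for `preferred`, else `fallback`, else any available language."""
    # Assumption: keys starting with "_" are bookkeeping entries, not language codes.
    texts = {code: text for code, text in values.items() if not code.startswith("_")}
    if not texts:
        return None
    for code in (preferred, fallback):
        if code in texts:
            return texts[code]
    # No preferred or fallback language: pick deterministically by sorted language code.
    return texts[sorted(texts)[0]]


# Made-up sample of a multilingual description value.
sample = {"en": "Common kingfisher", "de": "Eisvogel", "_type": "lang"}
print(pick_language(sample, "de"))
print(pick_language(sample, "ja"))
```

Falling back to a fixed language and then to any available language keeps the caller from having to handle a missing translation separately.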

Testing
When testing with remote images (e.g. Commons images if you have enabled ), you can set   to force CommonsMetadata to parse the description page of the image and extract the metadata (normally, if the remote repository had CommonsMetadata installed as well, it would just copy the API output from there).

Usage
Use the imageinfo API, and include extmetadata as an image info property specified via iiprop.

Example:


 * https://commons.wikimedia.org/w/api.php?action=query&prop=imageinfo&format=json&iiprop=extmetadata&iilimit=10&titles=File%3ACommon%20Kingfisher%20Alcedo%20atthis.jpg

View this example in the API sandbox:


 * https://www.mediawiki.org/wiki/Special:ApiSandbox#action=query&prop=imageinfo&format=json&iiprop=extmetadata&iilimit=10&titles=File%3ACommon%20Kingfisher%20Alcedo%20atthis.jpg
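
The same request can be built from a script. This is a minimal sketch rather than an official client: it only constructs the query URL shown in the example and reads the metadata out of an already decoded JSON response. The function names are hypothetical, the sample response is made up for illustration, and the response layout (`query` → `pages` → `imageinfo` → `extmetadata`) is assumed to be the default `format=json` shape.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"


def build_extmetadata_url(title, endpoint=API_ENDPOINT):
    """Build an imageinfo query URL that asks for the extmetadata property."""
    params = {
        "action": "query",
        "prop": "imageinfo",
        "format": "json",
        "iiprop": "extmetadata",
        "titles": title,
    }
    return endpoint + "?" + urlencode(params)


def extract_extmetadata(response):
    """Map each page title in a decoded response to its extmetadata dict."""
    result = {}
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        # Pages without image info (e.g. missing files) yield an empty dict.
        infos = page.get("imageinfo") or [{}]
        result[page.get("title")] = infos[0].get("extmetadata", {})
    return result


url = build_extmetadata_url("File:Common Kingfisher Alcedo atthis.jpg")

# Made-up response fragment, used only to show the assumed nesting.
sample_response = {
    "query": {
        "pages": {
            "1": {
                "title": "File:Example.jpg",
                "imageinfo": [
                    {"extmetadata": {"LicenseShortName": {"value": "CC BY-SA 4.0"}}}
                ],
            }
        }
    }
}
metadata = extract_extmetadata(sample_response)
```

Fetching the URL (for instance with `urllib.request.urlopen`) and decoding the body with `json.loads` would produce the dictionary that `extract_extmetadata` expects.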

Return values
The extension currently provides the following items in the extmetadata field of the response (the field names were chosen, where possible, to follow the IPTC-IIM format used in EXIF headers):

Data based on machine-readable data in the Information template:
 * ImageDescription - image description
 * Artist - author name (might contain complex HTML, multiple authors etc)
 * Credit - source
 * DateTimeOriginal - time of creation (space-separated ISO 8601 timestamp whenever possible, but can be any other textual description of a date, possibly with HTML mixed in)
 * ObjectName - title (for a book/painting; otherwise just the file name)
 * Permission - contents of the Permission field of the template. Can be a lot of things (license template, OTRS id, details on how to attribute...)
 * AuthorCount - the number of templates with authors (e.g. Book, Photograph...). The number of actual authors might be higher if a template describes multiple authors in a single string.
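
Because DateTimeOriginal may hold either a timestamp or free text, a consumer has to try parsing and be prepared to fall back. The sketch below is one hedged way to do that in Python; the function name is hypothetical, and it treats anything that the standard library cannot parse as an ISO 8601 style timestamp (including HTML-wrapped dates) as unparseable rather than trying to interpret it.

```python
from datetime import datetime


def parse_date_time_original(value):
    """Return a datetime for timestamp-like values, or None for free-form text."""
    if not isinstance(value, str):
        return None
    try:
        # fromisoformat accepts a space between the date and time parts.
        return datetime.fromisoformat(value.strip())
    except ValueError:
        # Free text such as "circa 1600s" or HTML markup ends up here.
        return None


parsed = parse_date_time_original("2010-05-09 14:32:11")
unparsed = parse_date_time_original("circa 1600s")
```

When the result is `None`, the original string is still worth keeping for display, since it was written for human readers.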

Data based on machine-readable data in the Location template:
 * GPSLatitude - latitude
 * GPSLongitude - longitude
 * GPSMapDatum - coordinate type (only  supported for now)

Data based on machine-readable data in the license template: For multi-licensed images these values are currently unreliable.
 * LicenseShortName - short human-readable license name
 * LicenseUrl
 * UsageTerms
 * Copyrighted -  or   (for public domain images)
 * Attribution - custom attribution that should replace Artist + Credit (can also originate from the Information template)
 * AttributionRequired - booleanish (T86726), tells whether there is a legal requirement to attribute
 * NonFree - booleanish, true means the image is not under a free license. (Used for non-Commons images only.)
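
The rule that Attribution replaces Artist + Credit can be sketched as follows. This is an illustrative helper, not the logic used by any existing tool: the function names are hypothetical, each field is assumed to be a dict with a `value` key, the set of strings treated as true for booleanish fields is a guess, and the sample data is made up. HTML is reduced to plain text because Artist may contain markup.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(fragment):
    parser = _TextOnly()
    parser.feed(fragment)
    parser.close()
    # Collapse runs of whitespace left behind by removed tags.
    return " ".join("".join(parser.parts).split())


def field(extmetadata, name):
    """Return the 'value' of a field, or None if the field is absent."""
    item = extmetadata.get(name)
    return item.get("value") if isinstance(item, dict) else None


def is_truthy(value):
    # Assumption: booleanish fields use strings such as "true" / "false".
    return str(value).strip().lower() in {"true", "1", "yes"}


def credit_line(extmetadata):
    """Prefer Attribution; otherwise combine Artist and Credit."""
    attribution = field(extmetadata, "Attribution")
    if attribution:
        who = strip_html(str(attribution))
    else:
        parts = [field(extmetadata, "Artist"), field(extmetadata, "Credit")]
        who = ", ".join(strip_html(str(p)) for p in parts if p)
    license_name = field(extmetadata, "LicenseShortName")
    return " / ".join(x for x in (who, license_name) if x)


# Made-up sample data for illustration only.
sample = {
    "Artist": {"value": '<a href="//example.org/User:Example">Example</a>'},
    "Credit": {"value": "Own work"},
    "LicenseShortName": {"value": "CC BY-SA 4.0"},
    "AttributionRequired": {"value": "true"},
}
line = credit_line(sample)
required = is_truthy(field(sample, "AttributionRequired"))
```

As noted above, the license-derived values are unreliable for multi-licensed images, so a credit line built this way should be treated as a best effort.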

Other data:
 * CommonsMetadataExtension - contains the metadata parser version number; mostly for internal use
 * License - a best guess at the license of the image (mostly for internal use by MediaViewer, might change; LicenseShortName is probably more reliable)
 * Categories - a -separated list of the categories of the image.
 * Assessments - a -separated list of the assessments of the image (currently five values are supported: poty, potd, featured, quality, valued). Based on parsing category names, probably won't work for images not hosted on Commons.
 * Restrictions - reuse restrictions such as trademarks or personality rights; an array of keywords (the class names from this table, without the  prefix). See also the restrict-* icons in MediaViewer.
 * DeletionReason - if set, the image is being considered for deletion. (Based on the nuke template, probably not reliable outside Commons.) It contains a deletion reason, but it is phrased to suit a log entry, so it might be misleading (e.g. past tense when it has not yet been decided whether the image will be deleted).

See also

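 * Fixing metadata on file pages/How to fix metadata
 * Manual:File metadata handling – how file metadata is handled
 * MultimediaViewer is the main consumer of the information obtained from CommonsMetadata.
 * Request for comment on handling image information – open request for comments on image information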
 * Template detection on local wikis with locally uploaded files – Describes how to prepare the templates for fetching metadata and thus displaying them when using the MultimediaViewer extension.