Amsterdam Hackathon 2014/Topics/Artworks import from Wikimedia Commons

The goal is to import artworks into Wikidata from Wikimedia Commons files that use the Artwork template.

Wikimedia Commons extraction
A Wikimedian extracted the Wikimedia Commons files that use the Artwork template in September 2014.

The file: http://zone47.com/div/artworks0.zip

Result: 190,000 files with 105 different properties

Table of the 105 properties with occurrences: http://framacalc.org/1u7az8lted

Preparation for Wikidata
For the Wikidata import, a new version with 27 properties was produced:
 * remove useless properties (detail, review, information about the file...)
 * merge redundant properties (artist and creator...)
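
The two clean-up steps above could be sketched as follows. This is only an illustration, assuming each extracted file is available as a dictionary of property name to raw string value; the `DROP` and `MERGE` tables and the function `reduce_properties` are hypothetical names, and the real property lists are the ones in the linked tables.

```python
# Hypothetical tables; the real lists come from the property tables linked on this page.
DROP = {"detail", "review"}        # properties considered useless for Wikidata
MERGE = {"creator": "artist"}      # redundant alias -> canonical property name


def reduce_properties(record):
    """Return a copy of record without dropped or empty properties, with aliases merged."""
    reduced = {}
    for key, value in record.items():
        name = key.strip().lower()
        if name in DROP or not value.strip():
            continue
        name = MERGE.get(name, name)
        # When two aliases collide, keep the first non-empty value seen.
        reduced.setdefault(name, value.strip())
    return reduced
```

Keeping the first non-empty value on a collision is an arbitrary choice made for the sketch; conflicting values would probably need manual review.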

The file: http://zone47.com/div/artworks.zip

Table of the 27 properties: http://framacalc.org/8pewk3wre2

Issues

 * Field values are heterogeneous
 * Some artworks already exist in Wikidata, and others occur two or more times in the table.
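
As a rough illustration of the second issue, repeated artworks inside the table could be spotted by counting a normalized key. The key fields (`artist`, `title`, `institution`) and the function `duplicate_keys` are assumptions made for this sketch, not the actual column names of the extraction, and matching against items that already exist in Wikidata is not covered here.

```python
from collections import Counter


def duplicate_keys(records, fields=("artist", "title", "institution")):
    """Return the normalized keys that occur two or more times in records."""
    def key(record):
        # Missing fields count as empty strings so that rows stay comparable.
        return tuple(record.get(f, "").strip().lower() for f in fields)

    counts = Counter(key(r) for r in records)
    return {k for k, n in counts.items() if n >= 2}
```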

Options
At this point there are many options. Some proposals:
 * 1) Global processing of fields (for example, dates)
 * 2) Division into batches (by institution?)
 * 3) Option 1 first for some properties, then option 2.
 * 4) Extract files with well-formed metadata first
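
A minimal sketch of how options 1, 2 and 4 might combine: parse only the simplest date values, set aside everything else, and group the well-formed records by institution. The field names `date` and `institution`, the two accepted date patterns, and the functions `parse_date` and `split_by_institution` are all assumptions for illustration; real values are far more varied.

```python
import re
from collections import defaultdict

YEAR_ONLY = re.compile(r"^\s*(\d{4})\s*$")
ISO_DATE = re.compile(r"^\s*(\d{4})-(\d{2})-(\d{2})\s*$")


def parse_date(raw):
    """Return (year, precision) for a plain year or an ISO date, else None."""
    match = ISO_DATE.match(raw)
    if match:
        return int(match.group(1)), "day"
    match = YEAR_ONLY.match(raw)
    if match:
        return int(match.group(1)), "year"
    return None


def split_by_institution(records):
    """Group records with a parsable date by institution; return (batches, leftovers)."""
    batches = defaultdict(list)
    leftovers = []
    for record in records:
        parsed = parse_date(record.get("date", ""))
        if parsed is None:
            # Heterogeneous value: keep it for a later, more careful pass.
            leftovers.append(record)
        else:
            institution = record.get("institution", "unknown")
            batches[institution].append(
                {**record, "year": parsed[0], "date_precision": parsed[1]}
            )
    return dict(batches), leftovers
```

The leftovers list makes the size of the remaining heterogeneous data visible, which may help decide between global processing and per-institution batches.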

Interested

 * Shonagon (talk) 01:14, 13 November 2014 (UTC)