Amsterdam Hackathon 2014/Topics/Artworks import from Wikimedia Commons

From mediawiki.org

The goal is to import artworks into Wikidata from Wikimedia Commons files that use the Artwork template.

Wikimedia Commons extraction

In September 2014, a Wikimedian extracted the Wikimedia Commons files that use the Artwork template.

The file: http://zone47.com/div/artworks0.zip

Result: 190,000 files with 105 different properties.
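The page does not say how the extraction was done. The following is a minimal sketch of one possible approach. It assumes the raw wikitext of each file description page has already been downloaded. It pulls the named parameters out of the first Artwork template with a small bracket-aware scanner. The function name `parse_artwork_params` is hypothetical. A dedicated wikitext parser such as mwparserfromhell would be more robust than this hand-written scanner.

```python
import re

# Matches the opening of an {{Artwork}} call; MediaWiki ignores case on the first letter.
ARTWORK_START = re.compile(r"\{\{\s*[Aa]rtwork\s*(?=[|}])")

def parse_artwork_params(wikitext):
    """Return the named parameters of the first {{Artwork}} template as a dict.

    Nested templates ({{...}}) and links ([[...]]) are tracked with a depth
    counter so that '|' characters inside them do not split a parameter.
    """
    match = ARTWORK_START.search(wikitext)
    if match is None:
        return {}
    i = match.start() + 2  # skip the opening "{{"
    depth = 1              # we are now inside the Artwork template
    parts, current = [], []
    while i < len(wikitext) and depth > 0:
        pair = wikitext[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            if depth > 0:  # keep the closing brackets of nested constructs
                current.append(pair)
            i += 2
        elif wikitext[i] == "|" and depth == 1:
            parts.append("".join(current))  # end of a top-level parameter
            current = []
            i += 1
        else:
            current.append(wikitext[i])
            i += 1
    parts.append("".join(current))
    params = {}
    for part in parts[1:]:  # parts[0] is the template name itself
        name, sep, value = part.partition("=")
        # Only accept "name = value" pairs whose name is plain text.
        if sep and "{{" not in name and "[[" not in name:
            params[name.strip().lower()] = value.strip()
    return params
```

Values such as `{{Creator:...}}` stay intact as raw wikitext. Resolving them to actual artists or institutions is a separate step.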

Table of the 105 properties with their number of occurrences: http://framacalc.org/1u7az8lted
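An occurrence table like the one above can be produced with a simple count. The following sketch assumes each file has already been reduced to a dictionary that maps template parameter names to their raw string values. It counts in how many files each property is present and non-empty. The name `count_properties` is hypothetical.

```python
from collections import Counter

def count_properties(records):
    """Count, for each property name, how many records have a non-empty value."""
    counts = Counter()
    for record in records:
        counts.update(name for name, value in record.items() if value.strip())
    return counts
```

`count_properties(records).most_common()` returns the properties sorted from most to least frequent, which is the order of a typical occurrence table.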

Preparation for Wikidata

To prepare the data for a Wikidata import:

  • remove useless properties (detail, review, information about the file...)
  • merge redundant properties (artist and creator...)
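The two preparation steps can be sketched as a per-record cleanup. Only "detail", "review" and the artist/creator pair come from this page. The other names in `DROP`, the choice to fold "creator" into "artist" (rather than the reverse), and the function name `clean_record` are illustrative assumptions. The real lists were derived from the 105-property table.

```python
# Hypothetical lists: the real ones come from the 105-property table.
DROP = {"detail", "review", "source", "permission", "other_versions"}
MERGE = {"creator": "artist"}  # alias -> canonical property name (assumed direction)

def clean_record(record):
    """Drop unwanted or empty properties and fold aliases into one canonical name."""
    cleaned = {}
    for name, value in record.items():
        if name in DROP or not value.strip():
            continue
        target = MERGE.get(name, name)
        # If both an alias and its canonical name are filled, keep whichever
        # non-empty value is seen first in the record.
        cleaned.setdefault(target, value.strip())
    return cleaned
```

Keeping the first value is a simplification. Conflicting artist and creator values would probably deserve manual review rather than silent selection.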

Result: a new version with 27 properties.

The file: http://zone47.com/div/artworks.zip

Table of the 27 properties: http://framacalc.org/8pewk3wre2

Issues

  • Field values are heterogeneous.
  • Some artworks are already in Wikidata, and some artworks appear two or more times in the table.
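Both issues in the second item come down to matching records on a stable key. The following sketch assumes that the pair (institution, accession number) is a good enough key. The parameter name "accession number" follows the Artwork template. The argument `existing_keys` is a hypothetical set of keys for artworks already in Wikidata. How that set is built (for example, from a query run beforehand) is left open, and so is the function name `find_duplicates_and_existing`. Records that lack either field are skipped.

```python
from collections import defaultdict

def normalize(text):
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())

def find_duplicates_and_existing(records, existing_keys):
    """Group record indices by (institution, accession number).

    Returns two dicts mapping key -> list of record indices: keys that occur
    more than once, and keys already present in existing_keys.
    """
    groups = defaultdict(list)
    for index, record in enumerate(records):
        institution = normalize(record.get("institution", ""))
        accession = normalize(record.get("accession number", ""))
        if institution and accession:  # records without a full key are skipped
            groups[(institution, accession)].append(index)
    duplicates = {key: idx for key, idx in groups.items() if len(idx) > 1}
    existing = {key: idx for key, idx in groups.items() if key in existing_keys}
    return duplicates, existing
```

Institution values on Commons are often wrapped in templates, so in practice they would need resolving before this kind of comparison is reliable.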

Options

At this point, there are many options. Some proposals:

  1. Global processing of fields (for example, dates)
  2. Division into batches (by institution?)
  3. Option 1 for some properties first, then option 2
  4. Extraction of files with well-formed metadata first
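Options 1, 2 and 4 can be sketched together. `normalize_date` handles only a few common shapes of the date field: a plain four-digit year, an ISO date, and "c.", "ca.", "circa" or `{{circa|...}}` prefixes. Anything else returns None. Its output loosely follows the shape of Wikidata time values: a signed timestamp plus a precision code, 9 for year and 11 for day. The definition of "well-formed" in `REQUIRED`/`is_well_formed`, and all function names, are hypothetical choices for illustration.

```python
import re
from collections import defaultdict
from datetime import date

CIRCA = re.compile(r"^(?:\{\{\s*circa\s*\|\s*(.+?)\s*\}\}|(?:c\.|ca\.|circa)\s*(.+))$",
                   re.IGNORECASE)
ISO_DATE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})$")
YEAR = re.compile(r"^(\d{4})$")

def normalize_date(raw):
    """Option 1: return (time, precision, circa) or None if the value is not understood."""
    value = raw.strip()
    circa = False
    m = CIRCA.match(value)
    if m:
        circa = True
        value = (m.group(1) or m.group(2)).strip()
    m = ISO_DATE.match(value)
    if m:
        y, mo, d = m.groups()
        try:
            date(int(y), int(mo), int(d))  # reject impossible dates such as 1642-13-05
        except ValueError:
            return None
        return f"+{y}-{mo}-{d}T00:00:00Z", 11, circa
    m = YEAR.match(value)
    if m:
        return f"+{m.group(1)}-00-00T00:00:00Z", 9, circa
    return None

def batch_by_institution(records):
    """Option 2: split records into batches keyed by a normalized institution name."""
    batches = defaultdict(list)
    for record in records:
        key = " ".join(record.get("institution", "").lower().split()) or "(no institution)"
        batches[key].append(record)
    return dict(batches)

# Hypothetical definition of "well-formed" for option 4.
REQUIRED = ("artist", "title", "institution")

def is_well_formed(record):
    """Option 4: required fields are filled and the date is understood."""
    has_required = all(record.get(name, "").strip() for name in REQUIRED)
    return has_required and normalize_date(record.get("date", "")) is not None
```

Under option 3, `normalize_date` (and similar functions for other properties) would run over the whole table before `batch_by_institution` splits it. Values that come back as None mark the records that still need manual work.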

Interested