EMWCon Spring 2016/Provenance working group

From mediawiki.org

The Provenance working group (Yingjie, BlaueBlüte) at EMWCon Spring 2016 developed some ideas on tracking the provenance of data stored in Semantic MediaWiki (SMW) installations and on how to make use of such metadata.

Goals

  • Enable users to judge the trustworthiness of data in SMW.
  • Capture the Where–When–Who of data to facilitate (content) management.
  • Enable analysis of the history of semantic data (lineage).
  • Track the connection between “classes” and “instances” over time: has the definition of a “class” changed after it was instantiated?
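The last goal amounts to a timestamp comparison. A minimal Python sketch, assuming a “class” is represented by a definition page (for example a category or template) whose revision timestamps are available, and that each instance page's creation time is known; the function name `stale_instances` and the sample data are hypothetical:

```python
from datetime import datetime


def stale_instances(class_revisions, instance_created):
    """Return instances created before the latest change to their class definition.

    class_revisions: class name -> list of revision timestamps of its definition page
    instance_created: instance page -> (class name, creation timestamp)
    """
    # Most recent revision of each class definition (skip classes with no history).
    latest = {cls: max(revs) for cls, revs in class_revisions.items() if revs}
    stale = []
    for page, (cls, created) in instance_created.items():
        changed = latest.get(cls)
        # The class was edited after this instance existed: flag it for review.
        if changed is not None and changed > created:
            stale.append(page)
    return sorted(stale)


# Hypothetical sample data.
revisions = {"Person": [datetime(2015, 1, 1), datetime(2016, 3, 1)]}
instances = {
    "John Doe": ("Person", datetime(2016, 5, 1)),
    "Jane Doe": ("Person", datetime(2014, 6, 1)),
}
print(stale_instances(revisions, instances))  # ['Jane Doe']
```

A flagged instance is not necessarily wrong; it only means its data predates the current class definition and may deserve a second look.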

Use Facets

  • Define provenance metadata along with, e.g., property values.
  • View provenance data alongside page display, query results, etc.
  • Use provenance data in queries, e.g. to restrict queries based on trustworthiness.
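A minimal Python sketch of how a property value might carry where/when/who provenance and be filtered by trustworthiness. The class names, the numeric `trust` score, and the rule that a value is as trustworthy as its best-rated source are all assumptions for illustration, not part of SMW:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Provenance:
    who: Optional[str] = None     # contributor or cited source
    when: Optional[date] = None   # edit date or reference date
    where: Optional[str] = None   # page title or reference URL
    trust: float = 0.0            # hypothetical trust score in [0, 1]


@dataclass
class PropertyValue:
    prop: str
    value: str
    provenance: List[Provenance] = field(default_factory=list)

    def trust(self) -> float:
        """Trust of the best-rated source; 0.0 when no provenance is attached."""
        return max((p.trust for p in self.provenance), default=0.0)


def trusted(values: List[PropertyValue], threshold: float) -> List[PropertyValue]:
    """Restrict results to values whose provenance meets the threshold."""
    return [v for v in values if v.trust() >= threshold]


# Example values; the trust scores are invented for illustration.
birthdate = PropertyValue("Has birthdate", "1978-03-12",
                          [Provenance(who="personal website",
                                      where="http://www.johndoe.me/", trust=0.4)])
employer = PropertyValue("Has employer", "NASA",
                         [Provenance(who="New York Times",
                                     when=date(2016, 5, 27), trust=0.8)])
print([v.prop for v in trusted([birthdate, employer], 0.5)])  # ['Has employer']
```

Taking the maximum is only one possible aggregation; a stricter wiki might require every attached source to pass the threshold.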

Sources of provenance data

  • wiki-internal
    • contributors
    • edit timestamps
  • external
    • editor-provided, like references
  • hybrid (?)
    • external ratings of contributors
    • external ratings of individual pages (e.g., PageRank?)

Implementation Ideas

Strategies

  • Amending the SMW annotation syntax
  • Storing provenance data in subobjects

Example

This shows how external provenance data could be defined:

'''[[Name::John Doe]]''' was born on [[Has birthdate::1978-03-12|ref=personal website|refurl=http://www.johndoe.me/]] and currently works for [[Has employer::NASA|ref=Humanity’s Quest to Reach Mars, The New York Times, May 27, 2016|refbrief=New York Times|refdate=2016-05-27]].
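The amended syntax above could be read with a small parser. A sketch in Python, assuming provenance is written as `key=value` pairs after the value and that pipe segments without `=` (SMW's ordinary display text) carry no provenance; `parse_annotations` is a hypothetical helper, not an SMW function, and the regular expression does not handle nested links or templates:

```python
import re

# [[Property::value|key=val|key=val]]; the property name may not contain ':', '|' or brackets.
ANNOTATION = re.compile(r"\[\[([^:\[\]|]+)::([^\[\]|]*)((?:\|[^\[\]|]*)*)\]\]")


def parse_annotations(wikitext):
    """Return (property, value, provenance dict) for each annotation in the text."""
    results = []
    for m in ANNOTATION.finditer(wikitext):
        prop, value, rest = m.group(1).strip(), m.group(2).strip(), m.group(3)
        provenance = {}
        # Drop the empty string that precedes the first '|'.
        for part in rest.split("|")[1:]:
            key, sep, val = part.partition("=")
            if sep:  # segments without '=' are plain display text
                provenance[key.strip()] = val.strip()
        results.append((prop, value, provenance))
    return results


text = ("'''[[Name::John Doe]]''' was born on "
        "[[Has birthdate::1978-03-12|ref=personal website|refurl=http://www.johndoe.me/]].")
for annotation in parse_annotations(text):
    print(annotation)
```

Ordinary links such as `[[Category:Person]]` do not match because they lack the `::` separator.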


...or, using a template:

'''[[Name::John Doe]]''' was born on {{Provenance template|property=Has birthdate|value=1978-03-12|ref=personal website|refurl=http://www.johndoe.me/}} and currently works for {{Provenance template|property=Has employer|value=NASA|ref=Humanity’s Quest to Reach Mars, The New York Times, May 27, 2016|refbrief=New York Times|refdate=2016-05-27}}.
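Under the subobjects strategy, each template call could be stored as its own subobject. A Python sketch that reads the template parameters and renders them as an anonymous `{{#subobject:...}}` call; mapping the template parameters (`ref`, `refurl`, ...) one-to-one onto subobject properties is an assumption, both helper names are hypothetical, and parameter values containing `|` or braces are not handled:

```python
import re

TEMPLATE = re.compile(r"\{\{\s*Provenance template\s*((?:\|[^{}|]*)*)\}\}")


def parse_provenance_templates(wikitext):
    """Return the named parameters of each {{Provenance template|...}} call."""
    calls = []
    for m in TEMPLATE.finditer(wikitext):
        params = {}
        for part in m.group(1).split("|")[1:]:
            key, sep, val = part.partition("=")
            if sep:
                params[key.strip()] = val.strip()
        calls.append(params)
    return calls


def to_subobject(params):
    """Render one call as an anonymous SMW #subobject (hypothetical property mapping)."""
    fields = [f"{params['property']}={params['value']}"]
    fields += [f"{k}={v}" for k, v in params.items() if k not in ("property", "value")]
    return "{{#subobject:|" + "|".join(fields) + "}}"


text = ("{{Provenance template|property=Has birthdate|value=1978-03-12"
        "|ref=personal website|refurl=http://www.johndoe.me/}}")
for call in parse_provenance_templates(text):
    print(to_subobject(call))
```

Keeping the value and its references together in one subobject means a query can reach both without any change to SMW's annotation syntax.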

A query could then look like this:

{{#ask:
 [[Category:Person]]
 [[Has employer::NASA]]
 |?Name
 |?Has birthdate
 |newer_than=2015-01-01
 |show_unreliable=yes
 |show_provenance=yes
}}
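The parameters `newer_than`, `show_unreliable` and `show_provenance` are proposals, not existing `#ask` options. One possible reading, sketched in Python: a row counts as reliable only if it has a reference and was last edited after `newer_than`; unreliable rows are dropped unless `show_unreliable` is set; and `show_provenance` adds Ref and Last Edit columns. The `Row` type, `run_query`, and the sample dates are made up for illustration:

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Row:
    page: str
    values: Dict[str, str]   # printout label -> value
    ref: Optional[str]       # short reference label; None if no source was given
    last_edit: date


def run_query(rows: List[Row], newer_than: date,
              show_unreliable: bool = False, show_provenance: bool = False) -> List[dict]:
    out = []
    for row in rows:
        # Assumed rule: reliable = has a reference and was edited after newer_than.
        reliable = row.ref is not None and row.last_edit > newer_than
        if not reliable and not show_unreliable:
            continue
        result = {"page": row.page, **row.values, "unreliable": not reliable}
        if show_provenance:
            result["Ref"] = row.ref if row.ref is not None else "missing"
            result["Last Edit"] = row.last_edit.isoformat()
        out.append(result)
    return out


# Invented sample rows, loosely modeled on the example result.
rows = [
    Row("John Doe", {"Name": "John Doe", "Date of Birth": "1978-03-12"},
        "New York Times", date(2016, 5, 25)),
    Row("Jane Doe", {}, None, date(2014, 11, 3)),
]
for r in run_query(rows, date(2015, 1, 1), show_unreliable=True, show_provenance=True):
    print(r)
```

Without `show_unreliable`, only the John Doe row would be returned.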

This query might then return something like:

          Name      Date of Birth           Ref              Last Edit
John Doe  John Doe  March 12, 1978 [old!]   New York Times   2 days ago
Jane Doe                                    missing          before January 1, 2015