Wikidata Query Service/Blank Node Skolemization

From mediawiki.org

For background on skolemization in the RDF context, please read RDF 1.1 Concepts, section 3.5 "Replacing Blank Nodes with IRIs".

Why skolemize the blank nodes?

As part of the work to improve the performance of the Wikidata Query Service update process, we decided to go with a patch approach. In the same vein as what was proposed in rdf-patch or TurtlePatch, the idea is to mutate the graph with a set of trivial INSERT DATA and DELETE DATA operations. Blank nodes cannot be used within these operations because they are by nature unidentifiable. Skolemizing the blank nodes gives each of them an identity, which makes it possible to apply such mutations on any triple store.
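As a rough illustration of the patch idea, the sketch below renders a set of removed and added triples as DELETE DATA and INSERT DATA operations and rejects any blank node. It is not the code used by the update process: the helper name render_patch is hypothetical, triples are assumed to be tuples of already serialized SPARQL terms, and wd:Q5 is only a placeholder value.

```python
GENID = "http://www.wikidata.org/.well-known/genid/"


def render_patch(deleted, inserted):
    """Render a patch as SPARQL DELETE DATA / INSERT DATA operations.

    Each triple is a (subject, predicate, object) tuple of serialized terms.
    Blank nodes ("_:label") are refused because they cannot be addressed.
    """
    for triple in list(deleted) + list(inserted):
        for term in triple:
            if term.startswith("_:"):
                raise ValueError(f"blank node {term} cannot be used in a patch")

    operations = []
    if deleted:
        body = "\n".join(f"  {s} {p} {o} ." for s, p, o in deleted)
        operations.append("DELETE DATA {\n" + body + "\n}")
    if inserted:
        body = "\n".join(f"  {s} {p} {o} ." for s, p, o in inserted)
        operations.append("INSERT DATA {\n" + body + "\n}")
    # SPARQL Update separates consecutive operations with ";"
    return " ;\n".join(operations)


# Replace a SomeValue (skolem IRI) with a placeholder value.
patch = render_patch(
    deleted=[("wd:Q3", "wdt:P2", f"<{GENID}e39d2a834262fbd171919ab2c038c9fb>")],
    inserted=[("wd:Q3", "wdt:P2", "wd:Q5")],
)
print(patch)
```

Because the SomeValue node is an IRI, the DELETE DATA operation can name the exact triple to remove, which would not be possible with a blank node.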

How does this affect my SPARQL query?

Queries using isBlank()

Queries that use isBlank(?o) to detect SomeValue nodes will stop working, because skolemized nodes are IRIs rather than blank nodes. They have to be rewritten using the wikibase:isSomeValue(?o) function. For example:

SELECT ?human WHERE {
  ?human wdt:P21 ?gender .
  FILTER isBlank(?gender)
}

This query must be rewritten as:

SELECT ?human WHERE {
  ?human wdt:P21 ?gender .
  FILTER wikibase:isSomeValue(?gender)
}

To ease the transition, wikibase:isSomeValue is already usable on WDQS and will work even if blank nodes have not yet been skolemized on this service.

Queries using isIRI()

Because the skolem form is an IRI, isIRI() will also return true for SomeValue nodes and may conflate them with ordinary IRIs. To eliminate possible ambiguities, !wikibase:isSomeValue(?o) can be used:

select ?entity ?id {
  ?entity wdt:P2520 ?id .
  FILTER isIRI(?id)
} LIMIT 10

can be rewritten as:

select ?entity ?id {
  ?entity wdt:P2520 ?id .
  FILTER(isIRI(?id) && !wikibase:isSomeValue(?id))
} LIMIT 10

Form of the skolem IRI in results

The form of the IRI will be compliant with the RDF recommendations, for example: http://www.wikidata.org/.well-known/genid/a8d14fa93486370345412093add8f50c

These IRIs will replace blank node identifiers such as t9283749 in the result sets. For example, the following query:

SELECT ?human ?someValue WHERE {
  ?human wdt:P21 ?someValue .
  FILTER wikibase:isSomeValue(?someValue)
} LIMIT 2

instead of returning:

Blank nodes
human         someValue
wd:Q10613691  t38348832
wd:Q15626781  t38348832

will return:

Skolem IRIs
human         someValue
wd:Q10613691  http://www.wikidata.org/.well-known/genid/85cdf09ea8537248cb28182c131b623f
wd:Q15626781  http://www.wikidata.org/.well-known/genid/2a4afecd4ba3d3bbeb35e32a19fd179d
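Client code that post-processes results may want to recognize SomeValue nodes in both forms during the transition. The sketch below assumes the standard SPARQL JSON results format, where each bound value carries a "type" and a "value". The helper name is_some_value is hypothetical, and treating every blank node as a SomeValue is an assumption that may not hold for all queries.

```python
SKOLEM_PREFIX = "http://www.wikidata.org/.well-known/genid/"


def is_some_value(binding):
    """Return True if a SPARQL JSON binding looks like a SomeValue node.

    Accepts both the legacy form (a blank node) and the skolemized form
    (an IRI under the well-known genid prefix).
    """
    if binding["type"] == "bnode":
        # Assumption: blank nodes in results only stand for SomeValue.
        return True
    return binding["type"] == "uri" and binding["value"].startswith(SKOLEM_PREFIX)


# Sample bindings shaped like the result rows above.
legacy = {"type": "bnode", "value": "t38348832"}
skolem = {"type": "uri", "value": SKOLEM_PREFIX + "85cdf09ea8537248cb28182c131b623f"}
entity = {"type": "uri", "value": "http://www.wikidata.org/entity/Q10613691"}

print(is_some_value(legacy), is_some_value(skolem), is_some_value(entity))
```

Inside SPARQL itself, wikibase:isSomeValue remains the way to perform this check.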

Changes to the RDF model (RDF dumps and Special:EntityData)

To limit the differences between what is served by the Wikidata Query Service and the RDF representation of Wikidata entities, the RDF model used in the RDF dumps and Special:EntityData may change to include skolem IRIs instead of labelled blank nodes.

For example, a statement including a SomeValue snak will be changed from:

 wd:Q3 a wikibase:Item ;
     wdt:P2 _:e39d2a834262fbd171919ab2c038c9fb .
 wds:Q3-45abf5ca-4ebf-eb52-ca26-811152eb067c a wikibase:Statement ;
     ps:P2 _:fd30b9e2840921156210596f03414b05 ;
     wikibase:rank wikibase:NormalRank .

to

 wd:Q3 a wikibase:Item ;
     wdt:P2 <http://www.wikidata.org/.well-known/genid/e39d2a834262fbd171919ab2c038c9fb> .
 wds:Q3-45abf5ca-4ebf-eb52-ca26-811152eb067c a wikibase:Statement ;
     ps:P2 <http://www.wikidata.org/.well-known/genid/fd30b9e2840921156210596f03414b05> ;
     wikibase:rank wikibase:NormalRank .

The skolemization function is trivial: it reuses the blank node label as the suffix of the skolem IRI. Note that the blank node labels generated by Wikibase now make it possible to retain the identity of the blank node.

For consumers who want to stick to blank node semantics, the function to generalize the skolemized graph is also trivial: all well-known IRIs prefixed with http://www.wikidata.org/.well-known/genid/ can be transformed back to blank nodes labelled with the suffix of the skolem IRI.

In other words, given:

  • G: a Wikidata graph or subgraph containing properly labelled blank nodes
  • sk: the skolemization function described above
  • unsk: the function described above that transforms skolem IRIs back to blank nodes

it is guaranteed that G = unsk(sk(G)).
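The two functions and the round-trip property can be sketched in a few lines. This is a minimal illustration, not the Wikibase implementation: it assumes a graph is a set of triples whose terms are N-Triples-style strings ("_:label" for blank nodes, "<iri>" for IRIs), and the helper names sk_term and unsk_term are hypothetical.

```python
SKOLEM_PREFIX = "http://www.wikidata.org/.well-known/genid/"


def sk_term(term):
    """Turn a blank node "_:label" into a skolem IRI; leave other terms alone."""
    if term.startswith("_:"):
        return f"<{SKOLEM_PREFIX}{term[2:]}>"
    return term


def unsk_term(term):
    """Turn a skolem IRI back into a blank node labelled with its suffix."""
    if term.startswith("<" + SKOLEM_PREFIX) and term.endswith(">"):
        return "_:" + term[len(SKOLEM_PREFIX) + 1:-1]
    return term


def sk(graph):
    return {tuple(sk_term(t) for t in triple) for triple in graph}


def unsk(graph):
    return {tuple(unsk_term(t) for t in triple) for triple in graph}


# The example statement above, with prefixed names kept as opaque strings.
G = {
    ("wd:Q3", "wdt:P2", "_:e39d2a834262fbd171919ab2c038c9fb"),
    ("wds:Q3-45abf5ca-4ebf-eb52-ca26-811152eb067c", "ps:P2",
     "_:fd30b9e2840921156210596f03414b05"),
    ("wds:Q3-45abf5ca-4ebf-eb52-ca26-811152eb067c", "wikibase:rank",
     "wikibase:NormalRank"),
}

assert unsk(sk(G)) == G  # the round trip restores the original graph
```

The round trip only holds when the graph does not already contain IRIs under the genid prefix, which is the case for a graph G with properly labelled blank nodes.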

When this breaking change is applied (following a proper announcement on the Wikidata mailing lists), RDF dumps and Special:EntityData will start to emit G′, where G′ = sk(G).

The specification of the RDF model will be changed accordingly.