Wikibase/Indexing/RDF Notes

RDF is a W3C spec for describing knowledge in triples that look like: wd:Q23 wd:P509 wd:Q356405. In English that is "George Washington's (Q23) cause of death (P509) is bloodletting (Q356405)". Technically RDF specifies that knowledge is described as triples. It is usually written in a format called Turtle, which looks like the example above. Turtle has some shorthand: wd:Q23 wd:P509 wd:Q356405 ; wd:P106 wd:Q82955 , wd:Q189290 , wd:Q131512 , wd:Q1734662.

In English that is "George Washington's (Q23) cause of death (P509) is bloodletting (Q356405) and he held the occupations (P106) of politician (Q82955), officer (Q189290), farmer (Q131512) and cartographer (Q1734662)." The ";" means "another triple about the same subject" and the "," means "another triple about the same subject with the same predicate". This syntax is reasonably terse while still being obvious to a computer and even mostly human readable. Turtle is pretty good.
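To make the two shorthands concrete, here is a minimal Python sketch that expands a single abbreviated statement back into full triples. It is not a Turtle parser: the function name expand_shorthand is made up for illustration, and it assumes every term is a whitespace-free prefixed name (no string literals that might contain ";" or ",").

```python
def expand_shorthand(statement: str) -> list[tuple[str, str, str]]:
    """Expand one Turtle statement that uses ';' and ',' into full triples.

    Simplification (assumption): terms are plain prefixed names, so the
    statement can be split on ';' and ',' without tokenizing literals.
    """
    body = statement.strip().rstrip(".").strip()
    subject, rest = body.split(None, 1)
    triples = []
    # ';' starts a new predicate for the same subject.
    for predicate_part in rest.split(";"):
        predicate, objects = predicate_part.strip().split(None, 1)
        # ',' starts a new object for the same subject and predicate.
        for obj in objects.split(","):
            triples.append((subject, predicate, obj.strip()))
    return triples


triples = expand_shorthand(
    "wd:Q23 wd:P509 wd:Q356405 ; "
    "wd:P106 wd:Q82955 , wd:Q189290 , wd:Q131512 , wd:Q1734662."
)
for triple in triples:
    print(" ".join(triple) + " .")
```

The one abbreviated statement comes out as five triples: one for P509 and four for P106, all with wd:Q23 as the subject.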

You've probably noticed all those wd: prefixes. RDF feels like it borrows XML's namespace concept but, at least in Turtle, the syntax isn't so hideous. Here is what that first example would look like with the namespace properly specified: PREFIX wd: <http://www.wikidata.org/entity/> wd:Q23 wd:P509 wd:Q356405.
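A prefixed name is just an abbreviation for a full IRI: the prefix is swapped for its namespace and the local part is appended. The sketch below shows that substitution in Python. The PREFIXES table and the expand helper are illustrative names, and the wd: namespace is assumed to be http://www.wikidata.org/entity/ (the value the RDF 1.1 Primer uses).

```python
# Assumed prefix table; the wd: namespace is the one from the RDF 1.1 Primer.
PREFIXES = {"wd": "http://www.wikidata.org/entity/"}


def expand(term: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Turn a prefixed name such as 'wd:Q23' into a full IRI."""
    prefix, sep, local = term.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {term!r}")
    return prefixes[prefix] + local


full = [expand(t) for t in ("wd:Q23", "wd:P509", "wd:Q356405")]
# Write the triple with every term spelled out as an absolute IRI.
print(" ".join(f"<{iri}>" for iri in full) + " .")
```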

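Here is an example straight from the RDF primer:

    BASE <http://example.org/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX schema: <http://schema.org/>
    PREFIX dcterms: <http://purl.org/dc/terms/>
    PREFIX wd: <http://www.wikidata.org/entity/>

    <bob#me>
        a foaf:Person ;
        foaf:knows <alice#me> ;
        schema:birthDate "1990-07-04"^^xsd:date ;
        foaf:topic_interest wd:Q12418 .

    wd:Q12418
        dcterms:title "Mona Lisa" ;
        dcterms:creator <http://dbpedia.org/resource/Leonardo_da_Vinci> .

    <http://data.europeana.eu/item/04802/243FA8618938F4117025F17A8B813C5F9AA4D619>
        dcterms:subject wd:Q12418 .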

There is a whole bunch of stuff to notice here:
 * The primer references Wikidata. Neat.
 * It's perfectly OK to mix and match prefixes.
 * Some of these prefixes are actually pretty standard. foaf, xsd, dcterms are things I've seen a whole bunch.  schema.org is new to me but seems pretty sane.
 * They use a different indentation style than I was using. I believe I got my indentation style from other places on the web. Meh, I dunno what is canonical yet.
 * The <bob#me> style syntax: that one in particular means <http://example.org/bob#me>. It's how the BASE works. I'm not sure why you'd want this syntax over a prefix.
 * Note that the <http://dbpedia.org/resource/Leonardo_da_Vinci> syntax doesn't get prefixed with the BASE because the URL/URI/IRI is not relative.
 * The a syntax is syntactic sugar for rdf:type, which I believe to be the same as "is a" in meaning.
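The BASE behaviour from the list above can be tried out with Python's standard urljoin, which applies the usual relative-reference resolution. This is a sketch rather than a Turtle implementation: the resolve helper is a made-up name, and the base IRI http://example.org/ is assumed to be the one the primer example declares.

```python
from urllib.parse import urljoin

# Assumed base IRI, as declared by BASE in the RDF 1.1 Primer example.
BASE = "http://example.org/"


def resolve(iri_ref: str, base: str = BASE) -> str:
    """Resolve the text between '<' and '>' against the base IRI."""
    return urljoin(base, iri_ref)


# A relative reference is resolved against the base.
print(resolve("bob#me"))  # http://example.org/bob#me
# An absolute IRI is left alone; the base is not prepended.
print(resolve("http://dbpedia.org/resource/Leonardo_da_Vinci"))
```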

Observation: we'll probably be able to build some kind of rewrite rules from the standard prefixes (foaf, etc.) to wd prefixes. Or infer them somehow if the relationship isn't 100% the same.

How we represent Wikidata
There exists a paper about representing Wikidata in RDF form and an implementation in the form of the Wikidata Toolkit. I'm going to provide some examples rewritten using some common Turtle PREFIXes:

    PREFIX wd: <http://www.wikidata.org/entity/>
    PREFIX wdo:
    PREFIX schema: <http://schema.org/>
    PREFIX sco:
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX prov: <http://www.w3.org/ns/prov#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

Site links, descriptions, aliases, and labels are pretty darn clear:

    wd:Q80
        a wdo:Item ;
        wdo:label "Tim Berners-Lee"@en ;
        schema:description "izumitelj World Wide Weba"@hr ;
        sco:altLabel "TimBL"@pt-br .
    <http://es.wikipedia.org/wiki/Tim_Berners-Lee>
        a wdo:Article ;
        schema:about wd:Q80 ;
        schema:inLanguage "es" .

Statements are a bit more cumbersome:

    wd:Q23 wd:P509s wd:Q23Sce976010-412f-637b-c687-9fd2d52dc140 .
    wd:Q23Sce976010-412f-637b-c687-9fd2d52dc140
        rdf:type wdo:Statement ;
        wd:P509v wd:Q356405 .

This means the same thing (George Washington died of bloodletting) as the original example: wd:Q23 wd:P509 wd:Q356405.
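The equivalence between the two forms can be checked mechanically. The Python sketch below follows the statement node and rebuilds the simple triple. It relies only on the naming convention visible in the example (a property ID ending in "s" links the entity to a statement node, the same ID ending in "v" links that node to its value); the function name collapse_statements is made up, and this is not a claim about how the Wikidata Toolkit does it.

```python
Triple = tuple[str, str, str]


def collapse_statements(triples: list[Triple]) -> list[Triple]:
    """Rebuild simple 'subject property value' triples from the reified form.

    Assumes the convention in the example: '<property>s' links an entity to
    a statement node and '<property>v' links that node to its value.
    """
    # Index the values hanging off each (statement node, property) pair.
    values: dict[tuple[str, str], list[str]] = {}
    for s, p, o in triples:
        if p.endswith("v"):
            values.setdefault((s, p[:-1]), []).append(o)

    simple = []
    for s, p, o in triples:
        if p.endswith("s"):
            # 'o' is the statement node; look up its value for this property.
            for value in values.get((o, p[:-1]), []):
                simple.append((s, p[:-1], value))
    return simple


reified = [
    ("wd:Q23", "wd:P509s", "wd:Q23Sce976010-412f-637b-c687-9fd2d52dc140"),
    ("wd:Q23Sce976010-412f-637b-c687-9fd2d52dc140", "rdf:type", "wdo:Statement"),
    ("wd:Q23Sce976010-412f-637b-c687-9fd2d52dc140", "wd:P509v", "wd:Q356405"),
]
print(collapse_statements(reified))
```

Matching on the last letter of the predicate is deliberately crude. A predicate that merely happens to end in "s" is harmless here because no matching "v" entry exists for it, but a real implementation would match against known property IDs.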

Its verbosity makes more sense when you think about qualifiers and references. Here is "George Washington held the rank of General of the Armies of the United States as of 4 July 1976, as proven by an image on Commons":

    wd:Q23 wd:P410s wd:Q23Sb981a673-4869-64c4-04bc-7c95847e042f .
    wd:Q23Sb981a673-4869-64c4-04bc-7c95847e042f
        rdf:type wdo:Statement ;
        wd:P410v wd:Q3100539 ;
        wd:P585q wd:VT65ac3895ffbc098161d0deb8fcb276ea ;
        prov:wasDerivedFrom wd:R0de5da620b46a53a4c10651344f20977 .
    wd:VT65ac3895ffbc098161d0deb8fcb276ea
        rdf:type wdo:TimeValue ;
        wdo:time "1976-07-04"^^xsd:date ;
        wdo:timePrecision 11 ;
        wdo:preferredCalendar wd:Q1985727 .
    wd:R0de5da620b46a53a4c10651344f20977
        rdf:type wdo:reference ;
        wd:P18r <http://commons.wikimedia.org/wiki/File:Orders_31-3.jpg> .

This is much, much more information than: wd:Q23 wd:P410 wd:Q3100539.

On the other hand, with RDR (a Bigdata-supported extension of RDF) you could represent a bunch of that data like this:

    wd:Q23 wd:P410 wd:Q3100539 .
    <<wd:Q23 wd:P410 wd:Q3100539>>
        wd:P585q "1976-07-04"^^xsd:date ;
        wd:P18r <http://commons.wikimedia.org/wiki/File:Orders_31-3.jpg> .

If you were willing to lose data about the

Or, well, something like that. There are so many ways you could slice it. I imagine you could slice it like this if you didn't want to lose any data:

    wd:Q23 wd:P410 wd:Q3100539 .
    <<wd:Q23 wd:P410 wd:Q3100539>>
        wd:P585q wd:VT65ac3895ffbc098161d0deb8fcb276ea ;
        wd:P18r <http://commons.wikimedia.org/wiki/File:Orders_31-3.jpg> .
    wd:VT65ac3895ffbc098161d0deb8fcb276ea
        rdf:type wdo:TimeValue ;
        wdo:time "1976-07-04"^^xsd:date ;
        wdo:timePrecision 11 ;
        wdo:preferredCalendar wd:Q1985727 .

In this case all of the data is still there, but it loses out on the rdf:type stuff for the statement and the reference.
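One way to think about the RDR form is that a whole triple is allowed to sit in the subject position of another triple. The Python sketch below models that with nested tuples and prints the << >> notation used above. The helper names term and annotate are invented for illustration; this only shows the shape of the data and is not Bigdata's API.

```python
def term(t) -> str:
    """Serialize a term; a nested 3-tuple is written as an embedded << >> triple."""
    if isinstance(t, tuple):
        return "<<" + " ".join(term(x) for x in t) + ">>"
    return t


def annotate(base, annotations):
    """Yield the base triple, then one triple about that triple per annotation."""
    yield base
    for predicate, obj in annotations:
        yield (base, predicate, obj)


base = ("wd:Q23", "wd:P410", "wd:Q3100539")
rdr = list(annotate(base, [
    ("wd:P585q", "wd:VT65ac3895ffbc098161d0deb8fcb276ea"),
    ("wd:P18r", "<http://commons.wikimedia.org/wiki/File:Orders_31-3.jpg>"),
]))
for s, p, o in rdr:
    print(term(s), p, term(o), ".")
```

The qualifier and the reference attach directly to the triple they describe, so no statement node or statement ID is needed. That is also why there is nowhere left to hang the rdf:type of the statement.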