Extension:WikibaseLexeme/RDF mapping

From mediawiki.org

This is the specification of the RDF mapping of the Wikibase Lexeme data model. It is based on the Wikibase RDF dump format; unless stated otherwise, prefixes are the ones defined in that document. Where relevant, it reuses the lemon model of the W3C Ontology-Lexica (OntoLex) Community Group.

Lexeme

Example:

@prefix dct: <http://purl.org/dc/terms/> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .

wd:L64723 a wikibase:Lexeme , ontolex:LexicalEntry ;
     # lemma
     wikibase:lemma "hard"@en ;
     rdfs:label "hard"@en ;

     # language
     dct:language wd:Q1860 ;

     # lexical category
     wikibase:lexicalCategory wd:Q34698 ;

     # statements
     wdt:P2 wd:Q3 ;
     wdt:P7 "value1" , "value2" ;
     p:P2 wds:L64723-4cc1f2d1-490e-c9c7-4560-46c3cce05bb7 ;
     p:P7 wds:L64723-24bf3704-4c5d-083a-9b59-1881f82b6b37 ,
          wds:L64723-45abf5ca-4ebf-eb52-ca26-811152eb067c ;

     # forms
     ontolex:lexicalForm wd:L64723-F1 ;

     # senses
     ontolex:sense wd:L64723-S1 .

Comments:

Classes
The lexeme concept of Wikibase aligns well with ontolex:LexicalEntry. A class wikibase:Lexeme is also used for consistency with wikibase:Item and wikibase:Property.
Lemma
We use the custom property wikibase:lemma. The closest lemon relation is ontolex:canonicalForm, but its range is ontolex:Form, whereas a Wikibase lemma is a language-tagged string. Using wikibase:lemma instead of the generic rdfs:label used for items (and possibly also schema:name and skos:prefLabel) has two advantages: lexemes do not show up in existing SPARQL queries that use rdfs:label, and lexemes alone can be queried by label with a single triple pattern.
Language
We use the Dublin Core language property (dct:language), just like the lemon examples. We do not reuse schema:inLanguage directly because it is already used in the representation of Wikibase sitelinks, with a BCP 47 language code as its range. Emitting schema:inLanguage as a derived value, holding the BCP 47 code of the language when one exists, is planned but not implemented yet.
Lexical category
We use our own wikibase:lexicalCategory property to avoid slightly abusing lexinfo:partOfSpeech from the LexInfo extension of lemon, which is restricted to parts of speech.
Statements
For consistency and simplicity we use the same schema as for items and properties.
Forms
The relation between Lexemes and Forms uses the ontolex:lexicalForm relation. See the Form section for the representation of forms.
Senses
The relation between Lexemes and Senses uses the ontolex:sense relation. See the Sense section for the representation of senses.
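The one-triple-pattern argument in the Lemma comment can be illustrated with a minimal Python sketch over a toy list of triples. The tuple encoding, the `match` helper and the placeholder item wd:Q3 sharing the label "hard"@en are illustrative assumptions, not dump output or an existing API. Because the Lexeme example also emits rdfs:label, a pattern on rdfs:label mixes lexemes with items, while a pattern on wikibase:lemma selects lexemes only.

```python
# Toy triples as (subject, predicate, object); literals are (text, language) pairs.
# Hypothetical data for illustration, not taken from a real dump.
TRIPLES = [
    ("wd:L64723", "wikibase:lemma", ("hard", "en")),
    ("wd:L64723", "rdfs:label", ("hard", "en")),
    ("wd:Q3", "rdfs:label", ("hard", "en")),  # placeholder item with the same label
]


def match(triples, predicate, obj):
    """Return the subjects of all triples matching the pattern (?s predicate obj)."""
    return [s for s, p, o in triples if p == predicate and o == obj]


# One pattern on wikibase:lemma selects only lexemes...
print(match(TRIPLES, "wikibase:lemma", ("hard", "en")))  # ['wd:L64723']
# ...whereas the same pattern on rdfs:label also returns the item.
print(match(TRIPLES, "rdfs:label", ("hard", "en")))  # ['wd:L64723', 'wd:Q3']
```

In SPARQL, the first lookup corresponds to the single pattern `?lexeme wikibase:lemma "hard"@en`.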

Form

Example:

@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .

wd:L64723-F1 a wikibase:Form , ontolex:Form ;
     # representation
     ontolex:representation "hard"@en ;
     rdfs:label "hard"@en ;

     # grammatical features
     wikibase:grammaticalFeature wd:Q1234 , wd:Q2345 ;

     # statements
     wdt:P2 wd:Q3 ;
     wdt:P7 "value1" , "value2" ;
     p:P2 wds:L64723-F1-4cc1f2d1-490e-c9c7-4560-46c3cce05bb7 ;
     p:P7 wds:L64723-F1-24bf3704-4c5d-083a-9b59-1881f82b6b37 ,
          wds:L64723-F1-45abf5ca-4ebf-eb52-ca26-811152eb067c .

Comments:

Classes
The form concept of Wikibase aligns with ontolex:Form. The additional class wikibase:Form is also used.
Representation
We use the ontolex:representation relation from lemon. We do not use its sub-property ontolex:writtenRep, so as not to exclude representations in phonetic variants of languages, even though the lemon specification recommends not using ontolex:representation directly. rdfs:label is also emitted for interoperability reasons.
Grammatical Features
We use a custom property wikibase:grammaticalFeature because lemon has no such relation with ontolex:Form as its domain.
Statements
For consistency and simplicity we use the same schema as for items and properties.
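The Representation and Grammatical Features comments can be condensed into a small serializer. This is a minimal Python sketch, not the extension's actual serializer: the function names `turtle_literal` and `form_to_turtle`, the assumption that representations arrive as a dict from language code to spelling, and the escaping of only backslashes and double quotes are all illustrative choices. Statements are omitted.

```python
def turtle_literal(text, lang):
    """Build a language-tagged Turtle literal, escaping backslashes and quotes only."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def form_to_turtle(form_uri, representations, features):
    """Serialize the non-statement part of a form.

    representations: hypothetical dict of language code -> spelling.
    features: list of item URIs used as grammatical features.
    """
    lines = [f"{form_uri} a wikibase:Form , ontolex:Form ;"]
    for lang, text in sorted(representations.items()):
        literal = turtle_literal(text, lang)
        # lemon predicate first, then the rdfs:label copy for interoperability
        lines.append(f"     ontolex:representation {literal} ;")
        lines.append(f"     rdfs:label {literal} ;")
    if features:
        lines.append("     wikibase:grammaticalFeature " + " , ".join(features) + " ;")
    # Turn the final " ;" into the " ." that closes the subject's description
    lines[-1] = lines[-1][: -len(" ;")] + " ."
    return "\n".join(lines)


print(form_to_turtle("wd:L64723-F1", {"en": "hard"}, ["wd:Q1234", "wd:Q2345"]))
```

With these arguments the output reproduces the non-statement part of the Form example; a form with several spelling variants would get one ontolex:representation and one rdfs:label triple per variant.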

Sense

Example:

@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .

wd:L64723-S1 a wikibase:Sense , ontolex:LexicalSense ;
     # gloss
     skos:definition "presenting difficulty"@en ;
     rdfs:label "presenting difficulty"@en ;

     # statements
     wdt:P2 wd:Q3 ;
     wdt:P7 "value1" , "value2" ;
     p:P2 wds:L64723-S1-4cc1f2d1-490e-c9c7-4560-46c3cce05bb7 ;
     p:P7 wds:L64723-S1-24bf3704-4c5d-083a-9b59-1881f82b6b37 ,
          wds:L64723-S1-45abf5ca-4ebf-eb52-ca26-811152eb067c .

Comments:

Classes
The sense concept of Wikibase aligns with ontolex:LexicalSense. The additional class wikibase:Sense is also used.
Gloss
We use skos:definition for glosses, following lemon usage. rdfs:label is also emitted for interoperability reasons, even though a gloss is not really a label.
Statements
For consistency and simplicity we use the same schema as for items and properties.

Data node

Example:

wdata:L64723 schema:version "59"^^xsd:integer ;
     schema:dateModified "2015-03-18T22:38:36Z"^^xsd:dateTime ;
     a schema:Dataset ;
     schema:about wd:L64723 .

For each Lexeme, a data node should be returned with the URI wdata:L1 when the Lexeme is wd:L1. It should use the same schema as the data nodes of Wikibase items and properties, and it could also provide statistics based on page properties, as is done for items.
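A data node like the one in the example can be produced from a lexeme ID, a revision number and a modification timestamp. This is a minimal Python sketch; the function name `data_node_turtle` and its parameters are hypothetical, and rejecting form and sense IDs (such as L64723-F1) is a design choice reflecting the note that forms and senses have no data node of their own.

```python
import re


def data_node_turtle(lexeme_id, version, modified):
    """Emit the data node of a lexeme: wd:L1 gets the data node wdata:L1.

    lexeme_id: a lexeme ID such as "L64723"; form and sense IDs are rejected
    (a hypothetical design choice: forms and senses have no data node).
    version: revision number; modified: ISO 8601 timestamp string.
    """
    if not re.fullmatch(r"L[1-9][0-9]*", lexeme_id):
        raise ValueError(f"not a lexeme ID: {lexeme_id!r}")
    return "\n".join([
        f'wdata:{lexeme_id} schema:version "{version}"^^xsd:integer ;',
        f'     schema:dateModified "{modified}"^^xsd:dateTime ;',
        "     a schema:Dataset ;",
        f"     schema:about wd:{lexeme_id} .",
    ])


print(data_node_turtle("L64723", 59, "2015-03-18T22:38:36Z"))
```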

Note: There is no dedicated data node for forms and senses because data nodes exist at the granularity of the data container (the wiki page). This is not a strong limitation: the data node of the Lexeme a form or sense belongs to is easy to retrieve with the property path schema:about/ontolex:lexicalForm or schema:about/ontolex:sense.
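The property path in the note can be evaluated by hand to see why the missing form and sense data nodes are easy to work around. This is a minimal Python sketch, assuming a toy list of (subject, predicate, object) tuples and a hypothetical helper `subjects_via_path` that only handles sequence paths; a SPARQL engine evaluates the equivalent pattern `?data schema:about/ontolex:lexicalForm wd:L64723-F1` directly.

```python
# Toy triples (hypothetical) linking the data node, the lexeme, a form and a sense.
TRIPLES = [
    ("wdata:L64723", "schema:about", "wd:L64723"),
    ("wd:L64723", "ontolex:lexicalForm", "wd:L64723-F1"),
    ("wd:L64723", "ontolex:sense", "wd:L64723-S1"),
]


def subjects_via_path(triples, path, target):
    """Return every ?s such that (?s p1/p2/.../pn target) holds.

    Walks the sequence path backwards from the target, one predicate at a time.
    """
    nodes = {target}
    for predicate in reversed(path.split("/")):
        nodes = {s for s, p, o in triples if p == predicate and o in nodes}
    return nodes


print(subjects_via_path(TRIPLES, "schema:about/ontolex:lexicalForm", "wd:L64723-F1"))
print(subjects_via_path(TRIPLES, "schema:about/ontolex:sense", "wd:L64723-S1"))
```

Both calls reach wdata:L64723, the data node of the Lexeme that owns the form and the sense.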

Wikidata Query Service

Wikidata Query Service does not provide the following features (mostly for performance reasons):

  • The wikibase:Lexeme, wikibase:Form and wikibase:Sense classes.
  • The rdfs:label relations (the more specific wikibase:lemma, ontolex:representation and skos:definition relations exist for lexemes, forms and senses respectively).
  • A separate data node: just as for items and properties, the data node is merged into the wd: node.
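The list above can be read as a transformation from the dump triples to what the query service stores. The following Python sketch approximates it under stated assumptions: triples are plain string tuples, and merging the data node is assumed to mean dropping its `a schema:Dataset` and `schema:about` triples and moving its remaining triples onto the wd: node. The function name `approximate_wdqs_view` is hypothetical; this is not the query service's actual loading code.

```python
DROPPED_TYPES = {"wikibase:Lexeme", "wikibase:Form", "wikibase:Sense"}


def approximate_wdqs_view(triples):
    """Approximate the query-service differences listed above (a sketch, not WDQS code)."""
    result = []
    for s, p, o in triples:
        if p == "a" and o in DROPPED_TYPES:
            continue  # the wikibase:Lexeme/Form/Sense classes are not provided
        if p == "rdfs:label":
            continue  # use wikibase:lemma, ontolex:representation or skos:definition
        if s.startswith("wdata:"):
            # Assumption: the Dataset type and schema:about link are dropped on merge
            if p == "schema:about" or (p == "a" and o == "schema:Dataset"):
                continue
            s = "wd:" + s[len("wdata:"):]  # attach the data node's facts to the entity
        result.append((s, p, o))
    return result


dump = [
    ("wd:L64723", "a", "wikibase:Lexeme"),
    ("wd:L64723", "a", "ontolex:LexicalEntry"),
    ("wd:L64723", "wikibase:lemma", '"hard"@en'),
    ("wd:L64723", "rdfs:label", '"hard"@en'),
    ("wdata:L64723", "schema:version", '"59"^^xsd:integer'),
    ("wdata:L64723", "a", "schema:Dataset"),
    ("wdata:L64723", "schema:about", "wd:L64723"),
]
for triple in approximate_wdqs_view(dump):
    print(triple)
```

Under these assumptions only the ontolex:LexicalEntry type, the lemma and the version survive, all attached to wd:L64723.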

Related work