Extension:WikibaseLexeme/RDF mapping

This page discusses how to represent the Wikibase Lexeme data model in RDF. It builds on the RDF dump format.

Where relevant, it proposes reusing the Lemon model from the W3C Ontolex community group.

TODO: review the use of terms from external vocabularies and possibly reach out to the Ontolex/Lemon people for feedback.

Lexeme
Possible example:

Comments:


 * Classes
 * The lexeme concept of Wikibase aligns well with . Having a class  would be convenient for consistency with  and  . It would be meaningful to have   in the ontology definition (using   seems a bad idea because it would mean that all   are).


 * Lemma
 * The closest Lemon relation is but its range is . So, it seems like a bad idea to use it. The simplest thing to do is to use   just like item (and maybe also   and  ). Or, if we prefer to have a more specific relation, introduce a property.


 * Language
 * The property is already used but with range language code and not item. So it is probably not a good idea to reuse it here. The examples in the Lemon document use the Dublin Core   property, but that means reusing an extra vocabulary. The third option is to introduce a   property.


 * Lexical category
 * The Lemon model uses  for a similar purpose (but restricted to parts of speech). This property is not part of the Lemon model itself but of an extension, LexInfo. It seems safer to use our own property.


 * Statements
 * For consistency and simplicity, use the same schema as for items and properties.


 * Forms
 * The relation between Lexemes and Forms matches the . See the Form section for the representation of forms.


 * Senses
 * The relation between Lexemes and Senses matches the . See the Sense section for the representation of senses.
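
The comments above can be sketched in code. The following Python snippet is only an illustration under stated assumptions, not a settled mapping: it assumes `ontolex:LexicalEntry` as the class, `rdfs:label` for lemmas (as for items), `dct:language` for the language (one of the three options discussed), `ontolex:lexicalForm` and `ontolex:sense` for forms and senses, and a hypothetical `wikibase:lexicalCategory` property. The entity IDs in the usage lines are placeholders.

```python
# Prefixes and terms are illustrative assumptions, not a settled mapping.
PREFIXES = {
    "wd": "http://www.wikidata.org/entity/",
    "wikibase": "http://wikiba.se/ontology#",
    "ontolex": "http://www.w3.org/ns/lemon/ontolex#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dct": "http://purl.org/dc/terms/",
}


def literal(text, lang):
    """Format a language-tagged literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def lexeme_triples(lexeme_id, lemmas, language, category, form_ids, sense_ids):
    """Return (subject, predicate, object) triples for one lexeme."""
    subject = f"wd:{lexeme_id}"
    triples = [
        (subject, "a", "ontolex:LexicalEntry"),
        (subject, "dct:language", f"wd:{language}"),
        # Hypothetical custom property for the lexical category.
        (subject, "wikibase:lexicalCategory", f"wd:{category}"),
    ]
    # One label per lemma variant, keyed by language code.
    for lang, text in sorted(lemmas.items()):
        triples.append((subject, "rdfs:label", literal(text, lang)))
    triples += [(subject, "ontolex:lexicalForm", f"wd:{f}") for f in form_ids]
    triples += [(subject, "ontolex:sense", f"wd:{s}") for s in sense_ids]
    return triples


def to_turtle(triples):
    """Serialize the triples as simple one-statement-per-line Turtle."""
    header = [f"@prefix {p}: <{uri}> ." for p, uri in PREFIXES.items()]
    body = [f"{s} {p} {o} ." for s, p, o in triples]
    return "\n".join(header + [""] + body)


# Placeholder IDs: a lexeme L1 with one form and one sense.
example = lexeme_triples("L1", {"en": "hard"}, "Q1860", "Q34698",
                         ["L1-F1"], ["L1-S1"])
print(to_turtle(example))
```

Swapping `dct:language` for a custom property, or adding further label properties, only changes the predicate strings; the shape of the output stays the same.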

Form
Possible example:

Comments:


 * Classes
 * The form concept of Wikibase aligns well with.


 * Representation
 * Lemon provides and its sub property . It could be useful to use   in addition, so that a label of any Wikibase entity can be retrieved using the same RDF property. Using  instead of  does not rule out representations in phonetic variants of languages, but the Lemon specification recommends not using  directly.


 * Grammatical Features
 * It seems there is no property for this in Lemon with for domain. So we should probably use a custom property like.


 * Statements
 * For consistency and simplicity, use the same schema as for items and properties.
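
As a rough sketch of the Form comments above, the Python function below emits triples for a single form. The terms are assumptions chosen for illustration: `ontolex:Form` as the class, `ontolex:representation` plus an additional `rdfs:label` for each representation, and a hypothetical custom property `wikibase:grammaticalFeature`. The IDs in the usage lines are placeholders.

```python
def literal(text, lang):
    """Format a language-tagged literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def form_triples(form_id, representations, grammatical_features):
    """Return (subject, predicate, object) triples for one form."""
    subject = f"wd:{form_id}"
    triples = [(subject, "a", "ontolex:Form")]
    for lang, text in sorted(representations.items()):
        value = literal(text, lang)
        # Assumed Lemon property for the representation itself.
        triples.append((subject, "ontolex:representation", value))
        # Extra label so every Wikibase entity can be labelled the same way.
        triples.append((subject, "rdfs:label", value))
    for item_id in grammatical_features:
        # Hypothetical custom property; Lemon has no direct equivalent.
        triples.append((subject, "wikibase:grammaticalFeature", f"wd:{item_id}"))
    return triples


# Placeholder IDs: a form of lexeme L1 with one grammatical feature item.
for s, p, o in form_triples("L1-F1", {"en": "harder"}, ["Q14169499"]):
    print(f"{s} {p} {o} .")
```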

Sense
Possible example:

Comments:


 * Classes
 * The sense concept of Wikibase aligns well with.


 * Gloss
 * Lemon suggests using  to provide the gloss. It seems nice to use it because we already use SKOS for item labels and aliases. It could be useful to use   in addition, so that a label of any Wikibase entity can be retrieved using the same RDF property, even if a gloss is not really a label.


 * Statements
 * For consistency and simplicity, use the same schema as for items and properties.
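
The Sense comments above could be sketched in the same way. The snippet assumes `ontolex:LexicalSense` as the class and `skos:definition` as the SKOS property carrying the gloss, with an optional `rdfs:label` copy; these names are assumptions for illustration and the IDs are placeholders.

```python
def literal(text, lang):
    """Format a language-tagged literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def sense_triples(sense_id, glosses, also_label=True):
    """Return (subject, predicate, object) triples for one sense."""
    subject = f"wd:{sense_id}"
    triples = [(subject, "a", "ontolex:LexicalSense")]
    for lang, text in sorted(glosses.items()):
        value = literal(text, lang)
        # Assumed SKOS property for the gloss.
        triples.append((subject, "skos:definition", value))
        if also_label:
            # Optional: expose the gloss as a generic label as well.
            triples.append((subject, "rdfs:label", value))
    return triples


# Placeholder ID and gloss for a sense of lexeme L1.
for s, p, o in sense_triples("L1-S1", {"en": "not easy"}):
    print(f"{s} {p} {o} .")
```

The `also_label` flag reflects that the extra label is an open question in the text: a gloss is not really a label, so a consumer may prefer to leave it out.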

Data node
Example:

For each Lexeme, a data node should be returned with the URI  if the Lexeme is. It should use the same schema as the data nodes of Wikibase items and properties. It could also provide some statistics based on page properties, just as for items.

Note: There is no specific data node for forms and senses because the granularity of data nodes is the data container (wiki page). It is not a strong limitation because it is easy to retrieve the data node of the Lexeme they belong to with the property path  or.
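
To illustrate the lookup described in the note, here is a small Python sketch that walks from a form or sense back to the data node of its Lexeme. It assumes that the data node points to the Lexeme with `schema:about` (as data nodes do for items in the RDF dump format) and that forms and senses are attached with `ontolex:lexicalForm` and `ontolex:sense`; the `wdata:` prefix and all IDs are placeholders.

```python
def data_node_for(entity, triples):
    """Find the data node describing the Lexeme that contains `entity`.

    For a form or sense this follows the inverse of ontolex:lexicalForm or
    ontolex:sense, then the inverse of schema:about. A Lexeme is treated as
    its own container.
    """
    owners = {
        s for s, p, o in triples
        if o == entity and p in ("ontolex:lexicalForm", "ontolex:sense")
    }
    containers = owners or {entity}
    nodes = sorted(
        s for s, p, o in triples
        if p == "schema:about" and o in containers
    )
    return nodes[0] if nodes else None


# Placeholder graph: one Lexeme with one form and one sense.
graph = [
    ("wdata:L1", "schema:about", "wd:L1"),
    ("wd:L1", "ontolex:lexicalForm", "wd:L1-F1"),
    ("wd:L1", "ontolex:sense", "wd:L1-S1"),
]
print(data_node_for("wd:L1-F1", graph))
```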

Related work

 * the Wiktionary version of DBpedia