Extension:WikibaseLexeme/Data Model

This is a living document that describes the conceptual data model used by WikibaseLexeme. It is not a specification of any concrete binding, implementation, mapping, or serialization.

The data model of WikibaseLexeme describes the structure of the data that is handled as Lexemes in Wikibase. In particular, it specifies which kind of information users can contribute to the system. The data model is conceptual ("Which information do we have to support?") and does not specify how this data should be represented technically ("Which data structures should the software use?") or syntactically ("How should the data be expressed in a file?"). Separate documents describe the serialization of the Wikibase data model in JSON and in RDF (Resource Description Framework).

The Lexeme data model is based on the Wikibase data model. The Wikidata glossary and the Wikibase data model primer may be helpful in understanding this document. The Lexeme data model aims to align with the LEMON model where useful and practical. However, in the spirit of Wikibase, the Lexeme model is designed to be simple and flexible enough for casual collaborative editing, as opposed to the more formalized approach taken by LEMON.

Motivation
The Wikibase Lexeme extension provides improved modeling for lexical entities such as words and phrases. While it would be theoretically possible to model these entities using Items, a more expressive, specialized model helps reduce complexity and improves re-use and mappings to other vocabularies.

Outline

 * 1 Lemma (mostly for display purposes, e.g. infinitive form)
 * 1 Lexical category (e.g. verb, noun; from Item space)
 * 1 Language (e.g. English, German; from Item space)
 * Multiple Forms, each with
   * 1 Representation (the actual string)
   * Multiple Grammatical markers
   * Multiple Statements (e.g. region, period, pronunciation)
 * Multiple Senses, each with
   * 1 Gloss per language (i.e. a definition)
   * Multiple Statements (e.g. translations, synonyms, connotation, register, usage example, refers-to-concept)
 * Multiple Statements (e.g. derived-from, pronunciation, region, period)
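
The outline above can be made concrete with a small sketch. This is an illustration only: the data model is conceptual and does not prescribe data structures, so the class names, field names, and the placeholder identifiers (`Q_ENGLISH`, `Q_VERB`, `P_REGION`, and so on) are assumptions made for this example and do not correspond to any actual Wikibase binding or serialization.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Statement:
    # Hypothetical simplification: a property identifier and a plain value.
    property_id: str
    value: str


@dataclass
class Form:
    representation: str  # the actual string
    grammatical_markers: List[str] = field(default_factory=list)  # Item IDs
    statements: List[Statement] = field(default_factory=list)


@dataclass
class Sense:
    glosses: Dict[str, str] = field(default_factory=dict)  # one gloss per language code
    statements: List[Statement] = field(default_factory=list)


@dataclass
class Lexeme:
    lemma: str             # mostly for display purposes
    lexical_category: str  # Item ID, e.g. the Item for "verb"
    language: str          # Item ID, e.g. the Item for "English"
    forms: List[Form] = field(default_factory=list)
    senses: List[Sense] = field(default_factory=list)
    statements: List[Statement] = field(default_factory=list)


def representations(lexeme: Lexeme) -> List[str]:
    """Return the representation string of every Form, in order."""
    return [form.representation for form in lexeme.forms]


# All identifiers below are placeholders, not real Item or Property IDs.
lexeme = Lexeme(
    lemma="run",
    lexical_category="Q_VERB",
    language="Q_ENGLISH",
    forms=[
        Form("run", grammatical_markers=["Q_INFINITIVE"]),
        Form("ran", grammatical_markers=["Q_PAST_TENSE"],
             statements=[Statement("P_REGION", "Q_SOME_REGION")]),
    ],
    senses=[
        Sense(glosses={"en": "to move quickly on foot"},
              statements=[Statement("P_SYNONYM", "L_SPRINT")]),
    ],
    statements=[Statement("P_DERIVED_FROM", "L_SOME_OLDER_LEXEME")],
)
```

In this sketch, `representations(lexeme)` collects the strings of all Forms. Keeping the lexical category and language as Item identifiers rather than free text mirrors the "from Item space" notes in the outline.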