Extension:WikibaseLexeme/Data Model/cs

This is a living document, describing the conceptual data model used by WikibaseLexeme. It is not a specification of any concrete binding, implementation, mapping, or serialization.


 * Lexeme:
   * Lemma
   * Language
   * Lexical category
   * Statements
   * Forms:
     * Representation
     * Grammatical Features
     * Statements
   * Senses:
     * Gloss
     * Statements

This page describes the structure of the data that the Wikibase software handles as lexemes, such as words and phrases. Although such data could in theory be represented using ordinary Items, a specialized data model is simpler, makes the stored data more reusable, and makes it easier to link to other dictionaries. This data model is abstract ("Which data must be supported?") and does not specify how the data should be represented technically ("Which data structures should the software use?") or syntactically ("How should the data be expressed in a file?"). Other documents describe the serialization of the Wikibase data model in JSON and RDF (Resource Description Framework).

The Lexeme data model also defines the basic concepts and relationships needed to describe lexemes, which together form a fixed ontology. This ontology provides a minimal structure that allows Items and Statements to be used to model a lexeme in detail. The specification of the Lexeme data model is based on the Wikibase data model, so the Wikidata glossary and the Wikidata data model primer may help you understand this page. Where useful and practical, the Lexeme data model aims for compatibility with the LEMON model created by the W3C Ontolex community group. In the spirit of Wikibase, however, the Lexeme data model is designed to be simple and flexible enough for collaborative editing, in contrast to the more formal approach taken by LEMON.

Lexeme


A Lexeme is a lexical element of a language, such as a word, a phrase, or a prefix (see Lexeme on Wikipedia). Lexemes are Entities in the sense of the Wikibase data model. A Lexeme is described using the following information:


 * An ID. Lexemes have IDs starting with an "L" followed by a natural number in decimal notation, e.g. . These IDs are unique within the repository that manages the Lexeme. The ID can be combined with a repository's concept base URI to form a unique URI for the Lexeme.
 * A Lemma for use as a human-readable representation of the lexeme, e.g. "run".
 * The Language to which the lexeme belongs. This is given as a reference to a concrete Item, e.g. Q1860 for English.
 * The Lexical category to which the lexeme belongs. This is given as a reference to a concrete Item, e.g. Q34698 for adjective.
 * A list of Statements to describe properties of the lexeme that are not specific to a Form or Sense (e.g. derived from, grammatical gender, or syntactic function).
 * A list of Forms, typically one for each relevant combination of grammatical features, such as 2nd person / singular / past tense.
 * A list of Senses, describing the different meanings of the lexeme (e.g. "financial institution" and "edge of a body of water" for the English noun bank).
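
The components listed above can be sketched as a small record type. This is only an illustration, not a binding or serialization defined by WikibaseLexeme: the class name `Lexeme`, its field names, the ID "L1", and the lemma "hard" are assumptions made for the example, and "natural number" is read here as a positive integer without leading zeros. The base URI shown is the one used by Wikidata; other repositories have their own.

```python
import re
from dataclasses import dataclass, field

# Assumption: a "natural number in decimal notation" is a positive integer
# written without leading zeros.
LEXEME_ID = re.compile(r"L[1-9][0-9]*")


@dataclass
class Lexeme:
    """Hypothetical in-memory shape of a Lexeme; field names are illustrative."""
    id: str                 # "L" followed by a number
    lemma: dict             # spelling variant code -> text
    language: str           # Item ID, e.g. "Q1860" for English
    lexical_category: str   # Item ID, e.g. "Q34698" for adjective
    statements: list = field(default_factory=list)
    forms: list = field(default_factory=list)
    senses: list = field(default_factory=list)

    def __post_init__(self):
        if not LEXEME_ID.fullmatch(self.id):
            raise ValueError(f"not a Lexeme ID: {self.id!r}")
        if not self.lemma:
            raise ValueError("a Lemma needs at least one variant")

    def concept_uri(self, base_uri):
        """Combine the repository's concept base URI with the ID."""
        return base_uri + self.id


# Illustrative data: an English adjective with a made-up ID.
lexeme = Lexeme("L1", {"en": "hard"}, "Q1860", "Q34698")
print(lexeme.concept_uri("http://www.wikidata.org/entity/"))
# prints http://www.wikidata.org/entity/L1
```

Forms and Senses are kept as plain lists here because their own structure is described in the sections that follow.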

Lemma
The lemma is a human-readable representation of the lexeme (see Lemma on Wikipedia). Typically, the canonical form of the lexeme (e.g. the infinitive form of verbs) will be used as the lemma (see also lemon:canonicalForm). Lemmas are not simple strings, but MultilingualTextValues, since the same lemma may have multiple spellings. This is especially important for languages that use multiple scripts, such as Serbian and Japanese.

A Lemma cannot be entirely empty; at least one variant has to be provided.

Note: Lemmas are not unique, nor is the combination of Lemma, Language, and Lexical category. Two distinct lexemes with the same lemma and lexical category can exist in the same language if they differ in other data, such as gender, etymology, or morphology (different forms).
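
A practical consequence of the note above is that a lookup keyed on Lemma, Language, and Lexical category has to return a list of lexemes rather than a single one. The sketch below illustrates this with plain dictionaries. The helper `make_lemma`, the IDs "L1" and "L2", and the sample data are hypothetical, and a MultilingualTextValue is approximated as a mapping from a variant code to text.

```python
from collections import defaultdict


def make_lemma(variants):
    """Keep the non-empty spelling variants; reject an entirely empty Lemma."""
    lemma = {code: text for code, text in variants.items() if text}
    if not lemma:
        raise ValueError("a Lemma needs at least one spelling variant")
    return lemma


# Hypothetical lexemes: two English adjectives that share the lemma "light"
# but differ in etymology ("not heavy" vs. "pale, bright").
lexemes = [
    {"id": "L1", "lemma": make_lemma({"en": "light"}),
     "language": "Q1860", "lexical_category": "Q34698"},
    {"id": "L2", "lemma": make_lemma({"en": "light"}),
     "language": "Q1860", "lexical_category": "Q34698"},
]

# Each key maps to a list of IDs because the combination is not unique.
index = defaultdict(list)
for lexeme in lexemes:
    key = (tuple(sorted(lexeme["lemma"].items())),
           lexeme["language"], lexeme["lexical_category"])
    index[key].append(lexeme["id"])

for key, ids in index.items():
    print(key, "->", ids)
```

Only the lexeme ID identifies a lexeme unambiguously; the key above is useful for search, not for identity.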

Form
The morphology of the lexeme is understood as a set of Forms. Each form defines how a lexeme changes based on a specific syntactic role or mode it may take in a sentence (see also lemon:Form).

A Form is described using the following information:


 * An ID. Forms have IDs starting with the ID of the Lexeme they belong to, followed by a hyphen ("-") and an "F", followed by a natural number in decimal notation: e.g. . These IDs are unique within the repository that manages the Lexeme. The ID can be combined with a repository's concept base URI to form a unique URI for the Form.
 * A representation, spelling out the Form as a string.
 * A list of grammatical features that define the syntactic role to which the given form applies. These are given as references to concrete Items, e.g. Q814722 for participle.
 * A list of Statements further describing the Form or its relations to other Forms or Items (e.g. pronunciation audio, rhymes with, used until, used in region).
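
The Form ID scheme described above lends itself to a simple pattern check. The following is a minimal sketch, not part of WikibaseLexeme: the function names `parse_form_id` and `form_uri` and the ID "L1-F2" are made up for illustration, and the numbers are assumed to be positive integers without leading zeros.

```python
import re

# Assumption: both numbers are positive and written without leading zeros.
FORM_ID = re.compile(r"(L[1-9][0-9]*)-F([1-9][0-9]*)")


def parse_form_id(form_id):
    """Split a Form ID into (ID of the owning Lexeme, form number)."""
    match = FORM_ID.fullmatch(form_id)
    if match is None:
        raise ValueError(f"not a Form ID: {form_id!r}")
    return match.group(1), int(match.group(2))


def form_uri(base_uri, form_id):
    """Combine a repository's concept base URI with a validated Form ID."""
    parse_form_id(form_id)  # raises ValueError if the ID is malformed
    return base_uri + form_id


print(parse_form_id("L1-F2"))  # prints ('L1', 2)
```

Because the Lexeme ID is a prefix of the Form ID, the owning Lexeme can be recovered from a Form ID without any lookup.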

Representation
A form's Representation is its written form, as used in a text (compare lemon:writtenRep). Just like Lemmas, Representations are not simple strings, but MultilingualTextValues, since the same form may have multiple spellings, possibly in multiple scripts.

A Representation cannot be entirely empty; at least one variant has to be provided.

Grammatical Feature
A form's grammatical features specify under which conditions or in which syntactic role that form is used (see lexinfo:morphosyntacticProperty and grammatical category on Wikipedia). Multiple grammatical features can be combined to express under which conditions the language's grammar requires a given form to be used. Grammatical features are represented as references to Items.
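
One way to picture how combined grammatical features pick out a form is as a subset test: a form matches when it carries every requested feature. The sketch below assumes that reading. The function `select_forms` is hypothetical, Q814722 (participle) comes from the text above, and the Item IDs used for "present" and "past" are placeholders invented for the example, not real references.

```python
PARTICIPLE = "Q814722"  # Item ID given in the text above
PRESENT = "Q900001"     # hypothetical placeholder, not a real reference
PAST = "Q900002"        # hypothetical placeholder, not a real reference

# Illustrative forms of the English verb "run": (representation, features).
forms = [
    ("running", {PARTICIPLE, PRESENT}),
    ("run", {PARTICIPLE, PAST}),
    ("ran", {PAST}),
]


def select_forms(forms, required_features):
    """Return the representations of forms that carry all required features."""
    required = frozenset(required_features)
    return [rep for rep, features in forms if required <= frozenset(features)]


print(select_forms(forms, {PARTICIPLE}))        # prints ['running', 'run']
print(select_forms(forms, {PARTICIPLE, PAST}))  # prints ['run']
```

Features are held in sets because, as far as this page describes them, their order carries no meaning.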

Sense
The senses of a lexeme are different meanings which it may represent in a text. The senses are given as natural language definitions or glosses (compare intensional definitions on Wikipedia).

A sense is described using the following information:
 * An ID. Senses have IDs starting with the ID of the Lexeme they belong to, followed by a hyphen ("-") and an "S", followed by a natural number in decimal notation: e.g. . These IDs are unique within the repository that manages the Lexeme. The ID can be combined with a repository's concept base URI to form a unique URI for the Sense.
 * A Gloss, defining the meaning of the Sense using natural language.
 * A list of Statements further describing the Sense and its relations to Senses and Items (e.g. translation, synonym, antonym, connotation, register, denotes, evokes).

Gloss
A sense's gloss gives a natural language definition of the sense (see Gloss on Wikipedia and skos:definition). Similar to Lemmas, Glosses are not simple strings, but MultilingualTextValues. However, the reason is not to support spelling variants, but to allow the gloss to be given in entirely different languages. For example, it would be quite useful for a German speaker learning French to have a German gloss for a French word.

A Gloss cannot be entirely empty; at least one language has to be provided.
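
The German learner scenario can be sketched as a lookup over the languages a reader understands, in order of preference. This is an illustration under assumptions rather than anything WikibaseLexeme prescribes: the function `pick_gloss` is hypothetical, the gloss is approximated as a mapping from language code to text, and the sample glosses for a French word meaning "bank" are made up.

```python
def pick_gloss(glosses, preferred_languages):
    """Return (language, text) for the first preferred language that has a gloss.

    Returns None when none of the preferred languages is available.
    """
    if not glosses:
        raise ValueError("a Gloss needs at least one language")
    for language in preferred_languages:
        if language in glosses:
            return language, glosses[language]
    return None


# Illustrative glosses for one sense of a French noun.
glosses = {
    "fr": "établissement financier",
    "de": "Geldinstitut",
}

print(pick_gloss(glosses, ["de", "en"]))  # prints ('de', 'Geldinstitut')
```

Returning None instead of falling back to an arbitrary language leaves it to the caller to decide what to show a reader who understands none of the available glosses.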