Extension:WikibaseLexeme/Data Model/ko

This is a living document, describing the conceptual data model used by WikibaseLexeme. It is not a specification of any concrete binding, implementation, mapping, or serialization.


 * Lexeme:
   * Lemma
   * Language
   * Lexical category
   * Statements
   * Forms:
     * Representation
     * Grammatical Features
     * Statements
   * Senses:
     * Gloss
     * Statements

The data model of WikibaseLexeme describes the structure of the data that is handled as "Lexemes" in Wikibase, such as words and phrases. While it would be theoretically possible to model these things using Items, a more expressive specialized model helps to reduce complexity, and improve re-use and mappings to other vocabularies.

This data model is conceptual ("Which information do we have to support?") and does not specify how this data should be represented technically ("Which data structures should the software use?") or syntactically ("How should the data be expressed in a file?"). Separate documents describe the serialization of the Wikibase data model in JSON and in RDF (Resource Description Framework).

The Lexeme data model defines basic concepts and relationships needed to describe lexemes, which act as a fixed ontology. This ontology provides a minimal scaffolding that allows Items and Statements to be used for detailed modeling of a lexeme.

The specification of the Lexeme data model is based on the Wikibase data model, so the Wikidata glossary and the Wikibase data model primer may be helpful in understanding this document.

The Lexeme data model aims to align with the LEMON model by the Ontolex W3C community group, where useful and practical. However, in the spirit of Wikibase, the Lexeme model is designed to be simple and flexible enough for casual collaborative editing, as opposed to the more formalized approach taken by LEMON.

Lexeme


A Lexeme is a lexical element of a language, such as a word, a phrase, or a prefix (see Lexeme on Wikipedia). Lexemes are Entities in the sense of the Wikibase data model. A Lexeme is described using the following information:


 * An ID. Lexemes have IDs starting with an "L" followed by a natural number in decimal notation, e.g. . These IDs are unique within the repository that manages the Lexeme. The ID can be combined with a repository's concept base URI to form a unique URI for the Lexeme.
 * A Lemma for use as a human readable representation of the lexeme, e.g. "run".
 * The Language to which the lexeme belongs. This is a reference to a concrete Item, e.g. Q1860 for English.
 * The Lexical category to which the lexeme belongs. This is given as a reference to a concrete Item, e.g. Q34698 for adjective.
 * A list of Statements to describe properties of the lexeme that are not specific to a Form or Sense (e.g. derived from or grammatical gender or syntactic function).
 * A list of Forms, typically one for each relevant combination of grammatical features, such as 2nd person / singular / past tense.
 * A list of Senses, describing the different meanings of the lexeme (e.g. "financial institution" and "edge of a body of water" for the English noun bank).
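The fields listed above can be pictured as a small record type. The following is only a minimal sketch for illustration, not the WikibaseLexeme implementation: the class name, field names, the ID `L1`, the lemma "green", and the base URI `http://example.org/entity/` are all hypothetical, and rejecting `L0` and leading zeros is an assumption about what "natural number in decimal notation" means.

```python
import re
from dataclasses import dataclass, field

# Assumption: "L" followed by a positive integer without leading zeros.
LEXEME_ID = re.compile(r"L[1-9][0-9]*")


@dataclass
class Lexeme:
    id: str                # e.g. "L1" (hypothetical ID)
    lemmas: dict           # language code -> spelling
    language: str          # Item ID of the language, e.g. "Q1860"
    lexical_category: str  # Item ID of the category, e.g. "Q34698"
    statements: list = field(default_factory=list)
    forms: list = field(default_factory=list)
    senses: list = field(default_factory=list)

    def __post_init__(self):
        # Reject IDs that do not follow the "L" + number pattern.
        if not LEXEME_ID.fullmatch(self.id):
            raise ValueError(f"not a Lexeme ID: {self.id!r}")

    def concept_uri(self, concept_base_uri: str) -> str:
        # The ID is appended to the repository's concept base URI.
        return concept_base_uri + self.id


lexeme = Lexeme(
    id="L1",
    lemmas={"en": "green"},
    language="Q1860",           # English, as in the text above
    lexical_category="Q34698",  # adjective, as in the text above
)
print(lexeme.concept_uri("http://example.org/entity/"))
```

Forms and Senses are left as plain lists here because they have their own structure, described in the sections that follow.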

Instance of
In Wikidata, editors generally use the most general lexical category possible (e.g. affix) and then describe the specific type of affix with an instance of statement.

Usage examples
In Wikidata, the community decided to keep usage examples in one place, on the lexeme itself, so that editors know where to look for them. Two "demonstrates" properties, d:Property:P5830 and d:Property:P6072, link each example to the correct sense and form. A lexeme can have multiple examples, for instance from different time periods (e.g. different centuries), in formal and informal registers, and from written and spoken language.

Lemma
The lemma is a human readable representation of the lexeme (see Lemma on Wikipedia). Typically, the canonical form of the lexeme (e.g. the infinitive form of verbs) will be used as the lemma (see also lemon:canonicalForm). Lemmas are not simple strings, but MultilingualTextValues, since the same lemma may have multiple spellings. This is especially important for languages that use multiple scripts, such as Serbian and Japanese.

At least one lemma must be provided.

Note: Lemmas are not unique, nor is the combination of Lemma, Language, and Lexical category. Two distinct lexemes with the same lemma and lexical category can exist in the same language if they differ in other data, such as gender, etymology, or morphology (different forms).
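One practical consequence is that a lookup by lemma, language, and lexical category has to return a list of lexemes rather than a single one. The sketch below illustrates this with plain Python data; the lexeme IDs, the language codes `en` and `en-gb`, the spellings, and the placeholder category ID `Q_NOUN` are hypothetical and chosen only for the example.

```python
from collections import defaultdict

# Hypothetical records: (lexeme ID, lemmas, language Item, lexical category Item).
# A lemma maps language codes to spellings, so one lexeme can carry several.
records = [
    ("L1", {"en": "color", "en-gb": "colour"}, "Q1860", "Q_NOUN"),
    ("L2", {"en": "bank"}, "Q1860", "Q_NOUN"),
    ("L3", {"en": "bank"}, "Q1860", "Q_NOUN"),
]

# Index every spelling; the value is a list because keys are not unique.
index = defaultdict(list)
for lexeme_id, lemmas, language, category in records:
    for spelling in lemmas.values():
        index[(spelling, language, category)].append(lexeme_id)

print(index[("bank", "Q1860", "Q_NOUN")])
print(index[("colour", "Q1860", "Q_NOUN")])
```

Here the two "bank" lexemes share lemma, language, and category and are told apart only by their IDs and their other data.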

Forms
The morphology of the lexeme is understood as a set of Forms. Each form defines how a lexeme changes based on a specific syntactic role or mode it may take in a sentence (see also lemon:Form).

A Form is described using the following information:


 * An ID. Forms have IDs starting with the ID of the Lexeme they belong to, followed by a hyphen ("-") and an "F", followed by a natural number in decimal notation: e.g. . These IDs are unique within the repository that manages the Lexeme. The ID can be combined with a repository's concept base URI to form a unique URI for the Form.
 * A representation, spelling out the Form as a string.
 * A list of grammatical features that define for which syntactic role the given form applies. These are given as references to concrete Items, e.g. Q814722 for participle.
 * A list of Statements further describing the Form or its relations to other Forms or Items (e.g. pronunciation audio, rhymes with, used until, used in region)
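Because a Form ID embeds the ID of its Lexeme, the owning Lexeme can be recovered from the Form ID alone. The following is a minimal sketch of that idea; the function name `parse_form_id` and the ID `L1-F2` are hypothetical, and rejecting zero and leading zeros is an assumption about "natural number in decimal notation".

```python
import re

# Lexeme ID, hyphen, "F", then a number (assumed positive, no leading zeros).
FORM_ID = re.compile(r"(L[1-9][0-9]*)-F([1-9][0-9]*)")


def parse_form_id(form_id: str):
    """Split a Form ID into (lexeme ID, form number)."""
    match = FORM_ID.fullmatch(form_id)
    if match is None:
        raise ValueError(f"not a Form ID: {form_id!r}")
    return match.group(1), int(match.group(2))


lexeme_id, form_number = parse_form_id("L1-F2")
print(lexeme_id, form_number)
```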

Representation
A form's Representation is its written form, as used in a text (compare lemon:writtenRep). Just like Lemmas, Representations are not simple strings, but MultilingualTextValues, since the same form may have multiple spellings, possibly in multiple scripts.

At least one representation must be provided.

Multiple forms with the same representation are allowed, so that usage examples can be added to demonstrate each of them (see the example in Wikidata).

Grammatical features
A form's grammatical features specify under which conditions or in which syntactic role that form is used (see lexinfo:morphosyntacticProperty and grammatical category on Wikipedia). Multiple grammatical features can be combined to express under which conditions the language's grammar requires a given form to be used. Grammatical features are represented as references to Items.
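Since each form carries a set of feature Items, finding the form required in a given context amounts to a subset test on those sets. The sketch below assumes forms stored as plain dictionaries; the form IDs and the feature identifiers (`Q_PAST` and so on) are hypothetical placeholders, not real Item IDs.

```python
# Hypothetical placeholders standing in for grammatical-feature Items.
PAST, PRESENT, THIRD_PERSON, SINGULAR = "Q_PAST", "Q_PRESENT", "Q_3RD", "Q_SG"

# Two illustrative forms of the English verb "run".
forms = [
    {
        "id": "L1-F1",
        "representations": {"en": "runs"},
        "features": frozenset({PRESENT, THIRD_PERSON, SINGULAR}),
    },
    {
        "id": "L1-F2",
        "representations": {"en": "ran"},
        "features": frozenset({PAST}),
    },
]


def forms_matching(forms, required_features):
    """Return the forms whose feature set contains all required features."""
    required = frozenset(required_features)
    return [form for form in forms if required <= form["features"]]


for form in forms_matching(forms, {PRESENT, SINGULAR}):
    print(form["id"], form["representations"]["en"])
```

A subset test is used rather than equality so that a query naming only some features still finds forms that combine them with others.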

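Sense
A Sense of a lexeme is one of its meanings, expressed as text. A Sense is described by a Gloss, that is, an ordinary natural-language definition (see intensional definition on the English Wikipedia).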

In Wikidata, an image is also added to give a culturally adapted illustration of the sense, e.g. of a letterbox or a color, which can vary greatly between cultures.

A sense is described using the following information:
 * An ID. Senses have IDs starting with the ID of the Lexeme they belong to, followed by a hyphen ("-") and an "S", followed by a natural number in decimal notation: e.g. . These IDs are unique within the repository that manages the Lexeme. The ID can be combined with a repository's concept base URI to form a unique URI for the Sense.
 * A Gloss, defining the meaning of the Sense using natural language.
 * A list of Statements further describing the Sense and its relations to Senses and Items (e.g. item for this sense, synonym, antonym, connotation, register, denotes, evokes).

Gloss
A Gloss is a natural-language definition of the meaning of a sense (see Gloss on the English Wikipedia and skos:definition).

Similar to Lemmas, Glosses are not simple strings, but MultilingualTextValues. However, the reason is not to support spelling variants, but to allow the gloss to be given in entirely different languages. For example, it would be quite useful for a German speaker learning French to have a German gloss for a French sense.

A gloss cannot be left empty; at least one gloss must be provided.
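Because a gloss can exist in several languages, a reader-facing tool has to decide which one to show. The following is a minimal sketch of one possible policy, not something the data model prescribes: the function name `pick_gloss`, the fallback to the first stored gloss, and the French and German gloss texts are assumptions made for illustration.

```python
def pick_gloss(glosses, preferred_languages):
    """Return (language code, gloss text) for the first preferred language
    that has a gloss, falling back to the first stored gloss."""
    if not glosses:
        # The model requires at least one gloss per sense.
        raise ValueError("a sense needs at least one gloss")
    for code in preferred_languages:
        if code in glosses:
            return code, glosses[code]
    # Assumed fallback policy: use whichever gloss was stored first.
    code = next(iter(glosses))
    return code, glosses[code]


# Illustrative glosses for one sense of a French lexeme.
glosses = {"fr": "fruit du pommier", "de": "Frucht des Apfelbaums"}

# A German speaker learning French gets the German gloss.
print(pick_gloss(glosses, ["de", "en"]))
```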

Short glosses of only one or a few words should be avoided, as they leave too much room for interpretation of the meaning.

In Wikidata, Glosses are often very similar to the carefully crafted descriptions on Q-items. For example, for apple, the Q-item's English description "fruit of the apple tree" is copied as the gloss when tools like MachtSinn are used to match lexemes with Q-items and create missing senses.

See also

 * Lexeme data model examples