Topic on Extension talk:WikibaseLexeme/Data Model

Relationships between representations

TJones (WMF) (talkcontribs)

It's not clear how relationships between different representations of different forms of a word will be represented. For example, "color" and "colour" are two representations of color. Similarly, "colors" and "colours" are two representations of colors, which is the plural of color. However, "colours" is not the plural of "color", and "colors" is not the plural of "colours".

Similarly, in Serbo-Croatian, Latin "pȁs" and Cyrillic "пас" (meaning "dog") are two representations of one lexeme, and "psȉ" and "пси̏" are two representations of another, related lexeme (the plural). How can the more specific relationship between pȁs/psȉ and пас/пси̏ be represented?

On a related note, would there be any explicit representation of the fact that "color" is an AmE variant and "colour" is a BrE/CanE variant, while "pȁs" is the Latin variant and "пас" the Cyrillic variant?
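One way to picture the question is a sketch in which each form stores its representations keyed by a variant tag, so that spellings sharing a tag line up across forms. The class `Form`, the helper `paired_representations`, and the tags `en-US` and `en-GB` are assumptions made for illustration only; nothing here is taken from the proposed data model.

```python
from dataclasses import dataclass, field


@dataclass
class Form:
    """One form of a word, e.g. singular or plural (hypothetical structure)."""
    grammatical_features: tuple[str, ...]
    # Maps an assumed variant tag (e.g. "en-US") to the spelling used in that variant.
    representations: dict[str, str] = field(default_factory=dict)


def paired_representations(a: Form, b: Form) -> dict[str, tuple[str, str]]:
    """Pair up the representations of two forms that share a variant tag."""
    shared = a.representations.keys() & b.representations.keys()
    return {tag: (a.representations[tag], b.representations[tag]) for tag in sorted(shared)}


singular = Form(("singular",), {"en-US": "color", "en-GB": "colour"})
plural = Form(("plural",), {"en-US": "colors", "en-GB": "colours"})

# Only same-variant spellings are paired: "color" with "colors", "colour" with "colours".
pairs = paired_representations(singular, plural)
for tag, (sg, pl) in pairs.items():
    print(f"{tag}: {sg} -> {pl}")
```

Under this assumption the variant tag would answer both questions at once: it says which variety or script a spelling belongs to, and it ties pȁs to psȉ and пас to пси̏ in the same way it ties "color" to "colors".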

Psychoslave (talkcontribs)

As I understand it, forms are directly embedded properties (I'm not sure "property" matches the Wikidata terminology here, though), not independent items to which the proposed lexeme structure can link.

However, as I understand the glossary definition of property, "Each statement at an item page links to a property, and assigns the property one or several values, or some other relation or composite or possibly missing value", it should be possible to link plural forms and other relations between forms as statements on the lexeme item, shouldn't it?

TJones (WMF) (talkcontribs)

Sounds plausible. As long as there is some way to indicate the relationship. This is going to be such an awesome resource for computational language nerds.

Denny (talkcontribs)

Forms will have identity and can be referred to and linked to directly. They are not independent of the Lexemes (they always belong to one and only one Lexeme and depend on the existence of that Lexeme), but they still get an identity and can be directly linked to, which is useful for such properties as "rhymes with" or "anagram of".
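A minimal sketch of that idea, assuming a hypothetical ID scheme in which a form's ID is derived from the ID of its owning lexeme. The names `Lexeme`, `Form`, `add_form`, `owning_lexeme_id`, and the `L1-F1` ID format are illustrative assumptions, not a description of the actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Form:
    form_id: str          # assumed scheme: "<lexeme id>-F<n>"
    representation: str


@dataclass
class Lexeme:
    lexeme_id: str
    lemma: str
    forms: dict[str, Form] = field(default_factory=dict)
    _next_form: int = 1

    def add_form(self, representation: str) -> Form:
        """Create a form whose identity is derived from its owning lexeme."""
        form = Form(f"{self.lexeme_id}-F{self._next_form}", representation)
        self._next_form += 1
        self.forms[form.form_id] = form
        return form


def owning_lexeme_id(form_id: str) -> str:
    """Recover the owning lexeme's ID from a form ID (under the assumed scheme)."""
    return form_id.split("-F")[0]


# A statement here is a (subject form ID, property, object form ID) triple.
statements: list[tuple[str, str, str]] = []

cat = Lexeme("L1", "cat")
hat = Lexeme("L2", "hat")
cats = cat.add_form("cats")
hats = hat.add_form("hats")

# Forms are linked directly by their own IDs, not via their lexemes.
statements.append((cats.form_id, "rhymes with", hats.form_id))
```

Because the form ID embeds the lexeme ID in this sketch, a form cannot exist without its lexeme, yet a statement can still point at the form itself.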

Psychoslave (talkcontribs)
Denny (talkcontribs)

Possible? Technically: yes. Practically: probably not. The data model is not the main problem in this case. The community would need to agree on a language to use for the Voynich manuscript and approve that language code for inclusion; then we would need to add items for "unknown grammatical function", enter every occurrence of a token as a form with unknown grammatical markers, and connect them to unlabeled lexemes. Technically there is no problem, since the data model is certainly flexible enough to accommodate the use case; practically, I don't see that happening in Wikidata itself.

But I could totally see it happening in an external instance, and in fact it would be a great use case, as the statements model is very flexible and would allow adding competing theories and references, pointing to the occurrences, etc. This could all be a rather nice collaborative tool for people trying to decipher the Voynich manuscript and to collect all that is currently known and theorized.

But unless the Wiktionaries actually already try to cover the language of the Voynich manuscript, I do have to wonder whether this is actually a requirement for the data model, or whether this is merely a theoretical question.
