User:Henning (WMDE)/Wikibase/Glossary

Warning: This glossary is maintained by the WIKIBASE developers and should not be edited by anyone else without prior consent. The Wikidata glossary is at Wikidata:Help:Glossary.

In order to avoid misconceptions, this glossary should not be translated. For the same reason, do not hesitate to contact the developers if you have any questions regarding a glossary term or a WIKIBASE concept, if something needs clarification or if you have a suggestion. The glossary will be improved based on your feedback.

Alias

An Alias is an alternative name the subject represented by an Entity is known under. Technically, Aliases are represented by Alias Groups. The purpose of an Alias is to ease identifying, searching and finding Entities as the search algorithm considers Aliases in addition to the Label. The term Alias derives from a technical origin and, hence, may be misleading. While a Label is intended to be a phrase that identifies an Entity best, Alias and Label (in the sense of "name") may diverge from the idea of mapping real-world names to Labels and real-world aliases/pseudonyms to Aliases as demonstrated in the following examples.

Theoretical examples for using Aliases:

  • An artist's aliases and nicknames may be specified as Aliases.
  • Different spellings, maybe even common misspellings, of a subject's name.
  • A flower's scientific name may be used as Alias while its common name may be used as Label.
  • The Federal Republic of Germany is more commonly known as, simply, "Germany". Hence, "Germany" may be used as Label while "Federal Republic of Germany" may be used as Alias. This, however, diverges from the concept of mapping the actual real-world name to the Label and the real-world alias to the Alias.
  • Some artists, authors and athletes are most commonly known by an alias. Assigning the real-world alias Pelé as Label to an Item representing the football player eases identifying that Item. However, Pelé's actual name is Edson Arantes do Nascimento which may be added as Alias instead.

The examples above only illustrate the concept of Aliases. How Label and Aliases relate to the real-world naming of subjects represented by Entities is to be defined by the operator(s) of a WIKIBASE Repository.

Alias Group

Alias Group is a technical concept representing multiple phrases (multiple strings) in a specific language (as compared to a Term which represents a single string in a specific language). Alias Groups are the technical representation of Aliases with each Alias Group containing all Aliases of an Entity in one particular language.
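The difference between a Term (one string per language) and an Alias Group (several strings per language) can be sketched as follows. This is a minimal illustration, not the actual WIKIBASE data model implementation: the class and field names are hypothetical, and ignoring duplicate aliases is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Term:
    """A single string in a specific language (hypothetical sketch)."""
    language: str
    text: str


@dataclass
class AliasGroup:
    """All Aliases of one Entity in one particular language (hypothetical sketch)."""
    language: str
    aliases: list[str] = field(default_factory=list)

    def add(self, alias: str) -> None:
        # Assumption for this sketch: an alias is stored only once per group.
        if alias not in self.aliases:
            self.aliases.append(alias)


# The Label is a single Term; the Aliases share one group per language.
label = Term("en", "Pelé")
group = AliasGroup("en")
group.add("Edson Arantes do Nascimento")
group.add("Edson Arantes do Nascimento")  # duplicate, ignored
```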

Badge

A Badge may be assigned to a Site Link and represents specific information regarding the linked Page. Badges are configured per WIKIBASE Repository and may be used to, for example, express some status of the linked Page.

Claim

A Claim is encapsulated by a Statement and features a Main Snak along with optional Qualifiers. A Claim is identified by its Globally Unique Identifier (GUID). As the name suggests, a Claim is a claim about the subject represented by an Item or Property while a Claim itself does not concern credibility or validity. Such attributes are directly expressed by References that may be applied to a Claim forming a Statement.

Data Type

Like a common data type, a WIKIBASE Data Type is a classification identifying one of various types of data and determines the possible values that may be assigned to that Data Type. The constraint determining possible values is provided by the Data Value derivative that is assigned to the Data Type. (For example, the url Data Type is powered by a string Data Value. Hence, apart from additional validation, any value conforming to the string Data Value may be assigned to the url Data Type.) Each Property features one particular Data Type that is specified when creating the Property and may not be changed afterwards. The available Data Types are defined in the WIKIBASE Repository's configuration.

The bottom line is that values (strings, numbers, dates etc.) assigned to a particular Property have to conform to the Property Data Type's Data Value specification.
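The relation between Data Type and Data Value may be sketched like this. The mapping and the check functions are hypothetical and only mirror the url/string example given above; the time entry is an assumption, and the actual WIKIBASE configuration and validation are not reproduced here.

```python
from datetime import datetime

# Hypothetical mapping: each Data Type is backed by one Data Value type.
DATA_TYPE_TO_VALUE_TYPE = {
    "string": "string",
    "url": "string",   # from the glossary: url is powered by a string Data Value
    "time": "time",    # assumption made for this sketch
}

# Hypothetical conformance checks per Data Value type.
VALUE_TYPE_CHECKS = {
    "string": lambda value: isinstance(value, str),
    "time": lambda value: isinstance(value, datetime),
}


def conforms(data_type: str, value: object) -> bool:
    """Check a value against the Data Value type backing the given Data Type."""
    value_type = DATA_TYPE_TO_VALUE_TYPE[data_type]
    return VALUE_TYPE_CHECKS[value_type](value)
```

Without additional validation, `conforms("url", "not a url")` is true in this sketch, because any string conforms to the string Data Value.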

Data Value

A Data Value is a technical concept that allows capturing values of specific Data Value types (not to be confused with Data Types). Data Value instances are containers for actual values. One of the simplest Data Values is the string Data Value that captures strings while the more complex Data Value of type time captures date/time values. See the Data Values page for detailed information on supplied Data Values.

Description

A Description is a language-specific descriptive phrase for an Entity captured by a Term. It provides context for the Label and, by doing that, eases identifying, searching and finding Entities as Descriptions ensure disambiguation between Entities featuring the same Label. The combination of Description and Label must be unique within a language. As there is no other purpose than the ones mentioned, a Description should be as short and precise as possible.

Theoretical examples of Descriptions:

  • "German football player"
  • "third planet closest to the Sun in the Solar System"
  • "1963 murder of US President"
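The rule that the combination of Label and Description must be unique within a language could be checked along these lines. This is only a sketch: the function name, the input layout and the Entity IDs are hypothetical and do not reflect how a WIKIBASE Repository enforces the constraint.

```python
from collections import defaultdict


def find_conflicts(entities):
    """Return Entity IDs that share Label and Description in the same language.

    `entities` maps a (placeholder) Entity ID to a dict of
    language -> (label, description).
    """
    seen = defaultdict(list)
    for entity_id, terms in entities.items():
        for language, (label, description) in terms.items():
            seen[(language, label, description)].append(entity_id)
    # Only combinations used by more than one Entity violate the rule.
    return {key: ids for key, ids in seen.items() if len(ids) > 1}


entities = {
    "Q1": {"en": ("Berlin", "capital of Germany")},
    "Q2": {"en": ("Berlin", "city in New Hampshire, United States")},
    "Q3": {"en": ("Berlin", "capital of Germany")},
}
conflicts = find_conflicts(entities)
```

Here the two Entities labelled "Berlin" with different Descriptions are fine, while the two placeholder Entities sharing both Label and Description are reported.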

Entity

Entity is the base of Item, Property and Query. Entities are rendered as individual pages in the WIKIBASE Repository. The URIs of those pages are defined by the Entity ID: <Base URL of the WIKIBASE Repository>/entity/<Entity ID>. Apart from the Entity ID, an Entity features a Fingerprint that contains objects that ease identifying, finding and distinguishing Entities aside from their plain Entity IDs.
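The URI pattern above can be expressed as a small helper. The function name, the base URL and the Entity ID used here are placeholders chosen for illustration.

```python
def entity_uri(base_url: str, entity_id: str) -> str:
    """Build <Base URL of the WIKIBASE Repository>/entity/<Entity ID>."""
    # Strip a trailing slash so the result never contains "//entity".
    return f"{base_url.rstrip('/')}/entity/{entity_id}"


# Placeholder base URL and Entity ID, not taken from a real Repository.
uri = entity_uri("https://wikibase.example/", "Q42")
```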

Entity Terms

Entity Terms is an alternative name for Fingerprint used in the JavaScript Data Model implementation.

Entity ID

An Entity ID is the unique identifier of an Entity. The Entity ID consists of a number prefixed by a letter specific to the Entity type. Prefixes are:

Fingerprint

With Label, Description and Aliases, a Fingerprint contains objects that ease identifying, finding and distinguishing Entities aside from their plain Entity IDs.

Item

Item is an Entity that features Site Links and Statements. Items may be regarded as the containers for the actual contents of a WIKIBASE Repository. Each Item represents some specific subject about which Statements are aggregated.

Theoretical examples of subjects that may be represented by Items:

  • Some specific place, e.g. the city of Berlin, Germany.
  • Some person, e.g. the author George Orwell.
  • Some object, e.g. the book "Nineteen Eighty-Four" by George Orwell.
  • Some event, e.g. the Eurovision Song Contest 1993.
  • Some performance, e.g. the Greek entry at the Eurovision Song Contest 1993.

Label

A Label is a language-specific name of an Entity captured by a Term. While a Label itself does not have to be unique, the combination of Label and Description must be unique within a language. A Label is intended to be the most common or easily understandable phrase an Entity is known under since a Label's purpose is to ease identifying, searching and finding Entities by not having to deal with Entity IDs. By applying this intended concept in a WIKIBASE Repository, a Label is not guaranteed to be of a specific origin. That said, the relationship between Label and Aliases may be misleading as the term Alias derives from a purely technical origin that is supposed to capture Label alternatives. Hence, mapping real-world names to Entity Labels and real-world aliases/pseudonyms to Entity Aliases may not be intended to be applied as demonstrated in the following examples.

Theoretical examples of Labels as to the intention of Label being the most common name a subject is known under:

  • "Berlin"; There is more than one city named "Berlin" in the world. However, each of those cities is most commonly known under the name "Berlin". Proper Descriptions should ensure disambiguation.
  • "Germany"; Even though the country's formal name is "Federal Republic of Germany", the country is commonly known as "Germany". "Federal Republic of Germany" should at least be an Alias though.
  • "George Orwell"; Eric Arthur Blair is commonly known under his pen name. However, "Eric Arthur Blair" should at least be added as an Alias.

As the Label is not guaranteed to be the actual official or "given" name of an Entity, it is good practice to always additionally specify an official or "given" name by a Statement on the Entity.

The examples above only illustrate the concept of Label. How Label and Aliases relate to the real-world naming of subjects represented by Entities is to be defined by the operator(s) of a WIKIBASE Repository.

Main Snak

Main Snak is a figurative term for the base Snak that is encapsulated by a Claim. The Claim consisting of this Snak may be limited or refined by applying additional Snaks as Qualifiers to it. The term Main Snak is used only to stress the differentiation between the Snak being the Main Snak and those additional Qualifier Snaks. Technically, both Main Snak and Qualifiers are Snaks.

Multi Term

Multi Term is an alternative name for Alias Group used in the JavaScript Data Model implementation.

Page

In the context of a Site Link, a Page, specified by its name on the Site, is the specific target of a Site Link.

Property

Property is an Entity that features a Data Type and Statements. A Property's Data Type may be defined only when creating the Property and may not be changed afterwards. Properties are used to create Snaks that form Claims and Statements.

Theoretical examples of Properties:

  • "height"
  • "date of birth"
  • "motto"
  • "geographic location"
  • "web site"

Qualifier

A Qualifier is a Snak defining a Main Snak's circumstance. It limits or refines the statement expressed by the Main Snak. Hence, other than the name suggests, a Qualifier does not qualify a Main Snak as it does not attribute any quality to it.

Theoretical examples of using a Qualifier:

  • A Main Snak specifying a mountain's height may be refined by adding the year of the measurement as Qualifier.
  • The pseudonym an author used for a particular work may be added as a Qualifier to a Main Snak referencing a work by that author.
  • If a person had an occupation for a limited time only, the time period may be specified by adding start date and end date as Qualifiers on the Main Snak specifying the occupation.

Detailed theoretical example of using a Qualifier: A football player scored a particular number of goals in a particular season. Assuming there is an Item representing the football player, there are several methods to capture that information:

  • Using a Property goals scored would allow specifying a Main Snak featuring the number of goals. Adding a Qualifier using a Property season allows limiting the Main Snak's value to the particular season. With the help of Qualifiers, the Property goals scored could also be used to track the overall goals the player scored during his career or while playing in a particular division.
  • Using a Property played in season would allow specifying a Main Snak featuring the season. Adding a Qualifier using a Property goals scored allows specifying the number of goals the player scored during that season. The Property played in season would allow listing all the seasons the player participated in.

The example is supposed to demonstrate that information may be captured by various structures. Which structure is (which structures are) supposed to be used is out of WIKIBASE's scope and is to be defined by the operator(s) of a WIKIBASE Repository.

One particular standard WIKIBASE imposes on the structure of Qualifiers is that Qualifiers featuring the same Property are visually grouped in the WIKIBASE Repository's user interface. The intended representation of values is demonstrated by the following example:

A Property mayor is supposed to list the current as well as all former mayors on an Item representing some city. A particular person was elected mayor three times. However, that mayor's terms did not occur one after another, but were interrupted by a term of another person being elected mayor. Technically, there are three options to model this circumstance:

  • Each mayor is referenced just one single time in the list. Qualifiers specify multiple start and end dates for every mayor that was elected more than once. While this would be the format producing the least overhead, the list generated in the user interface would appear unorganized as the mayorships are not naturally ordered by time.
  • Each mayor is referenced for each term with Qualifiers specifying the start and end date of each mayor's term. While this format would instantly contain additional information (the number of terms during a specific time range; consider resignations!), the list might become quite long.
  • Each mayor is listed once for every consecutive accumulation of terms. While this format would prevent a list from growing too large, it does not benefit from capturing additional information.

As to the standard imposed on the WIKIBASE Repository's user interface, the first format is not recommended. While the format for representing such information should be clearly defined per WIKIBASE Repository instance, the decision on such a standard is up to the operator(s) of each WIKIBASE Repository.
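The difference between the second format (one Statement per term) and the third format (one Statement per run of consecutive terms) can be sketched as follows. The class, the field names, the persons and the years are made up for illustration; in a WIKIBASE Repository, start and end would be Qualifiers on a Main Snak using the mayor Property.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MayorStatement:
    mayor: str   # value of the Main Snak (hypothetical "mayor" Property)
    start: int   # Qualifier: start year of the term
    end: int     # Qualifier: end year of the term


def merge_consecutive(terms):
    """Turn per-term Statements into one Statement per consecutive run of terms."""
    merged = []
    for term in sorted(terms, key=lambda t: t.start):
        last = merged[-1] if merged else None
        if last and last.mayor == term.mayor and last.end == term.start:
            # Same mayor, directly following term: extend the previous entry.
            merged[-1] = MayorStatement(last.mayor, last.start, term.end)
        else:
            merged.append(term)
    return merged


# Second format: Person A was elected three times, interrupted by Person B.
per_term = [
    MayorStatement("Person A", 1990, 1994),
    MayorStatement("Person B", 1994, 1998),
    MayorStatement("Person A", 1998, 2002),
    MayorStatement("Person A", 2002, 2006),
]
# Third format: the two consecutive terms of Person A collapse into one entry.
per_run = merge_consecutive(per_term)
```

The merged list is shorter, but the number of terms served between 1998 and 2006 can no longer be read from it, which is the trade-off described above.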

Query

(not implemented yet)

A Query is an Entity that defines a search across Items in terms of being a descriptor for a search instead of being a container for the actual hits generated by the search.

Rank

Figuratively, Ranks apply weight to Statements. Their purpose is to focus on the most up-to-date and most correct Statements, both in the visualization in Entity renderings and in the results of Queries that do not intentionally bypass the ranking mechanism. Other than the term suggests, Ranks are not supposed to be used for rating or trying to capture a Statement's quality, reliability or likelihood. All of these are expressed directly by a Statement's References. Statements of each Rank may be valid and reliable as to their References.

Apart from the default normal Rank, Statements may be marked with preferred or deprecated Rank:

Normal rank

Being the Rank that Statements are assigned by default, the normal rank represents a neutral state as it does not add weight to or remove weight from a Statement. When issuing a Query on an Item or Property for Statements, the normal ranked Statements are returned for each queried Property not featuring preferred ranked Statements.

Theoretical examples for applying a normal rank:

  • Coordinates of a particular location. As long as there is only one set of coordinates specified, there is no need to apply any Rank other than normal.
  • A football player's past team membership. While the current team membership may be assigned the preferred Rank, the past membership should be assigned the normal Rank.
  • A person being parent to several children. All may remain on normal Rank as long as the expression of the Statement of each child being a child of that person resides on the same level of assurance (there is no more or less of "correct"). (Assigning the preferred Rank to all of those Statements would be correct as well.) Regarding this example, disputed parentage, however, may make ranking complicated.

Preferred rank

When issuing a Query on an Item or Property for Statements, by default, only the preferred Statement(s) is/are returned, provided that the Properties queried feature preferred Statements. (If there are no preferred Statements for a Property, the normal ranked Statements of that Property are returned.) This mechanism provides some sort of convenience since there is no need to figure out the value which most likely would be expected to be returned by the query. Consequently, the preferred Rank should be assigned to most current Statements and/or Statements that represent scientific consensus.

Theoretical examples for applying a preferred Rank:

  • An Item representing a city may feature a list of its current and former mayors. The current mayor would receive the preferred rank.
  • There are several ways to measure the length of a river resulting in different river length according to the method used. On an Item representing a river, the result of the most common method should probably receive the preferred rank.
  • A football player is currently playing in two teams of a football club, the top team and a youth team. While the player was playing for other teams before, the current teams may receive the preferred Rank in contrast to the membership to former teams having assigned the normal Rank.

Just like Ranks are not for rating quality, their nature is not to determine right or wrong. There may be multiple preferred Statements when there is no consensus. A theoretical example in that manner is a politically disputed status of geographic regions.

Deprecated rank

The deprecated Rank is used to mark Statements that are known to include errors or that represent outdated knowledge that has been proven wrong. Marking Statements deprecated instead of simply deleting them maintains integrity and makes users aware not to (re-)add the Statement with another Rank. When issuing a Query on an Item or Property for Statements, deprecated Statements will never be returned unless those are requested specifically. While creating Statements without any Reference may, in general, be problematic, having no or no proper Reference does not by itself qualify a Statement for being assigned the deprecated Rank. The Rank applies to the Claim only, not to the combination of a Claim and its References.

Theoretical examples for applying a deprecated Rank:

  • The earth being the center of the cosmos once was subject of scientific discourse. Although that can be backed by historic sources from that time, the geocentric model is deprecated.
  • An Item representing a city may feature an incorrect population figure that was published in a historical document. Backed by the source, the Statement is not wrong since the figure is accurate according to the historical document. However, since the historical document is known to be erroneous, the deprecated Rank should be applied to the Statement.
  • Some literature suggests that a person was born in a specific state. However, that state did not yet exist when the person was born. A Statement ranked deprecated may be used to capture that information.

Theoretical examples for when to not apply a deprecated Rank:

  • A football player left a particular team. The Statement referencing the team membership is not deprecated as it once was true. Instead, the Statement's Rank may be reset to normal and a Qualifier may be added specifying the date the player left the team.
  • A Statement not featuring any Reference may not automatically be regarded deprecated.
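The default behaviour described for the three Ranks (preferred Statements first, normal Statements as fallback, deprecated Statements never unless requested) can be sketched as a selection function. Class names, function name and sample data are hypothetical; this is not the Query implementation of WIKIBASE.

```python
from dataclasses import dataclass
from enum import Enum


class Rank(Enum):
    PREFERRED = "preferred"
    NORMAL = "normal"
    DEPRECATED = "deprecated"


@dataclass(frozen=True)
class Statement:
    prop: str
    value: str
    rank: Rank = Rank.NORMAL  # normal is the default Rank


def default_result(statements, prop):
    """Return the Statements a default query for `prop` would yield in this sketch."""
    candidates = [s for s in statements if s.prop == prop]
    preferred = [s for s in candidates if s.rank is Rank.PREFERRED]
    if preferred:
        return preferred
    # No preferred Statements: fall back to normal ones, never to deprecated ones.
    return [s for s in candidates if s.rank is Rank.NORMAL]


statements = [
    Statement("mayor", "Former Mayor"),
    Statement("mayor", "Current Mayor", Rank.PREFERRED),
    Statement("population", "100000"),
    Statement("population", "999", Rank.DEPRECATED),
    Statement("motto", "Disproved motto", Rank.DEPRECATED),
]
```

In this sample, a query for mayor yields only the preferred Statement, a query for population yields the normal one, and a query for motto yields nothing because its only Statement is deprecated.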

Reference

Being part of a Statement, a Reference describes a Statement's origin. A Reference is formulated by specifying one or more Snaks. These Snaks may, for example, reference another Item representing a book, specify the page number, reference a URL or formulate the source of a Statement in some other way. WIKIBASE enforces neither a particular schema for a Reference's structure nor any constraints or requirements for References. How References should be formulated is to be defined by the operator(s) of a WIKIBASE Repository.

Site

A Site is an external resource whose Pages Site Links may link to. Sites are configured per WIKIBASE Repository. Site Links may be created only for Sites that are configured.

Site ID

A Site ID references a particular Site configured in a WIKIBASE Repository.

Site Link

A Site Link represents a link to an external Page and, hence, directly connects an Item to that external Page. It features a Site ID, a Page name and, optionally, one or more Badges.

Snak

A Snak is a single, basic assertion referring to a single Property. Its actual meaning depends on the Snak's type.

Snak types

A Snak is always of exactly one of the following types.

No-Value Snak

A No-Value Snak defines a Property specifically having no value. Such a Snak may be used to underline a Claim or Statement in terms of it being rather uncommon for the Property to not feature any value regarding a particular subject.

Theoretical examples that may be captured using a No-Value Snak:

  • An emperor having no children.
  • A sportsman not being member of any team.
  • A country having no national anthem.
  • A species having no teeth.
  • A company not having a web site.

Some-Value Snak

A Some-Value Snak defines a Property having a value, but neither that value nor parts of it are known. In a way, such a Snak acts like a placeholder. It is known there is a specific value but the value itself is not known.

Theoretical examples that may be captured using a Some-Value Snak:

  • A historic person that lived a long time ago but whose date of death is unknown.
  • A company's mail address not being known.
  • The distance to some galaxy, unless it can be expressed by a distance range.
  • An artist's real name that is not known, stressing that his common name is a pseudonym.
  • Coordinates of a perished city.

Value Snak

A Value Snak assigns a specific value (a Data Value) to a Property.

Theoretical examples that may be captured using a Value Snak:

  • A person's date of birth.
  • A mountain's height.
  • A car's top speed.
  • The number of goals scored by a football player.
  • Time range which some city is assumed to have been founded in.

Example

Theoretical example regarding the different Snak types: Given there is a Property date of death and an Item that represents some person, assigning different Snak types with the Property date of death as a Statement's Main Snak results in different meanings:

  • No-Value Snak: The person definitely has no date of death (specifically stating / stressing the person is alive).
  • Some-Value Snak: The person is deceased but neither the person's exact date of death nor parts of it nor a time range are known.
  • Value Snak: The person's exact date of death, parts of it or a time range the death occurred in are known and specified as the Snak's value.

Note: Not specifying any Snak for date of death at all results in the meaning of the Property's value being unknown or not relevant: A missing person that may be alive or dead may, on purpose, not receive a Snak capturing date of death, while defining a Snak with a Property distance to state border simply may not be relevant in the scope of an Item representing a person.
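The four cases of the date of death example (the three Snak types plus no Snak at all) can be sketched as follows. The class and function names, the string identifiers of the enum members and the sample date are assumptions made for this illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SnakType(Enum):
    VALUE = "value"
    SOME_VALUE = "somevalue"
    NO_VALUE = "novalue"


@dataclass(frozen=True)
class Snak:
    prop: str
    snak_type: SnakType
    value: Optional[str] = None

    def __post_init__(self):
        # Only a Value Snak carries a value; the other two types must not.
        if (self.snak_type is SnakType.VALUE) != (self.value is not None):
            raise ValueError("a value is required for, and only allowed on, a Value Snak")


def describe_date_of_death(snak: Optional[Snak]) -> str:
    """Spell out the meaning of a date of death Snak (or of its absence)."""
    if snak is None:
        return "unknown or not relevant"
    if snak.snak_type is SnakType.NO_VALUE:
        return "the person is alive"
    if snak.snak_type is SnakType.SOME_VALUE:
        return "the person is deceased, date of death unknown"
    return f"the person died on {snak.value}"


alive = Snak("date of death", SnakType.NO_VALUE)
unknown_date = Snak("date of death", SnakType.SOME_VALUE)
known_date = Snak("date of death", SnakType.VALUE, "1950-01-21")
```

Passing `None` to `describe_date_of_death` corresponds to not specifying any Snak, which this sketch cannot resolve further into unknown versus irrelevant.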

WIKIBASE has no native support for distinguishing between unknown and irrelevant, nor is there a native way to specify probability, as demonstrated in the following theoretical example:

Several historical persons are said to possibly have been Robin Hood. There is no method to reflect probability or uncertainty, such as applying some disputed flag to a Property was as in "Robin Hood was Robin of Loxley". Instead, a custom Property like might have been would need to be created to construct "Robin Hood might have been Robin of Loxley".

In a similar sense, it is not supported to specify a set of value alternatives as demonstrated in the following theoretical example:

In a historical handwriting, a person's date of birth may be identified as either July 1st 1089 or July 1st 1099. This information may not be captured in a single Snak. Multiple alternatives originating from one source cannot be represented properly at all, as adding multiple Statements to a single Property (be it date of birth or probable date of birth) may communicate that there are two sources. When using a general date of birth Property, this procedure would, moreover, result in incorrect Query results as the person would appear to be born in both 1089 and 1099.

The matter of handling of uncertainty, probability and alternatives originating from a single source should be addressed by the operator(s) of a WIKIBASE Repository.

Statement

A Statement is a Claim additionally featuring References and a Rank. By assigning Statements to Items and Properties, such Statements define the nature of Entities. Literally, Statements make statements about the subject represented by an Item or Property.
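How Snak, Claim, Reference and Statement nest may be sketched with a few data classes. All names, the GUID string and the sample values are placeholders; the sketch simplifies Snaks to Property/value pairs and is not the WIKIBASE data model implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snak:
    prop: str
    value: str


@dataclass
class Claim:
    guid: str                      # Globally Unique Identifier of the Claim
    main_snak: Snak
    qualifiers: list[Snak] = field(default_factory=list)


@dataclass
class Reference:
    snaks: list[Snak]              # one or more Snaks describing the origin


@dataclass
class Statement:
    claim: Claim
    references: list[Reference] = field(default_factory=list)
    rank: str = "normal"           # Statements receive the normal Rank by default


# A mountain's height, refined by the year of the measurement and backed by a book.
statement = Statement(
    claim=Claim(
        guid="example-guid-0001",  # placeholder, not a real GUID
        main_snak=Snak("height", "4000 m"),
        qualifiers=[Snak("year of measurement", "1999")],
    ),
    references=[Reference([Snak("stated in", "Some Book"), Snak("page", "12")])],
)
```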

Term

Term is a technical concept representing some text (a single string) in a specific language. Allocated in a set contained by a Fingerprint, Terms are used to capture Labels and Descriptions of Entities.

WIKIBASE

WIKIBASE is a collection of applications and libraries (mainly WIKIBASE Repository and WIKIBASE Client) for creating, managing and sharing structured data. WIKIBASE is an open source project of the Wikimedia Foundation.

WIKIBASE Client

Being a MediaWiki extension, WIKIBASE Client allows users to directly access the data (Entity, Statements etc.) stored in a WIKIBASE Repository that is connected to the WIKIBASE Client per configuration. In order to retrieve data from the WIKIBASE Repository, the programming language Lua may be used. The pages of multiple WIKIBASE Client powered MediaWiki instances may be linked using Site Links.

WIKIBASE Lib

Being a MediaWiki extension separate to WIKIBASE Repository and WIKIBASE Client, WIKIBASE Lib contains components shared by WIKIBASE Repository and WIKIBASE Client. Hence, when installing one of those extensions, WIKIBASE Lib is to be installed as well.

WIKIBASE Repository

Being a MediaWiki extension, WIKIBASE Repository enables a MediaWiki instance to act as data repository and back-end for WIKIBASE Clients by centrally storing structured, non-relational data. A WIKIBASE Repository contains Entities and all related objects and concepts that may be created and manipulated collaboratively by the MediaWiki instance's users. The WIKIBASE Repository's data may be accessed from within a WIKIBASE Client using the programming language Lua or by interacting with the WIKIBASE API.

Wikidata

Wikidata is a Wikimedia project running a WIKIBASE Repository. With the Wikipedias being WIKIBASE Clients, contents of Wikidata may be accessed directly from within the Wikipedias using Lua.