Wikibase/DataModel

This document is a draft, and should not be assumed to represent the ultimate structure.

The data model of Wikibase describes the structure of the data that is handled in Wikibase. In particular, it specifies which kind of information users can contribute to the system. The data model is conceptual ("Which information do we have to support?") and does not specify how this data should be represented technically ("Which data structures should the software use?") or syntactically ("How should the data be expressed in a file?").

This specification is technical. A primer to the data model is also available that is more easily accessible (but more ambiguous and less complete).

Separate documents describe the serialization of the Wikibase data model in JSON and in RDF.

Goals and requirements
The data model aims to clarify which information is stored in Wikibase. The model is extensible, but at any point in time it should document everything that can be stored in the system. It has two main goals:

Conceptual clarity: It should be clear what Wikibase can (and what it cannot) capture. It is not possible to capture all statements that one could make about the world (not even all that are important or reasonable). A balance must be found between expressive power and complexity/usability.

Technical documentation: Almost every component of Wikibase has to work with the data. To develop the software, it is therefore essential to have a common understanding of what the data is. Internally, the data can be represented quite differently (in objects, in a syntactic format, in a user interface, etc.): it is only important that each representation has a unique and unambiguous reading in terms of the data model.

There are a number of (sometimes conflicting) requirements that the data model should address in a balanced fashion:


 * Coverage: the data model should be able to capture important data that occurs in Wikipedia in a natural way
 * Simplicity: the data model should not be overly complex
 * Extensibility: the data model should allow future extensions
 * Flexibility: accessing and re-purposing data should be supported; the utility of the data should not be limited to one context
 * Exchange: (parts of) the data should be exchangeable and have a clear meaning even outside the concrete system context of Wikidata
 * Technical support: the data model should allow for adequate representations in existing data formats, e.g., JSON or RDF/OWL

The data model covers information that is expected to be relevant in the course of the Wikidata project. Initially, only a part of it needs to be implemented, but it is important to ensure that the data model can also support later requirements (at least to the extent that they are in scope of the Wikidata project). Therefore, the data model below is not separated into phases.

There are also a number of things that the data model is not supposed to do (or that are at least beyond this document), in particular:


 * Internal data structures: The data model is specified using UML, but this does not mean that it mandates the actual class structures to be used in implementation (in Wikidata or elsewhere). In many concrete situations, data can be stored in a more optimized way.
 * Export formats: Data could be exported in many syntactic forms. Other documents will specify how this is done in each case.
 * Formal semantics: This document explains what the data is intended to express, and gives concrete examples. However, it is not a completely precise specification of how to interpret this data formally: this will be given in a separate document.

Overview of the data model
The main purpose of Wikidata is to store data about things that are described by pages in Wikipedia (in any language). For example, one might want to store that the population of Berlin is 3,499,879. In this case, Berlin is the thing that is described, for example, by the article Berlin in English Wikipedia. In Wikidata, such a "thing" is represented as an Item. The Wikidata Item for Berlin would represent the thing that the Wikipedia article is about, not the Wikipedia article itself. For example, the size of Berlin is very different from the size of the Wikipedia page, and Wikidata only aims at collecting the former, not the latter.

For every Item, various pieces of information are stored in Wikidata. First, there is some basic information that clarifies what the Item is about, such as the link to a Wikipedia page in some language. There are also human readable labels and short descriptions that are used to help Wikidata users find the right Item. Second, there is a list of Statements that users have entered about the Item. Together, the information that is stored about one Item is called an ItemDescription.

Statements are the main way of representing factual data, such as the population number in the above example. A Statement consists of two parts: a claim that something is the case (e.g., the claim "Berlin has a population of 3,499,879") and a list of references for that claim (e.g., a publication by the statistical office for Berlin-Brandenburg). Each reference is given by a ReferenceRecord, and the list of references is allowed to be empty (as in Wikipedia, editors can add Statements without a reference, which might later be improved by others who know about a suitable reference).

The claim that is made in a Statement can have various forms. The most common form is a single assignment of a Value to a Property. For example, population is a Property and the number 3,499,879 is a Value. Property-Value pairs can express many different claims, and Values can be numbers, dates and times, geographic coordinates, and many more. An important special case is a Value that is itself an Item. For example, one could state that Berlin is the capital of Germany, where Germany has its own Item in Wikidata that the Property capital of refers to. Properties are defined by users, so any Property can be created. As opposed to Items, Properties do not refer to Wikipedia pages, but they do specify a Datatype for the data that they (usually) store. The data stored about Properties forms a PropertyDescription.

The individual things that Wikidata talks about, including Items and Properties, are called Entities. All Entities are Values, but many kinds of Values are not Entities (examples of the latter kind include Values for numbers, strings, and geographic coordinates). This is so since Wikidata does not intend to store Statements about individual data values, such as strings or numbers (but it could store Statements about a number as a concept that is discussed on a Wikipage, in which case the number is represented by a Wikidata Item).

Property-Value pairs are not the only kind of claims that can be given in a Statement. It is also possible to say, for example, that a Property has no Values for the given Item. For example, one can say that Angela Merkel has no children. Stating this can be relevant to distinguish it from the (common) case that the children have simply not been entered into Wikidata yet. Other things that one can say are related to classification, for example to state that Berlin is a city (i.e., "an instance of the class of all cities"). This is treated in a specific way since classification is important in many areas, e.g., in biological taxonomies. For lack of a better name, any such basic assertion that one can make in Wikidata is called a Snak (which is small, but more than a byte). This term will not be relevant for using Wikidata (editors will not encounter it), but it is relevant for developers to avoid confusion with Statements or other claims.

For advanced usage, it is possible to make claims that consist of more than one Snak. For example, one might need to say that "the population of Berlin is 3,499,879, considering only the territory of the city, as estimated on 30 November 2011." Here, we have two additional Snaks that specify the territory the number refers to and the time when the measure was taken. It will be described below how exactly a claim can use additional Snaks.

How to read this document
This section explains our notation and general concepts that are used throughout this document.

Defining data structures in UML
The data structures that are specified in this document are usually described using UML class diagrams (see the Wikipedia page on UML for an introduction). We use only the following basic UML features:


 * classes, represented as boxes
 * abstract classes (conceptual classes that are not directly instantiated in data), represented as classes with names in italics
 * class inheritance, represented by arrows with empty triangles as heads, pointing to the superclass
 * class attributes ("member fields"), represented by "name: type" entries in classes
 * associations/compositions, represented by blue lines with empty/filled diamonds on the side of the class that aggregates/composes many objects of the other class

The types of class members are either classes that are defined below, or one of the following basic datatypes:

Numbers of arbitrarily large absolute value or precision can be represented as Strings, e.g., as described in the next section. For purposes of data access (e.g., retrieving values in numeric order), it will often be possible to approximate the value, e.g., by using a double value. However, technical formats such as float or double are not appropriate to represent user input accurately.
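The difference between an exact, string-based number and a double approximation can be illustrated with a small Python sketch. The use of Python's decimal module here is only an assumption made for illustration; the data model does not prescribe any particular library.

```python
from decimal import Decimal

# User input kept as a string, as suggested above.
user_input = "0.1000000000000000000000001"

exact = Decimal(user_input)   # keeps every digit the user typed
approx = float(user_input)    # nearest binary double; trailing digits are lost


def round_trips(text: str) -> bool:
    """Return True if converting text to a double preserves its exact value."""
    return Decimal(text) == Decimal(float(text))
```

Under these assumptions, the double is good enough for ordering values, but only the string form reproduces the input exactly.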

Wikidata Object Notation
UML describes data structures in a rather abstract way. To talk about concrete instances of these data structures, it is useful to have a simple serialization syntax for objects, which we call Wikidata Object Notation (WON). The WON is not intended to be used in implementations, but it is useful to give examples and to describe how the data model maps to other syntaxes, such as JSON or RDF.

The WON is described in this text along with the data model, and it will use exactly the same format. We give its simple grammar in BNF notation, using the following standard notation:

The basic datatypes that were described above can be serialized in WON as follows:

We follow common conventions for escaping "-quoted strings, and of enclosing IRIs with < >.

Values
Values are the basic objects of Wikidata; each represents one particular thing. Items represent topics of Wikipedia pages, Properties represent the properties that Items (or other Entities) can have, and DataValues represent individual values of a particular Datatype (a number, a geographic coordinate, etc.). The kinds of Values and their structure are shown in the following figure:



Various kinds of Values can be the subject of basic statements (Snaks): they are called Entities. Entities are identified in a uniform way using Uniform Resource Identifiers (URIs), or rather Internationalized Resource Identifiers (IRIs) that also allow Unicode symbols. Since an IRI is a global identifier, no two different Entities may have the same IRI. Hence, all entities can be represented by their IRI alone, without noting what kind of Entity they are. (Items have IRIs of the form http://www.wikidata.entity/Qnnn and Properties have IRIs of the form http://www.wikidata.entity/Pnnn)

In contrast to Entities, DataValues are not identified by an IRI but can simply be viewed as compound values that are identified by their content. Values without an IRI can still be named internally or in exports, but the identifiers that are used in this case will usually consist of the actual content (or a hash thereof).

Note that we distinguish single Entities (e.g., an Item about Berlin) from Descriptions of Entities (e.g., the collection of information that is stored about that Item about Berlin).

Items
Items are Entities that are typically represented by a Wikipage (at least in some Wikipedia languages). They can be viewed as "the thing that a Wikipage is about," which could be an individual thing (the person Albert Einstein), a general class of things (the class of all Physicists), and any other concept that is the subject of some Wikipedia page (including things like History of Berlin).

The IRI of an Item will typically be closely related to the URL of its page on Wikidata. It is expected that Items store a shorter ID string (for example, as a title string in MediaWiki) that is used in both cases. ID strings might have a standardized technical format such as "Q1234567890" and will usually not be seen by users. The ID of an Item should be stable and not change after it has been created.

The exact meaning of an Item cannot be captured in Wikidata (or any technical system), but is discussed and decided on by the community of editors, just as it is done with the subject of Wikipedia articles now. It is possible that an Item has multiple "aspects" to its meaning. For example, the page Orca describes a species of whales. It can be viewed as a class of all Orca whales, and an individual whale such as Keiko would be an element of this class. On the other hand, the species Orca is also a concept about which we can make individual statements. For example, one could say that the binomial name (a Property) of the Orca species has the Value "Orcinus orca (Linnaeus, 1758)."

However, it is intended that the information stored in Wikidata is generally about the topic of the Item. For example, the Item for History of Berlin should store data about this history (if there is any such data), not about Berlin (the city). It is not intended that data about one subject is distributed across multiple Wikidata Items: each Item fully represents one thing. This also helps for data integration across languages: many languages have no separate article about Berlin's history, but most have an article about Berlin.

Properties
Properties are Entities that describe a relationship between Items (or other Entities) and Values of the property. Typical properties are population (using numbers as values), binomial name (using strings as values), but also has father and author of (both using Items as values).

Like Items, Properties are identified by an IRI that will probably be closely related to their URL on Wikidata. However, the IDs will be based on a different naming scheme so that no confusion with Items is possible. For example, a typical identifier string used in a Property ID could be "P123456789". The ID of a Property should be stable and not change after it has been created.
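As a minimal sketch of how disjoint naming schemes keep Items and Properties apart, the following Python function classifies an ID string by its prefix. The exact pattern (a "Q" or "P" followed by digits) is an assumption based on the two example IDs in this document, and the function name is hypothetical.

```python
import re

# Assumed pattern: "Q" for Items or "P" for Properties, followed by digits.
_ID_PATTERN = re.compile(r"^([QP])([1-9][0-9]*)$")


def entity_kind(entity_id: str) -> str:
    """Return "item" or "property" for an ID string, or raise ValueError."""
    match = _ID_PATTERN.match(entity_id)
    if match is None:
        raise ValueError(f"not a recognized entity ID: {entity_id!r}")
    return "item" if match.group(1) == "Q" else "property"
```

For example, entity_kind("Q1234567890") yields "item" and entity_kind("P123456789") yields "property".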

Properties are treated differently to Items because they do not usually have a page in Wikipedia. While there is a page en:population, it does not describe the relationship between a region and its number of (human) inhabitants, but rather the noun population. This can be close to the property, but it can also lack important information. For example, the page en:parent describes what a parent is, but there are multiple related properties, especially parent of and has parent (which have a very different meaning). Wikipedias do not usually contain specific articles about such properties, only about the concepts that they relate to.

As another difference from Items, Properties can have a Datatype that specifies what kind of values users will normally enter for them. Note, however, that the data model does not require strict typing for Properties in Snaks (see below).

Datatypes
A Datatype is an Entity that determines the type and shape of the values that can be assigned to a Property. There are various common Datatypes, and each must be handled specifically by the software (for example, the user interface will be different depending on the type of data that is edited). Therefore, the Datatypes that are supported by Wikidata can only be extended by software developers, not by editors on the site. However, it might be possible to customize some Datatypes when using them for a Property (e.g., one might be able to say that a Property should only accept numbers without decimal digits, i.e., integers).

Most Datatypes are not primitive in the sense that their values consist of only one single value of a type that is commonly found in programming languages. For example, geographic coordinates are an important type of data in Wikidata, but they have an internal structure (e.g., specifying a latitude, longitude, and possibly a height).

More information about the Datatypes available in Wikidata is given in the respective section below.

DataValues
DataValues are Values that are not Entities. They represent values of a particular Datatype, such as a particular number or point in time. Details on the available DataValues and their corresponding types are given in the respective section below.

Snaks
Snaks are the basic information structures used to describe Entities in Wikidata. They are an integral part of each Statement (which can be viewed as a collection of Snaks about an Entity, together with a list of references). The kinds of Snaks and their structure are shown in the following figure:



Many of the Snaks are based on similar pieces of information, yet we distinguish Snaks that are intended to have a different meaning. This is useful in many places. Typically, Snaks of different meaning will be represented differently in the user interface. Moreover, it might be that some kinds of Snaks are not supported initially.

PropertyValueSnak
A PropertyValueSnak describes that an Entity has a certain Property with a given Value. Note that it is not required that Value belongs to the Datatype that is currently given to the Property in the system. In general, the UI and API of Wikidata will only allow Values that match the given Datatype, but if the Datatype is changed, then it will not be possible to update all stored data immediately. Moreover, if the Datatype is changed back to its earlier value, it might be possible to continue using existing data that was not changed. This is the main reason for not limiting the data model to strictly typed Properties.

Please also note that the data model does not actually define a unique Datatype for each Property: it just specifies how Datatype assignments would be represented; a unique Datatype is only obtained in a closed system where every Property has a globally unique Datatype assignment.

The Wikidata Object Notation for PropertyValueSnaks is as follows:

Here and below, we omit the names of attributes (e.g., "subject") in WON, and simply encode their values positionally. We do not specify any delimiters between the arguments in this notation. It is silently assumed that whitespace is introduced to avoid ambiguities.

PropertyNoValueSnak
A PropertyNoValueSnak describes that an Entity has no values for a certain Property.

PropertySomeValueSnak
A PropertySomeValueSnak describes that an Entity has some value for a certain Property, without saying anything about this value. This can be used if the value of a property is unknown.
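The distinction between "has a value", "has no value", "has some unknown value", and "nothing entered yet" can be sketched in Python as follows. The attribute names property and value, the summarize helper, and the placeholder identifiers such as "prop:founder" are assumptions for illustration only; they are not defined by the data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyValueSnak:
    subject: str
    property: str
    value: object


@dataclass(frozen=True)
class PropertyNoValueSnak:
    subject: str
    property: str


@dataclass(frozen=True)
class PropertySomeValueSnak:
    subject: str
    property: str


def summarize(snaks, subject, prop):
    """Describe what a collection of Snaks says about one Property of a subject."""
    relevant = [s for s in snaks if s.subject == subject and s.property == prop]
    if not relevant:
        return "not entered"          # absence of data, not a claim
    if any(isinstance(s, PropertyNoValueSnak) for s in relevant):
        return "no value"             # explicit claim that there is none
    if any(isinstance(s, PropertyValueSnak) for s in relevant):
        return "has value"
    return "some unknown value"       # only PropertySomeValueSnaks remain


# Placeholder identifiers, not real Wikidata IDs.
snaks = [
    PropertyNoValueSnak("item:AngelaMerkel", "prop:child"),
    PropertyValueSnak("item:Berlin", "prop:population", "3499879"),
    PropertySomeValueSnak("item:Berlin", "prop:founder"),
]
```

The point of the sketch is that "not entered" is derived from missing data, while the other three outcomes are each backed by an explicit Snak.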

InstanceOfSnak
An InstanceOfSnak describes that an Entity is an instance of another Entity, where the latter is considered as a class. This corresponds to the "Is a" relationship that is commonly used in knowledge modeling.

SubclassOfSnak
A SubclassOfSnak describes that an Entity is a subclass of a certain Entity, where both Entities are considered as classes. If Entity A is a subclass of Entity B, then all instances of A must also be instances of B, but it is not required that Wikidata computes this. In any case, it is meaningful to support this special relationship natively (e.g., to avoid having many different properties for it). Various export formats have special support for subclasses in this sense, so the information can be made available to external tools.
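Although Wikidata is not required to compute the implied instance relationships, an external tool could derive them. The following Python sketch does so with a simple graph traversal; the function name and the dictionary-based representation of subclass links are assumptions for illustration.

```python
def implied_classes(direct_classes, subclass_of):
    """Return every class an Entity belongs to, following subclass links.

    direct_classes: classes given by InstanceOfSnaks for the Entity.
    subclass_of: mapping from a class to its direct superclasses.
    """
    seen = set()
    stack = list(direct_classes)
    while stack:
        cls = stack.pop()
        if cls in seen:
            continue  # also guards against cycles in user-entered data
        seen.add(cls)
        stack.extend(subclass_of.get(cls, ()))
    return seen


# Placeholder class names for illustration.
subclass_links = {"city": ["settlement"], "settlement": ["place"]}
berlin_classes = implied_classes(["city"], subclass_links)
```

Here berlin_classes contains "city", "settlement", and "place", even though only the first was stated directly.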

PropertyIntervalSnak
A PropertyIntervalSnak describes that an Entity has a certain Property with all values that are within a given range of Values. At present, it is intended to support this for intervals of time, which are extremely common in Wikipedia, but it could also be supported for other intervals or (more generally) sets of Values such as numbers or geographic locations.

PropertySomeIntervalSnak
A PropertySomeIntervalSnak describes that an Entity has a certain Property with all values in some unknown interval that is not empty. This means the same as saying that the Entity has some value for the Property, which can be done with a PropertySomeValueSnak. However, it is intended to distinguish both cases in the user interface, hence there must be different structures in the data model.

This Snak is mainly provided to support the structural differences between single values and intervals, also on the user level. It has the same applications as PropertySomeValueSnak but for cases where the user has chosen to (or had to) provide an interval.

Statements
Statements describe the claim of a statement and list references for this claim. Every Statement refers to one particular Entity, called the subject of the Statement. There is always one main Snak that forms the most important part of the statement. Moreover, there can be zero or more additional PropertySnaks that describe the Statement in more detail. These auxiliary Snaks store additional information that does not directly refer to the subject (e.g., the time at which the main part of the statement was valid). References are provided as a list (the order is significant in some contexts, especially for displaying a main reference). The complete structure is described as follows:



The individual components have the following meaning:
 * subject: the Entity that the statement is about
 * mainSnak: the main Snak of the statement
 * rank: a StatementRank that will be used for simplifying the selection of Statements; see below for more detail
 * referenceRecords: the list of references, see below for details
 * auxiliarySnaks: optional list of additional PropertySnaks that qualify the statement

Note that auxiliary Snaks can only be PropertySnaks. InstanceOfSnak and SubclassOfSnak are not supported as auxiliary Snaks, since these Snaks must refer to Entities to be meaningful.
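The components listed above could be sketched in Python roughly as follows. This is not the implementation used by Wikibase: the generic Snak class with a kind field, the default rank, and the placeholder identifiers are all assumptions made to keep the example short.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, List, Optional


class StatementRank(Enum):
    PREFERRED = 1
    NORMAL = 2
    DEPRECATED = 3


# Snak kinds that the text rules out as auxiliary Snaks.
NOT_ALLOWED_AS_AUXILIARY = {"InstanceOfSnak", "SubclassOfSnak"}


@dataclass(frozen=True)
class Snak:
    kind: str                       # e.g. "PropertyValueSnak"
    property: Optional[str] = None
    value: Any = None


@dataclass
class Statement:
    subject: str
    main_snak: Snak
    rank: StatementRank = StatementRank.NORMAL   # default is an assumption
    reference_records: List[Any] = field(default_factory=list)  # may be empty
    auxiliary_snaks: List[Snak] = field(default_factory=list)

    def __post_init__(self):
        for snak in self.auxiliary_snaks:
            if snak.kind in NOT_ALLOWED_AS_AUXILIARY:
                raise ValueError(f"{snak.kind} cannot be an auxiliary Snak")


# The Berlin population example, with placeholder identifiers.
population = Statement(
    subject="item:Berlin",
    main_snak=Snak("PropertyValueSnak", "prop:population", "3499879"),
    auxiliary_snaks=[
        Snak("PropertyValueSnak", "prop:territory", "item:CityOfBerlin"),
        Snak("PropertyValueSnak", "prop:estimatedOn", "2011-11-30"),
    ],
)
```

The two auxiliary Snaks qualify the claim without referring to the subject directly, and the empty reference list is permitted.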

Ranks of Statements
The ranks provide a simple selection/filtering criterion in cases where there are many Statements for some property. There are three possible ranks, which have roughly the following meaning:


 * 1) Preferred statements refer to the most important and most up-to-date information that will be shown to all users and that would be displayed in a Wikipedia Infobox by default (example: most recent population figures for Berlin).
 * 2) Normal statements contain relevant information that is believed to be correct but that may be too extensive for showing it by default (example: historic population figures for Berlin for many years).
 * 3) Deprecated statements may not be considered reliable or are even known to contain errors (example: a statement that documents a wrong population figure that was published in some historic document; in this case the statement is not wrong – the historic document that is given as a reference really made the erroneous claim – yet the statement should not be used in most cases).

This model is intentionally left coarse and simple. The three levels translate to different treatments in data access, UI (e.g., what is displayed by default), and export (one could, e.g., have an export with only the preferred and normal Statements). The ranks may also be useful for protecting Statements from editing (e.g., by protecting only preferred and normal statements). More fine-grained rankings do not seem to have such a clear interpretation and would thus increase the UI complexity unnecessarily. Having only two ranks (or no ranks at all), on the other hand, would make it harder to cope with Statements that are not trusted, known to contain wrong claims, or simply unpatrolled (if ranks are used for protection).
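How the three ranks might translate into data access and export can be sketched in Python. Falling back to normal Statements when no preferred Statement exists is an assumption of this sketch, as are the function names and the dictionary representation; the values are placeholders rather than real figures.

```python
PREFERRED, NORMAL, DEPRECATED = "preferred", "normal", "deprecated"


def default_view(statements):
    """Statements shown by default: preferred ones, else normal ones (assumed fallback)."""
    preferred = [s for s in statements if s["rank"] == PREFERRED]
    if preferred:
        return preferred
    return [s for s in statements if s["rank"] == NORMAL]


def export_without_deprecated(statements):
    """An export containing only the preferred and normal Statements."""
    return [s for s in statements if s["rank"] in (PREFERRED, NORMAL)]


population_statements = [
    {"rank": PREFERRED, "value": "3499879"},
    {"rank": NORMAL, "value": "(historic figure)"},
    {"rank": DEPRECATED, "value": "(erroneous figure from a historic document)"},
]
```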

ReferenceRecords
ReferenceRecords are intended to store information about some source, possibly together with additional data about the exact place that is referenced, such as page number or chapter.

EntityDescriptions of Items and Properties
EntityDescriptions are collections of information about an entity, and they mainly serve as data containers that can be interpreted as sets of Snaks with some further attributes (that could also be represented as Snaks, if desired). Each EntityDescription supports internationalized labels, descriptions, and aliases. ItemDescriptions additionally contain a list of Statements about that Item, while PropertyDescriptions mainly refer to the Datatype of the Property (more detailed property declarations might be supported later). The overall structure is as follows:



Every EntityDescription can contain basic language information, explained below. PropertyDescriptions and ItemDescriptions must refer to entities of the expected type. Moreover, all Statements of an ItemDescription must use the expected Item as the subject of their main Snak.

Every EntityDescription can contain language information that is used for displaying and identifying an Entity. There are three main fields to represent this data:


 * titles: a list of TitleRecord objects, each specifying a site language and a title string that uniquely identifies the described entity in that language, e.g., Georgia (country).
    * For Items, this is interpreted as the Title of the associated Wikipedia article (in the language).
    * For Properties, this is used as an identifier in the same sense, but without being linked to a Wikipedia page.
 * label: the main label to be used for representing the described Entity in Wikidata in various languages, e.g., Georgia could be an English label.
 * description: a brief description to clarify the meaning of the label (which may be ambiguous), e.g., a country in the Caucasus could be an English description.

Labels and descriptions are MultilingualTexts, and thus might be extended with pronunciation information and spoken versions later on. In contrast, TitleRecords only contain a title String, which is not considered a text in this sense: it is really just a string key (and possibly a Wikipedia title string).

There can only be at most one title, label, and description in each EntityDescription. The data model does not include aliases on this level. Entities might have various alternative labels, e.g., Sakartvelo is an alias for Georgia. If this will be supported, then a more general mechanism based on Statements would be used to allow arbitrary Property Values to be used as aliases (but this is mainly a user interface issue).

Entities provide two kinds of keys for identifying entities:
 * Each TitleRecord is a key.
 * The combination of label and description for one particular language is a key.

In Wikibase, users will typically select entities by picking the right label-description, but the title is useful for identifying entities with a human-readable text-only form. Such a form will also be required to refer to Wikibase Entities from within Wikipedia wiki text.
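The two kinds of keys can be illustrated with a small Python sketch that builds lookup tables and rejects duplicates. The dictionary layout, the function name, and the identifiers "item-1" and "item-2" are assumptions; the second entry is an illustrative example that is not taken from this document.

```python
def build_key_indexes(descriptions):
    """Index EntityDescriptions by title key and by label-description key."""
    by_title = {}
    by_label_description = {}
    for d in descriptions:
        for language, title in d["titles"].items():
            key = (language, title)
            if key in by_title:
                raise ValueError(f"duplicate title key: {key}")
            by_title[key] = d["id"]
        for language, label in d["labels"].items():
            key = (language, label, d["descriptions"].get(language))
            if key in by_label_description:
                raise ValueError(f"duplicate label-description key: {key}")
            by_label_description[key] = d["id"]
    return by_title, by_label_description


descriptions = [
    {"id": "item-1",
     "titles": {"en": "Georgia (country)"},
     "labels": {"en": "Georgia"},
     "descriptions": {"en": "a country in the Caucasus"}},
    {"id": "item-2",
     "titles": {"en": "Georgia (U.S. state)"},
     "labels": {"en": "Georgia"},
     "descriptions": {"en": "a state of the United States"}},
]
by_title, by_label_description = build_key_indexes(descriptions)
```

Both entries share the label "Georgia", yet each key remains unique because the description disambiguates the label.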

Datatypes and their Values
Datatypes are Entities that specify the format of Property Values. The set of Datatypes in Wikibase is system-defined (it can be extended, but only by developers). Every Datatype has a fixed IRI that is also system-defined.

For every Datatype, there is one particular form of Values that are used to represent Values of that type. Wikibase distinguishes between Values that can be the subject of Snaks, called Entities, and Values that are not the subject of Snaks, called DataValues. The following is an overview of all DataValues:



Numbers

 * Datatype IRI: http://wikidata.org/vocabulary/datatype_number
 * Value: QuantityValue

A QuantityValue represents a decimal number, together with information about the precision of this number, and an optional unit of measurement. The decimal number is represented as a string using the lexical form of XML Schema decimal. The attributes are:


 * number: decimal number
 * variance: decimal number
 * unit: IRI or empty if no unit is used

The given number is interpreted as the main value of the QuantityValue. The variance specifies how far the true value of the represented quantity could possibly deviate from the number in positive or negative direction. This makes it possible to capture expressions such as 12300 +/- 50. For many practical purposes, only the number might be used (e.g., for sorting and query answering), but the variance can provide valuable information for presentation (e.g., for selecting reasonable precision in unit conversions).

The unit specifies a physical quantity that the number refers to. It is represented as an IRI rather than as a String, since a string like "m" might represent different units in different contexts. The value should be meaningful independently of the declaration information for its Property (from which more details about units could possibly be obtained), hence the unit is a full IRI. In practice, this IRI might be generated from a (normalized) unit string and the information to which quantity it belongs (in Wikidata).
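A QuantityValue with its three attributes, and the interval implied by the variance, can be sketched in Python. The bounds method is a hypothetical helper, and decimal.Decimal is used only as a convenient way to read the decimal strings.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class QuantityValue:
    number: str                 # decimal number kept as a string
    variance: str               # decimal number kept as a string
    unit: Optional[str] = None  # IRI, or None if no unit is used

    def bounds(self):
        """Return the lowest and highest value allowed by the variance."""
        n, v = Decimal(self.number), Decimal(self.variance)
        return n - v, n + v


quantity = QuantityValue(number="12300", variance="50")
low, high = quantity.bounds()
```

For the expression 12300 +/- 50, the sketch yields the bounds 12250 and 12350.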

Dates and times

 * Datatype IRI: http://wikidata.org/vocabulary/datatype_time
 * Value: TimeValue

The calendar model used for saving the data is always the proleptic Gregorian calendar according to ISO 8601, but the Calendar model used for displaying the data is given by the saved Calendar model.

A TimeValue represents a point in time that might be imprecise (e.g., if only a year is given). For practical purposes (e.g., sorting values), the value will often be interpreted to be exact by filling the missing positions with more details. The structure of values of this type is as follows:
 * time (isotime): point in time, represented in a format resembling ISO 8601, the year having up to 16 digits, the date always being signed, in the format +00000002013-01-01T00:00:00Z. The month or day, and the time, will be set to zero if they are unknown; the precision field should be relied on to determine which time digits are meaningful. The Z is meaningless and the time zone should be determined from the timezone field.
 * precision: shortint. The numbers have the following meaning: 0 - billion years, 1 - hundred million years, ..., 6 - millennia, 7 - century, 8 - decade, 9 - year, 10 - month, 11 - day, 12 - hour, 13 - minute, 14 - second. They refer to the calendartime time.
 * after: integer. If the date is uncertain, how many units after the given time could it be? The unit is given by the precision.
 * before: integer. If the date is uncertain, how many units before the given time could it be? The unit is given by the precision.
 * timezone: signed integer. Timezone information as an offset from UTC in minutes. For dates before the modern implementation of UTC in 1972, this is the offset of the time zone from universal time. Before the implementation of time zones, this is the longitude of the place of the event, expressed in the range −180° to 180° (positive is east of Greenwich), multiplied by 4 to convert to minutes.
 * calendarmodel: URI identifying the calendar model that should be used to display this time value. Note that time is always saved in proleptic Gregorian, this URI states how the value should be displayed.
 * calendartime: The time in the format specific to the calendarmodel field.
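The rule for the timezone field before the introduction of time zones (longitude multiplied by 4) can be written as a short Python sketch. Rounding to the nearest whole minute is an assumption, made because the field is described as an integer; the function name is hypothetical.

```python
def local_mean_time_offset(longitude_deg: float) -> int:
    """Offset in minutes for a longitude in degrees (positive is east of Greenwich)."""
    if not -180 <= longitude_deg <= 180:
        raise ValueError("longitude must be within -180..180 degrees")
    # 360 degrees correspond to 24 hours, so one degree is 4 minutes.
    return round(longitude_deg * 4)
```

For example, a place at 15° east would get an offset of 60 minutes, and a place at 75° west an offset of -300 minutes.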

Interpretation of dates follows ISO 8601:
 * Presently dates refer to the (possibly proleptic) Gregorian or Julian calendar, as specified by the calendarmodel field. Any future extension to other calendars is likely to require a drastically different format for the time field when used with such other calendars.
 * There is a year number 0 that refers to the year that is commonly called 1 BC(E).

Examples
If you have something like "between 1846 and 1855", you can use the "before" and "after" fields of the time value: time: "+00000001850-00-00T00:00:00Z", precision: 9, before: 4, after: 5. This means the "main" value is 1850, given as a year, with a lower bound four years before and an upper bound five years after the "main" value (before and after are given in the unit specified by the precision value). The "main" value is what will be displayed by default; it will also be used for sorting query results (once we have queries).

This is a bit complicated, but should allow you to actually represent uncertain dates. We made it so you can be precise about the uncertainty.
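
The bounds in the example above can be computed mechanically. The following Python sketch is illustrative only: the function names are hypothetical, it handles only precisions 0 to 9 (year or coarser), and it assumes that one unit at precision p spans 10^(9 - p) years, which matches the list of precision values (9 is a year, 8 a decade, 0 a billion years).

```python
import re


def parse_year(time: str) -> int:
    """Extract the signed year from a time string such as the one above."""
    match = re.match(r"([+-]\d+)-\d{2}-\d{2}T", time)
    if match is None:
        raise ValueError(f"unrecognized time value: {time!r}")
    return int(match.group(1))


def year_bounds(time: str, precision: int, before: int, after: int) -> tuple:
    """Return (earliest, latest) year implied by before/after."""
    if not 0 <= precision <= 9:
        raise ValueError("this sketch only handles precisions 0 to 9")
    unit_years = 10 ** (9 - precision)  # assumption: 9 -> 1 year, 8 -> 10 years, ...
    year = parse_year(time)
    return year - before * unit_years, year + after * unit_years


print(year_bounds("+00000001850-00-00T00:00:00Z", 9, 4, 5))  # (1846, 1855)
```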

Web resources and other IRIs

 * Datatype IRI: http://wikidata.org/vocabulary/datatype_iri
 * Value: IriValue

An IriValue represents an arbitrary IRI that follows RFC 3987. If the protocol part is supported by MediaWiki, a hyperlink might be displayed, but the Datatype as such does not require such protocols, and generally it is not required that all IRIs work as URLs. For example, the "tel:" protocol (RFC 3966) might also be allowed.
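
The distinction between storing an IRI and displaying it as a hyperlink can be sketched as follows. This is a rough illustration in Python: the set of supported schemes is hypothetical (the actual set depends on the MediaWiki configuration), and the check inspects only the scheme rather than validating the IRI against RFC 3987.

```python
from urllib.parse import urlsplit

# Hypothetical set of schemes for which a hyperlink would be rendered.
SUPPORTED_SCHEMES = {"http", "https", "tel"}


def can_hyperlink(iri: str, supported=SUPPORTED_SCHEMES) -> bool:
    """True if the IRI's scheme is one a hyperlink would be shown for."""
    return urlsplit(iri).scheme.lower() in supported


print(can_hyperlink("http://wikidata.org/entity/Q2"))  # True
print(can_hyperlink("urn:isbn:0451450523"))            # False
```

An IRI for which the function returns False is still a valid value; it would simply be displayed without a link.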

Geographic locations

 * Datatype IRI: http://wikidata.org/vocabulary/datatype_geocoords
 * Value: GeoCoordinateValue

A coordinate is represented as:
 * a latitude (decimal, no default, 9 digits after the dot and two before, signed)
 * a longitude (decimal, no default, 9 digits after the dot and three before, signed)
 * a precision (decimal, representing degrees of distance, defaults to 0, 9 digits after the dot and three before, unsigned, used to save the precision of the representation)
 * a dimension (decimal, 3 digits after the dot and 12 before, unsigned, rough diameter of the object in meters, used for selecting the scale of the map and for uncertainty of an area) (compare to the dim value in the GeoData extension)
 * a coordinate system or globe (identified by a URI, defaults to http://wikidata.org/entity/Q2, i.e. Q2, the Earth, which means WGS84). Any such geodetic system must imply the globe for which it is used (and should be displayed as simply the globe in most cases).
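
The fields listed above could be modelled as in the following Python sketch. It is not the Wikibase implementation: the class and helper names are hypothetical, the digit limits are checked exactly as stated in the list, and using None for a missing dimension is an assumption, since no default is given.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

EARTH = "http://wikidata.org/entity/Q2"  # default globe named in the text


def fits(value: Decimal, int_digits: int, frac_digits: int) -> bool:
    """True if value has at most int_digits before and frac_digits after the dot."""
    if not value.is_finite():
        return False
    return abs(value) < 10 ** int_digits and value.as_tuple().exponent >= -frac_digits


@dataclass(frozen=True)
class GeoCoordinate:
    latitude: Decimal                    # signed, no default
    longitude: Decimal                   # signed, no default
    precision: Decimal = Decimal("0")    # unsigned, in degrees
    dimension: Optional[Decimal] = None  # unsigned, in meters; None is an assumption
    globe: str = EARTH

    def __post_init__(self) -> None:
        if not fits(self.latitude, 2, 9):
            raise ValueError("latitude exceeds 2 digits before or 9 after the dot")
        if not fits(self.longitude, 3, 9):
            raise ValueError("longitude exceeds 3 digits before or 9 after the dot")
        if not fits(self.precision, 3, 9) or self.precision < 0:
            raise ValueError("precision must be unsigned with 3 + 9 digits")
        if self.dimension is not None and (
            not fits(self.dimension, 12, 3) or self.dimension < 0
        ):
            raise ValueError("dimension must be unsigned with 12 + 3 digits")


berlin = GeoCoordinate(Decimal("52.516666667"), Decimal("13.383333333"))
print(berlin.globe)  # http://wikidata.org/entity/Q2
```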

Geographic shapes

 * Datatype IRI: http://wikidata.org/vocabulary/datatype_geoshapes
 * Value: GeoShapeValue

Wikidata items

 * Datatype IRI: http://wikidata.org/vocabulary/datatype_items
 * Value: Item

Items in Wikibase are represented by Item as explained in the section on Values above. Although Items are not subtypes of DataValue, they are listed here to define the IRI for the respective datatype. It is not planned to have user-defined properties for other types of Entities for now.

Wikidata properties

 * Datatype IRI: http://wikidata.org/vocabulary/datatype_properties
 * Value: Property

Item attributes in Wikibase are represented by Properties as explained in the section on Values above. Although Properties are not subtypes of DataValue, they are listed here to define the IRI for the respective datatype. It is not planned to have user-defined properties for other types of Entities for now.

Media

 * Datatype IRI: http://wikidata.org/vocabulary/datatype_media
 * Value: MediaValue

Strings that are not translated

 * Datatype IRI: http://www.w3.org/2001/XMLSchema#string
 * Value: StringValue

Strings are represented by StringValues. All strings are considered as sequences of Unicode glyphs. As opposed to multilingual and monolingual texts, strings do not contain any language information, and are typically used directly only for strings that do not belong to a language, e.g., the post code of a UK city.

Monolingual texts

 * Datatype IRI: http://wikidata.org/vocabulary/datatype_monotext
 * Value: MonolingualTextValue

MonolingualTextValues are Values that represent a phrase in some language. In particular, their content could also be pronounced (and be associated with pronunciation information or audio versions). The attributes of MonolingualTextValues are:


 * language: UserLanguageCode
 * value: String

Multilingual texts

 * Datatype IRI: http://wikidata.org/vocabulary/datatype_multitext
 * Value: MultilingualTextValue

MultilingualTextValues are Values that represent a phrase in many languages. This is different from representing many individual Values for each language, since it also captures the information that all of the Values are direct translations (otherwise, if a Property has multiple MonolingualTextValues in each language, it would not be clear which values belong together). MultilingualTextValues store a list of MonolingualTextValues, but at most one for each UserLanguageCode.
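
The constraint of at most one MonolingualTextValue per language can be sketched as follows. This is a minimal Python illustration, not the Wikibase implementation: the class and function names are hypothetical, and a plain string stands in for UserLanguageCode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonolingualText:
    language: str  # stand-in for UserLanguageCode (assumption)
    value: str


def make_multilingual_text(texts):
    """Group translations by language, rejecting a second text for a language."""
    by_language = {}
    for text in texts:
        if text.language in by_language:
            raise ValueError(f"duplicate language code: {text.language}")
        by_language[text.language] = text
    return by_language


motto = make_multilingual_text([
    MonolingualText("en", "United in diversity"),
    MonolingualText("de", "In Vielfalt geeint"),
])
print(sorted(motto))  # ['de', 'en']
```

Keying the texts by language code makes the "at most one per language" rule a property of the structure itself rather than a separate check at read time.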

Complete Datamodel in WON
Below is an overview of all WON definitions given within this document. Note that this list was created manually, so it might need to be updated (last update 15 Sept 2012).