Wikibase/DataModel/JSON

This document describes the canonical JSON format used to represent Wikibase entities in the API, in JSON dumps, as well as by Special:EntityData (when using JSON output). This format can be expected to be reasonably stable, and is designed with flexibility and robustness in mind.

For an explanation of the terms used in this document, please refer to the Wikibase Glossary. For a specification of the semantics of the data structures described here, see the Wikibase Data Model.

Changes to the JSON format are subject to the Stable Interface Policy.

NOTE: The canonical copy of this document can be found in the Wikibase source code and should be edited there. Changes can be requested by filing a ticket on Phabricator.

JSON Flavor
Wikibase follows the JSON specification as given in RFC 7159, aiming for interoperability in the sense of that RFC. When encoding the Wikibase data structure as JSON, several choices have been made as to how values are represented:


 * Keys in JSON objects are unique; their order is not significant.
 * Strings are encoded in one of two ways:
    * using Unicode escape sequences (like \u0645), which result in a UTF-16 representation when decoded;
    * or using native UTF-8 encoding.
 * Numbers may be given in two ways:
    * integers from -(2^31) to 2^31-1 may be represented as number literals;
    * all numbers may be represented as decimal strings. In particular, quantity values are represented as arbitrary-precision decimal strings.
 * Entity IDs are given as upper-case strings, e.g. "P29" or "Q623289".
 * In JSON dumps, each entity is encoded as a single line. This allows consumers to process the dump line by line, decoding each entity separately.

Clients should be ready to process any of the forms given above.
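
A consumer can accept all of these forms with little effort. The following Python sketch is illustrative only: `to_decimal` and `iter_entities` are hypothetical helper names, and the handling of `[`, `]` and trailing commas assumes that a dump wraps its entities in one JSON array, which this document does not state.

```python
import json
from decimal import Decimal


def to_decimal(number):
    """Accept either an integer literal or a decimal string."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(number, bool):
        raise TypeError("not a number")
    if isinstance(number, (int, str)):
        return Decimal(number)
    raise TypeError(f"unsupported number form: {number!r}")


def iter_entities(lines):
    """Decode one entity per line, as described for JSON dumps."""
    for line in lines:
        # Assumption of this sketch: entities may be wrapped in a JSON
        # array, so skip the brackets and strip a trailing comma.
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)
```

Python's `json` module decodes `\u0645` escapes and native UTF-8 text to the same string, so no extra handling is needed for the two string forms.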

Top Level Structure
The JSON representation consists of the following fields in the top level structure:


 * id: The canonical ID of the entity.
 * type: The entity type identifier. "item" for data items, and "property" for properties.
 * labels: Contains the labels in different languages, see Labels, Descriptions and Aliases below.
 * descriptions: Contains the descriptions in different languages, see Labels, Descriptions and Aliases below.
 * aliases: Contains aliases in different languages, see Labels, Descriptions and Aliases below.
 * claims: Contains any number of statements, grouped by property. See Statements below.
 * sitelinks: Contains site links to pages on different sites describing the item, see Site Links below.
 * lastrevid: The JSON document's version (this is a MediaWiki revision ID).
 * modified: The JSON document's publication date (this is a MediaWiki revision timestamp).

API modules currently handle the revision and date modified slightly differently using the fields below.

API modules also often return extra information related to the entity and the wiki:


 * title: The title of the page the entity is stored on (this may also include the namespace, such as 'Item:Q60').
 * pageid: The ID of the page the entity is stored on.
 * ns: The ID of the namespace the entity is stored in.
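
As a rough illustration of the top-level fields, the Python sketch below decodes a hand-written skeleton entity. All values are made up, `describe` and `modified_time` are hypothetical helpers, and the ISO-8601-like layout of the `modified` timestamp is an assumption of this sketch.

```python
import json
from datetime import datetime, timezone

# Hand-written skeleton; all values are illustrative, not real repository data.
ENTITY_JSON = """
{
  "id": "Q60",
  "type": "item",
  "labels": {},
  "descriptions": {},
  "aliases": {},
  "claims": {},
  "sitelinks": {},
  "lastrevid": 12345,
  "modified": "2020-01-31T12:00:00Z",
  "title": "Item:Q60",
  "pageid": 678,
  "ns": 120
}
"""


def describe(entity):
    """Summarize the top-level fields of a decoded entity."""
    kind = {"item": "data item", "property": "property"}.get(entity["type"], "other")
    # "claims" maps property IDs to lists of statements.
    statements = sum(len(group) for group in entity.get("claims", {}).values())
    return f'{entity["id"]}: {kind}, {statements} statement(s), revision {entity["lastrevid"]}'


def modified_time(entity):
    """Parse "modified"; the timestamp layout is an assumption of this sketch."""
    parsed = datetime.strptime(entity["modified"], "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


entity = json.loads(ENTITY_JSON)
print(describe(entity))
```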

Labels, Descriptions and Aliases
Labels, descriptions and aliases are represented by the same basic data structure. For each language, there is a record using the following fields:


 * language: The language code.
 * value: The actual label or description.

In the case of aliases, each language is associated with a list of such records, while for labels and descriptions the record is associated directly with the language.
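
The difference between the two shapes can be sketched in Python as follows. The fragment is hand-written with made-up values, and `get_label` and `get_aliases` are hypothetical helper names.

```python
import json

# Illustrative fragment; the values are made up for this sketch.
entity = json.loads("""
{
  "labels": {
    "en": {"language": "en", "value": "Example item"}
  },
  "descriptions": {
    "en": {"language": "en", "value": "an item used for illustration"}
  },
  "aliases": {
    "en": [
      {"language": "en", "value": "Sample item"},
      {"language": "en", "value": "Demo item"}
    ]
  }
}
""")


def get_label(entity, lang):
    """Labels map a language directly to one record."""
    record = entity.get("labels", {}).get(lang)
    return record["value"] if record else None


def get_aliases(entity, lang):
    """Aliases map a language to a list of records."""
    return [record["value"] for record in entity.get("aliases", {}).get(lang, [])]
```

Both helpers return an empty result (`None` or `[]`) for a language that has no entry, which avoids a `KeyError` on sparsely translated entities.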

Site Links
Site links are given as records for each site identifier. Each such record contains the following fields:


 * site: The site ID.
 * title: The page title.
 * badges: Any "badges" associated with the page (such as "featured article"). Badges are given as a list of item IDs.
 * url: Optionally, the full URL of the page may be included.
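
A minimal Python sketch of reading such records is given below. The site IDs, titles, URL and badge ID are made up, and `page_title` and `sites_with_badge` are hypothetical helpers.

```python
import json

# Illustrative records; site IDs, titles and the badge ID are made up.
sitelinks = json.loads("""
{
  "enwiki": {"site": "enwiki", "title": "Example page", "badges": ["Q999"]},
  "dewiki": {"site": "dewiki", "title": "Beispielseite", "badges": [],
             "url": "https://de.wikipedia.org/wiki/Beispielseite"}
}
""")


def page_title(sitelinks, site_id):
    """Return the page title for a site, or None if there is no link."""
    record = sitelinks.get(site_id)
    return record["title"] if record else None


def sites_with_badge(sitelinks, badge_id):
    """Return the sorted site IDs whose page carries the given badge item."""
    return sorted(
        site for site, record in sitelinks.items()
        if badge_id in record.get("badges", [])
    )
```

Because `url` is optional, code that needs it should use `record.get("url")` rather than indexing.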

Statements
A Statement consists of a main Snak, a (possibly empty) list of qualifier Snaks, and a (possibly empty) list of references. A Statement is always associated with a Property (semantically, the Statement is about the Property), and there can be multiple Statements about the same Property in a single Entity. This is represented by a map structure that uses Property IDs as keys, and maps them to lists of Statement records.

A Statement record uses the following fields:


 * id: An arbitrary identifier for the Statement, which is unique across the repository. No assumptions can or should be made about the identifier's structure, and no guarantees are given that the format will stay the same.
 * type: Always statement. (Historically, claim used to be another valid value here.)
 * mainsnak: The Snak representing the value to be associated with the property. See Snaks below. The Property specified in the main Snak must be the same as the Property the Statement is associated with.
 * rank: The rank expresses whether this value will be used in queries and whether it is visible by default on a client system. The value is either preferred, normal or deprecated.
 * qualifiers: Qualifiers provide a context for the primary value, such as the point in time of measurement. Qualifiers are given as lists of snaks, each associated with one property. See Qualifiers below.
 * references: References record provenance information for the data in the main Snak and qualifiers. They are given as a list of reference records; see References below.

(Historically, there was a distinction between Claims, which had only a main snak and qualifiers, and Statements, which also had references. Traces of this distinction may still be found in the serialization or in outdated documentation.)
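
The map of Property IDs to lists of Statement records can be walked as in the Python sketch below. The statement IDs and property ID are made up, `statements_for` and `mismatched_statements` are hypothetical helpers, and skipping deprecated statements by default is a choice of this sketch rather than a rule of the format.

```python
import json

# Illustrative data; statement IDs and the property ID are made up.
claims = json.loads("""
{
  "P17": [
    {"id": "example-statement-1", "type": "statement",
     "mainsnak": {"snaktype": "novalue", "property": "P17"},
     "rank": "normal", "qualifiers": {}, "references": []},
    {"id": "example-statement-2", "type": "statement",
     "mainsnak": {"snaktype": "somevalue", "property": "P17"},
     "rank": "deprecated", "qualifiers": {}, "references": []}
  ]
}
""")


def statements_for(claims, property_id, include_deprecated=False):
    """Return the Statement records for one Property."""
    result = []
    for statement in claims.get(property_id, []):
        if statement["rank"] == "deprecated" and not include_deprecated:
            continue
        result.append(statement)
    return result


def mismatched_statements(claims):
    """Return IDs of statements whose main Snak names a different Property."""
    return [
        statement["id"]
        for property_id, group in claims.items()
        for statement in group
        if statement["mainsnak"]["property"] != property_id
    ]
```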

Snaks
A Snak provides some kind of information about a specific Property of a given Entity. Currently, there are three kinds of Snaks: value, somevalue or novalue. A value snak represents a specific value for the property, while novalue and somevalue only express that there is no value or, respectively, some unknown value.

A Snak is represented by providing the following fields:


 * snaktype: The type of the snak. Currently, this is one of value, somevalue or novalue.
 * property: The ID of the property this Snak is about.
 * datatype: The datatype field indicates how the value of the Snak can be interpreted. The datatype can be any of the datatypes listed on Special:ListDatatypes.
 * datavalue: If the snaktype is value, there is a datavalue field that contains the actual value the Snak associates with the Property. See Data Values below.
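
Since only value snaks carry a datavalue field, a reader has to branch on snaktype before accessing it. A minimal Python sketch, using the hypothetical helper name `snak_value` and made-up property IDs:

```python
def snak_value(snak):
    """Return (snaktype, datavalue) for a decoded Snak record.

    The datavalue is None for somevalue and novalue snaks.
    """
    snaktype = snak["snaktype"]
    if snaktype == "value":
        return ("value", snak["datavalue"])
    if snaktype in ("somevalue", "novalue"):
        return (snaktype, None)
    # Fail loudly on snak types this sketch does not know about.
    raise ValueError(f"unknown snaktype: {snaktype!r}")


# Illustrative snaks; the property IDs are made up.
value_snak = {
    "snaktype": "value",
    "property": "P1",
    "datatype": "string",
    "datavalue": {"type": "string", "value": "example"},
}
unknown_snak = {"snaktype": "somevalue", "property": "P1", "datatype": "string"}
```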

Data Values
Data value records represent a value of a specific type. They consist of two fields:


 * type: the value type. This defines the structure of the value field, and is not to be confused with the Snak's data type (which is derived from the Snak's Property's data type). The value type does not allow for interpretation of the value, only for processing of the raw structure. As an example, a link to a web page may use the data type "url", but have the value type "string".
 * value: the actual value. This field may contain a single string, a number, or a complex structure. The structure is defined by the type field.

Some value types and their structure are defined in the following sections.
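
To illustrate that processing is keyed on the value type rather than on the Snak's data type, the Python sketch below dispatches on the type field of a data value record. `DECODERS` and `decode_datavalue` are hypothetical names, and returning the raw structure for unknown types is a choice of this sketch.

```python
def decode_string(value):
    """String values are plain string literals."""
    return value


# Registry keyed by value type (not by the Snak's data type).
DECODERS = {"string": decode_string}


def decode_datavalue(datavalue):
    """Decode a data value record by its value type."""
    decoder = DECODERS.get(datavalue["type"])
    if decoder is None:
        # Unknown value type: hand back the raw structure unchanged.
        return datavalue["value"]
    return decoder(datavalue["value"])


# A Snak with data type "url" still carries a value of type "string".
url_snak = {
    "snaktype": "value",
    "property": "P1",  # made-up property ID
    "datatype": "url",
    "datavalue": {"type": "string", "value": "https://example.org/"},
}
print(decode_datavalue(url_snak["datavalue"]))
```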

string
Strings are given as simple string literals.

wikibase-entityid
Entity IDs are used to reference entities on the same repository. They are represented by a map structure containing three fields:


 * entity-type: defines the type of the entity, such as item or property.
 * id: the full entity ID.
 * numeric-id: for some entity types, the numeric part of the entity ID.

WARNING: not all entity IDs have a numeric ID – using the full ID is highly recommended.
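
Following that recommendation, a reader can prefer the id field and fall back to numeric-id only when needed. In the Python sketch below, `entity_id` is a hypothetical helper, and the `PREFIXES` mapping (Q for items, P for properties) is an assumption based on the example IDs used in this document.

```python
# Assumption of this sketch: items use the prefix "Q", properties "P".
PREFIXES = {"item": "Q", "property": "P"}


def entity_id(value):
    """Return the full entity ID from a wikibase-entityid value."""
    if "id" in value:
        return value["id"]
    # Fallback for records that carry only a numeric ID.
    prefix = PREFIXES.get(value.get("entity-type"))
    if prefix is None or "numeric-id" not in value:
        raise ValueError("cannot determine the full entity ID")
    return f'{prefix}{value["numeric-id"]}'
```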

globecoordinate

 * latitude: The latitude part of the coordinate in degrees, as a float literal (or an equivalent string).
 * longitude: The longitude part of the coordinate in degrees, as a float literal (or an equivalent string).
 * precision: the coordinate's precision, in (fractions of) degrees, given as a float literal (or an equivalent string).
 * globe: the URI of a reference globe. This would typically refer to a data item on wikidata.org. This is usually just an indication of the celestial body (e.g. Q2 = earth), but could be more specific, like WGS 84 or ED50.
 * altitude: Deprecated and no longer used. Will be dropped in the future.
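
Because the numeric fields may arrive either as float literals or as equivalent strings, a reader can normalize them on input. A minimal Python sketch, with the hypothetical helper `parse_coordinate` and made-up coordinates:

```python
def parse_coordinate(value):
    """Normalize a globecoordinate value to floats."""
    precision = value.get("precision")
    return {
        # float() accepts both float literals and their string form.
        "latitude": float(value["latitude"]),
        "longitude": float(value["longitude"]),
        "precision": float(precision) if precision is not None else None,
        "globe": value.get("globe"),
        # "altitude" is deprecated and deliberately ignored.
    }


# Illustrative value mixing both number forms.
coordinate = parse_coordinate({
    "latitude": "40.67",
    "longitude": -73.94,
    "precision": 0.01,
    "globe": "http://www.wikidata.org/entity/Q2",
    "altitude": None,
})
```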

quantity
Quantity values are given as a map with the following fields:


 * amount: The nominal value of the quantity, as an arbitrary precision decimal string. The string always starts with a character indicating the sign of the value, either "+" or "-".
 * upperBound: Optionally, the upper bound of the quantity's uncertainty interval, using the same notation as the amount field. If not given or null, the uncertainty (or precision) of the quantity is not known. If the upperBound field is given, the lowerBound field must also be given.
 * lowerBound: Optionally, the lower bound of the quantity's uncertainty interval, using the same notation as the amount field. If not given or null, the uncertainty (or precision) of the quantity is not known. If the lowerBound field is given, the upperBound field must also be given.
 * unit: the URI of a unit (or "1" to indicate a unit-less quantity). This would typically refer to a data item on wikidata.org, e.g. http://www.wikidata.org/entity/Q712226 for "square kilometer".
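
Arbitrary-precision decimal strings should not be converted to binary floats if precision matters. The Python sketch below uses the standard `decimal` module; `parse_quantity` is a hypothetical helper and the amounts are made up.

```python
from decimal import Decimal


def parse_quantity(value):
    """Return (amount, bounds, unit) from a quantity value."""
    amount = Decimal(value["amount"])  # Decimal accepts the leading sign
    upper = value.get("upperBound")
    lower = value.get("lowerBound")
    # The two bounds must be given together or not at all.
    if (upper is None) != (lower is None):
        raise ValueError("upperBound and lowerBound must be given together")
    bounds = (Decimal(lower), Decimal(upper)) if upper is not None else None
    # "1" marks a unit-less quantity.
    unit = None if value["unit"] == "1" else value["unit"]
    return amount, bounds, unit


# Illustrative value; the numbers are made up.
area = parse_quantity({
    "amount": "+10.38",
    "upperBound": "+10.385",
    "lowerBound": "+10.375",
    "unit": "http://www.wikidata.org/entity/Q712226",
})
```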

time
Time values are given as a map with the following fields:


 * time: The format and interpretation of this string depend on the calendar model. Currently, only Julian and Gregorian dates are supported. The format used for Gregorian and Julian dates uses a notation resembling ISO 8601, e.g. "+1994-01-01T00:00:00Z". The year is represented by at least four digits; zeros are added on the left side as needed. Years BCE are represented as negative numbers, using the historical numbering, in which year 0 is undefined, the year 1 BCE is represented as -0001, the year 44 BCE is represented as -0044, etc., as in XSD 1.0 (ISO 8601:1988). In contrast, the RDF mapping relies on XSD 1.1 (ISO 8601:2004) dates, which use the proleptic Gregorian calendar and astronomical year numbering, where the year 1 BCE is represented as +0000 and the year 44 BCE is represented as -0043. See Wikipedia for more information about the year zero and ISO 8601. Month and day may be 00 if they are unknown or insignificant. The day of the month may have values between 0 and 31 for any month, to accommodate "leap dates" like February 30. Hour, minute, and second are currently unused and should always be 00. Note: more calendar models using a completely different notation may be supported in the future. Candidates include Julian day and the Hebrew calendar. Note: the notation for Julian and Gregorian dates may be changed to omit any unknown or insignificant parts. E.g. if only the year 1952 is known, this may in the future be represented as just "+1952" instead of the current "+1952-00-00T00:00:00Z" (which some libraries may turn into something like 1951-12-31), and the 19th century may be represented as "+18**".
 * timezone: Signed integer. Currently unused, and should always be 0. In the future, timezone information will be given as an offset from UTC in minutes. For dates before the modern implementation of UTC in 1972, this is the offset of the time zone from universal time. Before the implementation of time zones, this is the longitude of the place of the event, expressed in the range −180° to 180° (positive is east of Greenwich), multiplied by 4 to convert to minutes.
 * calendarmodel: A URI of a calendar model, such as gregorian or julian. Typically given as the URI of a data item on the repository.
 * precision: To what unit is the given date/time significant? Given as an integer indicating one of the following units:
    * 0: 1 Gigayear
    * 1: 100 Megayears
    * 2: 10 Megayears
    * 3: Megayear
    * 4: 100 Kiloyears
    * 5: 10 Kiloyears
    * 6: millennium (see Wikibase/DataModel for details)
    * 7: century (see Wikibase/DataModel for details)
    * 8: 10 years
    * 9: years
    * 10: months
    * 11: days
    * 12: hours (unused)
    * 13: minutes (unused)
    * 14: seconds (unused)
 * before: Beginning of an uncertainty range, given in the unit defined by the precision field. This cannot be used to represent a duration. (Currently unused, may be dropped in the future.)
 * after: End of an uncertainty range, given in the unit defined by the precision field. This cannot be used to represent a duration. (Currently unused, may be dropped in the future.)
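
Standard date libraries tend to reject month or day 00 and years outside their supported range, so the time string is often easier to handle with a small parser. The Python sketch below covers only the current Gregorian/Julian notation; `TIME_PATTERN`, `parse_time` and `astronomical_year` are hypothetical names, and the calendar model URI in the sample is illustrative.

```python
import re

# Sign, year of at least four digits, then month, day and time of day.
TIME_PATTERN = re.compile(
    r"^([+-])(\d{4,})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})Z$"
)


def parse_time(value):
    """Extract year, month, day and precision from a time value."""
    match = TIME_PATTERN.match(value["time"])
    if match is None:
        raise ValueError(f'unsupported time notation: {value["time"]!r}')
    sign, year, month, day = match.group(1, 2, 3, 4)
    return {
        "year": int(year) if sign == "+" else -int(year),
        # 00 means unknown or insignificant; report it as None.
        "month": int(month) or None,
        "day": int(day) or None,
        "precision": value["precision"],
    }


def astronomical_year(year):
    """Convert historical numbering (1 BCE = -1) to astronomical (1 BCE = 0)."""
    return year + 1 if year < 0 else year


sample = {
    "time": "+1952-00-00T00:00:00Z",
    "timezone": 0,
    "before": 0,
    "after": 0,
    "precision": 9,
    "calendarmodel": "http://www.wikidata.org/entity/Q1985727",
}
print(parse_time(sample))
```

The `astronomical_year` helper mirrors the conversion described for the RDF mapping: the year 44 BCE, stored as -0044, becomes -43.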

Qualifiers
Qualifiers provide context for a Statement's value, such as a point in time, a method of measurement, etc. Qualifiers are given as snaks. The set of qualifiers for a statement is provided grouped by property ID, resulting in a map which associates property IDs with one list of snaks each.
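
A short Python sketch of reading this map is given below. The statement fragment is hand-written, the property IDs and values are made up, and `qualifier_snaks` and `count_qualifiers` are hypothetical helpers.

```python
# Illustrative statement fragment; property IDs and values are made up.
statement = {
    "mainsnak": {"snaktype": "novalue", "property": "P1"},
    "qualifiers": {
        "P2": [
            {"snaktype": "value", "property": "P2", "datatype": "string",
             "datavalue": {"type": "string", "value": "first"}},
            {"snaktype": "somevalue", "property": "P2", "datatype": "string"},
        ],
    },
}


def qualifier_snaks(statement, property_id):
    """Return the list of qualifier snaks for one property (possibly empty)."""
    return statement.get("qualifiers", {}).get(property_id, [])


def count_qualifiers(statement):
    """Count all qualifier snaks across all properties."""
    return sum(len(snaks) for snaks in statement.get("qualifiers", {}).values())
```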

Example
Below is an example of an extract of a complete entity represented in JSON.