Wikibase/DataModel/JSON

This document can be found in the Wikibase source code and should be edited only there. This page should be kept up to date with the version in the code.

= Wikibase JSON format =

This document describes the canonical JSON format used to represent Wikibase entities in the API, in JSON dumps, as well as by Special:EntityData (when using JSON output). This format can be expected to be reasonably stable, and is designed with flexibility and robustness in mind.

For an explanation of the terms used in this document, please refer to the Wikibase Glossary.

NOTE: this is different from the JSON format used by Wikibase internally when storing entities to the database. The internal format is what is used in MediaWiki's XML dumps and returned by Special:Export, as well as by some API modules that return raw revision text. The internal format is designed to be terse, and may change frequently. External tools should use the canonical JSON format whenever possible, and should not rely on the internal format.

JSON Flavor
When encoding the data structure in JSON, several choices had to be made as to how values are represented:


 * Strings are encoded in one of two ways:
   * using Unicode escape sequences (like \u0645), resulting in a UTF-16 representation when decoded
   * ...or using native UTF-8 encoding.
 * Numbers may be given in two ways:
   * as numeric JavaScript literals (float or int)
   * ...or as strings. Strings are preferable where the precision of numeric literals guaranteed by JSON may not be sufficient.
 * Entity IDs are given as upper-case strings, e.g. "P29" or "Q623289". Note: until recently, lower-case prefixes were used in entity IDs!

Clients should be ready to process any of the forms given above.
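
A minimal sketch of a tolerant client, assuming Python and its standard json and decimal modules. The helper names to_decimal and normalize_entity_id are hypothetical and not part of Wikibase:

```python
import json
from decimal import Decimal


def to_decimal(value):
    """Accept a JSON number (int/float) or its string form; return a Decimal."""
    if isinstance(value, bool):
        raise TypeError("booleans are not treated as numbers here")
    if isinstance(value, (int, float)):
        # Going through str() avoids binary floating point artifacts.
        return Decimal(str(value))
    return Decimal(value)


def normalize_entity_id(entity_id):
    """Upper-case an entity ID so older lower-case prefixes are accepted too."""
    return entity_id.upper()


# An escaped string and a natively encoded string decode to the same text.
escaped = json.loads('"\\u0645"')
native = json.loads('"\u0645"')
```

Converting both number forms to Decimal is one possible design choice; it keeps the precision of values that were deliberately sent as strings.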

Top Level Structure
The JSON representation consists of the following fields in the top level structure:


 * id: The canonical ID of the entity.
 * type: The entity type identifier. "item" for data items, and "property" for properties.
 * labels: Contains the labels in different languages, see Labels, Descriptions and Aliases below.
 * descriptions: Contains the descriptions in different languages, see Labels, Descriptions and Aliases below.
 * aliases: Contains aliases in different languages, see Labels, Descriptions and Aliases below.
 * claims: Contains any number of claims or statements, grouped by property. See Claims and Statements below.
 * sitelinks: Contains site links to pages on different sites describing the item, see Site Links below.
 * lastrevid: The JSON document's version (this is a MediaWiki revision ID).
 * modified: The JSON document's publication date (this is a MediaWiki revision timestamp).

API modules currently handle the revision and date modified slightly differently using the fields below.

API modules also often return extra information related to the entity and the wiki:


 * title: The title of the page the entity is stored on (this could also include namespace such as 'Item:Q60')
 * pageid: The page id the entity is stored on
 * ns: the namespace id the entity is stored in

Labels, Descriptions and Aliases
Labels, descriptions and aliases are represented by the same basic data structure. For each language, there is a record using the following fields:


 * language: The language code.
 * value: The actual label or description.

In the case of aliases, each language is associated with a list of such records, while for labels and descriptions the record is associated directly with the language.
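
The difference between the two shapes can be sketched as follows. The sample data and the helper names get_label and get_aliases are illustrative assumptions, not part of any Wikibase library:

```python
# Illustrative data: labels and descriptions map a language code to one record,
# aliases map a language code to a list of records.
entity = {
    "labels": {"en": {"language": "en", "value": "New York City"}},
    "descriptions": {"en": {"language": "en", "value": "city in New York"}},
    "aliases": {
        "en": [
            {"language": "en", "value": "NYC"},
            {"language": "en", "value": "New York"},
        ]
    },
}


def get_label(entity, language):
    """Return the label text for a language, or None if there is none."""
    record = entity.get("labels", {}).get(language)
    return record["value"] if record else None


def get_aliases(entity, language):
    """Return all alias texts for a language (possibly an empty list)."""
    return [record["value"] for record in entity.get("aliases", {}).get(language, [])]
```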

Site Links
Site links are given as records for each site identifier. Each such record contains the following fields:


 * site: The site ID.
 * title: The page title.
 * badges: Any "badges" associated with the page (such as "featured article"). Badges are given as a list of item IDs.
 * url: Optionally, the full URL of the page may be included.
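
As a rough sketch, site link records can be read like this. The sample records, the badge ID Q999999, and the helper names are hypothetical:

```python
# Illustrative data: one record per site identifier; "url" is optional.
sitelinks = {
    "enwiki": {"site": "enwiki", "title": "New York City", "badges": []},
    "dewiki": {
        "site": "dewiki",
        "title": "New York City",
        "badges": ["Q999999"],  # hypothetical badge item ID
        "url": "https://de.wikipedia.org/wiki/New_York_City",
    },
}


def titles_by_site(sitelinks):
    """Map each site ID to the linked page title."""
    return {site: record["title"] for site, record in sitelinks.items()}


def sites_with_badges(sitelinks):
    """Return the site IDs whose page carries at least one badge."""
    return sorted(site for site, record in sitelinks.items() if record.get("badges"))
```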

Claims and Statements
A Claim consists of a main value (or main Snak) and a number of qualifier Snaks. A Statement is a Claim that also contains a (possibly empty) list of references. A claim is always associated with a Property (semantically, the Claim is about the Property), and there can be multiple Claims about the same Property in a single Entity. This is represented by a map structure that uses Property IDs as keys, and maps them to lists of Claim records.

A Claim record uses the following fields:


 * id: An arbitrary identifier for the claim, which is unique across the repository. No assumptions can and shall be made about the identifier's structure, and no guarantees are given that the format will stay the same.
 * type: the type of the claim - currently either statement or claim.
 * mainsnak: If the claim has the type value, it has a mainsnak field that contains the Snak representing the value to be associated with the property. See Snaks below. The Property specified in the main Snak must be the same as the property the Claim is associated with. That is, if a value claim is provided for property P17, its main Snak will specify P17 as the property the value is assigned to.
 * rank: The rank expresses whether this value will be used in queries and shown by default on a client system. The value is either preferred, normal or deprecated.
 * qualifiers: Qualifiers provide a context for the primary value, such as the point in time of measurement. Qualifiers are given as lists of snaks, each associated with one property. See Qualifiers below.
 * references: If the Claim's type is statement, there may be a list of references, given as a list of reference records. See References below.
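
The map from Property IDs to lists of Claim records could be traversed roughly as follows. The claim IDs, the sample claims, and the helper names are made up for illustration:

```python
# Illustrative data: Property IDs map to lists of Claim records.
claims = {
    "P17": [
        {
            "id": "example-claim-1",  # opaque identifier, structure not meaningful
            "type": "statement",
            "rank": "normal",
            "mainsnak": {"snaktype": "novalue", "property": "P17"},
            "references": [],
        },
        {
            "id": "example-claim-2",
            "type": "statement",
            "rank": "deprecated",
            "mainsnak": {"snaktype": "somevalue", "property": "P17"},
            "references": [],
        },
    ]
}


def non_deprecated_claims(claims, property_id):
    """Return the claims for a property whose rank is not 'deprecated'."""
    return [c for c in claims.get(property_id, []) if c.get("rank") != "deprecated"]


def main_snaks_consistent(claims):
    """Check that every main Snak names the property its claim is filed under."""
    return all(
        claim["mainsnak"]["property"] == property_id
        for property_id, records in claims.items()
        for claim in records
        if "mainsnak" in claim
    )
```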

Snaks
A Snak provides some kind of information about a specific Property of a given Entity. Currently, there are three kinds of Snaks: value, somevalue or novalue. A value snak represents a specific value for the property, while novalue and somevalue only express that there is no value or, respectively, some unknown value.

A Snak is represented by providing the following fields:


 * snaktype: The type of the snak. Currently, this is one of value, somevalue or novalue.
 * property: The ID of the property this Snak is about.
 * datatype: The datatype field indicates how the value of the Snak can be interpreted. The datatype can be any of the datatypes listed on Special:ListDatatypes.
 * datavalue: If the snaktype is value, there is a datavalue field that contains the actual value the Snak associates with the Property. See Data Values below.
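
A small sketch of how a client might branch on snaktype, assuming the fields described above. The function name snak_value and the sample snak are hypothetical:

```python
def snak_value(snak):
    """Return (snaktype, raw value); the value is None unless snaktype is 'value'."""
    snaktype = snak["snaktype"]
    if snaktype == "value":
        # Only value snaks carry a datavalue field.
        return snaktype, snak["datavalue"]["value"]
    if snaktype in ("somevalue", "novalue"):
        return snaktype, None
    raise ValueError(f"unknown snaktype: {snaktype!r}")


# Illustrative snak: a URL stored with the value type "string".
example_snak = {
    "snaktype": "value",
    "property": "P29",
    "datatype": "url",
    "datavalue": {"type": "string", "value": "http://example.org/"},
}
```

Raising on an unknown snaktype is a deliberate choice in this sketch, since the list of snak types is described as current rather than final.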

Data Values
Data value records represent a value of a specific type. They consist of two fields:


 * type: the value type. This defines the structure of the value field, and is not to be confused with the Snak's data type (which is derived from the Snak's Property's data type). The value type does not allow for interpretation of the value, only for processing of the raw structure. As an example, a link to a web page may use the data type "url", but have the value type "string".
 * value: the actual value. This field may contain a single string, a number, or a complex structure. The structure is defined by the type field.

Some value types and their structure are defined in the following sections.

string
Strings are given as simple string literals.

wikibase-entityid
Entity IDs are used to reference entities on the same repository. They are represented by a map structure containing two fields:


 * entity-type: defines the type of the entity, such as item or property.
 * numeric-id: this is the actual ID number.

WARNING: wikibase-entityid may in the future change to be represented as a single string literal, or may even be dropped in favor of using the string value type to reference entities.

NOTE: There is currently no reliable mechanism for clients to generate a prefixed ID or a URL from the information in the data value.

globecoordinate

 * latitude: The latitude part of the coordinate in degrees, as a float literal (or an equivalent string).
 * longitude: The longitude part of the coordinate in degrees, as a float literal (or an equivalent string).
 * precision: the coordinate's precision, in (fractions of) degrees, given as a float literal (or an equivalent string).
 * globe: the URI of a reference globe. This would typically refer to a data item on wikidata.org. This is usually just an indication of the celestial body (e.g. Q2 = earth), but could be more specific, like WGS 84 or ED50.
 * altitude: Deprecated and no longer used. Will be dropped in the future.
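
Because the numeric fields may arrive either as float literals or as equivalent strings, a client might normalize them as in the sketch below. The GlobeCoordinate class and parse_globecoordinate function are hypothetical names, and treating a missing precision as None is a defensive assumption of this sketch:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GlobeCoordinate:
    latitude: float
    longitude: float
    precision: Optional[float]
    globe: str


def parse_globecoordinate(value):
    """Build a GlobeCoordinate from a value map; the deprecated altitude is ignored."""
    precision = value.get("precision")
    return GlobeCoordinate(
        latitude=float(value["latitude"]),    # float() accepts numbers and strings
        longitude=float(value["longitude"]),
        precision=None if precision is None else float(precision),
        globe=value["globe"],
    )


# Illustrative value; the globe URI follows the "Q2 = earth" example above.
coordinate = parse_globecoordinate({
    "latitude": "52.5",
    "longitude": 13.25,
    "precision": 0.25,
    "globe": "http://www.wikidata.org/entity/Q2",
    "altitude": None,
})
```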

time
Time values are given as a map with the following fields:


 * time: the format and interpretation of this string depend on the calendar model. Currently, only Julian and Gregorian dates are supported. The format used for Gregorian and Julian dates uses a notation resembling ISO 8601, e.g. "+1994-01-01T00:00:00Z". The year is represented by at least four digits; zeros are added on the left side as needed. Years BCE are represented as negative numbers, following the traditional ordering, in which there is no year 0: the year 1 BCE is represented as -0001, the year 44 BCE as -0044, etc., as in XSD 1.0 (ISO 8601:1988). In contrast, the RDF mapping relies on XSD 1.1 (ISO 8601:2004) dates, which use the proleptic Gregorian calendar and astronomical year numbering, where the year 1 BCE is represented as +0000 and the year 44 BCE as -0043. Month and day may be 00 if they are unknown or insignificant. The day of the month may have values between 0 and 31 for any month, to accommodate "leap dates" like February 30. Hour, minute, and second are currently unused and should always be 00. Note: more calendar models using a completely different notation may be supported in the future. Candidates include Julian day and the Hebrew calendar. Note: the notation for Julian and Gregorian dates may be changed to omit any unknown or insignificant parts. E.g. if only the year 1952 is known, this may in the future be represented as just "+1952" instead of the current "+1952-00-00T00:00:00Z", which some libraries may turn into 1951-12-31.
 * timezone: Signed integer. Currently unused, and should always be 0. In the future, timezone information will be given as an offset from UTC in minutes. For dates before the modern implementation of UTC in 1972, this is the offset of the time zone from universal time. Before the implementation of time zones, this is the longitude of the place of the event, expressed in the range −180° to 180° (positive is east of Greenwich), multiplied by 4 to convert to minutes.
 * calendarmodel: A URI of a calendar model, such as gregorian or julian. Typically given as the URI of a data item on the repository.
 * precision: To what unit is the given date/time significant? Given as an integer indicating one of the following units:
   * 0: 1 Gigayear
   * 1: 100 Megayears
   * 2: 10 Megayears
   * 3: Megayear
   * 4: 100 Kiloyears
   * 5: 10 Kiloyears
   * 6: Kiloyear
   * 7: 100 years
   * 8: 10 years
   * 9: years
   * 10: months
   * 11: days
   * 12: hours (unused)
   * 13: minutes (unused)
   * 14: seconds (unused)
   Note that the precision should be read as an indicator of the significant parts of the date string; it does not directly specify an interval. That is, 1988-07-13T00:00:00 with precision 8 (decade) will be interpreted as 198?-??-?? and rendered as "1980s". 1981-01-21T00:00:00 with precision 8 would have the exact same interpretation. Thus the two dates are equivalent, since year, month, and days are treated as insignificant.
 * before: Beginning of an uncertainty range, given in the unit defined by the precision field. This cannot be used to represent a duration. (Currently unused, may be dropped in the future)
 * after: End of an uncertainty range, given in the unit defined by the precision field. This cannot be used to represent a duration. (Currently unused, may be dropped in the future)
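
The time string and the precision field can be handled without a date library, which sidesteps the problem of month or day being 00. The following is a minimal sketch that assumes the current notation with a leading sign and a trailing Z. The names TIME_RE, parse_time and significant_key are hypothetical, and truncating digits of the absolute year for BCE dates is a simplifying assumption:

```python
import re

# Sign, at least four year digits, then month, day, hour, minute, second.
TIME_RE = re.compile(r"^([+-])(\d{4,})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})Z$")


def parse_time(time_string):
    """Split a Gregorian/Julian time string into (year, month, day) integers."""
    match = TIME_RE.match(time_string)
    if match is None:
        raise ValueError(f"unrecognized time string: {time_string!r}")
    sign, year, month, day = match.group(1, 2, 3, 4)
    signed_year = -int(year) if sign == "-" else int(year)
    return signed_year, int(month), int(day)


def significant_key(value):
    """Reduce a time value to its significant parts according to its precision."""
    year, month, day = parse_time(value["time"])
    precision = value["precision"]
    if precision >= 11:   # days (finer units are currently unused)
        return (year, month, day)
    if precision == 10:   # months
        return (year, month)
    # Years and coarser: drop insignificant digits, e.g. 1988 -> 198 for decades.
    unit = 10 ** (9 - precision)
    return (year < 0, abs(year) // unit)
```

Two values with the same key are equivalent in the sense described for the precision field, for example the two 1980s dates given there.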

Qualifiers
Qualifiers provide context for a Claim's value, such as a point in time, a method of measurement, etc. Qualifiers are given as snaks. The set of qualifiers for a statement is provided grouped by property ID, resulting in a map which associates property IDs with one list of snaks each.
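
A brief sketch of reading qualifiers from a Claim record. The property IDs, the sample claim, and the helper names are illustrative assumptions:

```python
# Illustrative claim: qualifiers map property IDs to lists of snaks.
claim = {
    "id": "example-claim-1",
    "type": "statement",
    "mainsnak": {"snaktype": "somevalue", "property": "P17"},
    "qualifiers": {
        "P29": [
            {"snaktype": "value", "property": "P29", "datatype": "string",
             "datavalue": {"type": "string", "value": "census"}},
            {"snaktype": "novalue", "property": "P29"},
        ]
    },
}


def qualifier_snaks(claim, property_id):
    """Return the qualifier snaks for one property (empty list if there are none)."""
    return claim.get("qualifiers", {}).get(property_id, [])


def all_qualifier_snaks(claim):
    """Flatten the grouped qualifiers into (property_id, snak) pairs."""
    return [
        (property_id, snak)
        for property_id, snaks in claim.get("qualifiers", {}).items()
        for snak in snaks
    ]
```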

Example
Below is an example of an extract of a complete entity represented in JSON.
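
As a minimal, illustrative sketch (not actual data from any repository), an entity following the structure described in this document might look like the JSON embedded below, loaded here with Python's json module. All IDs, titles, revision numbers and timestamps are hypothetical:

```python
import json

ENTITY_JSON = """
{
  "id": "Q60",
  "type": "item",
  "labels": {"en": {"language": "en", "value": "New York City"}},
  "descriptions": {"en": {"language": "en", "value": "city in the United States"}},
  "aliases": {"en": [{"language": "en", "value": "NYC"}]},
  "claims": {
    "P17": [
      {
        "id": "example-claim-id",
        "type": "statement",
        "mainsnak": {
          "snaktype": "value",
          "property": "P17",
          "datatype": "wikibase-item",
          "datavalue": {
            "type": "wikibase-entityid",
            "value": {"entity-type": "item", "numeric-id": 30}
          }
        },
        "rank": "normal",
        "qualifiers": {},
        "references": []
      }
    ]
  },
  "sitelinks": {
    "enwiki": {"site": "enwiki", "title": "New York City", "badges": []}
  },
  "lastrevid": 123456,
  "modified": "2014-01-01T00:00:00Z"
}
"""

# Parse the document and pick out a few of the top-level fields.
entity = json.loads(ENTITY_JSON)
main_snak = entity["claims"]["P17"][0]["mainsnak"]
```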