Wikidata Query Service/User Manual

Wikidata Query Service (WDQS) is a software package and public service designed to provide a SPARQL endpoint which allows you to query against the Wikidata data set.

Please note that the service is currently in beta mode, which means that details of the data set or the service provided can change without prior warning. This page or other relevant documentation pages will be updated accordingly; it is recommended that you watch them if you are using the service.

You can see examples of the SPARQL Queries on the SPARQL examples page.

Data set
The Wikidata Query Service operates on a data set from Wikidata.org, represented in RDF as described in the RDF dump format documentation. The service's data set does not exactly match the data set produced by RDF dumps, mainly for performance reasons; the documentation describes a small set of differences.

You can download a weekly dump of the same data from: https://dumps.wikimedia.org/wikidatawiki/entities/

Basics - Understanding SPO (Subject, Predicate, Object) also known as a Semantic Triple
spo or "subject, predicate, object" is known as a triple, or commonly referred to in Wikidata as a statement about data.

The statement "The sky has the color blue", consists of a subject ("the sky"), a predicate ("has the color"), and an object ("blue").

spo is also used as a form of basic syntax layout for querying RDF data structures, or any graph database or triplestore, such as the Wikidata Query Service (WDQS), which is powered by Blazegraph, a high performance graph database.

Advanced uses of a triple (spo) even including using triples as objects or subjects of other triples!

Basics - Understanding Prefixes
WDQS understands many shortcut abbreviations, known as prefixes. Some are internal to Wikidata

and many others are commonly used external prefixes, like

  is a prefix for a statement, or triple, or you could even think of it as the subject in an spo triple.

In the following query, we are asking for items where there is a statement of "P279 = Q7725634" or in fuller terms, selecting subjects that have a predicate of "subclass of" with an object of = "literary work".

Extensions
The service supports the following extensions to standard SPARQL capabilities:

Label service
You can fetch the label, alias, or description of entities you query, with language fallback, using the specialized service with the URI . The service is very helpful when you want to retrieve labels, as it reduces the complexity of SPARQL queries that you would otherwise need to achieve the same effect.

The service can be used in one of the two modes: manual and automatic.

In automatic mode, you only need to specify the service template, e.g.:

and WDQS will automatically generate labels as follows:


 * If an unbound variable in  is named , then WDQS produces the label  for the entity in variable.
 * If an unbound variable in  is named , then WDQS produces the alias  for the entity in variable.
 * If an unbound variable in  is named , then WDQS produces the description  for the entity in variable.

In each case, the variable in  should be bound, otherwise the service fails.

You specify your preferred language(s) for the label with one or more of  triples. Each string can contain one or more language codes, separated by commas. WDQS considers languages in the order in which you specify them. If no label is available in any of the specified languages, the Q-id of the entity (without any prefix) is its label.

Example, showing the list of US presidents and their spouses:

In this example WDQS automatically creates the labels  and   for properties.

In the manual mode, you explicitly bind the label variables within the service call, but WDQS will still provide language resolution and fallback. Example:

This will consider labels and descriptions in French, German and English, and if none are available, will use the Q-id as the label.

Geospatial search
The service allows to search for items with coordinates located within certain radius of the center of within certain bounding box.

Search around point
Example:

The first line of the  service call must have format     , where the result of the search will bind   to items within the specified location and   to their coordinates. The parameters supported are:

Search within box
Example of box search:

or:

Coordinates may be specified directly:

The first line of the  service call must have format     , where and the result of the search will bind   to items within the specified location and   to their coordinates. The parameters supported are:

and  should be used together, as well as   and , and can not be mixed. If  and   predicates are used, then the points are assumed to be the coordinates of the diagonal of the box, and the corners are derived accordingly.

Distance function
The function  returns distance between two points, in kilometers. Example usage:

Automatic prefixes
Most prefixes that are used in common queries are supported by the engine without the need to explicitly specify them.

Extended dates
The service supports date values of type  in the range of about 290B years in the past and in the future, with one-second resolution. WDQS stores dates as the 64-bit number of seconds since the Unix epoch.

Blazegraph extensions
Blazegraph platform on top of which WDQS is implemented has its own set of SPARQL extension. Among them several graph traversal algorithms which are documented on Blazegraph Wiki, including BFS, shortest path, CC and PageRank implementations.

Please also refer to the Blazegraph documentation on query hints for information about how to control query execution and various aspects of the engine.

Federation
We allow SPARQL Federated Queries to call out to a selected number of external databases. Supported endpoints are: Example federated query:

Please note that the databases listed above use ontologies that may be very different from the Wikidata one. Please refer to the owner documentation links above to learn about the ontologies and data access to these databases.

Wikimedia service
Wikimedia runs the public service instance of WDQS, which is available for use at http://query.wikidata.org/.

The runtime of the query on the public endpoint is limited to 60 seconds. That is true both for the GUI and the public SPARQL endpoint. If you need to run longer queries, please contact the Discovery team.

GUI
The GUI at the home page of http://query.wikidata.org/ allows you to edit and submit SPARQL queries to the query engine. The results are displayed as an HTML table. Note that every query has a unique URL which can be bookmarked for later use. Going to this URL will put the query in the edit window, but will not run it - you still have to click "Execute" for that.

One can also generate a short URL for the query via a URL shortening service by clicking the "Generate short URL" link on the right - this will produce the shortened URL for the current query.

The "Add prefixes" button generates the header containing standard prefixes for SPARQL queries. The full list of prefixes that can be useful is listed in the RDF format documentation. Note that most common prefixes work automatically, since WDQS supports them out of the box.

The GUI also features a simple entity explorer which can be activated by clicking on the "🔍" symbol next to the entity result. Clicking on the entity Q-id itself will take you to the entity page on wikidata.org.

Default views
If you run the query in the WDQS GUI, you can choose which view to present by specifying a comment:  at the beginning of the query.

SPARQL endpoint
SPARQL queries can be submitted directly to the SPARQL endpoint with a GET or POST request to. The result is returned as XML by default, or as JSON if either the query parameter  or the header   are provided. POST requests also accepts query in the body of the request, instead of URL, allowing to run larger queries without hitting URL length limit.

JSON format is standard SPARQL 1.1 Query Results JSON Format.

It is recommended to use GET for smaller queries and POST for larger queries, as POST queries are not cached.

Supported formats
The following output formats are currently supported by the SPARQL endpoint:

Query timeout
There is a hard query deadline configured which is set to 60 seconds.

Every query will timeout when it takes more time to execute than this configured deadline. You may want to optimize the query or report a problematic query here.

Also note that currently access to the service is limited to 5 parallel queries per IP. These limits are subject to change depending on resources and usage patterns.

Linked Data Fragments endpoint
We also support querying the database using Triple Pattern Fragments interface. This allows to cheaply and efficiently browse triple data where one or two components of the triple is known and you need to retrieve all triples that match this template. See more information at the Linked Data Fragments site.

The interface can be accessed by the URL:. Example requests:


 * https://query.wikidata.org/bigdata/ldf?subject=http%3A%2F%2Fwww.wikidata.org%2Fentity%2FQ146 - all triples with subject


 * https://query.wikidata.org/bigdata/ldf?subject=&predicate=http%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23label&object=%22London%22%40en - all triples that have English label "London"

Note that only full URLs are currently supported for the,   and   parameters.

By default, HTML interface is displayed, however several data formats are available, defined by  HTTP header.

The data is returned in pages, page size being 100 triples. The pages are numbered starting from 1, and page number is defined by  parameter.

Standalone service
As the service is open source software, it is also possible to run the service on any user's server, by using the instructions provided below.

The hardware recommendations can be found in Blazegraph documentation.

Installing
In order to install the service, it is recommended that you download the full service package as a ZIP file, e.g. from Maven Central, with group ID  and artifact ID " ", or clone the source distribution at https://github.com/wikimedia/wikidata-query-rdf/ and build it with "mvn package". The package ZIP will be in the  directory under.

The package contains the Blazegraph server as a .war application, the libraries needed to run the updater service to fetch fresh data from the wikidata site, scripts to make various tasks easier, and the GUI in the  subdirectory. If you want to use the GUI, you will have to configure your HTTP server to serve it.

By default, only the SPARQL endpoint at http://localhost:9999/bigdata/namespace/wdq/sparql is configured, and the default Blazegraph GUI is available at http://localhost:9999/bigdata/. Note that in the default configuration, both are accessible only from localhost. You will need to provide external endpoints and an appropriate access control if you intend to access them from outside.

Using snapshot versions
If you want to install an un-released snapshot version (usually this is necessary if released version has a bug which is fixed but new release is not available yet) and do not want to compile your own binaries, you can use either:
 * https://github.com/wikimedia/wikidata-query-deploy - deployment repo containing production binaries. Needs  working. Check it out and do " ".
 * Archiva snapshot deployments at https://archiva.wikimedia.org/#artifact/org.wikidata.query.rdf/service - choose the latest version, then Artifacts, and select the latest package for download.

Loading data
Further install procedure is described in detail in the [Https://github.com/wikimedia/wikidata-query-rdf/blob/master/docs/getting-started.md Getting Started document] which is part of the distribution, and involves the following steps:


 * 1) Download recent RDF dump from https://dumps.wikimedia.org/wikidatawiki/entities/ (the RDF one is the one ending in  ).
 * 2) Pre-process data with the   script. This creates a set of TTL files with preprocessed data, with names like , etc. See options for the script below.
 * 3) Start Blazegraph service by running the   script.
 * 4) Load the data into the service by using  . Note that loading data is usually significantly slower than pre-processing, so you can start loading as soon as several preprocessed files are ready. Loading can be restarted from any file by using the options as described below.
 * 5) After all the data is loaded, start the Updater service by using.

Scripts
The following useful scripts are part of the distribution:

munge.sh
Pre-process data from RDF dump for loading.

Example:

loadData.sh
Load processed data into Blazegraph. Requires  to be installed.

Example:

runBlazegraph.sh
Run the Blazegraph service.

Example:

Inside the script, there are two variables that one may want to edit: DEFAULT_GLOBE=2 USER_AGENT="Wikidata Query Service; https://query.wikidata.org/";
 * 1) Q-id of the default globe
 * 1) Blazegraph HTTP User Agent for federation

runUpdate.sh
Run the Updater service.

It is recommended that the settings for the  and   options (or absence thereof) be the same for munge.sh and runUpdate.sh, otherwise data may not be updated properly.

Example:

Configurable properties
The following properties are configurable via adding them to the script run command in the scripts above:

Missing features
Below are features which are currently not supported:


 * Redirects are only represented as owl:sameAs triple, but do not express any equivalence in the data and have no special support.

Contacts
If you notice anything wrong with the service, you can contact the Discovery team by email on the list  or on the IRC channel.

Bugs can also be submitted to Phabricator and tracked on the Discovery Phabricator board.