Wikidata Query Service/User Manual

Introduction
Wikidata Query Service (WDQS) is a software package and public service designed to provide a SPARQL endpoint which allows you to query against the Wikidata data set.

Please note that the service is currently in beta mode, which means that details of the data set or the service provided can change without prior warning. This page or other relevant documentation pages will be updated accordingly; it is recommended that you watch them if you are using the service.

Examples of SPARQL queries can be found on the SPARQL Examples page.

Data set
The data set is the data from Wikidata.org, represented in RDF as described in the RDF dump format documentation. Note that the service data set does not exactly match the data set produced by the RDF dumps, mainly for performance reasons; the small set of differences is described in the RDF format documentation.

Extensions
The following are extensions to standard SPARQL capabilities supported by the service:

Label service
WDQS allows you to fetch the label, alias or description of queried entities, with language fallback, using the specialized service with the URI . The service is very helpful when labels are required, as it reduces the complexity of the SPARQL queries that would otherwise be needed to achieve the same effect.

The service can be used in one of two modes: manual and automatic.

When using automatic mode, only the service template needs to be specified, e.g.:

and the labels will be automatically generated according to the following rules:
 * If an unbound variable in SELECT is named, then the label  for the entity in variable ?{NAME} is produced
 * If an unbound variable in SELECT is named, then the alias  for the entity in variable ?{NAME} is produced
 * If an unbound variable in SELECT is named, then the description  for the entity in variable ?{NAME} is produced
In all cases, the variable ?{NAME} must be bound, otherwise the service fails.

The language can be specified by one or more of  triples, where each string can contain one or more language codes, separated by commas. Languages are considered in the order in which they are specified. If no label is available in any of the specified languages, the Q-id of the entity (without any prefix) is used as the label.
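The fallback rule described above can be modelled in a few lines of Python. This is only a sketch of the documented behaviour, not the service's implementation; the `resolve_label` helper and the shape of its `labels` argument (a dict from language code to label text for one entity) are hypothetical, and the sample labels and Q-id are made up.

```python
from typing import Dict, List


def resolve_label(labels: Dict[str, str], language_specs: List[str], qid: str) -> str:
    """Hypothetical model of the label service's language fallback.

    labels: language code -> label text for a single entity.
    language_specs: the strings given in the language triples, in order;
        each string may hold several comma-separated language codes.
    qid: the entity's Q-id without any prefix, e.g. "Q123".
    """
    for spec in language_specs:
        # Languages are tried in the order in which they are specified.
        for code in spec.split(","):
            code = code.strip()
            if code and code in labels:
                return labels[code]
    # No label in any requested language: use the bare Q-id as the label.
    return qid


if __name__ == "__main__":
    labels = {"de": "Beispiel", "en": "example"}  # made-up sample data
    print(resolve_label(labels, ["fr,de", "en"], "Q123"))  # Beispiel
    print(resolve_label(labels, ["fr"], "Q123"))           # Q123
```

Splitting on commas inside each string and then walking the strings in order gives the same result whether the languages are written as one string ("fr,de,en") or as several.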

Example showing the list of US presidents and their spouses:

The labels  and   are created automatically by the service.

In manual mode, the label variables must be bound explicitly within the service call, but the service still provides language resolution and fallback. Example:

This will consider labels and descriptions in French, German and English, and if none are available, will use the Q-id as the label.

Extended dates
The service supports date values of type  in a range of about 290 billion years into the past and the future, with one-second resolution. Dates are stored as a 64-bit count of seconds since the Unix epoch.
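The "about 290 billion years" figure follows from the storage format: a signed 64-bit integer holds at most 2^63 - 1 seconds on either side of the epoch. The sketch below checks that arithmetic and converts ordinary dates to epoch seconds. It assumes a 365.25-day year for the estimate, and it relies on Python's `datetime`, which only covers years 1 to 9999, so it cannot represent the far-past or far-future values the service itself accepts. The helper names are illustrative.

```python
from datetime import datetime, timedelta, timezone

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # Julian year; an assumption for the estimate
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def range_in_years() -> float:
    """Approximate span on each side of the epoch reachable with 64-bit seconds."""
    return INT64_MAX / SECONDS_PER_YEAR


def to_epoch_seconds(dt: datetime) -> int:
    """Whole seconds since 1970-01-01T00:00:00Z, floored to one-second resolution."""
    return (dt - EPOCH) // timedelta(seconds=1)


def fits_in_int64(seconds: int) -> bool:
    """True if the value can be stored as a signed 64-bit integer."""
    return INT64_MIN <= seconds <= INT64_MAX


if __name__ == "__main__":
    print(f"{range_in_years() / 1e9:.1f} billion years")  # 292.3 billion years
    print(to_epoch_seconds(datetime(2000, 1, 1, tzinfo=timezone.utc)))  # 946684800
```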

Wikimedia service
Wikimedia runs the public service instance of WDQS, which is available for use at http://query.wikidata.org/.

Query runtime on the public service is limited to 30 seconds, both for the GUI and for the public SPARQL endpoint. If you need to run longer queries, please contact the Discovery team.

GUI
The GUI at the home page of http://query.wikidata.org/ allows you to edit and submit SPARQL queries to the query engine. The results are displayed as an HTML table. Note that every query has a unique URL which can be bookmarked for later use. Going to this URL will put the query in the edit window, but will not run it; you still have to click "Execute" for that.

You can also generate a short URL for the current query via a URL shortening service by clicking the "Generate short URL" link on the right.

The "Add prefixes" button generates a header containing the standard prefixes for SPARQL queries. The full list of useful prefixes is given in the RDF format documentation.

The GUI also features a simple entity explorer which can be activated by clicking on the "*" symbol next to the entity result. Clicking on the entity Q-id itself will take you to the entity page on wikidata.org.

SPARQL endpoint
SPARQL queries can be submitted directly to the SPARQL endpoint with a GET request to https://query.wikidata.org/bigdata/namespace/wdq/sparql?query={SPARQL}. The result is returned as XML by default, or as JSON if either the query parameter  or the header   is provided.
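A GET request of this form can be built with Python's standard library, as sketched below. Only the `query` parameter and the endpoint URL come from the text above. Asking for JSON through the `Accept` header with `application/sparql-results+json` (the W3C media type for SPARQL JSON results) is an assumption about how the endpoint negotiates formats, and the client timeout is a hypothetical choice based on the 30-second server limit. `build_request` only constructs the request; `run_query` needs network access.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"
# Assumption: the endpoint honours the standard SPARQL 1.1 JSON results media type.
JSON_RESULTS = "application/sparql-results+json"


def build_request(sparql: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying the query in ?query=."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": sparql})
    return urllib.request.Request(url, headers={"Accept": JSON_RESULTS}, method="GET")


def run_query(sparql: str, endpoint: str = ENDPOINT, timeout: float = 35.0) -> dict:
    """Send the request and decode the JSON result.

    The public service stops queries after 30 seconds, so a client timeout a
    little above that (a hypothetical default) avoids hanging on a dead connection.
    """
    with urllib.request.urlopen(build_request(sparql, endpoint), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    req = build_request("SELECT ?s WHERE { ?s ?p ?o } LIMIT 1")
    print(req.full_url)
```

For a standalone installation, the same helpers can be pointed at the local endpoint by passing `endpoint="http://localhost:9999/bigdata/namespace/wdq/sparql"`.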

Standalone service
As the service is open source software, it is also possible to run it on your own server, using the instructions below.

The hardware recommendations can be found in the Blazegraph documentation.

Installing
To install the service, it is recommended that you download the full service package as a ZIP file, e.g. from Maven Central, with group ID  and artifact ID " ", or clone the source distribution at https://github.com/wikimedia/wikidata-query-rdf/ and build it with "mvn package". The package ZIP will be in the  directory under.

The package contains the Blazegraph server as a .war application; the libraries needed to run the Updater service, which fetches fresh data from the Wikidata site; scripts that make various tasks easier; and the GUI in the  subdirectory. If you want to use the GUI, you will have to configure your HTTP server to serve it.

By default, only the SPARQL endpoint at http://localhost:9999/bigdata/namespace/wdq/sparql is configured, and the default Blazegraph GUI is available at http://localhost:9999/bigdata/. Note that in the default configuration both are accessible only from localhost. If you intend to access them from outside, you will need to provide external endpoints and appropriate access control.

Loading data
The rest of the installation procedure is described in detail in the Getting Started document (https://github.com/wikimedia/wikidata-query-rdf/blob/master/docs/getting-started.md), which is part of the distribution, and involves the following steps:
 * 1) Download a recent RDF dump from https://dumps.wikimedia.org/wikidatawiki/entities/ (the RDF dump is the file ending in  ).
 * 2) Pre-process the data with the   script. This creates a set of TTL files with preprocessed data, with names like , etc. See options for the script below.
 * 3) Start the Blazegraph service by running the   script.
 * 4) Load the data into the service by using  . Note that loading data is usually significantly slower than pre-processing, so you can start loading as soon as several preprocessed files are ready. Loading can be restarted from any file by using the options as described below.
 * 5) After all the data is loaded, start the Updater service by using.
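Steps 2 and 4 suggest loading pre-processed files in order, starting before pre-processing has finished, and resuming from a given file after an interruption. The sketch below models only the file selection; it does not call the real scripts. The `part-N.ttl.gz` naming pattern and the `files_to_load` helper are hypothetical, chosen to show why files should be ordered numerically rather than alphabetically and how a restart point can be applied.

```python
import re
import tempfile
from pathlib import Path
from typing import List

# Hypothetical naming pattern for pre-processed chunks; check the names the
# pre-processing script actually produces before relying on this.
CHUNK_RE = re.compile(r"part-(\d+)\.ttl\.gz$")


def chunk_number(path: Path) -> int:
    """Extract the numeric index from a chunk file name."""
    match = CHUNK_RE.search(path.name)
    if match is None:
        raise ValueError(f"not a pre-processed chunk: {path.name}")
    return int(match.group(1))


def files_to_load(directory: Path, start_from: int = 0) -> List[Path]:
    """Return chunk files in numeric order, skipping those before start_from.

    Numeric sorting keeps part-10 after part-9, and start_from lets an
    interrupted load resume at a chosen chunk.
    """
    chunks = [p for p in directory.iterdir() if CHUNK_RE.search(p.name)]
    return sorted((p for p in chunks if chunk_number(p) >= start_from), key=chunk_number)


if __name__ == "__main__":
    # Demo on a throwaway directory with made-up chunk names.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for n in (1, 2, 9, 10):
            (root / f"part-{n}.ttl.gz").touch()
        print([p.name for p in files_to_load(root, start_from=2)])
        # ['part-2.ttl.gz', 'part-9.ttl.gz', 'part-10.ttl.gz']
```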

Scripts
The following useful scripts are part of the distribution:

munge.sh
Pre-process data from the RDF dump for loading. Example:

loadData.sh
Load processed data into Blazegraph. Requires  to be installed. Example:

runBlazegraph.sh
Run the Blazegraph service. Example:

runUpdate.sh
Run the Updater service. It is recommended that the settings for the  and   options (or their absence) be the same for munge.sh and runUpdate.sh; otherwise, data may not be updated properly.

Example:

Missing features
Below are features which are currently not supported:
 * Geolocation search. While the coordinates themselves are part of the data set, operations on them, such as comparison or search within a radius, are not yet supported.
 * Redirects are represented only as owl:sameAs triples; they do not express any equivalence in the data and have no special support.
 * SERVICE requests to outside URLs are not allowed in queries.

Contacts
If you notice anything wrong with the service, you can contact the Discovery team by email on the list  or on the IRC channel.

Bugs can also be submitted to Phabricator and tracked on the Discovery Phabricator board.