Wikibase/Indexing/WDQS Beta

The purpose of this deployment is to provide a testing ground for the query service and to collect basic usage patterns. The service runs at http://wdqs-beta.wmflabs.org/

Deployment host
and.

One of these machines serves requests while the other is a standby. Loading data takes several days, so if one machine crashes or needs to be re-imported, we can continue with the other while the first is reloaded.

If you need access, ping any member of the wikidata-query project on Labs. Each machine is an xlarge instance with 160G of storage.

Source code
The code comes from https://github.com/wikimedia/wikidata-query-rdf/. See https://github.com/wikimedia/wikidata-query-rdf/blob/master/docs/getting-started.md for a detailed description of how to build and set up the service. This is already done on the beta host, so it is for information and disaster recovery purposes only.

All necessary data except for the nginx configs (see below) is contained in  deployment package, which is what is deployed at. Deployment can be done by the puppet role below. Note that the puppet role does not start the Blazegraph and Updater services, only nginx.

Puppet deployment
Puppet uses a self-hosted puppetmaster at.

Configuration for puppetmaster:
 * check
 * set the puppetmaster to
 * check role
 * set   to true

Configuration for clients:
 * check
 * set the puppetmaster to
 * enable role
Blazegraph deployment
Blazegraph is deployed in, running under user. If the service is stopped or crashes, restart it by running:

 # ./runBlazegraph.sh | tee $(date +%s).log

from. Preserving logs for at least some time is recommended in case an unexpected failure happens. No log rotation scheme is in place so far, so just delete old logs once you are sure nobody needs them anymore.
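Since old logs have to be deleted by hand, the cleanup can be scripted. The following is only a minimal sketch: the function name `prune_old_logs`, the `*.log` file pattern and the 14-day default retention are assumptions for illustration, not part of the deployment. It relies on the `-maxdepth` and `-delete` options of GNU (or BSD) find.

```shell
#!/bin/sh
# Hypothetical helper (not part of the deployment package):
# delete *.log files older than a given number of days in one directory.
# Usage: prune_old_logs <log-directory> [days]
prune_old_logs() {
    log_dir=$1
    days=${2:-14}   # assumed default retention; adjust as needed
    # -mtime +N matches files last modified more than N days ago.
    # -print lists each file as it is removed, for the record.
    find "$log_dir" -maxdepth 1 -type f -name '*.log' -mtime +"$days" -print -delete
}
```

Running it first with `-delete` removed from the `find` call shows what would be deleted without touching anything.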

Some interesting settings may be found in  - namely   and. Changing those probably requires a restart. Note that if you restart the Blazegraph service, you may also need to restart the updater, as it may give up if Blazegraph is offline for too long (see below).

The Blazegraph instance has a GUI workbench accessible at. It is not for public access, as it allows full write access to the database. One can access it by configuring port forwarding while logging in to the host via ssh.
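Port forwarding of this kind can be set up with the `-L` option of ssh. The sketch below makes several assumptions: the function name `open_workbench_tunnel` is hypothetical, the host name must be supplied by the caller, and the default port 9999 is Blazegraph's usual default, which may not match the port configured on this host.

```shell
#!/bin/sh
# Hypothetical helper: forward a local port to the Blazegraph port on the host.
# Usage: open_workbench_tunnel <host> [remote-port] [local-port]
open_workbench_tunnel() {
    host=$1
    remote_port=${2:-9999}          # assumed port, check the actual settings
    local_port=${3:-$remote_port}   # defaults to the same port locally
    # -N: do not run a remote command, only forward ports.
    # -L: listen on local_port and forward to localhost:remote_port on the host.
    ssh -N -L "${local_port}:localhost:${remote_port}" "$host"
}
```

While the tunnel is open, the workbench is reachable from the local machine on the chosen local port; the forwarding stops when the ssh process is interrupted.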

Updater deployment
The updater is the service that constantly polls Wikidata for changes and synchronizes them into the current database. If it stops, the query service remains functional but only contains data up to the last successful update. This service can run under any user, since it communicates with Blazegraph only via the REST API and stores no persistent data itself; everything is stored in Blazegraph. Currently runs under.

The recommended way of running it is:

 # ./runUpdater -n wdq | tee /srv/wdqs/blazegraph/logs/$(date +%s).log

from. Preserving logs for at least some time is recommended in case an unexpected failure happens. No log rotation scheme is in place so far, so just delete old logs once you are sure nobody needs them anymore.

The updater logs progress information like this:

 20:32:55.850 [main] INFO org.wikidata.query.rdf.tool.Update - Polled up to 2015-05-19T09:11:50Z at (2.6, 2.7, 2.8) updates per second and (2085.5, 2096.2, 2202.2) milliseconds per second

The date is the point in the main database up to which the service has been updated. The first set of numbers is the number of entities updated per second. The second set shows how far the updater advanced in catching up with the main data per second of running time, in milliseconds; for example, 2085.5 means it covered about 2.1 seconds of changes in one second. These numbers are relevant only if the service is behind the main DB.
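To see how far behind the service is, the "Polled up to" date can be pulled out of the updater log and compared with the current time. This is a rough sketch only: the function names `polled_timestamp` and `updater_lag_seconds` are hypothetical, the pattern is based solely on the sample log line shown here, and the lag calculation needs GNU date for the `-d` option.

```shell
#!/bin/sh
# Hypothetical helper: read updater log lines on stdin and print the
# timestamp from the last "Polled up to ..." line.
polled_timestamp() {
    sed -n 's/.*Polled up to \([0-9TZ:-]*\) at .*/\1/p' | tail -n 1
}

# Hypothetical helper: print the number of seconds between now and the
# given ISO 8601 timestamp (requires GNU date).
updater_lag_seconds() {
    now=$(date -u +%s)
    polled=$(date -u -d "$1" +%s)
    echo $((now - polled))
}
```

For example, `updater_lag_seconds "$(polled_timestamp < some-updater.log)"` prints the current lag in seconds, where `some-updater.log` stands for one of the log files written by the `tee` command.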

If there are no updates, the updater sleeps and then re-checks the Wikidata site. It can also be stopped and restarted at any moment without affecting query service functionality. If the Blazegraph service is down, it retries for a short time, then exits.

Web access
External access to the service is provided at the URL http://wdqs-beta.wmflabs.org/.

Access goes through an nginx proxy; configs are in. Only GET requests to URLs starting with  are proxied to Blazegraph.

The root document for http://wdqs-beta.wmflabs.org/ is the WDQS Beta GUI, which is served from. It is located in the  subdirectory of the sources.

Access logs are in. Searching for " " would provide the list of queries that were attempted from the GUI.

See also sample nginx config that can be used for the service.

Monitoring
The logs for SPARQL requests are available in Labs Logstash: https://logstash-beta.wmflabs.org/#/dashboard/elasticsearch/wdqs

Graphite monitoring is available at http://graphite.wmflabs.org/, e.g.: http://graphite.wmflabs.org/dashboard/#wdq-beta

Other tools
This section should eventually find a better place; for now, this is the list of related tools:
 * http://tools.wmflabs.org/wdq2sparql/w2s.php - WDQ to SPARQL translator
 * https://tools.wmflabs.org/bene/sparql/ - SPARQL query generator
 * https://tools.wmflabs.org/ppp-sparql/ - natural language query generator based on Platypus