Wikidata Query Service/Implementation

Labs Deployment (beta)
Note that deployment currently uses git-fat (see below), which may require some manual steps after checkout. See also Wikidata Query service beta. The steps are as follows:
 * 1) Check out the   repository and update the   submodule to the current   branch.
 * 2) Run   to instantiate the binaries if necessary.
 * 3) rsync the files to the deploy directory.
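The three beta steps can be sketched as a shell function. This is a minimal sketch under assumptions: the repository path, submodule path and deploy directory are hypothetical parameters (their real names are not preserved here), and `git fat pull` is assumed to be the command that instantiates the binaries. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
# Sketch of the beta (Labs) deployment steps.
# ASSUMPTIONS: REPO_DIR, SUBMODULE_PATH and DEPLOY_DIR are hypothetical
# parameters; "git fat pull" is assumed to instantiate the binaries.

# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# deploy_beta REPO_DIR SUBMODULE_PATH DEPLOY_DIR
deploy_beta() {
    repo_dir=$1
    submodule_path=$2
    deploy_dir=$3
    # 1) Move the submodule to the head of the branch it tracks (.gitmodules).
    run git -C "$repo_dir" submodule update --init --remote -- "$submodule_path" || return 1
    # 2) Replace git-fat placeholder files with the real binaries.
    run git -C "$repo_dir" fat init || return 1
    run git -C "$repo_dir" fat pull || return 1
    # 3) Copy the checkout (without .git) into the deploy directory.
    run rsync -a --exclude .git "$repo_dir/" "$deploy_dir/"
}
```

Running `DRY_RUN=1 deploy_beta ~/deploy-repo gui /srv/wdqs` first lets you review the exact commands before anything is changed.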

Production Deployment
Production deployment is done via the git deployment repository. The puppet role that needs to be enabled for the service is  . The procedure is as follows:
 * 1)   the source repository.
 * 2)   in the source repository - this deploys the artifacts to archiva. Note that for this you will need repositories   and   configured in   with the archiva username and password.
 * 3) Install the new files (which will also be in  ) into the deploy repository above and commit them. Note that since git-fat uses archiva as its primary storage, there can be a delay between files being deployed to archiva and their appearing on rsync, ready for git-fat deployment.
 * 4) Use   to deploy the new build.
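Because of the delay between archiva and rsync mentioned in step 3, a small retry loop can help before committing. This is a sketch only. It assumes availability can be checked as a local path, for example on an rsync mirror. The path, retry count and interval are hypothetical parameters.

```shell
# Sketch: wait until a build artifact is visible before committing it to the
# deploy repo, since files reach archiva before they show up for git-fat.
# ASSUMPTION: availability is checked as a local path (e.g. an rsync mirror);
# PATH, TRIES and INTERVAL_SECONDS are hypothetical parameters.

# wait_for_path PATH [TRIES] [INTERVAL_SECONDS]
# Returns 0 as soon as PATH exists, 1 if it is still missing after TRIES checks.
wait_for_path() {
    path=$1
    tries=${2:-30}
    interval=${3:-60}
    i=1
    while [ "$i" -le "$tries" ]; do
        if [ -e "$path" ]; then
            return 0
        fi
        # Do not sleep after the last failed check.
        if [ "$i" -lt "$tries" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
    echo "still missing after $tries checks: $path" >&2
    return 1
}
```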

It is recommended to test the deployment checkout on beta (see above) before deploying it to production.

GUI deployment
GUI deployment files are in the repository  branch. It is a submodule of   and is linked as the   subdirectory.

To build the deployment version of the GUI, run   in the gui subdirectory. This generates a patch for the deploy repository that needs to be merged in Gerrit (currently manually). Then update the   submodule on   to the latest   head and commit/push the change. Deploy as described above.

Services
The   service runs the Blazegraph server.

The   service runs the updater. It depends on wdqs-blazegraph.
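Because the updater depends on wdqs-blazegraph, the services should be stopped with the updater first and started with Blazegraph first. The sketch below assumes both are systemd units. The updater unit name is not preserved here, so it is passed in; `wdqs-updater` in the usage is a hypothetical name.

```shell
# Sketch: stop and start the services in dependency order.
# ASSUMPTIONS: both services are systemd units; the updater unit name is a
# parameter because its real name is not given here.

# Print the command instead of running it when DRY_RUN is set.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# Stop the dependent service (updater) first, then Blazegraph.
stop_wdqs() {
    updater_unit=$1
    run systemctl stop "$updater_unit" || return 1
    run systemctl stop wdqs-blazegraph
}

# Start in reverse order: Blazegraph must be up before the updater.
start_wdqs() {
    updater_unit=$1
    run systemctl start wdqs-blazegraph || return 1
    run systemctl start "$updater_unit"
}
```

Example: `DRY_RUN=1 stop_wdqs wdqs-updater` prints the two `systemctl stop` commands in the order they would run.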

Maintenance mode
To put a server into maintenance mode, create the file  . This makes all HTTP requests return 503, and the load balancer (LB) will take the server out of rotation. Note that Icinga monitoring will alert that the server is down, so if you are going to do maintenance on the server, take measures to prevent the alert (for example, schedule downtime in Icinga).
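The maintenance switch is just the presence of a flag file, so toggling it can be wrapped in a few helpers. This is a minimal sketch. The flag file path is not preserved here, so it is passed as an argument.

```shell
# Sketch: toggle maintenance mode by creating or removing the flag file.
# ASSUMPTION: the flag file path is passed in; the real path is not given here.
# While the file exists, HTTP requests get 503 and the LB depools the server.

maintenance_on()  { touch "$1"; }
maintenance_off() { rm -f "$1"; }
in_maintenance()  { [ -e "$1" ]; }

# Print the HTTP status code for URL (expected: 503 while in maintenance).
http_status() {
    curl -s -o /dev/null -w '%{http_code}' "$1"
}
```

After `maintenance_on`, `http_status` against the local service should print `503`. Remember to schedule Icinga downtime before switching it on.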

Non-Wikidata deployment
WDQS can be run as a service for any Wikibase instance, not just Wikidata. You can still follow the instructions in the documentation, with the following changes:

To generate a dump of your database, use the   script in the   directory of the Wikibase extension. Depending on your requirements, you may still want to run the   script, or you may load the resulting RDF directly into the database.

You may also want to set the  parameter when running Blazegraph if your ontology uses URIs that are not based on Wikidata URIs, e.g.:

BLAZEGRAPH_OPTS="-DwikibaseHost=www.my-wikibasehost.org" bash ./runBlazegraph.sh

Use the same option for the Updater.
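Blazegraph and the Updater need the same host option. One way to keep them consistent is to build the option once. This is a sketch. It assumes the Updater script reads extra JVM options from an `UPDATER_OPTS` variable, which is an unverified name; check runUpdate.sh in your deployment before relying on it.

```shell
# Sketch: build the -DwikibaseHost option once for Blazegraph and the Updater.
# ASSUMPTION: UPDATER_OPTS (used in the comment below) is a hypothetical
# variable name for the Updater's JVM options; verify it in runUpdate.sh.

# wikibase_host_opt HOST_OR_URL -> -DwikibaseHost=HOST
# Accepts a bare host or a URL; strips the scheme and any path.
wikibase_host_opt() {
    host=${1#http://}
    host=${host#https://}
    host=${host%%/*}
    printf '%s\n' "-DwikibaseHost=$host"
}

# Usage:
#   OPT=$(wikibase_host_opt https://www.my-wikibasehost.org/)
#   BLAZEGRAPH_OPTS="$OPT" bash ./runBlazegraph.sh
#   UPDATER_OPTS="$OPT" bash ./runUpdate.sh   # variable name assumed
```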

Hardware
We are currently running on three servers in eqiad:  ,   and  , and three servers in codfw:  ,   and  . The two clusters are in active/active mode (traffic is sent to both), but because of how we route traffic with GeoDNS, the eqiad cluster sees most of the traffic.

Server specs are similar to the following:

CPU: dual Intel(R) Xeon(R) CPU E5-2620 v3

Disk: 800GB raw RAID space, SSD

RAM: 128GB

Upgrading customized Blazegraph

 * 1) Use the buildwmf.sh script to build the Blazegraph binaries. Note that you need to update   in the script. Also note that   should point to the Java 7 home directory (production hosts do not have Java 8 yet).
 * 2) Follow the instructions in the source repository to upload the new binaries to archiva.
 * 3) Update the Blazegraph version in   in the source.
 * 4) Rebuild and redeploy the production packages as described above.
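Since buildwmf.sh must run against Java 7, a quick check of the JDK before building can save a failed build. This is a sketch. It assumes the elided variable is the usual `JAVA_HOME`. It handles both Java version formats: the legacy `1.7.0_80` style and the Java 9+ `11.0.2` style.

```shell
# Sketch: verify that the JDK under JAVA_HOME is Java 7 before building.
# ASSUMPTION: the variable the build uses is JAVA_HOME.

# java_major VERSION_STRING -> major version number
java_major() {
    case $1 in
        1.*) v=${1#1.}; echo "${v%%.*}" ;;  # legacy scheme: 1.7.0_80 -> 7
        *)   echo "${1%%[._-]*}" ;;         # modern scheme: 11.0.2 -> 11
    esac
}

# Read the version string of the java binary under JAVA_HOME.
# "java -version" prints to stderr, e.g.: java version "1.7.0_80"
jdk_version() {
    "${JAVA_HOME:?JAVA_HOME is not set}/bin/java" -version 2>&1 \
        | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}
```

Example: `[ "$(java_major "$(jdk_version)")" = 7 ] || echo "JAVA_HOME does not point to Java 7"`.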

Releasing to Maven
The release procedure is described here: http://central.sonatype.org/pages/ossrh-guide.html

Releasing new version

 * 1) Set the version with
 * 2) Commit the patch and merge it
 * 3) Tag the version:
 * 4) Deploy the files to OSS:  . You will need a GPG key configured to sign the release.
 * 5) Proceed with the release as described in the OSS guide.
 * 6) Set the version back to snapshot:
 * 7) Commit the patch and merge it
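Steps 1 and 6 need the release version and the following snapshot version. They can be derived mechanically from the current version. This is a sketch. It assumes versions look like `MAJOR.MINOR.PATCH[-SNAPSHOT]` and that the patch number is bumped; the project may use a different scheme.

```shell
# Sketch: derive the release version and the next snapshot version from the
# current snapshot version (e.g. 0.2.5-SNAPSHOT -> 0.2.5 -> 0.2.6-SNAPSHOT).
# ASSUMPTION: MAJOR.MINOR.PATCH[-SNAPSHOT] versions, patch number bumped.

# release_version CURRENT -> CURRENT without the -SNAPSHOT suffix
release_version() {
    echo "${1%-SNAPSHOT}"
}

# next_snapshot CURRENT -> next patch version with -SNAPSHOT appended
next_snapshot() {
    rel=$(release_version "$1")
    prefix=${rel%.*}
    patch=${rel##*.}
    echo "$prefix.$((patch + 1))-SNAPSHOT"
}
```

If the build uses the Maven Versions plugin (an assumption), these values would feed `mvn versions:set -DnewVersion=...` in steps 1 and 6.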

Updating specific ID
If the data for a specific ID needs to be updated manually, this can be done as follows (for ID Q12345):

The runUpdate.sh script is located in the root of the WDQS deployment directory. Note that each server needs to be updated separately, since the servers do not share databases.
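When several IDs are updated by hand, a typo in an ID is easy to miss. This sketch validates the IDs and joins them into a comma-separated list. It assumes entity IDs are items (Q), properties (P) or lexemes (L) followed by digits. How runUpdate.sh takes the list is not shown here, so that step is left out.

```shell
# Sketch: validate entity IDs and join them with commas.
# ASSUMPTION: IDs are Q/P/L followed by digits only.

# entity_id_list ID... -> comma-separated list, or an error on a bad ID
entity_id_list() {
    out=
    for id in "$@"; do
        # Must start with Q, P or L followed by at least one digit ...
        case $id in
            [QPL][0-9]*) ;;
            *) echo "invalid entity ID: $id" >&2; return 1 ;;
        esac
        # ... and everything after the letter must be a digit.
        case ${id#?} in
            *[!0-9]*) echo "invalid entity ID: $id" >&2; return 1 ;;
        esac
        out=${out:+$out,}$id
    done
    [ -n "$out" ] || { echo "no entity IDs given" >&2; return 1; }
    echo "$out"
}
```

Because the servers do not share databases, the resulting list has to be applied on each server in turn.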

Units support
For unit conversion support, the unit conversion configuration is stored in  . This config is generated by a script, e.g.:

mwscript repo/maintenance/updateUnits.php --wiki wikidatawiki --base-unit-types Q223662,Q208469 --base-uri http://www.wikidata.org/entity/

If the config changes, then once the new config is in place, the database should be updated (unless a new dump is going to be loaded) by running:

mwscript extensions/Wikidata/extensions/Wikibase/repo/maintenance/addUnitConversions.php --wiki wikidatawiki --config NEW_CONFIG.json --old-config OLD_CONFIG.json --format nt --output data.nt

This generates an RDF file, which then needs to be loaded into the database.
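The addUnitConversions.php step only makes sense when the config actually changed. This sketch wraps that invocation, using the flags from the text, and skips it when the old and new files are identical. The file names are hypothetical parameters. Loading data.nt into Blazegraph is not covered.

```shell
# Sketch: regenerate unit-conversion triples only when the config changed.
# OLD_CONFIG, NEW_CONFIG and OUTPUT are hypothetical parameters.

# Print the command instead of running it when DRY_RUN is set.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# unit_conversion_delta OLD_CONFIG NEW_CONFIG OUTPUT
unit_conversion_delta() {
    old=$1 new=$2 output=$3
    # cmp -s exits 0 when the files are byte-for-byte identical.
    if cmp -s "$old" "$new"; then
        echo "unit config unchanged, nothing to do" >&2
        return 0
    fi
    run mwscript extensions/Wikidata/extensions/Wikibase/repo/maintenance/addUnitConversions.php \
        --wiki wikidatawiki --config "$new" --old-config "$old" \
        --format nt --output "$output"
}
```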

Monitoring
Icinga group

Grafana dashboard: https://grafana.wikimedia.org/dashboard/db/wikidata-query-service

WDQS dashboard: http://discovery.wmflabs.org/wdqs/

Data reload procedure

 * 1) Go to Icinga and schedule downtime: https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=wdqs2002
 * 2) Depool:
 * 3) Stop updater:
 * 4) Remove data loaded flag:
 * 5) Turn on maintenance:
 * 6) Stop Blazegraph:
 * 7) Prepare data for loading (can be done in advance)
 * 8) Remove old db:
 * 9) Start Blazegraph:
 * 10) Check logs:
 * 11) Load data:
 * 12) Restore data loaded flag:
 * 13) Start updater:
 * 14) Check logs:
 * 15) Reload categories:
 * 16) Wait until the updater catches up.
 * 17) Turn off maintenance:
 * 18) Repool:
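The order of the reload steps matters. For example, Blazegraph should not be restarted while the updater is still running. A small runner can enforce the order by stopping at the first failing step. This is a sketch only. The per-step commands are not preserved here, so each step is passed in as a `"DESCRIPTION::COMMAND"` string, and any command you supply is your own, not part of this procedure.

```shell
# Sketch: run the reload checklist in order and stop at the first failure.
# ASSUMPTION: the real per-step commands are supplied by the operator; the
# "DESCRIPTION::COMMAND" string format is hypothetical.

# run_steps "DESCRIPTION::COMMAND" ...
# Returns 0 if all steps succeed, otherwise the number of the failing step.
run_steps() {
    n=0
    for step in "$@"; do
        n=$((n + 1))
        desc=${step%%::*}   # text before the first "::"
        cmd=${step#*::}     # text after the first "::"
        echo "[$n] $desc"
        if ! sh -c "$cmd"; then
            echo "step $n failed: $desc; fix it before continuing" >&2
            return "$n"
        fi
    done
}

# Usage (placeholders, fill in from the procedure above):
#   run_steps "Depool::<depool command>" "Stop updater::<stop command>" ...
```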

Usage constraints
Wikidata Query Service has a public endpoint available at https://query.wikidata.org. As anyone is free to use this endpoint, its traffic is highly variable, and so the performance of the endpoint can vary considerably.

We are working on a new internal endpoint, which will be more constrained. This should allow us to provide a more stable service for the use cases that require it. The new internal endpoint will be subject to the following constraints:


 * 30-second timeout
 * a User-Agent must be set
 * only internal access is allowed
 * must be used only for synchronous user-facing traffic, no batch jobs
 * requests are expected to be cheap
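A client that respects these constraints always sends a User-Agent and gives up after 30 seconds. This is a sketch. The internal endpoint URL is not given here, so the public `https://query.wikidata.org/sparql` endpoint is the default. `WDQS_ENDPOINT` is a hypothetical variable for overriding it.

```shell
# Sketch: SPARQL client honouring the User-Agent and 30-second constraints.
# ASSUMPTION: WDQS_ENDPOINT is a hypothetical override; the default is the
# public endpoint because the internal URL is not given here.

# Print the command instead of running it when DRY_RUN is set.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# sparql_query USER_AGENT QUERY
sparql_query() {
    ua=$1
    query=$2
    if [ -z "$ua" ]; then
        echo "a User-Agent is required" >&2
        return 1
    fi
    # --get sends the url-encoded query in the query string.
    run curl -sS --get --max-time 30 \
        -A "$ua" \
        -H 'Accept: application/sparql-results+json' \
        --data-urlencode "query=$query" \
        "${WDQS_ENDPOINT:-https://query.wikidata.org/sparql}"
}
```

Example: `sparql_query "my-tool/0.1 (ops@example.org)" 'SELECT ?p WHERE { wd:Q42 ?p ?o } LIMIT 1'`.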

The constraints may evolve once we see what the actual use cases are and how the cluster behaves.

Contacts
If you need more info, talk to User:Smalyshev_(WMF) or anybody from the Discovery team.