Wikidata Query Service/Implementation
- 1 Sources
- 2 Labs Deployment (beta)
- 3 Production Deployment
- 4 Hardware
- 5 Upgrading customized Blazegraph
- 6 Releasing to Maven
- 7 Updating specific ID
- 8 Units support
- 9 Monitoring
- 10 Data reload procedure
- 11 Usage constraints
- 12 Contacts
Sources
The source code is in the Gerrit project wikidata/query/rdf, GitHub mirror:
The GUI source is in the wikidata/query/gui project, which is a submodule of the main project. The deployment version of the GUI is in the production branch, which is cherry-picked from the master branch when necessary. The production branch should not contain test and build service files (which currently means some cherry-picks have to be merged manually).
Labs Deployment (beta)
Note that deployment is currently done via git-fat (see below), which may require some manual steps after checkout. These can be done as follows:
- Check out the wikidata/query/deploy repository and update the gui submodule to the current production branch (git submodule update).
- Run git fat pull to instantiate the binaries if necessary.
- rsync the files to deploy directory (
See also Wikidata Query service beta.
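The checkout steps above can be sketched as a small bash function. This is a minimal sketch under stated assumptions, not the official procedure: the clone URL, the local checkout path and the deploy directory are hypothetical parameters (the deploy directory is not given on this page), and the hypothetical run helper prints each command instead of executing it when DRY_RUN=1.

```bash
#!/usr/bin/env bash
# Minimal sketch of the beta (Labs) deployment checkout.
# All three arguments are hypothetical placeholders: the clone URL of
# wikidata/query/deploy, a local checkout directory, and the deploy directory.

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi; }

labs_checkout() {
  local repo_url="$1" checkout="$2" deploy_dir="$3"
  run git clone "$repo_url" "$checkout" || return 1
  # Check out the gui submodule at the commit recorded in the deploy repo.
  run git -C "$checkout" submodule update --init || return 1
  # Instantiate the git-fat managed binaries.
  run git -C "$checkout" fat pull || return 1
  # Trailing slashes: copy the contents of the checkout, not the directory.
  run rsync -a "$checkout/" "$deploy_dir/"
}
```

For example, `DRY_RUN=1 labs_checkout <url> /tmp/wdqs-co /tmp/wdqs-deploy` prints the four commands so they can be reviewed before running them for real.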
Production Deployment
Production deployment is done via the git deployment repository wikidata/query/deploy. The procedure is as follows:
- Run mvn package in the source repository.
- Run mvn deploy -Pdeploy-archiva in the source repository. This deploys the artifacts to Archiva. Note that for this you will need the repositories configured in ~/.m2/settings.xml with the Archiva username/password.
- Install the new files (which will also be in dist/target/service-*-dist.zip) into the deploy repo above and commit them. Note that since git-fat uses Archiva as primary storage, there can be a delay between the files being deployed to Archiva and their appearing on rsync, ready for git-fat deployment.
- Run scap deploy to deploy the new build.
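The production sequence can be sketched as one bash function. This is a hedged sketch, not the canonical deploy script: the two checkout paths are hypothetical arguments, unpacking the dist zip straight into the deploy repo is an assumption about its layout, the commit message is made up, and DRY_RUN=1 makes the hypothetical run helper print commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch of the production build-and-deploy sequence.
# Assumptions (hypothetical, adjust to your setup):
#   $1 = checkout of the source repo (wikidata/query/rdf)
#   $2 = checkout of wikidata/query/deploy
#   the dist zip unpacks directly into the deploy repo layout
# ~/.m2/settings.xml must already contain the Archiva credentials.

# Unmatched globs expand to nothing, so a missing zip is detectable.
shopt -s nullglob

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi; }

prod_deploy() {
  local src="$1" deploy="$2"
  run mvn -f "$src/pom.xml" package || return 1
  run mvn -f "$src/pom.xml" deploy -Pdeploy-archiva || return 1
  # Locate the distribution produced by the build.
  local zips=("$src"/dist/target/service-*-dist.zip)
  if [ "${#zips[@]}" -ne 1 ]; then
    echo "expected exactly one service-*-dist.zip, found ${#zips[@]}" >&2
    return 1
  fi
  run unzip -o "${zips[0]}" -d "$deploy" || return 1
  run git -C "$deploy" add -A || return 1
  run git -C "$deploy" commit -m "Update WDQS service build" || return 1
  # git-fat files can take a while to appear on rsync after the Archiva
  # upload; scap is run from inside the deploy repo.
  (cd "$deploy" && run scap deploy)
}
```

Refusing to continue unless exactly one zip matches avoids silently committing a stale or ambiguous build.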
The puppet role that needs to be enabled for the service is
It is recommended to test the deployment checkout on beta (see above) before deploying it in production.
GUI deployment files are in repository
production. It is a submodule of
wikidata/query/deploy which is linked as
To build the deployment version of the GUI, run grunt deploy in the gui subdirectory. This will generate a patch for the deploy repo that needs to be merged in Gerrit (currently manually). Then update the GUI submodule in wikidata/query/deploy to the latest production head and commit/push the change. Deploy as described above.
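The final submodule update can be sketched as follows. It is only a sketch: the path of the GUI submodule inside wikidata/query/deploy is not named on this page, so it is a hypothetical argument, the remote is assumed to be called origin, the commit message is made up, and pushing for review is left to your usual Gerrit workflow. DRY_RUN=1 prints the commands through the hypothetical run helper.

```bash
#!/usr/bin/env bash
# Sketch: point the GUI submodule of wikidata/query/deploy at the latest
# production head and commit the new submodule pointer.
# Assumptions: $1 is a checkout of wikidata/query/deploy, $2 is the
# (hypothetical) submodule path inside it, and its remote is "origin".

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi; }

update_gui_submodule() {
  local deploy="$1" sub="$2"
  # Fetching the branch also updates the origin/production tracking ref.
  run git -C "$deploy/$sub" fetch origin production || return 1
  run git -C "$deploy/$sub" checkout --detach origin/production || return 1
  # Record the new submodule commit in the deploy repo.
  run git -C "$deploy" add "$sub" || return 1
  run git -C "$deploy" commit -m "Update GUI to latest production"
}
```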
wdqs-blazegraph runs the Blazegraph server.
wdqs-updater runs the updater; it depends on wdqs-blazegraph.
To put a server into maintenance mode, create the file /var/lib/nginx/wdqs/maintenance. This makes all HTTP requests return 503, and the load balancer takes the server out of rotation. Note that Icinga monitoring will alert about such a server being down, so if you are going to do maintenance on the server you need to take measures to prevent the alerts (e.g. by scheduling downtime in Icinga).
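The maintenance flag can be wrapped in a few helper functions. This is a minimal sketch: the default path is the one given above, MAINT_FILE is a hypothetical override (useful for testing against a temporary file), writing to /var/lib/nginx normally needs root, and the assumption that deleting the file puts the server back into rotation is ours, since this page only describes creating it.

```bash
#!/usr/bin/env bash
# Toggle the nginx maintenance flag for WDQS.
# Default path from this page; override MAINT_FILE (hypothetical) for tests.
# Assumption: removing the file returns the server to normal service.

MAINT_FILE="${MAINT_FILE:-/var/lib/nginx/wdqs/maintenance}"

# HTTP requests start returning 503 and the LB drains the host.
# Schedule Icinga downtime first, or monitoring will alert.
maintenance_on() { touch "$MAINT_FILE"; }

maintenance_off() { rm -f "$MAINT_FILE"; }

# Report which state the flag file currently implies.
maintenance_status() {
  if [ -e "$MAINT_FILE" ]; then
    echo "maintenance"
  else
    echo "serving"
  fi
}
```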
WDQS can be run as a service for any Wikibase instance, not just Wikidata. You can still follow the instructions in the documentation, but you may need to make some additional configuration changes. Please refer to the Standalone Wikibase documentation for a full description of the necessary steps.
Hardware
We're currently running on three servers in eqiad: wdqs1005, and three servers in codfw: wdqs2003. These two clusters are in active/active mode (traffic is sent to both), but due to how we route traffic with GeoDNS, the eqiad cluster sees most of the traffic.
Server specs are similar to the following:
- CPU: dual Intel(R) Xeon(R) CPU E5-2620 v3
- Disk: 800GB of raw RAIDed SSD space
- RAM: 128GB
The internal cluster has wdqs1008 in eqiad and wdqs2006 in codfw. The hardware is the same as above.
Upgrading customized Blazegraph
IMPORTANT: The content of this page is outdated. These instructions are for the 1.5.x version of Blazegraph. The current build uses 2.1.x, for which we do not have custom builds yet. As soon as we have any, this will be updated. If you have checked or updated this page and found the content to be suitable, please remove this notice.
- Use the script buildwmf.sh to create Blazegraph binaries. Note that you need to update VERSION in the script. Also note that JAVA_HOME should point to the Java 7 home directory (production hosts do not have Java 8 yet).
- Use the instructions in the source repo to upload the new binaries to Archiva.
- Update the Blazegraph version in pom.xml in the source.
- Rebuild/redeploy the production packages as described above.
Releasing to Maven
The release procedure is described here: http://central.sonatype.org/pages/ossrh-guide.html
Releasing a new version
- Set the version with mvn versions:set -DnewVersion=1.2.3
- Commit the patch and merge it (git commit/git review)
- Tag the version: git tag 1.2.3
- Deploy the files to OSS: mvn clean deploy -Prelease. You will need a GPG key configured to sign the release.
- Proceed with the release as described in the OSS guide.
- Set the version back to a snapshot: mvn versions:set -DnewVersion=1.2.4-SNAPSHOT
- Commit the patch and merge it (git commit/git review)
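The release steps can be scripted roughly as follows. This is a hedged sketch: the function names and commit messages are hypothetical, the hypothetical run helper only prints commands when DRY_RUN=1, and the steps are split where a person has to wait, first for the Gerrit merge and then for the release in the OSS guide. next_snapshot assumes the next development version bumps the last numeric component, as in the 1.2.3 to 1.2.4-SNAPSHOT example above.

```bash
#!/usr/bin/env bash
# Sketch of the Maven release steps, split at the manual waiting points.

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi; }

# 1.2.3 -> 1.2.4-SNAPSHOT: bump the last numeric component.
next_snapshot() {
  local version="$1"
  if ! [[ $version =~ ^[0-9]+(\.[0-9]+)+$ ]]; then
    echo "not a release version: $version" >&2
    return 1
  fi
  local prefix="${version%.*}" last="${version##*.}"
  # 10# forces base 10 so a component like "08" is not read as octal.
  echo "${prefix}.$((10#$last + 1))-SNAPSHOT"
}

# Step 1: set the release version and send the patch for review.
release_prepare() {
  local version="$1"
  run mvn versions:set -DnewVersion="$version" || return 1
  run git commit -a -m "Release $version" || return 1
  run git review
}

# Step 2 (after the patch is merged): tag and deploy to OSS (needs GPG).
release_publish() {
  local version="$1"
  run git tag "$version" || return 1
  run mvn clean deploy -Prelease
}

# Step 3 (after finishing the release per the OSS guide): back to a snapshot.
release_next_snapshot() {
  local version="$1" next
  next="$(next_snapshot "$version")" || return 1
  run mvn versions:set -DnewVersion="$next" || return 1
  run git commit -a -m "Prepare $next" || return 1
  run git review
}
```

For example, `DRY_RUN=1 release_next_snapshot 1.2.3` prints the versions:set command for 1.2.4-SNAPSHOT followed by the commit and review commands.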
Updating specific ID
If data for specific IDs needs to be updated manually, this can be done as follows (here for IDs Q12345 and Q6790):
runUpdate.sh -n wdq -- --ids Q12345,Q6790
The runUpdate.sh script is located in the root of the WDQS deployment directory. Note that each server needs to be updated separately; they do not share databases.
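Because every server has its own database, the same update has to be repeated per host. The loop below is a sketch under assumptions that are not on this page: the hosts are reachable over ssh, the deployment directory is passed in as a hypothetical argument, and only Q- and P-IDs are accepted (adjust the check if other entity types are needed). DRY_RUN=1 prints the ssh commands through the hypothetical run helper.

```bash
#!/usr/bin/env bash
# Sketch: run the manual ID update on several WDQS hosts in turn.
# Assumptions (hypothetical): ssh access to each host, and $1 is the WDQS
# deployment directory on the hosts (runUpdate.sh lives in its root).

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi; }

# Validate the IDs and join them into the comma-separated list --ids takes.
join_ids() {
  local id
  for id in "$@"; do
    if ! [[ $id =~ ^[QP][1-9][0-9]*$ ]]; then
      echo "invalid entity ID: $id" >&2
      return 1
    fi
  done
  local IFS=,
  echo "$*"
}

# Usage: update_ids_on_hosts DEPLOY_DIR ID_LIST HOST...
update_ids_on_hosts() {
  local deploy_dir="$1" ids="$2" host
  shift 2
  for host in "$@"; do
    run ssh "$host" "cd '$deploy_dir' && ./runUpdate.sh -n wdq -- --ids $ids" || return 1
  done
}
```

For example, `update_ids_on_hosts <deploy-dir> "$(join_ids Q12345 Q6790)" <host1> <host2>` updates both IDs on each listed host.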
Units support
For unit conversion support, the unit conversion configuration is stored in mediawiki-config/wmf-config/unitConversionConfig.json. This config is generated by a script, e.g.:
$ mwscript extensions/Wikibase/repo/maintenance/updateUnits.php --wiki wikidatawiki \
    --base-unit-types Q223662,Q208469 \
    --base-uri http://www.wikidata.org/entity/
If the config is changed, then once the new config is in place, the database should be updated (unless a new dump is going to be loaded) by running:
$ mwscript extensions/Wikibase/repo/maintenance/addUnitConversions.php --wiki wikidatawiki --config NEW_CONFIG.json --old-config OLD_CONFIG.json --format nt --output data.nt
This will generate an RDF file, which then needs to be loaded into the database.
Monitoring
Grafana dashboard: https://grafana.wikimedia.org/dashboard/db/wikidata-query-service
Grafana frontend dashboard: https://grafana.wikimedia.org/dashboard/db/wikidata-query-service-frontend
WDQS dashboard: http://discovery.wmflabs.org/wdqs/
Data reload procedure
Usage constraints
Wikidata Query Service has a public endpoint available at https://query.wikidata.org. As anyone is free to use this endpoint, its traffic varies a lot, and so does its performance.
Current restrictions are:
- Query timeout of 60 seconds
- One client (user agent + IP) is allowed 60 seconds of processing time every 60 seconds
- One client is allowed 30 error queries per second
- Clients exceeding the limits above are throttled
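A client can stay inside these limits by identifying itself and backing off when throttled. The sketch below is not an official client. It assumes, beyond what this page states, that throttled requests receive HTTP 429 with an optional Retry-After header in seconds. The User-Agent string is a hypothetical example, and the 60 second fallback delay is our choice, matching the processing-time window above.

```bash
#!/usr/bin/env bash
# Sketch of a client for the public SPARQL endpoint that backs off when
# throttled. Assumptions (not stated on this page): throttling is signalled
# with HTTP 429 and an optional Retry-After header in seconds.

ENDPOINT="https://query.wikidata.org/sparql"
# Hypothetical example; use a User-Agent that identifies you.
USER_AGENT="ExampleBot/0.1 (https://example.org/contact)"

# Seconds to wait before retrying: a numeric Retry-After wins, otherwise
# fall back to 60 s, the length of the processing-time window.
retry_delay() {
  local retry_after="${1:-}"
  if [[ $retry_after =~ ^[0-9]+$ ]]; then
    echo "$retry_after"
  else
    echo 60
  fi
}

# Run one query, retrying up to 3 times while throttled. Prints the JSON
# result on success; returns non-zero on any other outcome.
sparql_query() {
  local query="$1" attempt status retry_after rc=1 headers body
  headers="$(mktemp)" && body="$(mktemp)" || return 1
  for attempt in 1 2 3; do
    status="$(curl -sS -G "$ENDPOINT" \
      -H "User-Agent: $USER_AGENT" \
      -H "Accept: application/sparql-results+json" \
      --data-urlencode "query=$query" \
      -D "$headers" -o "$body" -w '%{http_code}')" || break
    if [ "$status" = "200" ]; then
      cat "$body"
      rc=0
      break
    elif [ "$status" != "429" ]; then
      echo "query failed with HTTP $status" >&2
      break
    fi
    # Throttled: wait before the next attempt, unless this was the last one.
    [ "$attempt" -lt 3 ] || break
    retry_after="$(grep -i '^retry-after:' "$headers" | tr -d '\r' | awk '{print $2}')"
    sleep "$(retry_delay "$retry_after")"
  done
  rm -f "$headers" "$body"
  return "$rc"
}
```

For example, `sparql_query 'SELECT ?item WHERE { ?item wdt:P31 wd:Q146 } LIMIT 5'` prints five items that are instances of house cat as SPARQL JSON results.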
We also have an internal endpoint, which serves WMF internal workloads. The endpoint is at http://wdqs-internal.discovery.wmnet/sparql. The new internal endpoint is subject to the following constraints:
- 30 second timeout
- a User-Agent must be set
- only internal access is allowed
- it must be used only for synchronous, user-facing traffic; no batch jobs
- requests are expected to be cheap
These constraints may evolve once we see what the actual use cases are and how the cluster behaves. If you have a question about how or whether to use the endpoint, please contact us.
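A single call to the internal endpoint might look like the sketch below. It is hedged in several ways. The function name is hypothetical. It must run inside the WMF network. Requesting JSON via the Accept header assumes the internal endpoint behaves like the public one. Finally, --max-time 30 is a client-side choice that mirrors the server's 30 second timeout, so the caller never waits longer than the server would.

```bash
#!/usr/bin/env bash
# Sketch of one cheap, user-facing query against the internal endpoint.
# The caller must pass a descriptive User-Agent, which the endpoint requires.

INTERNAL_ENDPOINT="http://wdqs-internal.discovery.wmnet/sparql"

internal_query() {
  local user_agent="$1" query="$2"
  # Fail fast instead of sending a request the endpoint will reject.
  if [ -z "$user_agent" ]; then
    echo "a User-Agent is required for the internal endpoint" >&2
    return 2
  fi
  # --fail turns HTTP errors into a non-zero exit status.
  curl -sS --fail --max-time 30 -G "$INTERNAL_ENDPOINT" \
    -H "User-Agent: $user_agent" \
    -H "Accept: application/sparql-results+json" \
    --data-urlencode "query=$query"
}
```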