Wikidata query service/Implementation

From MediaWiki.org

Sources

The source code is in the Gerrit project wikidata/query/rdf, with a GitHub mirror at https://github.com/wikimedia/wikidata-query-rdf

The GUI source is in the wikidata/query/gui project, which is a submodule of the main project. The deployed version of the GUI lives in the production branch, which receives cherry-picks from the master branch when necessary. The production branch should not contain test and build service files, which currently means some cherry-picks have to be merged manually.

Labs Deployment (beta)

Deployment currently goes through git-fat (see below), which may require some manual steps after checkout. These can be done as follows:

  1. Check out the wikidata/query/deploy repository and update the gui submodule to the current production branch (git submodule update).
  2. Run git-fat pull to instantiate the binaries if necessary.
  3. rsync the files to the deploy directory (/srv/wdqs/blazegraph).
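
The three steps above can be sketched as a small shell helper that only prints the commands, so they can be reviewed before anything runs. This is a sketch under assumptions, not the team's actual procedure: the function name beta_deploy_steps is hypothetical, the checkout path is whatever you pass in, git-fat is assumed to be invoked as "git fat pull", and the rsync flags (-av) are a guess that may need adjusting.

```shell
# Print the beta deployment steps for review; nothing is executed here.
# $1: local checkout of wikidata/query/deploy (hypothetical path, chosen by you)
# $2: target directory, defaulting to the path named in the steps above
beta_deploy_steps() {
    src_dir="${1:-}"
    deploy_dir="${2:-/srv/wdqs/blazegraph}"
    if [ -z "$src_dir" ]; then
        echo "usage: beta_deploy_steps <checkout-dir> [deploy-dir]" >&2
        return 1
    fi
    echo "cd $src_dir"
    # Step 1: bring the gui submodule up to date.
    echo "git submodule update --init"
    # Step 2: fetch the binaries tracked by git-fat.
    echo "git fat pull"
    # Step 3: copy the files into place (the rsync flags are an assumption).
    echo "rsync -av $src_dir/ $deploy_dir/"
}
```

Calling beta_deploy_steps /path/to/deploy prints four lines. Once they look right, they can be pasted into a shell on the beta host.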

See also Wikidata Query service beta.

Production Deployment

Production deployment is done via the git deployment repository wikidata/query/deploy. The procedure is as follows:

  1. Run mvn package in the source repository.
  2. Run mvn deploy -Pdeploy-archiva in the source repository. This deploys the artifacts to Archiva. For this you need the repositories wikimedia.releases and wikimedia.snapshots configured in ~/.m2/settings.xml with your Archiva username and password.
  3. Install the new files (also available in dist/target/service-*-dist.zip) into the deploy repository above and commit them. Because git-fat uses Archiva as its primary storage, there can be a delay between the files being deployed to Archiva and their appearing on rsync, ready for git-fat deployment.
  4. Use scap deploy to deploy the new build.

The puppet role that needs to be enabled for the service is role::wdqs.

It is recommended to test the deployment checkout on beta (see above) before deploying it to production.

GUI deployment

The GUI deployment files are in the production branch of the wikidata/query/deploy-gui repository. That repository is a submodule of wikidata/query/deploy, linked as the gui subdirectory.

To build the deployment version of the GUI, run grunt deploy in the gui subdirectory. This generates a patch for the deploy repository that needs to be merged in Gerrit (currently manually). Then update the gui submodule in wikidata/query/deploy to the latest production head, and commit and push the change. Deploy as described above.

Services

Service wdqs-blazegraph runs the Blazegraph server.

Service wdqs-updater runs the updater. It depends on wdqs-blazegraph.

Maintenance mode

To put a server into maintenance mode, create the file /var/lib/nginx/wdqs/maintenance. This makes all HTTP requests return 503, and the load balancer then takes the server out of rotation. Note that Icinga monitoring will alert about the server being down, so take measures to prevent that if you are going to do maintenance on the server.
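
As a minimal sketch, the flag file can be wrapped in three small shell functions. The function names and the MAINTENANCE_FILE variable are hypothetical and only introduced here. The default path is the one given above, and creating it on a real host will normally require root.

```shell
# Flag file from the text above; override via the environment when testing.
MAINTENANCE_FILE="${MAINTENANCE_FILE:-/var/lib/nginx/wdqs/maintenance}"

# Create the flag file; per the text above, requests then return 503.
enter_maintenance() {
    touch "$MAINTENANCE_FILE"
}

# Remove the flag file so the server can serve requests again.
leave_maintenance() {
    rm -f "$MAINTENANCE_FILE"
}

# Succeeds (exit status 0) while the flag file exists.
in_maintenance() {
    [ -e "$MAINTENANCE_FILE" ]
}
```

A typical session would call enter_maintenance, do the work, then call leave_maintenance and confirm with in_maintenance that the flag is gone.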

Non-Wikidata deployment

WDQS can be run as a service for any Wikibase instance, not just Wikidata. You can still follow the instructions in the documentation, with the following changes:

To generate a dump of your database, use the dumpRdf.php script in the repo/maintenance directory of the Wikibase extension. Depending on your requirements, you may still want to run the munge.sh script, or you may load the resulting RDF directly into the database.

You may also want to set the wikibaseHost parameter when running Blazegraph if your ontology uses URIs that are not based on Wikidata URIs, e.g.:

BLAZEGRAPH_OPTS="-DwikibaseHost=www.my-wikibasehost.org" bash ./runBlazegraph.sh

The same option applies to the Updater.

Hardware

We're currently running on three servers in eqiad: wdqs1001, wdqs1002, wdqs1003. There are three standby servers in codfw: wdqs2001, wdqs2002 and wdqs2003.

Server specs are similar to the following:

CPU: dual Intel(R) Xeon(R) CPU E5-2620 v3

Disk: 800GB raw raided space SSD

RAM: 128GB

Upgrading customized Blazegraph

  1. Use the buildwmf.sh script to create the Blazegraph binaries. Note that you need to update VERSION in the script, and that JAVA_HOME should point to a Java 7 home directory (production hosts do not have Java 8 yet).
  2. Follow the instructions in the source repo to upload the new binaries to Archiva.
  3. Update the Blazegraph version in pom.xml in the source.
  4. Rebuild/redeploy production packages as described above.

Releasing to Maven

The release procedure is described here: http://central.sonatype.org/pages/ossrh-guide.html

Releasing new version

  1. Set the version with mvn versions:set -DnewVersion=1.2.3
  2. Commit the patch and merge it (git commit/git review)
  3. Tag the version: git tag 1.2.3
  4. Deploy the files to OSS: mvn clean deploy -Prelease. You will need GPG key configured to sign the release.
  5. Proceed with the release as described in OSS guide.
  6. Set the version back to snapshot: mvn versions:set -DnewVersion=1.2.4-SNAPSHOT
  7. Commit the patch and merge it (git commit/git review)
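
Steps 1 and 6 use a release version and the snapshot version that follows it (1.2.3 and 1.2.4-SNAPSHOT in the example). The sketch below derives the second from the first. The next_snapshot helper is hypothetical. It assumes a dotted version whose last component is a plain number without leading zeros, and that the next snapshot always increments that last component, which is what the example does but may not be the project's rule for every release.

```shell
# Print the snapshot version that follows a release version,
# e.g. 1.2.3 gives 1.2.4-SNAPSHOT. Assumes a dotted, numeric last component.
next_snapshot() {
    release="${1:-}"
    case "$release" in
        *.*) ;;
        *) echo "expected a dotted version, got: $release" >&2; return 1 ;;
    esac
    prefix="${release%.*}"    # everything before the last dot
    patch="${release##*.}"    # the last component
    case "$patch" in
        ''|*[!0-9]*) echo "last component is not numeric: $release" >&2; return 1 ;;
    esac
    echo "$prefix.$((patch + 1))-SNAPSHOT"
}
```

With this in place, step 6 could be written as mvn versions:set -DnewVersion="$(next_snapshot 1.2.3)".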

Updating specific ID

If the data for specific IDs needs to be updated manually, this can be done as follows (here for IDs Q12345 and Q6790):

runUpdate.sh -n wdq -- --ids Q12345,Q6790

The runUpdate.sh script is located in the root of the WDQS deployment directory. Note that each server needs to be updated separately, because the servers do not share databases.
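
Because the --ids argument is a comma-separated list, a typo in one ID is easy to miss. The sketch below builds the command line from separate arguments and rejects anything that is not "Q" followed by digits. Both helper names are hypothetical, and the restriction to Q-prefixed IDs is an assumption based on the example above. The function prints the command rather than running it.

```shell
# Join item IDs with commas; fail on anything that is not Q followed by digits.
join_ids() {
    joined=""
    for id in "$@"; do
        digits="${id#Q}"
        # Reject IDs without the Q prefix, and IDs whose remainder is not numeric.
        if [ "$digits" = "$id" ]; then
            echo "unexpected ID: $id" >&2
            return 1
        fi
        case "$digits" in
            ''|*[!0-9]*) echo "unexpected ID: $id" >&2; return 1 ;;
        esac
        joined="${joined:+$joined,}$id"
    done
    if [ -z "$joined" ]; then
        echo "no IDs given" >&2
        return 1
    fi
    echo "$joined"
}

# Print (not run) the update command from the text above for the given IDs.
print_update_command() {
    ids="$(join_ids "$@")" || return 1
    echo "runUpdate.sh -n wdq -- --ids $ids"
}
```

For example, print_update_command Q12345 Q6790 prints the command shown above. It still has to be run on each server separately.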

Units support

To support unit conversion, the conversion configuration is stored in mediawiki-config/wmf-config/unitConversionConfig.json. This config is generated by a script, e.g.:

mwscript repo/maintenance/updateUnits.php --wiki wikidatawiki \
    --base-unit-types Q223662,Q208469 \
    --base-uri http://www.wikidata.org/entity/

If the config changes, then once the new config is in place the database should be updated (unless a new dump is going to be loaded) by running:

mwscript extensions/Wikidata/extensions/Wikibase/repo/maintenance/addUnitConversions.php --wiki wikidatawiki --config NEW_CONFIG.json --old-config OLD_CONFIG.json --format nt --output data.nt

This generates an RDF file, which then needs to be loaded into the database.

Monitoring

Icinga group

Grafana dashboard: https://grafana.wikimedia.org/dashboard/db/wikidata-query-service

WDQS dashboard: http://discovery.wmflabs.org/wdqs/

Contacts

If you need more info, talk to User:Smalyshev_(WMF) or anybody from the Discovery team.