Wikidata Toolkit

The Wikidata Toolkit is an open source Java library for using data from Wikidata and other Wikibase sites. Its main goal is to make it easy for external developers to take advantage of this data in their own applications. The project started in early 2014, supported by an Individual Engagement Grant of the Wikimedia Foundation. The original project proposal envisions features for loading data from dumps or through the Web API, as well as query functionality to access and analyse the data.

This page and its subpages provide the main entry points to documentation and resources about the Wikidata Toolkit.

What is Wikidata? What is Wikibase?
Wikidata is a project of the Wikimedia Foundation that aims to gather data from all Wikipedias and many other projects in a single location. It is a wiki, and anyone can edit the data. If you want to know more about the project, its goals, content, and development, then the introductory article Wikidata: A Free Collaborative Knowledge Base is a good place to start. More details can be found on the project pages at wikidata.org.

The software that runs this site is Wikibase, an extension to the MediaWiki software, which is still used underneath. Indeed, Wikidata also has many wikitext pages that co-exist with the data pages that make up most of the content. It is possible to use Wikibase on other sites (the first example of this is the wiki of the EAGLE Project). The Wikidata Toolkit is written to support such sites as well.

How to use Wikidata Toolkit
The current version of Wikidata Toolkit can be used to automatically download dumpfiles from Wikidata.org and process them in Java. This is useful if you want to process all data in Wikidata.org in a streaming fashion, one entity at a time, without loading the whole dump into memory. Advanced query capabilities will be added in upcoming versions.

Code examples can be found online in the Examples module:
 * DumpProcessingExample shows how to load and process Wikidata dumps to extract some simple statistics
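
To illustrate what processing data "in a streaming fashion" to extract simple statistics can look like, here is a minimal, self-contained sketch. It does not use the Wikidata Toolkit API: the names EntityRecord, EntityProcessor, StatisticsCollector, and processDump are hypothetical stand-ins invented for this illustration, and the input is a small in-memory sample rather than a real dump. For the actual API, see DumpProcessingExample in the Examples module.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class StreamingStatsSketch {

    /** Hypothetical, simplified stand-in for one entity read from a dump. */
    public static final class EntityRecord {
        final String id;           // e.g. "Q42" for an item, "P31" for a property
        final int statementCount;  // number of statements on the entity

        public EntityRecord(String id, int statementCount) {
            this.id = id;
            this.statementCount = statementCount;
        }
    }

    /** Hypothetical callback interface: called once for each entity in the stream. */
    public interface EntityProcessor {
        void processEntity(EntityRecord entity);
    }

    /** Collects simple counts without keeping any entity in memory. */
    public static final class StatisticsCollector implements EntityProcessor {
        long items;
        long properties;
        long statements;

        @Override
        public void processEntity(EntityRecord entity) {
            if (entity.id.startsWith("Q")) {
                items++;
            } else if (entity.id.startsWith("P")) {
                properties++;
            }
            statements += entity.statementCount;
        }

        public String summary() {
            return "items=" + items + ", properties=" + properties
                    + ", statements=" + statements;
        }
    }

    /** Hands each entity to the processor as soon as it is read, then forgets it. */
    public static void processDump(Iterator<EntityRecord> dump, EntityProcessor processor) {
        while (dump.hasNext()) {
            processor.processEntity(dump.next());
        }
    }

    public static void main(String[] args) {
        // In-memory sample standing in for a dump file.
        List<EntityRecord> sample = Arrays.asList(
                new EntityRecord("Q42", 3),
                new EntityRecord("P31", 1),
                new EntityRecord("Q64", 5));

        StatisticsCollector stats = new StatisticsCollector();
        processDump(sample.iterator(), stats);
        System.out.println(stats.summary());
    }
}
```

The point of the callback design is that memory use stays constant regardless of dump size: only the running counters are retained, never the entities themselves.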

Download and installation
The current release of Wikidata Toolkit is version 0.1.0. The easiest way of using the library is with Maven: Maven users simply add a dependency on the module they need to the dependencies in their pom.xml file.

Currently, the following modules (artifacts) are available:
 * wdtk-dumpfiles: Downloading and processing dumpfiles. As shown in the examples, this can be used to get Java access to all Wikidata.org data. It could also be used to download and process XML dumpfiles for arbitrary MediaWiki projects, especially for the Wikimedia projects that publish dumps at dumps.wikimedia.org. However, this access would be on the wikitext level; a parser for MediaWiki wikitext is not included.
 * wdtk-datamodel: Representing Wikibase data in Java. This is an implementation of the Wikibase datamodel as used by Wikidata and other Wikibase sites.
 * wdtk-storage: Custom data structures that are used by Wikidata Toolkit for storing data in memory.
 * wdtk-utils: Utility code that is not specific to any of the other modules.

Most likely, you want to use wdtk-dumpfiles at the current stage (it also depends on all the other modules). You could also use wdtk-datamodel alone to represent Wikibase data in your application. However, be aware that the API details may still change until the first stable release.
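
As a sketch of what a Maven dependency on wdtk-dumpfiles might look like in pom.xml: the artifactId and version are taken from the text above, while the groupId shown here is an assumption that should be checked against the entry on Maven Central before use.

```xml
<dependency>
  <!-- groupId is an assumption; verify the coordinates on Maven Central -->
  <groupId>org.wikidata.wdtk</groupId>
  <artifactId>wdtk-dumpfiles</artifactId>
  <version>0.1.0</version>
</dependency>
```

Because wdtk-dumpfiles depends on the other modules, Maven resolves those transitively and they do not need to be listed separately.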

If you are not using Maven, you can still download the jars for the above modules manually from Maven Central. Alternatively, there is also a single all-in-one jar available from the Wikidata Toolkit release page. Note that this code depends on other libraries (this is why using Maven is so much simpler).

The source code is hosted on GitHub, where it can be browsed, forked, and downloaded:

 * https://github.com/Wikidata/Wikidata-Toolkit

Getting help
Bugs and feature requests should be reported on GitHub in the Wikidata Toolkit issue tracker. For further discussion, use the mailing lists wikidata-l (usage, general requirements) and wikidata-tech (technical discussions, development).

Getting involved
Developers are invited to contribute to the toolkit. They can download or fork the GitHub repository, and are generally invited to send comments and requirements. The project uses Maven to manage dependencies and to build the code, which makes it very easy to compile the project. Change to the folder into which the source code has been downloaded and run the following commands to compile and test the code (this requires Maven 3.0 or later to be installed):

mvn install
mvn test

Maven integration is available for standard Java IDEs:
 * Eclipse setup

People
The project is led by Markus Kroetzsch; see also the project team listed in the IEG proposal. The list of contributors can be found on GitHub.