Wikidata Query Service/Categories

The Wikidata Query Service (WDQS) also provides access to the category graph of selected wikis. The list of covered wikis is available at https://noc.wikimedia.org/conf/categories-rdf.dblist

Accessing the data
The data is stored in the Blazegraph database, in the categories namespace. There is currently no GUI for accessing the category data, but SPARQL queries can be run against the namespace at https://query.wikidata.org/bigdata/namespace/categories/sparql?query=SPARQL, where SPARQL stands for the URL-encoded query text. This endpoint works in the same way as the main WDQS SPARQL endpoint.
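Because there is no GUI, the request URL has to be assembled by hand. The following is a minimal sketch of how that could be done in Python. The helper name build_query_url is hypothetical, and the format=json parameter is an assumption carried over from the main WDQS endpoint; an Accept header could be used instead.

```python
from urllib.parse import urlencode

# Endpoint for the categories namespace, as given in the text above.
CATEGORIES_ENDPOINT = "https://query.wikidata.org/bigdata/namespace/categories/sparql"


def build_query_url(sparql, endpoint=CATEGORIES_ENDPOINT):
    """Return a GET URL that submits the given SPARQL text to the endpoint."""
    # urlencode percent-encodes the query so it is safe inside a URL.
    # format=json is assumed to work as on the main WDQS endpoint.
    return endpoint + "?" + urlencode({"query": sparql, "format": "json"})


# A deliberately generic query: fetch any five triples from the namespace.
url = build_query_url("SELECT * WHERE { ?s ?p ?o } LIMIT 5")
print(url)
```

The resulting URL can be opened in a browser or fetched with any HTTP client.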

Note that while each wiki has its own data set, they are all stored in the same namespace.

An example query would return the subcategories of the category Ducks on English Wikipedia.
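A minimal sketch of such a query, wrapped in Python so that the result can be processed, might look as follows. The category URL for Ducks, the trailing # in the prefix IRI, and the helper names are assumptions made for illustration. The sample response is invented; it only mirrors the standard SPARQL JSON results layout.

```python
import json
from typing import List

# The prefix IRI (with trailing '#') is an assumption based on the ontology URL.
SUBCATEGORY_QUERY_TEMPLATE = """\
PREFIX mediawiki: <https://www.mediawiki.org/ontology#>
SELECT ?cat WHERE {{
  ?cat mediawiki:isInCategory <{parent}> .
}}
"""


def subcategory_query(parent_url: str) -> str:
    """Build a query selecting the direct subcategories of parent_url."""
    return SUBCATEGORY_QUERY_TEMPLATE.format(parent=parent_url)


def extract_categories(response_text: str) -> List[str]:
    """Pull the ?cat values out of a SPARQL JSON results document."""
    data = json.loads(response_text)
    return [row["cat"]["value"] for row in data["results"]["bindings"]]


# Assumed URL of the Ducks category on English Wikipedia.
ducks_query = subcategory_query("https://en.wikipedia.org/wiki/Category:Ducks")

# Invented response, used only to show the shape of the result.
sample_response = json.dumps({
    "head": {"vars": ["cat"]},
    "results": {"bindings": [
        {"cat": {"type": "uri",
                 "value": "https://en.wikipedia.org/wiki/Category:Example_subcategory"}},
    ]},
})

print(ducks_query)
print(extract_categories(sample_response))
```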

NOTE: this query will not work in the default GUI. For now, you have to run it manually against the SPARQL endpoint above.

NOTE: the data set includes only categories, not the pages belonging to them (the latter would be a much bigger data set).

Data format
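The data about a category describes its URL and name, e.g.:

 <...> a mediawiki:Category ; rdfs:label "Test" .

Links between categories are represented as the mediawiki:isInCategory relationship, which points from a category to its parent category, e.g.:

 <...> mediawiki:isInCategory <...> .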
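To make the shape of these triples concrete, here is a small sketch that renders one category as Turtle. The function names and the wiki.example URLs are hypothetical; real dumps use the URLs of the actual category pages, and the literal escaping here covers only backslashes and double quotes.

```python
def turtle_literal(text):
    """Quote a string as a Turtle literal (escapes backslash and double quote only)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def category_to_turtle(url, label, parents=()):
    """Render one category and its parent links in the format described above."""
    lines = [
        f"<{url}> a mediawiki:Category ;",
        f"    rdfs:label {turtle_literal(label)} .",
    ]
    # One isInCategory triple per parent category.
    for parent in parents:
        lines.append(f"<{url}> mediawiki:isInCategory <{parent}> .")
    return "\n".join(lines)


# Hypothetical URLs, for illustration only.
example = category_to_turtle(
    "https://wiki.example/wiki/Category:Test",
    "Test",
    parents=["https://wiki.example/wiki/Category:Parent"],
)
print(example)
```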

Prefixes
The prefix mediawiki: is defined as https://www.mediawiki.org/ontology#. The full ontology can be found at https://www.mediawiki.org/ontology/ontology.owl.

Dump header
The dump header contains information about the dump, e.g.:

 <...> a schema:Dataset, owl:Ontology ;
     cc:license <...> ;
     schema:softwareVersion "1.0" ;
     schema:dateModified "2017-09-09T20:00:05Z"^^xsd:dateTime ;
     schema:isPartOf <...> ;
     owl:imports <...> .
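The schema:dateModified value can be used to check how old a dump is. The sketch below parses the timestamp from the example header and compares it against a reference time. The helper names are hypothetical, the parser handles only the UTC "Z" form shown in the example, and the seven-day threshold is an assumption.

```python
from datetime import datetime, timedelta, timezone


def parse_xsd_datetime(value):
    """Parse an xsd:dateTime in the UTC 'Z' form used by the example header."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    # strptime returns a naive datetime; mark it explicitly as UTC.
    return parsed.replace(tzinfo=timezone.utc)


def is_stale(modified, now, max_age=timedelta(days=7)):
    """Return True if the dump is older than max_age (assumed threshold)."""
    return now - modified > max_age


modified = parse_xsd_datetime("2017-09-09T20:00:05Z")
print(modified.isoformat())
print(is_stale(modified, datetime.now(timezone.utc)))
```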

Data dumps
Data dumps are stored in https://dumps.wikimedia.org/other/categoriesrdf/. Full dumps are performed weekly. Each wiki has its own dump file.

https://dumps.wikimedia.org/other/categoriesrdf/lastdump/ stores timestamps of the last dump performed.

Updating
The data in Blazegraph is not yet updated automatically (this is still in the works); the plan is to update it daily.

Adding wikis
It is not yet clear how wikis are to be added to the list. For now, if you want a wiki added, please comment on the talk page. The exception is Commons, which has by far the largest set of categories; we decided not to cover it until we are sure everything works as planned with smaller data sets.

TODO

 * Regular (daily) updates
 * Support from the GUI
 * Easier tree querying (stored query?)
 * Support for more wikis
 * Hidden category support