API:Presenting Wikidata knowledge

From MediaWiki.org
This page is part of the Web APIs hub.


Introduction

This page shows how to retrieve relevant information from Wikidata and present it by associating Wikidata items with entities in your application.

You can use Wikidata items and properties to provide language-independent information about entities (real-world things) in your own application – events, places, people, works of art, concepts, etc. This is more direct and consistent than presenting descriptions and snippets from Wikipedia articles about these things, as API:Page info in search results explains.

Example

inventaire.io showing Wikidata information about the book Les Misérables

Inventaire lets you create an inventory of your books and share it with others. It displays certain properties from Wikidata about books, such as P364 "original language of work" and P50 "author". To do so, it uses Wikidata's 'Q' IDs internally to identify books. For example, its URL https://inventaire.io/entity/wd:Q180736 shows certain properties from the Wikidata entity http://www.wikidata.org/entity/Q180736 – the book "Les Misérables".

The Wikidata glossary explains entities and properties in more detail.

Recipe

  1. Find existing interesting wiki pages in the domain of your application, e.g. creative works, places, events, people, species.
  2. View the Wikidata information for those pages and choose interesting properties.
  3. Associate Wikidata entity IDs with entities of your application.
  4. Display their Wikidata information in the user's language.
  5. Use the Wikidata "sitelinks" information about the item to provide links to the full Wikipedia (and Wikiquote, Wikivoyage, etc.) article about the entity in the user's language.
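Step 3 of the recipe can be sketched as a field on your own entity type. This is only an illustration: the `Book` class, its field names, and the ID pattern check are hypothetical and not part of any Wikidata API. The concept URI form is the one shown for Les Misérables above.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: item IDs are the letter Q followed by a positive integer.
QID_PATTERN = re.compile(r"^Q[1-9]\d*$")


@dataclass
class Book:
    """Hypothetical application entity that carries an optional Wikidata item ID."""

    title: str
    wikidata_id: Optional[str] = None

    def __post_init__(self):
        # Reject anything that does not look like an item ID (e.g. a property ID).
        if self.wikidata_id is not None and not QID_PATTERN.match(self.wikidata_id):
            raise ValueError(f"not a Wikidata item ID: {self.wikidata_id!r}")

    @property
    def concept_uri(self) -> Optional[str]:
        """Return the Wikidata entity URI, or None if no item is associated."""
        if self.wikidata_id is None:
            return None
        return f"http://www.wikidata.org/entity/{self.wikidata_id}"


book = Book("Les Misérables", "Q180736")
print(book.concept_uri)
```

Storing only the ID keeps your data model small; labels, descriptions, and sitelinks can then be fetched per user language when needed.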

Get Wikidata entity IDs

To get an article's entity ID in Wikidata, you can:

  • copy the link "Wikidata item" ('wikibase-dataitem' message key) in the sidebar in most skins. It ends with 'QNNNN'.
  • access the wgWikibaseItemId variable in client-side JavaScript with mw.config.get( 'wgWikibaseItemId' );
  • query the page for the page property wikibase_item:

API query for pageprop wikibase_item: action=query&prop=pageprops&ppprop=wikibase_item&titles=PAGE_TITLE
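A minimal Python sketch of that page property query follows. It only builds the request URL and reads an already-decoded JSON response, so it makes no network call. The endpoint (English Wikipedia), the function names, and the sample response, including its page ID, are assumptions for illustration.

```python
from urllib.parse import urlencode

# Assumption: English Wikipedia's Action API; use the wiki your page lives on.
API_URL = "https://en.wikipedia.org/w/api.php"


def wikibase_item_query_url(title: str) -> str:
    """Build a URL asking for the wikibase_item page property of one page."""
    params = {
        "action": "query",
        "prop": "pageprops",
        "ppprop": "wikibase_item",
        "redirects": 1,
        "titles": title,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def extract_wikibase_items(response: dict) -> dict:
    """Map page titles to entity IDs, skipping pages with no linked item."""
    pages = response.get("query", {}).get("pages", {})
    return {
        page["title"]: page["pageprops"]["wikibase_item"]
        for page in pages.values()
        if "wikibase_item" in page.get("pageprops", {})
    }


# Illustrative response shape; the page ID 12345 is made up.
sample = {
    "query": {
        "pages": {
            "12345": {
                "pageid": 12345,
                "ns": 0,
                "title": "Les Misérables",
                "pageprops": {"wikibase_item": "Q180736"},
            }
        }
    }
}

print(wikibase_item_query_url("Les Misérables"))
print(extract_wikibase_items(sample))
```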

Choosing interesting properties

If you view https://www.wikidata.org/wiki/Q180736, you can see

  • Some localized information:
    • "label"
    • "description"
    • "aliases" (displayed as "Also known as")
  • Many statements about the item, which give values for its properties such as "author" and "publication date".
  • Many sitelinks about the item, providing the titles of pages about the item in various Wikipedias, also Wikibooks, Wikiquote, etc.

[Figure: diagram showing the most important terms you will hear around Wikidata.]

Clicking the title of a statement takes you to a page about that property, for example the "author" property is Property:P50. Property pages in turn have labels, descriptions, aliases, and further statements, much like the Wikidata pages for real-world items.

The set of properties in Wikidata is steadily growing; there are thousands of them. See Wikidata:List of properties. Not all items in Wikidata have properties, and not all property values have been translated into all languages (for example, Victor Hugo's occupation as an "author" has been translated into nearly all languages, but as of October 2015 "draughtsperson" has fewer translations). So you need to consider how to fall back if a property or value isn't translated into a language you support, and you shouldn't build your application around a property that appears in only a few statements. The API performs language fallback for you where possible. (Of course, you can help by contributing missing statements and translations to Wikidata.)

Querying Wikibase

The Wikibase Repository and Wikibase Client extensions, together with related components, power Wikidata. Most Wikimedia sites run Wikibase Client (check with Special:Version), while only wikidata.org itself runs Wikibase Repository. Wikibase Repository implements several modules for MediaWiki's Action API, all prefixed with wb. The main workhorse API module in Wikibase Repository is wbgetentities (see its generated API help). It returns the data Wikidata holds about items (QNNNNN entities) or properties.

Retrieve and display Wikidata information

Say you have associated Wikidata entity IDs with your application's entities, and you want to display their Wikidata information in the user's language.

action=wbgetentities can return the same information that you see on an item's Wikidata page: labels, descriptions, aliases, "claims" (like statements), and sitelinks.

  • You can give wbgetentities page titles on a wiki, but in this scenario we provide it with the entity IDs of the Wikidata items for entities in our application.
  • You can specify the languages you want for the information, and it will only return the descriptions, labels, and aliases in those languages (if they are available).
    • You can also specify languagefallback= so that values and properties without a translation in your requested languages fall back to some value.
  • wbgetentities has no means to specify which properties you want; instead, you request all claims about the entity. So in this scenario, we request props=labels|descriptions|claims|sitelinks/urls.
  • You can specify a sitefilter for the wiki site links you want; in this scenario we only want the Wikipedia page (if any) on the wiki for the same language.

Let's ask for this information about Les Misérables in a less popular language, Azerbaijani, to see how languagefallback and sitelinks/urls work. Example: Request information about entity Q180736
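As a sketch, the request described above could be assembled like this in Python. The helper name is hypothetical, and deriving the site ID as the language code plus "wiki" is an assumption that holds for azwiki but not for every Wikipedia.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def entity_request_url(entity_ids, language):
    """Build a wbgetentities URL for the given items in one language."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(entity_ids),
        "languages": language,
        # Ask the API to substitute another language when a term is missing.
        "languagefallback": 1,
        "props": "labels|descriptions|claims|sitelinks/urls",
        # Assumption: the Wikipedia site ID is "<language code>wiki".
        "sitefilter": f"{language}wiki",
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


print(entity_request_url(["Q180736"], "az"))
```

`urlencode` percent-encodes the `|` and `/` separators; the API decodes them back on its side.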

You can see from the response that, as of October 2015, the label for Les Misérables is available in Azerbaijani ("Səfillər"), but the description, marked "for-language": "az", falls back to the English description. There is also a wiki page for it on azwiki, az:Səfillər (roman).
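Reading such a response might look like the sketch below. The sample entity is trimmed and partly made up: the description text and the sitelink URL are placeholders, and the assumption is that a fallback term is keyed by the requested language while its own "language" field names the language actually returned.

```python
def localized_term(terms: dict, language: str):
    """Return (text, actual_language, fell_back), or None if nothing came back."""
    entry = terms.get(language)
    if entry is None:
        return None
    actual = entry.get("language", language)
    return entry["value"], actual, actual != language


def sitelink_url(entity: dict, site_id: str):
    """Return the URL of the entity's page on one wiki, or None if there is none."""
    link = entity.get("sitelinks", {}).get(site_id)
    return link.get("url") if link else None


# Trimmed, illustrative entity; description text and URL are placeholders.
entity = {
    "labels": {"az": {"language": "az", "value": "Səfillər"}},
    "descriptions": {
        "az": {"language": "en", "value": "(English description)", "for-language": "az"}
    },
    "sitelinks": {
        "azwiki": {
            "site": "azwiki",
            "title": "Səfillər (roman)",
            "url": "https://az.wikipedia.org/wiki/S%C9%99fill%C9%99r_(roman)",
        }
    },
}

print(localized_term(entity["labels"], "az"))
print(localized_term(entity["descriptions"], "az"))
print(sitelink_url(entity, "azwiki"))
```

Knowing whether a term fell back lets your interface mark it, for example by showing the language it is actually written in.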

Choosing sitelinks

The generated API help for action=wbgetentities lists all the possible values for site and sitefilter (Wikimedia encompasses a lot of wikis!). Visit Special:SiteMatrix for a table listing Wikimedia wikis. The wiki names are fairly standardized apart from some edge cases, so it's safe to use in sitefilter any wiki name from that table that exists and is not struck out (struck out means closed). If you want to, for example, ask for links to Wikiquote sites that may not exist yet, your code can also query the API module action=sitematrix (see its generated API help) and look through its response to dynamically build a list of relevant sites for sitefilter.
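Building a sitefilter value from a sitematrix response could be sketched as follows. The response shape used here (language groups holding a "site" list whose entries have "dbname", "code", and an optional "closed" flag) is an assumption to verify against the generated API help; the sample data, including the "xx" language group, is made up.

```python
def open_site_ids(sitematrix: dict, project_code: str) -> list:
    """Collect the dbnames of open wikis belonging to one project."""
    site_ids = []
    for group in sitematrix.values():
        # Skip non-dict entries such as the "count" number and "specials" list.
        if not isinstance(group, dict):
            continue
        for site in group.get("site", []):
            # A closed wiki is assumed to carry a "closed" key (not set to False).
            is_closed = "closed" in site and site["closed"] is not False
            if site.get("code") == project_code and not is_closed:
                site_ids.append(site["dbname"])
    return site_ids


# Made-up sample in the assumed shape of the "sitematrix" object.
sample_matrix = {
    "count": 3,
    "0": {
        "code": "az",
        "site": [
            {"dbname": "azwiki", "code": "wiki"},
            {"dbname": "azwikiquote", "code": "wikiquote"},
        ],
    },
    "1": {
        "code": "xx",
        "site": [{"dbname": "xxwikiquote", "code": "wikiquote", "closed": ""}],
    },
    "specials": [{"dbname": "commonswiki", "code": "commons"}],
}

sitefilter = "|".join(open_site_ids(sample_matrix, "wikiquote"))
print(sitefilter)
```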

Parsing claims

The claims that give properties their values are unavoidably complex: there can be more than one claim per property and they may disagree, they differ in rank, they are (ideally) backed up by references, and they may be qualified (for example, by the date range in which a claim applies).

As a result, for each property value you want, you must walk through an array of claims for it. In this example, for "author" and "genre" of Les Misérables, you would expect the value of a statement about them to be another item in Wikidata (rather than a simple number or date). To get the IDs for the genre (P136), we are looking for (in the syntax for JSON elements used by jq):

.entities.Q180736.claims.P136[].mainsnak.datavalue.value."numeric-id"

In pseudocode, you would locate entities.Q180736.claims.P136 in the JSON response, then for each element in the array, check that its mainsnak.datavalue.value['entity-type'] exists and equals "item"; only then can you safely access numeric-id. The result is a set of numeric item IDs, in this case 8261 and 192239.
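The walk described above can be sketched in Python as shown here. The function name is hypothetical, the sample response is trimmed to the fields the walk touches, and skipping claims ranked "deprecated" is a design choice of this sketch rather than something the API does for you.

```python
def item_ids_for_property(response: dict, entity_id: str, property_id: str) -> list:
    """Return the item IDs ("Q...") that an entity's claims give for one property."""
    claims = response["entities"][entity_id].get("claims", {}).get(property_id, [])
    item_ids = []
    for claim in claims:
        if claim.get("rank") == "deprecated":
            continue  # choice made in this sketch: ignore deprecated claims
        datavalue = claim.get("mainsnak", {}).get("datavalue")
        if datavalue is None:
            continue  # snaks without a known value carry no datavalue
        value = datavalue.get("value")
        if isinstance(value, dict) and value.get("entity-type") == "item":
            item_ids.append(f"Q{value['numeric-id']}")
    return item_ids


# Trimmed sample holding the two genre values mentioned in the text.
sample = {
    "entities": {
        "Q180736": {
            "claims": {
                "P136": [
                    {"rank": "normal", "mainsnak": {"snaktype": "value", "datavalue": {
                        "type": "wikibase-entityid",
                        "value": {"entity-type": "item", "numeric-id": 8261}}}},
                    {"rank": "normal", "mainsnak": {"snaktype": "value", "datavalue": {
                        "type": "wikibase-entityid",
                        "value": {"entity-type": "item", "numeric-id": 192239}}}},
                    {"rank": "normal", "mainsnak": {"snaktype": "somevalue"}},
                ]
            }
        }
    }
}

print(item_ids_for_property(sample, "Q180736", "P136"))
```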

You then need to request the labels of items Q8261 and Q192239 in the user's language, making a similar action=wbgetentities request but perhaps only requesting props=labels. For performance, you (obviously) should batch up all these follow-on queries, and build a local cache of item labels, so that you don't repeatedly query the Wikidata API to find that Q8261 is a "novel" ("Roman" in Azerbaijani).
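One way to batch and cache those follow-on label lookups is sketched below. The cache is a plain in-memory dictionary, the helper names are hypothetical, and the batch size of 50 IDs per request is an assumption about the limit for ordinary clients that you should check against the wbgetentities API help.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
BATCH_SIZE = 50  # assumption: maximum IDs per request for ordinary clients

label_cache = {}  # (item_id, language) -> label text


def label_request_urls(item_ids, language):
    """Build one URL per batch for the items whose labels are not cached yet."""
    missing = sorted({i for i in item_ids if (i, language) not in label_cache})
    urls = []
    for start in range(0, len(missing), BATCH_SIZE):
        params = {
            "action": "wbgetentities",
            "ids": "|".join(missing[start:start + BATCH_SIZE]),
            "props": "labels",
            "languages": language,
            "languagefallback": 1,
            "format": "json",
        }
        urls.append(WIKIDATA_API + "?" + urlencode(params))
    return urls


def store_labels(response: dict, language: str) -> None:
    """Copy the labels found in a wbgetentities response into the cache."""
    for item_id, entity in response.get("entities", {}).items():
        entry = entity.get("labels", {}).get(language)
        if entry:
            label_cache[(item_id, language)] = entry["value"]


print(label_request_urls(["Q8261", "Q192239", "Q8261"], "az"))
# Pretend the API answered for Q8261 only (trimmed, illustrative response).
store_labels({"entities": {"Q8261": {"labels": {"az": {"language": "az", "value": "Roman"}}}}}, "az")
print(label_request_urls(["Q8261", "Q192239"], "az"))
```

After the first response is stored, the second call asks only for the item that is still missing.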

Getting the publication date (P577) is a little simpler since the value of a statement about it is a simple date rather than another item. In jq's syntax it is:

.entities.Q180736.claims.P577[].mainsnak.datavalue.value.time

In pseudocode, you would locate entities.Q180736.claims.P577 in the JSON response, then for each element in the array, you would check that its mainsnak.datavalue exists and its "type" is "time", then you can use its value. The result of all this is a set of times, in this case one value "+1862-01-01T00:00:00Z". A time value's format resembles ISO 8601; the Wikibase DataModel page gives the details, including datavalue.value.precision which in this case is 9, indicating this publication date is accurate to the year (1862).
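A sketch of that check, plus a simple formatter driven by precision, follows. The function names are hypothetical, and the sample keeps only the fields used here. A regular expression is used instead of a date parser because the string only resembles ISO 8601 (note the leading sign). Precisions coarser than a year are simply rendered as the year, which is a simplification.

```python
import re

# Sign, year, month, and day at the start of a Wikibase time string.
TIME_PATTERN = re.compile(r"^([+-])(\d+)-(\d{2})-(\d{2})T")


def time_values(response: dict, entity_id: str, property_id: str) -> list:
    """Return the time value objects that an entity's claims give for a property."""
    claims = response["entities"][entity_id].get("claims", {}).get(property_id, [])
    values = []
    for claim in claims:
        datavalue = claim.get("mainsnak", {}).get("datavalue")
        if datavalue is not None and datavalue.get("type") == "time":
            values.append(datavalue["value"])
    return values


def format_time(value: dict) -> str:
    """Render a time value no more precisely than its precision allows."""
    match = TIME_PATTERN.match(value["time"])
    if match is None:
        raise ValueError(f"unexpected time string: {value['time']!r}")
    sign, year, month, day = match.groups()
    year_text = ("-" if sign == "-" else "") + str(int(year))
    precision = value.get("precision", 11)
    if precision >= 11:  # day or finer
        return f"{year_text}-{month}-{day}"
    if precision == 10:  # month
        return f"{year_text}-{month}"
    return year_text  # year (9) or coarser


sample = {"entities": {"Q180736": {"claims": {"P577": [
    {"mainsnak": {"snaktype": "value", "datavalue": {
        "type": "time",
        "value": {"time": "+1862-01-01T00:00:00Z", "precision": 9}}}}
]}}}}

print([format_time(v) for v in time_values(sample, "Q180736", "P577")])
```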

action=wbgetclaims for claims alone

If all you want is the claims of an item (wbgetentities' props=claims), you can instead invoke the API module action=wbgetclaims. It returns similar information. Example: Get claims about entity Q180736
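A sketch of a wbgetclaims request follows. The helper names are hypothetical. The optional property parameter, used here to narrow the request to one property, and the top-level "claims" object keyed by property ID are taken from the module as commonly documented, so check them against its generated API help.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def claims_request_url(entity_id: str, property_id: str = None) -> str:
    """Build a wbgetclaims URL, optionally limited to one property."""
    params = {"action": "wbgetclaims", "entity": entity_id, "format": "json"}
    if property_id is not None:
        params["property"] = property_id
    return WIKIDATA_API + "?" + urlencode(params)


def claims_for(response: dict, property_id: str) -> list:
    """Return the claim list for one property from a wbgetclaims response."""
    # Assumption: claims sit directly under "claims", not under "entities".
    return response.get("claims", {}).get(property_id, [])


print(claims_request_url("Q180736"))
print(claims_request_url("Q180736", "P136"))
```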

Alternatives

You can associate an entity in your application with a page in a particular language's Wikipedia. Then, as Page info in search results shows, you can query for and display useful information from that article such as a lead image thumbnail, opening text, and description (action=query&prop=pageimages|pageterms|extracts, try it for Les Misérables). One downside is that page titles change, so you may have to deal with redirects. Another is that it's not multilingual: you have to know the page's title in other wikis (for example, the article in Greek Wikipedia about Les Misérables is Οι Άθλιοι), or track down a "sitelink" to the page in another language. Hence that article talks about page info in the context of search: if your user is searching for articles from a wiki, you know the user's language and the wiki to query.
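For comparison, the page-based query could be built as in this sketch. Only the prop list comes from the text above; the helper name and the extra parameters (redirects, exintro, explaintext, piprop, pithumbsize) are this sketch's own choices for following redirects and trimming the output.

```python
from urllib.parse import urlencode


def page_info_url(language: str, title: str) -> str:
    """Build a page info query against one language's Wikipedia."""
    params = {
        "action": "query",
        "prop": "pageimages|pageterms|extracts",
        "titles": title,
        "redirects": 1,      # follow redirects left behind by page moves
        "exintro": 1,        # extract only the text before the first section
        "explaintext": 1,    # plain text rather than HTML
        "piprop": "thumbnail",
        "pithumbsize": 200,  # arbitrary thumbnail width in pixels
        "format": "json",
    }
    return f"https://{language}.wikipedia.org/w/api.php?" + urlencode(params)


# The title must already be known in each language, unlike a Wikidata entity ID.
print(page_info_url("en", "Les Misérables"))
print(page_info_url("el", "Οι Άθλιοι"))
```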

Over time, this common information about articles in individual wikis is moving to Wikidata:

  • prop=pageterms is already returning the description of the page from Wikidata
  • The "Wikivoyage banner" image for places on Wikivoyage sites is now a property (P948) of the Wikidata item for that place (example of San Francisco).
  • The sitelinks to the same article in other languages and in other kinds of wikis are all maintained in Wikidata (example of San Francisco).

So querying Wikidata for information is aligned with future developments in organizing Wikimedia wikis.

See also

  • qLabel is a JavaScript library to help create multilingual web sites. You simply mark up text elements with 'Q' IDs and the library retrieves their Wikidata labels in the user's language and replaces the text.
  • Reasonator and Autodesc are tools that create machine-generated articles and short descriptions about Wikidata items.
  • Wikidata.org maintains a growing list of external tools.
  • Consult or reuse the code in existing tools to parse claims.
    • For example, inventaire.io uses wikidata-sdk to query Wikidata and handle its responses.