Talk:Wikibase/Indexing

Gremlin query examples
--Smalyshev (WMF) (talk) 22:57, 29 November 2014 (UTC)
 * Top 10 countries by population:
 * People born in a city with more than 100k inhabitants:
 * Largest 10 cities in Europe that have a female mayor:

Countries by population
g.listOf('Q6256').as('c').groupBy{it}{it.claimValues('P1082').preferred.latest}.cap .scatter.filter{it.value.size>0}.transform{it.value = it.value.P1082value.collect{it?it as int:0}.max; it} .order{it.b.value <=> it.a.value}.transform{[it.key.wikibaseId, it.key.labelEn, it.value]}

List of occupations
tree[28640][][31,279] g.wd('Q28640').treeIn('P279').instances.dedup.namesList

List of potential nationalities
Warning: big query, do not run unbounded. tree[350][][17,131] AND (claim[31:6256] OR claim[31:15634554]) claim[31:5] AND claim[19] AND between[569,1750]

g.listOf('Q5').as('humans').claimValues('P569').filter{it.P569value != 'somevalue' && it.P569value > Date.parse('yyyy', '1750')} .back('humans').claimVertices('P19').toCountry.as('countries').select(['humans', 'countries']){it.labelEn}{it.labelEn}

People born before 1880 having no date of death
Warning: big query, do not run unbounded. claim[31:5] AND noclaim[570] AND between[569,0,1880] g.listOf('Q5').as('humans').claimValues('P569').filter{it.P569value && it.P569value < Date.parse('yyyy', '1880')} .back('humans').filter{!it.out('P570').hasNext}[0..10]

instance of human; occupation writer; not occupation author
claim[31:5] AND claim[106:36180] AND noclaim[106:482980]

g.wd('Q36180').in('P106').has('P31link', CONTAINS, 'Q5').filter{!it.out('P106').has('wikibaseId', 'Q482980').hasNext}[0..100]

Places in the U.S. that are named after Francis of Assisi
(TREE[30][150][17,131] AND CLAIM[138:676555])

g.wd('Q676555').in('P138').filter{it.toCountry.has('wikibaseId', 'Q30').hasNext}.namesList

All items in the taxonomy of the Komodo dragon
TREE[4504][171,273,75,76,77,70,71,74,89]

g.wd('Q4504').as('loop').out('P171').loop('loop'){true}{true}.dedup.namesList

All animals on Wikidata
TREE[729][][171,273,75,76,77,70,71,74,89] g.wd('Q729').as('loop').in('P171').loop('loop'){it.object.in('P171').hasNext}{true}.dedup.namesList

Bridges in Germany
(CLAIM[31:(TREE[12280][][279])] AND TREE[183][150][17,131])

g.wd('Q12280').treeIn('P279').in('P31').as('b').toCountry.has('wikibaseId', 'Q183').back('b').namesList

Bridges across the Danube
(CLAIM[31:(TREE[12280][][279])] AND CLAIM[177:1653])

g.wd('Q12280').treeIn('P279').in('P31').as('b').out('P177').has('wikibaseId', 'Q1653').back('b').namesList

Items with VIAF string "64192849"
STRING[214:'64192849']

People who were born 1924-1925, and died 2012-1013
(BETWEEN[569,+00000001924-00-00T00:00:00Z,+00000001926-00-00T00:00:00Z] AND BETWEEN[570,+00000002012-00-00T00:00:00Z,+00000002014-00-00T00:00:00Z])"}

Items 15km around the center of Cambridge, UK
AROUND[625,52.205,0.119,15]

Reconciliation from OpenRefine
Hey guys. I might be on a different page from what you have in mind for this feature, but it would be great if the service would be able to act as a reconciliation service for OpenRefine.

For those who haven't worked with OpenRefine, it's a web tool for data cleaning - the way I see it is a spreadsheet application with all the right buttons and features for data analysis. Reconciliation is a semi-automated process of matching text names to database IDs (keys) and it currently works out of the box with Freebase. This is most of the times enough for English-language data, as Freebase extracts some of its data from Wikipedia, but I found that it doesn't work so well with other languages. As Wikidata is much more multilingual and (hopefully) much more dynamic than Freebase, it would really help a lot if OpenRefine users could connect directly to Wikidata.

Some implementation notes:
 * most of what reconciliation does can do can be done by calling the Wikidata API and parsing the return value with some scripts. Of course, this is not as straightforward for non-programmers.
 * these is an OpenRefine extension that allows reconciliation against SPARQL endpoints and rdf dumps, so this might a quick way to have this functionality.

Looking forward to your input on this request.--Strainu (talk) 20:48, 12 December 2014 (UTC)


 * It would be interesting to know what kind of requests such tool would need from Wikidata API. --Smalyshev (WMF) (talk) 07:15, 17 December 2014 (UTC)
 * I'm not sure I understand the question: are you refering to the API between OpenRefine and Wikidata or to the information that one could extract from Wikidata. Could you please elaborate?--Strainu (talk) 22:53, 17 December 2014 (UTC)
 * More what kind of queries OpenRefine needs to run - i.e. what types of queries, how big the result sets would be, etc. --Smalyshev (WMF) (talk) 05:24, 23 December 2014 (UTC)