Talk:Wikibase/Indexing

Gremlin query examples
--Smalyshev (WMF) (talk) 22:57, 29 November 2014 (UTC)
 * Top 10 countries by population:
 * People born in a city with more than 100k inhabitants:
 * Largest 10 cities in Europe that have a female mayor:

Countries by population
g.listOf('Q6256').as('c').groupBy{it}{it.out('P1082').preferred.latest}.cap .scatter.filter{it.value.size>0}.transform{it.value = it.value.P1082value.collect{it as int}.max; it} .order{it.b.value <=> it.a.value}.transform{[it.key.wikibaseId, it.key.labelEn, it.value]}

List of occupations
g.wd('Q28640').as('loop').in('P279').loop('loop'){it.loops < 20}{true} .copySplit(_[0].transform({g.wd('Q28640').next}), _).exhaustMerge.instances.dedup.namesList

List of potential nationalities
Warning: big query, do not run unbounded.

g.listOf('Q5').as('humans').claimValues('P569').filter{it.P569value != 'somevalue' && it.P569value > Date.parse('yyyy', '1750')} .back('humans').claimVertices('P19').toCountry.as('countries').select(['humans', 'countries']){it.labelEn}{it.labelEn}

People born before 1880 having no date of death
Warning: big query, do not run unbounded.

g.listOf('Q5').as('humans').claimValues('P569').filter{it.P569value && it.P569value < Date.parse('yyyy', '1880')} .back('humans').filter{!it.out('P570').hasNext}[0..10]

Places in the U.S. that are named after Francis of Assisi
(TREE[30][150][17,131] AND CLAIM[138:676555])

g.wd('Q676555').in('P138').filter{it.toCountry.has('wikibaseId', 'Q30').hasNext}.namesList

All items in the taxonomy of the Komodo dragon
TREE[4504][171,273,75,76,77,70,71,74,89]

g.wd('Q4504').as('loop').out('P171').loop('loop'){true}{true}.dedup.namesList

All animals on Wikidata
TREE[729][][171,273,75,76,77,70,71,74,89]

g.wd('Q729').as('loop').in('P171').loop('loop'){it.object.in('P171').hasNext}{true}.dedup.namesList

Reconciliation from OpenRefine
Hey guys. I might be on a different page from what you have in mind for this feature, but it would be great if the service would be able to act as a reconciliation service for OpenRefine.

For those who haven't worked with OpenRefine, it's a web tool for data cleaning; the way I see it, it is a spreadsheet application with all the right buttons and features for data analysis. Reconciliation is a semi-automated process of matching text names to database IDs (keys), and it currently works out of the box with Freebase. Most of the time this is enough for English-language data, as Freebase extracts some of its data from Wikipedia, but I found that it doesn't work so well with other languages. As Wikidata is much more multilingual and (hopefully) much more dynamic than Freebase, it would really help if OpenRefine users could connect directly to Wikidata.

Some implementation notes:
 * Most of what reconciliation does can be done by calling the Wikidata API and parsing the return value with some scripts. Of course, this is not as straightforward for non-programmers.
 * There is an OpenRefine extension that allows reconciliation against SPARQL endpoints and RDF dumps, so this might be a quick way to get this functionality.
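
To make the first note concrete, here is a rough sketch of the "call the API and parse the return value" approach. It assumes the existing wbsearchentities action of the Wikidata API is a good enough match for name lookup; the function names (build_search_url, parse_candidates, reconcile) and the User-Agent string are made up for illustration, and a real reconciliation service would also need scoring and type filtering, which this does not attempt.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://www.wikidata.org/w/api.php"


def build_search_url(name, language="en", limit=5):
    """Build a wbsearchentities request URL for one text name."""
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def parse_candidates(payload):
    """Reduce the API response to the fields a reconciliation step needs."""
    return [
        {
            "id": hit["id"],
            "label": hit.get("label", ""),
            "description": hit.get("description", ""),
        }
        for hit in payload.get("search", [])
    ]


def reconcile(name, language="en", limit=5):
    """Fetch candidate items for a name (performs a network request)."""
    # Hypothetical User-Agent; pick one that identifies your own tool.
    request = Request(
        build_search_url(name, language, limit),
        headers={"User-Agent": "reconcile-sketch/0.1"},
    )
    with urlopen(request, timeout=10) as response:
        return parse_candidates(json.load(response))
```

Calling reconcile('București', language='ro') would return a list of candidate items with their IDs, from which the user (or a script) still has to pick the right match, which is the semi-automated part.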

Looking forward to your input on this request.--Strainu (talk) 20:48, 12 December 2014 (UTC)


 * It would be interesting to know what kind of requests such a tool would need from the Wikidata API. --Smalyshev (WMF) (talk) 07:15, 17 December 2014 (UTC)