User:Siriuswapnil/Blogs/Guide-to-SPARQL-queries-and-GeoJSON

= Working with Wikipedia content in your own projects — A Guide to using SPARQL Queries =

Introduction, why we want data
We have all used Wikipedia countless times during work. Often we are amazed at the way Wikipedia stores its data and, more importantly, forms connections between different types of content that can be retrieved very easily. With all the content Wikipedia has, it is an obvious choice to use its data in our projects. For instance, think of preparing a database of all female scientists born between 1815 and 1915, along with their birth places and images (if possible). Throughout this document, we will use this example to understand how we can retrieve this data and, more importantly, get it in a useful form.

How Wikipedia structures its content — Wikidata
All the content present on Wikipedia is carefully indexed on a platform known as Wikidata. As the website states, Wikidata is a free and open knowledge system that forms the base for all Wikipedia content. It is the central storage for many Wikimedia projects (e.g. Wikipedia, Wikivoyage, Wikisource) and also provides a way to interact with this huge data store. True to the fundamentals, Wikidata is free and openly collaborative, which means anybody with the right authorization can access and modify data as required. With Wikidata, individual wikis need not hold networked content (content that is cross-referenced in multiple wikis) separately, as it can be retrieved dynamically from the central database. This makes the data easy to retrieve and modify, and the process simple and fast.

Querying through Wikidata
Querying for content in Wikidata is similar to querying any SQL-based database. We declare the information that we need (e.g. location, image, type of data) and ask for it. We can also use conditions and filters to query specific types of data. Such queries are written in a query language known as SPARQL and run through the Wikidata Query Service (WQS).

The tools we need: WQS and SPARQL
SPARQL is a query language capable of retrieving and using data available in RDF (Resource Description Framework). In simple terms, it enables us to retrieve content that is highly structured and contains metadata, in a useful manner. The Wikidata Query Service uses SPARQL to fetch information from the database in a user-friendly manner. The query platform is available at https://query.wikidata.org.
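Besides the web interface, the same service can be called from a program. The following is a minimal Python sketch, assuming the public endpoint at https://query.wikidata.org/sparql accepts the query text in a `query` parameter and returns JSON when `format=json` is passed. The function names and the User-Agent string are made up for this example.

```python
import json
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service.
WQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_wqs_url(query):
    """Return a GET URL that asks the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return f"{WQS_ENDPOINT}?{params}"


def run_query(query, user_agent="sparql-guide-example/0.1"):
    """Send the query and flatten each result row to {variable: value}.

    The User-Agent value is a placeholder; replace it with one that
    identifies your own project.
    """
    request = urllib.request.Request(build_wqs_url(query),
                                     headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=60) as response:
        payload = json.load(response)
    # SPARQL JSON results keep the rows under results -> bindings.
    return [{name: cell["value"] for name, cell in row.items()}
            for row in payload["results"]["bindings"]]


# Building the URL needs no network access.
url = build_wqs_url("SELECT ?item WHERE { ?item wdt:P31 wd:Q5. } LIMIT 3")
print(url)
```

Calling `run_query(...)` with any query from this guide would contact the live service, so it is left out of the demonstration at the bottom.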

Sample Query syntax in SPARQL
Let’s see the basic syntax to perform a query and retrieve some information from the Wiki database.
# Find the birth place of all female scientists

SELECT ?item ?itemLabel ?place ?coord WHERE {

?item wdt:P31 wd:Q5.

?item wdt:P21 wd:Q6581072.

?item wdt:P106 wd:Q901.

?item wdt:P19 ?place.

?place wdt:P625 ?coord.

SERVICE wikibase:label { bd:serviceParam wikibase:language "nl,en,fr,de,es,it,no" }

}

Let’s go through the syntax piece by piece:

A line starting with '#' is a comment. It is not mandatory, but it documents the purpose of the query. The Wikidata Query Service also uses '#' lines to set the default view of the results (for example defaultView:Map, used in a later query).

SELECT ?item ?itemLabel ?place ?coord : This clause lists the columns we want in the result. A leading ? marks a variable in SPARQL. Here ?item is the item ID, ?itemLabel is the label (name) of that item, ?place is the item for the place of birth, and ?coord is its coordinate location, which we can put on a map if the project demands it.

WHERE { … } : This block holds the conditions that a result must satisfy. In other words, it contains the actual query.

?item wdt:P31 wd:Q5. : This is the key pattern. Each line is a triple of subject, property and value. The wdt: prefix marks a property and the wd: prefix marks an item used as its value. Human-readable words such as 'occupation' are not used directly; every property and item has a code instead. P31 is the code for 'instance of' and Q5 is the code for 'human', so this first line states that the item must be an instance (P31) of human (Q5). The subsequent lines narrow the results further: P21 (sex or gender) must be Q6581072 (female), P106 (occupation) must be Q901, and P19 (place of birth) is bound to ?place.

?place wdt:P625 ?coord. : Lastly, this line stores the coordinates (P625) of the place in the ?coord variable, which we can display on a map.
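To make the idea of matching patterns more concrete, here is a small Python sketch that mimics the query above on a handful of in-memory triples. It is only an analogy, not how the query service works internally. The item names `person_A`, `person_B` and `city_X` and the coordinate value are invented; the property and value codes are the ones from the query.

```python
# Toy data: (subject, property, value) triples. The subjects are invented
# for this sketch; the P and Q codes are the ones used in the query above.
TRIPLES = [
    ("person_A", "P31", "Q5"),         # instance of: human
    ("person_A", "P21", "Q6581072"),   # sex or gender: female
    ("person_A", "P106", "Q901"),      # occupation
    ("person_A", "P19", "city_X"),     # place of birth
    ("person_B", "P31", "Q5"),
    ("person_B", "P106", "Q901"),
    ("person_B", "P19", "city_X"),
    ("city_X", "P625", "Point(2.35 48.85)"),  # coordinate location
]


def subjects_with(prop, value):
    """Subjects that have the given property set to the given value."""
    return {s for s, p, v in TRIPLES if p == prop and v == value}


def values_of(subject, prop):
    """All values stored for one subject and property."""
    return [v for s, p, v in TRIPLES if s == subject and p == prop]


# Each fixed pattern narrows the candidates, like successive lines in WHERE.
items = (subjects_with("P31", "Q5")
         & subjects_with("P21", "Q6581072")
         & subjects_with("P106", "Q901"))

# Patterns with a variable on the right bind ?place and then ?coord.
rows = [(item, place, coord)
        for item in sorted(items)
        for place in values_of(item, "P19")
        for coord in values_of(place, "P625")]
print(rows)
```

Only `person_A` survives, because `person_B` has no P21 statement and therefore fails the second pattern.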

Our required Queries
This project required a few SPARQL queries to plot coordinates on a map. The following three SPARQL queries were finalized:


 * 1) Covid Hotspot Districts in Delhi
 * 2) Hospitals within 50km of Connaught Place, New Delhi
 * 3) Metro Stations in Delhi with a Daily Patronage of more than 10000

Covid Hotspot Districts in Delhi

# Covid hotspots in Delhi

SELECT DISTINCT ?title ?marker_size ?marker_color ?type ?marker_symbol ?long ?lat WHERE {

?place wdt:P361 wd:Q84055514.

?place wdt:P17 wd:Q668.

?place wdt:P31 wd:Q1149652.

?place wdt:P625 ?location.

# Get the English Wikidata label of the place

?place rdfs:label ?placeLabel.

FILTER (lang(?placeLabel) = 'en')

BIND(CONCAT(' ',?placeLabel,' ') AS ?title)

# Set 4 default values

BIND("small" AS ?marker_size)

BIND("#FFC0CB" AS ?marker_color)

BIND("library" AS ?marker_symbol)

BIND("Point" AS ?type)

BIND(STRBEFORE(STRAFTER(STR(?location), ' '), ')') AS ?lat)

BIND(STRBEFORE(STRAFTER(STR(?location), 'Point('), ' ') AS ?long)

SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }

}
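The two STRBEFORE/STRAFTER lines in the query above rely on the coordinate literal having the form Point(longitude latitude), with longitude first. The same split can be sketched in Python as follows. The function name and the sample coordinates are made up for illustration.

```python
def parse_wkt_point(wkt):
    """Split 'Point(<long> <lat>)' into a (long, lat) pair of floats."""
    text = wkt.strip()
    if not (text.startswith("Point(") and text.endswith(")")):
        raise ValueError(f"not a WKT point: {wkt!r}")
    # Everything between 'Point(' and ')' is '<long> <lat>'.
    long_text, lat_text = text[len("Point("):-1].split()
    return float(long_text), float(lat_text)


# Sample literal with made-up coordinates near Delhi.
longitude, latitude = parse_wkt_point("Point(77.2 28.6)")
print(longitude, latitude)
```

Doing the split inside the query, as above, has the advantage that the downloaded results already contain separate lat and long columns.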

Hospitals within 50km of Connaught Place, New Delhi

# Hospitals within 50km of Connaught Place, New Delhi

# added before 2016-10

SELECT ?place ?placeLabel ?location ?dist ?lat ?long ?marker_size ?marker_color ?type ?marker_symbol

WHERE

{

wd:Q2341950 wdt:P625 ?loc.

SERVICE wikibase:around {

?place wdt:P625 ?location.

bd:serviceParam wikibase:center ?loc.

bd:serviceParam wikibase:radius "50".

}

OPTIONAL { ?place wdt:P18 ?image. }

?place wdt:P31 wd:Q16917.

SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }

BIND(geof:distance(?loc, ?location) as ?dist)

# Set 4 default values

BIND("small" AS ?marker_size)

BIND("#FFC0CB" AS ?marker_color)

BIND("library" AS ?marker_symbol)

BIND("Point" AS ?type)

BIND(STRBEFORE(STRAFTER(STR(?location), ' '), ')') AS ?lat)

BIND(STRBEFORE(STRAFTER(STR(?location), 'Point('), ' ') AS ?long)

} ORDER BY ?dist
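The query above keeps places inside a 50 km radius and sorts them by distance. As a rough illustration of that idea, the Python sketch below computes great-circle distances with the haversine formula. This is an assumption chosen for the example, not necessarily the formula the query service uses, and the place names and coordinates are invented.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(center, places, radius_km):
    """Return (distance, name) pairs inside the radius, nearest first."""
    hits = []
    for name, (lat, lon) in places.items():
        dist = haversine_km(center[0], center[1], lat, lon)
        if dist <= radius_km:
            hits.append((dist, name))
    return sorted(hits)


# Approximate centre and two invented places, as (latitude, longitude).
center = (28.63, 77.22)
places = {"hospital_near": (28.60, 77.20), "hospital_far": (29.50, 77.22)}
nearby = within_radius(center, places, radius_km=50)
print(nearby)
```

In the real query this work is done by the service itself, which is why only the centre and the radius have to be supplied.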

Metro Stations in Delhi with a Daily Patronage of more than 10000

# Metro Stations in Delhi with a Daily Patronage of more than 10000

#defaultView:Map

SELECT ?item ?itemLabel ?location ?image ?marker_size ?marker_color ?type ?marker_symbol ?lat ?long

WHERE {

?item wdt:P131 wd:Q1353.

?item wdt:P31 wd:Q928830.

?item wdt:P1373 ?num.

FILTER(?num > 10000)

?item wdt:P625 ?location.

OPTIONAL { ?item wdt:P18 ?image. }

BIND(STRBEFORE(STRAFTER(STR(?location), ' '), ')') AS ?lat)

BIND(STRBEFORE(STRAFTER(STR(?location), 'Point('), ' ') AS ?long)

# Set 4 default values, as in the two queries above (the SELECT line lists them)

BIND("small" AS ?marker_size)

BIND("#FFC0CB" AS ?marker_color)

BIND("library" AS ?marker_symbol)

BIND("Point" AS ?type)

SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }

}

Using the queries — The GeoJSON format
Once we have the SPARQL queries, the next step is to save the results in a usable format. For manipulation and visualization, GeoJSON is a suitable file format. GeoJSON is an open format based on JSON, designed for representing geographical structures. WQS cannot return the results directly as GeoJSON, but it does offer various other formats, including JSON and TSV (tab-separated values). For our purposes TSV was suitable, and it was parsed well by geojson.io, a platform for editing and displaying GeoJSON files. Uploading the .tsv results file to geojson.io produced a downloadable .geojson file, which can then be used in our projects in several ways.
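For readers who prefer to do the conversion locally instead of through geojson.io, the following Python sketch turns TSV text with the column names used in the queries above into a GeoJSON FeatureCollection. It is a simplified sketch under a few assumptions: header names may carry a leading '?', values may be wrapped in double quotes, and the marker columns are renamed to the hyphenated names of the simplestyle convention. The sample row is invented.

```python
import csv
import io
import json

# SPARQL variable names cannot contain hyphens, so the queries use
# underscores; the styling keys are renamed here (an assumption).
STYLE_KEYS = {"marker_size": "marker-size",
              "marker_color": "marker-color",
              "marker_symbol": "marker-symbol"}


def clean(text):
    """Drop surrounding spaces, a leading '?' and surrounding quotes."""
    return text.strip().lstrip("?").strip('"')


def tsv_to_geojson(tsv_text):
    """Build a FeatureCollection from TSV rows with lat and long columns."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    header = [clean(name) for name in next(reader)]
    features = []
    for cells in reader:
        row = dict(zip(header, (clean(cell) for cell in cells)))
        if not row.get("lat") or not row.get("long"):
            continue  # skip blank lines and rows without coordinates
        properties = {STYLE_KEYS.get(key, key): value
                      for key, value in row.items()
                      if key not in ("lat", "long", "type")}
        features.append({
            "type": "Feature",
            # GeoJSON coordinate order is [longitude, latitude].
            "geometry": {"type": "Point",
                         "coordinates": [float(row["long"]),
                                         float(row["lat"])]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


# One invented row using the columns of the hotspot query.
sample_tsv = (
    "title\tmarker_size\tmarker_color\ttype\tmarker_symbol\tlong\tlat\n"
    " Example District \tsmall\t#FFC0CB\tPoint\tlibrary\t77.2\t28.6\n"
)
collection = tsv_to_geojson(sample_tsv)
print(json.dumps(collection, indent=2))
```

The printed text can be saved with a .geojson extension and loaded like the file downloaded from geojson.io.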

Final Notes
Once we have the GeoJSON file, it is fairly simple to display the coordinates and their properties on a map using a library like OpenLayers. We could load the file through AJAX calls, read multiple files through PHP, or simply pass its URL to the GeoJSON support provided by OpenLayers.

In this way, with a few queries, we can get the required data out of Wikidata and use it easily in Wiki-based or even off-Wiki projects.