User:Siriuswapnil/Blogs/Guide-to-SPARQL-queries-and-GeoJSON

From mediawiki.org


A Guide to Using SPARQL Queries: Playing with Wikidata


Introduction, why we want data

We have all used Wikipedia countless times at work. We are often impressed by the way Wikipedia stores its data and, more importantly, by the way it connects different types of content so that they can be retrieved easily. With all the content Wikipedia has, using its data in our own projects is an obvious choice. Suppose we want to prepare a database of all female scientists born between 1815 and 1915, along with their birthplaces and images (if available). Throughout this document, we will use this example to understand how to retrieve such data and, more importantly, how to get it in a useful form.

How Wikipedia structures its content: Wikidata

The structured data behind Wikipedia is carefully indexed on a platform known as Wikidata. As the website states, Wikidata is a free and open knowledge base that acts as central storage for the structured data of many Wikimedia projects (e.g. Wikipedia, Wikivoyage, Wikisource), and it also provides a way to interact with this huge data store. True to those fundamentals, Wikidata is free and openly collaborative, which means anybody with the right authorization can access and modify its data. With Wikidata, individual wikis do not need to hold shared content (content that is cross-referenced in multiple wikis) separately, since it can be retrieved dynamically from the central database. This makes the data easy to retrieve and modify, and keeps the process simple and fast.

Querying through Wikidata

Querying for content in Wikidata is similar to querying an SQL-based database. We declare the information we need (e.g. location, image, type of data) and ask for it. We can also use conditions and filters to narrow the query to specific data. Such queries are written in a query language called SPARQL and run through the Wikidata Query Service (WQS).

The tools we need: WQS and SPARQL

SPARQL is a query language for retrieving and manipulating data stored in RDF (Resource Description Framework). In simple terms, it lets us retrieve highly structured content, along with its metadata, in a useful form. The Wikidata Query Service uses SPARQL to fetch information from the database in a user-friendly manner. It can be accessed at https://query.wikidata.org/.
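Besides the web interface, the query service can also be called from a script. The following is a minimal sketch, assuming the public SPARQL endpoint at https://query.wikidata.org/sparql and its `query` and `format` URL parameters; the helper name `buildQueryUrl` is made up for this guide.

```javascript
// Assumed endpoint of the Wikidata Query Service (not taken from the text above).
const WQS_ENDPOINT = "https://query.wikidata.org/sparql";

// Hypothetical helper: turn a SPARQL string into a GET URL that asks for JSON results.
function buildQueryUrl(sparql) {
  const params = new URLSearchParams({ query: sparql, format: "json" });
  return `${WQS_ENDPOINT}?${params.toString()}`;
}

// A tiny query used only for illustration: five items that are instances of human.
const exampleQuery = "SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 5";
const exampleUrl = buildQueryUrl(exampleQuery);
console.log(exampleUrl);
```

The resulting URL could then be passed to `fetch` or any other HTTP client; this sketch only builds the URL and does not send a request.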

Sample Query syntax in SPARQL

Let’s look at the basic syntax for performing a query and retrieving some information from Wikidata.

Working Link : https://w.wiki/V6a

#Find the birth place of all female scientists
#defaultView:Map
SELECT ?item ?itemLabel ?place ?coord WHERE {
    ?item wdt:P31 wd:Q5 .
    ?item wdt:P21 wd:Q6581072 .
    ?item wdt:P106 wd:Q901 .
    ?item wdt:P19 ?place.
    ?place wdt:P625 ?coord.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "nl,en,fr,de,es,it,no" }
}

Let’s understand the syntax one by one:

# content : 

This line is a comment. It is not mandatory, but it documents the purpose of the query. The Wikidata Query Service also reads special comments that begin with ‘#’: the line #defaultView:Map in the query above asks it to display the results on a map by default.

SELECT ?item ?itemLabel ?place ?coord : 

This statement defines the column headers, i.e. the data that we want out of the database. A name of the form ?<name> is a variable in SPARQL. Here, ?item holds the item ID, ?itemLabel holds the label (name) of that item, ?place holds the item for the place of birth, and ?coord holds the actual coordinate location, which we can put on a map if our project demands it.

WHERE { … }

This block contains the conditions that the results must satisfy. In other words, it holds the actual query patterns.

?item wdt:P31 wd:Q5 .

This is an important piece of syntax. Each pattern reads as subject, predicate, object: here ?item is the subject, wdt:P31 is the predicate and wd:Q5 is the object. The wdt: prefix marks a property, and the wd: prefix marks an item that serves as the value of that property. Human-readable words such as ‘occupation’ are not used in queries, so every property and item has a code instead. For example, P31 is the code for ‘instance of’ and Q5 is the code for ‘human’, so the first line states that the item must first of all be an instance of (P31) human (Q5). The subsequent statements narrow the results further: P21 (sex or gender) must be Q6581072 (female), P106 (occupation) must be Q901 (scientist), and P19 (place of birth) is stored in ?place. And lastly,

?place wdt:P625 ?coord.

This gives us the coordinates in the ?coord variable which we can display on a map.
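When results are requested as JSON, the standard SPARQL JSON results format lists the selected variables under `head.vars` and one object per row under `results.bindings`. The sketch below flattens that structure into plain rows. The function name `flattenBindings` and the sample values are made up for illustration and are not real query output.

```javascript
// Hypothetical helper: convert SPARQL JSON results into an array of plain row objects.
function flattenBindings(sparqlJson) {
  const vars = sparqlJson.head.vars;
  return sparqlJson.results.bindings.map((binding) => {
    const row = {};
    for (const name of vars) {
      // A variable that is unbound in a row is simply absent from the binding.
      row[name] = binding[name] ? binding[name].value : null;
    }
    return row;
  });
}

// Made-up sample in the shape of a SPARQL JSON response (placeholder values only).
const sampleResponse = {
  head: { vars: ["item", "itemLabel", "coord"] },
  results: {
    bindings: [
      {
        item: { type: "uri", value: "http://www.wikidata.org/entity/Q_EXAMPLE" },
        itemLabel: { type: "literal", value: "Example Person" },
        coord: { type: "literal", value: "Point(10.5 20.25)" }
      },
      {
        item: { type: "uri", value: "http://www.wikidata.org/entity/Q_OTHER" },
        itemLabel: { type: "literal", value: "Another Person" }
      }
    ]
  }
};

const flatRows = flattenBindings(sampleResponse);
console.log(flatRows);
```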

Our required Queries

This project required a few SPARQL queries whose results could be plotted on a map. The following three queries were finalized:

  1. Covid Hotspot Districts in Delhi
  2. Hospitals within 50 km of Connaught Place, New Delhi
  3. Metro Stations in Delhi with a Daily Patronage of more than 10000


Covid Hotspot Districts in Delhi

Link : https://w.wiki/V6c

#covid hotspots delhi
#defaultView:Map
SELECT DISTINCT ?title ?marker_size ?location ?marker_color ?type ?marker_symbol ?long ?lat WHERE  {
  ?place wdt:P361 wd:Q84055514 .
  ?place wdt:P17 wd:Q668 .
  ?place wdt:P31 wd:Q1149652 .
  ?place wdt:P625 ?location .
  # Get the English Wikidata label of the place
  ?place rdfs:label ?placeLabel .
  FILTER (lang(?placeLabel) = 'en') 
  BIND(CONCAT('[[d:',STRAFTER(STR(?place),"http://www.wikidata.org/entity/"),'|',?placeLabel,']]') AS ?title)  
  # Set 4 default values  
  BIND("small" AS ?marker_size)
  BIND("#FFC0CB" AS ?marker_color)
  BIND("library" AS ?marker_symbol)
  BIND("Point" AS ?type)    
  BIND(STRBEFORE(STRAFTER(STR(?location), ' '), ')') AS ?lat)
  BIND(STRBEFORE(STRAFTER(STR(?location), 'Point('), ' ') AS ?long)
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }     
}
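The two BIND lines that fill ?lat and ?long cut the coordinate literal, which has the form Point(longitude latitude), at the space and at the brackets. The same split can be sketched in JavaScript as follows; the function name `parseWktPoint` and the sample coordinates are illustrative assumptions.

```javascript
// Hypothetical helper: split a "Point(long lat)" literal into numeric long and lat.
// Longitude comes first in this literal, as in the STRBEFORE/STRAFTER logic above.
function parseWktPoint(wkt) {
  const match = /^Point\(\s*(\S+)\s+(\S+)\s*\)$/i.exec(String(wkt).trim());
  if (!match) {
    return null; // not a point literal
  }
  const long = Number(match[1]);
  const lat = Number(match[2]);
  if (!Number.isFinite(long) || !Number.isFinite(lat)) {
    return null; // brackets were present but the contents were not numbers
  }
  return { long, lat };
}

// Illustrative value only, not copied from a query result.
const parsedPoint = parseWktPoint("Point(77.2167 28.6333)");
console.log(parsedPoint);
```

Returning `null` for malformed input is a design choice of this sketch; it lets a caller skip rows without coordinates instead of failing on them.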




Hospitals within 50 km of Connaught Place, New Delhi

Link : https://w.wiki/V6d

#Hospitals within 50km of Connaught Place,New Delhi
#added before 2016-10
#defaultView:Map
 SELECT ?place ?placeLabel ?location ?dist ?lat ?long ?marker_size ?marker_color ?type ?marker_symbol
WHERE
{
  wd:Q2341950 wdt:P625 ?loc .
  SERVICE wikibase:around {
      ?place wdt:P625 ?location .
      bd:serviceParam wikibase:center ?loc .
      bd:serviceParam wikibase:radius "50" .
  }
  OPTIONAL { ?place wdt:P18 ?image . }
  ?place wdt:P31 wd:Q16917 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
  BIND(geof:distance(?loc, ?location) as ?dist)
   # Set 4 default values  
  BIND("small" AS ?marker_size)
  BIND("#FFC0CB" AS ?marker_color)
  BIND("library" AS ?marker_symbol)
  BIND("Point" AS ?type) 
  BIND(STRBEFORE(STRAFTER(STR(?location), ' '), ')') AS ?lat)
  BIND(STRBEFORE(STRAFTER(STR(?location), 'Point('), ' ') AS ?long)
} ORDER BY ?dist
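To make the radius search more concrete, the sketch below repeats the idea on the client side: compute the distance from a centre to each place, keep those within the radius, and sort by distance. It uses the haversine great-circle formula with a mean Earth radius of 6371 km, which is an assumption of this sketch and not necessarily the exact computation the query service performs. The function names and the sample points are made up.

```javascript
const EARTH_RADIUS_KM = 6371; // mean Earth radius, assumed for this sketch

function toRadians(degrees) {
  return (degrees * Math.PI) / 180;
}

// Great-circle distance in kilometres between two { lat, long } points (haversine formula).
function haversineKm(a, b) {
  const dLat = toRadians(b.lat - a.lat);
  const dLong = toRadians(b.long - a.long);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(a.lat)) * Math.cos(toRadians(b.lat)) * Math.sin(dLong / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// Keep the places within radiusKm of the centre, nearest first (like ORDER BY ?dist).
function withinRadius(center, places, radiusKm) {
  return places
    .map((place) => ({ ...place, dist: haversineKm(center, place) }))
    .filter((place) => place.dist <= radiusKm)
    .sort((x, y) => x.dist - y.dist);
}

// Made-up points: one about 11 km and one about 111 km north of the centre.
const centerPoint = { lat: 0, long: 0 };
const nearbyPlaces = withinRadius(
  centerPoint,
  [
    { name: "far", lat: 1, long: 0 },
    { name: "near", lat: 0.1, long: 0 }
  ],
  50
);
console.log(nearbyPlaces);
```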




Metro Stations in Delhi with a Daily Patronage of more than 10000

Link : https://w.wiki/V6e

#Metro Stations in Delhi with a Daily Patronage of more than 10000
#defaultView:Map
SELECT ?item ?itemLabel ?location ?image ?marker_size ?marker_color ?type ?marker_symbol ?lat ?long
WHERE {
  ?item wdt:P131 wd:Q1353 .
  ?item wdt:P31 wd:Q928830 . 
  ?item wdt:P1373 ?num .
  FILTER(?num > 10000)
  ?item wdt:P625 ?location . 
  OPTIONAL { ?item wdt:P18 ?image . }
  BIND(STRBEFORE(STRAFTER(STR(?location), ' '), ')') AS ?lat)
  BIND(STRBEFORE(STRAFTER(STR(?location), 'Point('), ' ') AS ?long)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}



Using the queries: The GeoJSON format

Once we have the SPARQL queries, the next step is to save the results in a usable format. Clicking Run displays the results in the query service.


For manipulation and visualization, we need to export this result to a usable format. For geographical data, a natural choice is GeoJSON, an open format based on JSON that is designed for representing geographical structures. We cannot get the query results directly in GeoJSON, but WQS provides various other export formats, including JSON and TSV (tab-separated values). For our purpose TSV seemed suitable, so we export the results as a TSV file from the download options of the query service.
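A TSV export is plain text with one header line and one tab-separated line per result. The following is a minimal sketch of reading such a file into row objects, assuming no quoting or escaping inside cells and stripping a leading "?" from header names if one is present; the function name `parseTsv` and the sample text are made up.

```javascript
// Hypothetical helper: parse tab-separated text into an array of row objects.
// Assumes the first line holds the column names and cells contain no tabs or newlines.
function parseTsv(text) {
  const lines = text.split(/\r?\n/).filter((line) => line.length > 0);
  if (lines.length === 0) {
    return [];
  }
  const headers = lines[0].split("\t").map((header) => header.replace(/^\?/, ""));
  return lines.slice(1).map((line) => {
    const cells = line.split("\t");
    const row = {};
    headers.forEach((header, index) => {
      row[header] = cells[index] !== undefined ? cells[index] : "";
    });
    return row;
  });
}

// Made-up sample text, not a real export.
const sampleTsv = "placeLabel\tlat\tlong\nExample Hospital\t28.6\t77.2\nOther Hospital\t28.7\t77.1\n";
const tsvRows = parseTsv(sampleTsv);
console.log(tsvRows);
```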



The TSV format is useful because it is parsed well by geojson.io, a platform for editing and displaying GeoJSON files, which we will use for the conversion to GeoJSON. Let us now open geojson.io. Initially, it looks something like this:

[Figure: the geojson.io web page]


To obtain the GeoJSON equivalent, upload the TSV file we obtained earlier using the OPEN menu shown above. The map will then be populated automatically with the points returned by the query we ran earlier. This is how it should look:

[Figure: the query results plotted on the geojson.io map]


Finally, go to the SAVE dropdown and click on GeoJSON to download a GeoJSON copy of our Wikidata query results. We are now ready to display the results on a map.
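For reference, the downloaded file is a GeoJSON FeatureCollection in which every result row becomes a Point feature. The sketch below shows one way such a structure could be built by hand from rows that carry lat and long columns. It is not how geojson.io works internally; the function name `rowsToFeatureCollection` and the sample rows are assumptions.

```javascript
// Hypothetical helper: build a GeoJSON FeatureCollection from rows with lat/long columns.
// GeoJSON stores coordinates as [longitude, latitude], in that order.
function rowsToFeatureCollection(rows) {
  const features = [];
  for (const row of rows) {
    const lat = parseFloat(row.lat);
    const long = parseFloat(row.long);
    if (!Number.isFinite(lat) || !Number.isFinite(long)) {
      continue; // skip rows without usable coordinates
    }
    // Every other column is kept as a property of the feature.
    const { lat: _lat, long: _long, ...properties } = row;
    features.push({
      type: "Feature",
      geometry: { type: "Point", coordinates: [long, lat] },
      properties
    });
  }
  return { type: "FeatureCollection", features };
}

// Made-up rows for illustration.
const featureCollection = rowsToFeatureCollection([
  { title: "Example place", lat: "28.6", long: "77.2", marker_size: "small" },
  { title: "Row without coordinates", lat: "", long: "" }
]);
console.log(JSON.stringify(featureCollection, null, 2));
```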

Mapping the SPARQL Queries

So far, we have a GeoJSON file that gives us the properties and geometries of the coordinates queried from Wikidata. We now want to use this GeoJSON to plot these points on an actual map. For this we will use OpenLayers, a client-side JavaScript library for displaying map data in web browsers. We will write a simple script that renders the map in the browser, with map tiles fetched from a tile server (the code below uses OpenLayers’ OSM source, i.e. OpenStreetMap tiles), and use OpenLayers’ built-in functions to plot our GeoJSON file on the map.

Generating a map from OpenLayers

To generate a map, we first need to set up the environment for OpenLayers. A detailed guide is given at https://openlayers.org/en/latest/doc/tutorials/bundle.htm.

Note: We could also use a build of OpenLayers served from a CDN. However, that does not work in our case, since we need to load an external GeoJSON file from the JS file, and that causes a CORS issue when the page is loaded in the browser. Hence, we need to set up a local development server, here provided by Parcel, to serve the GeoJSON files.

Once the environment is set up, create an index.html file that looks something like this:

<!doctype html>
<html lang="en">
  <head>
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/openlayers/openlayers.github.io@master/en/v6.3.1/css/ol.css" type="text/css">
    <style>
      .map {
        height: 400px;
        width: 100%;
      }
    </style>
    <script src="https://cdn.jsdelivr.net/gh/openlayers/openlayers.github.io@master/en/v6.3.1/build/ol.js"></script>
    <title>OpenLayers example</title>
  </head>
  <body>
    <h2>My Map</h2>
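    <div id="map" class="map"></div>
    <!-- Load the map script described below (named main.js in this guide) -->
    <script src="./main.js"></script>
  </body>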
</html>


As stated on the OpenLayers website, to put a map on a webpage we need three things:

  1. Include OpenLayers: we load it through the script src pointing at the OpenLayers CDN build.
  2. <div> map container: the div is the HTML element that shows the map area; its size and other display properties are controlled through it.
  3. JavaScript to create a simple map: we need a JS file with at least the following to show the map on the HTML page.
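// main.js: module imports for the bundled (npm) build of OpenLayers
import 'ol/ol.css';
import Map from 'ol/Map';
import View from 'ol/View';
import TileLayer from 'ol/layer/Tile';
import OSM from 'ol/source/OSM';

// Create a map in the <div id="map"> element with an OpenStreetMap base layer
const map = new Map({
  target: 'map',
  layers: [
    new TileLayer({
      source: new OSM()
    })
  ],
  view: new View({
    center: [0, 0],
    zoom: 0
  })
});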

This main.js file contains the boilerplate to show the map.
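The center: [0, 0] value in the view is expressed in the view's projection, which in OpenLayers defaults to Web Mercator (EPSG:3857), so a longitude/latitude pair has to be converted before it can be used as a centre. OpenLayers provides fromLonLat in ol/proj for this. The sketch below only illustrates the underlying spherical Mercator formula in plain JavaScript; the function name `lonLatToWebMercator` and the sample coordinates are made up.

```javascript
// Radius of the sphere used by Web Mercator (EPSG:3857), in metres.
const WEB_MERCATOR_RADIUS = 6378137;

// Illustrative conversion from degrees of longitude/latitude to Web Mercator metres.
function lonLatToWebMercator(lon, lat) {
  const lonRad = (lon * Math.PI) / 180;
  const latRad = (lat * Math.PI) / 180;
  const x = WEB_MERCATOR_RADIUS * lonRad;
  const y = WEB_MERCATOR_RADIUS * Math.log(Math.tan(Math.PI / 4 + latRad / 2));
  return [x, y];
}

// Illustrative coordinates only.
const mercatorCenter = lonLatToWebMercator(77.2167, 28.6333);
console.log(mercatorCenter);
```

In the map script itself, the library call would normally be preferred over a hand-written formula; the point here is only to show why [77.2167, 28.6333] cannot be passed to center directly under the default projection.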


Including GeoJSON on the map

To include GeoJSON on the map, we need to create a separate layer that holds the coordinates. We can assign the layer to a variable and then use the built-in method map.addLayer(layerName) to add it to our target map. The code for the new layer looks something like this:

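import VectorLayer from 'ol/layer/Vector';
import VectorSource from 'ol/source/Vector';
import GeoJSON from 'ol/format/GeoJSON';

// Vector layer whose features are read from the GeoJSON file
const newLayer = new VectorLayer({
  source: new VectorSource({
    format: new GeoJSON(),
    url: './data/countries.json'
  })
});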

Once we have our layer, we need to add it to the map variable created earlier. This can be done with:

map.addLayer(newLayer);

Note: Layers are rendered on the webpage in the order in which they are added. Hence, we need to make sure that the layer containing the GeoJSON data is added last, i.e. after the base layers, so that it is drawn on top of them.

Once this is done, the final output is displayed like this:

The blue dots signify the coordinates asked in the Query

Final Notes

In this way, we can query for any geolocation and put it on a webpage using the OpenLayers library. To summarize, once we have the GeoJSON file, it is fairly simple to display the coordinates and their properties on a map library like OpenLayers. We could load it through AJAX calls, read multiple files through PHP, or simply pass its URL to the vector source together with the GeoJSON format provided by OpenLayers.
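If the file is loaded through an AJAX call, it can help to check its shape before handing it to the map. The following is a rough sketch, assuming the file holds only Point features as in the queries above; the names `isPointFeatureCollection` and `loadGeoJson` are made up, and the loader expects a `fetch`-compatible function.

```javascript
// Hypothetical check: is this a FeatureCollection made up only of Point features?
function isPointFeatureCollection(data) {
  if (!data || data.type !== "FeatureCollection" || !Array.isArray(data.features)) {
    return false;
  }
  return data.features.every((feature) => {
    const geometry = feature && feature.geometry;
    return (
      Boolean(feature) &&
      feature.type === "Feature" &&
      Boolean(geometry) &&
      geometry.type === "Point" &&
      Array.isArray(geometry.coordinates) &&
      geometry.coordinates.length >= 2 &&
      geometry.coordinates.every((value) => Number.isFinite(value))
    );
  });
}

// Hypothetical loader: fetch a GeoJSON file and reject anything that fails the check.
// fetchImpl defaults to the global fetch and can be swapped for a stub.
async function loadGeoJson(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Request for ${url} failed with status ${response.status}`);
  }
  const data = await response.json();
  if (!isPointFeatureCollection(data)) {
    throw new Error(`${url} is not a FeatureCollection of Point features`);
  }
  return data;
}
```

The checked object could then be turned into features with the GeoJSON format class of OpenLayers instead of giving the layer a URL; which route is better depends on how the files are served.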

Thus, using a few queries, we can get the required data out of Wikidata and use it easily in wiki-based or even off-wiki projects.