Extension:SparqlExtension

Note: ''Semantic MediaWiki (SMW) 1.6 provides some of the functionality of this extension, specifically that of synchronizing RDF data from Semantic MediaWiki with a SPARQL endpoint. However, it does not (yet) implement other functions of the SparqlExtension. Specifically, SMW 1.6 currently does not have the ability to query external SPARQL endpoints, and to embed SPARQL queries on wiki pages and visualize their results. See here for a more detailed discussion.''

This extension allows one to integrate Semantic MediaWiki with a SPARQL endpoint via a web-service. While this has been set up and tested to work with Joseki and Jena TDB, the configuration is generic enough to make it work with any other standard-compliant SPARQL endpoint. We are especially interested in hearing your experiences using the extension with other SPARQL endpoints.

The Semantic MediaWiki Conference in September 2010 marked the start of an initiative to incorporate some of this extension's functionality into the core of Semantic MediaWiki. So while we still welcome feedback, we highly encourage you to also participate in the SMW community discussion and work on better integrating RDF, SPARQL, and SPARUL support into SMW Core.

This extension is an open-source alternative to the triplestore connector in the Extension:Halo Extension. Since its initial release, several other open-source triplestore connectors have been developed, as shown in the table below.

Comparison of Semantic MediaWiki triplestore connectors
A few extensions already offer the possibility of connecting SMW to an RDF triplestore. The connectors differ significantly in functionality and scalability, and the goals and philosophies of their creators also differ. Below is a basic comparison table; however, you are strongly encouraged to consult the respective extension pages to select the extension that suits your needs.

Here's the basic advice:
 * Use Halo (the professional version) if you want corporate support
 * Use SparqlExtension if you want to use data from external endpoints and expose your data to the rest of the world (linked data)
 * Use RDFIO if you are not concerned about scalability (PHP only) or cannot run a separate Java triple-store

Basic Use

 * Querying the properties of the current page:

Here the function sparqlencode encodes the current page name into a SPARQL-suitable format. It leaves forward slashes unencoded, which is required for querying sub-pages.
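As a rough illustration of the encoding described above, here is a minimal Python sketch. It assumes the encoding is ordinary percent-encoding with forward slashes left intact. The function name sparql_encode and the underscore-for-space step are assumptions made for illustration; this is not the extension's actual PHP code.

```python
from urllib.parse import quote


def sparql_encode(page_name: str) -> str:
    """Percent-encode a page name, leaving '/' intact (assumed behaviour)."""
    # Assumption: spaces are written as underscores, as in MediaWiki page URLs.
    return quote(page_name.replace(" ", "_"), safe="/")


# A sub-page name keeps its slash, so it can still be used as a relative IRI.
print(sparql_encode("Power plants/Three Gorges"))
```

Keeping the slash unencoded is what lets a sub-page name map onto a path-like IRI rather than a single opaque segment.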


 * Dealing with sub-pages as recommended in Linked Data Patterns:

Here, instead of using prefixes, we are forced to use the BASE keyword because the local part of a prefixed name cannot contain forward slashes (or other illegal characters - need reference to SPARQL docs here!!!). Using the BASE keyword allows the IRI to be rephrased, so the following are equivalent:  article:France

SparqlExtension takes care of setting the correct prefixes and BASE keyword (equivalent to $smwgNamespace). The following prefixes are default ($smwgNamespace='http://YOURHOST/wiki/';):
BASE
PREFIX article:
PREFIX a:
PREFIX category:
PREFIX cat:
PREFIX property:
PREFIX prop:
PREFIX rdf:
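To make the role of BASE concrete, the sketch below resolves a relative IRI against a base IRI the way a SPARQL processor would for simple path-like names. The base value http://example.org/wiki/ is a hypothetical stand-in for $smwgNamespace, and the helper name resolve is an assumption for illustration only.

```python
from urllib.parse import urljoin

# Hypothetical stand-in for the value of $smwgNamespace.
BASE = "http://example.org/wiki/"


def resolve(relative_iri: str, base: str = BASE) -> str:
    """Resolve a relative IRI such as France/Paris against the base IRI."""
    return urljoin(base, relative_iri)


# A sub-page name containing a slash resolves to a full IRI under BASE,
# which a prefixed name could not express.
print(resolve("France/Paris"))
```

This only covers simple relative names; names containing a colon or other reserved characters would need encoding first.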

You can try the following queries on the Special:SparqlExtension page.

The following query would return all pages that belong to the Cat category:

...the following query would return all pages that have an EnjoysMice property:

...the following query would return all properties of the page Cat:

...which is equivalent to:

Features

 * Synchronizes semantic data with the triple store and exposes it via a Joseki endpoint.


 * Embeds SPARQL query output into a wiki page using the parser extension #sparql.


 * Multiple output formats: table, maps, graph, template, charts

Template
The following query:

...would feed the following template:
 * Article belongs to a category : called

Maps querying DBpedia (requires Extension:Maps)
Maps output format requires the query to have a "point" variable of type http://www.georss.org/georss/point (or a string with lat and lon separated by a space or comma). If you have a variable named "title", then this will show up when you click the markers on the map.
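The accepted string form of the "point" value can be illustrated with a small parser. This is a minimal sketch of splitting a "lat lon" or "lat,lon" string; the function name parse_point is hypothetical, and how the extension itself parses the value is not shown in this page.

```python
import re
from typing import Tuple


def parse_point(value: str) -> Tuple[float, float]:
    """Split 'lat lon' or 'lat,lon' into a (lat, lon) pair of floats."""
    # Accept a comma, whitespace, or both as the separator.
    parts = [p for p in re.split(r"[,\s]+", value.strip()) if p]
    if len(parts) != 2:
        raise ValueError(f"expected two coordinates, got {value!r}")
    return float(parts[0]), float(parts[1])


print(parse_point("30.82 111.0"))
print(parse_point("30.82,111.0"))
```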

Inline
Best used to return a single value. If multiple values are returned, they are separated by commas.

is the biggest power plant in China. ...would produce: Three Gorges Powerplant is the biggest power plant in China.

Charts
Explore the chart gallery at http://enipedia.tudelft.nl/wiki/User:Alfredas/Charts

Currently the extension supports the following charts:
 * GeoMap
 * PieChart
 * ScatterChart
 * OrgChart
 * AreaChart
 * BarChart
 * ColumnChart
 * LineChart
 * TreeMap

Configuring and styling Google charts
The extension now allows you to configure Google Charts using the same parameters that are specified in the Chart API. You specify the parameters in the MediaWiki fashion:

Google Visualization DataSource and embedding charts
The charts are based on Google Visualization API and the SparqlExtension wraps the endpoint to implement a DataSource.

This means that charts produced by SparqlExtension can be used locally on the wiki and can also be embedded into other websites, like here.

Special:SparqlExtension
The extension now has a special page, Special:SparqlExtension, that features a form and simulates the behavior of an endpoint. This allows for easy use in a federated query in the following fashion:
   select * where {
     service  { ?x ?y ?z }
     service  { ?x ?y ?z }
     service  { ?x ?y ?z }
   }
See here for a known issue involving aggregates and federated queries.
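The shape of such a federated query can be sketched by generating one SERVICE block per endpoint. The helper name federated_query and the endpoint URLs in the usage line are hypothetical; substitute the Special:SparqlExtension URLs of the wikis you want to combine.

```python
from typing import Iterable


def federated_query(endpoints: Iterable[str], pattern: str = "?x ?y ?z") -> str:
    """Build a SELECT query with one SERVICE block per endpoint URL."""
    services = "\n".join(
        f"  service <{url}> {{ {pattern} }}" for url in endpoints
    )
    return f"select * where {{\n{services}\n}}"


# Hypothetical endpoint URLs, for illustration only.
print(federated_query([
    "http://wiki-a.example/wiki/Special:SparqlExtension",
    "http://wiki-b.example/wiki/Special:SparqlExtension",
]))
```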

The special page also implements Google Visualization DataSource (select output format GOOGLE-VIZ) and allows creating charts in the wiki as well as externally.

Prerequisites

 * Requires Extension:Semantic_MediaWiki
 * Requires Joseki
 * Requires php5-xsl

Installation
 * Install SMW and Joseki first.
 * Configure Joseki to accept named graphs and "SELECT FROM" statements (see example joseki config here)
 * New: Install php5-xsl if you do not already have it.
 * Optional: Joseki appears to allow unrestricted access to the SPARQL update service, which lets strangers write to your store or even delete data. See User:Alfredas/JosekiSecurity for a security "patch".
 * Download the zip and extract it into your_mediawiki_path/extensions/. Alternatively, you can check out the latest version: svn co https://svn.eeni.tbm.tudelft.nl/SparqlExtension/branches/0.7/ SparqlExtension
 * Add the following to the end of your LocalSettings.php and change MYHOST to the name of your server.
 * Note: You may want to set the update_url to localhost for security reasons so that remote updates of the data are disabled.

$smwgNamespace = 'http://MYHOST/wiki/';
require_once("$IP/extensions/SparqlExtension/SparqlExtension.php");
$smwgDefaultStore = "JosekiStore";
$sparqlEndpointConfiguration = array(
    "service_url" => "http://MYHOST/joseki/sparql", // wherever the endpoint is -- could be http://dbpedia.org/sparql
    "update_url" => "http://MYHOST/joseki/update/service", // wherever the endpoint is
    // change these parameters only if you are going to use the extension with a non-standard endpoint
    "query_parameter" => "query", // the query parameter used by the endpoint - usually "query"
    "output_type_parameter" => "output", // the output type parameter used by the endpoint - usually "output"
    "default_type" => "csv" // the default type of output from an endpoint (xml, csv and json supported)
);
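To show how the endpoint parameters above plausibly fit together, the following Python sketch builds a query URL from the same keys. It assumes the query is sent as a GET request with URL-encoded parameters; the helper name build_request_url is hypothetical, and the extension's own request code may differ.

```python
from urllib.parse import urlencode

# Mirrors the keys of $sparqlEndpointConfiguration; MYHOST is the placeholder from the text.
config = {
    "service_url": "http://MYHOST/joseki/sparql",
    "query_parameter": "query",
    "output_type_parameter": "output",
    "default_type": "csv",
}


def build_request_url(cfg: dict, sparql: str) -> str:
    """Combine the service URL with the query and output-type parameters."""
    params = {
        cfg["query_parameter"]: sparql,
        cfg["output_type_parameter"]: cfg["default_type"],
    }
    return cfg["service_url"] + "?" + urlencode(params)


print(build_request_url(config, "select * where { ?s ?p ?o } limit 5"))
```

An endpoint that names its parameters differently would only need different values for query_parameter and output_type_parameter, which is why those two settings exist.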

You can verify the service and update URLs for Joseki either through a web browser or wget. You should get the error messages below, which indicate that you found the correct URL.
 * http://MYHOST/joseki/sparql - "No query string"
 * http://localhost/joseki/update/service - "SPARQL:/Update request received via GET - must use POST"

Proxy configuration
If your server is behind a proxy, the following settings configure the proxy used for fetching data from external endpoints:
$sparqlProxyIP = "xxx.xxx.xxx.xxx";
$sparqlProxyPort = 3128;


 * You can import all your current SMW data using a utility script: your_mediawiki_path/extensions/SparqlExtension/importPagesIntoEndpoint.php

Version

 * 0.7 - Sept. 10, 2010 - Major rewrite of code. Added Special:SparqlExtension and 9 new output formats, mostly google charts.
 * 0.6.1 - Sept. 3, 2010 - Added support for Semantic Internal Objects
 * 0.6 - June 9, 2010 - Rewrite to fix security issues.
 * 0.1 - 0.5 - General bug fixes

Known issues & query debugging advice
 * Error: magic word "twinkle" not found - See here for a solution. This seems to happen when Halo is installed as well.
 * N.B. Even though TDB only allows a multiple-reader, single-writer (MRSW) policy, Joseki allows simultaneous writes to TDB (or so it seems). Consider this a Joseki bug that will hopefully be fixed in the near future.
 * The Maps Extension may cause problems if you use it to generate coordinates for property values, specifically if it renders coordinates using the decimal-minute-second notation. To fix this, you should add the following line to LocalSettings.php, after the inclusion of Maps:
 * If you still have problems, make sure that the latest version of Validator is installed and called before Maps in LocalSettings.php.
 * Federated queries where aggregation is performed on results from remote endpoints may not work as expected, depending on the endpoint software you are using. The issue is that the aggregation syntax may not be pushed to the remote endpoint, meaning that any aggregation is performed at the local endpoint over a possibly truncated set of results. A clear sign of this would be if the returned count is a round number such as 1000. Another easy way to check for this is to compare a count of the results available via your endpoint with a count obtained directly from the remote endpoint (e.g. through a web interface such as the one at http://dbpedia.org/sparql). See this thread for a discussion of the issue on the jena-dev list.
 * Very long SPARQL queries used with Google Visualizations may run into problems due to URL length limitations of certain browsers (such as Internet Explorer with a 2083 character limit). If this happens, you will likely see an error of: google.visualization.Query: [object Error]
 * Queries performed on wikis where anonymous viewing is disabled may run into problems. This is because queries are routed through the Special:SparqlExtension page in order to facilitate output formats such as those used for the Google Visualizations. The code being executed is seen as an anonymous viewer by the wiki, and is therefore blocked.
 * Extension:GraphViz can only display a single graph per page. If you try to display more than one, you will see duplicates of one of the graphs. This can be fixed using similar code to what we have here for Google Visualizations, where charts are given unique identifiers so that multiple charts can be displayed per page.
 * The Google Visualizations expect numbers of the type xsd:double. You may run into problems if the numbers are of a different type such as xsd:long. This needs to be fixed in the extension code, but until then a workaround is shown below where a variable is explicitly cast to an xsd:double:
   PREFIX xsd:  SELECT (xsd:double(?GDPperCapita) as ?num) WHERE {
 * For Google Visualizations such as column charts, it seems that labels in a numerical form (e.g. years) need to be explicitly cast as a string, otherwise they will be interpreted as data for plotting. See here for a working example, or below for the code that works around this:
   select (xsd:string(fn:substring(str(?onlineDate), 1, 4)) as ?onlineYear) (fn:round(sum(?capacity)) as ?totalCapacity) where {
 * Literal values of 0 may be exported to the triplestore as blank nodes (if this occurs you will see values of "_1", "_2", "_3", etc). See here for more description along with a simple fix for the Semantic MediaWiki code causing this.
 * Both pages and their redirect links will show up in the queries. This may lead to issues such as double counting. An example of how to exclude redirects is shown below:
   select ?x where { ?x prop:Likes a:Dogs. OPTIONAL{?x  ?z }. filter(!bound(?z)). }
 * Filter statements in SPARQL that use the pipe symbol for logical OR (the || operator) will not work with the MediaWiki parser, which treats the pipe as an argument separator. To fix this, you need to create a template, such as Template:!, which contains the pipe character, and then include this template within your queries.
 * Will not work: filter(?x = 3 || ?x = 4)
 * Works: filter(?x = 3 {{!}}{{!}} ?x = 4)
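Two of the issues in this list, the pipe symbol and the browser URL length limit, can be checked before a query is pasted into a wiki page. The sketch below is a hypothetical helper, not part of the extension: the function names, the "query" parameter name, and the assumption that the query travels as a URL-encoded GET parameter are all illustrative.

```python
from urllib.parse import urlencode

IE_URL_LIMIT = 2083  # Internet Explorer limit quoted in the list above


def escape_pipes(sparql: str) -> str:
    """Replace each '|' with a call to Template:! so the MediaWiki parser keeps it."""
    return sparql.replace("|", "{{!}}")


def exceeds_url_limit(base_url: str, sparql: str, limit: int = IE_URL_LIMIT) -> bool:
    """Return True if a GET URL carrying the query would be longer than the limit."""
    url = base_url + "?" + urlencode({"query": sparql})
    return len(url) > limit


query = "select ?x where { ?x ?y ?z . filter(?z = 3 || ?z = 4) }"
print(escape_pipes(query))
print(exceeds_url_limit("http://MYHOST/wiki/Special:SparqlExtension", query))
```

Note that URL encoding roughly triples the length of punctuation-heavy queries, so a query well under the limit as typed can still exceed it once encoded.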