Extension:SparqlExtension

From MediaWiki.org
SparqlExtension

Release status: beta
Implementation: Hook, Parser function
Description: Connects a semantic mediawiki to a SPARQL endpoint via Joseki web service.
Author(s): Alfredas Chmieliauskas, Chris Davis (alfredas talk)
Latest version: 0.7 (2010-09-10, for SMW ≤ 1.5.6)
License: GPL
Download: https://svn.eeni.tbm.tudelft.nl/SparqlExtension/branches/0.7/
Hooks used: ParserFirstCallInit, LanguageGetMagic

Note: Semantic MediaWiki (SMW) 1.6 provides some of the functionality of this extension, specifically synchronizing RDF data from Semantic MediaWiki with a SPARQL endpoint. However, it does not (yet) implement the other functions of SparqlExtension: SMW 1.6 currently cannot query external SPARQL endpoints, embed SPARQL queries on wiki pages, or visualize their results. See here for a more detailed discussion.

This extension allows one to integrate Semantic MediaWiki with a SPARQL endpoint via a web-service. While this has been set up and tested to work with Joseki and Jena TDB, the configuration is generic enough to make it work with any other standard-compliant SPARQL endpoint. We are especially interested in hearing your experiences using the extension with other SPARQL endpoints.

The Semantic MediaWiki Conference in September 2010 marked the start of an initiative to incorporate some of this extension's functionality into the core of Semantic MediaWiki. So while we still welcome feedback, we highly encourage you to also participate in the SMW community discussion and work on better integrating RDF, SPARQL, and SPARUL support into SMW Core.

This extension is an open-source alternative to the triplestore connector in the Extension:Halo Extension, and since the initial release of this extension, several other open-source triplestore connectors have been developed as shown in the table below.

Comparison of Semantic MediaWiki triplestore connectors

There are already a few extensions that can connect SMW to an RDF triplestore. The connectors differ significantly in functionality and scalability, and the goals and philosophies of their creators also differ. Below is a basic comparison table; you are strongly encouraged to consult the respective extension pages to select the extension that suits your needs.

Extension/Connector | Architecture | Underlying Store | Open Source
Halo | Java | Jena (free), [1] (commercial) | No
SparqlExtension | Java/Web services | Jena | Yes
RDFIO | PHP | ARC2 (PHP) | Yes
LinkedWiki | C++/PHP | 4store + arc2 with hacking sparql 1.1 | Yes

Features | Halo | SparqlExtension | RDFIO | LinkedWiki
Embed query results in a wiki page | Yes | Yes | No | Yes
Multiple query output formats | Yes | Yes | No | Yes
Query external endpoints | No | Yes | No | Yes
Expose wiki data via endpoint | Yes | Yes | Yes | Yes
Supports multiple endpoint implementations | Yes | Yes | No | Yes
Import triples (Update facts in wiki articles) | No | No | Yes | No

Basic Use

  • Querying the properties of the current page:
{{#sparql:
select * where{
article:{{sparqlencode:{{PAGENAME}}}} ?p ?o .
} 
}}

Here, the sparqlencode function encodes the current page name (the {{PAGENAME}} variable) into a SPARQL-suitable format. It is equivalent to {{anchorencode}} except that it leaves forward slashes unencoded, which is required for querying subpages.

{{#sparql:
select * where{
<{{sparqlencode:{{PAGENAME}}}}> ?p ?o .
} 
}}

Here, instead of using prefixes, we are forced to use the BASE keyword because the local part of a prefixed name cannot contain forward slashes (or other illegal characters). The BASE keyword lets the IRI be written in relative form, so the following are equivalent:

<http://host/wiki/France>
article:France
<France>
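
The equivalence of the three forms can be checked with a small Python sketch. It is only an illustration of how prefix expansion and relative-IRI resolution work in general, not the extension's own code; the host name, the BASE value, and the helper names expand_prefixed and resolve_relative are assumptions made for this example.

```python
from urllib.parse import urljoin

# Assumed value for illustration; on a real wiki this corresponds to $smwgNamespace.
BASE = "http://host/wiki/"
PREFIXES = {"article": BASE}


def expand_prefixed(name: str) -> str:
    """Expand a prefixed name such as 'article:France' into a full IRI."""
    prefix, local = name.split(":", 1)
    return PREFIXES[prefix] + local


def resolve_relative(iri_ref: str) -> str:
    """Resolve a relative IRI reference such as '<France>' against BASE."""
    return urljoin(BASE, iri_ref.strip("<>"))


full = "http://host/wiki/France"
print(expand_prefixed("article:France") == full)
print(resolve_relative("<France>") == full)
# A subpage name keeps its slash when written as a relative IRI.
print(resolve_relative("<Parent/Child>"))
```

The last call shows why the relative form is useful for subpages: the slash is simply part of the path that is resolved against BASE.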

SparqlExtension takes care of setting the correct prefixes and the BASE keyword (equivalent to $smwgNamespace). The following prefixes are defined by default (shown here for $smwgNamespace='http://YOURHOST/wiki/';):

BASE <http://YOURHOST/wiki/>
PREFIX article: <http://YOURHOST/wiki/>
PREFIX a: <http://YOURHOST/wiki/>
PREFIX category: <http://YOURHOST/wiki/Category:>
PREFIX cat: <http://YOURHOST/wiki/Category:>
PREFIX property: <http://YOURHOST/wiki/Property:>
PREFIX prop: <http://YOURHOST/wiki/Property:>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

You can try the following queries on the Special:SparqlExtension page.

For example, the following query would return all pages that belong to the Cat category:

{{#sparql:
select ?page where {
?page rdf:type cat:Cat 
}
}}

...the following query would return all pages that have an EnjoysMice property:

{{#sparql:
select ?page where {
?page prop:EnjoysMice ?o 
}
}}

...the following query would return all properties of the page Cat:

{{#sparql:
select ?property ?value where {
a:Cat ?property ?value
}
}}

...which is equivalent to:

{{#sparql:
select ?property ?value where {
<Cat> ?property ?value
}
}}

Features

  • Synchronizes semantic data with the triple store and exposes it via a Joseki endpoint.
  • Embeds SPARQL query output into a wiki page using the #sparql parser function.
{{#sparql:
select * where { ?x ?y ?z } limit 10
}}
  • Multiple output formats: table, maps, graph, template, charts

Template

The following query:

{{#sparql: 
select * where {
   ?page rdf:type ?category .
   ?category rdfs:label ?label .
} limit 10
| format=template
| template=CategoryTemplate
| link=none
}}

...would feed the following template:

<includeonly>
* Article [[{{{page}}}]] belongs to a category [[:{{{category}}}]] called {{{label}}}
</includeonly>

Table with formatting

{{#sparql: 
select * where {
   ?page rdf:type ?category .
   ?category rdfs:label ?label .
} limit 10
| tablestyle=border-width:1px; border-spacing:0px; border-style:outset; border-color:black; border-collapse:collapse;
| rowstyle=padding:2px;
| oddrowstyle=background-color:Lavender
| evenrowstyle=background-color:white   
| headerstyle=background-color:CornflowerBlue; color: white
}}

Maps querying DBpedia (requires Extension:Maps)

The maps output format requires the query to have a "point" variable of type http://www.georss.org/georss/point (or a string with lat and lon separated by a space or comma). If the query also has a variable named "title", its value is shown when you click a marker on the map.

{{#sparql:
PREFIX prop: <http://dbpedia.org/property/>
PREFIX dbpedia: <http://dbpedia.org/>
PREFIX template: <http://dbpedia.org/resource/Template:>
SELECT ?title ?point WHERE {
?station prop:wikiPageUsesTemplate template:infobox_ns-station .
?station <http://www.georss.org/georss/point> ?point .
?station prop:station ?title .
}
|format=maps
}}
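
To make the expected shape of the "point" value concrete, here is a minimal Python sketch that splits such a string into latitude and longitude. It is not taken from the extension; the function name parse_point and the error handling are assumptions for illustration only.

```python
import re
from typing import Tuple


def parse_point(value: str) -> Tuple[float, float]:
    """Split a 'lat lon' or 'lat,lon' string into a (lat, lon) pair of floats."""
    # Accept a comma, whitespace, or both as the separator.
    parts = [p for p in re.split(r"[,\s]+", value.strip()) if p]
    if len(parts) != 2:
        raise ValueError(f"expected two coordinates, got {value!r}")
    lat, lon = (float(p) for p in parts)
    return lat, lon


print(parse_point("52.0 4.36"))
print(parse_point("52.0,4.36"))
```

A value that does not contain exactly two numbers raises ValueError, which is a convenient way to spot query results that would not plot.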

Graph (requires Extension:GraphViz)

{{#sparql: 
select * where {
   ?page rdf:type ?category .
   ?category rdfs:label ?label .
} limit 10
| format=graph
}}

Inline

Best used to return a single value. If multiple values are returned, they are separated by commas.

{{#sparql:
select ?plant where {
?plant rdf:type cat:Powerplant .
?plant prop:Energyoutput ?out .
?plant prop:Country a:China .
} order by desc(?out) limit 1
| format=inline
}}
is the biggest power plant in China.

...would produce:

Three Gorges Powerplant is the biggest power plant in China. 

Charts

Explore the chart gallery at http://enipedia.tudelft.nl/wiki/User:Alfredas/Charts

Currently the extension supports the following charts:

Configuring and styling Google charts

The extension lets you configure Google Charts using the same parameters that are specified in the Chart API. Specify the parameters in the MediaWiki fashion, as pipe-separated name=value arguments:

{{#sparql:
select { ... }
| format=areachart
| title=Some Title
| width=900
| height=600
| vAxis={title:'MWh'}
}}
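
When many charts share the same styling, the wikitext can be generated rather than typed by hand. The Python sketch below only assembles text in the layout shown above; the helper name build_sparql_call is hypothetical, and it makes no claim about which parameters the extension accepts.

```python
def build_sparql_call(query: str, **params: object) -> str:
    """Assemble a {{#sparql: ...}} call with one '| name=value' line per parameter."""
    lines = ["{{#sparql:", query.strip()]
    for key, value in params.items():
        lines.append(f"| {key}={value}")
    lines.append("}}")
    return "\n".join(lines)


print(
    build_sparql_call(
        "select * where { ?x ?y ?z } limit 10",
        format="areachart",
        title="Some Title",
        width=900,
        height=600,
        vAxis="{title:'MWh'}",
    )
)
```

Parameters are emitted in the order they are passed, so the generated text mirrors the hand-written form.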

Google Visualization DataSource and embedding charts

The charts are based on the Google Visualization API, and SparqlExtension wraps the endpoint to implement a DataSource.

This means that charts produced by SparqlExtension can be used locally on the wiki and can also be embedded into other websites, like here. You can also use JavaScript libraries such as Spark.

Special:SparqlExtension

The extension now has a special page, Special:SparqlExtension, that features a form and simulates the behavior of an endpoint. This makes it easy to use the wiki in a federated query, in the following fashion:

select * where {
service <http://mediawiki1/wiki/Special:SparqlExtension> {
?x ?y ?z
}
service <http://mediawiki2/wiki/Special:SparqlExtension> {
?x ?y ?z
}
service <http://mediawiki3/wiki/Special:SparqlExtension> {
?x ?y ?z
}
}
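
For a longer list of wikis, a query of this shape can be generated. The following Python sketch simply reproduces the structure shown above with one service block per endpoint; the function name federated_query and the example URLs are assumptions for illustration.

```python
from typing import Iterable


def federated_query(endpoints: Iterable[str], pattern: str = "?x ?y ?z") -> str:
    """Build a select query with one service block per endpoint URL."""
    blocks = [f"service <{url}> {{\n{pattern}\n}}" for url in endpoints]
    return "select * where {\n" + "\n".join(blocks) + "\n}"


wikis = [
    "http://mediawiki1/wiki/Special:SparqlExtension",
    "http://mediawiki2/wiki/Special:SparqlExtension",
]
print(federated_query(wikis))
```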

See here for a known issue involving aggregates and federated queries.

The special page also implements Google Visualization DataSource (select output format GOOGLE-VIZ) and allows creating charts in the wiki as well as externally.

Prerequisites

Installation

  • Install SMW and Joseki first.
  • Configure Joseki to accept named graphs and "SELECT FROM" statements (see example joseki config here)
  • New: Install php5-xsl if you do not already have it.
  • Optional: Joseki appears to allow unrestricted access to the SPARQL update service, which lets strangers write to your store or even delete data. See User:Alfredas/JosekiSecurity for a security "patch".
  • Download zip and extract it into your_mediawiki_path/extensions/

Alternatively you can check out the latest version:

svn co https://svn.eeni.tbm.tudelft.nl/SparqlExtension/branches/0.7/ SparqlExtension
  • Add the following to the end of your LocalSettings.php and change the MYHOST to the name of your server.
    • Note: You may want to set the update_url to localhost for security reasons so that remote updates of the data are disabled.
$smwgNamespace = 'http://MYHOST/wiki/';
require_once("$IP/extensions/SparqlExtension/SparqlExtension.php");
$smwgDefaultStore = "JosekiStore";
$sparqlEndpointConfiguration = array(
        "service_url" => "http://MYHOST/joseki/sparql", // wherever the endpoint is -- could be http://dbpedia.org/sparql
        "update_url" => "http://MYHOST/joseki/update/service", // wherever the endpoint is
         // change these parameters only if you are going to use the extension with a non-standard endpoint 
        "query_parameter" => "query", // the query parameter used by the endpoint - usually "query"
        "output_type_parameter" => "output", // the output type parameter used by the endpoint - usually "output"
        "default_type" => "csv" // the default type of output from an endpoint (xml, csv and json supported)
);
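
As a rough illustration of what the query_parameter, output_type_parameter, and default_type settings are for, the Python sketch below builds a GET request URL from the same keys. How the extension itself assembles its requests is not shown in this page, so treat this as an assumption; the function name build_request_url is hypothetical.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# Mirrors the keys of $sparqlEndpointConfiguration; MYHOST is a placeholder.
config = {
    "service_url": "http://MYHOST/joseki/sparql",
    "query_parameter": "query",
    "output_type_parameter": "output",
    "default_type": "csv",
}


def build_request_url(cfg: Dict[str, str], sparql: str, output: Optional[str] = None) -> str:
    """Build a GET URL that passes the query and the output type to the endpoint."""
    params = {
        cfg["query_parameter"]: sparql,
        cfg["output_type_parameter"]: output or cfg["default_type"],
    }
    # urlencode percent-encodes characters such as '?', '{' and '}' in the query text.
    return cfg["service_url"] + "?" + urlencode(params)


print(build_request_url(config, "select * where { ?x ?y ?z } limit 10"))
```

An endpoint that names these parameters differently would only need different values for query_parameter and output_type_parameter.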

You can verify the service and update URLs for Joseki either through a web browser or wget. You should get the error messages below, which indicate that you found the correct URL.

Proxy configuration

If your server is behind a proxy, the following settings configure the proxy used for fetching data from external endpoints:

$sparqlProxyIP = "xxx.xxx.xxx.xxx";
$sparqlProxyPort = 3128;
  • You can import all your current SMW data using a utility script: your_mediawiki_path/extensions/SparqlExtension/importPagesIntoEndpoint.php

Version

  • 0.7 - Sept. 10, 2010 - Major rewrite of code. Added Special:SparqlExtension and 9 new output formats, mostly google charts.
  • 0.6.1 - Sept. 3, 2010 - Added support for Semantic Internal Objects
  • 0.6 - June 9, 2010 - Rewrite to fix security issues.
  • 0.1 - 0.5 - General bug fixes

Known issues & query debugging advice

  • Be careful with SPARQL queries that reference wiki pages with redirects from other pages (i.e. pages that have owl:sameAs links). See here for a discussion of what can go wrong (plus possible solutions) when you have queries that make lists or aggregate data.
  • Error: magic word "twinkle" not found - See here for a solution. This seems to happen when Halo is installed as well.
  • N.B. Even though TDB supports only a multiple-reader, single-writer (MRSW) policy, Joseki appears to allow simultaneous writes to TDB. Consider this a Joseki bug that will hopefully be fixed in the near future.
  • The Maps Extension may cause problems if you use it to generate coordinates for property values, specifically if it renders coordinates using the decimal-minute-second notation. To fix this, you should add the following line to LocalSettings.php, after the inclusion of Maps:
    • $egMapsCoordinateNotation = Maps_COORDS_FLOAT;
      
    • If you still have problems, make sure that the latest version of Validator is installed and called before Maps in LocalSettings.php.
  • Federated queries where aggregation is performed on results from remote endpoints may not work as expected, depending on the endpoint software you are using. The issue is that the aggregation syntax may not be pushed to the remote endpoint, meaning that any aggregation is performed at the local endpoint over a possibly truncated set of results. A clear sign of this is a returned count that is a round number such as 1000. Another easy check is to compare the count of results available via your endpoint with the count obtained directly from the remote endpoint (e.g. through a web interface such as the one at http://dbpedia.org/sparql). See this thread for a discussion of the issue on the jena-dev list.
  • Very long SPARQL queries used with Google Visualizations may run into problems due to URL length limitations of certain browsers (such as Internet Explorer with a 2083 character limit). If this happens, you will likely see an error of:
    google.visualization.Query: [object Error]
    
  • Queries performed on wikis where anonymous viewing is disabled may run into problems. This is because queries are routed through the Special:SparqlExtension page in order to facilitate output formats such as those used for the Google Visualizations. The code being executed is seen as an anonymous viewer by the wiki, and is therefore blocked.
  • Extension:GraphViz can only display a single graph per page. If you try to display more than one, you will see duplicates of one of the graphs. This can be fixed using similar code to what we have here for Google Visualizations, where charts are given unique identifiers so that multiple charts can be displayed per page.
  • The Google Visualizations expect numbers of the type xsd:double. You may run into problems if the numbers are of a different type, such as xsd:long. This needs to be fixed in the extension code, but until then a workaround is shown below, where a variable is explicitly cast to an xsd:double:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT (xsd:double(?GDPperCapita) as ?num) WHERE {
  • For Google Visualizations such as column charts, it seems that labels in a numerical form (i.e. years, etc.) need to be explicitly cast as a string, otherwise they will be interpreted as data for plotting. See here for a working example, or below for the code that works around this.
select (xsd:string(fn:substring(str(?onlineDate), 1, 4)) as ?onlineYear) (fn:round(sum(?capacity)) as ?totalCapacity)  where {
  • Literal values of 0 may be exported to the triplestore as blank nodes (if this occurs you will see values of "_1", "_2", "_3", etc.). See here for more description along with a simple fix for the Semantic MediaWiki code causing this.
  • Both pages and their redirect links will show up in the queries. This may lead to issues like double counting among other problems. An example of how to exclude redirects is shown below:
select ?x where {
?x prop:Likes a:Dogs . 
OPTIONAL{?x <http://www.w3.org/2002/07/owl#sameAs> ?z }.
filter(!bound(?z)) . 
} 
  • Filter statements in SPARQL that use the double pipe symbol (||) for logical OR will not work with the MediaWiki parser, which treats the pipe as an argument separator. To fix this, create a template such as Template:! that contains the pipe character, and include this template within your queries in place of each pipe.
    • Will not work:
      filter(?x = 3 || ?x = 4)
      
    • Works:
      filter(?x = 3 {{!}}{{!}} ?x = 4)
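
If queries are written or tested outside the wiki first, the pipe substitution can be applied mechanically before pasting them into a page. This Python sketch assumes that Template:! exists on the wiki and contains a single pipe character; the function name escape_pipes is hypothetical. Apply it to the query text only, not to the "| format=..." arguments that follow the query.

```python
def escape_pipes(query: str) -> str:
    """Replace every pipe in a SPARQL query with the {{!}} template call."""
    return query.replace("|", "{{!}}")


print(escape_pipes("filter(?x = 3 || ?x = 4)"))
```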