Extension:SemanticQueryRDFS++

Note (Jan 17th, 2011): This extension was recently renamed. Its source code has not yet been updated to reflect this change.

What This Extension Does
The SemanticQueryRDFS++ extension builds on the Semantic MediaWiki (SMW) extension. It extends the modeling language (SMW-ML) and the query language (SMW-QL) of SMW with:
 * negation and cardinality in queries
 * inverse property in modeling
 * transitive, functional, inverse functional, symmetric properties in modeling
 * domain and range inference for properties in modeling

As the name suggests, the modeling expressivity supported by the extension is roughly the subset of OWL known as "RDFS++", "RDFS 3.0" or "OWL Prime".

It is based on theoretical work described in:

''Jie Bao, Li Ding, James A. Hendler. Knowledge Representation and Query in Semantic MediaWiki: A Formal Study, In Tetherless World Constellation (RPI) Technical Report, pp. TW-2008-42, 2008 http://tw.rpi.edu/wiki/TW-2008-42''

How It Works
The extension translates both the SMW semantic markup ("the modeling language") and the query language into logic programs (LP), and uses an LP solver as the reasoner. This implementation uses DLV as the reasoner, but other LP solvers may be used as well.
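
As a rough illustration of the general idea (written in Python rather than DLV, and not the extension's actual encoding), the sketch below treats wiki annotations as facts and applies a single rule, "an instance of a subcategory is also an instance of its supercategory", until no new facts appear. The page and category names and the `saturate` helper are assumptions made for this illustration only.

```python
# Hypothetical illustration: annotations as facts, one rule applied to a fixpoint.
# Rule (Datalog style): instance(X, D) :- instance(X, C), subclass(C, D).

def saturate(instance, subclass):
    """Return all (page, category) facts derivable with the subclass rule."""
    derived = set(instance)
    changed = True
    while changed:
        changed = False
        for page, cat in list(derived):
            for sub, sup in subclass:
                if sub == cat and (page, sup) not in derived:
                    derived.add((page, sup))
                    changed = True
    return derived

# Facts a wiki might contain (made-up names): page Alice is in category
# Student; Student is a subcategory of Person; Person of Agent.
instance_facts = {("Alice", "Student")}
subclass_facts = {("Student", "Person"), ("Person", "Agent")}

closure = saturate(instance_facts, subclass_facts)
print(sorted(closure))
```

A real LP solver computes such a model far more efficiently; the loop above only shows what "reasoning over the translated program" amounts to for one rule.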

It has two working modes:
 * file-based mode: first, the administrator needs to build a snapshot (dump) of the wiki semantic data using the command "php QLPlus_dump.php". In this mode, real-time changes are not reflected in queries.
 * database-based mode: the wiki semantic data may also be dumped to a shadow database via ODBC. Real-time changes to instance data are picked up, but ontological changes (i.e., changes to categories and properties, or the introduction of a new category or property) are not.

Scalability
Tested on a machine with the following configuration: 2 x Xeon 5365 Quad 3.0GHz 1333MHz / 16G / 2 x 1TB

A partial dump of DBLP data was used for testing, with about 10k pages and 100k triples.

In the file-based mode, most queries are answered in under 1.5s.

In the database-based mode, most queries are answered in under 2.5s. In general, the database-based mode takes about 50% more time than the file-based mode.

The execution time is nearly linear in the size of the wiki, and almost the same for most queries (due to the nature of the model-building strategies used in LP solvers).

Caching
If caching is turned on ($wgQLPlus_UseCache = true), a query is executed only on the first visit; subsequent page visits load the result from a cache. The cache may be refreshed with the "refresh" (purge) action.
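
The behaviour described above can be pictured with a small Python sketch. It is only an analogy: the `QueryCache` class, its method names, and the use of the query text as the cache key are assumptions, not the extension's implementation.

```python
# Hypothetical sketch of query-result caching with an explicit purge.

class QueryCache:
    def __init__(self, run_query):
        self._run_query = run_query   # function that actually evaluates a query
        self._results = {}            # query text -> cached result
        self.executions = 0           # how often the reasoner was invoked

    def get(self, query):
        if query not in self._results:
            self.executions += 1
            self._results[query] = self._run_query(query)
        return self._results[query]

    def purge(self, query):
        # Comparable in spirit to the wiki "refresh" (purge) action.
        self._results.pop(query, None)

cache = QueryCache(lambda q: "result of " + q)
cache.get("q1")    # first visit: the query is executed
cache.get("q1")    # later visit: served from the cache
cache.purge("q1")  # refresh
cache.get("q1")    # executed again after the purge
print(cache.executions)
```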

Extended SMW Syntax
Use the "askplus" hook function: e.g.

The syntax is an extension of SMW-QL, so you can use most SMW-QL features here (some limitations apply).

Features inherited from SMW
From SMW-ML
 * category instantiation e.g., [[Category:C]]
 * property instantiation e.g., P::v
 * subclass, e.g., [[Category:C]] (on a category page)
 * subproperty, e.g., Subproperty of::Property:P (on a property page)

From SMW-QL (in "ask" hook function)
 * conjunction: e.g.,
 * disjunction: e.g.,, B or |w
 * property chain: e.g., P.Q::v
 * property wildcard: e.g., P::+
 * subquery: e.g., [[P::
 * inverse property e.g., -P::v
 * value comparison, e.g., P::>3, P::<7, P::!5

Negation
Negation as failure (NAF) can be expressed by placing '<>' before a category or property name in a selection condition. Examples:

Instances of D which are not instances of C.

Instances of C that have no attribute value of P

It is best not to use a negated query condition on its own, because it may lead to a VERY large result set.
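
The meaning of the two negated queries described above can be sketched in Python. This is only a reading of negation as failure over made-up data (the pages `a`, `b`, `c` and the dictionaries are assumptions), not the query syntax or the extension's code: a page satisfies a negated condition when the positive condition cannot be shown from the known facts.

```python
# Hypothetical data: category membership and values of property P per page.
categories = {
    "a": {"C", "D"},
    "b": {"D"},
    "c": {"C"},
}
values_of_P = {"a": {"v1"}}   # only page a has a value for P

def instances_of(cat):
    return {page for page, cats in categories.items() if cat in cats}

# Instances of D which are not instances of C.
d_not_c = {p for p in instances_of("D") if p not in instances_of("C")}

# Instances of C that have no attribute value of P.
c_without_p = {p for p in instances_of("C") if not values_of_P.get(p)}

print(sorted(d_not_c), sorted(c_without_p))
```

In both cases the negated condition is combined with a positive one, which keeps the candidate set small.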

Cardinality
Non-qualified cardinality queries:

Find instances with at least 3 attribute values of P

Qualified cardinality queries:

Find instances with fewer than 3 attribute values of P which are instances of D
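
A Python sketch of what the two kinds of cardinality condition count, over made-up data. It assumes that the qualifier ("instances of D") restricts which values of P are counted, as in qualified cardinality in OWL, and it leaves out pages with no value of P at all; neither point is confirmed by the text above.

```python
# Hypothetical data: values of property P per page, and the members of D.
values_of_P = {
    "x": {"v1", "v2", "v3"},
    "y": {"v1"},
    "z": {"v1", "v2"},
}
instances_of_D = {"v1", "v2", "v3"}

def at_least(n):
    """Pages with at least n distinct values of P (non-qualified)."""
    return {p for p, vals in values_of_P.items() if len(vals) >= n}

def fewer_than_qualified(n, qualifier):
    """Pages with fewer than n values of P that belong to the qualifier set."""
    return {p for p, vals in values_of_P.items() if len(vals & qualifier) < n}

print(sorted(at_least(3)))
print(sorted(fewer_than_qualified(3, instances_of_D)))
```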

Domain and Range
On a property page, e.g. Property:P, adding

Domain:Category:C Range:Category:D means that for every annotation P::y on a page x, x is an instance of C and y is an instance of D.

The "Category:" prefix may be omitted. Thus, the following markup has the same effect.

Domain:C Range:D
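
The inference described above can be sketched in Python as follows. The triple representation and the `infer_domain_range` helper are assumptions for illustration, not the extension's internal format.

```python
def infer_domain_range(annotations, domain, range_):
    """annotations: set of (page, property, value) triples.
    domain / range_: property name -> category name."""
    inferred = set()
    for page, prop, value in annotations:
        if prop in domain:
            inferred.add((page, domain[prop]))   # subject belongs to the domain
        if prop in range_:
            inferred.add((value, range_[prop]))  # value belongs to the range
    return inferred

# P::y on page x, with domain C and range D declared on Property:P.
annotations = {("x", "P", "y")}
memberships = infer_domain_range(annotations, {"P": "C"}, {"P": "D"})
print(sorted(memberships))
```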

Property Axioms
On Property:P, one may declare

Type::Transitive Type::Symmetric Type::Functional Type::InverseFunctional Inverse of::Property:Q

Their meanings are similar to those of their counterparts in OWL. For properties of the Functional or InverseFunctional types, "SameAs" relations may be inferred. For instance, for a functional property P, with "P::v1" and "P::v2" on the same page, SameAs(v1,v2) is inferred (i.e., equivalent to adding SameAs::v2 to page v1).

Note: The "SameAs" relation has a weaker semantic meaning than that of owl:sameAs. Even if SameAs(v1,v2) is true, in counting (e.g., in cardinality queries), v1 and v2 are still counted as two different individuals.
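
A Python sketch of the functional-property case and of the counting behaviour noted above. The helper name and the data are assumptions; only the rule itself (two values of a functional property on one page yield a SameAs pair, yet both values are still counted) comes from the description.

```python
from itertools import combinations

def same_as_from_functional(annotations, functional):
    """Infer SameAs pairs from pages carrying several values of a functional property."""
    by_subject = {}
    for page, prop, value in annotations:
        if prop in functional:
            by_subject.setdefault((page, prop), set()).add(value)
    pairs = set()
    for values in by_subject.values():
        for a, b in combinations(sorted(values), 2):
            pairs.add((a, b))
    return pairs

# P::v1 and P::v2 on the same page x, with P declared Functional.
annotations = {("x", "P", "v1"), ("x", "P", "v2")}
same_as = same_as_from_functional(annotations, {"P"})

# Counting is not affected by SameAs: x still has two values of P.
distinct_values = {v for page, prop, v in annotations if page == "x" and prop == "P"}
print(same_as, len(distinct_values))
```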

While it is possible to add these markups to any property, to ensure correct inference it is better to add them only to properties of the "Page" datatype, i.e., the ones with Has type::Type:Page (this is the default type of a property).

Demo and Examples
The following example shows a query with a transitive property "part of":


The result is:

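
As a rough illustration of what declaring "part of" transitive contributes to a query, the Python sketch below computes the transitive closure of a few made-up "part of" facts. The page names and the `transitive_closure` helper are hypothetical; this is not the wiki markup or the extension's LP encoding.

```python
def transitive_closure(pairs):
    """Smallest transitive relation containing the given (a, b) pairs."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c} - closure
        if not new:
            return closure
        closure |= new

# Hypothetical annotations: Spoke is part of Wheel, Wheel is part of Car.
part_of = {("Spoke", "Wheel"), ("Wheel", "Car")}
closure = transitive_closure(part_of)

# "Everything that is part of Car" now includes Spoke as well as Wheel.
parts_of_car = sorted(a for a, b in closure if b == "Car")
print(parts_of_car)
```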

Model Integrity Constraint
The SMW-QL+ language may be used to state integrity constraints. For instance, if we require every person to have a name, the following query will find all instances violating this constraint:

Combining this with templates, it's easy to show integrity constraint warnings on a category page, or on individual pages that violate this constraint.

Download and Installation
1. Install from SVN: copy the source code from SVN and put it under "WIKI-PATH/extensions/SemanticQueryPlus".
 * or, if you have shell access, you can install using svn: svn co http://smwbp.googlecode.com/svn/trunk/mediawiki/extensions/SemanticQueryPlus/

OR

1. Install from a zip file: download the zip file and unzip it to the "WIKI-PATH/extensions/SemanticQueryPlus" folder.

2. Append the following to LocalSettings.php (near the bottom) of your MediaWiki installation:

3. Make sure executables under /bin have the right permissions (e.g., 755)

Configuration
(To be added) See QLPlus_Setting.php for details.

Database (ODBC) Mode
See DLV ODBC setup for setup.

Dumping Data
Run "php QLPlus_dump.php" in the extension's folder. By default, the dumped data is stored under the /dump folder. If the database-based mode is selected, some database-to-LP mapping information and the ontological data are still saved in that folder, not in the database.

It is advisable to re-dump the data periodically. For moderate-sized wikis (<100k triples) on a mainstream server configuration, the process should take less than 5 minutes.

Limitations
Not all printing features of the "ask" function are supported, e.g. counting, sorting or multi-page viewing. [To be extended]

Change Log
The latest SemanticQueryRDFS++ extension has been tested on MediaWiki version 1.16 and Semantic MediaWiki version 1.5.4. It does not work with prior versions of SMW.

History:
 * Dec 22, 2010: version 0.1, first release

Bug Report
If you find a bug (sure, there are plenty of them), please add it here

or email baojie@gmail.com