Extension:Cargo/Cargo and Semantic MediaWiki

Semantic MediaWiki (SMW) is an extension to MediaWiki that lets you store and query data. It has a large number of spinoff extensions (around 30 active ones) that make use of it and that together turn an SMW-based system into something resembling a full-fledged, easy-to-use data framework.

The Cargo extension was consciously designed to mimic the full system of SMW and many of its spinoff extensions, in its syntax options and overall interface. In a few cases, code itself has been copied over as well, though in a modified form. In all, Cargo provides some or all of the functionality of seven extensions from the SMW "family": Semantic MediaWiki, Semantic Result Formats, Semantic Maps, Maps, Semantic Drilldown, Semantic Compound Queries and Semantic Internal Objects.

Differences from SMW
If Cargo is essentially just a clone of SMW and some other extensions, why was it created in the first place? And why should anyone use it? Cargo does have a number of differences from SMW that give it some advantages.

Philosophically, Cargo differs from SMW in three main ways:
 * Cargo ties data storage directly to templates. In SMW, semantic values can be placed anywhere on the page, even though in practice they're usually confined to templates; but in Cargo, it is the template itself that is responsible for storing its data.
 * Cargo stores its data in as simple a fashion as possible, using standard database tables to hold tabular data, while SMW uses a database to represent "triples" of data.
 * Though this is a more minor difference, Cargo is less customizable than SMW and its spinoff extensions, opting instead to base display settings on the data itself.
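The storage difference in the second point can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table names, field names and sample rows here are all hypothetical, and neither layout is the actual schema used by Cargo or SMW; the sketch only contrasts a one-column-per-field table with a generic subject/property/value table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Tabular layout (Cargo-like, hypothetical): one table per template,
# one column per field.
cur.execute("CREATE TABLE Books (page TEXT, author TEXT, year INTEGER)")
cur.executemany("INSERT INTO Books VALUES (?, ?, ?)", [
    ("Dune", "Frank Herbert", 1965),
    ("Emma", "Jane Austen", 1815),
])

# Triple layout (greatly simplified, hypothetical): one row per
# (subject, property, value) statement.
cur.execute("CREATE TABLE triples (subject TEXT, property TEXT, value TEXT)")
cur.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("Dune", "author", "Frank Herbert"),
    ("Dune", "year", "1965"),
    ("Emma", "author", "Jane Austen"),
    ("Emma", "year", "1815"),
])

# Tabular: a single plain SELECT answers "author of books before 1900".
tabular_rows = cur.execute(
    "SELECT page, author FROM Books WHERE year < 1900"
).fetchall()

# Triples: the same question needs one self-join per extra property.
triple_rows = cur.execute(
    """
    SELECT a.subject, a.value
    FROM triples AS a
    JOIN triples AS y ON y.subject = a.subject
    WHERE a.property = 'author'
      AND y.property = 'year'
      AND CAST(y.value AS INTEGER) < 1900
    """
).fetchall()

print(tabular_rows)
print(triple_rows)
```

Both queries return the same row, but the tabular version maps directly onto a SQL "SELECT", while the triple version needs an additional join for every field involved.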

The first two differences in particular allow Cargo's code for both storage and querying to be much simpler than that of SMW. Cargo lets users make near-direct use of SQL "SELECT" statements, which means that a custom query language does not need to be defined or supported. It also means that Cargo's own code for displaying query results in various formats can be significantly simpler than the corresponding code in SMW, SRF etc. And it means that the setup and maintenance work for administrators can be simpler. Cargo, a single extension, can take the place of about 15 extensions: the seven extensions listed above, plus another seven or so "library" extensions required by Semantic MediaWiki, like DataValues. Setting up data structures in the wiki is easier, too, since there are no longer property pages that need to be created and maintained.

The usage of near-direct SQL also enables Cargo to do queries that are not easily possible in SMW. These include:
 * Displaying joined data, i.e. data from two different tables in one display; in SMW this could be represented by "?A.B", a syntax option that is not allowed.
 * Getting the set of pages that do not have a value, i.e. have a blank value, for some field.
 * Doing string operations within queries, like finding all rows that have a value for some field with exactly five characters.
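The three kinds of query listed above can be sketched in plain SQL, again using Python's built-in sqlite3 module. This is ordinary SQL run against SQLite, not #cargo_query syntax, and the "Authors" and "Books" tables, their fields and their rows are hypothetical examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables, standing in for two template-backed tables.
cur.execute("CREATE TABLE Authors (name TEXT, country TEXT)")
cur.execute("CREATE TABLE Books (title TEXT, author TEXT, genre TEXT)")
cur.executemany("INSERT INTO Authors VALUES (?, ?)", [
    ("Jane Austen", "England"),
    ("Frank Herbert", "United States"),
])
cur.executemany("INSERT INTO Books VALUES (?, ?, ?)", [
    ("Emma", "Jane Austen", "Novel"),
    ("Dune", "Frank Herbert", ""),       # blank genre
    ("Persuasion", "Jane Austen", None),  # missing genre
])

# 1. Joined data: fields from two tables in one result set.
joined = cur.execute(
    """
    SELECT Books.title, Authors.country
    FROM Books
    JOIN Authors ON Books.author = Authors.name
    ORDER BY Books.title
    """
).fetchall()

# 2. Rows that have no value (NULL or empty string) for a field.
blank = cur.execute(
    "SELECT title FROM Books WHERE genre IS NULL OR genre = '' ORDER BY title"
).fetchall()

# 3. A string operation: rows whose genre is exactly five characters long.
five_chars = cur.execute(
    "SELECT title FROM Books WHERE LENGTH(genre) = 5"
).fetchall()

print(joined)
print(blank)
print(five_chars)
```

Each case is a single "SELECT" statement, which is the point of the near-direct SQL approach: no extra query-language features are needed for joins, blank-value checks or string functions.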

Finally, the fact that the data structure is defined explicitly, instead of implicitly, within templates means that the interface has knowledge of the data schema that it can make use of. Cargo's Special:CargoTables page is one example of this: it automatically displays, for each table/template, the full set of fields and the values for each field.

There may be only one substantial feature of SMW that Cargo does not support: the ability to store data in an RDF-based triplestore, like Virtuoso. This is something that would be good for Cargo to support, although if Cargo does add this support, it will probably be in a Cargo-like way, keeping the "wrapper" around the underlying storage as minimal as possible. Instead of having #cargo_query be able to query the triplestore, for instance, there would probably be separate functionality, maybe a parser function called #cargo_sparql, that enables near-direct SPARQL querying.

Features checklist
What about a wiki that uses Cargo and not SMW, but wants a full set of functionality matching that available through the set of SMW-based extensions? The table below shows the main set of functionality that SMW-based sites tend to make use of, and how it is, or is not, available in a Cargo-based system.

Performance differences
Cargo uses a simple database structure, instead of Semantic MediaWiki's more complex, custom DB structure (assuming a triplestore is not used), so one might expect Cargo's querying to be at least somewhat faster than SMW's. One small-scale test comparing the two has been run; you can see the details at the page Performance testing. In this test, Cargo querying was around 50% faster than SMW querying.