Semantic Schemas

Semantic Classes is a proposed MediaWiki extension that would work alongside Semantic MediaWiki, Semantic Forms, Semantic Drilldown and possibly other extensions as well. It would allow for defining all the information about a "class", or data type - for naming, display, data input, and browsing - within a single piece of XML contained within a category page. This XML could then be used to generate template, property, form and filter pages. The XML could then also be edited via some kind of helper tool, preventing users from having to generate or modify it by hand. ''A more conservative name for this extension is semantic schemas - suitable if it abstracts existing SMW and/or OWL schemas (see below). A reason to prefer the Classes name is that it implies that proper XML and ultimately programming language class constructs can be supported.''

Semantic Classes would most likely define two special pages: 'Special:EditClass' and 'Special:GenerateClassPages'. 'Special:EditClass' would allow for creating or editing the XML via the helper form, while 'Special:GenerateClassPages' would essentially provide a button that would let the administrator create the template(s), semantic properties, form, filter(s), etc. automatically from the XML definition. 'Special:GenerateClassPages' would most likely also include checkboxes for each of the pages to be generated, to let administrators choose not to overwrite certain wiki pages that already exist. ''As XML pages are already directly editable by many other programs, these pages would correspond one-to-one to views on that class of object from those programs. Whether the MediaWiki extension can directly support the wide range of views, and restrictions on views, of XML data that these programs support is an open question. It would be helpful to encourage users of XML parsers and tools to integrate support for SMW into their tools, so that SMW can present a partial, restricted or controlled view of the various XML objects.''
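As a rough illustration of what page generation might produce, here is a minimal Python sketch that builds wikitext for a property page and a display template. The function names are hypothetical and the layout is only one possibility; the `[[Has type::...]]` declaration and the `[[Property::value]]` annotation follow standard Semantic MediaWiki syntax, but nothing here is taken from an actual implementation of the extension.

```python
def property_page_text(type_name):
    """Wikitext for a Property: page that declares its SMW type."""
    return f"This is a property of type [[Has type::{type_name}]]."


def template_page_text(category_name, fields):
    """Wikitext for a template that displays and annotates each field.

    fields is a list of (field_name, property_name) pairs, in display order.
    The table layout is an assumption made for this sketch.
    """
    lines = ['{| class="wikitable"']
    for field_name, property_name in fields:
        # Template parameter with an empty default, e.g. {{{Population|}}}
        param = "{{{" + field_name + "|}}}"
        lines.append("|-")
        lines.append(f"! {field_name}")
        lines.append(f"| [[{property_name}::{param}]]")
    lines.append("|}")
    lines.append(f"[[Category:{category_name}]]")
    return "\n".join(lines)


print(property_page_text("Number"))
print(template_page_text("Cities", [("Population", "Has population")]))
```

A real generator would also have to emit the form and filter pages, whose syntax belongs to Semantic Forms and Semantic Drilldown respectively.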

Possible XML structure
Here is some possible XML to be contained within a page called “Category:Cities”, used to define the "City" data type. This section defines the name of the class, the name of the form, and information for a single field of that class, "Population".

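 <SemanticClass>
   <Name>City</Name>
   <Form>City</Form>
   <Field>
     <Name>Population</Name>
     <SemanticProperty>
       <Name>Has population</Name>
       <Type>Number</Type>
     </SemanticProperty>
     <FormInput>
       <InputType>text</InputType>
       <Size>20</Size>
     </FormInput>
     <Filter>
       <Label>Population</Label>
     </Filter>
   </Field>
   ...
 </SemanticClass>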
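To make the structure concrete, the following Python sketch reads a class definition of this shape into plain data objects. It assumes the element names used in the examples on this page (`SemanticClass`, `Name`, `Form`, `InheritsFrom`, `Field`, `SemanticProperty`, `Type`, `FormInput`, `InputType`, `Size`, `Filter`, `Label`); the `Field` and `SemanticClass` dataclasses and the `parse_class` function are hypothetical names invented for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional

# Sample definition for illustration; the schema is only a proposal.
CITY_XML = """
<SemanticClass>
  <Name>City</Name>
  <Form>City</Form>
  <Field>
    <Name>Population</Name>
    <SemanticProperty>
      <Name>Has population</Name>
      <Type>Number</Type>
    </SemanticProperty>
    <FormInput>
      <InputType>text</InputType>
      <Size>20</Size>
    </FormInput>
    <Filter>
      <Label>Population</Label>
    </Filter>
  </Field>
</SemanticClass>
"""


@dataclass
class Field:
    name: Optional[str]
    property_name: Optional[str]
    property_type: Optional[str]
    input_type: Optional[str]
    input_size: Optional[int]
    filter_label: Optional[str]


@dataclass
class SemanticClass:
    name: Optional[str]
    form: Optional[str]
    inherits_from: Optional[str]
    fields: List[Field]


def _text(node, path):
    """Return the stripped text of the first element matching path, or None."""
    found = node.find(path)
    return found.text.strip() if found is not None and found.text else None


def parse_class(xml_text):
    """Parse one <SemanticClass> definition into a SemanticClass object."""
    root = ET.fromstring(xml_text)
    fields = []
    for f in root.findall("Field"):
        size = _text(f, "FormInput/Size")
        fields.append(Field(
            name=_text(f, "Name"),
            property_name=_text(f, "SemanticProperty/Name"),
            property_type=_text(f, "SemanticProperty/Type"),
            input_type=_text(f, "FormInput/InputType"),
            input_size=int(size) if size else None,
            filter_label=_text(f, "Filter/Label"),
        ))
    return SemanticClass(
        name=_text(root, "Name"),
        form=_text(root, "Form"),
        inherits_from=_text(root, "InheritsFrom"),
        fields=fields,
    )


city = parse_class(CITY_XML)
print(city.name, [f.name for f in city.fields])
```

Missing elements simply become `None`, which suits a schema in which most parts of a field (form input, filter) are optional.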

Inheritance
Another important aspect of Semantic Classes would be its support for inheritance, or one class inheriting the structure of another. As an example, let's say you want to create a class called "Historical city", that exactly matches the "City" class, but with one additional field, "Current name" (this is, admittedly, a contrived example). The XML for "Historical city", contained within the page "Category:Historical cities", could look like this:

 <SemanticClass>
   <Name>Historical city</Name>
   <Form>Historical city</Form>
   <InheritsFrom>City</InheritsFrom>
   <Field>
     <Name>Current name</Name>
     <SemanticProperty>
       <Name>Has current name</Name>
       <Type>String</Type>
     </SemanticProperty>
     <FormInput>
       <InputType>text</InputType>
       <Size>50</Size>
     </FormInput>
   </Field>
 </SemanticClass>

This might be the entire XML for the class.
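One way inheritance could be resolved is to collect the parent's fields first and then add the child's own fields. The Python sketch below assumes that a child field with the same name as a parent field replaces it, which this page does not specify; `resolve_fields` and the abbreviated XML strings are hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated definitions keyed by class name, for illustration only.
CLASS_XML = {
    "City": (
        "<SemanticClass><Name>City</Name>"
        "<Field><Name>Population</Name></Field>"
        "</SemanticClass>"
    ),
    "Historical city": (
        "<SemanticClass><Name>Historical city</Name>"
        "<InheritsFrom>City</InheritsFrom>"
        "<Field><Name>Current name</Name></Field>"
        "</SemanticClass>"
    ),
}


def resolve_fields(class_name, class_xml, _seen=None):
    """Return an ordered dict of field name -> <Field> element.

    Inherited fields come first; a child's field with the same name
    replaces the inherited one (an assumption of this sketch).
    """
    seen = _seen or set()
    if class_name in seen:
        raise ValueError(f"circular inheritance involving {class_name!r}")
    seen = seen | {class_name}

    root = ET.fromstring(class_xml[class_name])
    fields = {}
    parent = root.findtext("InheritsFrom")
    if parent:
        fields.update(resolve_fields(parent.strip(), class_xml, seen))
    for field in root.findall("Field"):
        fields[field.findtext("Name").strip()] = field
    return fields


print(list(resolve_fields("Historical city", CLASS_XML)))
```

The cycle check matters because the definitions live on ordinary wiki pages, where nothing prevents two classes from naming each other as parents.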

A "sub-class" could also remove or modify certain of its parent class's fields. Subclassing of this nature has many applications. Supporting changes in historical city borders might require more specialization: for instance, the ancient city of Constantinople does not have the same borders as the modern city of Istanbul, and those borders also changed over time. Likewise, there would be no use for certain fields that describe a modern city, such as crime rate or quality-of-life index, for which no reliable historical data is available. Other data, like population, might accept broader data types such as ranges or even normal distributions ("1 million plus or minus 1 sigma" or "from 500,000 to 700,000 people"), which a sufficiently sophisticated language could process like an ordinary number, though one whose exact value is uncertain. ''This is an interesting example, as it illustrates the various ways in which geographic data and terminology can change, making old data useless without such adaptive representation. It's worth developing into a better type hierarchy.''
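The idea of a value that behaves like a number but carries uncertainty can be sketched as a small range type. This is purely illustrative: `RangeValue` is a hypothetical name, and resolving a range to its midpoint is just one possible convention, not something this proposal defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeValue:
    """A numeric value known only to lie between low and high."""
    low: float
    high: float

    def __post_init__(self):
        if self.low > self.high:
            raise ValueError("low must not exceed high")

    @classmethod
    def exact(cls, value):
        """An ordinary number is a range of zero width."""
        return cls(value, value)

    @property
    def midpoint(self):
        return (self.low + self.high) / 2

    def __float__(self):
        # Resolve to a single number when one is required (assumed convention).
        return float(self.midpoint)

    def overlaps(self, other):
        """True if the two ranges share at least one value."""
        return self.low <= other.high and other.low <= self.high


historical = RangeValue(500_000, 700_000)
print(float(historical), historical.overlaps(RangeValue.exact(650_000)))
```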

Customization
There is no way that such a system could generate all the many layouts and customizations that people apply to their data structures, especially their templates. Thus, once templates, etc. are generated by Semantic Classes, they could be modified by users to any extent. The only downside is that, once these pages are modified, they would need to be maintained by hand, since running 'Special:GenerateClassPages' again would overwrite the changes (though it should be noted that 'Special:GenerateClassPages' would probably let administrators choose which pages to overwrite and which to preserve).
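The choice between overwriting and preserving pages can be pictured as a simple planning step that runs before anything is written. In this Python sketch, `plan_generation` is a hypothetical helper and the page titles are examples; it assumes that a page which does not yet exist is always created, and that an existing page is overwritten only if the administrator ticked its checkbox.

```python
def plan_generation(candidate_pages, existing_pages, overwrite_ok):
    """Split candidate page titles into those to write and those to preserve.

    candidate_pages: titles the generator would produce, in order.
    existing_pages:  set of titles that already exist on the wiki.
    overwrite_ok:    set of existing titles the administrator agreed to replace.
    """
    to_write, preserved = [], []
    for title in candidate_pages:
        if title in existing_pages and title not in overwrite_ok:
            preserved.append(title)  # hand-edited page is left untouched
        else:
            to_write.append(title)
    return to_write, preserved


candidates = ["Template:City", "Form:City", "Property:Has population", "Filter:Population"]
existing = {"Template:City", "Form:City"}
to_write, preserved = plan_generation(candidates, existing, overwrite_ok={"Form:City"})
print(to_write)
print(preserved)
```

Here the customized template survives regeneration while the form, the property and the filter are regenerated from the XML.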

On the other hand, for basic implementations, where administrators don't want to customize the appearance or behavior beyond the default, it may be possible to do away with form-definition and filter pages altogether - Semantic Forms and Semantic Drilldown could, in theory, read the XML directly.

Advantages
There are a number of important advantages that Semantic Classes would provide over the current setup:


 * It would provide for a single point where all data for a "class" is stored, making maintenance of the data much easier (currently, making changes like adding new fields can be a headache).
 * There would be separation of data structure from display, as opposed to how the two are now combined in templates and form definitions (mostly - the one exception is the ordering of fields, which would be hardcoded within the XML).
 * Classes can be edited, not just created, via a helper form - for basic data structures, administrators may never have to edit wiki text.
 * It would allow for inheritance of data structure, eliminating duplication (see above).
 * It would allow for easier transfer of data structures from one wiki to another.
 * It would allow for importing and exporting of structure from and to other formats (UML, OWL, etc.)

Alternate approaches
There are several alternate approaches to creating a wiki-wide data schema, for wikis that use Semantic MediaWiki; we have heard a number of these during discussions about the Semantic Classes concept. Here are some other approaches, and why we think they're not as ideal:


 * OWL-based schema - instead of using a custom XML format, the class data would be stored as OWL/RDF (it would most likely still be XML, but with a very different structure). The two advantages of such an approach would be (1) to the extent that one considers OWL/RDF the ideal format for data, it would be stored in that format, and (2) one could use standard RDF editors like Protégé to edit the data. We don't see (1) as a big advantage since we're format-agnostic; and it should be noted that there is currently no standard form-description ontology, so the OWL/RDF would have to be "proprietary" anyway. For (2), we think that a custom editor, on a MediaWiki special page, would be both easier to access and easier to use than an outside editor, since it could have very specific formatting and help text, in the administrator's own language.


 * SMW property-based schema - the data about forms and templates would be stored as a large set of SMW properties across various pages, in the same way that Semantic Drilldown already operates; so that the page for a property could store not just its type and allowed values, but also its form input type, its autocompletion parameters, its help text, etc. The supposed advantage of this approach is that it lets SMW use SMW's own data structure and formatting, instead of using XML, a machine-readable format that some people consider un-wiki-like. We think this approach, though, is a huge step backward: Semantic Classes is meant to move all of the data schema into a single page, while this approach would distribute it into dozens of different pages, making it difficult to even know about all of the schema at any given time, let alone modify it.


 * JavaScript objects - custom JavaScript code would process SMW constructs into first-class objects as part of the user interface. This approach requires custom programming and relies on JavaScript instance data storage.


 * Object databases - objects would be stored in a language-independent representation, which requires custom coding to support. This approach could rely on object database (ODB) support for OWL, RDF and SMW itself.


 * SQL databases - these offer poor support for custom objects. However, this approach could rely on MediaWiki-level support and potentially store the objects in exactly the same DB sets as the pages.

Funding needed
Funding will most likely be needed to get the Semantic Classes extension created. If you're willing to fund all or part of this project, please write to Yaron Koren <yaron57@gmail.com> to discuss further. If you can identify a likely funder (vendor, online service provider, government, etc.) that needs these capabilities (perhaps they already maintain SMW or other MediaWiki data, are trying to coordinate it with data in other forms, and could really use this capability), add them here to this page: