Semantic Schemas

Semantic Schemas is a proposed MediaWiki extension that would allow all the schema information about a "class", or data type (its naming, display, data input and browsing), to be defined within a single piece of XML contained in a category page. This XML could then be used to generate the additional pages needed to put the schema into effect, such as template pages and, if Semantic MediaWiki is installed, property pages. The XML could also be edited via an editing interface, sparing users from having to write or modify it by hand.

Semantic Schemas would allow other extensions to define their own fields to add to the XML, using hooks. To start with, it's planned that the extensions Semantic MediaWiki, Semantic Forms and Semantic Drilldown would hook into the Semantic Schemas code to add their own fields. Semantic Schemas is planned with these extensions in mind, which is why it has "Semantic" in its title; but there's no reason why the extension couldn't be named something more generic, like "Schemas" or "Class Schemas" - it could work even without the presence of Semantic MediaWiki and related extensions.
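As a rough illustration of the hook idea, here is a minimal Python sketch (not MediaWiki's actual PHP hook mechanism) in which each extension registers a handler under its own prefix and contributes its part of a field definition. The names `register_handler` and `build_field`, the settings keys, and the "Has ..." property naming rule are all assumptions made for this example.

```python
# Hypothetical registry: XML prefix -> function that builds that extension's
# part of a field definition. Nothing here is part of any real extension API.
HANDLERS = {}

def register_handler(prefix, handler):
    """Let an extension claim a prefix; refuse a second claim on the same one."""
    if prefix in HANDLERS:
        raise ValueError(f"prefix already registered: {prefix}")
    HANDLERS[prefix] = handler

def build_field(field_name, settings):
    """Ask every registered extension for its contribution to one field."""
    return {prefix: handler(field_name, settings)
            for prefix, handler in HANDLERS.items()}

# Two extensions hooking in (assumed behavior, for illustration only).
register_handler(
    "semanticmediawiki",
    lambda name, s: {"Property": f"Has {name.lower()}", "Type": s["type"]},
)
register_handler(
    "semanticforms",
    lambda name, s: {"InputType": s.get("input", "text"), "Size": s.get("size")},
)

population = build_field("Population", {"type": "Number", "size": 20})
```

Because each extension writes only under its own prefix, adding a third extension would not disturb what the first two produce.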

Semantic Schemas would most likely define two special pages: 'Special:EditSchema' and 'Special:GenerateClassPages'. 'Special:EditSchema' would allow for creating or editing the XML via a helper form, while 'Special:GenerateClassPages' would essentially provide a button that would let the administrator create the template(s), semantic properties, form, filter(s), etc. automatically from the XML definition. 'Special:GenerateClassPages' would most likely also include checkboxes for each of the pages to be generated, to let administrators choose not to overwrite certain wiki pages that already exist.
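The checkbox behavior could be sketched as follows, in Python for brevity. This is only one possible reading of the proposal: a page is written if it does not yet exist, or if the administrator ticked its checkbox. The function name `plan_generation`, the page titles and the placeholder page text are hypothetical.

```python
def plan_generation(planned, existing, overwrite):
    """Split planned pages into those to write and those to leave alone.

    planned   -- dict mapping page title to generated text
    existing  -- set of titles that already exist on the wiki
    overwrite -- set of existing titles the administrator chose to regenerate
    """
    to_write, skipped = {}, []
    for title, text in planned.items():
        if title in existing and title not in overwrite:
            skipped.append(title)   # preserve the hand-edited page
        else:
            to_write[title] = text  # new page, or overwrite was requested
    return to_write, skipped

# Illustrative titles and placeholder text (assumptions, not real output).
planned = {
    "Template:City": "(generated template text)",
    "Property:Has population": "(generated property text)",
    "Form:City": "(generated form text)",
}
to_write, skipped = plan_generation(
    planned, existing={"Template:City"}, overwrite=set()
)
```

Here "Template:City" already exists and was not ticked, so it ends up in `skipped` while the other two pages are written.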

Possible XML structure
Here is some possible XML to be contained within a page called "Category:Cities", used to define the "City" data type. This section defines the name of the category, the name of the form, and information for a single field of that class, "Population".

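 <ClassSchema name="City">
   <semanticforms:FormName>City</semanticforms:FormName>
   <Field name="Population">
     <semanticmediawiki:Property name="Has population">
       <Type>Number</Type>
     </semanticmediawiki:Property>
     <semanticforms:FormInput>
       <InputType>text</InputType>
       <Size>20</Size>
     </semanticforms:FormInput>
     <semanticdrilldown:Filter>
       <Label>Population</Label>
     </semanticdrilldown:Filter>
   </Field>
   ...
 </ClassSchema>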

Note that XML namespaces are used here so that each extension can define its own fields without fear of overriding others'.
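To show how code might read such namespaced XML, here is a small Python sketch using the standard library's ElementTree. It rests on several assumptions: the proposal does not define namespace URIs, so the `urn:example:...` values are invented and declared on the root element so that the document parses; the property name "Has population" is likewise assumed; and `read_field` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Invented namespace URIs; short local prefixes map to the same URIs
# that the sample document declares on its root element.
NS = {
    "smw": "urn:example:semanticmediawiki",
    "sf": "urn:example:semanticforms",
    "sd": "urn:example:semanticdrilldown",
}

SCHEMA_XML = """
<ClassSchema name="City"
    xmlns:semanticmediawiki="urn:example:semanticmediawiki"
    xmlns:semanticforms="urn:example:semanticforms"
    xmlns:semanticdrilldown="urn:example:semanticdrilldown">
  <semanticforms:FormName>City</semanticforms:FormName>
  <Field name="Population">
    <semanticmediawiki:Property name="Has population">
      <Type>Number</Type>
    </semanticmediawiki:Property>
    <semanticforms:FormInput>
      <InputType>text</InputType>
      <Size>20</Size>
    </semanticforms:FormInput>
    <semanticdrilldown:Filter>
      <Label>Population</Label>
    </semanticdrilldown:Filter>
  </Field>
</ClassSchema>
"""

def read_field(field):
    """Collect what each extension declared for one <Field> element."""
    prop = field.find("smw:Property", NS)
    form_input = field.find("sf:FormInput", NS)
    flt = field.find("sd:Filter", NS)
    return {
        "name": field.get("name"),
        "property": prop.get("name") if prop is not None else None,
        "type": prop.findtext("Type") if prop is not None else None,
        "input_type": form_input.findtext("InputType") if form_input is not None else None,
        "filter_label": flt.findtext("Label") if flt is not None else None,
    }

root = ET.fromstring(SCHEMA_XML)
form_name = root.findtext("sf:FormName", namespaces=NS)
fields = [read_field(f) for f in root.findall("Field")]
```

Each extension only looks up elements under its own namespace, so a field that lacks, say, a drilldown filter simply yields `None` for that part.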

Inheritance
Another important aspect of Semantic Schemas would be its support for inheritance, or one class inheriting the structure of another. As an example, let's say you want to create a class called "Fictional city", that exactly matches the "City" class, but with one additional field, "Media source" (e.g., the book, movie, etc. where that city was first mentioned). The XML for "Fictional city", contained within the page "Category:Fictional cities", could look like this:

 <ClassSchema name="Fictional city">
   <InheritsFrom>City</InheritsFrom>
   <semanticforms:FormName>Fictional city</semanticforms:FormName>
   <Field name="Media source">
     <semanticmediawiki:Property name="Has media source">
       <Type>String</Type>
     </semanticmediawiki:Property>
     <semanticforms:FormInput>
       <InputType>text</InputType>
       <Size>50</Size>
     </semanticforms:FormInput>
   </Field>
 </ClassSchema>

This might be the entire XML in the schema.

A "sub-class" could also remove, or modify, certain of its parent class's fields.
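One way the inheritance rules might be resolved is sketched below in Python. The in-memory layout (the keys `inherits_from`, `fields` and `remove`) is an assumption made for this example, as is the "Lost city" class; the proposal does not specify how removal or modification would be expressed in the XML. The sketch also assumes there are no inheritance cycles.

```python
def resolve_fields(schemas, class_name):
    """Return the effective fields of class_name, parent fields first."""
    schema = schemas[class_name]
    parent = schema.get("inherits_from")
    # Start from a copy of the parent's resolved fields, if there is a parent.
    fields = dict(resolve_fields(schemas, parent)) if parent else {}
    # A sub-class may drop fields it inherited...
    for name in schema.get("remove", []):
        fields.pop(name, None)
    # ...and may add new fields or override settings of inherited ones.
    for name, settings in schema.get("fields", {}).items():
        fields[name] = {**fields.get(name, {}), **settings}
    return fields

# Hypothetical in-memory form of the schemas (not a defined format).
schemas = {
    "City": {
        "fields": {"Population": {"type": "Number", "size": 20}},
    },
    "Fictional city": {
        "inherits_from": "City",
        "fields": {"Media source": {"type": "String", "size": 50}},
    },
    "Lost city": {  # invented class, only to show removal and modification
        "inherits_from": "Fictional city",
        "remove": ["Population"],
        "fields": {"Media source": {"size": 80}},
    },
}

fictional = resolve_fields(schemas, "Fictional city")
lost = resolve_fields(schemas, "Lost city")
```

With these inputs, "Fictional city" ends up with both "Population" and "Media source", while "Lost city" keeps only "Media source", with its size changed and its type inherited.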

Customization
No such system could generate all the many layouts and customizations that people apply to their data structures, especially their templates. Thus, once templates, etc. are generated by Semantic Schemas, they can be modified by users to any extent. The only downside is that, once these pages are modified, they would need to be maintained by hand, since running "GenerateClassPages" again would overwrite the changes (though "GenerateClassPages" would probably let administrators choose which pages to overwrite and which to preserve).

On the other hand, for basic implementations, where administrators don't want to customize the appearance or behavior beyond the default, it may be possible to do away with form-definition and filter pages altogether - Semantic Forms and Semantic Drilldown could, in theory, read the XML directly.

Advantages
There are a number of important advantages that Semantic Schemas would provide over the current setup:


 * It would provide for a single point where all data for a schema/"class" is stored, making maintenance of the data much easier (currently, making changes like adding new fields can be a headache).
 * There would be separation of data structure from display, as opposed to how the two are now combined in templates and form definitions (mostly - the one exception is the ordering of fields, which would be hardcoded within the XML).
 * Schemas could be edited, not just created, via a helper form - for basic data structures, administrators may never have to edit wiki text.
 * It would allow for inheritance of data structure, eliminating duplication (see above).
 * It would allow for easier transfer of data structures from one wiki to another.
 * It would allow for importing and exporting of structure from and to other formats (UML, OWL, etc.).

Alternate approaches
There are several alternate approaches to creating a wiki-wide data schema, for wikis that use Semantic MediaWiki; we have heard a number of these during discussions about the Semantic Schemas concept. Here are some other approaches, and why we think they're not as ideal:


 * OWL-based schema - instead of using a custom XML format, the schema data would be stored as OWL/RDF (it would most likely still be XML, but with a very different structure). The two advantages of such an approach would be (1) to the extent that one considers OWL/RDF the ideal format for data, it would be stored in the ideal format for data, and (2) one could use standard RDF editors like Protégé to edit the data. We don't see (1) as a big advantage since we're format-agnostic; and it should be noted that there is currently no standard form-description ontology, so the OWL/RDF would have to be "proprietary" anyway. For (2), we think that a custom editor, on a MediaWiki special page, would be both easier to access and easier to use than an outside editor, since it could have very specific formatting and help text, in the administrator's own language.


 * SMW property-based schema - the data about forms and templates would be stored as a large set of SMW properties across various pages, in the same way that Semantic Drilldown already operates; so that the page for a property could store not just its type and allowed values, but also its form input type, its autocompletion parameters, its help text, etc. The supposed advantage of this approach is that it lets SMW use SMW's own data structure and formatting, instead of using XML, a machine-readable format that some people consider un-wiki-like. We think this approach, though, is a huge step backward: Semantic Schemas is meant to move all of the data schema into a single page, while this approach would distribute it into dozens of different pages, making it difficult to even know about all of the schema at any given time, let alone modify it.

Funding needed
Funding will most likely be needed to get the Semantic Schemas extension created. If you're willing to fund all or part of this project, please write Yaron Koren <yaron57@gmail.com> to discuss further.