Semantic Schemas

Semantic Schemas is a proposed MediaWiki extension that would work alongside Semantic MediaWiki, Semantic Forms, Semantic Drilldown and possibly other extensions. It would allow all the schema information about a "class", or data type - for naming, display, data input, and browsing - to be defined within a single piece of XML contained within a category page. This XML could then be used to generate template, property, form and filter pages. The XML could also be edited via a helper tool, sparing users from having to write or modify it by hand.

Semantic Schemas would most likely define two special pages: 'Special:EditSchema' and 'Special:GenerateClassPages'. 'Special:EditSchema' would allow for creating or editing the XML via the helper form, while 'Special:GenerateClassPages' would essentially provide a button that lets the administrator create the template(s), semantic properties, form, filter(s), etc. automatically from the XML definition. 'Special:GenerateClassPages' would most likely also include a checkbox for each of the pages to be generated, to let administrators choose not to overwrite certain wiki pages that already exist.

Possible XML structure
Here is some possible XML to be contained within a page called "Category:Cities", used to define the "City" data type. This XML defines the name of the class, the name of the form, and information for a single field of that class, "Population".

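 <ClassSchema>
  <ClassName>City</ClassName>
  <Form>City</Form>
  <Field>
   <Name>Population</Name>
   <SemanticProperty>
    <Name>Has population</Name>
    <Type>Number</Type>
   </SemanticProperty>
   <FormInput>
    <InputType>text</InputType>
    <Size>20</Size>
   </FormInput>
   <Filter>
    <Label>Population</Label>
   </Filter>
  </Field>
  ...
 </ClassSchema>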
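To make the generation step more concrete, here is a minimal Python sketch of how a tool like 'Special:GenerateClassPages' might work out which pages a schema implies. It is only an illustration, not a proposed implementation: the function name `pages_to_generate` is hypothetical, the element names follow the possible XML structure on this page, and the assumption that the template takes its name from the class is ours.

```python
import xml.etree.ElementTree as ET

# A complete version of the "City" schema (the "..." placeholder is omitted).
SCHEMA_XML = """
<ClassSchema>
  <ClassName>City</ClassName>
  <Form>City</Form>
  <Field>
    <Name>Population</Name>
    <SemanticProperty>
      <Name>Has population</Name>
      <Type>Number</Type>
    </SemanticProperty>
    <FormInput>
      <InputType>text</InputType>
      <Size>20</Size>
    </FormInput>
    <Filter>
      <Label>Population</Label>
    </Filter>
  </Field>
</ClassSchema>
"""


def pages_to_generate(schema_xml):
    """Return the wiki page titles implied by one class schema."""
    root = ET.fromstring(schema_xml)
    class_name = root.findtext("ClassName").strip()
    form_name = root.findtext("Form").strip()
    # Assumption: the template is named after the class.
    titles = ["Template:" + class_name, "Form:" + form_name]
    for field in root.findall("Field"):
        prop = field.findtext("SemanticProperty/Name")
        if prop is not None:
            titles.append("Property:" + prop.strip())
        label = field.findtext("Filter/Label")
        if label is not None:
            titles.append("Filter:" + label.strip())
    return titles


if __name__ == "__main__":
    for title in pages_to_generate(SCHEMA_XML):
        print(title)
```

A field without a `SemanticProperty` or `Filter` element simply contributes no property or filter page in this sketch.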

Inheritance
Another important aspect of Semantic Schemas would be its support for inheritance, or one class inheriting the structure of another. As an example, let's say you want to create a class called "Historical city", that exactly matches the "City" class, but with one additional field, "Current name" (this is, admittedly, a contrived example). The XML for "Historical city", contained within the page "Category:Historical cities", could look like this:

 <ClassSchema>
  <ClassName>Historical city</ClassName>
  <Form>Historical city</Form>
  <InheritsFrom>City</InheritsFrom>
  <Field>
   <Name>Current name</Name>
   <SemanticProperty>
    <Name>Has current name</Name>
    <Type>String</Type>
   </SemanticProperty>
   <FormInput>
    <InputType>text</InputType>
    <Size>50</Size>
   </FormInput>
  </Field>
 </ClassSchema>

This could be the entire XML for the schema; all of the other fields would be inherited from the "City" class.

A "sub-class" could also remove or modify certain of its parent class's fields.
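One way inheritance might be resolved is sketched below in Python: the parent's fields are collected first, then the sub-class adds its own fields or replaces fields of the same name. The function name `resolve_fields` is hypothetical, and so is the `<RemoveField>` element used for removal; this page does not define any syntax for removing a parent's field, so that part in particular is an assumption.

```python
import xml.etree.ElementTree as ET

# Schemas keyed by class name; on a wiki these would live on category pages.
SCHEMAS = {
    "City": """
<ClassSchema>
  <ClassName>City</ClassName>
  <Field>
    <Name>Population</Name>
    <FormInput><InputType>text</InputType><Size>20</Size></FormInput>
  </Field>
</ClassSchema>""",
    "Historical city": """
<ClassSchema>
  <ClassName>Historical city</ClassName>
  <InheritsFrom>City</InheritsFrom>
  <Field>
    <Name>Current name</Name>
    <FormInput><InputType>text</InputType><Size>50</Size></FormInput>
  </Field>
</ClassSchema>""",
}


def resolve_fields(class_name, schemas, _seen=()):
    """Return a dict of field name -> <Field> element, parent fields first."""
    if class_name in _seen:
        raise ValueError("inheritance cycle at " + class_name)
    root = ET.fromstring(schemas[class_name])
    fields = {}
    parent = root.findtext("InheritsFrom")
    if parent is not None:
        # Start from everything the parent class defines.
        fields.update(
            resolve_fields(parent.strip(), schemas, _seen + (class_name,))
        )
    # Hypothetical <RemoveField> element: drop an inherited field by name.
    for removed in root.findall("RemoveField"):
        fields.pop(removed.text.strip(), None)
    # A field with the same name as an inherited one replaces it in place.
    for field in root.findall("Field"):
        fields[field.findtext("Name").strip()] = field
    return fields


if __name__ == "__main__":
    print(list(resolve_fields("Historical city", SCHEMAS)))
```

Because the parent's fields are inserted first, inherited fields keep the parent's ordering and new fields are appended after them; a real implementation might want a way to control that order.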

Customization
No such system could generate all the layouts and customizations that people apply to their data structures, especially their templates. Thus, once templates and other pages have been generated by Semantic Schemas, users could modify them to any extent. The only downside is that, once modified, these pages would need to be maintained by hand, since running "GenerateClassPages" again would overwrite the changes (though "GenerateClassPages" would probably let administrators choose which pages to overwrite and which to preserve).
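The overwrite-or-preserve choice could come down to a small piece of logic like the following Python sketch. The function name `plan_generation` and its three arguments are hypothetical; the idea is simply that a page which already exists is regenerated only if the administrator ticked its checkbox.

```python
def plan_generation(candidates, existing, overwrite):
    """Split candidate page titles into those to write and those to preserve.

    candidates: titles the schema would generate, in order
    existing:   set of titles that already exist on the wiki
    overwrite:  set of existing titles the administrator chose to regenerate
    """
    to_write, preserved = [], []
    for title in candidates:
        if title in existing and title not in overwrite:
            # Hand-customized page the administrator wants to keep.
            preserved.append(title)
        else:
            # New page, or an existing page explicitly marked for overwrite.
            to_write.append(title)
    return to_write, preserved


if __name__ == "__main__":
    candidates = ["Template:City", "Form:City", "Property:Has population"]
    to_write, preserved = plan_generation(
        candidates,
        existing={"Template:City", "Form:City"},
        overwrite={"Form:City"},
    )
    print("write:", to_write)
    print("preserve:", preserved)
```

In this example the customized "Template:City" is left alone, while "Form:City" is regenerated and the missing property page is created.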

On the other hand, for basic implementations, where administrators don't want to customize the appearance or behavior beyond the default, it may be possible to do away with form-definition and filter pages altogether - Semantic Forms and Semantic Drilldown could, in theory, read the XML directly.
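As a rough illustration of what "reading the XML directly" could mean for a form, the Python sketch below turns each `<FormInput>` element into a plain HTML input. This is not how Semantic Forms works today; the function name `render_form` and the HTML it produces are assumptions made for the example.

```python
import html
import xml.etree.ElementTree as ET

SCHEMA_XML = """
<ClassSchema>
  <ClassName>City</ClassName>
  <Form>City</Form>
  <Field>
    <Name>Population</Name>
    <FormInput>
      <InputType>text</InputType>
      <Size>20</Size>
    </FormInput>
  </Field>
</ClassSchema>
"""


def render_form(schema_xml):
    """Build one labeled HTML input per field that has a <FormInput>."""
    root = ET.fromstring(schema_xml)
    rows = []
    for field in root.findall("Field"):
        form_input = field.find("FormInput")
        if form_input is None:
            continue  # field is not editable through the form
        name = field.findtext("Name").strip()
        # Assumption: fall back to a text input if no type is given.
        input_type = form_input.findtext("InputType", default="text").strip()
        attrs = 'type="%s" name="%s"' % (
            html.escape(input_type),
            html.escape(name),
        )
        size = form_input.findtext("Size")
        if size is not None:
            attrs += ' size="%s"' % html.escape(size.strip())
        rows.append(
            "<label>%s: <input %s></label>" % (html.escape(name), attrs)
        )
    return "\n".join(rows)


if __name__ == "__main__":
    print(render_form(SCHEMA_XML))
```

With no form-definition page in between, any change to the XML would show up in the form immediately, which is the appeal of this approach for basic setups.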

Advantages
There are a number of important advantages that Semantic Schemas would provide over the current setup:


 * It would provide for a single point where all data for a schema/"class" is stored, making maintenance of the data much easier (currently, making changes like adding new fields can be a headache).
 * There would be separation of data structure from display, as opposed to how the two are now combined in templates and form definitions (mostly - the one exception is the ordering of fields, which would be hardcoded within the XML).
 * Schemas could be edited, not just created, via a helper form - for basic data structures, administrators might never have to edit wiki text.
 * It would allow for inheritance of data structure, eliminating duplication (see above).
 * It would allow for easier transfer of data structures from one wiki to another.
 * It would allow for importing and exporting of structure from and to other formats (UML, OWL, etc.).

Alternate approaches
There are several alternate approaches to creating a wiki-wide data schema for wikis that use Semantic MediaWiki; we have heard a number of these during discussions about the Semantic Schemas concept. Here are two of them, and why we think they are less suitable:


 * OWL-based schema - instead of using a custom XML format, the schema data would be stored as OWL/RDF (it would most likely still be XML, but with a very different structure). The two advantages of such an approach would be (1) to the extent that one considers OWL/RDF the ideal format for data, it would be stored in that format, and (2) one could use standard RDF editors like Protégé to edit the data. We don't see (1) as a big advantage since we're format-agnostic; and it should be noted that there is currently no standard form-description ontology, so the OWL/RDF would have to be "proprietary" anyway. For (2), we think that a custom editor, on a MediaWiki special page, would be both easier to access and easier to use than an outside editor, since it could have very specific formatting and help text, in the administrator's own language.


 * SMW property-based schema - the data about forms and templates would be stored as a large set of SMW properties across various pages, in the same way that Semantic Drilldown already operates, so that the page for a property could store not just its type and allowed values, but also its form input type, its autocompletion parameters, its help text, etc. The supposed advantage of this approach is that it lets SMW use its own data structure and formatting, instead of XML, a machine-readable format that some people consider un-wiki-like. We think this approach, though, is a huge step backward: Semantic Schemas is meant to move all of the data schema into a single page, while this approach would distribute it across dozens of different pages, making it difficult even to see the whole schema at any given time, let alone modify it.

Funding needed
Funding will most likely be needed to get the Semantic Schemas extension created. If you're willing to fund all or part of this project, please write Yaron Koren <yaron57@gmail.com> to discuss further.