Extension:Data

The Data extension is a MediaWiki extension by Nikola Smolenski that enables getting and setting data in articles.

Data is described by item/key/value triplets, where 'item' is typically the name of the article, 'key' the name of the data, and 'value' the actual data. Reading about the Data function is perhaps the best way to understand this.

The extension can be seen in action at http://www.rastko.net/~nikola/

Note that the extension has some similarity to Semantic MediaWiki. While coding I took a quick look at it, though not at the actual SMW code, and now that the extension is finished, people tell me that it does much the same thing. On the other hand, perhaps attacking the problem from different angles will give a better solution.

Data function
Data is a parser function which returns the value of a certain key of a certain item, for example:

will return '2144700', which is the number of people who live in Paris.

In this example, Paris->population is the descriptor, which tells the function to return the value of the key 'population' of the item 'Paris'. The full form of the descriptor is:

Lang::Item->Key


 * 'Lang' is the code of the language in which the descriptor is written; the idea is to be able to describe the same values in multiple languages. Currently, it is unimplemented. If left out (the trailing :: then becomes unnecessary as well), the language of the wiki is used.
 * 'Item' is the name of the item for which the value is sought. If left out (together with -> ), the name of the article is used instead, except in the data block (see below). If the data function is used in a template, this is the name of the article in which the template is included.
 * 'Key' is the name of the key whose value is sought. It too can be left out, in which case the name of the item is returned (this is useful in the data block).

To summarize these omissions: in an article entitled 'Paris' on an English-language wiki, the following descriptors all return the same value:

The descriptor is case-insensitive.
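The rules above can be sketched as a small parser. This is only an illustration in Python, not the extension's actual PHP code; the names `Descriptor` and `parse_descriptor` are hypothetical, and it assumes that a descriptor without `->` is a bare key and that an empty key means "return the item name".

```python
from typing import NamedTuple, Optional


class Descriptor(NamedTuple):
    lang: str
    item: str
    key: Optional[str]  # None means "return the name of the item"


def parse_descriptor(text, article, wiki_lang="en", current_item=None):
    """Split 'Lang::Item->Key' into its parts, filling in omitted parts.

    current_item stands for the item being processed inside a data block;
    outside a data block the article name is the default item.
    """
    text = text.strip().lower()  # descriptors are case-insensitive
    lang = wiki_lang
    if "::" in text:
        lang, text = text.split("::", 1)
    default_item = (current_item or article).lower()
    if "->" in text:
        item, key = text.split("->", 1)
    else:
        # Assumption: with no arrow, the remaining text is the key.
        item, key = "", text
    return Descriptor(lang.strip(), item.strip() or default_item, key.strip() or None)


full = parse_descriptor("en::Paris->population", article="Paris")
short = parse_descriptor("population", article="Paris")
print(full == short)
```

Under these assumptions, `en::Paris->population`, `Paris->population`, `en::population` and `population` all parse to the same descriptor when used in the article 'Paris'.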

Note that you can build the descriptor out of values returned by other data functions, which can be very useful. For example:

If France->capital is 'Paris', the descriptor becomes Paris->population, and the value of that is returned.
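The same chaining can be mimicked outside the wiki. The following Python sketch uses a hypothetical in-memory `STORE` and a hypothetical `data` helper, and only handles the `Item->Key` form; the optional default argument mirrors the default value mentioned in the 0.2 changelog.

```python
# Hypothetical in-memory stand-in for the extension's database table.
STORE = {
    ("france", "capital"): "Paris",
    ("paris", "population"): "2144700",
}


def data(descriptor, default=""):
    """Look up 'Item->Key'; return default when no data is present."""
    item, _, key = descriptor.lower().partition("->")
    return STORE.get((item.strip(), key.strip()), default)


# The inner lookup produces the item name used by the outer one.
capital = data("France->capital")
population = data(f"{capital}->population")
print(population)
```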

Data block
The data block parses the wiki code it encloses once for each item that satisfies a certain condition. Suppose that you want a list of countries with their population.

The block will first fetch a list of all items which satisfy the condition (they have the key 'is a' whose value is 'country'). Currently, the condition can contain only a single comparison, and it can only be = ; this should be expanded in the future.

Then the wiki text enclosed by the block is parsed, once for each item. If the name of the item is left out of a descriptor of a data function, the name of the current item is used, instead of the article name as usual.
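The two steps (select the matching items, then render the enclosed text once per item) can be sketched as follows. This is a Python illustration with hypothetical names (`matching_items`, `render_data_block`), and the country entries are invented placeholders, not real data. Plain alphabetical sorting stands in for the extension's natural sorting.

```python
# Invented sample data; only the 'paris' entry comes from this page.
STORE = {
    "atlantis": {"is a": "country", "population": "1000"},
    "lilliput": {"is a": "country", "population": "250"},
    "paris": {"is a": "city", "population": "2144700"},
}


def matching_items(store, key, value):
    """Step 1: items whose key equals value (the only supported comparison)."""
    return sorted(item for item, keys in store.items() if keys.get(key) == value)


def render_data_block(store, key, value, template):
    """Step 2: render the enclosed text once per item, with that item as default."""
    lines = []
    for item in matching_items(store, key, value):
        fields = {"item": item, **store[item]}
        lines.append(template.format_map(fields))
    return lines


for line in render_data_block(STORE, "is a", "country", "* {item}: {population}"):
    print(line)
```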

If you need to use a value of the article in which the data block is, you can use

Sort function
The Sort function is only useful inside a data block; outside of it, it has no effect. Example:

The argument 'descending' or 'desc' makes the data block sort the data in descending order. Example:

If the function is not used, the data block is sorted alphabetically. In any case, natural sorting is used (10>2).
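For readers unfamiliar with natural sorting, here is a minimal Python sketch of one common way to implement it (the extension's own implementation may differ; `natural_key` is a hypothetical name). Runs of digits are compared as numbers, so '10' sorts after '2'.

```python
import re


def natural_key(text):
    """Split text into digit and non-digit runs; compare digit runs numerically."""
    parts = [p for p in re.split(r"([0-9]+)", text) if p]
    # Tag each part so numbers and strings are never compared directly;
    # numeric runs sort before text runs.
    return [(0, int(p)) if "0" <= p[0] <= "9" else (1, p.lower()) for p in parts]


values = ["10", "2", "1"]
ascending = sorted(values, key=natural_key)
descending = sorted(values, key=natural_key, reverse=True)  # like 'desc'
print(ascending, descending)
```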

Setdata block
The setdata block has the following form:

is a=city
population=2144700
capital=Paris

To the left of = is the descriptor, which is parsed in the same way as in the data function.

Note that it is possible to set data which belongs to one item from the article on another item. This creates some problems, but I won't give up on the ability, which could be immensely useful in some cases, for example if you wish to enter data about a million items at once.
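As an illustration of how such a block maps to item/key/value triplets, here is a Python sketch. It assumes one `descriptor=value` pair per line and ignores the language part; `parse_setdata` is a hypothetical name, not the extension's code.

```python
def parse_setdata(block, article):
    """Turn 'descriptor=value' lines into (item, key, value) triplets."""
    triplets = []
    for line in block.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        descriptor, value = line.split("=", 1)
        item, arrow, key = descriptor.partition("->")
        if not arrow:
            # No item given: the data belongs to the current article.
            item, key = article, descriptor
        item = item.strip().lower() or article.lower()
        triplets.append((item, key.strip().lower(), value.strip()))
    return triplets


block = "is a=city\npopulation=2144700\ncapital=Paris"
for triplet in parse_setdata(block, article="Paris"):
    print(triplet)
```

A line such as `France->capital=Paris` placed in the article 'Paris' would yield a triplet for the item 'france', which is the cross-article case described above.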

Data special page
Special:Data shows all data which belongs to an item. For example, Special:Data/Paris would show:

Paris
 * is a: city
 * population: 2144700
 * is in: France

Joindata special page
Special:Joindata joins the text of an article (typically a template) with an arbitrary string that is used as its name. For example, Special:Joindata/Template:City-=-Paris shows Template:City as it would look if included in the article Paris, without the need for the article on Paris to exist at all.

Similar to the use of the -> "arrow" in the descriptor, you may think of -=- as a "chain" that binds article and name. I initially wanted to use // but MediaWiki for some reason reduces it to / and so I typed the first thing that crossed my mind, later noticing that it looks like a chain.
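Splitting the subpage part at the chain can be sketched like this in Python (`split_joindata` is a hypothetical name; the actual special page is written in PHP). Underscores are converted to spaces because MediaWiki treats the two as equivalent in page titles.

```python
def split_joindata(subpage):
    """Split 'Article-=-Name' into the article to render and the name to use."""
    article, chain, name = subpage.partition("-=-")
    if not chain:
        raise ValueError("expected 'Article-=-Name'")
    # URLs carry spaces as underscores; normalise both halves.
    return article.replace("_", " "), name.replace("_", " ")


print(split_joindata("Template:City-=-Paris"))
```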

Wikipedia
The extension as-is works with a simple table in the database of the wiki; but I imagine that if it is actually implemented in Wikimedia projects, Wikipedias would only be able to read the data from a central database, which would be updated from a wiki, say at http://data.wikimedia.org.

An obvious benefit of this is that, when data is changed (for example, a new census recounts the population of all cities in a country), all Wikipedias will have updated data.

But a much more valuable use is resolving the eternal dilemma over mass article entry that plagues most Wikipedias (the dilemma, not the entry). With Special:Joindata, translating a single template would give any Wikipedia basic articles about all places in the world, or stub biographies of all people who are in the central database, without actually having to insert every article by a bot. And someone who wishes to expand the article further could also use the template as the starting point.

Another useful ability is periodic updating of fluctuating values (either directly in the database of the data wiki or via a bot which would update the actual page on it). For example: (I know I'll be ostracised for this) display the current weather in an article on a city or (this is actually useful) display the current exchange rate in an article on a currency.

Wiktionary
The extension should obviously be very useful for the Wiktionary.

Theory
This is my first actually working implementation of something I call a "free-form database". I learned in theory, and observed in practice, that an information system is hindered by its rigidity: any change in the needs of its users, or any unforeseen feature, becomes an obstacle which can't be overcome without actually changing the system. Free-form databases should be able to solve this.

Key differences between a free-form database and, for example, a relational database are:
 * All data is text
   * While a free-form database engine may store numerical data as numbers for greater efficiency, it must be possible to fill in any field with text too. Think that there is data which can be expressed only as numbers? Think again: even for purely numerical data, such as an ISBN, valid values might be 'none', 'unknown', 'damaged' etc. External tools could be built which would, for example, offer the most used values and types in forms used to enter the data. But there should always be a possibility to enter pure text.
 * All data is multiple
   * It should be possible to have multiple values for each field. To reuse the ISBN example, even unique data such as an ISBN could, for example, have a typo, and so both the value with the typo and the real ISBN of a book should be used.
 * Values are keys
   * This is actually similar to relational databases, in that the value of each field could be used as a primary key of a different (or the same) table. But coupled with the fact that each field could have a textual value, or that each field could have multiple values, it leads to some interesting outcomes.
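The three properties can be made concrete with a toy data structure. This is a Python sketch under my own assumptions (the class and method names are hypothetical, and it is not how the extension stores data, which currently allows only one value per item/key pair).

```python
from collections import defaultdict


class FreeFormStore:
    """Toy free-form database: item -> key -> list of text values."""

    def __init__(self):
        self._data = defaultdict(lambda: defaultdict(list))

    def add(self, item, key, value):
        text = str(value)  # all data is text
        values = self._data[item][key]
        if text not in values:  # all data is multiple: values accumulate
            values.append(text)

    def get(self, item, key):
        return list(self._data.get(item, {}).get(key, []))

    def follow(self, item, key, next_key):
        """Values are keys: use each value of item/key as an item name."""
        found = []
        for value in self.get(item, key):
            found.extend(self.get(value, next_key))
        return found


store = FreeFormStore()
store.add("Some book", "isbn", "unknown")
store.add("Some book", "isbn", "damaged")
store.add("Paris", "is in", "France")
store.add("France", "capital", "Paris")
print(store.get("Some book", "isbn"), store.follow("Paris", "is in", "capital"))
```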

I had been thinking about this for some time, and approached the problem from different angles. I tried to build a specialised tool for this; I thought about creating a wiki from scratch with these abilities; but in the end it turned out that, thanks to MediaWiki's great extensibility, I could do it the way I did, via a simple MediaWiki extension.

0.1
Initial release.

0.2

 * More proper way of initializing the extension, as suggested by Patrick; Joindata is now a magic word.
 * The data function now accepts one argument as a default value when no data is present, per code submitted by Olenz.
 * Fixed a bug in Special:Joindata with items containing a space.

Todo
The todo list may seem longish, but the extension is useful as-is.


 * Make use of the language code in descriptors. (perhaps with a "wgForeignDataDBRepo" based on interwiki ids)
 * Expand the possible conditions of the data block.
 * Make special pages in the way which is now preferred.
 * The descriptor is case-insensitive, but it shouldn't be (the name of the item should be case-sensitive, except for the first character in some wikis, while the rest should not).
 * Solve the problems related to filling in the data about one item from the article on another (probably there should be an option to turn it off for some wikis):
   * if two articles have different data about the same item/key pair, the stored value flips depending on which article was saved last,
   * if a key is deleted from the article, it won't be deleted from the database (because maybe it is still present in some other article).
 * One item/key pair can have only one value, while in theory it should be possible for it to have more values.
 * Obviously there is some potential for abuse, for example .) It should be trivial to do this, but I haven't thought of an actual syntax for it.
 * Functions. They would look like keys, but would actually return value of a calculation. For example, would return 'P'.

Data.php
Note: you should edit the article and take the source from there.

Data.sql
You may wish to create indexes over some of the three columns.
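To illustrate what such indexes might look like, here is a sketch using Python's built-in sqlite3 module as a stand-in for the wiki's database. The table and column names (`data`, `data_item`, `data_key`, `data_value`) are assumptions made for this example, not the names used in Data.sql.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical three-column table of item/key/value triplets.
    CREATE TABLE data (
        data_item  TEXT NOT NULL,
        data_key   TEXT NOT NULL,
        data_value TEXT NOT NULL
    );
    -- Speeds up the data function: look up a key of a known item.
    CREATE INDEX data_item_key ON data (data_item, data_key);
    -- Speeds up the data block: find items whose key has a given value.
    CREATE INDEX data_key_value ON data (data_key, data_value);
    """
)
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [("paris", "is a", "city"), ("paris", "population", "2144700")],
)
row = conn.execute(
    "SELECT data_value FROM data WHERE data_item = ? AND data_key = ?",
    ("paris", "population"),
).fetchone()
print(row[0])
```

Which indexes are worthwhile depends on whether lookups by item (the data function) or by key and value (the data block) dominate on a given wiki.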

Example uses

 * http://fias.uni-frankfurt.de/~simbio/People