Extension:Data

From MediaWiki.org

Data

Release status: unknown
Implementation: Tag, Parser function, Database, Special page
Description: Enables getting and setting of data in articles.
Author(s): Nikola Smolenski
Last version: 0.2
License: No license specified
Download: see below
Example: http://www.rastko.net/~nikola/

The Data extension is a MediaWiki extension by Nikola Smolenski that enables getting and setting data in articles.

Data is described by item/key/value triplets, where 'item' is typically the name of the article, 'key' is the name of the datum, and 'value' is the actual data. The description of the Data function under Usage is probably the best explanation.

The extension can be seen in action at http://www.rastko.net/~nikola/

Note that the extension has some similarity to Semantic MediaWiki. While coding I took a quick look at it, though not at the actual SMW code, and now that the extension is finished, people tell me that it does much the same thing. On the other hand, perhaps attacking the problem from different angles will give a better solution.

Usage

Data function

Data is a parser function which returns the value of a given key of a given item, for example:

{{#data:Paris->population}}

will return '2144700', which is the number of people who live in Paris.

In this example, Paris->population is the descriptor, which tells the function to return the value of the key 'population' of the item 'Paris'. The full form of the descriptor is:

Lang::Item->Key
  • 'Lang' is the code of the language in which the descriptor is written; the idea is to be able to describe the same values in multiple languages. It is currently unimplemented. If it is left out (the trailing :: then becomes unnecessary as well), the language of the wiki is used.
  • 'Item' is the name of the item whose value is sought. If it is left out (together with -> ), the name of the article is used instead, except in the data block (see below). If the data function is used in a template, this is the name of the article in which the template is included.
  • 'Key' is the name of the key whose value is sought. It too can be left out, in which case the name of the item is returned (this is useful in the data block).

To summarize the omissions: in an article titled 'Paris' on an English-language wiki, the following descriptors all return the same value:

{{#data:en::Paris->population}}
{{#data:Paris->population}}
{{#data:population}}

The descriptor is case-insensitive.
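
The splitting rules above can be sketched in a few lines of standalone PHP. This is only an illustration under stated assumptions: the function name sketch_parse_descriptor and its parameters are hypothetical and not part of the extension, and case handling is left out entirely.

```php
<?php
// Hypothetical helper, not part of the extension: split a descriptor of
// the form Lang::Item->Key into its three parts, filling in defaults for
// the parts that were left out.
function sketch_parse_descriptor(string $desc, string $currentItem, string $wikiLang): array {
    $lang = $wikiLang;

    // Optional "Lang::" prefix.
    $pos = strpos($desc, '::');
    if ($pos !== false) {
        $lang = trim(substr($desc, 0, $pos));
        $desc = substr($desc, $pos + 2);
    }

    // Optional "Item->" part; without it the current item is used.
    $pos = strpos($desc, '->');
    if ($pos !== false) {
        $item = trim(substr($desc, 0, $pos));
        $key  = trim(substr($desc, $pos + 2));
    } else {
        $item = $currentItem;
        $key  = trim($desc);
    }

    return [$lang, $item, $key];
}

// In an article titled 'Paris' on an English-language wiki, all three
// forms resolve to the same lang|item|key triple.
foreach (['en::Paris->population', 'Paris->population', 'population'] as $d) {
    echo implode('|', sketch_parse_descriptor($d, 'Paris', 'en')), "\n";
}
```

An empty descriptor yields an empty key, which is the case in which the data function returns the item name itself.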

Note that you can build the descriptor out of values returned by other data functions, which can be very useful. For example:

{{#data:{{#data:France->capital}}->population}}

If France->capital is 'Paris', the descriptor becomes Paris->population, and the value of that descriptor is returned.
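
To make the order of evaluation concrete, here is a small sketch that uses an in-memory array in place of the database table. The array layout and the function sketch_lookup are assumptions made for the illustration; the default argument mirrors the optional default value of the data function.

```php
<?php
// Hypothetical in-memory stand-in for the data table: item => key => value.
$store = [
    'France' => ['capital' => 'Paris'],
    'Paris'  => ['population' => '2144700'],
];

// Look up an "Item->Key" descriptor; return $default when nothing is stored.
function sketch_lookup(array $store, string $desc, string $default = ''): string {
    $parts = explode('->', $desc, 2);
    if (count($parts) < 2) {
        return $default;
    }
    [$item, $key] = array_map('trim', $parts);
    return $store[$item][$key] ?? $default;
}

// The inner lookup runs first; its result becomes part of the outer descriptor.
$capital = sketch_lookup($store, 'France->capital');
echo sketch_lookup($store, $capital . '->population'), "\n";

// A missing value falls back to the default.
echo sketch_lookup($store, 'Paris->mayor', 'unknown'), "\n";
```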

Data block

The data block parses the wiki code it encloses once for each item that satisfies a given condition. Suppose that you want a list of countries with their populations:

<data condition="is a=country">
* [[{{#data:}}]] ({{#data:population}})
</data>

The block first fetches a list of all items which satisfy the condition (here, items that have the key 'is a' with the value 'country'). Currently, the condition can contain only a single comparison, and the only operator is = ; this should be expanded in the future.

Then the enclosed wiki text is parsed once for each item. If the item name is left out of a data function's descriptor, the name of the current item is used instead of the article name.
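
The two steps (select the matching items, then render the enclosed text once per item) can be sketched as follows. Everything here is illustrative: the item names and figures are made up, the functions are hypothetical, and a PHP callback stands in for the wiki text that the real block hands to the parser.

```php
<?php
// Made-up item/key/value triples standing in for the data table.
$triples = [
    ['Ruritania', 'is a', 'country'],
    ['Ruritania', 'population', '1200'],
    ['Freedonia', 'is a', 'country'],
    ['Freedonia', 'population', '950'],
    ['Zenda', 'is a', 'city'],
];

// Return the value stored for an item/key pair, or '' if there is none.
function sketch_value(array $triples, string $item, string $key): string {
    foreach ($triples as [$i, $k, $v]) {
        if ($i === $item && $k === $key) {
            return $v;
        }
    }
    return '';
}

// Render $render once for every item satisfying "key=value".
// The condition is assumed to contain a single = comparison.
function sketch_data_block(array $triples, string $condition, callable $render): array {
    [$condKey, $condValue] = array_map('trim', explode('=', $condition, 2));
    $out = [];
    foreach ($triples as [$item, $key, $value]) {
        if ($key === $condKey && $value === $condValue) {
            $out[] = $render($item);
        }
    }
    return $out;
}

$lines = sketch_data_block($triples, 'is a=country', function (string $item) use ($triples): string {
    return '* [[' . $item . ']] (' . sketch_value($triples, $item, 'population') . ')';
});
sort($lines, SORT_NATURAL); // without a sort function the rows are ordered alphabetically
echo implode("\n", $lines), "\n";
```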

If you need a value belonging to the article that contains the data block, you can use

{{#data:{{PAGENAME}}->whatever}}

Sort function

The sort function is useful only inside a data block; outside of one it has no effect. Its argument is the value by which the output of the block is sorted. Example:

{{#sort:{{#data:population}}}}

The argument 'descending' or 'desc' makes the data block sort in descending order. Example:

{{#sort:{{#data:population}}|descending}}

If the function is not used, the output of the data block is sorted alphabetically. In either case, natural sorting is used (10 > 2).
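
The difference between natural and plain string ordering is easy to check with PHP's strnatcmp, which the extension source also uses. The row structure and the function sketch_sort_rows below are hypothetical and serve only to illustrate the ordering rules.

```php
<?php
// Each rendered row carries the sort key chosen with the sort function.
$rows = [
    ['text' => 'row A', 'key' => '900'],
    ['text' => 'row B', 'key' => '10'],
    ['text' => 'row C', 'key' => '2'],
];

// Order rows by key using natural comparison; 'descending' or 'desc'
// reverses the order, anything else sorts ascending.
function sketch_sort_rows(array $rows, string $order = ''): array {
    $desc = ($order === 'descending' || $order === 'desc');
    usort($rows, function (array $a, array $b) use ($desc): int {
        $cmp = strnatcmp($a['key'], $b['key']);
        return $desc ? -$cmp : $cmp;
    });
    return $rows;
}

echo implode(',', array_column(sketch_sort_rows($rows), 'key')), "\n";
echo implode(',', array_column(sketch_sort_rows($rows, 'desc'), 'key')), "\n";

// Plain string comparison would put '10' before '2'; natural comparison does not.
var_dump(strcmp('10', '2') < 0, strnatcmp('10', '2') > 0);
```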

Setdata block

The setdata block has the following form:

<setdata>
is a=city
population=2144700
capital=Paris
</setdata>

To the left of = is the descriptor, which is parsed in the same way as in the data function; to the right is the value to store.
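
As a rough sketch of how the body of a setdata block turns into triplets: each line is split at the first = , and the left side is read as a descriptor. The function sketch_parse_setdata is hypothetical, the language part of the descriptor is ignored, and the France->capital line is a variation on the example above, added to show data being set for another item.

```php
<?php
// Turn the body of a setdata block into item/key/value triples.
// $articleItem is the item used when a line names no item of its own.
function sketch_parse_setdata(string $body, string $articleItem): array {
    $triples = [];
    foreach (explode("\n", $body) as $line) {
        $pos = strpos($line, '=');
        if ($pos === false) {
            continue; // lines without = carry no data
        }
        $desc  = trim(substr($line, 0, $pos));
        $value = trim(substr($line, $pos + 1));

        $arrow = strpos($desc, '->');
        if ($arrow !== false) {
            $item = trim(substr($desc, 0, $arrow));
            $key  = trim(substr($desc, $arrow + 2));
        } else {
            $item = $articleItem;
            $key  = $desc;
        }
        $triples[] = [$item, $key, $value];
    }
    return $triples;
}

$body = "is a=city\npopulation=2144700\nFrance->capital=Paris";
foreach (sketch_parse_setdata($body, 'Paris') as $t) {
    echo implode(' | ', $t), "\n";
}
```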

Note that it is possible to set data belonging to one item from the article about another item. This creates some problems (see Todo), but I won't give up on the ability, which could be immensely useful in some cases, for example if you wish to enter data about a million items at once.

Data special page

Special:Data shows all data which belong to an item. For example, Special:Data/Paris would show:

Paris
  • is a: city
  • population: 2144700
  • is in: France

Joindata special page

Special:Joindata renders the text of an article (typically a template) as if it were included in an article with an arbitrary name. For example, Special:Joindata/Template:City-=-Paris shows Template:City as it would look if included in the article Paris, without the article on Paris having to exist at all.

Similar to the use of the -> "arrow" in the descriptor, you may think of -=- as a "chain" that binds the article and the name. I initially wanted to use // but MediaWiki for some reason reduces it to / , so I typed the first thing that crossed my mind, later noticing that it looks like a chain.
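
A minimal sketch of taking the subpage argument apart at the chain. The function sketch_split_join is hypothetical; the conversion of underscores to spaces follows what the extension source does with the name part.

```php
<?php
// Split "Article-=-Name" at the first chain. Underscores in the name are
// turned back into spaces, as they appear in URLs in place of spaces.
function sketch_split_join(string $args): ?array {
    $pos = strpos($args, '-=-');
    if ($pos === false) {
        return null; // no chain: nothing to join
    }
    $join = substr($args, 0, $pos);
    $name = strtr(substr($args, $pos + 3), '_', ' ');
    return [$join, $name];
}

var_dump(sketch_split_join('Template:City-=-New_York'));
var_dump(sketch_split_join('Paris'));
```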

Uses

Wikipedia

The extension as-is works with a simple table in the wiki's database; but I imagine that if it were actually deployed on Wikimedia projects, Wikipedias would only read the data from a central database, which would be updated from a wiki, say at http://data.wikimedia.org.

An obvious benefit is that, when data changes (for example, a new census updates the population of all cities in a country), all Wikipedias get the updated data.

But a much more valuable use is resolving the eternal dilemma over mass article entry that plagues most Wikipedias (the dilemma, not the entry). With Special:Joindata, translating a single template would give any Wikipedia basic articles about all places in the world, or stub biographies of all people in the central database, without a bot actually having to insert every article. And someone who wishes to expand an article further could use the template as the starting point.

Another useful ability is periodic updating of fluctuating values (either directly in the database of the data wiki or via a bot which would update the actual page on it). For example: (I know I'll be ostracised for this) displaying the current weather in an article on a city, or (this is actually useful) displaying the current exchange rate in an article on a currency.

Wiktionary

The extension should obviously be very useful for the Wiktionary.

Theory

This is my first actually working implementation of something I call a "free-form database". I learned in theory, and observed in practice, that an information system is hindered by its rigidity: any change in its users' needs, or any unforeseen feature, becomes an obstacle which can't be overcome without changing the system itself. Free-form databases should be able to solve this.

Key differences between a free-form database and, for example, a relational database are:

All data is text
While a free-form database engine may store numerical data as numbers for greater efficiency, it must be possible to fill in any field with text too. Think that there are data which can be expressed only as numbers? Think again: even for purely numerical data, such as an ISBN, valid values might be 'none', 'unknown', 'damaged' etc. External tools could be built which would, for example, offer the most used values and types in the forms used to enter the data. But there should always be a possibility to enter pure text.
All data is multiple
It should be possible to have multiple values for each field. To reuse the ISBN example, even unique data such as an ISBN could, for example, be printed with a typo, and so both the value with the typo and the real ISBN of the book should be stored.
Values are keys
This is actually similar to relational databases, in that the value of each field could be used as a primary key of a different (or the same) table. But coupled with the facts that each field can have a textual value and that each field can have multiple values, it leads to some interesting outcomes.
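
One way to picture these three properties together is a nested array in which every item/key pair holds a list of text values. This is only a sketch of the idea, not how the extension stores data (it currently keeps a single value per item/key pair); the function names and the placeholder ISBN strings are made up.

```php
<?php
// Hypothetical layout: item => key => list of text values.
$db = [];

// Add a value to an item/key pair, keeping earlier values.
function sketch_add_value(array &$db, string $item, string $key, string $value): void {
    if (!in_array($value, $db[$item][$key] ?? [], true)) {
        $db[$item][$key][] = $value;
    }
}

// All values of an item/key pair, or an empty list.
function sketch_values(array $db, string $item, string $key): array {
    return $db[$item][$key] ?? [];
}

// All data is text, and all data is multiple.
sketch_add_value($db, 'Some book', 'isbn', 'isbn-as-printed-with-typo');
sketch_add_value($db, 'Some book', 'isbn', 'isbn-corrected');
sketch_add_value($db, 'Other book', 'isbn', 'unknown');

// Values are keys: the value 'Paris' is also the name of an item.
sketch_add_value($db, 'France', 'capital', 'Paris');
sketch_add_value($db, 'Paris', 'population', '2144700');

foreach (sketch_values($db, 'France', 'capital') as $capital) {
    echo $capital, ': ', implode(', ', sketch_values($db, $capital, 'population')), "\n";
}
```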

I thought about this for some time and approached the problem from different angles. I tried to build a specialised tool for it; I thought about creating a wiki from scratch with these abilities; but in the end it turned out that, thanks to MediaWiki's great extensibility, I could do it the way I did, via a simple MediaWiki extension.

Revision history

0.1

Initial release.

0.2

  • A more proper way of initializing the extension, as suggested by Patrick; Joindata is now a magic word.
  • The {{#data:}} function now accepts an additional argument, used as the default value when no data is present, per code submitted by Olenz.
  • Fixed a bug in Special:Joindata with items containing a space.

Todo

The todo list may seem longish, but the extension is useful as-is regardless.

  • Make use of the language code in descriptors.
  • Expand the possible conditions of the data block.
  • Implement the special pages in the way which is now preferable.
  • The descriptor is case-insensitive, but it shouldn't be (name of the item should be case-sensitive, except for the first character in some wikis, while the rest should not).
  • Solve the problems related to filling in the data about one item from the article on another (probably there should be an option to turn it off for some wikis):
    • if two articles have different data about the same item/key pair, the stored value flips depending on which article was saved last,
    • if a key is deleted from the article, it won't be deleted from the database (because maybe it is still present in some other article).
  • One item/key pair can have only one value, while in theory it should be possible for it to have more values.
  • Obviously there is some potential for abuse, for example <data></data> will work with all data present, which could include millions of items; this should be fixed by someone experienced in the area, after gaining insight into what the actual uses of the extension would be.
  • Have a better database schema.
  • Separate database access code from the rest, enabling use of other database engines (perhaps a custom-made one would be necessary for a project the size of Wikipedia).
  • Add the ability to "read data backwards". That is, if France->Capital=Paris, it is very easy to find out what the capital of France is, but it is impossible to find out whether Paris is the capital of some country, and which country that is. (Actually it is possible, but in a very kludgy way which I do not recommend: <data condition="is a=country">{{#ifeq:{{#data:capital}}|Paris|{{#data:}}}}</data>.) It should be trivial to do this, but I haven't thought of an actual syntax for it.
  • Functions. They would look like keys, but would actually return the value of a calculation. For example, {{#data:Paris->first letter}} would return 'P'.
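
For the "read data backwards" item, the lookup itself is simple once a syntax exists. The sketch below searches an in-memory list of triplets by value alone; the function sketch_find_by_value is hypothetical and says nothing about what the eventual wiki syntax should look like.

```php
<?php
// Stand-in for the data table.
$triples = [
    ['France', 'capital', 'Paris'],
    ['France', 'is a', 'country'],
    ['Paris', 'is a', 'city'],
    ['Paris', 'population', '2144700'],
];

// Find every item/key pair whose value equals $value.
function sketch_find_by_value(array $triples, string $value): array {
    $hits = [];
    foreach ($triples as [$item, $key, $v]) {
        if ($v === $value) {
            $hits[] = [$item, $key];
        }
    }
    return $hits;
}

// Which item/key pairs point at Paris?
foreach (sketch_find_by_value($triples, 'Paris') as [$item, $key]) {
    echo $item, '->', $key, "\n";
}
```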

Source

Data.php

Note: you should edit the article and take the source from there.

<?php
# MediaWiki Data extension v0.2
#
# Modelled after ParserFunctions extension at
# http://meta.wikimedia.org/wiki/ParserFunctions
#
# Copyright © 2006-2008 Nikola Smolenski <smolensk@eunet.yu>
# With additions by Wikimedians Patrick and Olenz
#
# Released under GNU LGPL
#
# To install, copy the extension to your extensions directory, create 
# the necessary table in your wiki's database by using data.sql, and add line
# include("extensions/Data.php");
# to the bottom of your LocalSettings.php
#
# For more information see its page at
# http://www.mediawiki.org/wiki/Extension:Data

if ( !defined( 'MEDIAWIKI' ) ) {
	die("No no!");
}
 
$wgExtensionFunctions[]="wfDataExtension";
$wgExtensionCredits['parserhook'][] = array(
	'name' => 'Data Extension',
	'description' => 'enables getting and setting of data in articles',
	'url' => 'http://www.mediawiki.org/wiki/Extension:Data',
	'author' => 'Nikola Smolenski',
	'version' => '0.2',
);
 
function wfDataExtension() {
	global $IP,$wgParser,$wgHooks,$wgExtData,$wgMessageCache;
 
	require_once($IP."/includes/SpecialPage.php");
	$wgExtData=new ExtData();
 
	$wgHooks['LanguageGetMagic'][] = 'wfDataLanguageGetMagic';
	$wgMessageCache->addMessage( 'joindata', 'Joindata' );
	$wgParser->mFunctionSynonyms[0]["#data"]="data";
	$wgParser->mFunctionSynonyms[0]["#sort"]="sort";
	$wgParser->setFunctionHook("data",array(&$wgExtData,"get"));
	$wgParser->setFunctionHook("sort",array(&$wgExtData,"sort"));
	$wgHooks['ArticleSaveComplete'][] = array($wgExtData,"set"); //Complete
	$wgParser->setHook("data",array(&$wgExtData,"data"));
	$wgParser->setHook("setdata",array($wgExtData,"setData"));
	SpecialPage::addPage(new SpecialPage('Data','',TRUE,'DataExtensionSpecialShow',TRUE));
	SpecialPage::addPage(new SpecialPage('Joindata','',TRUE,'DataExtensionSpecialJoin',TRUE));
}
 
function wfDataLanguageGetMagic( &$magicWords, $langCode ) {
	switch ( $langCode ) {
		default:
			$magicWords['data']          = array( 0, 'data' );
			$magicWords['sort']          = array( 0, 'sort' );
			break;
	}
	return true;
}
 
class ExtData {
	var $title;
	var $magic;
 
	function ExtData() {
		$this->title=array();
		$this->magic="j49w83xFdEj9pR84cj8T9pxY5p2UxjNspwV94cjDfTxqYwI";
	}
 
	function get(&$parser,$desc="",$default="") {
		list($lang,$item,$key)=$this->parseDesc($desc);
		if($key=="")
			return $item;
//		echo "$lang|$item|$key";
		$res=mysql_query("SELECT `value` FROM data_extension WHERE `item`='".
			mysql_real_escape_string($item).
			"' AND `key`='".
			mysql_real_escape_string($key)."'");
		list($r)=mysql_fetch_row($res);
		if($r=="") $r=$default;
		return array($r,"noparse"=>TRUE);
	}
 
	function sort(&$parser,$sort="",$order="") {
		return array(
		"<!-- {$this->magic}$sort{$this->magic}".(($order=="descending"||$order=="desc")?"DESC":"ASC")."{$this->magic} -->",
		"noparse"=>TRUE);
	}
 
	function set(&$article, &$user, &$text, &$summary, &$minoredit, &$watchthis, &$sectionanchor, &$flags) {
		$leaveout=preg_split("'(<nowiki>.*</nowiki>|<!--.*-->)'siU",$text,-1,PREG_SPLIT_DELIM_CAPTURE);
		foreach($leaveout as $k=>$v) {
			if(!($k%2)) {
				$v=preg_split("'(<setdata>.*</setdata>)'siU",$v,-1,PREG_SPLIT_DELIM_CAPTURE);
				foreach($v as $k1=>$v1) {
					if($k1%2) {
						// Parse data
						$v1=explode("\n",substr($v1,9,-10)); // Removing <setdata></setdata>
						foreach($v1 as $v2) {
							if(($pos=strpos($v2,"="))!==FALSE) {
								list($lang,$item,$key)=$this->parseDesc(trim(substr($v2,0,$pos)));
								$value=trim(substr($v2,$pos+1));
								// Escape everything that goes into the SQL statements
								$item=mysql_real_escape_string($item);
								$key=mysql_real_escape_string($key);
								$value_es=mysql_real_escape_string($value);
								$res=mysql_query("SELECT `value` FROM data_extension WHERE `item`='$item' AND `key`='$key'");
								if(mysql_num_rows($res)) {
									// Pair already stored: update only if the value changed
									list($val)=mysql_fetch_row($res);
									if($val!=$value) {
										mysql_query("UPDATE data_extension SET `value`='$value_es' WHERE `item`='$item' AND `key`='$key'");
									}
								} else {
									mysql_query("INSERT INTO data_extension VALUES ('$item','$key','$value_es')");
								}
							}
						}
					}
				}
			}
		}
		return true;
	}
 
	function data($text, $param=array(), $parser=null) {
		if(!empty($param['condition'])) {
			// Split at the first = only, so that the value may itself contain =
			$condition=explode("=",$param['condition'],2);
			$res=mysql_query(
			"SELECT item FROM data_extension WHERE
			`key`='".mysql_real_escape_string($condition[0])."' AND
			`value`='".mysql_real_escape_string($condition[1])."'");
		} else {
			$res=mysql_query("SELECT item FROM data_extension");
		}
		$ret=array();
		while(($r=mysql_fetch_assoc($res))!==FALSE) {
			array_unshift($this->title,$r['item']);
			$t=$parser->Parse($text,$parser->mTitle,$parser->mOptions,TRUE,FALSE);
			$ret[]=$t->getText();
			array_shift($this->title);
		}
		usort($ret,"DataExtensionCompare");
		foreach($ret as $k=>$v)
			$ret[$k]=preg_replace("/<!--.*$this->magic.*-->/sU","",$v);
		return implode("",$ret);
	}
 
	function setData() {
		return "";
	}
 
	function specialShow($item) {
		global $wgOut,$wgRequest,$wgTitle;
 
		$wgOut->setPageTitle("Data");
 
		if($item=="")
			$item=$wgRequest->getVal("item");
		if($item=="") {
			$wgOut->addHTML(
'<form action="'.$wgTitle->escapeLocalUrl().'" method="post">
	Item: <input type="text" name="item" />
	<input type="submit" value="Go" />
</form>');
		} else {
			$res=mysql_query("SELECT `key`,`value` FROM data_extension WHERE `item`='".
				mysql_real_escape_string($item)."'");
			$txt="===$item===\n";
			while(($r=mysql_fetch_row($res))!==FALSE) {
				$txt.="* {$r[0]}''':''' {$r[1]}\n";
			}
			$wgOut->addWikiText($txt);
		}
	}
 
	function specialJoin($args) {
		global $wgOut,$wgRequest,$wgTitle;
 
		if(!strpos($args,"-=-")) {
			$args=$wgRequest->getVal("join")."-=-".$wgRequest->getVal("name");
		}
		if($args=="-=-") {
			$wgOut->setPageTitle("Join data");
			$wgOut->addHTML(
'<form action="'.$wgTitle->escapeLocalUrl().'" method="post">
	Join: <input type="text" name="join" /> with
	Name: <input type="text" name="name" />
	<input type="submit" value="Go" />
</form>');
		} else {
			list($join,$name)=explode("-=-",$args);
			$name=strtr($name,'_',' ');
			$this->title[0]=$name;
			$wgOut->setPageTitle("Join of $join with $name");
			$rev = Revision::newFromTitle(Title::newFromText($join));
			if($rev) {
				$wgOut->addWikiText($rev->getText());
			}
		}
	}
 
	function parseDesc($desc) {
		global $wgTitle,$wgContLanguageCode;
 
		if(!count($this->title))
			$this->title[0]=$wgTitle->mTextform;
 
		// Find language
		if(($pos=strpos($desc,"::"))!==FALSE) {
			$lang=trim(substr($desc,0,$pos));
			$desc=substr($desc,$pos+2); // strip the Lang:: prefix so that it does not end up in the item or key
		} else {
			$lang=$wgContLanguageCode;
		}
 
		// Find item and key
		$desc=strtr($desc,array("-&gt;"=>"->")); // Ugly hack
		if(($pos=strpos($desc,"->"))!==FALSE) {
			$item=trim(substr($desc,0,$pos));
			$key=trim(substr($desc,$pos+2));
		} else {
			$item=$this->title[0];
			$key=trim($desc);
		}
 
//		echo "$lang|$item|$key<br>\n";
 
		return array($lang,$item,$key);
	}
}
 
function DataExtensionSpecialShow($item) {
	global $wgExtData;
	$wgExtData->specialShow($item);
}
 
function DataExtensionSpecialJoin($args) {
	global $wgExtData;
	$wgExtData->specialJoin($args);
}
 
function DataExtensionCompare($a,$b) {
	global $wgExtData;
 
	$pat="/{$wgExtData->magic}(.*){$wgExtData->magic}(ASC|DESC){$wgExtData->magic}/sU";
	if(preg_match($pat,$a,$akey) && preg_match($pat,$b,$bkey)) {
		if($akey[2]=="ASC") {
			return strnatcmp($akey[1],$bkey[1]);
		} else {
			return strnatcmp($bkey[1],$akey[1]);
		}
	} else {
		return strnatcmp($a,$b);
	}
}
 
?>

Data.sql

USE wikidb;
CREATE TABLE `data_extension` (
  `item` mediumtext NOT NULL,
  `key` mediumtext NOT NULL,
  `value` mediumtext NOT NULL
);

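You may wish to create indexes over some of the three columns. Because the columns are of type mediumtext, MySQL requires a prefix length in the index definition.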

See also

Example uses