Extension:Data Aggregator

From MediaWiki.org
MediaWiki extensions manual
Data Aggregator
Release status: experimental
Implementation: Data extraction
Description: This extension enables you to aggregate the data automatically from different pages that conform to a unified template.
Author(s): Weiming Li
Latest version: 0.31 (2013-08-04)
PHP: 5.0
License: GNU General Public License 2.0 or later
Download: Source code

What can this extension do?

The Data Aggregator extension lets users aggregate data from multiple pages that all use the same template. It pulls the specified fields from each page and composes them into a single table, which makes it useful for comparison tables and other automated information aggregation.

For example, suppose you have a template like this:

Product Name: {{{Product Name}}}
CPU: {{{CPU}}}
Memory: {{{Memory}}}
Hard drive: {{{Hard drive}}}
Display: {{{Display}}} 

Two pages, Page1 and Page2, use this template:


Product Name: PC1
CPU: 1.8G
Memory: 2G
Hard drive: 250GB
Display: LCD


Product Name: PC2
CPU: 1.5G
Memory: 1G
Hard drive: 320GB
Display: CRT
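Each of these pages stores its values as a template call in its wikitext, for example {{My Template|Product Name=PC1|CPU=1.8G|...}}, assuming the template is named My Template as in the examples further down. The sketch below roughly mirrors how the source code on this page splits such a call into field/value pairs. The function name daParseTemplateCall is hypothetical, and like the extension it does not handle nested templates or a "|" inside a value.

```php
<?php
// Hypothetical helper: split one template call such as
// "{{My Template|CPU=1.8G|Memory=2G}}" into field => value pairs.
// Nested templates and "|" inside values are not handled.
function daParseTemplateCall($call)
{
    $fields = array();
    // Strip the surrounding "{{" and "}}", then split on the pipe.
    $parts = explode("|", substr($call, 2, -2));
    // The first part is the template name, not a field.
    unset($parts[0]);
    foreach ($parts as $part) {
        $eq = strpos($part, "=");
        if ($eq === false) {
            continue; // positional (unnamed) parameters are ignored
        }
        $key = trim(substr($part, 0, $eq));
        $fields[$key] = trim(substr($part, $eq + 1));
    }
    return $fields;
}

$pc1 = daParseTemplateCall(
    "{{My Template|Product Name=PC1|CPU=1.8G|Memory=2G|Hard drive=250GB|Display=LCD}}"
);
print_r($pc1);
```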

With Data Aggregator, you can combine the fields of both pages into one table like this:

              Page1   Page2
Product Name  PC1     PC2
CPU           1.8G    1.5G
Memory        2G      1G
Hard drive    250GB   320GB
Display       LCD     CRT
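The table is a pivot of the per-page field arrays: each page becomes a column, and each field name becomes a row in the order it is first seen. Below is a minimal sketch of that step using the two example pages typed in by hand. The helper name daRowTitles is hypothetical, and the sketch is not the extension's actual code.

```php
<?php
// Data from the example: page => (field => value).
$data = array(
    "Page1" => array("Product Name" => "PC1", "CPU" => "1.8G", "Memory" => "2G",
                     "Hard drive" => "250GB", "Display" => "LCD"),
    "Page2" => array("Product Name" => "PC2", "CPU" => "1.5G", "Memory" => "1G",
                     "Hard drive" => "320GB", "Display" => "CRT"),
);

// Hypothetical helper: collect every field name in first-seen order.
function daRowTitles(array $data)
{
    $titles = array();
    foreach ($data as $fields) {
        foreach (array_keys($fields) as $name) {
            if (!in_array($name, $titles, true)) {
                $titles[] = $name;
            }
        }
    }
    return $titles;
}

// Pivot: one row per field, one cell per page.
// A page that lacks a field gets an empty cell.
$rows = array();
foreach (daRowTitles($data) as $title) {
    $cells = array();
    foreach ($data as $fields) {
        $cells[] = isset($fields[$title]) ? $fields[$title] : "";
    }
    $rows[$title] = $cells;
}

foreach ($rows as $title => $cells) {
    echo str_pad($title, 14), implode("  ", $cells), "\n";
}
```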

Simple Usage


<da template='TemplateName'></da>

You must specify the template name. It must match the template's page name exactly, without the Template: prefix, and it is case-sensitive.

For example, if your template is called 'Template:My Template' and its page is http://mymediawiki.com/Template:My_Template, write the name as it appears in the URL, with underscores instead of spaces:

<da template='My_Template'></da>
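The extension uses the name in two forms. It looks the name up as given in MediaWiki's templatelinks table, where titles are stored with underscores. It also converts underscores to spaces before searching the page wikitext for the template call. The sketch below shows the two conversions. The helper names are hypothetical.

```php
<?php
// Hypothetical helpers illustrating the two name forms.

// Database form (templatelinks.tl_title): spaces become underscores.
function daDbKey($name)
{
    return str_replace(" ", "_", trim($name));
}

// Wikitext form ({{My Template|...}}): underscores become spaces.
function daWikitextName($name)
{
    return str_replace("_", " ", trim($name));
}

$option = "My_Template"; // value of the template='...' attribute
echo daDbKey($option), "\n";        // My_Template
echo daWikitextName($option), "\n"; // My Template

// The comparison is case-sensitive, so "my_Template" is a different name.
var_dump(daDbKey("my_Template") === daDbKey("My Template")); // bool(false)
```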

Advanced Usage

Use the rowtitles option to choose which rows are displayed in the table:

<da template='My_Template' rowtitles='Name|Address'></da>

Separate row titles with a | (vertical bar, or pipe). If you omit this option, every field found in the template calls is shown as a row.
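The rowtitles value is split on | and used as the list of rows, in the order given. Without it, every field name found on the pages becomes a row. The sketch below shows that selection logic. The function name is hypothetical, and the trimming of whitespace and empty entries is an assumption of the sketch.

```php
<?php
// Hypothetical helper: choose the table rows.
// $option is the rowtitles attribute, or null if it is absent;
// $seen lists every field name found on the aggregated pages.
function daSelectRows($option, array $seen)
{
    if ($option === null) {
        return $seen; // no rowtitles option: show all rows
    }
    // Split on the pipe, trim each title and drop empty entries.
    $rows = array_map("trim", explode("|", $option));
    return array_values(array_filter($rows, "strlen"));
}

$seen = array("Name", "Address", "Phone");
print_r(daSelectRows("Name|Address", $seen)); // Name, Address
print_r(daSelectRows(null, $seen));           // Name, Address, Phone
```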

Many pages may use the same template while belonging to different categories. To aggregate data only from pages in a specific category, use the category option:

<da template='My_Template' category='My_Category'></da>

Note: Write the category name the same way as the template name. It is case-sensitive and uses underscores instead of spaces. For a category named 'My Category', set category='My_Category'.
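When category is set, the extension restricts its database query to pages whose categorylinks.cl_to value equals the given name. That column stores category names in the underscore form. The in-memory sketch below applies the same exact-match filter to hypothetical page records. It illustrates the rule and is not the extension's SQL.

```php
<?php
// Hypothetical page records: title => categories (underscore form, as in cl_to).
$pages = array(
    "PC1"    => array("My_Category", "Desktops"),
    "PC2"    => array("Desktops"),
    "Laptop" => array("My_Category"),
);

// Keep only pages in the requested category. The comparison is exact,
// so "My Category" (with a space) or "my_Category" matches nothing.
function daPagesInCategory(array $pages, $category)
{
    $result = array();
    foreach ($pages as $title => $categories) {
        if (in_array($category, $categories, true)) {
            $result[] = $title;
        }
    }
    return $result;
}

print_r(daPagesInCategory($pages, "My_Category")); // PC1, Laptop
print_r(daPagesInCategory($pages, "My Category")); // empty array
```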

You can also style the table with the standard HTML table attributes border, cellspacing, cellpadding, class, align and style. For example:

<da template='My_Template' style="width: 100%; background-color: white" cellspacing="0" cellpadding="2" border="1" rowtitles='Name|Address'></da>
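Every attribute other than template, rowtitles and category is copied onto the generated <table> element. The source code also skips a key named fcc. The sketch below shows that pass-through. The function name is hypothetical. As a precaution, the sketch escapes keys and values with htmlspecialchars.

```php
<?php
// Hypothetical helper: build the opening <table> tag from the <da> attributes.
function daTableOpenTag(array $argv)
{
    // Options consumed by the extension itself, never passed to HTML.
    $reserved = array("template", "rowtitles", "category", "fcc");
    $tag = "<table";
    foreach ($argv as $key => $val) {
        if (in_array($key, $reserved, true)) {
            continue;
        }
        // Escape so a value cannot break out of its attribute.
        $tag .= " " . htmlspecialchars($key) . "=\"" . htmlspecialchars($val) . "\"";
    }
    return $tag . ">";
}

echo daTableOpenTag(array(
    "template"    => "My_Template",
    "border"      => "1",
    "cellspacing" => "0",
    "style"       => "width: 100%",
)), "\n";
// <table border="1" cellspacing="0" style="width: 100%">
```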

Any text between <da> and </da> appears above the generated table.

Installation Instructions

To install this extension, copy the source code below, save it as data_aggregator.php, and put it in your MediaWiki extensions folder, e.g. /var/lib/mediawiki/extensions.

Add the following line to LocalSettings.php:

require_once("$IP/extensions/data_aggregator.php");

Caching Note

MediaWiki caches rendered pages, so a change to a template value on one of the source pages does not immediately show up in the Data Aggregator table. To force an update, open the page containing the table and either save it unchanged from the edit view, or change "edit" to "purge" in the edit URL. A small "update" link that purges the page would be a useful addition to this extension.

Note: Without additional cache management, the table always shows the data as they were the last time someone saved or purged the page containing it. This makes the extension problematic in practice.
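One way to add the "update" link suggested above is to place a link with action=purge, a standard MediaWiki page action, below the table. The sketch below builds that link. The helper name and the page title "PC Comparison" are hypothetical. $script stands for the wiki's $wgScript value, for example /w/index.php.

```php
<?php
// Hypothetical helper: HTML for an "update" link that purges the cached page.
// $script is the wiki's $wgScript path; $title is the page holding the <da> table.
function daPurgeLink($script, $title)
{
    $url = $script . "?title=" . urlencode($title) . "&action=purge";
    return "<a href=\"" . htmlspecialchars($url) . "\">update</a>";
}

echo daPurgeLink("/w/index.php", "PC Comparison"), "\n";
// <a href="/w/index.php?title=PC+Comparison&amp;action=purge">update</a>
```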


This program is licensed under the GNU GPL v2 or later. See http://www.gnu.org/licenses/gpl-2.0.html for the license text.


<?php
/**
 * Data Aggregator - A MediaWiki Extension
 * Author: Weiming Li (weiming66@gmail.com)
 * Date: 2011-11-02
 * Version: 0.3
 *
 * Data Aggregator allows users to aggregate data from different pages that
 * use the same template. It pulls the specified data fields from those pages
 * and composes them into a single table, which is useful for comparison tables
 * and automated information aggregation.
 * A portion of this code is based directly on the extension TemplateTable from C. Shaun Wagner.
 *
 *   Copyright (C) 2010  Weiming Li
 *
 *   This program is free software: you can redistribute it and/or modify
 *   it under the terms of the GNU General Public License as published by
 *   the Free Software Foundation, either version 2 of the License, or
 *   (at your option) any later version.
 *
 *   This program is distributed in the hope that it will be useful,
 *   but WITHOUT ANY WARRANTY; without even the implied warranty of
 *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *   GNU General Public License for more details.
 *
 *   You should have received a copy of the GNU General Public License
 *   along with this program.  If not, see <http://www.gnu.org/licenses/>.
 */

// Register the tag.
$wgExtensionFunctions[] = "wfDataAggregator";

// Extension credits that will show up on Special:Version
$wgExtensionCredits['other'][] = array(
	'path'         => __FILE__,
	'name'         => 'Data Aggregator',
	'version'      => '0.31',
	'author'       => 'Weiming Li',
	'url'          => 'http://www.mediawiki.org/wiki/Extension:DataAggregator',
	'description'  => 'This extension enables you to aggregate data automatically from different pages that conform to one or multiple wiki templates.',
	'license-name' => 'GPL-2.0+',
);

function wfDataAggregator() {
	global $wgParser;
	$wgParser->setHook( "da", "renderTable" );
}

/**
 * This function renders the table.
 * $input = text between <da> and </da>
 * $argv = key/val array of options in the <da> tag.
 */
function renderTable( $input, $argv ) {
	global $wgScript;

	// Build the SQL condition and the regular expression for the template(s).
	// Several templates can be given, separated by "|".
	$sqlParts = array();
	$regParts = array();
	if ( isset( $argv["template"] ) ) {
		foreach ( explode( "|", $argv["template"] ) as $value ) {
			$value = trim( $value );
			if ( $value == "" ) continue;
			// templatelinks stores titles with underscores, as given in the option
			$sqlParts[] = "tl_title='" . mysql_real_escape_string( $value ) . "'";
			// Template calls in wikitext are written with spaces
			$regParts[] = preg_quote( str_replace( "_", " ", $value ), '/' );
		}
	}
	// Check whether a template is given.
	if ( count( $sqlParts ) == 0 ) {
		return "Error: No template given.  Please use the format <tt>&lt;da template='Name Of Template'&gt;...&lt;/da&gt;</tt>";
	}
	$temp_sql = implode( " or ", $sqlParts );
	// Matches "{{Template Name|...}}"; nested templates are not supported
	$reg_exp = '/{{ *(' . implode( "|", $regParts ) . ') *\|[^{]*}}/';

	// Header type. If "rowtitles" is given, use that.
	// Otherwise, use dynamic row titles (every field used in the template).
	$dynhead = true;
	$rowtitles = array();
	if ( isset( $argv["rowtitles"] ) ) {
		$rowtitles = array_map( "trim", explode( "|", $argv["rowtitles"] ) );
		$dynhead = false;
	}

	// Preset output to the input (stuff between <da> and </da>)
	$output = $input;

	global $wgDBprefix, $wgContLang;
	// $wgContLang->namespaceNames is a protected attribute, so use the accessor instead
	$namespaceNames = $wgContLang->getNamespaces();
	$data = array();

	$tl = $wgDBprefix . "templatelinks";
	$pg = $wgDBprefix . "page";
	$query = "select * from $tl left join $pg on $tl.tl_from=$pg.page_id"
		. " where tl_namespace=" . NS_TEMPLATE . " and (" . $temp_sql . ")";
	// If a category is set, only keep pages that belong to the specified category
	if ( isset( $argv["category"] ) ) {
		$query .= " and $pg.page_id in (select cl_from from " . $wgDBprefix . "categorylinks"
			. " where cl_to='" . mysql_real_escape_string( $argv["category"] ) . "')";
	}
	$query .= " order by page_title";

	// Execute the SQL query (the mysql_* functions were removed in PHP 7)
	$result = mysql_query( $query )
		or die( "Query failed: " . mysql_error() . " Actual query: " . $query );

	while ( $row = mysql_fetch_object( $result ) ) {
		// Fetch the text of the latest revision of this page
		$q2 = "select rev_text_id from " . $wgDBprefix . "revision where rev_page=" . intval( $row->page_id ) . " order by rev_timestamp desc limit 1";
		if ( ( $res2 = mysql_query( $q2 ) ) && ( $row2 = mysql_fetch_object( $res2 ) ) ) {
			$q3 = "select * from " . $wgDBprefix . "text where old_id=" . intval( $row2->rev_text_id );
			if ( ( $res3 = mysql_query( $q3 ) ) && ( $row3 = mysql_fetch_object( $res3 ) ) ) {
				$text = str_replace( "\n", " ", $row3->old_text );	# turn the article into one long line
				preg_match_all( $reg_exp, $text, $matches );

				// Fields from all matching template calls on this page go into one column
				$item = array();

				// Parse the content of each transclusion {{Template Name|...}}
				foreach ( $matches[0] as $value ) {
					// Create an array of the template fields
					$kvs = explode( "|", substr( $value, 2, -2 ) );
					// Remove the first field: the template name
					unset( $kvs[0] );

					foreach ( $kvs as $kv ) {
						$kv = trim( $kv );
						if ( $kv == "" ) continue;
						$eq = strpos( $kv, "=" );
						if ( $eq === false ) continue;
						$field = trim( substr( $kv, 0, $eq ) );
						$val = trim( substr( $kv, $eq + 1 ) );
						$item[$field] = $val;
						if ( $dynhead && !in_array( $field, $rowtitles ) ) array_push( $rowtitles, $field );
					}
				}
				if ( sizeof( $item ) > 0 ) {
					$title = str_replace( "_", " ", $row->page_title );
					if ( $row->page_namespace != NS_MAIN ) {
						$title = $namespaceNames[$row->page_namespace] . ":" . $title;
					}
					$data[$title] = $item;
				}
			}
		}
	}

	// Skip this if there's no data to display.
	// Otherwise, create the table.
	if ( sizeof( $data ) > 0 ) {
		$output .= "<table";

		// Pass the remaining options through as table attributes
		foreach ( $argv as $key => $val ) {
			if ( $key == "template" || $key == "rowtitles" || $key == "category" || $key == "fcc" ) continue;
			$output .= " " . htmlspecialchars( $key ) . "=\"" . htmlspecialchars( $val ) . "\"";
		}
		$output .= ">\n";
		$output .= "<tr bgcolor=#cccccc><th>&nbsp;</th>\n";

		foreach ( $data as $page => $item ) {
			// Use the page name without a leading namespace (Category:/Help:/...) as link text
			$link = $page;
			$colon = strpos( $link, ":" );
			if ( $colon !== false ) $link = substr( $link, $colon + 1 );

			// Changing this line will change the text that appears as the article link:
			// $link = "article";

			// Compose the table headers; the header cell style can be modified right here
			$output .= "<th bgcolor=#cccccc><a href='" . $wgScript . "?title=" . urlencode( $page ) . "'>" . htmlspecialchars( $link ) . "</a></th>\n";
		}
		$output .= "</tr>\n";

		// Table rows
		foreach ( $rowtitles as $rowtitle ) {
			// Compose the row title
			$output .= "<tr><td bgcolor=#cccccc><b>" . htmlspecialchars( $rowtitle ) . "</b></td>\n";

			// Table content cells; an empty cell if the page lacks this field
			foreach ( $data as $page => $item ) {
				if ( isset( $item[$rowtitle] ) ) {
					$output .= "<td align='center'>" . htmlspecialchars( $item[$rowtitle] ) . "</td>\n";
				} else {
					$output .= "<td>&nbsp;</td>\n";
				}
			}

			// End of one row
			$output .= "</tr>\n";
		}

		$output .= "</table>\n";
	} else {
		// No data found
		$output .= "Template " . htmlspecialchars( $argv["template"] ) . " has no data. Please double check your template name spelling.";
	}

	return $output;
}

See also