Extension:Data Aggregator
| Data Aggregator | Release status: experimental |
|---|---|
| Implementation | Data extraction |
| Description | This extension enables you to aggregate the data automatically from different pages that conform to a unified template. |
| Author(s) | Weiming Li (weimingtalk) |
| Last version | 0.3 (2011/11/02) |
| PHP | 5.0 |
| License | GNU GPL v2 |
| Download | Source Code |
What can this extension do?
The Data Aggregator extension lets users aggregate data from different pages, provided those pages all use a common template. It pulls the specified data fields from those pages and composes them into a single table. This is useful for comparison tables and automated information aggregation.
For example, you have a template like this:
Product Name: {{{Product Name}}}
CPU: {{{CPU}}}
Memory: {{{Memory}}}
Hard drive: {{{Hard drive}}}
Display: {{{Display}}}
Two pages use this template:
Page1:
Product Name: PC1 CPU: 1.8G Memory: 2G Hard drive: 250GB Display: LCD
Page2:
Product Name: PC2 CPU: 1.5G Memory: 1G Hard drive: 320GB Display: CRT
With Data Aggregator, you can aggregate the data items of the two pages into one table like this:
| | Page1 | Page2 |
|---|---|---|
| Product Name | PC1 | PC2 |
| CPU | 1.8G | 1.5G |
| Memory | 2G | 1G |
| Hard drive | 250GB | 320GB |
| Display | LCD | CRT |
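The aggregation shown above can be sketched in a few lines of PHP. This is a minimal illustration, not the extension's actual code path: the helper names `daSketchParseFields` and `daSketchPivot` and the template name `PC` are assumptions made for the example, and nested templates or pipes inside links are not handled.

```php
<?php
// Hypothetical helper: extract "key=value" fields from one template call,
// for example "{{PC|CPU=1.8G|Memory=2G}}".
function daSketchParseFields($call) {
    $fields = array();
    // Strip the surrounding braces, then split on the pipe character.
    $parts = explode("|", substr(trim($call), 2, -2));
    array_shift($parts); // drop the template name
    foreach ($parts as $part) {
        $eq = strpos($part, "=");
        if ($eq === false) {
            continue; // positional parameters are ignored
        }
        $fields[trim(substr($part, 0, $eq))] = trim(substr($part, $eq + 1));
    }
    return $fields;
}

// Hypothetical helper: pivot "page => fields" into "row title => (page => value)".
function daSketchPivot($pages) {
    $rows = array();
    foreach ($pages as $page => $fields) {
        foreach ($fields as $key => $val) {
            $rows[$key][$page] = $val;
        }
    }
    return $rows;
}

// The two example pages; "PC" is an assumed template name.
$pages = array(
    "Page1" => daSketchParseFields("{{PC|Product Name=PC1|CPU=1.8G|Memory=2G|Hard drive=250GB|Display=LCD}}"),
    "Page2" => daSketchParseFields("{{PC|Product Name=PC2|CPU=1.5G|Memory=1G|Hard drive=320GB|Display=CRT}}"),
);
$rows = daSketchPivot($pages);
```

Each entry of `$rows` corresponds to one row of the table: the key is the row title and the value maps page names to cell contents.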
Simple Usage
Sample:
<da template='TemplateName'></da>
You must specify the template name in the tag. The name must match the template's page name, with spaces replaced by underscores, and it is case-sensitive.
For example, if your template is called 'Template:My Template' and its page is http://mymediawiki.com/Template:My_Template, specify the template name like this:
<da template='My_Template'></da>
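The naming rule can be expressed as a small conversion. The sketch below assumes a hypothetical helper, `daSketchTemplateOption`, that is not part of the extension; it only shows how a template page title maps to the value expected by the template option.

```php
<?php
// Hypothetical helper: convert a template page title into the value
// to pass in the template option of the <da> tag.
function daSketchTemplateOption($pageTitle) {
    // Drop a leading "Template:" namespace prefix if present.
    if (strpos($pageTitle, "Template:") === 0) {
        $pageTitle = substr($pageTitle, strlen("Template:"));
    }
    // Spaces become underscores; letter case is left untouched.
    return str_replace(" ", "_", trim($pageTitle));
}

$tag = "<da template='" . daSketchTemplateOption("Template:My Template") . "'></da>";
```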
Advanced Usage
You can choose which row titles are displayed in the table with the rowtitles option:
<da template='My_Template' rowtitles='Name|Address'></da>
A | (vertical bar/pipe) separates each row title. If you do not supply this option, all rows will be shown.
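As a rough sketch of this behavior, the function below selects rows either from an explicit option value or, when none is given, from every field found on the pages. The function name `daSketchSelectRows` and the sample data are made up for illustration. Like the extension, the sketch splits on the pipe without trimming, so avoid spaces around the separator.

```php
<?php
// Hypothetical helper: build "row title => (page => value)" for the chosen rows.
// $option is the raw rowtitles value, or null when the option is absent.
function daSketchSelectRows($option, $data) {
    if ($option !== null) {
        $titles = explode("|", $option);
    } else {
        // No option: collect every field name in order of first appearance.
        $titles = array();
        foreach ($data as $fields) {
            foreach (array_keys($fields) as $key) {
                if (!in_array($key, $titles, true)) {
                    $titles[] = $key;
                }
            }
        }
    }
    $rows = array();
    foreach ($titles as $title) {
        foreach ($data as $page => $fields) {
            // A page that lacks the field gets an empty cell.
            $rows[$title][$page] = isset($fields[$title]) ? $fields[$title] : "";
        }
    }
    return $rows;
}

// Made-up sample data for two pages.
$data = array(
    "Page1" => array("Name" => "Alice", "Address" => "1 Main St", "Phone" => "555-0100"),
    "Page2" => array("Name" => "Bob", "Phone" => "555-0101"),
);
$picked = daSketchSelectRows("Name|Address", $data);
$all = daSketchSelectRows(null, $data);
```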
When many pages use the same template but belong to different categories, you can restrict the aggregation to a single category:
<da template='My_Template' category='My_Category'></da>
Note: Specify the category name the same way as the template name: it is case-sensitive and must not contain spaces. If your category is named 'My Category', pass the value 'My_Category' to the category option.
You can also customize the table with the standard table attributes: border, cellspacing, cellpadding, class, align, and style. For example:
<da template='My_Template' style="width: 100%; background-color: white" cellspacing="0" cellpadding="2" border="1" rowtitles='Name|Address'></da>
Any text between <da> and </da> will appear ABOVE the table that is produced.
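To illustrate how extra attributes can end up on the generated table, here is a minimal sketch that copies every option except the extension's own ones onto the opening table tag. The helper name `daSketchTableOpenTag` is hypothetical, and the `htmlspecialchars` escaping is an addition made for this example.

```php
<?php
// Hypothetical helper: build the opening <table> tag from the tag options.
function daSketchTableOpenTag($argv) {
    // Options consumed by the extension itself are not table attributes.
    $reserved = array("template", "rowtitles", "category");
    $tag = "<table";
    foreach ($argv as $key => $val) {
        if (in_array($key, $reserved, true)) {
            continue;
        }
        // Escaping the value is an assumption of this sketch.
        $tag .= " " . $key . "=\"" . htmlspecialchars($val) . "\"";
    }
    return $tag . ">";
}

$open = daSketchTableOpenTag(array(
    "template"  => "My_Template",
    "style"     => "width: 100%",
    "border"    => "1",
    "rowtitles" => "Name|Address",
));
```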
Installation Instructions
To install this extension, save the source code as data_aggregator.php and put it in the MediaWiki extensions folder, e.g. /var/lib/mediawiki/extensions. Then add the following to LocalSettings.php:
require_once("extensions/data_aggregator.php");
Caching Note
MediaWiki caches rendered pages, so a change to a template value does not show up immediately in the Data Aggregator table. You can force an update by opening the page that contains the table, clicking edit, and then either saving the page unchanged or changing "edit" to "purge" in the URL. It would be nice if this extension added a small "update" link that purges the page.
Note: Without additional cache management, the extension always shows the data that was present the last time someone manually saved or purged the page, which makes it problematic in practice.
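The "update" link suggested above could look roughly like the sketch below. It is not part of the extension: `daSketchPurgeLink` is a hypothetical helper, and the script path `/index.php` and page title `PC Comparison` are assumed values (inside the extension, the script path would come from `$wgScript`).

```php
<?php
// Hypothetical helper: build an "update" link that requests action=purge
// for the page containing the table.
function daSketchPurgeLink($script, $title) {
    // MediaWiki page titles use underscores in place of spaces in URLs.
    $dbKey = str_replace(" ", "_", $title);
    $url = $script . "?title=" . urlencode($dbKey) . "&action=purge";
    // Escape the URL for use inside an HTML attribute.
    return "<a href=\"" . htmlspecialchars($url) . "\">update</a>";
}

$link = daSketchPurgeLink("/index.php", "PC Comparison");
```

Appending such a link to the table output would give readers a one-click way to refresh stale data.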
License
This work is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This work is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
The full text of the GNU GPL v2 is available at http://www.gnu.org/licenses/gpl-2.0.html
Code
<?php
/*
 * Data Aggregator - A MediaWiki Extension
 *
 * Author: Weiming Li (weiming66@gmail.com)
 * Date: 2011-11-02
 * Version: 0.3
 *
 * The extension: Data Aggregator allows users to aggregate data from different pages,
 * those pages must be conformed to a unified template. Data Aggregator will pull the
 * specific data field from those pages and compose it into a single table. This
 * extension can be widely used in comparison table and automated information aggregation.
 *
 * A portion of this code is based directly on the extension TemplateTable from C. Shaun Wagner.
 *
 * Copyright (C) 2010 Weiming Li
 *
 * This program is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program. If not, see <http://www.gnu.org/licenses/>.
 *
 */

// Register the tag.
$wgExtensionFunctions[] = "wfDataAggregator";

// Extension credits that will show up on Special:Version
$wgExtensionCredits['other'][] = array(
    'path' => __FILE__,
    'name' => 'Data Aggregator',
    'version' => '0.3',
    'author' => 'Weiming Li',
    'url' => 'http://www.mediawiki.org/wiki/Extension:DataAggregator',
    'description' => 'This extension enables you to aggregate the data automatically from different pages which are conformed to one or multiple wiki templates.'
);

function wfDataAggregator() {
    global $wgParser;
    $wgParser->setHook("da", "renderTable");
}

/**
 * This function renders the table.
 * $input = text between <da> and </da>
 * $argv = key/val array of options in the <da> tag.
 *
 * The mysql_* functions used here require PHP older than 7.0.
 */
function renderTable($input, $argv) {
    global $wgScript, $wgDBprefix, $wgContLang;

    // Check whether a template is given.
    if (!isset($argv["template"])) {
        return "Error: No template given. Please use the format <tt>&lt;da template='Name Of Template'&gt;...&lt;/da&gt;</tt>";
    }

    $templates = explode("|", $argv["template"]);
    $temp_sql = "";
    $temp_reg = "";
    if (count($templates) > 1) {
        // Multiple wiki templates are specified
        foreach ($templates as $value) {
            $temp_sql = $temp_sql."tl_title='".mysql_escape_string(trim($value))."' or ";
            $temp_reg = trim($value)." |".$temp_reg;
        }
        $temp_sql = substr($temp_sql, 0, -3); // take out the trailing "or "
        $temp_reg = str_replace("_", " ", substr($temp_reg, 0, -1)); // take out the spare pipe
        $reg_exp = '/{{ *('.$temp_reg.').[^{{]*}}/';
    } else {
        // Only one template is specified
        $temp_reg = str_replace("_", " ", $argv["template"]);
        $temp_sql = "tl_title='".mysql_escape_string(trim($argv["template"]))."'";
        $reg_exp = '/{{ *'.$temp_reg.'.[^{{]*}}/';
    }

    // Header type. If "rowtitles" is given, use that.
    // Otherwise, use dynamic row titles (every variable used in the template).
    $dynhead = true;
    $rowtitles = array();
    if (isset($argv["rowtitles"])) {
        $rowtitles = explode("|", $argv["rowtitles"]);
        $dynhead = false;
    }

    // Preset output to the input (stuff between <da> and </da>)
    $output = $input;

    // $wgContLang->namespaceNames is a protected attribute, so use the getter.
    $namespaceNames = $wgContLang->getNamespaces();
    $data = array();

    // Find the pages that use the template(s). The table prefix is applied to every table.
    $tl = $wgDBprefix."templatelinks";
    $pg = $wgDBprefix."page";
    $query = "select * from ".$tl." left join ".$pg." on ".$tl.".tl_from=".$pg.".page_id where (".$temp_sql.")";
    if (isset($argv["category"])) {
        // If category is set, keep only the pages that belong to that category
        $query .= " and ".$pg.".page_id in (select cl_from from ".$wgDBprefix."categorylinks where cl_to='".mysql_escape_string($argv["category"])."')";
    }
    $query .= " order by page_title";

    // Execute the SQL query
    $result = mysql_query($query) or die("Query failed: ".mysql_error()." Actual query: ".$query);
    while ($row = mysql_fetch_object($result)) {
        $q2 = "select rev_text_id from ".$wgDBprefix."revision where rev_page=".$row->page_id." order by rev_timestamp desc limit 1";
        if (($res2 = mysql_query($q2)) && ($row2 = mysql_fetch_object($res2))) {
            $q3 = "select * from ".$wgDBprefix."text where old_id=".$row2->rev_text_id;
            if (($res3 = mysql_query($q3)) && ($row3 = mysql_fetch_object($res3))) {
                // Turn the article into one long line
                $row3->old_text = str_replace("\n", " ", $row3->old_text);
                preg_match_all('/{{.[^{{]*}}/', $row3->old_text, $n);
                if (count($n[0]) == 1) {
                    // The page has only one wiki template reference
                    $matches = $n;
                } else {
                    // Multiple template references in one page
                    preg_match_all($reg_exp, $row3->old_text, $matches);
                }

                // Parse the content of each transclusion {{Template Name ...}}
                $item = array(); // reset per page so values do not leak between pages
                foreach ($matches[0] as $value) {
                    // Create an array of the template fields
                    $kvs = explode("|", substr($value, 2, -2));
                    // Remove the first field: the template name
                    unset($kvs[0]);
                    foreach ($kvs as $kv) {
                        $kv = trim($kv);
                        if ($kv == "") continue;
                        $eq = strpos($kv, "=");
                        if ($eq === false) continue;
                        $key = trim(substr($kv, 0, $eq));
                        $val = trim(substr($kv, $eq + 1));
                        $item[$key] = $val;
                        if ($dynhead && !in_array($key, $rowtitles)) array_push($rowtitles, $key);
                    }
                    if (sizeof($item) > 0) {
                        $title = str_replace("_", " ", $row->page_title);
                        if ($row->page_namespace != NS_MAIN) {
                            $title = $namespaceNames[$row->page_namespace].":".$title;
                        }
                        $data[$title] = $item;
                    }
                }
            }
        }
    }

    // Create the table only if there is data to display.
    if (sizeof($data) > 0) {
        $output .= "<table";
        // Read the table style parameters
        foreach ($argv as $key => $val) {
            if ($key == "template" || $key == "rowtitles" || $key == "category" || $key == "fcc") continue;
            $output .= " $key=\"$val\"";
        }
        $output .= ">\n";
        $output .= "<tr bgcolor=#cccccc><th> </th>\n";
        foreach ($data as $page => $item) {
            $link = $page; // page name as link title
            // Strip a leading namespace such as Category:/Wiki:/Special:
            $colon = strpos($link, ":");
            if ($colon !== false) $link = substr($link, $colon + 1);
            // Changing this line will change the text that appears as the article link:
            // $link = "article";
            // Compose the table headers; the header cell style can be modified right here
            $output .= "<th bgcolor=#cccccc><a href='".$wgScript."?title=".urlencode($page)."'>$link</a></th>\n";
        }
        $output .= "</tr>\n";

        // Table rows
        foreach ($rowtitles as $rowtitle) {
            // Compose the row title
            $output .= "<tr><td bgcolor=#cccccc><b>".$rowtitle."</b></td>\n";
            // Table content cells
            foreach ($data as $page => $item) {
                if (isset($item[$rowtitle])) $output .= "<td align='center'>".$item[$rowtitle]."</td>\n";
                else $output .= "<td> </td>\n";
            }
            // Finish the row
            $output .= "</tr>\n";
        }
        $output .= "</table>\n";
    } else {
        // No data found
        $output .= "Template ".$argv["template"]." has no data. Please double check your template name spelling.";
    }
    return $output;
}
?>
