Extension:Infobox Data Capture

From MediaWiki.org

Infobox Data Capture

Release status: beta

Implementation: Parser function, Database
Description: Tags to enable capture of typed data in infoboxes.
Author(s): sortaSean, eParka
Version: 0.2 (April 10, 2007)
MediaWiki: 1.9.3
Download: Code on page.
Example: Code on page.
Hooks used: ArticleSaveComplete, ArticleDeleteComplete


What can this extension do

Enables typed data storage in a wiki.

Primarily, this means capturing typed data passed to templates such as infoboxes, but #dataentry is not limited to templates and can be used anywhere in a page's wikitext.

Usage

The extension creates a parser function called #dataentry that takes two kinds of arguments:

  • Title: The name of the block of data to be stored (usually the template name).
  • Key-value pairs: A '|'-delimited list of entries, each a key and a value separated by ';'.
    • validTag (Optional): Following each key-value pair, a number: 1 = valid, 0 = invalid. If omitted, it defaults to 1. Non-integer and negative values are used as the comment instead, and the flag is then set to 1. All information is stored, regardless of the value of the validTag.
    • comment (Optional): Following the validTag, a comment on the validTag.
    • Example:
|Key1;Value1;1;Value is valid
|Key2;Value2;0;Value is invalid
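The validTag rule above can be sketched in standalone PHP (7.1 or later). This is an illustration rather than the extension's code: `normalizeValidTag()` is a hypothetical helper that roughly mirrors the checks in the `InfoboxDatum` constructor further down this page. An integer of 0 or more is kept as the flag; anything else becomes the comment and the flag falls back to 1.

```php
<?php
// Hypothetical helper, roughly mirroring the InfoboxDatum constructor:
// returns [flag, comment] for a raw validTag and comment.
function normalizeValidTag(string $tag, string $comment = ''): array
{
    $flag = intval($tag);
    // Non-numeric, non-integer or negative tags are not valid flags
    if (!is_numeric($tag) || $flag != $tag || $flag < 0) {
        // The tag text is kept as the comment (replacing any comment given)
        // and the flag falls back to 1, as in InfoboxDatum
        return [1, trim($tag)];
    }
    return [$flag, trim($comment)];
}

echo json_encode(normalizeValidTag('-1', 'x')), "\n"; // prints [1,"-1"]
```

Note that a value such as 2 passes the check and is stored as the flag unchanged.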

Data Entry tag example:

{{#dataentry:Data block title
|Key1;Value1;1;Value is valid
|Key2;Value2;0;Value is invalid
}}
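Roughly, each argument after the title is read as in the following sketch (PHP 7.1 or later). `parseDataEntryArgs()` is a hypothetical name; it is modelled on `InfoboxDataCapture::parseInfoboxData()` in the code below but is not that method. It splits each entry into at most four fields, so a comment may itself contain ';', uses a running index as the key when the key is missing, and skips entries whose value is empty (the extension also skips a value of "0").

```php
<?php
// Hypothetical sketch: turns the arguments of a #dataentry call (after the
// title) into records with key, value, validTag and comment.
function parseDataEntryArgs(array $args): array
{
    $records = [];
    $index = 1; // stands in for a missing key
    foreach ($args as $arg) {
        // At most 4 fields: key;value;validTag;comment (comment may contain ';')
        $fields = array_map('trim', explode(';', $arg, 4));
        if (count($fields) === 1) {
            // A bare value: use the running index as its key
            array_unshift($fields, (string) $index++);
        } elseif ($fields[0] === '') {
            $fields[0] = (string) $index++;
        }
        // Defaults for omitted trailing fields: validTag "1", empty comment
        $fields += [2 => '1', 3 => ''];
        if ($fields[1] === '') {
            continue; // entries without a value are not stored
        }
        $records[] = ['key' => $fields[0], 'value' => $fields[1],
                      'validTag' => $fields[2], 'comment' => $fields[3]];
    }
    return $records;
}

$records = parseDataEntryArgs(["Key1;Value1;1;Value is valid", "Key2;Value2;0;Value is invalid"]);
echo count($records), "\n"; // prints 2
```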


Handling Lists

Lists of values are handled with the #listsplit parser function. Unlike template parameters, infobox data capture accepts multiple entries with the same key, so to handle delimited lists passed to templates, #listsplit creates a separate key-value entry for each list item.

The #listsplit function takes 5 arguments:

  • key - The attribute name for the value list
  • list - The list of values separated by the separator
  • separator - (Optional) The separator used in the list, defaults to ",".
  • validTag - (Optional) The valid tag to be used for the entire list, defaults to 1.
  • comment - (Optional) The comment to be used for the entire list, defaults to "".
{{#listsplit:key|list|separator|validTag|comment}}

NOTE: Only for use within the #dataentry parser function. #listsplit generates its own opening '|' for each item (and no output at all for an empty list), so it goes directly inside #dataentry. See the example below.
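The expansion performed by #listsplit can be sketched as follows (PHP 7.1 or later). `listSplitSketch()` is a hypothetical stand-in for `InfoboxDataCapture::listSplit()` in the code below: it keeps the one-line-per-item output and the validTag default of 1 used by the extension's code, but leaves out the handling of non-integer validTags.

```php
<?php
// Hypothetical sketch of #listsplit: one "|key;value;validTag;comment" line
// per list item, so the output can sit directly inside a #dataentry call.
function listSplitSketch(string $key, string $list, string $separator = ',',
                         string $validTag = '1', string $comment = ''): string
{
    if (trim($list) === '') {
        return ''; // no values: emit nothing, not even a '|'
    }
    $out = '';
    // An empty separator would make explode() fail, so fall back to ","
    foreach (explode($separator === '' ? ',' : $separator, $list) as $value) {
        $out .= "|$key;$value;$validTag;$comment\n";
    }
    return $out;
}

echo listSplitSketch('colors', 'red,green');
// prints:
// |colors;red;1;
// |colors;green;1;
```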

Examples

The following:

{{#listsplit:synonyms|function;foo;foobar;method|;|1|All valid names for a function}}

Produces:

|synonyms;function;1;All valid names for a function
|synonyms;foo;1;All valid names for a function
|synonyms;foobar;1;All valid names for a function
|synonyms;method;1;All valid names for a function

The following #dataentry call creates 7 records in the database for the "function" data block: one description record, four valid synonym records, and two invalid synonym records. It displays nothing on the page.

{{#dataentry:function
|description;A block of code...;2;a simplistic view
{{#listsplit:synonym|function;foo;foobar;method|;|1|All valid names for a function}}
{{#listsplit:synonym|attribute;variable|;|0|Not functions}}
}}
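The following sketch shows where the seven records come from. It assumes MediaWiki has already expanded the two nested #listsplit calls, producing the text assigned to `$expanded`; `recordsFromExpandedArgs()` is a hypothetical helper, and it omits the missing-key handling of the real code. As the comments in `dataEntryParser()` note, the '|' characters produced by #listsplit end up merged into the arguments, so #dataentry rejoins its arguments and splits them again on '|'; the sketch reproduces that with explode().

```php
<?php
// Hypothetical helper: split the expanded #dataentry text on '|' and then
// each entry on ';' (at most 4 fields), keeping entries that have a value.
function recordsFromExpandedArgs(string $expanded): array
{
    $records = [];
    foreach (explode('|', $expanded) as $entry) {
        $fields = array_map('trim', explode(';', $entry, 4)) + [1 => '', 2 => '1', 3 => ''];
        if ($fields[1] !== '') {
            $records[] = ['key' => $fields[0], 'value' => $fields[1],
                          'validTag' => $fields[2], 'comment' => $fields[3]];
        }
    }
    return $records;
}

// Assumed text of the single argument after the nested #listsplit calls expand
$expanded = "description;A block of code...;2;a simplistic view\n"
    . "|synonym;function;1;All valid names for a function\n"
    . "|synonym;foo;1;All valid names for a function\n"
    . "|synonym;foobar;1;All valid names for a function\n"
    . "|synonym;method;1;All valid names for a function\n\n"
    . "|synonym;attribute;0;Not functions\n"
    . "|synonym;variable;0;Not functions\n\n";

$records = recordsFromExpandedArgs($expanded);
$tally = array_count_values(array_column($records, 'validTag'));
printf("%d records: %d valid synonyms, %d invalid synonyms\n", count($records), $tally['1'], $tally['0']);
// prints: 7 records: 4 valid synonyms, 2 invalid synonyms
```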

Installation

Create the `infoboxdata` database table, save the code below as extensions/InfoboxData/InfoboxData.php, and include it from LocalSettings.php.

Database Table Addition

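Requires one table, `infoboxdata`, named with your wiki's table prefix ($wgDBprefix), if any. In the statement below, replace {{{Schema}}} with the name of your wiki's database and {{{tag}}}_ with your table prefix (or delete it if you use none).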

CREATE TABLE  {{{Schema}}}.`{{{tag}}}_infoboxdata` (
  `ib_from` INT(8) UNSIGNED NOT NULL DEFAULT '0',
  `ib_datablock_order` INT(11) NOT NULL DEFAULT '7',
  `ib_datablock_name` VARBINARY(255) NOT NULL DEFAULT '',
  `ib_attribute_order` INT(11) NOT NULL DEFAULT '7',
  `ib_attribute` VARBINARY(255) NOT NULL DEFAULT '',
  `ib_value` BLOB,
  `ib_isvalid` INT(1) UNSIGNED NOT NULL DEFAULT '1',
  `ib_comment` BLOB,
  KEY `ib_from` (`ib_from`,`ib_datablock_order`,`ib_datablock_name`,`ib_attribute`),
  KEY `ib_datablock_name` (`ib_datablock_name`,`ib_from`)
) ENGINE=MyISAM DEFAULT CHARSET=BINARY;
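As an illustration of how these columns are filled, here is a sketch modelled on `InfoboxDataUpdate::getInfoboxDataInsertions()` in the code below (PHP 7.1 or later). `buildInfoboxRows()` and the page ID 42 are hypothetical; the real class takes the page ID from the saved article and the records from `InfoboxDatum` objects.

```php
<?php
// Hypothetical sketch: $blocks lists one page's #dataentry blocks in order,
// each as [blockName, records]; each record is [attribute, value, validFlag, comment].
function buildInfoboxRows(int $pageId, array $blocks): array
{
    $rows = [];
    foreach ($blocks as $blockOrder => [$name, $records]) {
        foreach ($records as $attributeOrder => [$attribute, $value, $isValid, $comment]) {
            // One row per attribute; block and attribute positions keep the order
            $rows[] = [
                'ib_from'            => $pageId,
                'ib_datablock_order' => $blockOrder,
                'ib_datablock_name'  => $name,
                'ib_attribute_order' => $attributeOrder,
                'ib_attribute'       => $attribute,
                'ib_value'           => $value,
                'ib_isvalid'         => $isValid,
                'ib_comment'         => $comment,
            ];
        }
    }
    return $rows;
}

$rows = buildInfoboxRows(42, [
    ['Data block title', [
        ['Key1', 'Value1', 1, 'Value is valid'],
        ['Key2', 'Value2', 0, 'Value is invalid'],
    ]],
]);
echo count($rows), "\n"; // prints 2
```

On each save, the extension first deletes every row whose `ib_from` equals the page ID and then inserts the new rows, so the table holds only the data from the most recently saved version of each page.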

Changes to LocalSettings.php

require_once("$IP/extensions/InfoboxData/InfoboxData.php");

Code

<?php
 
if ( !defined( 'MEDIAWIKI' ) ) {
        die( 'This file is a MediaWiki extension, it is not a valid entry point' );
}
$wgExtensionCredits['parserhook'][] = array(
        'name' => 'Infobox Data Capture',
        'version' => '0.2', // April 10, 2007.
        'url' => 'http://www.mediawiki.org/wiki/Extension:Infobox_Data_Capture',
        'author' => 'sortaSean, eParka',
        'description' => 'Enable database capture of typed infobox data',
);
 
$wgInfoboxDataCapture = new InfoboxDataCapture();
 
$wgExtensionFunctions[] = 'wfSetupInfoboxDataCapture';
$wgHooks['LanguageGetMagic'][] = 'wfInfoboxDataLanguageGetMagic';
 
//have to place these hooks outside the setup function, otherwise they don't get called
$wgHooks['ArticleSaveComplete'][] = array(&$wgInfoboxDataCapture, 'save');
$wgHooks['ArticleDeleteComplete'][] = array(&$wgInfoboxDataCapture, 'delete');
 
 
function wfSetupInfoboxDataCapture() {
        global $wgParser;
        global $wgInfoboxDataCapture;
 
        # Associate the "dataentry" and "listsplit" magic words with their handler methods
        $wgParser->setFunctionHook( 'dataentry', array(&$wgInfoboxDataCapture, 'dataEntryParser' ));
        $wgParser->setFunctionHook( 'listsplit', array(&$wgInfoboxDataCapture, 'listSplit' ));
}
 
function wfInfoboxDataLanguageGetMagic( &$magicWords, $langCode ) {
        # The first array element is the case-sensitivity flag; 0 means not case sensitive
        # All remaining elements are synonyms for our parser function
        switch ( $langCode ) {
                default:
                $magicWords['dataentry'] = array( 0, 'dataentry' );
                $magicWords['listsplit'] = array( 0, 'listsplit' );
        }
        # Unless we return true, other parser function extensions won't get loaded.
        return true;
}
 
/**
 * InfoboxDataCapture
 * class for handling dataentry tags and uploading data to the database
 * @package MediaWiki
 */
class InfoboxDataCapture {
 
        /**#@+
         * @private
         */
        # Persistent:
        var $mInfoboxData;
 
 
        /**
         * Constructor, initializes the infobox data array
         *
         * @private
         */
        function __construct() {
                $this->mInfoboxData = array();
        }
 
        /**
         * Receives the infobox data as input. First element is the title, then key-value pairs separated by ";".
         * Use the listsplit parserfunction to separate lists and store them as individual items.
         * WIKI TEXT EXAMPLE:
         * 
         * {{#dataentry:function
         * |description;A block of code...;2;a simplistic view
         * {{#listsplit:synonyms|function;foo;foobar;method|;|1|All valid names for a function}}
         * {{#listsplit:synonyms|attribute;variable|;|0|Not functions}}
         *      }}
         *
         * @param parser        The parser object
         * @param title         The name of the data block, to be stored in the database.
         * @private
         */
        function dataEntryParser( &$parser, $title ) {
                // Nothing passed in after the title: nothing to store
                if ( func_num_args() < 3 ) {
                        return "";
                }

                $argString = func_get_arg( 2 );
                // Since there is a variable number of arguments, handle them here.
                // Some arguments may be merged if they come from a nested parser
                // function (i.e. #listsplit), so rejoin everything and split again on '|'.
                for ( $i = 3; $i < func_num_args(); $i++ ) {
                        $argString .= "|" . func_get_arg( $i );
                }
                $argString = $parser->mStripState->unstripBoth( $argString );
                $args = explode( "|", $argString );

                $infoboxData = $this->parseInfoboxData( $args );
                $this->addKeyValuePairs( $title, $infoboxData );
                return "";
        }
 
        /**
         * A parser function used to separate lists into multiple value insertions.  
         * Unlike templates, the infoboxdata capture works with multiple entries with the same key.  
         * For use within the dataentry parser function.
         * NOTE: Generates its own opening '|' (or not if no values), for use directly inside #dataentry
         * Called via:
         *              {{#listsplit:synonyms|function;foo;foobar;method|;|1|All valid names for a function}}
         *      
         * As is:
         * {{#dataentry:function
         * |description;A block of code...;2;a simplistic view
         * {{#listsplit:synonyms|function;foo;foobar;method|;|1|All valid names for a function}}
         * {{#listsplit:synonyms|attribute;variable|;|0|Not functions}}
         *      }}
         *
         * @param parser        parser object
         * @param key           The attribute name given to every list item
         * @param list          A list of values separated by the separator
         * @param separator     The separator used in the list, default ","
         * @param isValid       A valid tag applied to every item, default 1, used for anti-metacrap
         * @param comment       A comment applied to every item, default ""
         * @private
         */
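        function listSplit( &$parser, $key, $list, $separator = ",", $isValid = 1, $comment = "" ) {
                // No values: emit nothing, not even the opening '|'
                if ( trim( $list ) === "" ) {
                        return "";
                }
                // explode() cannot split on an empty string
                if ( $separator === "" ) {
                        $separator = ",";
                }
                // A non-integer or negative valid tag becomes the comment (if none was given)
                if ( !is_numeric( $isValid ) || ( intval( $isValid ) != $isValid ) || ( $isValid < 0 ) ) {
                        if ( !$comment ) {
                                $comment = $isValid;
                        }
                        $isValid = 1;
                }
                $valueList = explode( $separator, $list );
                $output = "";
                foreach ( $valueList as $value ) {
                        $output .= "|$key;$value;$isValid;$comment\n";
                }
                return $output;
        }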
 
        /**
         * Appends a data block to mInfoboxData as array( title => data ).
         * Several blocks with the same title are kept as separate, ordered entries.
         *
         * @param title         The name of the data block
         * @param args          The array of "InfoboxDatum"s
         * @private
         */
        function addKeyValuePairs( $title, $args ) {
                if ( $args ) {
                        //FIXME: should create a datablock object that has a name and a data array
                        $datablock = array();
                        $datablock[$title] = $args;

                        array_push( $this->mInfoboxData, $datablock );
                }
        }
 
        /**
         * Creates the InfoboxDatum objects.  Handles lack of key by using $index.
         *
         * @param args          An array of key value pairs with valid tag and comment info - all ";" separated
         * @private
         */
        function parseInfoboxData( $args ) {
                $infoboxData = array();
                $index = 1;
                foreach( $args as $arg ) {
                        $values = explode( ";", $arg, 4 ); // at most 4 fields, so a comment may contain ';'
                        if(count($values) == 1) {
                                $values[1] = $values[0];
                                $values[0] = $index++;     
                        } elseif (!$values[0]) {
                                $values[0] = $index++;
                        }
                        if($values[1]) {//call different constructors, FIXME: should be able to call just case 4
                                switch (count($values)) {
                                case 2:
                                        array_push($infoboxData, new InfoboxDatum($values[0], $values[1]));
                                        break;
                                case 3:
                                        array_push($infoboxData, new InfoboxDatum($values[0], $values[1], $values[2]));
                                        break;
                                case 4:
                                        array_push($infoboxData, new InfoboxDatum($values[0], $values[1], $values[2], $values[3]));
                                        break;
                                }
                        }
                } 
                return $infoboxData;
        }
 
        /**
         * Saves the InfoboxData on record save.  Called using the ArticleSaveComplete hook.
         *
         * @param &$article The article object already saved.
         * @param                       Many others... all required by the hook. Not used.
         * @private
         */
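        function save( &$article, &$user, $text = null, $summary = null, $minoredit = null, $watchthis = null, $sectionanchor = null, $flags = null ) {
                # Replace this page's rows in the infoboxdata table
                $ibd = new InfoboxDataUpdate( $article->getTitle(), $this->mInfoboxData );
                $ibd->doUpdate();
                # Return true so that other ArticleSaveComplete hooks still run
                return true;
        }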
 
        /**
         * Deletes the InfoboxData on record delete.  Called using the ArticleDeleteComplete hook.
         *
         * @param &$article The article object deleted.
         * @param                       Others... all required by the hook. Not used.   
         * @private
         */
        function delete( &$article, &$user, $reason ) {
                $dbw =& wfGetDB( DB_MASTER );
                $dbw->delete( 'infoboxdata', array( 'ib_from' => $article->getID() ) );
                # Return true so that other ArticleDeleteComplete hooks still run
                return true;
        }
}
 
/**
 * InfoboxDatum
 * 
 * @package MediaWiki
 */
class InfoboxDatum {
 
        var    $mName,
                        $mValue,
                        $mIsValid,
                        $mComment;
 
        /**
         * Constructor
         *
         * @param name          
         * @param value
         * @param isValid       Optional, default is 1
         * @param comment       Optional
         * @private
         */
        function __construct( $name, $value, $isValid = 1, $comment = "" ) {
                $this->mName = trim($name);
                $this->mValue = trim($value);
                $this->mIsValid = intval($isValid);
                $this->mComment = trim($comment);
 
                if( !is_numeric($isValid) ||($this->mIsValid != $isValid) || ($this->mIsValid < 0)) {
                        $this->mComment = $isValid;
                        $this->mIsValid = 1;
                }
        }
 
        function getName()                   { return $this->mName; }
        function getValue()                  { return $this->mValue; }
        function getValidFlag()              { return $this->mIsValid; }
        function getComment()                { return $this->mComment; }
}
 
/**
 * Modeled after LinksUpdate from 1.9.3 
 * Can't use ParserOutput object because called from ArticleSaveComplete hook
 * should probably refactor to be generic or merge with InfoboxDataCapture class
 * @package MediaWiki
 */
class InfoboxDataUpdate {
 
        /**@{{
         * @private
         */
        var $mId,            //!< Page ID of the article linked from
                $mTitle,         //!< Title object of the article linked from
                $mDb,            //!< Database connection reference
                $mOptions,       //!< SELECT options to be used (array)
                $mInfoboxData;   //!< infobox data to be uploaded into database
        /**@}}*/
 
        /**
         * Constructor
         * Initialize private variables
         * @param title                 Title object
         * @param infoboxData   3-D array holding key value pairs for multiple data blocks
         */
        function __construct( $title, $infoboxData ) {
                global $wgAntiLockFlags;
 
                if ( $wgAntiLockFlags & ALF_NO_LINK_LOCK ) {
                        $this->mOptions = array();
                } else {
                        $this->mOptions = array( 'FOR UPDATE' );
                }
                $this->mDb =& wfGetDB( DB_MASTER );
 
                $this->mTitle = $title;
                $this->mId = $title->getArticleID();
 
                $this->mInfoboxData = $infoboxData;               
        }
 
        /**
         * Replace the stored infobox data of the updated article
         */
        function doUpdate() {
                $this->doDumbUpdate();
        }
 
        /**
         * Update which clears the article's previous infoboxdata rows and inserts the new ones.
         * May be slower or faster depending on level of lock contention and write speed of DB
         */
        function doDumbUpdate() {
                $fname = 'InfoboxData::doDumbUpdate';
                wfProfileIn( $fname );
 
                $this->dumbTableUpdate( 'infoboxdata',  $this->getInfoboxDataInsertions(), 'ib_from' );
                wfProfileOut( $fname );
        }
 
        function dumbTableUpdate( $table, $insertions, $fromField ) {
                $fname = 'InfoboxData::dumbTableUpdate';
                $this->mDb->delete( $table, array( $fromField => $this->mId ), $fname );
                if ( count( $insertions ) ) {
                        $this->mDb->insert( $table, $insertions, $fname, array( 'IGNORE' ) );
                }
        }
 
        /**
         * Builds the rows to insert from the collected #dataentry data. Like LinksUpdate::getLinkInsertions()
         * @private
         */
        function getInfoboxDataInsertions( $existing = array() ) {
                wfProfileIn( __METHOD__ );
                $arr = array();
                foreach( $this->mInfoboxData as $blockOrder => $datablock ) {
                        foreach ( $datablock as $name => $attributes ) {
                                foreach ( $attributes as $attributeOrder => $attribute ) {
                                        $arr[] = array(
                                                'ib_from'                      => $this->mId,
                                                'ib_datablock_order'=> $blockOrder,
                                                'ib_datablock_name'    => $name,
                                                'ib_attribute_order'=> $attributeOrder,
                                                'ib_attribute'                 => $attribute->getName(),
                                                'ib_isvalid'           => $attribute->getValidFlag(),
                                                'ib_value'                     => $attribute->getValue(),
                                                'ib_comment'                   => $attribute->getComment()
                                        );
                                }
                        }
                }
                wfProfileOut( __METHOD__ );
                return $arr;
        }
}
