Extension:Page Object Model

MediaWiki extensions manual
Page Object Model (POM)

Release status: beta
Implementation: Data extraction, API
Description: A set of classes for bots and other extensions to manipulate MediaWiki pages in a manner similar to the HTML DOM
Author(s): Sergey Chernyshev
Latest version: 0.1.3 (2008-05-29)
License: GNU Lesser General Public License
Download: SVN, tarball


Page Object Model (POM) is a set of classes that abstract MediaWiki syntax so that other programs (extensions, bots, API handlers, etc.) can easily extract and manipulate page content. The name reflects the conceptual similarity between POM and the DOM.

This extension also provides several MediaWiki API calls to allow wiki page changes using POM.

Rationale

Semantic Forms development

The idea for POM arose when the Semantic Forms code had become quite complex and, at the same time, useful to other extensions. The extension is currently under development, and this page will be updated as it progresses.

Third party extensions

Other extensions and maintenance scripts can benefit from having page and template parsing abstracted for them.

Examples include:

  • command line tools that need to manipulate a few template parameters without parsing the whole page
  • MediaWiki API extensions that let AJAX tools edit parts of a page without having to parse the page content themselves
  • tools for automated addition or editing of information, also using the MW API
  • outside applications that use a semantic wiki as a database, such as an online spreadsheet or a PDA application, also using the MW API

MediaWiki syntax

Semantic Forms will require more complex, higher-level data handling, probably involving parameter typing and page definitions. That functionality is better kept in a separate logical layer, so that other code (not necessarily using Semantic Forms or Extension:Semantic MediaWiki) can use POM without introducing unnecessary dependencies. It is therefore proposed that POM handle only native MediaWiki syntax rules, with initial development focused on template handling logic.

Fundamental rules

There are several rules that all code and extensions must honor and that the test framework must verify:

  • No side effects: any wiki source parsed by POM and saved back without modification must match the original
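
The "no side effects" rule can be illustrated with a small round-trip check. This is only a sketch: sketchParse() and sketchSerialize() are hypothetical helpers invented for this example, not POM functions, and the splitter recognises only flat (non-nested) {{...}} templates.

```php
<?php
// Hypothetical helpers for illustration only; not part of the POM library.

// Split wiki source into alternating plain-text and template chunks.
// Only flat (non-nested) {{...}} templates are recognised here.
function sketchParse($source) {
    return preg_split('/(\{\{.*?\}\})/s', $source, -1, PREG_SPLIT_DELIM_CAPTURE);
}

// Serialise the chunks back to wiki source by concatenating them in order.
function sketchSerialize(array $chunks) {
    return implode('', $chunks);
}

$source = "Intro text\n{{Favorite|url=http://example.org|title=Example}}\nOutro text";
$chunks = sketchParse($source);

// The rule under test: parsing followed by serialising is the identity.
$roundTripOk = (sketchSerialize($chunks) === $source);
echo $roundTripOk ? "round trip OK\n" : "round trip FAILED\n";
```

Because every chunk keeps the exact source text it covers, serialising an unmodified chunk list cannot alter whitespace or parameter order. A test framework would presumably run the same comparison over a corpus of real pages.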

Class structures

Here is a possible (partially implemented) class structure for POM:

  • POMPage - represents a page.
    • templates() gets either all templates or all POMTemplateCollection objects for the page
    • templates('template_name') is a reference to the POMTemplateCollection object for the 'template_name' template
  • POMElement - abstract class, represents any data element within a page
  • POMTemplateCollection - represents all templates on a page that are of a certain type.
    • count() gets the number of such templates
    • [i] references the i-th instance of this set
    • append($template) adds another template to this set
  • POMTemplate - subclass of POMElement - represents a single template instance on a page.
    • parameter('parameter_name') is a reference to the POMTemplateParameter object with this name
    • setParameter($name, $value) sets the value of a parameter in this template
  • POMTemplateParameter - subclass of POMElement - represents a single parameter in a template. (This class might not be necessary.)
  • POMTextNode - subclass of POMElement - represents a plain text node
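
To make the outline above more concrete, here is a minimal sketch of an element hierarchy with a template that can read and change its parameters. The Sketch* class names are stand-ins invented for this example and are not the library's code. The sketch assumes flat templates whose parameters are all named (key=value) and whose values contain no "|" character.

```php
<?php
// Illustrative stand-ins only: the names mirror the outline above, but this
// is not the POM library's code.

abstract class SketchElement {
    // Every element must be able to render itself back to wiki text.
    abstract public function asString();
}

class SketchTextNode extends SketchElement {
    private $text;
    public function __construct($text) { $this->text = $text; }
    public function asString() { return $this->text; }
}

class SketchTemplate extends SketchElement {
    private $name;
    private $params = array(); // ordered map: parameter name => value

    // Accepts flat source such as "{{Name|key=value|other=value}}".
    public function __construct($source) {
        $parts = explode('|', substr($source, 2, -2));
        $this->name = array_shift($parts);
        foreach ($parts as $part) {
            list($key, $value) = explode('=', $part, 2);
            $this->params[$key] = $value;
        }
    }

    public function getParameter($name) {
        return isset($this->params[$name]) ? $this->params[$name] : null;
    }

    public function setParameter($name, $value) {
        $this->params[$name] = $value;
    }

    public function asString() {
        $out = '{{' . $this->name;
        foreach ($this->params as $key => $value) {
            $out .= '|' . $key . '=' . $value;
        }
        return $out . '}}';
    }
}

// A "page" here is simply an ordered list of elements.
$tpl = new SketchTemplate('{{Favorite|url=http://google.com|title=Google}}');
$tpl->setParameter('url', 'http://www.google.com');

$page = array(new SketchTextNode('My favorite site: '), $tpl);
$wikitext = '';
foreach ($page as $element) {
    $wikitext .= $element->asString();
}
echo $wikitext, "\n";
```

Keeping parameters in an ordered map means that changing an existing parameter leaves its position in the template untouched, which keeps edits minimal.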

Parsers

Parsers are modules of parsing functionality that are called when a POMPage object is created. POMPage itself just creates a single POMTextNode from the text passed to it and then relies on these parsers to add the different types of nodes. This structure allows for modular parsing: one program might need to work only with templates while another works only with links or categories, so there is no need to run every type of parsing for every task.

  • POMParser - abstract class to be subclassed by specific parsers
  • POMTemplateParser - parser class that identifies MediaWiki templates within text nodes of the page and replaces them with POMTemplate objects
  • POMCategoryParser - parser class that identifies category definitions within text nodes of the page and replaces them with POMCategory objects (example of non-template parser)
  • POMLinkParser - parser class that identifies links within text nodes of the page and replaces them with POMLink (POMInternalLink and POMExternalLink) objects (example of a parser that adds several types of objects)
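
The modular approach described above might look roughly like the following sketch. The Demo* classes are hypothetical and exist only for this example. The sketch assumes each parser can be described by one regular expression with a single capturing group, and that templates are not nested.

```php
<?php
// Hypothetical sketch of the modular-parser idea; class names are invented
// for this example and are not the POM library's API.

class DemoNode {
    public $type; // 'text', 'template', 'category', ...
    public $text; // exact wiki source covered by this node
    public function __construct($type, $text) {
        $this->type = $type;
        $this->text = $text;
    }
}

abstract class DemoParser {
    // Regular expression with exactly one capturing group around the construct.
    abstract protected function pattern();
    // Node type assigned to every match.
    abstract protected function nodeType();

    // Split each text node into typed pieces; other nodes pass through as-is.
    public function parse(array $nodes) {
        $result = array();
        foreach ($nodes as $node) {
            if ($node->type !== 'text') {
                $result[] = $node; // already claimed by an earlier parser
                continue;
            }
            $pieces = preg_split($this->pattern(), $node->text, -1, PREG_SPLIT_DELIM_CAPTURE);
            foreach ($pieces as $i => $piece) {
                if ($piece === '') {
                    continue; // nothing between two adjacent matches
                }
                // Odd indexes hold the captured matches, even indexes plain text.
                $result[] = new DemoNode($i % 2 ? $this->nodeType() : 'text', $piece);
            }
        }
        return $result;
    }
}

class DemoTemplateParser extends DemoParser {
    protected function pattern() { return '/(\{\{.*?\}\})/s'; }
    protected function nodeType() { return 'template'; }
}

class DemoCategoryParser extends DemoParser {
    protected function pattern() { return '/(\[\[Category:[^\]]*\]\])/'; }
    protected function nodeType() { return 'category'; }
}

// Start from a single text node and apply only the parsers this task needs.
$source = "Hello {{Favorite|title=Google}} world [[Category:Demo]]";
$nodes = array(new DemoNode('text', $source));
foreach (array(new DemoTemplateParser(), new DemoCategoryParser()) as $parser) {
    $nodes = $parser->parse($nodes);
}

$types = array();
$rebuilt = '';
foreach ($nodes as $node) {
    $types[] = $node->type;
    $rebuilt .= $node->text;
}
echo implode(', ', $types), "\n";
```

A program that only cares about templates would simply leave DemoCategoryParser out of the list, and the category markup would stay inside a text node.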

Code use cases

Creating POM object

$pagecontents = getPageFromName('User:Sergey Chernyshev'); // some function that returns page contents
$userpage = new POMPage($pagecontents);

Handling multiple instance templates

// instance of POMTemplateCollection class (a collection of POMTemplate objects)
$favorites = $userpage->templates('Favorite');

// instance of POMTemplate class
$new_favorite = $favorites->createInstance();

$new_favorite->setParameter('url', 'http://google.com');
$new_favorite->setParameter('title', 'Google');

// update a parameter on the fifth existing instance (index 4)
$favorites[4]->setParameter('url', 'http://www.google.com');

Use cases

Command line image localization tool

A fictional tool that downloads the files referenced by the "fileURL" template parameter and records a local copy in the "mediaTitle" parameter, so that external links to images and PDF documents can be replaced with locally hosted versions.

A real-life scenario is keeping local backup copies of the presentation files referenced on TechPresentations.org.

Original wiki article example


Pseudocode

Get the URL of the file to download:

$javascript_the_good_parts = getPageContents('JavaScript The Good Parts');
$article = new POMPage($javascript_the_good_parts);
$url = $article->templates('Presentation')[0]->getParameter('fileURL');

Download the file and save it under a local wiki media title (out of scope for POM):

$filename = download($url);
$mediaTitle = addLocalFile(makeLocalTitle($url), $filename);

Set a new parameter that references the new local media title:

$article->templates('Presentation')[0]->setParameter('mediaTitle', $mediaTitle);
$resultwikitext = $article->toWikiText();

Resulting wiki article


Usage

To start using the Page Object Model classes, download the code and add this line to your code:


Examples

Get this page and increase version number


# Get document from MediaWiki server (requires allow_url_fopen to be enabled)
$pom = new POMPage(file_get_contents('http://www.mediawiki.org/w/index.php?title=Extension:Page_Object_Model&action=raw'));

# Check current version
$ver = $pom->templates['Extension'][0]->getParameter('version');
echo "Current version: $ver\n";

# Increase version number by fraction
$pom->templates['Extension'][0]->setParameter('version', $ver + 0.1);

# Do whatever you want with result (we'll just display it)
echo "Document with increased version:\n".$pom->asString();

For more code examples, see the examples folder in the SVN repository.

MediaWiki API calls

When used as a MediaWiki extension, POM also adds the pomgettplparam and pomsettplparam MediaWiki API calls. Once it is installed, see the api.php page for the list of parameters and documentation.

Download

You can download just the class library as a tarball, or the full code including tests from SVN:

Class Library tarball

If you only want to use the library, the set of classes is enough. Download the tarball below and unpack it:

SVN

To download the code for the latest release of the Page Object Model project, issue the following command:

svn checkout http://svn.wikimedia.org/svnroot/mediawiki/tags/extensions/PageObjectModel/REL_0_1_3/ PageObjectModel

The latest development code has moved to the MediaWiki repository and is available here:

svn checkout http://svn.wikimedia.org/svnroot/mediawiki/trunk/extensions/PageObjectModel

The old Google Code repository can be found here: http://mediawiki-page-object-model.googlecode.com/svn/

Installation

If you plan to use POM's MediaWiki API calls, move this code to the $IP/extensions/PageObjectModel/ folder and add this line to your LocalSettings.php:


Development

Page Object Model should be developed independently of MediaWiki, MediaWiki extensions, special PEAR modules, and any particular PHP configuration parameters. The only dependency is PHP 5.x.

These requirements ensure that the code can be used on any configuration of a user's system.

Testing

The project is tested using PHPUnit. All tests are stored in the /tests/ folder, and tests.xml is the configuration file used by the project's main Makefile.

To test the project, type the following in the root of the project:

make test

CHANGES

  • 0.1.3 (May 29, 2008) - Fixed template parser to support nested templates (templates as parameter values).
  • 0.1.2 (May 14, 2008) - Added pomgettplparam and pomsettplparam MediaWiki API calls.

See also

External links