Extension:Page Object Model
| Page Object Model (POM) | |
|---|---|
| Release status | beta |
| Implementation | Data extraction, API |
| Description | A set of classes to be used by bots or other extensions for MediaWiki page manipulation in the manner similar to HTML's DOM |
| Author(s) | Sergey Chernyshev |
| Last version | 0.1.3 (May 29, 2008) |
| License | LGPL |
| Download | SVN, tarball |
Page Object Model (POM) is a set of classes that abstract MediaWiki syntax so that pages can easily be extracted and manipulated from within other programs (extensions, bots, API handlers, etc.). The name reflects the similarity between the concepts of POM and the DOM.
This extension also provides several MediaWiki API calls that allow wiki pages to be changed using POM.
Rationale
Semantic Forms development
The original idea for POM came about when the Semantic Forms code became quite complex and, at the same time, useful to other extensions. The extension is currently under development, and this page will be updated as it progresses.
Third-party extensions
Other extensions and maintenance scripts can benefit from having page and template parsing abstracted for them.
Examples include:
- command-line tools that want to manipulate just some template parameters without parsing the whole page
- MediaWiki API extensions to allow AJAX tools to edit some parts of the page without caring about parsing the content of the pages
- tools to allow for automated addition or editing of information, also using the MW API
- outside applications that can use a semantic wiki as a database, such as an online spreadsheet or a PDA application, also using the MW API
MediaWiki syntax
Although Semantic Forms will require more complex, higher-level handling of data (probably using parameter typing and page definitions), it is better to separate that functionality into another logical layer. That way, other code that does not necessarily use Semantic Forms or Extension:Semantic MediaWiki can use POM without introducing unnecessary dependencies. It is therefore proposed that POM handle only native MediaWiki syntax rules, starting with template handling logic.
Fundamental rules
The following rules must be honored by all code and extensions and verified by the test framework:
- No side effects: any wikitext parsed by POM and saved back without modification must be identical to the original
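The no-side-effects rule amounts to a round-trip test: parse, serialize without changes, compare. The sketch below is a hypothetical stand-in, not POM's parser; the function names splitNodes, joinNodes and hasNoSideEffects are made up for illustration. It splits wikitext into plain-text nodes and non-nested template nodes using preg_split with PREG_SPLIT_DELIM_CAPTURE, so every character lands in exactly one node and joining the nodes reproduces the input.

```php
<?php
// Hypothetical sketch (not POM's actual parser): split wikitext into text
// nodes and non-nested template nodes, then join them back together.

/** @return array<int, array{type: string, text: string}> */
function splitNodes(string $wikitext): array
{
    $nodes = [];
    // PREG_SPLIT_DELIM_CAPTURE keeps the matched templates in the result,
    // so no characters are lost between nodes.
    $parts = preg_split('/(\{\{[^{}]*\}\})/', $wikitext, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $part) {
        if ($part === '') {
            continue; // empty pieces carry no text and can be skipped safely
        }
        // With a single capture group, captured templates sit at odd indexes.
        $nodes[] = ['type' => $i % 2 === 1 ? 'template' : 'text', 'text' => $part];
    }
    return $nodes;
}

function joinNodes(array $nodes): string
{
    return implode('', array_column($nodes, 'text'));
}

// The "no side effects" rule: parsing and serializing without changes
// must give back exactly the original wikitext.
function hasNoSideEffects(string $wikitext): bool
{
    return joinNodes(splitNodes($wikitext)) === $wikitext;
}
```

A nested template such as {{a|{{b}}}} is split at the wrong place by this naive pattern, yet it still round-trips; the rule checks that parsing is lossless, not that it is correct, so a test suite would need both kinds of checks.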
Class structures
Here is a possible (partially implemented) class structure for POM:
- POMPage - represents a page.
  - templates() gets either all templates or all POMTemplateCollection objects for the page
  - templates('template_name') is a reference to the POMTemplateCollection object for the 'template_name' template
- POMElement - abstract class, represents any data element within a page
- POMTemplateCollection - represents all templates on a page that are of a certain type.
  - count() gets the number of such templates
  - [i] references the i-th instance of this set
  - append($template) adds another template to this set
- POMTemplate - subclass of POMElement - represents a single template instance on a page.
  - parameter('parameter_name') is a reference to the POMTemplateParameter object with this name
  - setParameter($name, $value) sets the value of a parameter in this template
- POMTemplateParameter - subclass of POMElement - represents a single parameter in a template. (This class might not be necessary.)
- POMTextNode - subclass of POMElement - represents a plain text node
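To make the hierarchy above concrete, here is a minimal sketch for PHP 7.4 or later. The class names follow the list, but the bodies, the public $children array and the asString() method are illustrative assumptions, not the extension's real code. For brevity, POMTemplate rebuilds its wikitext in one fixed layout; a real implementation would also keep the original source text so that an untouched template satisfies the no-side-effects rule.

```php
<?php
// Hypothetical sketch: class names follow the POM class list, but these
// bodies are illustrative only and are not the extension's actual code.

abstract class POMElement
{
    // Every element must be able to turn itself back into wikitext.
    abstract public function asString(): string;
}

class POMTextNode extends POMElement
{
    private string $text;

    public function __construct(string $text)
    {
        $this->text = $text;
    }

    public function asString(): string
    {
        return $this->text;
    }
}

class POMTemplate extends POMElement
{
    private string $name;
    /** @var array<string, string> parameter name => value, in insertion order */
    private array $params = [];

    public function __construct(string $name)
    {
        $this->name = $name;
    }

    public function getName(): string
    {
        return $this->name;
    }

    public function getParameter(string $name): ?string
    {
        return $this->params[$name] ?? null;
    }

    public function setParameter(string $name, string $value): void
    {
        // Existing parameters keep their position; new ones are appended.
        $this->params[$name] = $value;
    }

    public function asString(): string
    {
        // Rebuilds the template in one fixed layout (see the note above).
        $out = '{{' . $this->name;
        foreach ($this->params as $param => $value) {
            $out .= "\n|" . $param . '=' . $value;
        }
        return $out . "\n}}";
    }
}

class POMPage
{
    /** @var POMElement[] child nodes in page order */
    public array $children = [];

    /** @return POMTemplate[] all instances of the named template, in page order */
    public function templates(string $name): array
    {
        return array_values(array_filter(
            $this->children,
            fn ($child) => $child instanceof POMTemplate && $child->getName() === $name
        ));
    }

    public function asString(): string
    {
        return implode('', array_map(fn (POMElement $c) => $c->asString(), $this->children));
    }
}
```

Because PHP objects are handles, a template obtained through $page->templates('Presentation')[0] can be modified in place, and the change shows up when the page is serialized again.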
Parsers
Parsers are modules that are called when a POMPage object is created. POMPage itself only creates a single POMTextNode from the text passed to it and then relies on the parsers to add nodes of other types. This structure allows for modular parsing: one program might need to work only with templates while another works only with links or categories, so there is no need to run every type of parsing for every task.
- POMParser - abstract class to be subclassed by specific parsers
- POMTemplateParser - parser class that identifies MediaWiki templates within text nodes of the page and replaces them with POMTemplate objects
- POMCategoryParser - parser class that identifies category definitions within text nodes of the page and replaces them with POMCategory objects (an example of a non-template parser)
- POMLinkParser - parser class that identifies links within text nodes of the page and replaces them with POMLink (POMInternalLink and POMExternalLink) objects (an example of a parser that adds several types of objects)
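The parsing pipeline described above can be sketched as follows. This is a hypothetical illustration loosely modeled on the POMParser idea: NodeParser, RegexParser and buildPage are made-up names, nodes are plain arrays instead of POMElement objects, and the regular expressions handle only simple, non-nested markup. The page starts as one text node, and each parser in turn replaces text nodes with finer-grained nodes while leaving nodes claimed by earlier parsers alone.

```php
<?php
// Hypothetical sketch of modular parsing, loosely modeled on POMParser;
// nodes are plain arrays here to keep the example short.

interface NodeParser
{
    /** Splits one text node into a list of nodes, losing no characters. */
    public function parseText(string $text): array;
}

class RegexParser implements NodeParser
{
    private string $pattern; // must wrap the whole match in one capture group
    private string $type;

    public function __construct(string $pattern, string $type)
    {
        $this->pattern = $pattern;
        $this->type = $type;
    }

    public function parseText(string $text): array
    {
        $nodes = [];
        $parts = preg_split($this->pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);
        foreach ($parts as $i => $part) {
            if ($part === '') {
                continue;
            }
            // With one capture group, matches land at the odd indexes.
            $nodes[] = ['type' => $i % 2 === 1 ? $this->type : 'text', 'text' => $part];
        }
        return $nodes;
    }
}

function buildPage(string $wikitext, array $parsers): array
{
    // The page starts out as a single text node.
    $nodes = [['type' => 'text', 'text' => $wikitext]];
    foreach ($parsers as $parser) {
        $next = [];
        foreach ($nodes as $node) {
            if ($node['type'] === 'text') {
                // Only plain text is handed to the parser.
                foreach ($parser->parseText($node['text']) as $child) {
                    $next[] = $child;
                }
            } else {
                // Nodes claimed by an earlier parser pass through unchanged.
                $next[] = $node;
            }
        }
        $nodes = $next;
    }
    return $nodes;
}
```

For example, a program that only cares about categories would pass just new RegexParser('/(\[\[Category:[^\]]*\]\])/', 'category'); templates then simply remain inside text nodes, and no template parsing work is done.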
Code use cases
Creating a POM object
$pagecontents = getPageFromName('User:Sergey Chernyshev'); // some function that returns page contents
$userpage = new POMPage($pagecontents);
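getPageFromName() above is just "some function that returns page contents". One possible implementation, sketched here as an assumption rather than part of POM, fetches the raw wikitext through MediaWiki's action=raw URL, the same kind of URL used in the version-number example on this page. The helper name rawPageUrl is hypothetical, and the base URL is hard-coded only for illustration.

```php
<?php
// Hypothetical helpers, not part of POM: fetch a page's raw wikitext
// through the index.php?title=...&action=raw URL form.

function rawPageUrl(string $indexPhp, string $title): string
{
    // MediaWiki titles use underscores instead of spaces in URLs;
    // http_build_query() takes care of percent-encoding.
    return $indexPhp . '?' . http_build_query([
        'title' => str_replace(' ', '_', $title),
        'action' => 'raw',
    ]);
}

function getPageFromName(string $title): string
{
    $text = file_get_contents(rawPageUrl('http://www.mediawiki.org/w/index.php', $title));
    if ($text === false) {
        throw new RuntimeException("Could not fetch page: $title");
    }
    return $text;
}
```

Fetching over HTTP with file_get_contents() requires allow_url_fopen to be enabled; a bot would more likely use the MediaWiki API or an HTTP client library.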
Handling multiple-instance templates
// instance of POMTemplateCollection class (a collection of POMTemplate objects)
$favorites = $userpage->templates('Favorite');
// instance of POMTemplate class
$new_favorite = $favorites->createInstance();
$new_favorite->setParameter('url', 'http://google.com');
$new_favorite->setParameter('title', 'Google');
$favorites->append($new_favorite);
// change the URL in the fifth instance (indexes start at 0)
$favorites[4]->setParameter('url', 'http://www.google.com');
// remove the sixth instance from the page
$favorites[5]->delete();
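The $favorites[4] syntax works on an object only if its class implements PHP's built-in ArrayAccess interface, and count() works on it only if the class implements Countable. The sketch below shows one way a collection class could do this. It assumes PHP 8 (for the mixed type), and the Template and TemplateCollection classes are simplified, hypothetical stand-ins for POMTemplate and POMTemplateCollection. It supports count(), [i] and append(), while deletion is done with unset($collection[$i]) rather than a delete() method.

```php
<?php
// Hypothetical sketch (PHP 8): how a collection can support count(),
// $collection[$i] and append() through Countable and ArrayAccess.

class Template
{
    /** @var array<string, string> */
    public array $params = [];

    public function setParameter(string $name, string $value): void
    {
        $this->params[$name] = $value;
    }
}

class TemplateCollection implements Countable, ArrayAccess
{
    /** @var Template[] instances in page order */
    private array $items = [];

    public function append(Template $template): void
    {
        $this->items[] = $template;
    }

    public function count(): int
    {
        return count($this->items);
    }

    public function offsetExists(mixed $offset): bool
    {
        return isset($this->items[$offset]);
    }

    public function offsetGet(mixed $offset): mixed
    {
        if (!isset($this->items[$offset])) {
            throw new OutOfRangeException("No template instance at index $offset");
        }
        // Objects are handles, so callers can modify the returned template.
        return $this->items[$offset];
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        if (!$value instanceof Template) {
            throw new InvalidArgumentException('Only Template objects can be stored');
        }
        if ($offset === null) {
            $this->items[] = $value; // $collection[] = $t appends
        } else {
            $this->items[$offset] = $value;
        }
    }

    public function offsetUnset(mixed $offset): void
    {
        if (isset($this->items[$offset])) {
            // Re-index so that [i] keeps meaning "the i-th instance".
            array_splice($this->items, (int) $offset, 1);
        }
    }
}
```

Re-indexing in offsetUnset() is a design choice: it keeps [i] aligned with the i-th instance on the page, at the cost of shifting the indexes of later instances.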
Use cases
Command-line image localization tool
A fictional tool that downloads files referenced by a template's "fileURL" parameter and records a local copy in a "mediaTitle" parameter, so that external links to images or PDF documents are replaced with locally hosted files.
A real-life scenario is keeping local backup copies of the presentation files on TechPresentations.org.
Original wiki article example
{{Presentation
...
|fileURL=http://ajaxexperience.techtarget.com/images/Presentations/Crockford_Douglas_JavaScript.ppt.pdf
}}
Pseudocode
Get the URL of the file to download:
$javascript_the_good_parts = getPageContents('JavaScript The Good Parts');
$article = new POMPage($javascript_the_good_parts);
$url = $article->templates('Presentation')[0]->getParameter('fileURL');
Download the file and save it under a local wiki media title (outside the scope of POM):
// download(), addLocalFile() and makeLocalTitle() are placeholders, not part of POM
$filename = download($url);
$mediaTitle = addLocalFile(makeLocalTitle($url), $filename);
Set a new parameter that references the new local media title:
// setParameter() takes the parameter name and its new value
$article->templates('Presentation')[0]->setParameter('mediaTitle', $mediaTitle);
$resultwikitext = $article->toWikiText();
...
Resulting wiki article
{{Presentation
...
|fileURL=http://ajaxexperience.techtarget.com/images/Presentations/Crockford_Douglas_JavaScript.ppt.pdf
|mediaTitle=Media:Crockford_Douglas_JavaScript.ppt.pdf
}}
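makeLocalTitle(), which the pseudocode leaves undefined, could be sketched as below. This is an assumption about what the tool might do, not part of POM: it takes the last path segment of the file URL, decodes percent-escapes, replaces spaces with underscores and adds a "Media:" prefix, which yields the mediaTitle value shown in the resulting article.

```php
<?php
// Hypothetical implementation of the makeLocalTitle() placeholder from the
// pseudocode: derive a media title from the file name at the end of a URL.

function makeLocalTitle(string $url): string
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || $path === '' || substr($path, -1) === '/') {
        throw new InvalidArgumentException("No file name in URL: $url");
    }
    // rawurldecode() turns %20 and similar escapes back into characters;
    // spaces then become underscores, as in MediaWiki titles.
    return 'Media:' . str_replace(' ', '_', rawurldecode(basename($path)));
}
```

A real tool would also need to handle name collisions and characters that are not allowed in wiki titles; those checks are left out here.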
Usage
To start using the Page Object Model classes, download the code and add this line to your code:
require_once('/path/to/extracted/archive/POM.php');
Examples
Get this page and increase its version number
require_once('POM.php');
# Get document from MediaWiki server
$pom = new POMPage(join(file('http://www.mediawiki.org/w/index.php?title=Extension:Page_Object_Model&action=raw')));
# Check current version
$ver = $pom->templates['Extension'][0]->getParameter('version');
echo "Current version: $ver\n";
# Increase version number by fraction
$pom->templates['Extension'][0]->setParameter('version', $ver + 0.1);
# Do whatever you want with result (we'll just display it)
echo "Document with increased version:\n".$pom->asString();
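Adding 0.1 works for a plain numeric version, but this page's version is the dotted string 0.1.3. PHP reads only the leading "0.1" of such a string as a number, so $ver + 0.1 gives 0.2 rather than 0.1.4, and newer PHP versions also emit a notice or warning. A small helper that bumps the last component could be used instead. bumpVersion() is a hypothetical name, not a POM function.

```php
<?php
// Hypothetical helper, not part of POM: increment the last component of a
// dotted version string such as "0.1.3".

function bumpVersion(string $version): string
{
    $parts = explode('.', trim($version));
    foreach ($parts as $part) {
        // Every component must be a non-empty run of ASCII digits.
        if (!preg_match('/\A\d+\z/', $part)) {
            throw new InvalidArgumentException("Not a dotted numeric version: $version");
        }
    }
    $last = count($parts) - 1;
    $parts[$last] = (string) ((int) $parts[$last] + 1);
    return implode('.', $parts);
}
```

In the example above, the call would then read setParameter('version', bumpVersion($ver)).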
For more code examples, see the examples folder in the SVN repository.
MediaWiki API calls
When used as a MediaWiki extension, POM also adds the pomgettplparam and pomsettplparam MediaWiki API calls. Once it is installed, see the wiki's api.php page for the list of parameters and documentation.
Download
You can download just the class library as a tarball, or the full code, including tests, from SVN:
Class Library tarball
If you just want to use the library, the set of classes is enough. Download the tarball and unpack it.
SVN
To download the code for the latest release of the Page Object Model project, run the following command:
svn checkout http://svn.wikimedia.org/svnroot/mediawiki/tags/extensions/PageObjectModel/REL_0_1_3/ PageObjectModel
The latest development code has moved to the MediaWiki repository and is now available here:
svn checkout http://svn.wikimedia.org/svnroot/mediawiki/trunk/extensions/PageObjectModel
The old Google Code repository can be found here: http://mediawiki-page-object-model.googlecode.com/svn/
Installation
If you plan to use POM's MediaWiki API calls, move this code to the $IP/extensions/PageObjectModel/ folder and add this line to your LocalSettings.php:
include_once("$IP/extensions/PageObjectModel/PageObjectModel.php");
Development
Page Object Model should be developed independently of MediaWiki, MediaWiki extensions, special PEAR modules and specific PHP configuration settings. The only dependency is PHP 5.x.
These requirements ensure that the code can be used in any configuration of the user's system.
Testing
The project is tested using PHPUnit. All tests are stored in the /tests/ folder, and tests.xml is the configuration file used by the project's main Makefile.
To test the project, all you need to do is type:
make
or
make test
in the root of the project.
CHANGES
- 0.1.3 (May 29, 2008) - Fixed template parser to support nested templates (templates as parameter values).
- 0.1.2 (May 14, 2008) - Added pomgettplparam and pomsettplparam MediaWiki API calls.
See also
- MediaWiki markup specification (ANTLR, BNF)
- Extension:Semantic Forms
- Extension:Semantic MediaWiki
- Original proposal for POM
External links
- Document Object Model (DOM) - W3C's standard object model for representing HTML or XML and related formats
- Google Code project for this extension
