Extension:Page Object Model

Page Object Model (POM) is a set of classes that abstract MediaWiki syntax so that other programs (extensions, bots, API handlers, etc.) can easily extract and manipulate page content. The name reflects the conceptual similarity between POM and the DOM.

Semantic Forms development
The original idea for POM arose when the Semantic Forms code became quite complex and, at the same time, useful to other extensions. The extension is currently under development, and this page will be updated as it progresses.

Third party extensions
Other extensions and maintenance scripts can benefit from having page and template parsing abstracted for them.

Examples include:
 * command line tools that want to manipulate just some template parameters without parsing the whole page
 * MediaWiki API extensions that let AJAX tools edit parts of a page without having to parse the page content themselves
 * tools for automated addition or editing of information, also using the MW API
 * outside applications that use a semantic wiki as a database, such as an online spreadsheet or a PDA application, also using the MW API

MediaWiki syntax
Semantic Forms will require more complex, higher-level data handling, probably involving parameter typing and page definitions. It is better to put that functionality in a separate logical layer, so that other code (not necessarily using Semantic Forms or Extension:Semantic MediaWiki) can use POM without introducing unnecessary dependencies. It is therefore proposed that POM handle only native MediaWiki syntax rules, with initial development focused on template handling logic.

Fundamental rules
There are several rules that must be honored by all code and extensions, and verified by the test framework:
 * No side-effects - any wiki-source parsed by POM and saved back without modification must match the original
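
The no-side-effects rule can be checked with a round-trip test. The sketch below is not POM code (POM is a PHP library); it is a minimal Python illustration, and `split_nodes` and `serialize` are hypothetical helpers. It assumes that every node keeps the exact source text it was parsed from, so joining unmodified nodes reproduces the input character for character.

```python
def split_nodes(source):
    """Split wiki source into ("text", s) and ("template", s) chunks.

    Hypothetical helper: it only tracks {{ }} nesting depth, which is
    enough to demonstrate the round-trip property.
    """
    nodes = []
    depth = 0
    start = 0  # start of the chunk currently being collected
    i = 0
    while i < len(source):
        pair = source[i:i + 2]
        if pair == "{{":
            if depth == 0 and i > start:
                # Close the plain-text chunk that precedes this template.
                nodes.append(("text", source[start:i]))
                start = i
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
            if depth == 0:
                nodes.append(("template", source[start:i]))
                start = i
        else:
            i += 1
    if start < len(source):
        # Trailing text, including any template that was never closed.
        nodes.append(("text", source[start:]))
    return nodes


def serialize(nodes):
    """Write the nodes back out as wiki source."""
    return "".join(text for _, text in nodes)


source = "Intro\n{{Infobox|name=POM|note={{small|x}}}}\n[[Category:Tools]]"
assert serialize(split_nodes(source)) == source
```

Malformed input (a stray `}}` or an unclosed `{{`) stays inside a text chunk in this sketch, so the round trip still holds for it.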

Class structures
Here is a possible (partially implemented) class structure for POM:


 * POMPage - represents a page.
    * templates gets either all templates or all POMTemplateCollection objects for the page
    * templates('template_name') is a reference to the POMTemplateCollection object for the 'template_name' template

 * POMElement - abstract class, represents any data element within a page

 * POMTemplateCollection - represents all templates on a page that are of a certain type.
    * count gets the number of such templates
    * [i] references the i-th instance of this set
    * append($template) adds another template to this set

 * POMTemplate - subclass of POMElement - represents a single template instance on a page.
    * parameter('parameter_name') is a reference to the POMTemplateParameter object with this name
    * setParameter($name, $value) sets the value of a parameter in this template

 * POMTemplateParameter - subclass of POMElement - represents a single parameter in a template. (This class might not be necessary.)

 * POMTextNode - subclass of POMElement - represents a plain text node
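
To make the relationships between these classes concrete, here is a minimal Python sketch of the same shape. It is an illustration only, not the PHP implementation: the class and method names (`Element`, `TextNode`, `Template`, `Page`, `as_string`, `set_parameter`) are stand-ins chosen for this example, a plain list stands in for POMTemplateCollection (covering only count and [i]), and nested templates are not handled.

```python
class Element:
    """Base class: anything that can be written back as wiki source."""

    def as_string(self):
        raise NotImplementedError


class TextNode(Element):
    def __init__(self, text):
        self.text = text

    def as_string(self):
        return self.text


class Template(Element):
    """One template call such as {{Name|key=value}} (no nested templates)."""

    def __init__(self, source):
        parts = source[2:-2].split("|")
        self.name = parts[0]
        # Ordered (name, value) pairs; positional parameters have name None.
        self.params = []
        for part in parts[1:]:
            key, sep, value = part.partition("=")
            self.params.append((key, value) if sep else (None, part))

    def parameter(self, name):
        for key, value in self.params:
            if key is not None and key.strip() == name:
                return value.strip()
        return None

    def set_parameter(self, name, value):
        for i, (key, _) in enumerate(self.params):
            if key is not None and key.strip() == name:
                self.params[i] = (key, value)
                return
        self.params.append((name, value))

    def as_string(self):
        body = "".join(
            "|" + (v if k is None else k + "=" + v) for k, v in self.params
        )
        return "{{" + self.name + body + "}}"


class Page:
    def __init__(self, children):
        self.children = children

    def templates(self, name):
        return [c for c in self.children
                if isinstance(c, Template) and c.name.strip() == name]

    def as_string(self):
        return "".join(c.as_string() for c in self.children)


page = Page([TextNode("Intro "),
             Template("{{Talk|title=POM|version=1}}"),
             TextNode(" end")])
talk = page.templates("Talk")[0]
talk.set_parameter("version", str(int(talk.parameter("version")) + 1))
print(page.as_string())
```

Because each parameter is stored as the raw text on either side of the first `=`, a template that is parsed and written back without changes keeps its original spacing.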

Parsers
Parsers are modules that are called when a POMPage object is created. POMPage itself only creates a single POMTextNode from the text passed to it, and then relies on these parsers to replace parts of that text with nodes of other types. This structure allows modular parsing: one program might need to work only with templates, another only with links or categories, so there is no need to run every type of parsing for every task.


 * POMParser - abstract class to be subclassed by specific parsers
 * POMTemplateParser - parser class that identifies MediaWiki templates within text nodes of the page and replaces them with POMTemplate objects
 * POMCategoryParser - parser class that identifies category definitions within text nodes of the page and replaces them with POMCategory objects (example of non-template parser)
 * POMLinkParser - parser class that identifies links within text nodes of the page and replaces them with POMLink (POMInternalLink and POMExternalLink) objects (example of a parser that adds several types of objects)
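
The modular design described above can be sketched as follows. This is a hedged Python illustration rather than POM's PHP code: `RegexParser`, `TypedNode` and the two regular expressions are assumptions made for the example, and the template pattern deliberately ignores nested templates. The point is that a page starts as one text node and each parser only rewrites the text nodes it is given.

```python
import re


class TextNode:
    def __init__(self, text):
        self.text = text

    def as_string(self):
        return self.text


class TypedNode:
    """A parsed node that remembers the exact source it came from."""

    def __init__(self, kind, source):
        self.kind = kind
        self.source = source

    def as_string(self):
        return self.source


class RegexParser:
    """Base parser: replaces pattern matches inside text nodes."""

    kind = None
    pattern = None

    def parse(self, nodes):
        out = []
        for node in nodes:
            if not isinstance(node, TextNode):
                out.append(node)  # already claimed by an earlier parser
                continue
            pos = 0
            for m in self.pattern.finditer(node.text):
                if m.start() > pos:
                    out.append(TextNode(node.text[pos:m.start()]))
                out.append(TypedNode(self.kind, m.group(0)))
                pos = m.end()
            if pos < len(node.text):
                out.append(TextNode(node.text[pos:]))
        return out


class TemplateParser(RegexParser):
    kind = "template"
    pattern = re.compile(r"\{\{[^{}]*\}\}")  # non-nested templates only


class CategoryParser(RegexParser):
    kind = "category"
    pattern = re.compile(r"\[\[Category:[^\[\]]*\]\]")


class Page:
    def __init__(self, text, parsers):
        self.children = [TextNode(text)]  # start with a single text node
        for parser in parsers:
            self.children = parser.parse(self.children)

    def kinds(self):
        return [getattr(c, "kind", "text") for c in self.children]

    def as_string(self):
        return "".join(c.as_string() for c in self.children)


source = "Text {{Stub}} more [[Category:Tools]]"
print(Page(source, [TemplateParser()]).kinds())
print(Page(source, [TemplateParser(), CategoryParser()]).kinds())
```

A caller that only cares about templates passes only the template parser, and the category markup simply stays inside a text node.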

Command line image localization tool
A fictional tool that downloads the files referenced by the "fileURL" parameter of a template and adds a "mediaTitle" parameter pointing to the local copy, so that external links to images and PDF documents are replaced with locally hosted versions.

A real-life scenario is keeping local backup copies of the TechPresentations.org presentation files.

Pseudocode
 * Get the URL of the file to download
 * Download the file and save it under a local wiki media title (out of scope of POM)
 * Set a new parameter that references this new local media title
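
The steps above can be sketched in a few lines. This is an illustrative Python sketch, not the tool itself: a plain dict stands in for the template's parameters, `localize` and `media_title_for` are hypothetical names, the download step is passed in as a callable because it is outside the scope of POM, and deriving the media title from the last path segment of the URL is an assumption.

```python
import posixpath
from urllib.parse import urlparse


def media_title_for(url):
    """Assumed naming rule: use the last path segment of the URL."""
    return posixpath.basename(urlparse(url).path)


def localize(params, download):
    """Return a copy of params with "mediaTitle" set for the local copy.

    params:   dict of template parameters (stand-in for a template object)
    download: callable(url, title) that fetches and stores the file
    """
    url = params.get("fileURL")
    if not url:
        return params  # nothing to localize

    title = media_title_for(url)
    download(url, title)  # out of scope of POM, so injected by the caller

    updated = dict(params)
    updated["mediaTitle"] = title
    return updated


# Usage with a fake downloader that only records what it was asked to do.
requested = []
result = localize(
    {"title": "Talk", "fileURL": "http://example.org/files/slides.pdf"},
    lambda url, title: requested.append((url, title)),
)
print(result["mediaTitle"], requested)
```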

Usage
To start using Page Object Model classes, just download the code and add this line to your code:

Get this page and increase version number
For more examples, see the examples folder in the SVN repository.

Download
You can download just the class library as a tarball, or the full code including tests from SVN:

Class Library tarball
If you only want to use the library, the set of classes is enough. Download the tarball below and unpack it:
 * PageObjectModel_0_1_1.tgz

SVN
To download the code for the Page Object Model project, issue the following command:

svn checkout http://mediawiki-page-object-model.googlecode.com/svn/trunk/ PageObjectModel

Development
Page Object Model should be developed independently of MediaWiki, any MediaWiki extensions, special PEAR modules, and PHP configuration parameters. The only dependency is PHP v5.x.

These requirements ensure that the code can be used regardless of how the user's system is configured.

Testing
The project is tested using PHPUnit. All tests are stored in the /tests/ folder, and tests.xml is the configuration file used by the project's main Makefile.

To test the project, type make or make test in the root of the project.

Links

 * MediaWiki markup specification (ANTLR, BNF)
 * Extension:Semantic Forms extension
 * Extension:Semantic MediaWiki extension
 * original proposal for POM - might be a place for development of higher level of abstraction needed for Semantic Forms and similar extensions.
 * Document Object Model (DOM) - W3C's standard object model for representing HTML or XML and related formats
 * Google code project for this extension