Extension:WikiOpener

Foreword
One of the main limitations of wikis is that all data must be contained in the database on which the wiki relies, meaning that a wiki cannot import data coming from elsewhere. In fields where data are organised in tables, this limitation is quite harmful: it may prevent potential users from using MediaWiki to make their database available on the web, in which case all the wiki facilities would have to be reimplemented.

This extension was developed in a bioinformatics/genomics context, with the aim of augmenting the wiki so that it can include and update external (structured) data and run analysis tools. It allows developers to rapidly add specific components that include content extracted from databases (local or remote), files, and so on; that retrieve online data from sources such as DAS servers or XML feeds; and that pass parameters to software and format the results before they are included in the generated page.

For the above reasons, please bear in mind that this extension is generic and needs to be extended by specific components; it is thus intended for developers.

What can this extension do?
The main purpose of this 'generic' extension is to interact with the outside world through the registration of specific components to:
 * manage structured data (databases, files, ...)
 * run (on-the-fly) analysis tools

With this aim, a new tag, inout, has been defined and has the following usage: <inout> Specific Component | Layout template | optional parameters separated by vertical bars </inout>. The extension fetches the layout template (wiki text stored as a wiki article) and replaces the variable names with the actual values returned by the specific component (a PHP wrapper).

For example, a specific component can retrieve data from a database about genes associated with a specific congenital heart defect (CHD) and would be called as follows: <inout> genesForChdProvider | genesForChdLayoutTemplate | Atrial septal defect </inout>

Additionally, the extension makes it possible to register specific components that are called to include content before and/or after an article. The registration can be done for all pages, a specific page, all pages in a namespace, or a specific page of a given namespace. This is of great interest when combined with the possibility of showing data stored elsewhere: a page in which no data has yet been entered manually through the classical wiki editor can already be populated by automatic queries to other databases. In our example of a congenital heart defect database, we might have a wiki page containing information found in other databases about a specific heart defect, with no manually entered information at all.

Editing the other data sources on which the wiki relies should be as user friendly as editing a simple wiki page. This is why, to ease interaction with structured data, a mechanism to generate web forms has been implemented. As with the rendering of a component's results, the layout is specified in wiki text in an article. Another article describes the form input fields (textfield, textarea, popup menu, ...). A specific component may then be implemented to retrieve and provide default values for the web form and to process posted values.

As said previously, articles with no wiki content may actually contain automatically included content (all pages in a namespace, for example). However, these pages are not found by the classical wiki search tool. The extension takes this into account by extending:
 * the search function to perform a search in a database for example,
 * the articleExists function so that links look like they point to an existing page.

Download instructions
The extension code is available at http://www.esat.kuleuven.be/~bioiuser/chdwiki/inout.tar.gz

Installation
After downloading, unpack the archive and move (or copy, for file ownership) the entire inout directory to the extensions directory. Note that only inout/inout.php is required; the remaining files are provided as a tutorial or quick-start guide.

To install this extension, add the following to LocalSettings.php:
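A minimal sketch of such a line, assuming the archive was unpacked as extensions/inout as described above. <tt>$IP</tt> is MediaWiki's installation path variable; the exact line used by the authors may differ.

```php
<?php
// LocalSettings.php fragment (sketch, not the authors' exact line).
// $IP is the MediaWiki installation path; the location below assumes
// the archive was unpacked as extensions/inout.
require_once "$IP/extensions/inout/inout.php";
```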

Manual modifications to MediaWiki source
The extension makes use of hooks to
 * 1) automatically include content before or after an article, therefore the hook must be added in includes/Article.php
 * 2) extend the search engine so that it can search outside resources (e.g. databases). This means that includes/SearchEngine.php has to be modified.
 * 3) trick the articleExists function to properly link to empty wiki articles that do contain automatic content. This is done in includes/Title.php

Note: if you only intend to use the inout tag, then these modifications are not needed.

MediaWiki 1.13.1
In includes/Article.php of MediaWiki 1.13.1, before line 801, i.e. in method view before the comment "Fetch content and check for errors", add the following hook:

Then, after the previous block (if (! $outputDone) ...), originally after line 813 in MediaWiki 1.13.1 and before the comment "Another whitelist check in case oldid is altering the title", add the following hook:

MediaWiki 1.14.0
In MediaWiki 1.14.0, before line 746, i.e. in method view before the comment "If we got diff and oldid in the query, we want to see a diff page instead of the article.", add the following hook:

Then, just before line 924 and the comment "title may have been set from the cache", add the following hook:

includes/SearchEngine.php
In method getNearMatch around lines 64-72 (MW 1.13.1) or lines 78-86 (MW 1.14.0), replace with

MediaWiki 1.13.1
In includes/Title.php, replace method (lines 3058-3069 in MW 1.13.1) with

MediaWiki 1.14.0
In includes/Title.php, replace method (lines 3143-3166 in MW 1.14.0) with method

Registration
Components correspond to files. Therefore, we need to specify the path to the directory that contains those files.

In LocalSettings.php, or anywhere after inout.php has been loaded, specify the path where the PHP files for the components can be found. For example:

An example component: HugoTextMining
Create the file for your component. For example, we will define a HugoTextMining component that retrieves XML-encoded data about publications related to the passed query from a DAS server. We therefore create the file hugotextmining.php in the previously registered directory <tt>extensions/inout/components</tt>:
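A minimal sketch of what such a component file might look like. The function name follows the <tt>_component_method</tt> pattern used by the other examples on this page, but its signature and return structure are assumptions, the rows are made up, and the DAS request itself is replaced by hard-coded data.

```php
<?php
// Hypothetical content for extensions/inout/components/hugotextmining.php.
// The signature and the returned structure are assumptions for illustration.
function _hugotextmining_getReferences(string $query): array
{
    // Made-up rows standing in for the parsed response of the DAS server.
    $rows = [
        ['pmid' => '1000001', 'note' => 'TP53 is mentioned in this sentence.'],
        ['pmid' => '1000002', 'note' => 'Another sentence citing TP53.'],
        ['pmid' => '1000003', 'note' => 'A sentence about an unrelated gene.'],
    ];

    $refs = [];
    foreach ($rows as $row) {
        // Keep only the rows whose extract mentions the query.
        if (stripos($row['note'], $query) === false) {
            continue;
        }
        $refs[] = [
            'LINK' => 'https://example.org/abstract/' . $row['pmid'], // placeholder URL
            'PMID' => $row['pmid'],
            'NOTE' => $row['note'],
        ];
    }

    // One list variable, REFS, whose items expose LINK, PMID and NOTE.
    return ['REFS' => $refs];
}
```

The keys of the returned array are the variable names that a layout can refer to.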

Using a component
The syntax to call a component is the following: <tt><inout> component.method | layout [ | parameters separated by vertical bars ] </inout></tt>

For example, to call our HugoTextMining component to retrieve publications related to protein TP53 with the default layout, we write in the wikitext of an article: <tt><inout> HugoTextMining.getReferences|defaultLayout|TP53 </inout></tt>

Alternatively, we can define a specific layout. For this, replace defaultLayout in the previous call with the name of the wiki page containing the template: <tt><inout> HugoTextMining.getReferences|HugoTextMining.getReferences.layout|TP53 </inout></tt>

To do this, create a wiki page called <tt>HugoTextMining.getReferences.layout</tt> that contains the layout.

Here is an example of a possible layout:
 {REFS}.foreach:
 * [{LINK} {PMID}]: {NOTE}
 {REFS}.end_foreach

We will obtain a bulleted list. Each item will be made of
 * 1) a link to the publication's abstract <tt>[{LINK} {PMID}]</tt>
 * 2) an extract of the publication where the query is cited
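To make the substitution concrete, here is a small sketch of how such a layout could be expanded. <tt>render_layout</tt> is a hypothetical helper written for this illustration; it is not the code of inout.php, which may handle layouts differently.

```php
<?php
// Hypothetical helper (not part of inout.php): expands a layout with the
// values returned by a component.
function render_layout(string $layout, array $data): string
{
    // Expand each "{NAME}.foreach: ... {NAME}.end_foreach" block once per row.
    $layout = preg_replace_callback(
        '/\{(\w+)\}\.foreach:(.*?)\{\1\}\.end_foreach/s',
        function (array $m) use ($data): string {
            $out = '';
            foreach ((array) ($data[$m[1]] ?? []) as $row) {
                $out .= render_layout(trim($m[2]), $row) . "\n";
            }
            return $out;
        },
        $layout
    );

    // Replace the remaining {NAME} placeholders; unknown names are left as is.
    return preg_replace_callback(
        '/\{(\w+)\}/',
        function (array $m) use ($data): string {
            $value = $data[$m[1]] ?? null;
            return is_scalar($value) ? (string) $value : $m[0];
        },
        $layout
    );
}
```

Called with the layout above and a <tt>REFS</tt> list of rows containing <tt>LINK</tt>, <tt>PMID</tt> and <tt>NOTE</tt>, this produces one wikitext bullet per row.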

Automating the call to components
It is possible to register a component to be called:
 * on a specific namespace:article (though it's easier to put the call directly in the article source)
 * on every page of a given namespace
 * on every page

Going on with our HugoTextMining example, we will create the HugoTextMining namespace and register this component to be called to include references at the end of every article in this namespace.

For this, in LocalSettings.php, or anywhere after inout.php has been loaded, we add the following:
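The namespace part can be sketched as follows, using MediaWiki's standard <tt>$wgExtraNamespaces</tt> setting. The constant names and the IDs 100/101 are assumptions (custom namespaces conventionally start at 100). The registration call is only named in a comment, because its arguments are not shown in this text.

```php
<?php
// LocalSettings.php fragment (sketch). Constant names and IDs are assumptions.
define('NS_HUGOTEXTMINING', 100);
define('NS_HUGOTEXTMINING_TALK', 101);

$wgExtraNamespaces[NS_HUGOTEXTMINING] = 'HugoTextMining';
$wgExtraNamespaces[NS_HUGOTEXTMINING_TALK] = 'HugoTextMining_talk';

// The component is then registered for this namespace with
// _inout_register_namespace_specific_content_afterArticle(...);
// its arguments are not shown in this text, so they are left out here.
```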

The result will appear on any page of that namespace, for example for TP53: <tt>index.php/HugoTextMining:TP53</tt>.

Taking automatically created pages into account
Adding automatic content to the pages of a given namespace has a drawback: such pages exist when you enter their URL (e.g. index.php/HugoTextMining:TP53), but links to them are shown as red links and they cannot be found with the classical wiki search tool.

The inout extension makes it possible to circumvent these drawbacks.

Add existing pages
When the function <tt>_inout_register_namespace_specific_content_afterArticle</tt> is used to fill every page of the namespace <tt>HugoTextMining:</tt> with data available in an external resource, these pages cannot be linked to because, as far as the MediaWiki engine is concerned, they have never been created. They can thus only be reached by entering the URL in the address bar of the browser.

We thus have to indicate manually to MediaWiki which pages exist. In our text-mining example, let's assume that all genes for which there is at least one reference do correspond to a page of the wiki.

To do this, add the following lines to the <tt>LocalSettings.php</tt> file.

Search automatic
If you want to search for a page that was never created but is filled automatically, you need to circumvent the basic MediaWiki search function.

In our TextMining test case, we have to circumvent the search function so that it redirects users to the right page of the wiki when the gene name they enter in the search field has references in the database.

Tip : removing the empty page tag
Even if the page is filled with content from external resources, it will still contain the classical wiki message ''There is currently no text in this page. You can search for ...''. To remove it from all pages of a given namespace, you can follow this tip.

First, add the following function to the MediaWiki code (e.g. in <tt>inout.php</tt> or in <tt>LocalSettings.php</tt>).

Then add an automatic call to this function after each page of the given namespace. In our running example, the <tt>HugoTextMining</tt> function we previously described becomes:

Web Forms
A simple mechanism for web form creation and posting is implemented. Its principle is to register the provided internal component for a given namespace (in our example, 'Form'). Then, when an article from that namespace is viewed, the layout is retrieved from the wiki (in our example 'Form.article.layout', article being the actual page viewed), as well as the specification of the input fields (in our example 'Form.article.inputFields'). The form is then generated and filled with values provided by a specific component (in our example, it is registered in the extensions/forms directory and must be named article.php, article being the page viewed).

Form namespace creation
The first step is to create a namespace to automatically fetch form layout and specifications, and then to generate the final forms.

In our example, we will add the following to LocalSettings.php

Creating a Web Form
We will create a form to interact with a fake database. Therefore, our form example is named <tt>FakeDb</tt> and will be accessible at <tt>index.php/Form:FakeDb</tt>. To specify the layout, we have to create a page <tt>Form.FakeDb.layout</tt> with the following wiki source:

{save} {delete} {id}

which should look like

{save} {delete}

{id}

In this layout, the input fields are named between curly braces and must be specified in <tt>index.php/Form.FakeDb.inputFields</tt>, which for our example contains the following:
 support     | select     | 1    | {SUPPORT}      | no good correlation between CHD type and candidate gene=0;unconfirmed: a single case report=1;likely: 2 or more patients (with CHD and a mutation in the candidate gene)=2;confirmed: 2 or more independent reports > 1% incidence=3
 chd         | text       | 80   | {CHD}          | unused
 gene        | text       | 16   | {GENE}         | unused
 comments    | textarea   | 3x80 | {COMMENTS}     | Free text to comment the update.
 save        | submit     | 1    | Save           |
 delete      | confirm    | 1    | Delete         | Delete this association | youpi
 id          | hidden     | 16   | {ID}           | used for updates

The syntax is :
 * input field name
 * type (text for textfield, textarea, select, submit, confirm for submit with javascript confirmation, hidden, locked for uneditable textfield)
 * size: ignored for the types where it is not relevant. For text, it corresponds to the width. For textarea, it is in the form nbLines x nbColumns.
 * default value: retrieved from the specific component and thus should correspond to what is returned by the specific component PHP code.
 * select options for a select field; for other types, this last value is ignored. The syntax for select options is pairs of text=value separated by semicolons.
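The column syntax above can be read as in the following sketch. <tt>parse_field_spec</tt> is a hypothetical function written for this illustration; the parser in inout.php may behave differently, for instance regarding extra vertical bars.

```php
<?php
// Hypothetical parser for one line of an inputFields page (illustration only).
function parse_field_spec(string $line): array
{
    // Split into at most 5 columns so that any further "|" stays in the last one.
    $parts = array_pad(array_map('trim', explode('|', $line, 5)), 5, '');
    [$name, $type, $size, $default, $extra] = $parts;

    $field = [
        'name'    => $name,
        'type'    => $type,
        'size'    => $size,
        'default' => $default,
        'options' => [],
    ];

    if ($type === 'select') {
        // Options are "text=value" pairs separated by semicolons.
        foreach (explode(';', $extra) as $pair) {
            $pos = strrpos($pair, '='); // split on the last "=" of the pair
            if ($pos === false) {
                continue;
            }
            $field['options'][trim(substr($pair, 0, $pos))] = trim(substr($pair, $pos + 1));
        }
    }

    return $field;
}
```

For non-select fields the last column is kept out of the result, which mirrors the statement that it is ignored.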

Now, we need to provide default values and process what is posted by the form. In our example, we have to create the file <tt>extensions/forms/fakedb.php</tt> (<tt>fakedb</tt> is the page name AND the filename). This file must contain the implementation of 2 methods:
 * _fakedb_fetchData($_GET): to provide default values. The $_GET variable is passed so that <tt>Form:FakeDb?id=3</tt> will be prefilled with the values corresponding to the tuple having id=3 in our database.
 * _fakedb_processPost($_POST)
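A minimal sketch of these two functions, with an in-memory array standing in for the fake database. The function names come from the list above; the return values, the array keys (taken from the placeholders and field names of the inputFields example) and the sample row are assumptions.

```php
<?php
// Hypothetical content for extensions/forms/fakedb.php (illustration only).
// An in-memory array stands in for the fake database; the row is made up.
$GLOBALS['fakeDb'] = [
    3 => ['CHD' => 'Atrial septal defect', 'GENE' => 'GENE1', 'SUPPORT' => '2', 'COMMENTS' => ''],
];

// Provide the default values of the form, keyed by placeholder name.
function _fakedb_fetchData(array $get): array
{
    $empty = ['ID' => '', 'CHD' => '', 'GENE' => '', 'SUPPORT' => '0', 'COMMENTS' => ''];
    $id = isset($get['id']) ? (int) $get['id'] : 0;
    if (!isset($GLOBALS['fakeDb'][$id])) {
        return $empty; // new entry: empty form
    }
    return ['ID' => (string) $id] + $GLOBALS['fakeDb'][$id] + $empty;
}

// Process the posted values; returns the id of the affected tuple.
function _fakedb_processPost(array $post): int
{
    $id = (int) ($post['id'] ?? 0);
    if (isset($post['delete'])) {
        unset($GLOBALS['fakeDb'][$id]);
        return $id;
    }
    if ($id === 0) {
        // No id posted: create a new tuple with the next free id.
        $id = $GLOBALS['fakeDb'] ? max(array_keys($GLOBALS['fakeDb'])) + 1 : 1;
    }
    $GLOBALS['fakeDb'][$id] = [
        'CHD'      => (string) ($post['chd'] ?? ''),
        'GENE'     => (string) ($post['gene'] ?? ''),
        'SUPPORT'  => (string) ($post['support'] ?? '0'),
        'COMMENTS' => (string) ($post['comments'] ?? ''),
    ];
    return $id;
}
```

A real component would replace the array with queries against the actual data source and validate the posted values before storing them.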

Now, we are ready, and the result can be seen at <tt>Form:FakeDb</tt> for a new entry or <tt>Form:FakeDb?id=3</tt> to edit an existing entry.

Using an external resource for the select fields
Let us be a bit more tricky!

When there are many possibilities for the select field, it might be useful to load them from an external resource or a database instead of listing them all in the <tt>Form.Article.inputFields</tt> wiki page. To do this, we need to use the <tt> </tt> tag in the <tt>Form.Article.inputFields</tt> wiki page. This tag behaves exactly like the previously described <tt>inout</tt> tag except that it can only be used in <tt>inputFields</tt> pages.

Let's update the previous example with a new form that we call <tt>Form:FakeDbSelect</tt>. In this example, the CHD field of the previous <tt>Form:FakeDb</tt> form will automatically be prefilled with data coming from an external file called <tt>chd.tab</tt> that contains a list of CHDs (one disease per line).

First of all, we need to use this tag in <tt>Form.FakeDbSelect.inputFields</tt>:
 support     | select     | 1    | {SUPPORT}      | no good correlation between CHD type and candidate gene=0;unconfirmed: a single case report=1;likely: 2 or more patients (with CHD and a mutation in the candidate gene)=2;confirmed: 2 or more independent reports > 1% incidence=3
 chd         | select     | 1    | {CHD}          | chdlist|selectlayout
 gene        | text       | 16   | {GENE}         | unused
 comments    | textarea   | 3x80 | {COMMENTS}     | Free text to comment the update.
 save        | submit     | 1    | Save           |
 delete      | confirm    | 1    | Delete         | Delete this association | youpi
 id          | hidden     | 16   | {ID}           | used for updates

Secondly, as shown in the <tt> </tt> tag, we need to implement the default function of the component file <tt>components/chdlist.php</tt> that will return the list of CHD stored in the chd.tab file in the format required by the select <tt>(option 1=1;option 2=2)</tt>.

The function _chdlist_getChds will return an array containing the only field <tt>SELECT</tt>. The layout wiki page <tt>selectlayout</tt> is thus only <tt>{SELECT}</tt>
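A minimal sketch of such a function, assuming that <tt>chd.tab</tt> holds one disease name per line and that the numerical value of each option is simply its line number. The optional path parameter is an addition made for this illustration.

```php
<?php
// Hypothetical content for components/chdlist.php (illustration only).
// Assumes one disease name per line; each option value is the line number.
function _chdlist_getChds(string $path = 'chd.tab'): array
{
    $names = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($names === false) {
        return ['SELECT' => '']; // unreadable file: no options
    }

    $pairs = [];
    foreach ($names as $i => $name) {
        // Build "text=value" pairs as required by the select syntax.
        $pairs[] = trim($name) . '=' . ($i + 1);
    }

    // The single field SELECT is what the {SELECT} layout expands.
    return ['SELECT' => implode(';', $pairs)];
}
```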

Thirdly, we have to modify the fetchData function that we presented for the file <tt>fakedb.php</tt> so that it takes into account the numerical value of the CHD corresponding to the entry that will be edited.

Code
<tt>extensions/inout/inout.php:</tt>