User:Peter17/Reasonably efficient interwiki transclusion

This is a draft for my GSoC-2010 project, reasonably efficient interwiki transclusion, written after discussing with my mentor User:Catrope.

Conventionally, we will use the expressions:
 * home wiki for the wiki which hosts a template
 * distant wiki for another wiki that wants to use that template
 * wanted template for the template hosted on the home wiki and called by a page of the distant wiki

Of course, there might be a lot of pages and a lot of distant wikis that request the wanted template.

Current state
Currently, some functions ( and   in  ) allow interwiki transclusion from a home wiki to a distant wiki.

The interwiki table contains the known interwiki prefixes. For each of them, the value  can be set to 1 to allow (or 0 to disallow) transclusion from that wiki.

If  is set to true, when a transclusion call refers to an article of another wiki and if transclusion from that wiki is allowed by , then:
 * checks whether the template has been cached less that 1h (by default) ago
 * if yes, then, the cached template is used
 * if not, then, a GET request is made to retrieve the content from the home wiki

There are two different possible formats to retrieve (and cache) the content:  and.

The default with this system is that the data is cached for an arbitrary time, which means:
 * When a template is almost never modified, the cache is still updated whereas it is useless, so, we lose some performance.
 * When a template is actually modified, in the worst case, the cache will have to wait 1h before being updated and the users of the distant wiki will not see the changes made to the template during that time.

So, the cache should be updated if and only if necessary.

I made some tests on May 10th.

Good points

 * It is working. I mean I can transclude my userpage from this wiki (mediawiki.org) to a distant wiki hosted on my computer using the syntax !
 * The links become full links: /Reasonably efficient interwiki transclusion becomes http://www.mediawiki.org/wiki/User:Peter17/Reasonably_efficient_interwiki_transclusion
 * If the wanted page calls some templates or parser functions, then, they are used to render the content, which is good!

Current issues

 * When transcluding, what is transcluded is just the content of the page templateName of the home wiki, which means:
 * The parameters are totally ignored.
 * The instructions, behave as if it was not a transclusion which is the opposite of the expected behavior...
 * The transcluded content is actually cached for 1h, which means even purging the cache will not update it...

First version
Here, we assume that each distant wiki caches the templates it transcludes from the home wiki.

On the distant wiki, when a page that calls a template is rendered:
 * First, the distant wiki makes a request to the API of the home wiki to get the last modification timestamp of the wanted template.
 * Then it looks at the local cache to compare that timestamp with the timestamp of the cached template.
 * Then, the cached template is updated only if necessary and used to render the page.

The infrastructure for this request already exists in the API.

Second version
Since the template might be requested by a lot of distant wikis, we think that it would be more adapted to cache the data on the home wiki instead of doing that for each distant wiki.

Actually, if, like in the previous version, the cache is updated only when the page is rendered, it seems useless to cache the template on the distant wiki(s), because the page(s) that calls the template are themselves cached.

Consequence
In both first and second versions above, the page and template are updated only when the page is rendered. As a consequence, if the template has been updated in the meantime, purging the cache is necessary to properly see the transcluded template.

Third version (currently preferred)
Instead of checking whether a template has changed or not when parsing the page, we could use a more efficient solution getting inspiration from Extension:GlobalUsage. In this solution, we would use a shared database, with a table. Every home and distant wikis could read and write in this table:
 * When a distant wiki parses a page, all its remote template calls will be listed in this table, with:
 * The address of the page that calls the template (maybe interwikiprefix + page name?)
 * The address of the called template (maybe interwikiprefix + template name?)
 * When a remote template is modified (or deleted), it will purge the cache of all the pages that link to it (listed in the table) through API:Purge

This way, distant pages are always up-to-date. However, a question remains about wikis that don't have access to the share database.

Remarks

 * The template parameters might sometimes lead to large requests, so, a POST request to the API would be more adapted that the current GET.
 * It would be convenient to have the addresses of each  in the   table, in order to use them to perform the requests (retrieving the templates, purging the caches...).
 * We are not forced to use one shared database for all interwiki transclusions: we could also add database informations in the  to know which shared table to use when transcluding templates from each particular home wiki.

Special cases
It seems quite easy to understand how this approach would work for simple cases, such as transcluding a template which has no parameter and simply returns some HTML content.

However, some more complex cases may occur.

Complex templates
What should happen if the wanted template requires the transclusion of other templates or the execution of parser functions?

Simply getting its wikitext and parse it on the distant wiki might lead to some problems:
 * the needed templates might not exist on the distant wiki and, so, should be transcluded from the home wiki (and so on if they include other templates)
 * if the templates exist on the distant wiki, then, they could be different on two distinct distant wikis which would lead to different results when transcluding templates from different distant wikis

So, it seems to be a better idea to preprocess the required template on the home wiki and then send the result to the distant wiki. This way, template calls and parser function calls are done on the home wiki and the result is the same for any distant wiki.

If some simple (simple text) parameters are given, then, they should be substituted during this preprocessing.

It seems that the API is already capable of doing such a thing (see API:Parsing wikitext): http://fr.wikipedia.org/w/api.php?action=expandtemplates&text= returns:

 &lt;span style=&quot;white-space:nowrap&quot;&gt;1&amp;nbsp;km&lt;/span&gt;

So, the parser functions of the template are executed and the parameters are substituted.

Templates with complex parameters
Now, let's assume that the value of a parameter is the result of a parser function or template.

If we pass those parameters to the home wiki when requiring the template, then, that wiki might not know what to do with them, because it might not know the corresponding templates.

So, we think that those complex parameters should be parsed on the distant wiki, so they keep their usual behavior on this wiki, and, then, the corresponding results should be passed as arguments for the wanted template.

Caching
The need for caching every template request via the API or just some of them (requests for the templates which have no parameter) still needs to be discussed.

On one hand, we should cache as much as possible to avoid parsing again and again the same templates. On the other hand, some templates will be called with a lot of different arguments and maybe it's not necessary to cache all of them...

Some tests should be made to decide which solution is better.