User:Peter17/Reasonably efficient interwiki transclusion

This is a draft for my GSoC-2010 project, reasonably efficient interwiki transclusion, written after discussing with my mentor User:Catrope.

By convention, we will use the following terms:
 * home wiki for the wiki which hosts a template
 * distant wiki for another wiki that wants to use that template
 * wanted template for the template hosted on the home wiki and called by a page of the distant wiki

Of course, there might be a lot of pages and a lot of distant wikis that request the wanted template.

Current state
Currently, some functions (interwikiTransclude and fetchScaryTemplateMaybeFromCache in Parser.php) allow interwiki transclusion from a home wiki to a distant wiki.

Once $wgEnableScaryTranscluding is set to true, if a template is to be transcluded from another wiki, then:
 * fetchScaryTemplateMaybeFromCache checks whether the template was cached less than 1h ago (the default expiry)
 * if so, the cached template is used
 * if not, a GET request is made to retrieve the content from the home wiki
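The lookup described above can be sketched as a small time-based cache. This is only an illustration in Python, not the code in Parser.php: the class name ScaryCache and the fetch and clock hooks are hypothetical, and only the 1h default comes from the description above.

```python
import time

CACHE_TTL_SECONDS = 3600  # 1h, the default expiry described above


class ScaryCache:
    """Hypothetical stand-in for the distant wiki's transclusion cache."""

    def __init__(self, fetch, ttl=CACHE_TTL_SECONDS, clock=time.time):
        self.fetch = fetch    # callable(url) -> str; performs the GET request
        self.ttl = ttl
        self.clock = clock    # injectable so the expiry can be tested
        self.entries = {}     # url -> (stored_at, content)

    def get(self, url):
        entry = self.entries.get(url)
        if entry is not None and self.clock() - entry[0] < self.ttl:
            return entry[1]                    # cached less than ttl ago
        content = self.fetch(url)              # missing or expired: fetch again
        self.entries[url] = (self.clock(), content)
        return content
```

Note that the cached copy is served for the whole hour regardless of whether the template changed on the home wiki.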

There are two different ways to retrieve (and cache) the content: raw wikitext and html.

The flaw in this system is that the data is cached for an arbitrary time, which means:
 * When a template is almost never modified, the cache is still refreshed even though this is unnecessary, so we lose some performance.
 * When a template is actually modified, in the worst case the cache waits 1h before being updated, and the users of the distant wiki do not see the changes made to the template during that time.

So, the cache should be updated if and only if necessary.

Proposed approach
Since the template might be requested by many distant wikis, we think it would be more appropriate to cache the data on the home wiki instead of doing so on each distant wiki.

Then, when a page on the distant wiki calls a template:
 * First, the distant wiki makes a request to the API of the home wiki to get the last modification timestamp of the template.
 * Then it looks at the local cache to compare that timestamp with the timestamp of the page that calls the template.
 * Then, the cached page is updated only if necessary by retrieving the template.

The infrastructure for this request already exists in the API.
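A minimal sketch of this timestamp check, written in Python for illustration. It assumes the request uses action=query with prop=revisions and rvprop=timestamp, and it simplifies the comparison by storing the timestamp alongside the cached copy. The function names and the two fetch hooks are hypothetical.

```python
from urllib.parse import urlencode


def timestamp_query_url(api_url, title):
    """Build the API request asking for the latest revision timestamp."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "timestamp",
        "titles": title,
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def get_template(title, cache, fetch_timestamp, fetch_content):
    """Refresh the cached copy only if the home wiki has a newer revision.

    cache maps title -> (timestamp, content). The two fetch_* callables are
    hypothetical hooks that perform the actual HTTP requests.
    """
    remote_ts = fetch_timestamp(title)
    cached = cache.get(title)
    # ISO 8601 UTC timestamps such as "2010-05-20T12:00:00Z" sort correctly
    # when compared as plain strings.
    if cached is not None and cached[0] >= remote_ts:
        return cached[1]
    content = fetch_content(title)
    cache[title] = (remote_ts, content)
    return content
```

With this scheme every page view still costs one small request for the timestamp, but the template content itself is transferred only after it has changed.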

For performance reasons, this might be done only when action=purge is passed to index.php (???)

Special cases
It seems quite easy to understand how this approach would work for simple cases, such as transcluding a template which has no parameter and simply returns some HTML content.

However, some more complex cases may occur.

Complex templates
What should happen if the wanted template requires the transclusion of other templates or the execution of parser functions?

Simply getting its wikitext and parsing it on the distant wiki might lead to some problems:
 * the needed templates might not exist on the distant wiki and so would have to be transcluded from the home wiki (and so on, if they include other templates)
 * if the templates do exist on the distant wiki, they could differ between two distant wikis, so the same wanted template would give different results depending on which distant wiki transcludes it

So, it seems to be a better idea to preprocess the required template on the home wiki and then send the result to the distant wiki. This way, template calls and parser function calls are done on the home wiki and the result is the same for any distant wiki.

If some simple (plain text) parameters are given, they should be substituted during this preprocessing.

It seems that the API is already capable of doing such a thing (see API:Parsing wikitext): http://fr.wikipedia.org/w/api.php?action=expandtemplates&text= returns:

 <span style="white-space:nowrap">1&nbsp;km</span>

So, the parser functions of the template are executed and the parameters are substituted.
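As an illustration, a request of this kind could be built as follows in Python. The helper name is hypothetical, and the template call used in the usage note is a made-up example, not the one from the request quoted above.

```python
from urllib.parse import urlencode


def expandtemplates_url(api_url, wikitext):
    """Build a GET URL asking the home wiki to preprocess some wikitext."""
    params = {
        "action": "expandtemplates",
        "text": wikitext,   # e.g. a template call with its plain-text parameters
        "format": "json",
    }
    return api_url + "?" + urlencode(params)
```

For instance, expandtemplates_url("http://example.org/w/api.php", "{{Example|1|km}}") percent-encodes the braces and pipes, so the whole template call travels in the text parameter.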

Templates with complex parameters
Now, let's assume that the value of a parameter is the result of a parser function or template.

If we pass those parameters unexpanded to the home wiki when requesting the template, that wiki might not know what to do with them, because the corresponding templates might not exist there.

So, we think that those complex parameters should be parsed on the distant wiki, so that they keep their usual behavior there, and the results should then be passed as arguments to the wanted template.
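A rough sketch of that two-step process in Python: each argument is expanded on the distant wiki first, and only the results go into the call sent to the home wiki. The expand_locally hook is a hypothetical stand-in for the distant wiki's own preprocessor. The sketch does not escape pipe characters that may appear in expanded values, which a real implementation would have to handle.

```python
def build_remote_call(template, args, expand_locally):
    """Expand each argument locally, then build the wikitext call that
    will be sent to the home wiki.

    args maps parameter names to raw wikitext values; expand_locally is a
    hypothetical callable(wikitext) -> expanded text.
    """
    parts = [template]
    for name, value in args.items():
        # The home wiki only ever sees the expanded result.
        parts.append("%s=%s" % (name, expand_locally(value)))
    return "{{" + "|".join(parts) + "}}"
```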

Comments

 * The template parameters might sometimes lead to large requests, so a POST request to the API would be more appropriate than the current GET.
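To illustrate the difference, here is a sketch in Python that puts the wikitext in the request body instead of the URL. It only builds the request object without sending it, and the helper name is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def expandtemplates_post(api_url, wikitext):
    """Build (but do not send) a POST request carrying the wikitext in its body."""
    body = urlencode({
        "action": "expandtemplates",
        "text": wikitext,
        "format": "json",
    })
    # Supplying data makes urllib issue a POST instead of a GET.
    return Request(
        api_url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

The URL stays the same length however large the parameters are, which avoids the length limits that servers and proxies may place on GET URLs.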

Caching
Whether to cache every template request made via the API or just some of them (requests for templates which have no parameters) still needs to be discussed.

On the one hand, we should cache as much as possible to avoid parsing the same templates again and again. On the other hand, some templates will be called with many different arguments, and it may not be necessary to cache all of them.

Some tests should be run to decide which solution is better.
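To make the two options concrete, here is a small Python sketch of how a cache could distinguish them. Both function names are hypothetical, and hashing the title with its arguments is only one possible way to build a cache key.

```python
import hashlib
import json


def cache_key(title, args):
    """Derive one cache key per (template, arguments) combination.

    Arguments are sorted so that their order does not affect the key.
    """
    payload = json.dumps([title, sorted(args.items())], ensure_ascii=False)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def should_cache(args, only_parameterless):
    """Apply the policy: cache everything, or only calls without parameters."""
    return not (only_parameterless and args)
```

Caching everything creates one entry per distinct argument set, so the tests mentioned above would mainly need to measure how often the same arguments actually recur.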