User:Robchurch/Interwiki existence checks


Copied from IRC:

<robchurch> Hmm.
<robchurch> One horrible problem with interwiki links is, as mentioned, the
   resolution problem.
<robchurch> w:en:Foo and en:w:Foo are the same but work differently.
<robchurch> It would be rather nice to have a consistent mechanism to convert
   w:en:Foo into its target URL in a single pass without
   reliance upon interwiki redirection. :)
<robchurch> We could allow a specific format in interwiki.iw_url, e.g.
   http://$ for 'w'
<robchurch> [[en:w:Foo]] gets split up; "en" would be detected as a language
   code and substituted into that URL format.
<robchurch> For something like [[w:Foo]], since there's no language code,
   use the content language code.
<robchurch> The biggest problem I envision with that is making sure it's
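
A minimal sketch of the single-pass resolution described above, in Python rather than MediaWiki's own code. Everything named here is an assumption for illustration: the `$lang` and `$1` placeholders, the `INTERWIKI` table with its example URLs, the `LANGUAGE_CODES` set and the `resolve_interwiki` function are hypothetical and do not reproduce the actual interwiki.iw_url format.

```python
from urllib.parse import quote

# Hypothetical interwiki table: project prefix -> URL template.
# "$lang" and "$1" are assumed placeholder names, not a real iw_url format.
INTERWIKI = {
    "w": "http://$lang.wikipedia.example/wiki/$1",
    "b": "http://$lang.wikibooks.example/wiki/$1",
}
LANGUAGE_CODES = {"en", "de", "fr"}  # assumed set of known language codes
CONTENT_LANGUAGE = "en"              # assumed content language of the local wiki


def resolve_interwiki(link, content_language=CONTENT_LANGUAGE):
    """Resolve 'w:en:Foo' or 'en:w:Foo' to a URL in one pass.

    Returns None when the link carries no known project prefix.
    """
    parts = link.split(":")
    lang = None
    project = None
    i = 0
    # Consume leading prefixes in either order: at most one language code
    # and one project prefix. The last part is always kept as the title.
    while i < len(parts) - 1:
        candidate = parts[i].strip().lower()
        if lang is None and candidate in LANGUAGE_CODES:
            lang = candidate
        elif project is None and candidate in INTERWIKI:
            project = candidate
        else:
            break
        i += 1
    if project is None:
        return None
    title = ":".join(parts[i:]).strip().replace(" ", "_")
    # No language code given: fall back to the content language.
    url = INTERWIKI[project].replace("$lang", lang or content_language)
    return url.replace("$1", quote(title, safe=":/"))
```

With this sketch both prefix orders give the same result, and `resolve_interwiki("w:Foo")` falls back to the content language, which is the behaviour suggested in the discussion.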
<robchurch> The actual existence checking is quite simple; we can have
   something like LinkCache but for interwiki links.
<robchurch> This could have a special batch job like LinkBatch and we could
   introduce a simple API method to do batch existence lookups.
<robchurch> On wiki farms, this cache could be shared across the entire cluster.
<robchurch> In that particular case, updating the cache is rather straightforward,
   since each wiki can maintain its own entries.
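
The cache and batch lookup idea might look roughly like the following sketch. It is not LinkCache or LinkBatch; the class `InterwikiLinkCache` and the injected `fetch_batch` callable (standing in for the proposed batch existence API) are hypothetical names, and the in-memory dict stands in for whatever shared store a farm would use.

```python
class InterwikiLinkCache:
    """Remembers whether (prefix, title) pairs exist on remote wikis."""

    def __init__(self, fetch_batch):
        # fetch_batch(prefix, titles) -> set of titles that exist remotely.
        # Hypothetical stand-in for a batch existence API call.
        self._fetch_batch = fetch_batch
        self._known = {}  # (prefix, title) -> bool

    def exists_batch(self, links):
        """Return {(prefix, title): bool}, with one remote call per prefix
        for the links that are not cached yet."""
        links = list(links)
        missing = {}
        for prefix, title in links:
            if (prefix, title) not in self._known:
                missing.setdefault(prefix, set()).add(title)
        for prefix, titles in missing.items():
            found = self._fetch_batch(prefix, sorted(titles))
            for title in titles:
                self._known[(prefix, title)] = title in found
        return {link: self._known[link] for link in links}

    def exists(self, prefix, title):
        """Single lookup, served from the cache when possible."""
        return self.exists_batch([(prefix, title)])[(prefix, title)]

    def invalidate(self, prefix, title):
        """Drop one entry, e.g. after the remote page is created or deleted."""
        self._known.pop((prefix, title), None)
```

Grouping the uncached titles by prefix keeps a page with many interwiki links down to one lookup per target wiki, which is the point of the batch job.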
<robchurch> For non-farm setups, what we could potentially do is introduce a
   special kind of link table which stores a URL as the "from" value - when doing
   a cache update, do some sort of specific XML callbackesque thing to the wiki
   that requested existence state in the first place.
<robchurch> Later on, if we wanted, we could extend and override bits for a custom
   implementation for Wikimedia using direct access to the databases to make it that bit faster.
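
For the non-farm case, the link table keyed by the requesting wiki's URL could be sketched as below. This is only one reading of the idea: `ExistenceSubscriptions`, `record_lookup`, `page_changed` and the injected `notify` callable (standing in for the callback request sent to the other wiki) are all hypothetical.

```python
from collections import defaultdict


class ExistenceSubscriptions:
    """Tracks which remote wikis asked about which local titles."""

    def __init__(self, notify):
        # notify(from_url, title, exists) is a hypothetical stand-in for
        # the callback request made to the wiki that asked.
        self._notify = notify
        self._subscribers = defaultdict(set)  # title -> set of "from" URLs

    def record_lookup(self, from_url, title, exists):
        """Store the requester's URL as the "from" value for this title,
        then pass the existence answer through unchanged."""
        self._subscribers[title].add(from_url)
        return exists

    def page_changed(self, title, exists):
        """Call on page creation or deletion; tells every wiki that asked
        about this title. Returns the list of URLs that were notified."""
        notified = sorted(self._subscribers.get(title, ()))
        for from_url in notified:
            self._notify(from_url, title, exists)
        return notified
```

A wiki receiving such a callback would update or drop its cached entry for that title, so stale existence state is corrected without polling.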