WMDE contract offers/Store interwiki-links in the database
MediaWiki currently does not store interwiki-links in the database, unlike language-links, external links, image links, etc. Because of this, it is not possible, for example, to query the database for pages that refer to Commons, which makes things much more difficult (and slower) for tools like CommonSense.
It would therefore be nice to store those links in the database, too.
Specification
- Store all interwiki-links in a database table, just like it is done for language links and external links.
- Interwiki-links are links with an interwiki prefix, e.g. [[mw:Foo]], that are shown inline in the text, i.e. that are not language-links.
- This includes links that have a language code as their prefix, if it's prefixed with ":" as in [[:de:Foo]].
- The database table shall be structured like the table langlinks and contain the following fields:
  - iwl_from: ID of the page that contains the link
  - iwl_wiki: prefix code of the wiki the link points to (without leading or trailing ":")
  - iwl_title: the title of the page in the other wiki, that is, the part of the link after the prefix
  - Indexes: (iwl_from, iwl_wiki) and (iwl_wiki, iwl_title)
- The table must be kept synchronized with page content, i.e. it has to be updated whenever the page is re-parsed.
- Create a special page that allows users to list all pages that link to a specific page on a given wiki.
- Create a special page that shows all broken references to pages on other wikis, if possible
  - This will be possible only for wikis whose database can be accessed directly.
  - Configuration settings are to be devised for controlling this.
  - Caveat: there are nearly 800 Wikimedia wikis now. It's probably best if only one target wiki can be queried at a time.
  - Caveat: there are three database clusters for Wikimedia wikis. Doing this across clusters will probably not be possible.
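The table and the "what links to this page on another wiki" lookup described above can be sketched as follows. This is only an illustration under stated assumptions, not the schema MediaWiki would ship: it uses Python's sqlite3 module instead of MediaWiki's database layer, the table name iwlinks and the index names are made up for the example, and only the three field names come from the specification.

```python
import sqlite3

# Assumption: the table name "iwlinks" and the index names are hypothetical.
# The field names and the two indexes follow the specification above.
SCHEMA = """
CREATE TABLE iwlinks (
    iwl_from  INTEGER NOT NULL,  -- ID of the page that contains the link
    iwl_wiki  TEXT    NOT NULL,  -- interwiki prefix, without colons
    iwl_title TEXT    NOT NULL   -- title of the page on the target wiki
);
CREATE INDEX iwl_from_wiki  ON iwlinks (iwl_from, iwl_wiki);
CREATE INDEX iwl_wiki_title ON iwlinks (iwl_wiki, iwl_title);
"""


def create_db():
    """Create an in-memory database holding the sketched table."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def pages_linking_to(db, wiki, title):
    """Return the IDs of local pages that link to `title` on `wiki`."""
    rows = db.execute(
        "SELECT DISTINCT iwl_from FROM iwlinks "
        "WHERE iwl_wiki = ? AND iwl_title = ? ORDER BY iwl_from",
        (wiki, title),
    )
    return [row[0] for row in rows]


db = create_db()
db.executemany(
    "INSERT INTO iwlinks VALUES (?, ?, ?)",
    [
        (1, "commons", "File:Example.jpg"),
        (2, "mw", "Foo"),
        (3, "commons", "File:Example.jpg"),
    ],
)
print(pages_linking_to(db, "commons", "File:Example.jpg"))  # [1, 3]
```

The lookup filters on (iwl_wiki, iwl_title), which is the second index named in the specification; the first index, (iwl_from, iwl_wiki), would serve the per-page update when a page is re-parsed.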
The implementation should be analogous to that of language-links and external links. It will mostly affect the following files:
- ParserOutput.php
- LinksUpdate.php
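Keeping the table synchronized on re-parse amounts to comparing the links found by the parser with the rows already stored for the page. The following is a minimal sketch of that comparison, not the actual code of LinksUpdate.php; the function name diff_links and the (iwl_wiki, iwl_title) tuple representation are assumptions made for the example.

```python
def diff_links(stored, parsed):
    """Return (to_delete, to_insert) so that the stored rows match the parsed links.

    Both arguments are iterables of (iwl_wiki, iwl_title) pairs for one page.
    """
    stored, parsed = set(stored), set(parsed)
    return sorted(stored - parsed), sorted(parsed - stored)


# Rows currently in the table for the page, and links found on re-parse.
stored = [("commons", "Foo"), ("mw", "Help:Links")]
parsed = [("mw", "Help:Links"), ("de", "Bar")]

to_delete, to_insert = diff_links(stored, parsed)
print(to_delete)  # [('commons', 'Foo')]
print(to_insert)  # [('de', 'Bar')]
```

Only the differences are written, so a re-parse that leaves the interwiki-links unchanged touches no rows.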
Implementation Notes
It was noted that the specification does not address multiple prefixes like b:de:Foo for linking to German Wikibooks.
This case is actually resolved in a peculiar manner by MediaWiki: only the first prefix is considered and resolved to an entry in the interwiki map. The rest, including any subsequent prefixes, is inserted into the URL pattern from the interwiki map. That is, [[sw:w:Nairobi]] actually resolves to http://sw.wikipedia.org/wiki/w:Nairobi . The target wiki then notices that the requested page name begins with an interwiki prefix, resolves it using its own interwiki map (to http://en.wikipedia.org/wiki/Nairobi in this case) and triggers a "magic" redirect to the resulting URL.
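The first-prefix behaviour described above can be sketched roughly as follows. This is an illustration, not MediaWiki's parser code: the function name resolve_first_prefix is made up, and the one-entry map is a hypothetical excerpt that assumes the usual "$1" placeholder in interwiki URL patterns.

```python
from urllib.parse import quote

# Hypothetical excerpt of the local interwiki map: prefix -> URL pattern.
INTERWIKI_MAP = {
    "sw": "http://sw.wikipedia.org/wiki/$1",
}


def resolve_first_prefix(link, interwiki_map):
    """Resolve only the first prefix of `link`; return None if it is not an interwiki link."""
    # A leading ":" (as in [[:de:Foo]]) only marks the link as inline.
    prefix, sep, rest = link.lstrip(":").partition(":")
    pattern = interwiki_map.get(prefix.lower())
    if not sep or pattern is None:
        return None
    # Everything after the first prefix, including further prefixes,
    # goes into the URL pattern unchanged.
    return pattern.replace("$1", quote(rest, safe=":/"))


print(resolve_first_prefix("sw:w:Nairobi", INTERWIKI_MAP))
# http://sw.wikipedia.org/wiki/w:Nairobi
```

In terms of the proposed table, the same split would give iwl_wiki = "sw" and iwl_title = "w:Nairobi".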
So MediaWiki only ever looks at the first prefix when resolving links. You can make a chain of 10 prefixes, which will result in 9 redirects, one wiki after the other resolving a prefix to build a URL on another wiki. Only the very first prefix is actually used to build the link (check the URL it generates).
You could try to be smart and resolve all prefixes right away, but that does not work: if we have b:de:Foo on the English Wikipedia, linking to German Wikibooks, you can't resolve it using Wikipedia's interwiki map. If you try, "de" resolves to the German Wikipedia instead of German Wikibooks.
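The difference can be shown with a toy model. Everything here is hypothetical: the wiki identifiers (enwiki, enwikibooks, and so on), the two small maps, and both function names are made up for the example. Note that the hop-by-hop variant needs the interwiki map of every wiki along the chain, which a single wiki does not have at hand.

```python
# Hypothetical per-wiki interwiki maps: prefix -> identifier of the target wiki.
MAPS = {
    "enwiki": {"b": "enwikibooks", "de": "dewiki"},
    "enwikibooks": {"de": "dewikibooks", "w": "enwiki"},
}


def resolve_eagerly(link, local_map):
    """Naive approach: resolve every prefix with the local wiki's map."""
    wiki = None
    parts = link.split(":")
    while len(parts) > 1 and parts[0] in local_map:
        wiki = local_map[parts.pop(0)]
    return wiki, ":".join(parts)


def resolve_by_hops(link, start, maps):
    """What the redirect chain does: each wiki resolves one prefix with its own map."""
    wiki = start
    parts = link.split(":")
    while len(parts) > 1 and parts[0] in maps.get(wiki, {}):
        wiki = maps[wiki][parts.pop(0)]
    return wiki, ":".join(parts)


print(resolve_eagerly("b:de:Foo", MAPS["enwiki"]))  # ('dewiki', 'Foo')
print(resolve_by_hops("b:de:Foo", "enwiki", MAPS))  # ('dewikibooks', 'Foo')
```

The eager variant ends up at the German Wikipedia because "de" is looked up in the English Wikipedia's map rather than in the map of English Wikibooks.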
So looking only at the first prefix is the best we can do, I think. It would be nice to know the actual target of the link, but I see no good way to achieve this.
Apply
See WMDE_contract_offers for how to apply for a contract.