Canonical interwiki prefixes

It is time to establish some canonical interwiki prefixes that will become the standard for those wikis that wish to accept them. As functionality such as that offered by Extension:InterwikiMap for mass adoption of standardized interwiki maps becomes more widely used, it will be beneficial to avoid unnecessary disparities between the interwiki prefixes in the maps used by various sites.

Most of these prefixes should be fairly uncontroversial. But a technical question arises: given that there are so many wikis, how does one pick the best interwiki prefixes? It would be impracticable to contact all 20,000 wiki owners individually, ask for their preferences, and then process the responses manually. Here are some ideas:


 * Use the prefixes from Meta-Wiki's interwiki map for those wikis that are listed there.
 * In general, strike a balance between a prefix that's short enough to be convenient to use and one that's long enough to avoid ambiguity and conflicts with other prefixes. Keep interwiki prefixes to at least four characters.
 * Generally prefer the URL over the official name of the wiki, because it's likely to be shorter but not too short. If the URL is something like examplewiki.examplehost.com, use only the first segment (viz. examplewiki) if it makes sense to do so.
 * Begin migrating toward a system in which wiki owners volunteer their preferred interwiki prefixes. This could be done via directories such as WikiIndex, or the preference could be made a configuration setting retrievable through the API's meta=siteinfo&siprop=general.
 * Sort out collisions manually, taking into account such factors as who owns the trademark to a name; who has the official URL (e.g. examplewiki.com would take precedence over examplewiki.examplehost.com in most cases); which wiki was established first; which wiki asked for the interwiki prefix first; and the size, popularity, wikifactor, and other measures of importance of the wikis.
 * Strip out spaces, colons (:), ampersands (&), and equal signs (=).
 * Avoid conflicts with language codes (e.g. en:).
 * Watch out for interwiki prefixes that might collide with mainspace article titles. For example, a prefix Commandos: would collide with the article Commandos: Behind Enemy Lines. Prefixes that contain the word "wiki" or "pedia", or that are not dictionary words, movie titles, etc., should be okay.
 * ✅ Solution: Check each parsed wiki name against a list of Wikipedia page titles. If the name is in the list and doesn't already contain "wiki", append "wiki" to the end. (Meta-Wiki's interwiki map still overrides this.) Exception: if the Wikipedia article is about the wiki itself, as in the case of Citizendium.
 * Solution: If Wikipedia has a page title that consists of the interwiki prefix followed by a colon and further text, append "wiki" to that prefix. (Meta-Wiki's interwiki map still overrides this.)
 * Current solution: If the name doesn't contain "wiki" or "pedia", append "wiki" to the end of the name and make that the prefix.
 * Getting the $1 URL (the URL template in which $1 stands for the page name) is easy if we have the wiki's RecentChanges URL and the wiki runs an engine that the script can be programmed to recognize from that URL and map to the matching $1 URL. Not all wiki engines are supported yet.
 * Later on, we can get the API URL for MediaWiki installations (and probably some others) through Really Simple Discovery (RSD).
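A minimal Python sketch of the naming rules above, assuming the current solution (append "wiki" unless the name already contains "wiki" or "pedia") and assuming that the Meta-Wiki overrides are available as a plain dictionary keyed by URL. The names url_segment, normalize, derive_prefix, and find_collisions are hypothetical, not part of any existing script. Collisions are only detected here; deciding who gets a contested prefix remains a manual step.

```python
from urllib.parse import urlsplit

# Characters that the rules above say must be stripped from a prefix.
FORBIDDEN = set(" :&=")


def url_segment(url):
    """Return the first host segment, e.g. examplewiki.examplehost.com -> examplewiki."""
    host = urlsplit(url).hostname or ""  # .hostname is already lowercased
    if host.startswith("www."):
        host = host[len("www."):]
    return host.split(".")[0]


def normalize(name):
    """Lowercase, strip forbidden characters, and append "wiki" if needed."""
    name = "".join(ch for ch in name.lower() if ch not in FORBIDDEN)
    # Current solution: append "wiki" unless "wiki" or "pedia" is already present.
    if "wiki" not in name and "pedia" not in name:
        name += "wiki"
    return name


def derive_prefix(url, meta_map):
    """meta_map is a hypothetical {url: prefix} dict built from Meta-Wiki's
    interwiki map; an entry there always overrides the derived prefix."""
    if url in meta_map:
        return meta_map[url]
    return normalize(url_segment(url))


def find_collisions(urls, meta_map):
    """Group URLs by prefix and return only prefixes claimed more than once,
    which still have to be sorted out manually."""
    groups = {}
    for url in urls:
        groups.setdefault(derive_prefix(url, meta_map), []).append(url)
    return {prefix: members for prefix, members in groups.items() if len(members) > 1}


print(derive_prefix("http://examplewiki.examplehost.com/", {}))  # examplewiki
print(normalize("Commandos: Behind Enemy Lines"))  # commandosbehindenemylineswiki
```

Because every prefix produced by normalize either contains "wiki" or "pedia" or has "wiki" appended, the four-character minimum is met automatically; only Meta-Wiki overrides can be shorter.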
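Recognizing an engine from its RecentChanges URL could be table-driven: one pattern per supported engine, with unrecognized URLs left for later. This is a hypothetical sketch; the three patterns (two MediaWiki URL layouts and DokuWiki's ?do=recent view) are assumptions for illustration, not a complete description of any engine's URL scheme, and dollar_one_url is an invented name.

```python
import re

# (pattern, template) pairs. In a match template, \1 is the captured prefix
# and "$1" is literal text, which is exactly the interwiki placeholder.
RULES = [
    # MediaWiki short URLs, e.g. https://example.org/wiki/Special:RecentChanges
    (re.compile(r"^(.*/)Special:RecentChanges$", re.IGNORECASE), r"\1$1"),
    # MediaWiki long URLs, e.g. https://example.org/w/index.php?title=Special:RecentChanges
    (re.compile(r"^(.*\?title=)Special:RecentChanges$", re.IGNORECASE), r"\1$1"),
    # DokuWiki (assumed layout), e.g. https://example.org/doku.php?do=recent
    (re.compile(r"^(.*/doku\.php\?)do=recent$"), r"\1id=$1"),
]


def dollar_one_url(rc_url):
    """Map a RecentChanges URL to a $1 URL, or None if the engine is unknown."""
    for pattern, template in RULES:
        m = pattern.match(rc_url)
        if m:
            return m.expand(template)
    return None  # engine not recognized (yet)


print(dollar_one_url("https://example.org/wiki/Special:RecentChanges"))
```

Adding support for another engine then means adding one (pattern, template) pair rather than changing the lookup logic.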
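For Really Simple Discovery, MediaWiki pages have historically advertised an RSD document through a link element with rel="EditURI" pointing at api.php?action=rsd. Assuming a page still carries that link, the API URL can be recovered from the page's HTML with the standard library alone. Fetching the page is left to the caller, and EditURIFinder and api_url_from_html are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit, urlunsplit


class EditURIFinder(HTMLParser):
    """Record the href of the first <link rel="EditURI"> element."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag == "link" and self.href is None:
            a = dict(attrs)
            if (a.get("rel") or "").lower() == "edituri":
                self.href = a.get("href")


def api_url_from_html(page_url, html):
    """Return the API endpoint advertised via RSD, or None if none is found."""
    finder = EditURIFinder()
    finder.feed(html)
    if not finder.href:
        return None
    # Resolve protocol-relative hrefs such as //example.org/w/api.php?action=rsd
    rsd_url = urljoin(page_url, finder.href)
    parts = urlsplit(rsd_url)
    # Drop the ?action=rsd query to get the bare api.php URL.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```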

What sites to include

 * Any sites on Meta-Wiki's interwiki map will be included, as will all WikiIndex-listed sites whose status is vibrant, active, new, dormant, needing love, spammed, or goal reached.
 * What about useful non-wiki sites like Urban Dictionary? Leave them out for now.

What sites to exclude
 * All WikiIndex-listed sites whose status is private, cannot connect, inactive, or dead.
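The inclusion and exclusion rules reduce to a status lookup. In this minimal sketch the status strings are copied from the lists above and assumed to match WikiIndex's labels once lowercased. It also assumes two precedence rules that are not stated above: being on Meta-Wiki's interwiki map overrides an excluding status, and a missing or unrecognized status excludes a site. should_include is a hypothetical name.

```python
INCLUDE = {"vibrant", "active", "new", "dormant", "needing love", "spammed", "goal reached"}
EXCLUDE = {"private", "cannot connect", "inactive", "dead"}


def should_include(status, on_meta_map):
    """Decide whether a site belongs in the generated interwiki list."""
    # Assumption: Meta-Wiki's interwiki map wins over any WikiIndex status.
    if on_meta_map:
        return True
    s = (status or "").strip().lower()
    if s in EXCLUDE:
        return False
    # Assumption: a missing or unknown status is treated as excluded.
    return s in INCLUDE
```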

Procedure for generating the list
Run these scripts in the order listed:
 * AllCategoryMembersBot.php — Retrieves a list of all members of a category and stores that list in a text file. Use "All" as the category and WikiIndex as the wiki. Alternatively, use AllPagesBot.php to get every page (not just those in Category:All).
 * ExportAllPagesBot.php — Exports pages from a wiki and stores them in XML files. Use the list generated by AllCategoryMembersBot.php (or AllPagesBot.php) as the list of pages to export.
 * ParseMirroredWikiIndexBot.php — Parses pages from the mirrored WikiIndex to populate the parse_mirrored_wikiindex_bot.parsed_mirrored_wikiindex table.
 * Canonical interwiki prefixes/PMWTableToWikiTable.php — This script takes the PMWTable generated by Manual:Chris G's botclasses/ParseMirroredWikiIndexBot.php and converts it to a wiki table. You can then put that in your MediaWiki:Interwiki-whitelist for use by InterwikiMap.
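The category listing in the first step corresponds to the MediaWiki API's list=categorymembers module. The sketch below is not AllCategoryMembersBot.php; it is a hypothetical Python equivalent that assumes the wiki returns the "continue" object (the default in MediaWiki 1.26 and later) and takes an injected fetch function, so no particular HTTP client is assumed. category_members and fetch are invented names.

```python
from urllib.parse import urlencode


def category_members(api_url, category, fetch):
    """Yield page titles in a category, following API continuation.

    fetch: hypothetical callable taking a URL and returning the decoded JSON
    response as a dict (e.g. a thin wrapper around urllib or requests).
    """
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": "Category:" + category,
        "cmlimit": "500",
        "format": "json",
    }
    cont = {}
    while True:
        data = fetch(api_url + "?" + urlencode({**params, **cont}))
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        if "continue" not in data:
            break
        # Merge the returned continuation values into the next request.
        cont = data["continue"]
```

Writing the resulting titles one per line to a text file would give an input list of the kind the export step uses.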
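For the last step, here is a minimal sketch of turning rows from the parsed table into MediaWiki table markup. The two-column layout (prefix and $1 URL) is an assumption; check what MediaWiki:Interwiki-whitelist and InterwikiMap actually expect before relying on it. to_wikitable is a hypothetical name, and the rows are assumed to have already been read from the database.

```python
def to_wikitable(rows):
    """rows: iterable of (prefix, url) pairs, e.g. read from the parsed table."""
    lines = ['{| class="wikitable sortable"', "! Prefix !! URL"]
    for prefix, url in sorted(rows):
        lines.append("|-")
        # Escape pipes so they are not read as cell separators.
        lines.append("| {} || {}".format(prefix.replace("|", "&#124;"),
                                         url.replace("|", "&#124;")))
    lines.append("|}")
    return "\n".join(lines)


print(to_wikitable([("examplewiki", "http://examplewiki.examplehost.com/wiki/$1")]))
```

Sorting by prefix keeps the output stable between runs, which makes diffs of the whitelist page easier to review.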