User:Daniel Kinzler (WMDE)/SitesInfo

This is a quick brain dump of what I believe should replace Interwiki, SitesLookup, and WikiMap.

Prior iterations of this, for reference:

Needs identified during the code experiment:

  • Primary needs:
    • Resolve local aliases (interwiki prefixes) to global IDs. For convenience, most methods should allow the use of local aliases instead of global site IDs. Signature: getGlobalId( $siteId ). If $siteId already is the global ID, it is just returned as-is. If the ID cannot be resolved, this returns null.
    • Check whether an ID or alias is in a given set (e.g. interlanguage or interwiki prefixes). Such sets are subsets of the combined set of IDs and aliases. This can be done by defining several separate alias maps. Signature: getGlobalId( $siteId, $setName ). This is like getGlobalId() without the second parameter, but will only resolve if the ID is in the given set, and return null otherwise. We could also have isValidId( $siteId, $setName ).
    • Look up a "property" for a given site. Relevant properties include the base URL, article path, script path, asset path, database name, database cluster, content language, family (wikipedia, wiktionary, etc.), and so on. There should be some well-known properties, but this should be an open set: things should be able to just add and use new properties. Signature: getProperty( $siteId, $propertyName, $default = null ). $siteId should accept global IDs and local aliases. Should fail if the site isn't known. Should return the default if the site doesn't have that property set.
    • List all sites with a specific property set, or with a specific value for a given property. E.g. list all sites in a given family, list all sites on a given db cluster, etc. Signature: findSites( $property, $value = true ). Returns a list of global site IDs. If $value is not given, list all sites that have the property set to true (convenient for boolean properties, like "multilingual").
  • Secondary needs (for maintenance, etc):
    • List the global IDs of all sites. Signature: listSites().
    • List the local aliases for a given site. Signature: getAliases( $siteId ). Should accept an alias as input.
    • Get all properties for a given site. Signature: getProperties( $siteId, $defaults = [] ). Returns an associative array, with the defaults applied for properties not set for the site.
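
To make the intended semantics of these signatures concrete, here is a minimal sketch in Python (the real thing would presumably be PHP). Several things are assumptions of the sketch and not part of the proposal: the class name SitesInfo, the UnknownSiteError exception, the snake_case method names, the "language" property key, the sample data, and the rule that every top-level key other than "sites" is a named alias map.

```python
class UnknownSiteError(LookupError):
    """Hypothetical error raised when a site ID or alias cannot be resolved."""


class SitesInfo:
    def __init__(self, data):
        self._sites = data.get("sites", {})
        # Assumption: every other top-level key is a named alias map.
        self._alias_maps = {k: v for k, v in data.items() if k != "sites"}

    def get_global_id(self, site_id, set_name=None):
        """Resolve an alias to a global ID, or return None if unresolvable."""
        if set_name is not None:
            # Only resolve if the ID is in the named set.
            return self._alias_maps.get(set_name, {}).get(site_id)
        if site_id in self._sites:
            return site_id  # already a global ID, returned as-is
        for aliases in self._alias_maps.values():
            if site_id in aliases:
                return aliases[site_id]
        return None

    def _require(self, site_id):
        global_id = self.get_global_id(site_id)
        if global_id is None or global_id not in self._sites:
            raise UnknownSiteError(site_id)
        return global_id

    def get_property(self, site_id, name, default=None):
        """Fail for unknown sites; return the default for unset properties."""
        return self._sites[self._require(site_id)].get(name, default)

    def get_properties(self, site_id, defaults=None):
        return {**(defaults or {}), **self._sites[self._require(site_id)]}

    def find_sites(self, prop, value=True):
        return [gid for gid, props in self._sites.items()
                if prop in props and props[prop] == value]

    def list_sites(self):
        return list(self._sites)

    def get_aliases(self, site_id):
        global_id = self._require(site_id)
        return sorted({alias
                       for aliases in self._alias_maps.values()
                       for alias, target in aliases.items()
                       if target == global_id})


# Hypothetical sample data, following the structure proposed on this page.
DATA = {
    "sites": {
        "dewiki": {"family": "wikipedia", "language": "de"},
        "nbwiki": {"family": "wikipedia", "language": "nb"},
        "wikidatawiki": {"family": "wikidata", "multilingual": True},
    },
    "interlanguage-prefixes": {"de": "dewiki"},
    "interwiki-prefixes": {"d": "wikidatawiki", "wd": "wikidatawiki",
                           "wikipedia": "dewiki"},
    "aliases": {"nowiki": "nbwiki"},
}

info = SitesInfo(DATA)
print(info.get_global_id("wd"))                            # wikidatawiki
print(info.get_global_id("wd", "interlanguage-prefixes"))  # None
print(info.find_sites("family", "wikipedia"))              # ['dewiki', 'nbwiki']
```

In this sketch a set is just a named alias map, so checking whether an alias is in a set amounts to checking whether get_global_id() returns a non-null value for that set name.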

The data should be represented as nested arrays in JSON or PHP files. Multiple files can be combined by recursively merging the arrays. Basic structure:

   {
       "sites": {
           "dewiki": {
               "family": "wikipedia",
               ...
           },
           "wikidatawiki": {
               ...
           }
       },
       "interlanguage-prefixes": {
           "de": "dewiki",
           ....
       },
       "interwiki-prefixes": {
           "d": "wikidatawiki",
           "wd": "wikidatawiki",
           "wikipedia": "dewiki",
           ....
       },
       "aliases": {
           "nowiki": "nbwiki",
           ....
       }
   }
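
The recursive merge used to combine multiple files could look roughly like the following Python sketch. The function names merge_recursive and merge_all are made up for illustration. The sketch also assumes that on a conflict between non-array values the later file wins, which is similar in spirit to PHP's array_replace_recursive; the text above does not specify the conflict rule.

```python
from functools import reduce


def merge_recursive(base, override):
    """Return a new dict with `override` merged into `base`, key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are nested maps: merge them instead of replacing.
            merged[key] = merge_recursive(merged[key], value)
        else:
            # Assumption: the later value wins for scalars and mismatched types.
            merged[key] = value
    return merged


def merge_all(structures):
    """Merge a sequence of structures; later entries take precedence."""
    return reduce(merge_recursive, structures, {})


# Hypothetical file contents, already decoded from JSON.
wikis = {"sites": {"dewiki": {"family": "wikipedia"}}}
family = {"interlanguage-prefixes": {"de": "dewiki"}}
override = {"sites": {"dewiki": {"multilingual": False}}}

combined = merge_all([wikis, family, override])
print(combined["sites"]["dewiki"])
# {'family': 'wikipedia', 'multilingual': False}
```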

On the WMF cluster, each wiki would typically load three, possibly four files:

  • wikis.json, a file containing the "sites" structure with the properties for each site, and possibly also some aliases that are the same across families (e.g. "phab" for Phabricator).
  • a "family" file, e.g. wikipedia.json, which would have all the interlanguage-prefixes, e.g. "en" => "enwiki", etc.
  • a "language" file, e.g. en.json, which would have all the sister-prefixes, e.g. "wikibooks" => "enwikibooks", etc.
  • if needed, a per-site file for overrides. This is mostly needed for "special" wikis, e.g. wikidatawiki.json or mediawikiwiki.json.

If available, the equivalent PHP file would be used instead of JSON, for performance reasons.
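
As a rough illustration of the loading scheme described above, the following Python sketch works out which files a wiki would load and in what order. The function name config_files, the exists callback, and the exact file naming are assumptions. The order (general first, per-site overrides last) is also an assumption, chosen so that later files can override earlier ones when merged.

```python
def config_files(family, language, site_id, exists):
    """Return the files to load for one wiki, in load order.

    `exists` is a callable that reports whether a file name is available.
    For each base name, the .php variant is preferred over .json.
    """
    files = []
    for name in ("wikis", family, language, site_id):
        for ext in (".php", ".json"):
            if exists(name + ext):
                files.append(name + ext)
                break  # use only the first available variant
    return files


# Hypothetical set of files present on disk.
available = {"wikis.php", "wikis.json", "wikipedia.json", "en.json"}

print(config_files("wikipedia", "en", "enwiki", available.__contains__))
# ['wikis.php', 'wikipedia.json', 'en.json']
```

A wiki without a per-site override file simply ends up with three files, while a "special" wiki with its own file would get four.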