Extension:MediaWikiFarm/Concepts

This page explains the vocabulary (words in bold) and the big picture of how MediaWiki and MediaWikiFarm manage multiple wikis.

History of the MediaWiki farms
MediaWiki has a long history of managing multiple wikis, since Wikimedia quickly had to manage many projects (Wikipedia, Wikisource, etc.) and many wikis per project (many languages for each project). MediaWiki (called “phase3” at the time) was born during 2002-2003; the SiteConfiguration class, which manages multiple configurations, was created in mid-2004 (according to the Git history). Most of the definitions below are a legacy of the SiteConfiguration class.

For small farms which don’t use the SiteConfiguration class, the classical method to manage wiki farms is a LocalSettings.php with a switch depending on the host, as explained on the page Manual:Wiki family. Two extensions were written to help manage wiki farms: Extension:Farmer, written in 2006, and Extension:SimpleFarm, written in 2011. The MediaWikiFarm extension is designed to stay close to the spirit of SiteConfiguration (expose raw configuration parameters, but add a hierarchical layer) while also adding management of multiple versions and of scripts. It tries to hide itself as much as possible, to reduce the risk of incompatibility compared to a standalone MediaWiki.

Classifications of the wikis
The basic concept is the farm: a set of wikis whose configuration is collectively managed by an operator.
 * farm
 * Examples: Wikimedia wikis are collectively a wiki farm, Wikia wikis are a wiki farm, and the Wikimedia Beta Cluster is another farm (the latter is a pre-production environment for the main Wikimedia farm)

The wikis are represented by a wikiID, a name that identifies an individual wiki. In the original spirit of the farm concept, the wikiID is the database name of the wiki, but in MediaWikiFarm wikiIDs can more generally be thought of as "just identifiers", linked or not to the database name, although some extensions (CentralAuth and Flow) impose constraints.
 * wikiID
 * Examples: Wikimedia wikis have wikiIDs strictly linked to the database name of the wiki: "enwiki" for English-speaking Wikipedia, "huwiki" for Hungarian-speaking Wikipedia, "nvwiki" for Navajo-speaking Wikipedia, "frwiktionary" for French-speaking Wiktionary, etc.

The wikis can be naturally sorted according to their suffix, their canonical family. As the name suggests, all wikiIDs of a given family must end with the same suffix. The exact meaning can be defined farm by farm. If there is no such natural classification, an arbitrary non-empty suffix must be defined.
 * suffix
 * Examples: for Wikimedia wikis the natural families are the projects: "wiktionary" for the Wiktionaries, "wikivoyage" for the Wikivoyage sites, "wiki" for Wikipedia (for historical reasons). In a farm where you have multiple distinct groups of people, for instance clients or communities, you can define a suffix per group of people, and each group can own some wikis.
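
As a rough illustration of the suffix rule, the Python sketch below finds the family of a wikiID by looking for the longest declared suffix it ends with. The function name `family_of` and the list `KNOWN_SUFFIXES` are made up for this example; they are not part of MediaWikiFarm or SiteConfiguration.

```python
# Hypothetical list of suffixes (families) declared for a farm.
KNOWN_SUFFIXES = ["wiki", "wiktionary", "wikivoyage"]


def family_of(wiki_id, suffixes=KNOWN_SUFFIXES):
    """Return the longest declared suffix that wiki_id ends with, or None."""
    matches = [s for s in suffixes if wiki_id.endswith(s)]
    return max(matches, key=len) if matches else None


print(family_of("enwiki"))        # wiki
print(family_of("frwiktionary"))  # wiktionary
print(family_of("example"))       # None
```

Taking the longest match is a design choice of this sketch, so that a longer suffix is preferred if a shorter declared suffix also matches.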

Other classifications can be defined (NB: currently not implemented in MediaWikiFarm): wikis that share a common characteristic can be grouped by tags. The list of wikis belonging to a tag must be defined manually.
 * tag
 * Examples: Wikimedia has tags such as: wikis where the VisualEditor is offered as a Beta feature, read-only wikis, private wikis, small wikis, or the database cluster of the wikis (there are currently 7 clusters), or the deployment group (there are currently 3 groups: test wikis, all wikis except Wikipedia, and Wikipedia).

Configuration of multiple wikis
MediaWiki has many configuration parameters, currently about 730. Although each parameter has a default value, each wiki must change at least 5 to 10 parameters (site name, server, database settings, language), and there are often 20 to 50 parameters with non-default values. In a farm, this can become difficult to manage. To reduce the maintenance workload and complexity, the classifications detailed above can be used to set parameters for groups of wikis.

For instance, take the parameter $wgDefaultSkin, the skin used for anonymous visitors.
 * 1) by default, MediaWiki defines "vector" as the default skin;
 * 2) you may prefer "modern" as the default in your farm;
 * 3) different groups of people, who each have their own suffix, may prefer "nimbus" and "cologneblue" respectively;
 * 4) for the anniversary event of the farm, each group of people creates a dedicated wiki for the event, and a common graphic identity is decided with the "metrolook" skin;
 * 5) in order to get more flexibility for the portal of the anniversary event, the skin "chameleon" (Bootstrap) is used with a custom stylesheet.

Such a complicated scenario can be easily implemented if the classifications have been correctly defined (forgive me if the choices above are weird, they were chosen to have different values):
 * 1) the default value for the farm is "modern";
 * 2) the two suffixes (the two wiki families) get the respective values "nimbus" and "cologneblue";
 * 3) a tag is created for the wikis related to the event, and it gets the value "metrolook";
 * 4) the portal website for the event gets the value "chameleon".
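
The layering in this scenario can be sketched in Python as a lookup that goes from the most specific level (the individual wiki) to the least specific (the farm default). This is only a conceptual model: the suffixes "alpha" and "beta", the tag "anniversary", the wikiIDs and the function `resolve_skin` are all invented for the illustration, and tags are not currently implemented in MediaWikiFarm. MediaWiki's own built-in default ("vector") is not modelled, because the farm default always overrides it here.

```python
# All names below (suffixes, tag, wikiIDs) are invented for this illustration.
SKIN_BY_LEVEL = {
    "default": "modern",  # farm-wide default
    "suffix": {"alpha": "nimbus", "beta": "cologneblue"},
    "tag": {"anniversary": "metrolook"},
    "wiki": {"portalalpha": "chameleon"},
}

# Tag membership is listed by hand, as described for tags.
TAG_MEMBERS = {"anniversary": {"eventalpha", "eventbeta", "portalalpha"}}


def resolve_skin(wiki_id, suffix):
    """Pick the most specific value: wiki, then tag, then suffix, then default."""
    if wiki_id in SKIN_BY_LEVEL["wiki"]:
        return SKIN_BY_LEVEL["wiki"][wiki_id]
    for tag, members in TAG_MEMBERS.items():
        if wiki_id in members and tag in SKIN_BY_LEVEL["tag"]:
            return SKIN_BY_LEVEL["tag"][tag]
    if suffix in SKIN_BY_LEVEL["suffix"]:
        return SKIN_BY_LEVEL["suffix"][suffix]
    return SKIN_BY_LEVEL["default"]


for wiki_id, suffix in [("docsalpha", "alpha"), ("eventbeta", "beta"),
                        ("portalalpha", "alpha"), ("docsgamma", "gamma")]:
    print(wiki_id, resolve_skin(wiki_id, suffix))
```

With this data, an ordinary wiki of the first group gets "nimbus", an event wiki gets "metrolook" whatever its suffix, the portal gets "chameleon", and a wiki with no specific setting falls back to "modern".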

Existence of a wiki
A farm is identified by the URL of its wikis, or more precisely by a regular expression representing the URLs. This is similar to virtual hosts in web servers, but at the MediaWiki level. For instance, for the Wikimedia farm, it could be "[a-z-]{2,12}.wik[a-z]+.org". With MediaWikiFarm, it can be convenient to name each part of the URL; this creates variables which can be reused later: "(?P<lang>[a-z-]{2,12}).(?P<family>wik[a-z]+).org".
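
To see what the named groups capture, here is a small Python sketch (Python's `re` module accepts the same `(?P<name>...)` syntax). It only illustrates the regular expression itself, not how MediaWikiFarm evaluates it. One adjustment is made for the example: the dots are escaped so that they match a literal "." only.

```python
import re

# Pattern from the text, with the dots escaped (an adjustment for this example).
FARM_PATTERN = re.compile(r"(?P<lang>[a-z-]{2,12})\.(?P<family>wik[a-z]+)\.org")


def farm_variables(host):
    """Return the named parts of the host, or None if the host is not in the farm."""
    match = FARM_PATTERN.fullmatch(host)
    return match.groupdict() if match else None


print(farm_variables("zh-min-nan.wikipedia.org"))
# {'lang': 'zh-min-nan', 'family': 'wikipedia'}
print(farm_variables("www.example.com"))
# None
```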

These variables can be used to check the existence of a given wiki. For instance, the list of existing values of "family" can be defined in a file "families.yml". Similarly, the list of existing wikis for a given family can be defined in a file "$family.yml" (NB: the dollar sign denotes the value of a variable).

To check if "zh-min-nan.wikipedia.org" exists:
 * 1) "wikipedia" is searched in the file "families.yml", and if it exists,
 * 2) "zh-min-nan" is searched in the file "wikipedia.yml".
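
The two-step check can be sketched in Python as follows. To keep the example self-contained, the YAML files are replaced by an in-memory dictionary whose contents are invented; the function `wiki_exists` is likewise hypothetical and does not reflect MediaWikiFarm's actual code.

```python
# In-memory stand-ins for "families.yml" and the "$family.yml" files;
# the contents are invented for this illustration.
FILES = {
    "families.yml": ["wikipedia", "wiktionary"],
    "wikipedia.yml": ["en", "hu", "nv", "zh-min-nan"],
    "wiktionary.yml": ["fr"],
}


def wiki_exists(lang, family, files=FILES):
    """Step 1: the family must be listed; step 2: the wiki must be listed for it."""
    if family not in files.get("families.yml", []):
        return False
    return lang in files.get(f"{family}.yml", [])


print(wiki_exists("zh-min-nan", "wikipedia"))  # True
print(wiki_exists("xx", "wikipedia"))          # False
print(wiki_exists("en", "wikinews"))           # False
```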

Multiple farms can be defined with very different configuration architectures, but the initial choice of farm depends on the regular expressions, so these should be mutually exclusive. It is possible to redirect from one farm (regular expression) to another, up to 5 times; this redirect is internal and the visitor will not be aware of it.

MediaWiki versions
MediaWikiFarm has two exclusive modes:
 * 1) mono-version: only one version of MediaWiki is available for the whole farm, and MediaWikiFarm is installed as a classical MediaWiki extension, OR
 * 2) multi-versions: different versions of MediaWiki co-exist in the farm, and MediaWikiFarm is installed alongside the MediaWiki directories containing the different versions.

Here, what is called a version is merely a MediaWiki version + flavour. Strictly speaking, a version is here the name of the directory containing the specific MediaWiki version + flavour. In mono-version mode, there is no version in this sense (since the directory name is not intended for that). For instance, there can be:
 * a version "1.28.0" corresponding to the Git tag 1.28.0,
 * a version "REL1_28" corresponding to the Git branch REL1_28,
 * a version "REL1_28-dev" corresponding to the Git branch REL1_28 but with specific developments or extensions tested before production.

It is possible to easily switch from a mono-version mode to a multi-versions mode (the reverse is also true, but probably less useful).