MediaWiki 1.38/New configuration system


Motivation: Treat configuration as data, not code
The motivation behind creating a new configuration system is to make it easier to manage different sets of configuration for different purposes and environments. This could mean running a wiki farm, running in multiple data centers, managing testing scenarios, or setting up development environments.

Two ideas lie at the foundation of the new design:


 * It should be easy to combine configuration from multiple sources in a predictable way.
 * Configuration should be data, not code.

This means we are moving away from managing configuration settings as global variables. For several years now, the preferred way to get the value of a configuration setting has been via a Config object. Now we also want configuration to be defined and loaded without the need for global variables.

Of course, MediaWiki will stay backwards compatible with the old way of doing things for some time to come. Also, complex setups will always need to be able to determine configuration dynamically, by running code. That will not go away, but it will hopefully become nicer.

Configuring wiki farms
With MediaWiki 1.38, we are introducing an experimental mechanism for multitenancy, which should make it much simpler to build wiki farms. To enable it, just set the  configuration setting to a directory containing one settings file for each site. These files can override any configuration, and should typically provide at least the  and   settings.

With this setting enabled, MediaWiki will determine the name of the file to load from this directory by looking at information provided by the web server: if the WIKI_NAME variable is set in the environment (e.g. via Apache's SetEnv directive), this will be used as the file name. Otherwise, the file name will be the host name as reported by the web server. Any dots in the name are substituted by dashes, and the file extension is determined by the  setting.
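
The lookup rules described above can be sketched in a few lines. This is only an illustration in Python, not MediaWiki's actual code: the function name settings_file_name and the extension parameter (standing in for the setting that controls the file extension) are assumptions made for this example.

```python
def settings_file_name(environ, host, extension):
    """Return the per-site settings file name under the 1.38 rules.

    environ   -- mapping of environment variables seen by the request
    host      -- host name as reported by the web server
    extension -- hypothetical stand-in for the configured file extension
    """
    # WIKI_NAME, if set in the environment, takes precedence over the host name.
    if "WIKI_NAME" in environ:
        name = environ["WIKI_NAME"]
    else:
        name = host
    # Dots in the name are replaced by dashes before the extension is appended.
    return name.replace(".", "-") + "." + extension
```

Under these assumptions, a request for wiki.example.org without WIKI_NAME set would map to wiki-example-org.yaml if the extension were yaml.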

For example, in the simplest case:

Note that this feature is experimental and may change or be removed in the next release. We would be grateful to hear from you whether this seems useful.

Changes in 1.39
Shortly after the release of 1.38, we decided that directly mapping from the requested domain to a file name was error-prone and somewhat dangerous. So we removed the implicit mapping from the domain name, and now require the name of the requested site to be set explicitly via the MW_WIKI_NAME environment variable, for example using Apache's SetEnv directive or similar. Note that the old name used by 1.38, WIKI_NAME, is still supported but triggers a deprecation notice.

So in effect, the value of MW_WIKI_NAME will be combined with   and   to determine the file name of the settings file to load.
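
As a rough Python sketch of the 1.39 behavior described above (not the actual implementation): the function name farm_settings_path, the directory and extension parameters, and the choice to return None when no name is set are all assumptions made for illustration.

```python
import posixpath
import warnings


def farm_settings_path(environ, directory, extension):
    """Return the settings file path under the 1.39 rules, or None."""
    name = environ.get("MW_WIKI_NAME")
    if name is None and "WIKI_NAME" in environ:
        # The 1.38 variable name still works, but is deprecated.
        warnings.warn(
            "WIKI_NAME is deprecated; use MW_WIKI_NAME",
            DeprecationWarning,
            stacklevel=2,
        )
        name = environ["WIKI_NAME"]
    if name is None:
        # No implicit fallback to the host name any more (assumed outcome).
        return None
    # Combine the directory, the site name and the file extension.
    return posixpath.join(directory, name + "." + extension)
```

The important difference from 1.38 is that the host name is never consulted: the site has to be named explicitly.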

For example:

Experimental mode for loading LocalSettings.php
In the spirit of getting away from global variables, we plan to stop loading LocalSettings.php in the top-level file scope, where every variable is automatically a global variable. Instead, we plan to load LocalSettings.php in a separate scope, and provide the $wgXyz variables to it explicitly. Similarly, any variables set in LocalSettings.php will be detected and applied to the configuration.

In most cases, this will be entirely transparent, except for the small performance hit caused by copying around 700 variables. To check whether it works for you, set the  environment variable. If all goes well, loading LocalSettings.php in an isolated scope will become the default in 1.39. Please test this feature to make sure you will not run into difficulties when we switch to it by default. The more feedback we get on it now, the better!

Here are some things to watch out for:
 * When reading from a config variable, don't use the  keyword to access it. Config variables will be available in the file scope of LocalSettings.php, but not in the global scope. They will still be made available as global variables at the end of the initialization sequence, for backwards compatibility with any code that still reads from globals, but that only happens after all config has already been loaded.
 * Similarly, don't use the  keyword to define and write configuration variables. This is especially important if you define any closures or functions in your configuration. Either use the   keyword to import the variable into the closure's scope, or use a Config object to read configuration.
 * This currently works only for configuration variables that start with "wg". Consequently, this doesn't work at all if you need any extensions that use a  other than "wg". We are hoping to fix this before the next release.
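
The mechanism can be pictured with a small Python analogy. PHP's scoping rules differ in detail, and this is not how MediaWiki implements it: settings code runs against a copy of the known variables, and afterwards only variables with the expected prefix are picked up. The function name load_in_isolated_scope is made up for this sketch.

```python
def load_in_isolated_scope(source, config, prefix="wg"):
    """Run settings code in its own scope and collect prefixed variables."""
    # Copy the current settings into a fresh scope, so the settings code
    # never touches the caller's variables directly.
    scope = dict(config)
    exec(source, scope)
    # Only variables starting with the prefix are treated as configuration;
    # everything else (helpers, temporaries, builtins) is ignored.
    return {k: v for k, v in scope.items() if k.startswith(prefix)}


current = {"wgSitename": "Old name", "wgDebug": False}
settings_code = 'wgSitename = "New name"\nexOther = 1\n'
updated = load_in_isolated_scope(settings_code, current)
```

Here updated contains the new wgSitename and the unchanged wgDebug, while exOther is dropped because it does not use the prefix, and the original current mapping is left untouched.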

Experimental support for LocalSettings.yaml
Instead of defining your wiki's configuration by setting variables in LocalSettings.php, you can now load a YAML (or JSON) file using the new YAML settings file format. To do this, set the  environment variable to the location of the configuration file you want to load.

Note that this is an experimental feature. We would love to hear how it works for you, but you should be aware that it is still incomplete, and that some details, such as the exact structure of the settings file, may still change without much warning.

One thing we have not quite figured out yet is how to make use of PHP constants for things like namespaces inside the YAML file. Symfony has a magic syntax for that, but we have not yet decided whether we want to use that. So for now, if you want to refer to, for example, the user namespace in a YAML file, you would have to use the number 2.

In any case, please try this out and report any issues you encounter. The more feedback we get, the better this feature will be once it becomes stable!

A word of caution though: do not put the YAML file in a location that is accessible from the web! In contrast to PHP files, YAML files can typically just be loaded as plain files from any browser. If you put the file in a place that is accessible from the web, like the one that contains LocalSettings.php, anyone who knows where to look will be able to see all secrets contained in that file, which may allow them to compromise your wiki.

Using SettingsBuilder in LocalSettings.php
Configuration loading and merging in MediaWiki is now managed by the  class. The variable  can be used inside   to access the default instance of SettingsBuilder. However, be aware that the interface of SettingsBuilder is still unstable and may change without notice.

Again, please keep in mind that this interface is experimental and subject to change without notice. The SettingsBuilder class offers the following methods for use in LocalSettings.php:
 * to load settings from a YAML or JSON file. Note that configuration loaded into the SettingsBuilder will generally not become available in the file scope of . However, you should not rely on this, as it depends on interactions with global variables in the background, which are subject to change.
 * to update a single config setting. $value will be combined with any pre-existing value according to the merge strategy applicable to this variable.
 * to set a single config setting, overriding any previous values.
 * to get a Config object containing any configuration loaded so far.

Configuration schema
With the ability to load configuration from multiple sources arises the need to have a schema that provides us with more information about each configuration setting than just its default value. In particular, we need to know how to combine array structures when we encounter values for a given setting in multiple places: some should be replaced, some need to be merged as associative maps, while others have to be concatenated as lists.

To cover this need, we are introducing the idea of a config schema. For its structure, we are borrowing from JSON Schema, with a few additions. For each configuration variable, the following things can be defined:


 * : the configuration variable's default value.
 * : identifies the allowed value type or types. Most importantly, this can be used to distinguish between associative arrays (maps) and lists.
 * : If a complex structure needs to be merged in a way that is different from the default (key-by-key for maps, and append for lists), this can be used to specify the desired merge strategy.
 * : indicates that the value represents a set, so the array keys can and should be ignored.

This schema is used while loading configuration files, to provide defaults and to control how configuration from different sources is combined.
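
To make the role of the schema more concrete, here is a minimal Python sketch of schema-driven loading. It is an illustration only: the schema keys "default" and "strategy", the strategy name "replace", and the function names are assumptions for this example and do not claim to match MediaWiki's actual schema keys or merge strategies.

```python
def combine(old, new, strategy="default"):
    """Combine an existing value with a new one from a later source."""
    if strategy == "replace":
        return new
    if isinstance(old, dict) and isinstance(new, dict):
        # Default for maps: merge key by key, later values win.
        return {**old, **new}
    if isinstance(old, list) and isinstance(new, list):
        # Default for lists: append.
        return old + new
    # Scalars and mismatched types: the later value wins.
    return new


def build_config(schema, sources):
    """Start from schema defaults, then apply each source in order."""
    config = {name: spec.get("default") for name, spec in schema.items()}
    for source in sources:
        for name, value in source.items():
            strategy = schema.get(name, {}).get("strategy", "default")
            config[name] = combine(config.get(name), value, strategy)
    return config
```

The point of the sketch is that the loader itself stays generic: whether a value is merged, appended or replaced is decided by the schema entry for that setting, not by the code that reads the files.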

In 1.38, the configuration schema is defined as a JSON Schema in. From this schema we generate  for fast loading on every request, as well as documentation in. is kept in sync using a structure test that ensures that it defines the same default value as the schema file.

In 1.39 however, this will change again. Maintaining the configuration schema as YAML proved inconvenient, so we moved it into a PHP class called MainConfigSchema. From that we then generate  and

Loading Default Settings
The default values derived from the configuration schema are now read from, which contains a generated array that can be read quickly on every request by making use of PHP's opcode cache. The default values are loaded into the SettingsBuilder (see above), though they are still being made available as global variables for now.

This replaces the use of the file that sets a global variable for each of the over 700 configuration settings that MediaWiki supports. A backwards compatibility mode is available, just in case this change causes any issues:  still exists, and MediaWiki can be told to load it as before by setting   as  an environment variable (e.g. using a  directive in  ) or as a PHP constant (e.g. via   in  ).

Upcoming changes
There are some upcoming changes related to configuration that deserve a closer look:

MainConfigSchema
As described above, we have been working to move away from defining configuration defaults as global variables. After we found that maintaining the schema in  was inconvenient, we are now defining the schema for each setting as a constant on the   class. Each schema is an associative array that follows the schema structure described above. For convenience, types can be given using the PHPDoc conventions, and will be converted to JSON types as needed. To avoid confusion with PHP types, the aliases 'list' and 'map' can be used for the JSON types 'array' and 'object', respectively.

While  is convenient for maintaining information about configuration variables, it is not ideal for using that information. For this reason, we created a set of maintenance scripts that will generate specialized files from the schema information in MainConfigSchema:

 * contains the schema information in a form optimized for fast loading. It is intended for internal use by MediaWiki core only.
 * contains a class that defines a constant for each configuration variable, similar to MainConfigSchema. However, the value of the constant is just the name of the config variable. The idea is that these constants can be used with Config::get and with ServiceOptions, to provide a safe way to refer to config settings. They also provide a way to discover the location of the schema and primary documentation.
 * is a JSON Schema file that can be used to validate config files written in JSON or YAML. This is considered documentation and is not used by MediaWiki itself.

The  file continues to exist as a deprecated stub that puts the defaults defined by   into the local scope. It should however no longer be used, and will be removed in an upcoming release.
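
The idea behind the class of name constants can be illustrated with a small Python sketch. The class name ConfigNames and the helper get_setting are hypothetical and only stand in for the generated class and for Config::get; Sitename and Server are used here merely as example setting names.

```python
class ConfigNames:
    """Each constant's value is simply the name of the setting."""

    Sitename = "Sitename"
    Server = "Server"


def get_setting(config, name):
    """Stand-in for a Config lookup: fetch a setting by name."""
    return config[name]


config = {"Sitename": "Example Wiki", "Server": "https://wiki.example.org"}

# Referring to the setting through a constant instead of a string literal.
site = get_setting(config, ConfigNames.Sitename)
```

With this pattern, a misspelled constant such as ConfigNames.Sitenam fails immediately with an AttributeError at the point of use, and editors can jump from the constant to its definition, whereas a misspelled string literal would only surface as a failed lookup at runtime.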

Further ideas and outlook

 * Consolidate extension loading with the new configuration mechanism. There is currently a lot of duplicated code. See T297166.
 * Overhaul SiteConfiguration and WikiMap to mesh with the wiki farm support in SettingsBuilder.
 * Introduce config "presets" or "modes" for testing environments. Presets are settings files that can be activated using an HTTP header, to allow end-to-end tests to run against specific setups. See T267928.
 * Stop exporting configuration settings to global variables. The  variables will still work in LocalSettings.php, but not in application code or extensions. Of course, this will be done carefully, to avoid unnecessary disruption.
 * Replace ConfigRegistry and config prefixes with the more flexible concept of nested configuration nodes.
 * Add support for some parts of the configuration to be loaded from a database table.