Architecture guidelines

This document describes guidelines that all MediaWiki developers should consider when they make changes to the MediaWiki codebase. It is a work in progress, and is the result of developers meeting at Wikimedia-sponsored events and on IRC, as well as online discussions via wikitech-l.

Incremental change
MediaWiki core changes incrementally. Complete rewrites of large sections of code should be avoided ( why? ) as they often are more difficult than they appear and are often abandoned, this is in line with how we have done most refactoring so far ( needs clarification ).

Introduction of new language features
Features introduced in the PHP language which impact architecture should be discussed (see examples below), and consensus reached, before they are widely adopted in the code base.

Rationale: Features being added to PHP aren't necessarily suitable for use in a large scale applications and/or our specific architecture. In the past, features have been introduced with caveats or pitfalls that were relevant to us (for example:, which can be worked around by using a better design). Experienced MediaWiki developers are in a good position to critically review new PHP features.

Examples of PHP features that we aren't widely using yet, but could adopt (or reject):


 * Method chaining (enabled by PHP 5)
 * __get magic method
 * Late static binding (PHP 5.3)
 * Namespaces (PHP 5.3)
 * Traits (PHP 5.4)
 * Generators (PHP 5.5)

Interface changes
An interface is the means by which modules communicate with each other. For example, two modules written in PHP may have an interface consisting of a set of classes, their public method names, the definitions of the parameters, and the definitions of the return values.

For interfaces which are known to be used by extensions, changes to those interfaces should retain backwards compatibility if it is feasible to do so. The rationale is:


 * To reduce the maintenance burden on extensions. Many extensions are unmaintained, so a break in backwards compatibility can cause the useful lifetime of the extension to end.
 * Many extensions are developed privately, outside of WMF Gerrit, so rectification of all extensions in the mediawiki/extensions/* tree does not necessarily address the maintenance burden of a change.
 * Some extension authors have a policy of allowing a single codebase to work with multiple versions of MediaWiki. Such policies may become more common now that we have Long Term Support (LTS) releases.
 * MediaWiki's extension distribution framework contains no core version compatibility metadata. Thus, the normal result of a breaking change to a core interface can lead to a PHP fatal error, which is not especially user-friendly.
 * WMF's deployment system has only rudimentary support for a simultaneous code update in core and extensions.


 * When creating hooks, try to keep the interfaces very narrow. Exposing a '$this' object as a hook parameter is a poor practice which has caused trouble as we moved code from being UI-centric to separating to API modules etc.

Requests for comment (RFC)
An RFC is a request for review of a proposal or idea. RFCs are reviewed by the community of MediaWiki developers. Final decisions on RFC status are made by the WMF architects (Mark Bergsma, Tim Starling, Brion Vibber).

Filing an RFC is strongly recommended before embarking on a major core refactoring project.

Separation of concerns — UI and business logic
It is generally agreed that separation of concerns is essential for a readable, maintainable, testable, high-quality codebase. However, opinions vary widely on the exact degree to which concerns should be separated, and on which lines the application should be split.

MediaWiki began in 2002 with a very brief style where "business logic" and UI code were freely intermixed. This style produced a functional wiki engine with a small outlay of time and only 14,000 lines of code. Despite the MediaWiki core now weighing in at some 235,000 lines, the marks of the original style can still be seen in important areas of the core codebase. This design is clearly untenable as the core for a large and complex project.

Many features have three user interfaces:


 * Server-generated HTML
 * The HTTP API, i.e. api.php. This is used as both a UI in itself (action=help etc.) and as an interface with client-side UIs.
 * Maintenance scripts

Currently, these three user interfaces are supported by means of either:


 * A pure-PHP backend library
 * Having one UI wrap another UI (FauxRequest etc.)
 * Code duplication

Code duplication is generally frowned upon, but the other two approaches both have significant support. The traditional position is that pure-PHP backend libraries should be constructed, and this has been a common approach over the years. The progressive position is that business logic should be moved to the HTTP API, and that the other UIs should wrap the HTTP API.

Advantages of wrapping the HTTP API


 * Certain kinds of batching are naturally supported. Some existing pure-PHP interfaces suffer from a lack of batching, for example, Revision.
 * The HTTP API provides a boundary for splitting across different servers or across daemons running different languages. This provides a migration path away from PHP, if this is desirable.
 * Tight integration of backend and HTTP API functions provides compactness.

Disadvantages of wrapping the HTTP API


 * Loss of generality in the interface, due to the need to serve both internal and external clients. For example, it is not possible to pass PHP objects or closures.
 * The inability to pass objects across the interface has various implications for architecture. For example, in-process caches may have a higher access latency.
 * Depending on implementation, there may be serialisation overhead. This is certainly the case with the idea of replacing internal FauxRequest-style calls with remote calls.
 * The command line interface is inherently unauthenticated, so it is difficult to implement it in terms of calls to functions which implement authentication. Similarly, some extensions may wish to have access to unauthenticated functions, after implementing their own authentication scheme.
 * In general, tight integration of backend and HTTP API functions causes a loss of flexibility. Abstraction and flexibility are fundamentally linked.
 * More verbose calling code.

Separation of concerns — encapsulation versus value objects
It has been proposed that service classes (with complex logic and external dependencies) be separated from value classes (which are lightweight and easily constructed). It is said that this would improve testability. The extent to which this should be done is controversial. The traditional position, most commonly followed in existing MediaWiki code, is that code should be associated with the data it operates on, i.e. encapsulation.

Disadvantages of encapsulation


 * The association of code with a single unit of data tends to limit batching. Thus, performance and the integrity of DB transactions are compromised.
 * For some classes, the number of actions which can be done on/with an object is very large or not enumerable. For example, very many things can be done with a Title, and it is not practical to put them all in the Title class. This leads to an inelegant separation between code which is in the main class and code which isn't.
 * The use of smart but new-operator-constructable classes tends to lead to the use of singletons and global variables for request context. This makes unit testing more awkward and fragile. It also leads to a loss of flexibility, since the relevant context cannot easily be overridden by callers.

Whether or not it is used in new code, it is likely that encapsulation will continue to be a feature of code incrementally developed from the current code base. So we propose the following best practices intended to limit the negative impacts of traditional encapsulation.

Encapsulation best practices


 * Where there is I/O or network access, provide repository classes with interfaces that support batching.
 * A global singleton manager should be introduced, to simplify control of request-lifetime state, especially for the benefit of unit tests. This should replace global, class-static and local-static object variables.
 * Limit the code size of "smart object" classes by splitting out service modules, which are called like $service->action( $data ) instead of $data->action.

You aren't gonna need it
Do not introduce abstraction in advance of need unless the probability that you will need the flexibility thus provided is very high.

This is a widely accepted principle. Even Robert C. Martin, whose "single responsibility principle" tends to lead to especially verbose and well-abstracted code, stated in the book "Agile principles, patterns, and practices in C#":


 * If, on the other hand, the application is not changing in ways that cause the the two responsibilities to change at different times, then there is no need to separate them. Indeed, separating them would smell of Needless Complexity.


 * There is a corollary here. An axis of change is only an axis of change if the changes actually occur. It is not wise to apply the SRP, or any other principle for that matter, if there is no symptom.

An abstraction provides a degree of freedom, but it also increases code size. When a new feature is added which needs an unanticipated degree of freedom, the difficulty of implementing that feature tends to be proportional to the number of layers of abstraction that the feature needs to cross.

Thus, abstraction makes code more flexible in anticipated directions, but less flexible in unanticipated directions. Developers tend to be very poor at guessing what abstractions will be needed in the future. When abstraction is implemented in advance of need, the abstraction is often permanently unused. Thus, the costs are borne, and no benefit is seen.