Architecture guidelines

This document describes guidelines that all MediaWiki developers should consider when they make changes to the MediaWiki codebase. It is a work in progress, and is the result of several discussions at Wikimedia-sponsored events, as well as online discussion on wikitech-l.

Incremental change
The MediaWiki core should be changed incrementally. Folk wisdom has it that complete rewrites of large sections of code are more difficult than they appear, and are often abandoned. This is in line with how we have done most refactoring so far.

Introduction of new language features
Features introduced to the PHP language which have an impact on architecture (see examples below) should be discussed, and consensus reached, before they are widely adopted.

Rationale: acceptance into the PHP core is not a guarantee that a feature is suitable for use in a large extant codebase with many developers. In the past, features have been introduced with caveats or pitfalls that are relevant to us (for example, late static binding, which can often be avoided by using a better design). Experienced MediaWiki developers are in a good position to critically review new PHP features.

Examples of existing PHP features which have an impact on architecture:


 * jQuery-style method chaining (enabled by PHP 5)
 * Type hinting (5.0)
 * __get magic method
 * Late static binding (5.3)
 * Closures (5.3)
 * Namespaces (5.3)
 * Traits (5.4)
 * Generators (5.5)

Interface changes
An interface is the means by which modules communicate with each other. For example, two modules written in PHP may have an interface consisting of a set of classes, their public method names, the definitions of the parameters, and the definitions of the return values.

For interfaces which are known to be used by extensions, changes should retain backwards compatibility where feasible. The rationale is:


 * To reduce the maintenance burden on extensions. Many extensions are unmaintained, so a break in backwards compatibility can cause the useful lifetime of the extension to end.
 * Many extensions are developed privately, outside of WMF Gerrit, so rectification of all extensions in the mediawiki/extensions/* tree does not necessarily address the maintenance burden of a change.
 * Some extension authors have a policy of allowing a single codebase to work with multiple versions of MediaWiki. Such policies may become more common now that we have Long Term Support (LTS) releases.
 * MediaWiki's extension distribution framework contains no core version compatibility metadata. Thus, a breaking change to a core interface typically results in a PHP fatal error, which is not especially user-friendly.
 * WMF's deployment system has only rudimentary support for a simultaneous code update in core and extensions.


 * When creating hooks, try to keep the interfaces very narrow. Exposing a $this object as a hook parameter is poor practice, and has caused trouble as code moved from being UI-centric towards separate API modules and other interfaces.
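The "narrow interface" idea can be sketched with a toy hook runner. This is illustrative only — it is not MediaWiki's real hook mechanism, and the hook and class names are invented:

```php
<?php
// Toy hook runner, standing in for MediaWiki's real Hooks machinery.
class TinyHookRunner {
    private $handlers = [];
    public function register( $name, callable $handler ) {
        $this->handlers[$name][] = $handler;
    }
    public function run( $name, array $args ) {
        foreach ( $this->handlers[$name] ?? [] as $handler ) {
            call_user_func_array( $handler, $args );
        }
    }
}

$hooks = new TinyHookRunner();
// Narrow interface: the handler receives only the data it needs,
// not the whole page/UI object ($this).
$hooks->register( 'PageRenamed', function ( $oldName, $newName ) {
    echo "$oldName -> $newName\n";
} );
$hooks->run( 'PageRenamed', [ 'Foo', 'Bar' ] ); // prints "Foo -> Bar"
```

Because the handler only sees two strings, it keeps working unchanged whether the caller is server-generated HTML, the API, or a maintenance script.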

Requests for comment (RFC)
An RFC is a request for review of a proposal or idea. RFCs are reviewed by the community of MediaWiki developers. Final decisions on RFC status are made by the WMF architects (Mark Bergsma, Asher Feldman, Tim Starling, Brion Vibber).

Filing an RFC is strongly recommended before embarking on a major core refactoring project.

Separation of concerns — UI and business logic
It is generally agreed that separation of concerns is essential for a readable, maintainable, testable, high-quality codebase. However, opinions vary widely on the exact degree to which concerns should be separated, and on which lines the application should be split.

MediaWiki began in 2002 with a very brief style where "business logic" and UI code were freely intermixed. This style produced a functional wiki engine with a small outlay of time and only 14,000 lines of code. Despite the MediaWiki core now weighing in at some 235,000 lines, the marks of the original style can still be seen in important areas of the core codebase. This design is clearly untenable as the core for a large and complex project.

Many features have three user interfaces:


 * Server-generated HTML
 * The HTTP API, i.e. api.php. This is used as both a UI in itself (action=help etc.) and as an interface with client-side UIs.
 * Maintenance scripts

Currently, these three user interfaces are supported by means of either:


 * A pure-PHP backend library
 * Having one UI wrap another UI (FauxRequest etc.)
 * Code duplication

Code duplication is generally frowned upon, but the other two approaches both have significant support. The traditional position is that pure-PHP backend libraries should be constructed, and this has been a common approach over the years. The progressive position is that business logic should be moved to the HTTP API, and that the other UIs should wrap the HTTP API.

Advantages of wrapping the HTTP API


 * Certain kinds of batching are naturally supported. Some existing pure-PHP interfaces suffer from a lack of batching, for example, Revision.
 * The HTTP API provides a boundary for splitting across different servers or across daemons running different languages. This provides a migration path away from PHP, if this is desirable.
 * Tight integration of backend and HTTP API functions provides compactness.

Disadvantages of wrapping the HTTP API


 * Loss of generality in the interface, due to the need to serve both internal and external clients. For example, it is not possible to pass PHP objects or closures.
 * The inability to pass objects across the interface has various implications for architecture. For example, in-process caches may have a higher access latency.
 * Depending on implementation, there may be serialisation overhead. This is certainly the case with the idea of replacing internal FauxRequest-style calls with remote calls.
 * The command line interface is inherently unauthenticated, so it is difficult to implement it in terms of calls to functions which implement authentication. Similarly, some extensions may wish to have access to unauthenticated functions, after implementing their own authentication scheme.
 * In general, tight integration of backend and HTTP API functions causes a loss of flexibility. Abstraction and flexibility are fundamentally linked.
 * More verbose calling code.

Separation of concerns — encapsulation versus value objects
It has been proposed that service classes (with complex logic and external dependencies) be separated from value classes (which are lightweight and easily constructed). It is said that this would improve testability. The extent to which this should be done is controversial. The traditional position, most commonly followed in existing MediaWiki code, is that code should be associated with the data it operates on, i.e. encapsulation.

Disadvantages of encapsulation


 * The association of code with a single unit of data tends to limit batching. Thus, performance and the integrity of DB transactions are compromised.
 * For some classes, the number of actions which can be done on/with an object is very large or not enumerable. For example, very many things can be done with a Title, and it is not practical to put them all in the Title class. This leads to an inelegant separation between code which is in the main class and code which isn't.
 * The use of smart but new-operator-constructable classes tends to lead to the use of singletons and global variables for request context. This makes unit testing more awkward and fragile. It also leads to a loss of flexibility, since the relevant context cannot easily be overridden by callers.

Whether or not it is used in new code, it is likely that encapsulation will continue to be a feature of code incrementally developed from the current code base. So we propose the following best practices intended to limit the negative impacts of traditional encapsulation.

Encapsulation best practices


 * Where there is I/O or network access, provide repository classes with interfaces that support batching.
 * A global singleton manager should be introduced, to simplify control of request-lifetime state, especially for the benefit of unit tests. This should replace global, class-static and local-static object variables.
 * Limit the code size of "smart object" classes by splitting out service modules, which are called like $service->action( $data ) instead of $data->action().
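A sketch of the $service->action( $data ) shape, with hypothetical class names (these are not real core classes):

```php
<?php
// A lightweight value object plus a separate service, instead of one
// "smart object" that holds both the data and all the code operating on it.
class PageRecord {
    public $title;
    public $text;
    public function __construct( $title, $text ) {
        $this->title = $title;
        $this->text = $text;
    }
}

class PageRenderer {
    // Called as $service->action( $data ) rather than $data->action().
    public function render( PageRecord $page ) {
        return "<h1>{$page->title}</h1><p>{$page->text}</p>";
    }
}

$page = new PageRecord( 'Example', 'Hello' );
$renderer = new PageRenderer();
echo $renderer->render( $page ), "\n"; // prints "<h1>Example</h1><p>Hello</p>"
```

The value object stays trivially constructable for tests, while rendering, storage, etc. each live in a separate service with their own dependencies.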

You aren't gonna need it
Do not introduce abstraction in advance of need unless the probability that you will need the flexibility thus provided is very high.

This is a widely accepted principle. Even Robert C. Martin, whose "single responsibility principle" tends to lead to especially verbose and well-abstracted code, stated in the book "Agile principles, patterns, and practices in C#":


 * If, on the other hand, the application is not changing in ways that cause the two responsibilities to change at different times, then there is no need to separate them. Indeed, separating them would smell of Needless Complexity.


 * There is a corollary here. An axis of change is only an axis of change if the changes actually occur. It is not wise to apply the SRP, or any other principle for that matter, if there is no symptom.

An abstraction provides a degree of freedom, but it also increases code size. When a new feature is added which needs an unanticipated degree of freedom, the difficulty of implementing that feature tends to be proportional to the number of layers of abstraction that the feature needs to cross.

Thus, abstraction makes code more flexible in anticipated directions, but less flexible in unanticipated directions. Developers tend to be very poor at guessing what abstractions will be needed in the future. When abstraction is implemented in advance of need, the abstraction is often permanently unused. Thus, the costs are borne, and no benefit is seen.

Type Hinting

 * Type hinting (both in parameter lists and in doc comments) avoids errors
 * PHP provides __call which allows for proxy objects, i.e. objects which duplicate the methods of a class but do not actually inherit from that class. Consider awesome backwards compatibility hacks like WikiPage and StubObject. Type hinting breaks this. So I am hesitant to introduce it to public interfaces. -- Tim Starling (talk)
 * How does type hinting break this? -- Duesentrieb ⇌ 13:49, 26 May 2013 (UTC)

For example:
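The original example is not preserved here; it presumably showed a concrete type hint colliding with a __call-based stub. A minimal reconstruction, with simplified class names, might look like:

```php
<?php
// Why a concrete type hint breaks __call-based proxies such as StubObject
// (class names simplified; not real core code).
class User {
    public function getName() {
        return 'Alice';
    }
}

// A stub that forwards calls to a lazily created User,
// but does NOT actually extend User.
class StubUser {
    private $real;
    public function __call( $name, $args ) {
        $this->real = $this->real ?: new User();
        return call_user_func_array( [ $this->real, $name ], $args );
    }
}

function greet( User $user ) { // concrete type hint
    return 'Hello ' . $user->getName();
}

echo greet( new User() ), "\n"; // prints "Hello Alice"
// greet( new StubUser() );     // fatal TypeError: StubUser is not a User,
//                              // even though it answers the same calls
```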

Breaks if RequestContext::getMain()->getUser() is migrated to return a StubUser instead of a User. -- Tim Starling (talk) 01:10, 30 May 2013 (UTC)
 * Oh right, stubs can't "dynamically" implement interfaces. I want Java's proxy objects :) -- Duesentrieb ⇌ 18:19, 30 May 2013 (UTC)


 * Type hinting also prevents b/c migration hacks like
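The snippet that followed is missing, but a typical hack of this kind accepts both the old and the new parameter type during the transition; a hypothetical sketch:

```php
<?php
// Migrating a parameter from string to an object while keeping b/c.
// A hard "Title" type hint on $page would make old string-passing calls fatal.
class Title {
    private $text;
    public function __construct( $text ) {
        $this->text = $text;
    }
    public static function newFromText( $text ) {
        return new self( $text );
    }
    public function getText() {
        return $this->text;
    }
}

function describePage( $page ) { // deliberately untyped during the migration
    if ( is_string( $page ) ) {  // old-style callers still work
        $page = Title::newFromText( $page );
    }
    return 'Page: ' . $page->getText();
}

echo describePage( 'Foo' ), "\n";                       // legacy call
echo describePage( Title::newFromText( 'Bar' ) ), "\n"; // new-style call
```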

If a core class has child classes in extensions, the type hints cannot change without generating parse errors in the derived classes. So this prevents the type of a parameter from being migrated.

Since we are discussing renaming every class and completely changing all the interfaces, I don't think support for b/c hacks can be easily dismissed. The success of any renaming/refactoring project may depend on them being available. -- Tim Starling (talk)


 * Personally, I don't advocate any big rewrites. I'm mostly thinking of which guidelines should apply to new code.
 * But this is an important point: we should make clear which rules should apply to new code, if and how they can be applied to legacy code, which b/c strategies are to be employed, and which pitfalls are to be considered. -- Duesentrieb ⇌ 13:49, 26 May 2013 (UTC)

We could handle type hinting by using interfaces. The User object is probably not the best example, since it both represents the values of a User and lets you act on those values, just like a controller would. Anyway, here is an example of such an interface:
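Antoine's interface example is not preserved here; a hypothetical sketch of the idea (interface and class names invented for illustration):

```php
<?php
// Type hints target a narrow interface rather than the concrete class,
// so stubs and proxies can implement it too.
interface UserValues {
    public function getId();
    public function getName();
}

class User implements UserValues {
    public function getId() {
        return 1;
    }
    public function getName() {
        return 'Alice';
    }
}

class StubUser implements UserValues {
    public function getId() {
        return 0;
    }
    public function getName() {
        return 'Stub';
    }
}

function greet( UserValues $user ) { // hint the interface, not the class
    return 'Hello ' . $user->getName();
}

echo greet( new User() ), "\n";     // prints "Hello Alice"
echo greet( new StubUser() ), "\n"; // works: the stub implements the interface
```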

-- Antoine "hashar" Musso (talk)
 * This is like the trick I used when I made WikiPage. Lots of "Article" type hints were just swapped out for "Page" interface hints. I'm not convinced the extra trick needed negates the utility of type hints. Aaron (talk) 05:26, 4 June 2013 (UTC)
 * Exactly. And I love how that makes the function self-documenting and nicely throws errors whenever one misuses a function. Antoine "hashar" Musso (talk) 20:10, 5 June 2013 (UTC)

Refactoring best practices

 * It should be easy to identify what 'concern' a class is dealing with (without even reading the code). Naming conventions or phpdoc markers?
 * The historic lack of namespaces in PHP led to the proliferation of "bags o' vaguely related things" classes. Now we have namespaces, but they look a bit alien in the MediaWiki codebase -- can we decide on the role of namespaces in future code? Are they going to be a bit of cleverness that is tolerated, or a design pattern we encourage / enforce? Anonymous functions ditto -- there are a lot of MyExtensionHook classes that are more concisely written as closures. --Ori
 * For abstract classes, have clear documentation on what is expected of children
 * Along the lines of the comment about type hinting above: there is a set of tricks and techniques for performing complex migrations "in-flight" that should be documented, like using a temporary configuration var that selects the old or new implementation.
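Ori's elided snippet likely showed a closure registered directly in $wgHooks; a hypothetical sketch, using a plain array standing in for the real hook registry:

```php
<?php
// Illustrative: registering a hook handler as a closure instead of a
// one-method MyExtensionHooks class. $wgHooks here is just a plain array
// standing in for MediaWiki's hook registry.
$wgHooks = [];

// Old style: a dedicated class with a single static method.
class MyExtensionHooks {
    public static function onPageSaved( $title ) {
        return "saved: $title";
    }
}
$wgHooks['PageSaved'][] = [ 'MyExtensionHooks', 'onPageSaved' ];

// Closure style: shorter, and keeps the logic next to the registration.
$wgHooks['PageSaved'][] = function ( $title ) {
    return "saved: $title";
};

foreach ( $wgHooks['PageSaved'] as $handler ) {
    echo call_user_func( $handler, 'Foo' ), "\n"; // prints "saved: Foo" twice
}
```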

Let's please try to focus on a) guidelines for new code and b) guidelines for careful, local, just-in-time refactoring. Let's avoid planning or even specing out a huge refactoring of the entire codebase. -- Duesentrieb ⇌ 13:51, 26 May 2013 (UTC)

Impact on casual contributors

 * Some rules can be difficult for newbies, like requiring unit tests
 * help them write their tests!
 * don't be afraid to refactor in code review
 * Differentiate test case requirements between (hard)core MediaWiki and, for instance, non-deployed extensions
 * Advise a good IDE/toolset, because that might help
 * Open source, please! --Brad


 * More separation makes code easier to understand and extend; understanding huge classes like User or Title is daunting.
 * On the other hand, mazes of tiny little classes that all depend on each other are difficult to understand too. We need a middle ground. --Brad
 * the win is usually when dependencies are one-way and limited. Avoid mazes but go for little clouds.
 * Tell users to be proactive about their changes when required (don't leave them hanging)
 * We can keep settings for the most popular IDEs in source control, allowing contributors to start quickly.

Use dependency injection for external resources
If you initialize connection objects to external resources somewhere deep in your code, it makes writing unit tests very difficult and expensive. Possible pattern:
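The pattern Ori refers to is not preserved here; a common shape (hypothetical class names) uses an optional constructor parameter that defaults to the production implementation:

```php
<?php
// Dependencies default to the production implementation, but tests can
// inject a mock. Class names are illustrative, not real core code.
class HttpClient {
    public function fetch( $url ) {
        return "live response from $url";
    }
}

class WeatherService {
    private $http;
    // The null default is the part criticized in the bullets below: it keeps
    // callers lazy and leaves the coupling to HttpClient in place.
    public function __construct( ?HttpClient $http = null ) {
        $this->http = $http ?: new HttpClient();
    }
    public function report( $city ) {
        return $this->http->fetch( "api.example/$city" );
    }
}

// In tests, a stub subclass (or a PHPUnit mock) is injected instead:
class FakeHttpClient extends HttpClient {
    public function fetch( $url ) {
        return 'canned response';
    }
}

$svc = new WeatherService( new FakeHttpClient() );
echo $svc->report( 'Oslo' ), "\n"; // prints "canned response"
```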

^ This makes it very easy to write unit tests that use PHPUnit mock objects --Ori


 * Passing in NULL like this is discouraged as it makes the signature less clear
 * The tight coupling is still there
 * Production code will end up being too lazy to properly inject dependencies
 * The assumption is that the dependencies are properly instantiated by the class itself and callers can and should be lazy; the ability to inject them is put in place specifically for unit tests. --Ori
 * In that case there indeed still is tight coupling and no proper dependency injection. So most problems are not solved
 * Yes, you're right: this partially solves the testing problem but misses an opportunity to make the implementation more flexible. --Ori

I'm not really on board with this and may need some more convincing. Daniel showed a code snippet from Wikibase yesterday with half a dozen constructor parameters. I am not sure it is worthwhile making unit tests easier by making everything else several times harder and uglier. I am happier at this point with the idea of a single context object passed to the constructor, even if it does make unit testing slightly more difficult.

The problem with passing many dependencies as formal parameters to constructors is that it is difficult to know in advance how many will be required, especially if you call hooks. Wrapping up all the context into a single object makes it easier to extend.


 * But it makes it hard(er) to substitute individual bits; you always have to provide everything that the object might use from the context.
 * Example: IContextSource has a getSkin method. So if I want to provide the contextual Title and User to something that takes an IContextSource, I also have to somehow provide a Skin. That doesn't make me happy...
 * It would still be an improvement over the current situation though -- Duesentrieb ⇌ 14:05, 26 May 2013 (UTC)

I'm not saying that all constructors should use the same context class, just that I prefer a single context parameter in constructors over a dozen.
 * I agree that having many parameters in the constructor is ugly. A retort I got when complaining about this is that a class needing many different services is an indication of the class doing too much. Split up the class and have fewer responsibilities per class, then you need fewer services in the constructor - or so they say. I can't say yet whether this always holds true, but so far, it seems like a reasonable rule of thumb. -- Duesentrieb ⇌ 14:05, 26 May 2013 (UTC)


 * A constructor that requires a dozen parameters is constructing a class that shouldn't exist. Collapsing the signature by using a Context object buries the problem without fixing it. A constructor's signature really is the appropriate place in which to enumerate a class's external dependencies, I think. --Ori.livneh (talk) 01:28, 2 June 2013 (UTC)


 * Code changes and grows. Functions which did simple things sometimes need to start doing more complex things. And sometimes even the simplest code needs several kinds of request state. I don't think that enumerating a class's dependencies in the constructor is flexible or conducive to growth. And I think that requiring even the smallest snippet of UI code with complex dependencies to be split up into 5 or 10 classes, each with voluminous doc comments and unit tests, would be a drain on staff resources and a barrier to volunteer entry. -- Tim Starling (talk) 06:40, 3 June 2013 (UTC)
 * Constructing service objects is hard because of all the dependencies, especially since these dependencies change and grow, as you say. But this is solved by treating the services as "injectables": they are constructed in exactly one place, and then passed around. If you run into the problem of having to construct a service that asks for 12 parameters, and now need to figure out how to get all these objects, then you should probably not be constructing that object there, but just ask for it in the constructor of your class.
 * I undermined this principle somewhat by suggesting the notion of controller objects for modeling tasks/processes. Controllers may have a lot of dependencies, since they basically are services with smaller scope (and perhaps some state). If a controller ends up being very inconvenient to instantiate, it should either be created using a factory or turned into a request-scope service in order to bring the number of instantiation points down to one. -- Duesentrieb ⇌ 13:18, 3 June 2013 (UTC)
 * Isn't that exactly the opposite of what you complain about above, where having singletons or global factories or the like makes things impossible to test? Also, it seems to me that saying your class should take the already-constructed service object in its constructor instead of all that object's dependencies is just pushing the "how do I locate dependencies?" problem onto all your callers. And then, presumably, they'd push it to their callers. Where does this process end, and how does it end up different from the "context object" concept you argue against? Anomie (talk) 14:59, 3 June 2013 (UTC)
 * The service construction is indeed pushed up the stack as far as possible, ideally all the way into init.php. Well, perhaps that would be a bit too far. But at least into some function that can be considered "top level", a "static entry point" (see further up on this page). This way, all classes further down the stack have all their dependencies cleanly injected, and can easily be tested in isolation. The further "up" we can push this, the more "clean" classes we have. But of course, it has to end somewhere, see my comments about initialization in the section.
 * Services having "request-scope" is different from having global instances / singletons in two important respects: You can't (or at least shouldn't) access such an object at will in random code using some global variable, this is only done in a static entry point (ideally, only one: index.php). "request-scope" merely refers to the lifetime of the object (perhaps I should have just referred to "persistent services" or "long lived objects"). Long lived service objects don't violate the idea of modularity / isolation, where each dependency must (or should ideally) be declared in the constructor. -- Duesentrieb ⇌ 15:51, 3 June 2013 (UTC)
 * So instead of having request globals, you simulate them by having everything everywhere take a laundry list of parameters. This doesn't strike me as much of an improvement. Anomie (talk) 13:57, 5 June 2013 (UTC)
 * Access to globals means everything depends on everything. Having a context object means everything depends on everything. Injecting a "laundry list" of dependencies means you list exactly what you need, when you need it.
 * Requiring all dependencies to be provided to the constructor (which is called in one place, in case of a service instance) means the service depends only on the things it actually needs and uses, and these are well defined and documented.
 * The "laundry list" can be kept short by reducing the responsibilities of the service. To which level this is taken is a matter of taste and practicality, but in general, the number of dependencies is lower when the responsibilities of the service are more narrow.
 * Basically - a service (or any class) needs to collaborate with others to do its job (that is, it depends on other classes). We can hardwire and hide these dependencies using globals, make them dynamic but still hidden (and effectively universal) using context objects, or we can declare them and make them explicit. Which option do you prefer? -- Duesentrieb ⇌ 14:29, 5 June 2013 (UTC)
 * Again, you're pushing problems up the call stack and then ignoring them. If you split one service into several to reduce its dependencies (and is this really a logical split, or is it just work around the huge constructor parameter list?), then callers that need those services have more dependencies. So at some level you're going to wind up with classes that have a constructor needing dozens or hundreds of these services (or you'll punt and "allow" these to use globals). You're also increasing the load on the programmer, as now they have even more service classes to know about.
 * Also, if changes mean your service-with-explicit-dependencies needs to add a dependency, you have to track down all of its callers, and then possibly all of their callers, etc to add it. Including third-party extensions that might be using the service. And, since the constructor is changing, you'll probably have to adjust any subclasses too. Or you might avoid all this by making yet another service. Or, I suppose, you might be able to have the new methods individually take the new dependency as a parameter.
 * Are the benefits worth the drawbacks? I'm not so sure. Anomie (talk) 13:52, 6 June 2013 (UTC)

(reset indent) Anomie, you seem to assume that "callers" need to instantiate services. They don't. Let me give an example:
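The code example itself is missing here, but the following paragraph describes its shape closely enough to reconstruct a sketch (only the names come from the discussion; all method bodies are filler):

```php
<?php
// Sketch of Duesentrieb's service-registry example. Only ServiceRegistry
// knows how to build a DBPageStore; consumers depend on the interface.
interface PageStore {
    public function rename( $old, $new );
}

class DBConnectionManager {}
class RevisionStore {}

class DBPageStore implements PageStore {
    private $db;
    private $revs;
    public function __construct( DBConnectionManager $db, RevisionStore $revs ) {
        $this->db = $db;
        $this->revs = $revs;
    }
    public function rename( $old, $new ) {
        return "renamed $old to $new";
    }
}

class PageMover {
    private $pages;
    // Depends only on the interface, not on DBPageStore's own dependencies.
    // (The real example also took a permission manager and a log store.)
    public function __construct( PageStore $pages ) {
        $this->pages = $pages;
    }
    public function move( $old, $new ) {
        return $this->pages->rename( $old, $new );
    }
}

class ServiceRegistry {
    // The single place that knows how to build a DBPageStore; adding a
    // dependency (a BlobStore, say) changes only this method.
    public function getPageStore() {
        return new DBPageStore( new DBConnectionManager(), new RevisionStore() );
    }
}

$registry = new ServiceRegistry();
$mover = new PageMover( $registry->getPageStore() );
echo $mover->move( 'Foo', 'Bar' ), "\n"; // prints "renamed Foo to Bar"
```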

So my PageStore implementation, DBPageStore, depends on a DBConnectionManager and a RevisionStore. Now I discover that this is not sufficient, I also need a BlobStore and a CompressionHandler. Two new dependencies for my DBPageStore. So, what needs to change? Only one place, namely ServiceRegistry::getPageStore, which needs to pass two more parameters to DBPageStore's constructor. That's it. The dependency of PageMover didn't change, it only depends on the three interfaces it originally depended on (since it checks permissions, changes a page and logs the result, it needs a permission manager, a page store and a log store).

Now, subclasses of DBPageStore would need to change, but if we avoid code reuse via subclassing, the hierarchy is going to be shallow and subclasses few.

If we decide to split a service interface, then yes, a lot of code needs to be updated (just like now). If we keep interfaces small, splitting an implementation along the interfaces it implements wouldn't require any code updates though.

With the design I outlined here, I do not see how callers have to know about so many services, or need frequent updates, or how dependencies would accumulate to huge constructor lists. This doesn't happen.

Of course, real life isn't this simple. But if you have control over the construction of the objects you are dealing with, then you can use this pattern to isolate them. If you don't have control over instantiation (e.g. in an API module or a SpecialPage), then you have a "static entry point" and need to "somehow" get a service manager to bootstrap again. That's the only place where accessing a global would be OK, and that would only be done to get access to the service registry. And the respective object should offer an alternative way of providing the services it needs, for use during testing. -- Duesentrieb ⇌ 19:25, 6 June 2013 (UTC)
 * So now anything that wants to move a page needs to have this "RequestDispatcher"? Which is basically a sort of "context object", exactly as you keep arguing against. I give up, it's impossible to discuss this if you can't even follow your own design. Anomie (talk) 13:14, 7 June 2013 (UTC)
 * So never mind whether I'm consistent in my arguments (I try to be, but perhaps I'm not). Let's just look at the proposed design. What problems do you see with it? what advantages? Do you think it would be a good model when starting a new project from scratch? Do you think it's applicable to MediaWiki?
 * Below are a few clarifications I feel could be helpful. But don't let them distract from the questions above. These are the important ones. The points below are merely meant to clear up unclear issues; maybe they are not helpful at all.
 * The request dispatcher is what handles http requests and dispatches them to the respective handlers, based on the desired action in the request params.
 * Code that wants to move a page as a high-level operation (i.e. including permission checks, logs, etc.) needs a PageMover. Ideally, this is provided to the constructor. It could be constructed directly if we have all the knowledge needed for that, but this is bad for testability (tight coupling). Or it could come from a Builder or Factory, if desired (the latter means more classes, but fewer dependencies for the original caller). But it's cleaner and probably simpler to just ask for it in the constructor.
 * Code that wants to move a page as a low level operation (just rename it) needs a PageStore.
 * The RequestDispatcher is not like a context object, but it uses something like a context object: the service registry. This is a concession to performance requirements: we only want to initialize stuff that we actually need to handle a specific kind of request.
 * Some sort of registry is needed on the top level for bootstrapping. Ideally that's only in index.php, but realistically, it's anywhere you have static methods or no control over instantiation. In MediaWiki, that's a lot of places. Which is exactly the problem.
 * The idea is to minimize the number of places that have or need access to the "kitchen sink", thus increasing the number of nicely isolated classes that can be easily understood, replaced, tested and reused. It will never be 100% (you always need some bootstrapping code), but 99% is doable. For MediaWiki, 50% would be a great improvement.
 * -- Duesentrieb ⇌ 20:47, 7 June 2013 (UTC)

If you don't think your class requires RequestContext::getUser, and don't want to set up a user in a test, why not use an IContextSource which throws an exception when getUser is called on it? That way, if you are proved wrong and a User is actually required, you can add one without adding constructor parameters. -- Tim Starling (talk)

I see your points, but now you have to step through the code to see what it does in fact require. I'll try to refactor a class that uses optional constructor parameters to take a single context object to see how well it works. --Ori

Perhaps a compromise would be to say that individual parameters are preferable, but a context object ("service bundle") is acceptable for convenience. The downsides should be made clear though. -- Duesentrieb ⇌ 14:05, 26 May 2013 (UTC)