Architecture guidelines

Incremental change
The MediaWiki core should be changed incrementally. Folk wisdom has it that complete rewrites of large sections of code are more difficult than they appear, and are often abandoned. This is in line with how we have done most refactoring so far.

Introduction of new language features
Features introduced to the PHP language which impact on architecture (see examples below) should be discussed, and consensus reached, before they are widely implemented.

Rationale: acceptance into the PHP core is not a guarantee that a feature is suitable for use in a large extant codebase with many developers. In the past, features have been introduced with caveats or pitfalls that are relevant to us (for example Late Static Binding which can be worked around by using a better design). Experienced MediaWiki developers are in a good position to critically review new PHP features.

Examples of existing PHP features which impact on architecture:


 * jQuery-style method chaining (enabled by PHP 5)
 * Type hinting (5.0)
 * __get magic method
 * Late static binding (5.3)
 * Closures (5.3)
 * Namespaces (5.3)
 * Traits (5.4)
 * Generators (5.5)

Interface changes
An interface is the means by which modules communicate with each other. For example, two modules written in PHP may have an interface consisting of a set of classes, their public method names, the definitions of the parameters, and the definitions of the return values.

For interfaces which are known to be used by extensions, changes to those interfaces should retain backwards compatibility if it is feasible to do so. The rationale is:


 * To reduce the maintenance burden on extensions. Many extensions are unmaintained, so a break in backwards compatibility can cause the useful lifetime of the extension to end.
 * Many extensions are developed privately, outside of WMF Gerrit, so rectification of all extensions in the mediawiki/extensions/* tree does not necessarily address the maintenance burden of a change.
 * Some extension authors have a policy of allowing a single codebase to work with multiple versions of MediaWiki. Such policies may become more common now that we have Long Term Support (LTS) releases.
 * MediaWiki's extension distribution framework contains no core version compatibility metadata. Thus, the normal result of a breaking change to a core interface can lead to a PHP fatal error, which is not especially user-friendly.
 * WMF's deployment system has only rudimentary support for a simultaneous code update in core and extensions.

Requests for comment (RFC)
An RFC is a request for review of a proposal or idea. RFCs are reviewed by the community of MediaWiki developers. Final decisions on RFC status are made by the WMF architects (Mark Bergsma, Asher Feldman, Tim Starling, Brion Vibber).

Filing an RFC is strongly recommended before embarking on a major core refactoring project.

Separation of concerns
It is generally agreed that separation of concerns is essential for a readable, maintainable, testable, high-quality codebase. However, opinions vary widely on the exact degree to which concerns should be separated, and on which lines application should be split.

MediaWiki began in 2002 with a very brief style where "business logic" and UI code were freely intermixed. This style produced a functional wiki engine with a small outlay of time and only 14,000 lines of code. Despite the MediaWiki core now weighing in at some 235,000 lines, the marks of the original style can still be seen in important areas of the core codebase. This design is clearly untenable as the core for a large and complex project.

Many features have three user interfaces:


 * Traditional server-generated HTML
 * The HTTP API, i.e. api.php. This is used as both a UI in itself (action=help etc.) and as an interface with client-side UIs.
 * Maintenance scripts

Currently, these three user interfaces are supported by means of either:


 * A pure-PHP backend library
 * Wrapping of one of the three UIs (FauxRequest etc.)
 * Code duplication

Code duplication is generally frowned upon, but the other two approaches both have significant support. The traditional position is that pure-PHP backend libraries should be constructed. The progressive position is that business logic should be moved to the HTTP API, and that the other UIs should wrap the HTTP API.

Candidate design patterns for consideration
These are design patterns that have been discussed in the community that need further consideration among MediaWiki architects and experienced developers.

One way to do things
(Borrowed from the Zen of Python / http://www.python.org/dev/peps/pep-0020/ ).

Rationale: code review back log is an issue; things would go quicker if we had canonical patterns for implementating certain things. If we agree on this, we probably need to eliminate some duplication.

Practical consequences: it should be a requirement for merging bold redesigns of existing interfaces that the previous interface be removed or a clear deprecation plan be articulated.


 * I agree we should write MediaWiki 2.0 using python go .NET.Haskell BrainFuck.WhiteSpace. Machine code nodejs
 * Let's bikeshed about whitespace! yay, WhiteSpace!
 * Switch to "more common" schema:  if (blah(blah) && blah) {

Just-in-time abstraction
This is another way of saying "You ain't going to need it"

Writing unit tests for all code immediately requires more abstraction right away. -- Duesentrieb ⇌ 13:42, 26 May 2013 (UTC)


 * As Robert Martin says "An axis of change is only an axis of change if the changes actually occur. It is not wise to apply the SRP, or any other principle for that matter, if there is no symptom." Abstraction gives you flexibility in a particular direction, the challenge (often not met) is to identify the axes of change in advance. If you know that you need unit tests, fine. The point is to avoid abstraction that gives you flexibility in unnecessary directions. -- Tim Starling (talk)


 * But how do you condense this into concrete recommendations? Anticipating the directions in which some code will be extended is usually a matter of experience. -- Ori
 * Isn't that basically what this is saying? Until you have the experience in how this particular thing needs to be abstracted, you're just guessing. Robla told a story about some bus tunnel built in the 80s, where they thought they were eventually going to want to run rail through it as well so they ran track. Then a few years ago when they actually built the rail, they found the tunnel had a different gague... Anomie (talk) 14:39, 25 May 2013 (UTC)


 * "Do not introduce abstraction in advance of need unless the probability that you will need the flexibility thus provided in the short term is very high (>80%)" Obviously human judgement is needed at some point, but I think we are putting a foot in the ground by asking for a high probability of need rather than a conceivable need.-- Tim Starling (talk)


 * Yes, I think this works. ContentHandler is a good example: there was pre-existing evidence that the notion of "content" merited being made abstract -- namely, w/image content and stuff in the mw namespace. -- Ori


 * I agree that premature abstraction is bad. I would argue though that there are some boundaries that should not be crossed, and require decomposition per default (e.g. the value object / service boundary, or the database code / ui code boundary). Any abstraction beyond that should and can be done "just in time"; Note however that the requirement to write (actual) unit tests may require some abstraction right away that would not be needed otherwise. -- Duesentrieb ⇌ 13:46, 26 May 2013 (UTC)
 * OTOH, if your "unit" tests are forcing you to add layers of abstraction that are not otherwise necessary, then perhaps your units are too small. Anomie (talk) 14:06, 28 May 2013 (UTC)

There is a counterpoint, which is that we know from experience that abstraction left to be done "just in time" sometimes isn't done at the time it is actually needed. Instead, the second developer hacks something up on top of whatever the first developer left, in the ugliest possible way. Consider the introduction of ApiEditPage without appropriate refactoring of EditPage, or the introduction of action=render without introducing the non-static methods that would be needed to allow generation of absolute URLs for links, dependent on ParserOptions. Also, I think it's fair to say that some developers just don't have a head for architecture and can't reasonably be expected to rearchitect a module in the course of adding a feature to it. Maybe this implies a more moderate wording for this policy, along the lines of "avoid unnecessary abstraction" rather than "just-in-time abstraction". -- Tim Starling (talk) 14:02, 26 May 2013 (UTC)
 * I'd also hope that code review and RFCs actually being useful could help with the "don't have a head for architecture" problem. Anomie (talk) 14:06, 28 May 2013 (UTC)
 * "Just-in-time" is confusing anyways, since in the context of computing the term is now firmly associated with compilation. --Ori.livneh (talk) 02:32, 2 June 2013 (UTC)

Type Hinting

 * Type hinting (both in parameter lists and in doc comments) avoid errors
 * PHP provides __call which allows for proxy objects, i.e. objects which duplicate the methods of a class but do not actually inherit from that class. Consider awesome backwards compatibility hacks like WikiPage and StubObject. Type hinting breaks this. So I am hesitant to introduce it to public interfaces. -- Tim Starling (talk)
 * how does type hinting break this? -- Duesentrieb ⇌ 13:49, 26 May 2013 (UTC)

For example:

Breaks if RequestContext::getMain->getUser is migrated to return a StubUser instead of a User. -- Tim Starling (talk) 01:10, 30 May 2013 (UTC)
 * Oh right, subs can't "dynamically" implement interfaces. I want Java's proxy objects :) -- Duesentrieb ⇌ 18:19, 30 May 2013 (UTC)


 * Type hinting also prevents b/c migration hacks like

If a core class has child classes in extensions, the type hints cannot change without generating parse errors in the derived classes. So this prevents the type of a parameter from being migrated.

Since we are discussing renaming every class and completely changing all the interfaces, I don't think support for b/c hacks can be easily dismissed. The success of any renaming/refactoring project may depend on them being available. -- Tim Starling (talk)


 * Personally, I don't advocate any big rewrites. I'm mostly thinking of which guidelines should apply to new code.
 * But this is an important point: we should make clear which rules should apply to new code, and if and how they can be applied to legacy code, and which b/c strategies are to be employed, and which pit falls are to be considered. -- Duesentrieb ⇌ 13:49, 26 May 2013 (UTC)

We could do handle type hinting by using interfaces. The User object is probably not the best example since it both represents the values of a User and let you act on the values just like a controller would do. Anyway here is an example of such an interface:

-- Antoine &#34;hashar&#34; Musso (talk)
 * This is like the trick I used when I made WikiPage. Lots of "Article" type hints were just swapped out for "Page" interface hints. I'm not convinced the extra trick needed negates the utility of type hints. Aaron (talk) 05:26, 4 June 2013 (UTC)
 * Exactly. And I love how that makes the function self documenting and nicely throw errors whenever one mis use a function. Antoine &#34;hashar&#34; Musso (talk) 20:10, 5 June 2013 (UTC)

Refactoring best practices

 * It should be easy to identify what 'concern' a class is dealing with (without even reading the code). Naming convention or phpdoc markers ?
 * The historic lack of namespaces in PHP led to the proliferation of "bags o' vaguely related things" classes. Now we have namespaces, but they look a bit alien in the MediaWiki codebase -- can we decide on the role of namespaces in future code? Are they going to be a bit of cleverness that is tolerated, or a design pattern we encourage / enforce? Anonymous functions ditto -- there are a lot of MyExtensionHook classes that are more concisely written as  --Ori
 * For abstract classes, have clear documentation on what is expected of children
 * Along the lines of the comment about type hinting above: there is a set of tricks and techniques for performing complex migrations "in-flight" that should be documented, like using a temporary configuration var that selects the old or new implementation.

Let's please try to focus on a) guidelines for new code and b) guidelines for careful, local, just-in-time refactoring. Let's avoid planning or even specing out a huge refactoring of the entire codebase. -- Duesentrieb ⇌ 13:51, 26 May 2013 (UTC)

Context and object lifetime

 * Registration systems in which class names are used instead of object instances (ie special pages or API modules) prevent control over object lifecycle and thus proper dependency injection
 * On the other hand, using objects means that you have to actually instantiate a possibly large number of classes every time when only a few will actually be used—wastes time and memory. --Brad
 * I agree that it would be bad to instantiate the entire object graph in the setup phase. There should be a limited number of static entry points that just construct the things they need, using the relevant configuration, factories, etc. E.g. an API module or a hook function may construct/request the services and controller it needs just in time.
 * In other cases, lazy instantiation can be implemented using the holder or factory pattern, though I find that a bot awkward/annoying. -- Duesentrieb ⇌ 13:56, 26 May 2013 (UTC)
 * Singletons are problematic for unit testing, and lifetime control generally, but they are an improvement over global variables
 * [for an example we should refactor the Wiki class to actually encapsulate configuration state, so multiple object trees could exist in one process -- makes cross-wiki stuff more feasible]
 * how are they an improvement? For all intents and purposes, they are global variables. -- Duesentrieb ⇌ 13:56, 26 May 2013 (UTC)


 * Singletons are better than globals because:
 * They allow for lazy initialisation. This improves performance. Lazy initialisation can be used to reduce the overhead required for backwards compatibility, by deferring the load of backwards compatible interface code until it is actually required.
 * Singleton requests can easily be flagged with wfDeprecated.
 * A global declaration provides a reference, rather than a copy of an object handle, which has negative performance consequences as described in my recent wikitech-l post.
 * Singletons could in principle be registered in some process lifetime object, to allow them to easily be discarded in bulk at the start of each unit test run. By a singleton, we mean an object with a global scope which is retreived with a function or static method. Infinite lifetime is not inherent to the concept.


 * -- Tim Starling (talk) 01:06, 31 May 2013 (UTC)
 * Such a registry might be nice anyway, so things like ForkController could more reliably destroy anything that could cause trouble before fork. Unit testing would be another use, along with a code that wants to change wiki context (if we decide to do that, e.g. frwiki -> enwiki). Aaron (talk) 05:30, 4 June 2013 (UTC)

Impact on casual contributors

 * Some rules can be difficult for newbies, like requiring unit tests
 * help them write their tests!
 * don't be afraid to refactor in code review
 * Differ between (hard)core mediawiki in test case requirements vs for instance non-deployed extensions
 * Advice a good IDE/toolset, because that might help
 * Open source, please! --Brad


 * More separation makes code easier to understand and extend; understanding huge classes like User or Title is daunting.
 * On the other hand, mazes of tiny little classes that all depend on each other are difficult to understand too. We need a middle ground. --Brad
 * the win is usually when dependencies are one-way and limited. Avoid mazes but go for little clouds.
 * Tell users to be proactive about their changes when required (don't leave them hanging)
 * We can keep settings for most popuplar IDEs in source control, allowing contributors to start quickly.

Use dependency injection for external resources
If you initialize connection objects to external resources somewhere deep in your code, it makes writing unit tests very difficult and expensive. Possible pattern:

^ This makes it very easy to write unit tests that use PHPUnit mock objects --Ori


 * Passing in NULL like this is discouraged as it makes the signature less clear
 * The tight coupling is still there
 * Production code will end up being too lazy to properly inject dependencies
 * The assumption is that the dependencies are properly instantiated by the class itself and callers can and should be lazy; the ability to inject them is put in place specifically for unit tests. --Ori
 * In that case there indeed still is tight coupling and no poper dependency injection. So most problems are not solved
 * Yes, you're right: this partially solves the testing problem but misses an opportunity to make the implementation more flexible. --Ori

I'm not really on board with this and may need some more convincing. Daniel showed a code snippet from Wikibase yesterday with half a dozen constructor parameters. I am not sure it is worthwhile making unit tests easier by making everything else several times harder and uglier. I am happier at this point with the idea of a single context object passed to the constructor, even if it does make unit testing slightly more difficult.

The problem with passing many dependencies as formal parameters to constructors is that it is difficult to know in advance how many will be required, especially if you call hooks. Wrapping up all the context into a single object makes it easier to extend.


 * But it makes it hard(er) to substitute individual bits; you always have to provide everything that the object might use from the context.
 * Example: IContextSource has a getSkin method. So if I want to provide the contextual Title and User to something that takes an IContextSource, I also have to somehow provide a Skin. That doesn't make me happy...
 * It would still be an improvement over the current situation though -- Duesentrieb ⇌ 14:05, 26 May 2013 (UTC)

I'm not saying that all constructors should use the same context class, just that I prefer a single context parameter in constructors over a dozen.
 * I agree that having many parameters in the constructor is ugly. A retort I got when complaining about this is that a class needing many different services is an indication of the class doing too much. Split up the class and have less responsibilities per class, then you need fewer services in the constructor - or so they say. I can't say yet whether this always holds true, but so far, it seems like a reasonable rule of thumb. -- Duesentrieb ⇌ 14:05, 26 May 2013 (UTC)


 * A constructor that requires a dozen parameters is constructing a class that shouldn't exist. Collapsing the signature by using a Context object buries the problem without fixing it. A constructor's signature really is the appropriate place in which to enumerate a class's external dependencies, I think. --Ori.livneh (talk) 01:28, 2 June 2013 (UTC)


 * Code changes and grows. Functions which did simple things sometimes need to start doing more complex things. And sometimes even the simplest code needs several kinds of request state. I don't think that enumerating a class's dependencies in the constructor is flexible or conducive to growth. And I think that requiring even the smallest snippet of UI code with complex dependencies to be split up into 5 or 10 classes, each with voluminous doc comments and unit tests, would be a drain on staff resources and a barrier to volunteer entry. -- Tim Starling (talk) 06:40, 3 June 2013 (UTC)
 * Constructing service objects is hard because of all the dependencies, especially since these dependencies change change and grow, as you say. But this is solved by treating the services as "injectables": they are constructed in exactly one place, and then passed around. If you run into the problem of having to construct a service that asks for 12 parameters, and now need to figure out how to get all these objects, then you should probably not be constructing that object there, but just ask for it in the constructor of your class.
 * I undermined this principle somewhat by suggesting the notion of controller objects for modeling tasks/processes. Controllers may have a lot of dependencies, since they basically are services with smaller scope (and perhaps some state). If a controller ends up being very inconvenient to instantiate, it should either be created using a factory or turned into a request-scope service in order to bring the number of instantiation points down to one. -- Duesentrieb ⇌ 13:18, 3 June 2013 (UTC)
 * Isn't that exactly the opposite of what you complain about above, where having singletons or global factories or the like makes things impossible to test? Also, it seems to me that saying your class should take the already-constructed service object in its constructor instead of all that object's dependencies is just pushing the "how do I locate dependencies?" problem onto all your callers. And then, presumably, they'd push it to their callers. Where does this process end, and how does it end up different from the "context object" concept you argue against? Anomie (talk) 14:59, 3 June 2013 (UTC)
 * The service construction is indeed pushed up the stack as far as possible, ideally all the way into init.php. Well, perhaps that would be a bit too far. But at least into some function that can be considered "top level", a "static entry point" (see further up on this page). This way, all classes further down the stack have all their dependencies cleanly injected, and can easily be tested in isolation. The further "up" we can push this, the more "clean" classes we have. But of course, it has to end somewhere, see my comments about initialization in the section.
 * Services having "request-scope" is different from having global instances / singletons in two important respects: You can't (or at least shouldn't) access such an object at will in random code using some global variable, this is only done in a static entry point (ideally, only one: index.php). "request-scope" merely refers to the lifetime of the object (perhaps I should have just referred to "persistent services" or "long lived objects"). Long lived service objects don't violate the idea of modularity / isolation, where each dependency must (or should ideally) be declared in the constructor. -- Duesentrieb ⇌ 15:51, 3 June 2013 (UTC)
 * So instead of having request globals, you simulate them by having everything everywhere take a laundry list of paramaters. This doesn't strike me as much of an improvement. Anomie (talk) 13:57, 5 June 2013 (UTC)
 * Access to globals means everything depends on everything. Having a context object means everything depends on everything. Injecting a "laundry list" of dependencies means you list exactly what you need, when you need it.
 * Requiring all dependencies to be provided to the constructor (which is called in one place, in case of a service instance) means the service depends only on the things it actually needs and uses, and these are well defined and documented.
 * The "laundry list" can be kept short by reducing the responsibilities of the service. To which level this is taken is a matter of taste and practicality, but in general, the number of dependencies is lower when the responsibilities of the service are more narrow.
 * Basically - a service (or any class) needs to collaborate with others to do its job (that is, it depends on other classes). We can hardwire and hide these dependencies using globals, make them dynamic but still hidden (and effectively universal) using context objects, or we can declare them and make them explicit. Which option do you prefer? -- Duesentrieb ⇌ 14:29, 5 June 2013 (UTC)
 * Again, you're pushing problems up the call stack and then ignoring them. If you split one service into several to reduce its dependencies (and is this really a logical split, or is it just work around the huge constructor parameter list?), then callers that need those services have more dependencies. So at some level you're going to wind up with classes that have a constructor needing dozens or hundreds of these services (or you'll punt and "allow" these to use globals). You're also increasing the load on the programmer, as now they have even more service classes to know about.
 * Also, if changes mean your service-with-explicit-dependencies needs to add a dependency, you have to track down all of its callers, and then possibly all of their callers, etc to add it. Including third-party extensions that might be using the service. And, since the constructor is changing, you'll probably have to adjust any subclasses too. Or you might avoid all this by making yet another service. Or, I suppose, you might be able to have the new methods individually take the new dependency as a parameter.
 * Are the benefits worth the drawbacks? I'm not so sure. Anomie (talk) 13:52, 6 June 2013 (UTC)

(reset indent) Anomie, you seem to assume that "callers" need to instantiate services. They don't. Let me given an example:

So my PageStore implementation, DBPageStore, depends on a DBConnectionManager and a RevisionStore. Now I discover that this is not sufficient, I also need a BlobStore and a CompressionHandler. Two new dependencies for my DBPageStore. So, what needs to change? Only one place, namely ServiceRegistry::getPageStore, which needs to pass two more parameters to DBPageStore's constructor. That's it. The dependency of PageMover didn't change, it only depends on the three interfaces it originally depended on (since it checks permissions, changes a page and logs the result, it needs a permission manager, a page store and a log store).

Now, sublcasses of DBPageStore would need to change, but if we avoid code-reuse via subclassing, the hierarchy is going to be shallow and subclasses few.

If we decide to split a service interface, then yes, a lot of code needs to be updated (just like now). If we keep interfaces small, splitting an implementation along the interfaces it implements wouldn't require any code updates though.

With the design I outlined here, I do not see how callers have to know about so many services, or need frequent updates, or how dependencies would accumulate to huge constructor lists. This doesn't happen.

Of course, real life isn't this simple. But if you have control over the construction of the objects you are dealing with, then you can use this pattern to isolate them. If you don't have control over instantiation (e.g. in an API module or a SpecialPage), then you have a "static entry point" and need to "somehow" get a service manager to boostrap again. That's the only place where accessing a global would be ok, and that would only be done to get access to the service registry. And the respective object should offer an alternative way of providing the servies it needs, for use during testing. -- Duesentrieb ⇌ 19:25, 6 June 2013 (UTC)
 * So now anything that wants to move a page needs to have this "RequestDispatcher"? Which is basically a sort of "context object", exactly as you keep arguing against. I give up, it's impossible to discuss this if you can't even follow your own design. Anomie (talk) 13:14, 7 June 2013 (UTC)
 * So never mind whether I'm consistent in my arguments (I try to be, but perhaps I'm not). Let's just look at the proposed design. What problems do you see with it? what advantages? Do you think it would be a good model when starting a new project from scratch? Do you think it's applicable to MediaWiki?
 * Below are a few clarifications I feel could be helpful. But don't let them distract from the questions above. These are the important ones. The points below are merely meant to clear up unclear issues; maybe they are not helpful at all.
 * The request dispatcher is what handles http requests and dispatches them to the respective handlers, based on the desired action in the request params.
 * Code that wants to move a page as a high level operation (i.e. including permission checks, logs, etc) needs a PageMover. Ideally, this is provided to the constructor. This could be constructed directly if we have all knowledge needed for that, but this is bad for testability (tight coupeling). Or it could come from a Builder or Factory, if desired (the latter means more classes, but less dependencies for the original caller). But it's cleaner and probably simpler to just ask for it in the constructor.
 * Code that wants to move a page as a low level operation (just rename it) needs a PageStore.
 * The RequestDispatcher is not like a context object, but it uses something like a context object: the service registry. This is a concession to performance requirements: we only want to initialize stuff that we actually need to handle a specific kind of request.
 * Some sort of registry is needed on the top level for bootstrapping. Ideally that's only in index.php, but realistically, it's anywhere you have static methods or no control over instantiation. In MediaWiki, that's a lot of places. Which is exactly the problem.
 * The idea is to minimize the number of places that have or need access to the "kitchen sink", thus increasing the number of nicely isolated classes that can be easily understood, replaced, tested and reused. It well never be 100% (you always need some bootstrapping code), but 99% is doable. For MediaWiki, 50% would be a great improvement.
 * -- Duesentrieb ⇌ 20:47, 7 June 2013 (UTC)

If you don't think your class requires RequestContext::getUser, and don't want to set up a user in a test, why not use an IContextSource which throws an exception when getUser is called on it? That way, if you are proved wrong and a User is actually required, you can add one without adding constructor parameters. -- Tim Starling (talk)

I see your points, but now you have to step through the code to see what it does in fact require. I'll try to refactor a class that uses optional constructor parameters to take a single context object to see how well it works. --Ori

Perhaps a compromise would be to say that individual parameters are preferable, but a context object ("service bundle") is acceptable for convenience. The downsides should be made clear though. -- Duesentrieb ⇌ 14:05, 26 May 2013 (UTC)

Things to Read

 * Writing Testable Code on the Google Testing Blog. Concise overview of dos and don'ts.
 * Actually, we can just take that blog post as our architecture guide. It should do. -- Duesentrieb ⇌ 09:26, 31 May 2013 (UTC)