Architecture guidelines

Goals

 * Self-documenting, easy to follow code
 * More maintainable code
 * Separable and generic elements (e.g., reuse just the Http class, or just the Parser, or just the DB classes, without pulling in the entire framework)
 * Easier, stabler third party development in the future
 * True unit tests (via, for example, making it easier to mock interfaces)
 * Warning: we will have to break some old interfaces and kill some deprecated things. This will make future versions have more stable interfaces for implementations.
 * Can we standardize the deprecation process too? i.e, when we branch a new MW version, delete anything tagged as deprecated in $version-2 or something like that? Anomie (talk) 14:31, 25 May 2013 (UTC)
 * Didn't that already get decided on the ML a few months ago? or was that only deciding on hwo they should be labeled?. Peachey88 (talk) 07:43, 26 May 2013 (UTC)
 * Make it easier to parallelize and distribute processing; potentially implement parts in other languages
 * Use the same code paths (controllers) from both UI and API code

Known problems

 * Testability is limited due to lack of separation of concerns
 * High levels of abstraction / complexity / non-conformity are leading to bad security (chunked uploads).
 * This seems to be more an example of a bad abstraction / missing separation of concerns -- Gabriel
 * Lack of high-level documentation (how classes fit together)
 * Readability of the overall codebase (where do I find this object that is being used?)

Design principles
The MediaWiki code base has evolved over many years. While it currently serves us reasonably well, there were many design decisions that have outlived their useful lifetime. We intend to evolve the codebase toward more modern practices over time, using this document as a focal point for choosing which design patterns and best practices we should adopt in a long term document.

Candidate design patterns for consideration
These are design patterns that have been discussed in the community that need further consideration among MediaWiki architects and experienced developers.

One way to do things
(Borrowed from the Zen of Python / http://www.python.org/dev/peps/pep-0020/ ).

Rationale: code review back log is an issue; things would go quicker if we had canonical patterns for implementating certain things. If we agree on this, we probably need to eliminate some duplication.

Practical consequences: it should be a requirement for merging bold redesigns of existing interfaces that the previous interface be removed or a clear deprecation plan be articulated.


 * I agree we should write MediaWiki 2.0 using python go .NET.Haskell BrainFuck.WhiteSpace. Machine code nodejs
 * Let's bikeshed about whitespace! yay, WhiteSpace!
 * Switch to "more common" schema:  if (blah(blah) && blah) {

Clear separation of concerns
This proposal suggests that most objects in MediaWiki should handle one (and only one) of the following concerns
 * Values - Plain old PHP objects (POPO) - data is stored and validated internally, but no outside calls (especially network / DB) are done from this object. The object could be mutable or not. The object could contain other POPO objects, but should not reference services/controllers. Should easily serialize to JSON. Should not know or use services.
 * examples SHOULD be Title, User, etc.
 * Services - narrow interfaces, could in principle be web services with JSON (see value objects); long life time (application, request). Usually only one instance (but don't use singne instance (but don't use singletons). Stateless.
 * examples SHOULD be: BlobStore, LinkStore, RCNotifier, JobQueue, etc.
 * Controllers: models busines logic (typically a "process" or "action", like "rename page" or "block user"); usually local, short lifetime. Stateful. Uses services.
 * examples SHOULD be: Edit (with permission checks, conflict resolution, etc), MovePage, BlockUser, etc.

...with a catch-all for "misc." that doesn't fit into the first three.
 * examples: exceptions, interators, callbacks, and other glue code.

Even with the current (3:07pm) definitions, these terms are somewhat opaque to me. -Ori
 * Me too. And what does this mean for users of these classes: if some function needs a Title object, how does it get the TitleService (or whatever it's called) to be able to create one? Anomie (talk) 14:36, 25 May 2013 (UTC)

Keep aspects/concerns separate. Don't mix these in the same class:
 * Database / storage code
 * UI Code
 * Business Logic
 * HTTP handling
 * ...more?...

An example, though not perfect, is the separation between File objects, Store objects, and MediaHandler classes. This lets us use the same handlers regardless of storage backend, etc.

Where do process caches go? The File/FileRepo split was basically designed around the caches, i.e. File is basically a container for cached information about the file. It's hard to see how a "value object" can have references to rich and useful in-process caching.


 * How does this interact with the bit above about "values" and "services"? Do we end up with many services for each value, one for Database, one for UI, one for Business logic, and so on? Or is a "service" some sort of GenericValueFactory that handles getting all different types of Value objects? Anomie (talk) 14:36, 25 May 2013 (UTC)

Just-in-time abstraction
This is another way of saying "You ain't going to need it"

Writing unit tests for all code immediately requires more abstraction right away.


 * As Robert Martin says "An axis of change is only an axis of change if the changes actually occur. It is not wise to apply the SRP, or any other principle for that matter, if there is no symptom." Abstraction gives you flexibility in a particular direction, the challenge (often not met) is to identify the axes of change in advance. If you know that you need unit tests, fine. The point is to avoid abstraction that gives you flexibility in unnecessary directions. -- Tim Starling (talk)


 * But how do you condense this into concrete recommendations? Anticipating the directions in which some code will be extended is usually a matter of experience. -- Ori
 * Isn't that basically what this is saying? Until you have the experience in how this particular thing needs to be abstracted, you're just guessing. Robla told a story about some bus tunnel built in the 80s, where they thought they were eventually going to want to run rail through it as well so they ran track. Then a few years ago when they actually built the rail, they found the tunnel had a different gague... Anomie (talk) 14:39, 25 May 2013 (UTC)


 * "Do not introduce abstraction in advance of need unless the probability that you will need the flexibility thus provided in the short term is very high (>80%)" Obviously human judgement is needed at some point, but I think we are putting a foot in the ground by asking for a high probability of need rather than a conceivable need.-- Tim Starling (talk)


 * Yes, I think this works. ContentHandler is a good example: there was pre-existing evidence that the notion of "content" merited being made abstract -- namely, w/image content and stuff in the mw namespace. -- Ori

Code structure

 * Type hinting (both in parameter lists and in doc comments) avoid errors
 * PHP provides __call which allows for proxy objects, i.e. objects which duplicate the methods of a class but do not actually inherit from that class. Consider awesome backwards compatibility hacks like WikiPage and StubObject. Type hinting breaks this. So I am hesitant to introduce it to public interfaces. -- Tim Starling (talk)


 * Type hinting also prevents b/c migration hacks like

If a core class has child classes in extensions, the type hints cannot change without generating parse errors in the derived classes. So this prevents the type of a parameter from being migrated.

Since we are discussing renaming every class and completely changing all the interfaces, I don't think support for b/c hacks can be easily dismissed. The success of any renaming/refactoring project may depend on them being available. -- Tim Starling (talk)

Refactoring best practices

 * It should be easy to identify what 'concern' a class is dealing with (without even reading the code). Naming convention or phpdoc markers ?
 * The historic lack of namespaces in PHP led to the proliferation of "bags o' vaguely related things" classes. Now we have namespaces, but they look a bit alien in the MediaWiki codebase -- can we decide on the role of namespaces in future code? Are they going to be a bit of cleverness that is tolerated, or a design pattern we encourage / enforce? Anonymous functions ditto -- there are a lot of MyExtensionHook classes that are more concisely written as  --Ori
 * For abstract classes, have clear documentation on what is expected of children
 * Along the lines of the comment about type hinting above: there is a set of tricks and techniques for performing complex migrations "in-flight" that should be documented, like using a temporary configuration var that selects the old or new implementation.

Proper discussion of architectural changes
RFCs which need to get reviewed

The requirements for writing an RFC should be made more stringent. An RFC should be written when you have a concrete implementation in mind and when you explicitly intend to realize it fully and deploy it if you get the go-ahead. If not a working prototype, you should at least have some pseudo-code to illustrate your intent.

The current set of RFCs on MediaWiki.org contain a number of RFCs that are: --Ori
 * 1) written by someone who is hoping to motivate other people to actually implement something but who has no intent of implementing it herself.
 * 2) are thin on concrete implementation details, preferring instead to express a vague "wouldn't it be nice if..." aspiration.
 * I think you mean "accepting an RFC" instead of "writing an RFC." It also seems fairly misguided to suggest that a requests for comment process only be used in cases where there's a clear execution path. --MZ

Context and object lifetime

 * Registration systems in which class names are used instead of object instances (ie special pages or API modules) prevent control over object lifecycle and thus proper dependency injection
 * On the other hand, using objects means that you have to actually instantiate a possibly large number of classes every time when only a few will actually be used—wastes time and memory. --Brad
 * Singletons are problematic for unit testing, and lifetime control generally, but they are an improvement over global variables
 * [for an example we should refactor the Wiki class to actually encapsulate configuration state, so multiple object trees could exist in one process -- makes cross-wiki stuff more feasible]

Impact on casual contributors

 * some rules can be difficult for newbies, like requiring unit tests
 * help them write their tests!
 * don't be afraid to refactor in code review
 * Differ between (hard)core mediawiki in test case requirements vs for instance non-deployed extensions
 * Advice a good IDE/toolset, because that might help
 * Open source, please! --Brad


 * More separation makes code easier to understand and extend; understaning huge classes like User or Title is dounting.
 * On the other hand, mazes of tiny little classes that all depend on each other are difficult to understand too. We need a middle ground. --Brad
 * the win is usually when dependencies are one-way and limited. Avoid mazes but go for little clouds.
 * Tell users to be proactive about their changes when required (don't leave them hanging)

Use dependency injection for external resources
If you initialize connection objects to external resources somewhere deep in your code, it makes writing unit tests very difficult and expensive. Possible pattern:

^ This makes it very easy to write unit tests that use PHPUnit mock objects --Ori


 * Passing in NULL like this is discouraged as it makes the signature less clear
 * The tight coupling is still there
 * Production code will end up being too lazy to properly inject dependencies
 * The assumption is that the dependencies are properly instantiated by the class itself and callers can and should be lazy; the ability to inject them is put in place specifically for unit tests. --Ori
 * In that case there indeed still is tight coupling and no poper dependency injection. So most problems are not solved
 * Yes, you're right: this partially solves the testing problem but misses an opportunity to make the implementation more flexible. --Ori

I'm not really on board with this and may need some more convincing. Daniel showed a code snippet from Wikibase yesterday with half a dozen constructor parameters. I am not sure it is worthwhile making unit tests easier by making everything else several times harder and uglier. I am happier at this point with the idea of a single context object passed to the constructor, even if it does make unit testing slightly more difficult.

The problem with passing many dependencies as formal parameters to constructors is that it is difficult to know in advance how many will be required, especially if you call hooks. Wrapping up all the context into a single object makes it easier to extend.

I'm not saying that all constructors should use the same context class, just that I prefer a single context parameter in constructors over a dozen.

If you don't think your class requires RequestContext::getUser, and don't want to set up a user in a test, why not use an IContextSource which throws an exception when getUser is called on it? That way, if you are proved wrong and a User is actually required, you can add one without adding constructor parameters. -- Tim Starling (talk)

I see your points, but now you have to step through the code to see what it does in fact require. I'll try to refactor a class that uses optional constructor parameters to take a single context object to see how well it works. --Ori

Templates for common requirements

 * Taking some unit of work that has grown too expensive and converting it to a job
 * How to implement simple locks in memcached
 * Creating a new namespace for a ContentHandler implementation

Declaring unit tests in an extension
--Ori

This requires one to manually maintain a list of the tests, which leads to errors. It also is not inline with the standards of the wider PHP community. Defining the PHPUnit config in a phpunit.xml.dist file and allowing extendsions to register this is perhaps better. --Jeroen De Dauw (talk)

Part of the problem is that there are two different ways tests are found, depending on how PHPUnit is invoked. Invoking it against core uses UnitTestsList, while pointing it at the extension directly just looks for files ending in "Test.php". And this is non-obvious. -- Brad