Requests for comment/Dependency injection

This RFC proposes best practices for establishing the dependency injection pattern in MediaWiki.

Proposal summary
This RFC proposes adopting the following as best practices:
 * 1) When a service (or application logic) needs access to another service, it asks for it in the constructor. This is the actual injection of dependencies.
 * 2) Objects that need access to services can only be constructed via factories (not directly using the new operator).
 * 3) Services are constructed by other services (factories and registries being special types of service). At the top of this chain of registries/factories there is the application-scope service locator which acts as the top-level service registry.
 * 4) Access to global default instances ("singletons") should be restricted to static entry points (e.g. hook handlers and callbacks in bootstrap code). Ideally, there is only one such global default instance, namely the service locator.
 * 5) Injecting/passing around factories, registries, and especially "kitchen sinks" like RequestContext should be avoided. The service locator should never be passed as a parameter.
 * 6) Mutable global state should especially be avoided.
 * 7) Services should be represented by narrow interfaces (e.g. UserLookup).
 * 8) Registries use constructor callbacks (aka factory functions) to instantiate services.
 * 9) Bootstrap code should avoid instantiating services, but define constructor callbacks instead.

These principles effectively promote dependency injection, without binding to a particular DI framework. In fact, they allow DI to be implemented entirely "on foot", without any configurable DI container: instantiation and decisions regarding the life cycle of service objects (lazy initialization, etc.) can be left to plain old PHP code.
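As a minimal sketch of constructor injection done "on foot", consider the following. All names here (Clock, SystemClock, FixedClock, TokenValidator) are hypothetical and exist only for illustration; none of them is an existing MediaWiki class.

```php
<?php
// Hypothetical narrow service interface: it exposes one capability only.
interface Clock {
    public function now(): int;
}

// Production implementation, backed by the system time.
class SystemClock implements Clock {
    public function now(): int {
        return time();
    }
}

// Test double that always reports the same timestamp.
class FixedClock implements Clock {
    private int $timestamp;

    public function __construct( int $timestamp ) {
        $this->timestamp = $timestamp;
    }

    public function now(): int {
        return $this->timestamp;
    }
}

// Application logic asks for the service it needs in its constructor.
class TokenValidator {
    private Clock $clock;
    private int $lifetime;

    public function __construct( Clock $clock, int $lifetime ) {
        $this->clock = $clock;
        $this->lifetime = $lifetime;
    }

    public function isExpired( int $issuedAt ): bool {
        return $this->clock->now() - $issuedAt > $this->lifetime;
    }
}

// Wiring in plain PHP, with no container involved.
$validator = new TokenValidator( new SystemClock(), 3600 );
```

Because TokenValidator only knows the Clock interface, a test can pass in a FixedClock and check the expiry logic without touching global state.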

Rationale
Dependency injection (DI) is a design pattern that can facilitate unit testing, loose coupling and architecture description. Although it's more useful in some languages than in others, it is a well-established pattern, and there is a solid ecosystem of DI libraries for PHP.

MediaWiki doesn't have a dedicated DI mechanism, though adding one has been discussed.

Adding simple DI support to core would be a first step towards consistent, concise use of this pattern. Since we'll probably need at least a few iterations and use cases to get it right, this first step could be a kind of "internal API beta feature".

See also: Requests for comment/Services and narrow interfaces

Previous discussions
Using DI in MediaWiki has been considered before. Here are some earlier conversations about it:
 * Previous incarnation of this RFC
 * Discussion of changes to Architecture guidelines at Wikimania 2013
 * Section on DI for external resources in Talk:Architecture guidelines
 * Discussion of TitleValue at the Architecture Summit 2014
 * The TitleValue RFC and the ServiceRegistry section on that RFC's Talk page

Hard Coded Service Locator vs. Configurable DI Container
The example implementation of the application-level service locator given below hard-codes the knowledge about the services that are available in the MediaWiki application, as well as the details of constructing service instances. Extensions that need to define their own services would have to create their own top-level service locator, which would depend on the application-level service locator (and possibly also on the service locators of other extensions).

The alternative would be to use a DI container that can instantiate service instances based on configuration that is defined at initialization time, not hard coded. Extensions would not need to define their own service locator, but would define additional services in the central container. A pre-existing container implementation like Pimple or PHP-DI could be used.

Note that the RFC as such only calls for such a central locator to exist. Both options would fit the RFC as proposed, and both would provide a migration path towards proper DI, if used correctly. Here are some code examples for defining a service using different techniques or frameworks:

Hard coded:

Pimple:

Symfony:

PHP DI:

A decision for or against the use of a configurable container is essential, since it blocks all other steps towards proper DI in MediaWiki. Even though the RFC does not specify how the service locator should be implemented, the author of this RFC prefers a hard coded locator over a configurable container, as indicated by the example code. Below are some arguments for using a hard coded locator:

Performance. When using a configurable DI container, the container needs to be instantiated before initialization, and methods are called on the container and related objects. Structural elements like arrays, closures, or wrapper objects are instantiated. When auto-wiring is used, complex introspection needs to be performed. This happens during initialization, on every request. In contrast, the hard coded service locator adds no such overhead: it is a plain PHP class that is loaded and instantiated.

Idiomatic Code (and tooling). With the hard coded service locator, no knowledge of the DI framework or extra documentation is needed to understand which services exist, and how they are defined. The code can easily be modified by anyone with basic knowledge of PHP.

Explicit Guarantees (static analysis). The hard coded service locator explicitly states which services are available, and guarantees their availability and type using the PHP language itself, not conventions on top of PHP. Dependencies are accessible using standard static analysis tools, without the need for knowledge about the DI framework. No plugins or special configuration are needed for full IDE support.

Explicit Dependencies for Extensions. With extensions providing their own hard coded service locator, which may depend on core's or another extension's service locator, dependencies between extensions (and dependencies of extensions on core services) are explicitly modeled in terms of the PHP language, and available to static analysis. It is always directly obvious which extension needs which service, how it gets access to that service, and who defines this service, and how. When using a DI container, this would only be possible using conventions or documentation.

Unneeded Abstraction. DI containers provide a domain specific language for binding implementations to service names, and for defining the wiring of services, that is, their dependencies and configuration. This layer of abstraction introduces computational overhead as well as cognitive overhead (need to understand the DI framework). The additional abstraction provides runtime extensibility (see below), which may be useful, but isn't compelling for our use case. It does not, in the author's opinion, improve readability or maintainability of the initialization code.

Arguments for a configurable container:

Extensibility at runtime. A DI container can be initialized based on configuration files, and extensions are free not only to add services, but also to replace or re-configure existing services. This removes the need to define a service locator for each extension that wants to use DI, but it also removes the advantage of explicitly modelling the dependencies of extensions on core services, or between extensions. Replacing or re-configuring services defined by core could be achieved for the hard coded service locator by introducing hook points (or callbacks).

Separation of Concerns. A DI container encapsulates the logic for instantiating services and managing service singletons. The hard coded locator mixes that with the knowledge about how to construct concrete service instances, causing the service locator to depend on essentially everything, and preventing isolated testing of the management logic. On the other hand, when using a DI container, the knowledge about instantiating concrete service instances has to be somewhere. There will still be a place that has all the dependencies on all the services; it's just pushed out of the container class itself.

In conclusion, it seems to the author that configurable DI containers do not really offer any advantage to core, and have several disadvantages (though none of them is absolutely prohibitive). However, needing a separate service locator for every extension may prove problematic, since it may deter extension authors from using DI at all.

As a compromise, the core service locator could have a public getExtensionService( 'Foo' ) method (in addition to the specialized getFoo methods) for use by extensions. A callback for instantiating Foo would be registered in a global variable (which is passed to the service locator's constructor). This would implement a very basic DI container (similar to Pimple) for use by extensions, while using the hard coded locator for core. Extensions that want more type safety and control over instantiation could still provide their own service locator that depends on core's service locator.

We could even go one step further and wrap one of the popular DI container implementations in our own ServiceLocator. That would provide type-safe convenience functions for accessing well known services, provide framework isolation (letting us swap out the container implementation), and still give us the full power (but also the overhead) of a configurable DI container.
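The compromise could look roughly like the sketch below. Only getExtensionService comes from the text above; ExampleServiceLocator, SiteInfo, FooService, getSiteInfo and the $exampleExtensionServices variable are hypothetical stand-ins. Passing the locator to the callback is also an assumption of this sketch, made so that the callback does not have to reach for a global instance.

```php
<?php
// Hypothetical core service, used only for illustration.
class SiteInfo {
    private string $name;

    public function __construct( string $name ) {
        $this->name = $name;
    }

    public function getName(): string {
        return $this->name;
    }
}

// Hypothetical extension service that depends on a core service.
class FooService {
    private SiteInfo $siteInfo;

    public function __construct( SiteInfo $siteInfo ) {
        $this->siteInfo = $siteInfo;
    }

    public function describe(): string {
        return 'Foo on ' . $this->siteInfo->getName();
    }
}

class ExampleServiceLocator {
    /** @var callable[] Constructor callbacks, keyed by service name */
    private array $extensionCallbacks;
    /** @var object[] Extension services instantiated so far */
    private array $extensionServices = [];
    private ?SiteInfo $siteInfo = null;

    public function __construct( array $extensionCallbacks ) {
        $this->extensionCallbacks = $extensionCallbacks;
    }

    // Hard coded, type-safe getter for a core service.
    public function getSiteInfo(): SiteInfo {
        if ( $this->siteInfo === null ) {
            $this->siteInfo = new SiteInfo( 'Example Wiki' );
        }
        return $this->siteInfo;
    }

    // Generic getter for extension services, instantiated on first use.
    public function getExtensionService( string $name ): object {
        if ( !isset( $this->extensionServices[$name] ) ) {
            if ( !isset( $this->extensionCallbacks[$name] ) ) {
                throw new InvalidArgumentException( "Unknown extension service: $name" );
            }
            $this->extensionServices[$name] = ( $this->extensionCallbacks[$name] )( $this );
        }
        return $this->extensionServices[$name];
    }
}

// Stand-in for the global variable in which extensions register callbacks.
$exampleExtensionServices = [
    'Foo' => function ( ExampleServiceLocator $services ) {
        return new FooService( $services->getSiteInfo() );
    },
];

$locator = new ExampleServiceLocator( $exampleExtensionServices );
```

Note the trade-off this makes visible: getSiteInfo has a declared return type that static analysis can check, while getExtensionService can only promise an object.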

Service Locator vs. RequestContext
In the past, RequestContext was sometimes proposed as a vehicle to make services available to application logic. This is an example of the kitchen sink anti-pattern: An object with many dependencies is passed to many classes, causing all code to (indirectly) depend on everything. The distinction may seem cosmetic:

$context->getFoo();

versus

MediaWikiServices::getInstance()->getFoo();

And the former actually looks better at first glance: after all, it does not use any global state, and the RequestContext is properly injected (though its current implementation heavily relies on global state internally). The important distinction is when and where this would be called: MediaWikiServices::getInstance() is supposed to be called only by static code, never in application logic. Code that uses MediaWikiServices::getInstance() cannot, by definition, be unit tested, and should thus be minimized.

By contrast, RequestContext is injected to provide information about the current request, that is, the requested page, the logged in user, requested output language, etc. Having a (value) object to represent the request in such a way is quite useful. But that value object should not depend on any services, to avoid circular (or rather, knotted) dependencies of everything on everything.

Static entry points
A static entry point is code in a static context (directly in global scope, or in a global function, or in a static method) that gets called by a framework (e.g. the PHP runtime, or MediaWiki's Hooks mechanism). In MediaWiki, typical static entry points are:


 * 1) Global scope code in web entry points like index.php, load.php, thumb.php, etc.
 * 2) Global scope code in maintenance scripts.
 * 3) Extension bootstrap code
 * 4) Hook handler functions
 * 5) Constructor callbacks

Service locator
The application-scope service locator is the top-level registry for the services that the application's logic needs to operate. Extensions can define their own service locators (which may depend on MediaWiki's service locator).

Access to the service locator should be restricted to static entry points. This way, it acts as a DI container. A simple implementation of such a DI container is described in http://fabien.potencier.org/do-you-need-a-dependency-injection-container.html

See also en:Dependency_injection for a discussion of service locator vs. DI container logic.

Bootstrap code
Bootstrap code refers to code that is executed at the beginning of every request. Bootstrap code creates the initial scaffolding for initializing the application by loading configuration and instantiating the most basic services. In MediaWiki, bootstrap code is typically:
 * 1) global scope code in a web entry point (or maintenance script).
 * 2) extension entry points

Code inside hook handler functions or constructor callbacks is not bootstrap code, since it is not executed during the initialization process.

Factory
A factory is a service that instantiates objects. These objects can be services, or data objects. Factory methods that are guaranteed to create a new instance should have names starting with "new". Other factory methods should have names starting with "get", and may or may not return singletons.

Factories are used to inject the ability to instantiate certain kinds of objects. They can be understood as partial applications of constructors. Factory methods typically, but not necessarily, take parameters.

A "factory" in the more narrow sense would typically have only one factory method, and create only one kind of object.
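The naming convention and the "partial application" view can be sketched as follows. TitleFormatter, PageSummary and PageSummaryFactory are hypothetical names chosen for this sketch, not existing MediaWiki classes.

```php
<?php
// Hypothetical service that the created objects depend on.
class TitleFormatter {
    public function format( string $text ): string {
        return ucfirst( trim( $text ) );
    }
}

// Hypothetical object that needs both a service and per-instance data.
class PageSummary {
    private TitleFormatter $formatter;
    private string $rawTitle;

    public function __construct( TitleFormatter $formatter, string $rawTitle ) {
        $this->formatter = $formatter;
        $this->rawTitle = $rawTitle;
    }

    public function getHeading(): string {
        return $this->formatter->format( $this->rawTitle );
    }
}

// The factory holds the service and leaves only the data parameter open,
// which is what makes it a partial application of the constructor.
class PageSummaryFactory {
    private TitleFormatter $formatter;
    private ?PageSummary $mainPageSummary = null;

    public function __construct( TitleFormatter $formatter ) {
        $this->formatter = $formatter;
    }

    // "new" prefix: guaranteed to return a fresh instance on every call.
    public function newPageSummary( string $rawTitle ): PageSummary {
        return new PageSummary( $this->formatter, $rawTitle );
    }

    // "get" prefix: may return a shared instance (here it always does).
    public function getMainPageSummary(): PageSummary {
        if ( $this->mainPageSummary === null ) {
            $this->mainPageSummary = $this->newPageSummary( 'main page' );
        }
        return $this->mainPageSummary;
    }
}

$factory = new PageSummaryFactory( new TitleFormatter() );
$summary = $factory->newPageSummary( '  sandbox ' );
```

Code that receives the factory can create PageSummary objects without ever knowing about TitleFormatter.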

Registry
Registries are factories for services. Factory methods in a registry typically do not take any parameters. Registries can be used to
 * 1) provide access to instances of a variety of services, e.g. various storage level services.
 * 2) provide access to specialized instances of a single service interface implemented for different "targets", e.g. different MediaHandler instances for each type of media.
 * 3) provide lazy instantiation of services, to avoid overhead at startup time.

The top-level registry (the service locator) provides access to all services known to the application, for use in static entry points.

A registry may be implemented by hardcoding the logic for instantiating the services (typical especially for the top-level registry), or by bootstrap code defining constructor callbacks (aka factory functions). See the example. Note that registering class names should be avoided, since that prevents injection of services via constructor arguments (because the constructor's signature is prescribed by the registry).
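A registry built on constructor callbacks might look like the sketch below. The names ExampleHandler, ImageScaler, BitmapHandler, PdfHandler and HandlerRegistry are hypothetical; in particular, ExampleHandler is a stand-in and not MediaWiki's MediaHandler.

```php
<?php
// Hypothetical service interface with one implementation per "target".
interface ExampleHandler {
    public function describe(): string;
}

class ImageScaler {
    public function getName(): string {
        return 'scaler';
    }
}

// The two implementations have different constructor signatures.
class BitmapHandler implements ExampleHandler {
    private ImageScaler $scaler;

    public function __construct( ImageScaler $scaler ) {
        $this->scaler = $scaler;
    }

    public function describe(): string {
        return 'bitmap via ' . $this->scaler->getName();
    }
}

class PdfHandler implements ExampleHandler {
    private int $maxPages;

    public function __construct( int $maxPages ) {
        $this->maxPages = $maxPages;
    }

    public function describe(): string {
        return "pdf, up to {$this->maxPages} pages";
    }
}

class HandlerRegistry {
    /** @var callable[] Constructor callbacks, keyed by media type */
    private array $callbacks;
    /** @var ExampleHandler[] Handlers instantiated so far */
    private array $handlers = [];

    public function __construct( array $callbacks ) {
        $this->callbacks = $callbacks;
    }

    // Instantiates the handler on first access, then reuses it.
    public function getHandler( string $type ): ExampleHandler {
        if ( !isset( $this->handlers[$type] ) ) {
            if ( !isset( $this->callbacks[$type] ) ) {
                throw new InvalidArgumentException( "No handler for type: $type" );
            }
            $this->handlers[$type] = ( $this->callbacks[$type] )();
        }
        return $this->handlers[$type];
    }
}

// Bootstrap code defines callbacks instead of instantiating the handlers.
$scaler = new ImageScaler();
$registry = new HandlerRegistry( [
    'bitmap' => function () use ( $scaler ) {
        return new BitmapHandler( $scaler );
    },
    'pdf' => function () {
        return new PdfHandler( 50 );
    },
] );
```

Had the registry stored class names and called new $className() itself, both handlers would need the same constructor signature, and neither could receive its dependency as a constructor argument.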

Constructor callback
A constructor callback or factory function is a callable that returns a new instance of some class.

Code Experiments on Gerrit

 * Introduce top-level service locator . 245483
 * Move singletons of MediaWikiTitleCodec and MediaWikiPageLinkRenderer to . 245484
 * Refactor  to allow injection. 250150
 * Allow DI for  scripts. 250430
 * Change  to use DI. 250151

UserLookup interface
Note that this is an interface for looking up users. It does not contain methods for updating user records, nor for creating new users.
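A narrow lookup interface of this kind might be sketched as follows. The method names, the UserRecord value class and the InMemoryUserLookup implementation are assumptions made for illustration; they are not taken from the RFC's own example or from MediaWiki core.

```php
<?php
// Hypothetical read-only value object describing a user.
class UserRecord {
    private int $id;
    private string $name;

    public function __construct( int $id, string $name ) {
        $this->id = $id;
        $this->name = $name;
    }

    public function getId(): int {
        return $this->id;
    }

    public function getName(): string {
        return $this->name;
    }
}

// Lookup only: no methods for updating or creating users.
interface UserLookup {
    public function getUserById( int $id ): ?UserRecord;

    public function getUserByName( string $name ): ?UserRecord;
}

// Trivial implementation, useful as a test double.
class InMemoryUserLookup implements UserLookup {
    /** @var UserRecord[] */
    private array $users;

    public function __construct( UserRecord ...$users ) {
        $this->users = $users;
    }

    public function getUserById( int $id ): ?UserRecord {
        foreach ( $this->users as $user ) {
            if ( $user->getId() === $id ) {
                return $user;
            }
        }
        return null;
    }

    public function getUserByName( string $name ): ?UserRecord {
        foreach ( $this->users as $user ) {
            if ( $user->getName() === $name ) {
                return $user;
            }
        }
        return null;
    }
}

$lookup = new InMemoryUserLookup( new UserRecord( 1, 'Alice' ), new UserRecord( 2, 'Bob' ) );
```

Keeping the interface this small means a consumer that only reads user data never gains the ability to modify it, and a test double takes a few lines to write.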

MediaWikiServices
To avoid boilerplate code for lazy instantiation, this can be generalized a bit:
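One possible shape for such a generalization is sketched below. ExampleServices stands in for the locator class, and ExampleTitleFormatter, ExampleLinkRenderer and the private getService helper are hypothetical names invented for this sketch.

```php
<?php
// Hypothetical services, used only for illustration.
class ExampleTitleFormatter {
    public function format( string $text ): string {
        return ucfirst( $text );
    }
}

class ExampleLinkRenderer {
    private ExampleTitleFormatter $formatter;

    public function __construct( ExampleTitleFormatter $formatter ) {
        $this->formatter = $formatter;
    }

    public function render( string $title ): string {
        return '[[' . $this->formatter->format( $title ) . ']]';
    }
}

class ExampleServices {
    /** @var object[] Service instances created so far, keyed by name */
    private array $services = [];

    // The lazy-instantiation logic lives in one place.
    private function getService( string $name, callable $constructor ): object {
        if ( !isset( $this->services[$name] ) ) {
            $this->services[$name] = $constructor();
        }
        return $this->services[$name];
    }

    // Each typed getter only states how its service is constructed.
    public function getTitleFormatter(): ExampleTitleFormatter {
        return $this->getService( 'TitleFormatter', function () {
            return new ExampleTitleFormatter();
        } );
    }

    public function getLinkRenderer(): ExampleLinkRenderer {
        return $this->getService( 'LinkRenderer', function () {
            return new ExampleLinkRenderer( $this->getTitleFormatter() );
        } );
    }
}

$services = new ExampleServices();
$link = $services->getLinkRenderer()->render( 'main page' );
```

The getters keep their declared return types, so callers and static analysis tools still see concrete service types even though the caching is generic.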

Hook handler injection
The handler function is then hooked up as usual:

The static logic can also be moved into an anonymous function, if preferred:

This is somewhat cleaner, since spurious dependencies in the handler class are avoided. But keeping the static code in the handler class provides better knowledge locality, and avoids clutter in the bootstrap file.
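Both variants can be sketched together as follows. Everything here is hypothetical: ExampleLocator stands in for the service locator, AuditLog for an arbitrary service, PageSaved for a hook name, and the $exampleHooks array with its small dispatch loop for the framework's hook mechanism.

```php
<?php
// Hypothetical service that the handler depends on.
class AuditLog {
    private array $entries = [];

    public function record( string $entry ): void {
        $this->entries[] = $entry;
    }

    public function getEntries(): array {
        return $this->entries;
    }
}

// Stand-in for the application-scope service locator.
class ExampleLocator {
    private static ?ExampleLocator $instance = null;
    private ?AuditLog $auditLog = null;

    public static function getInstance(): ExampleLocator {
        if ( self::$instance === null ) {
            self::$instance = new self();
        }
        return self::$instance;
    }

    public function getAuditLog(): AuditLog {
        if ( $this->auditLog === null ) {
            $this->auditLog = new AuditLog();
        }
        return $this->auditLog;
    }
}

class PageSaveHandler {
    private AuditLog $log;

    public function __construct( AuditLog $log ) {
        $this->log = $log;
    }

    // Static entry point: the only method that touches the locator.
    public static function onPageSaved( string $title ): bool {
        $handler = new self( ExampleLocator::getInstance()->getAuditLog() );
        return $handler->doPageSaved( $title );
    }

    // Testable logic: it uses only the injected service.
    public function doPageSaved( string $title ): bool {
        $this->log->record( "saved: $title" );
        return true;
    }
}

// Variant 1: register the static method of the handler class.
$exampleHooks = [];
$exampleHooks['PageSaved'][] = 'PageSaveHandler::onPageSaved';

// Variant 2: keep the static logic in an anonymous function instead.
$exampleHooks['PageSaved'][] = function ( string $title ): bool {
    $handler = new PageSaveHandler( ExampleLocator::getInstance()->getAuditLog() );
    return $handler->doPageSaved( $title );
};

// Minimal stand-in for the framework firing the hook.
foreach ( $exampleHooks['PageSaved'] as $callback ) {
    call_user_func( $callback, 'Main Page' );
}
```

In both variants doPageSaved can be unit tested by constructing PageSaveHandler directly with a fresh AuditLog. With the second variant alone, the handler class would not need to reference the locator at all.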