Requests for comment/Services and narrow interfaces

Problem statement
MediaWiki's codebase has mostly grown organically, which has led to wide or non-existent internal interfaces. This makes it hard to test parts of the system independently and strongly couples the development of different parts. Reasoning about the interaction between parts of the system is difficult, especially once extensions enter the mix. Fault isolation suffers for the same reason: a fatal error in a minor feature can bring down the entire system.

Additionally, the way clients request data from MediaWiki is changing. Richer clients request more information through APIs, which should ideally perform well even for relatively small requests. New features like notifications require long-running connections, which are difficult to implement in the PHP request-processing model. Newer technologies such as node.js fit some of these applications well, and it would be useful to have a way to leverage them.

Another problem is organizational. We now have several teams at the Wikimedia Foundation working on new features. Currently, each team needs to handle the full stack, from the front-end through the caching layers and Apaches down to the database. This tends to promote tight coupling between storage and the code using it, which makes independent optimization of the backend layers difficult. It also often leads to conflicts over backend issues as deployment approaches.

Using services to solve some of these issues
A common solution to the issues we are facing is to define parts of the system as independent services with clearly defined, narrow interfaces. A popular and ubiquitous style of interface is HTTP. Reasons for its popularity include the wide availability of implementations and middleware, a common vocabulary of verbs that can be applied to resources modeling the state (see REST), and reasonable efficiency. Even without a need for distribution, it is often useful to model interfaces in a way that would map easily to HTTP. Value objects are another proposal in a very similar vein.
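As an illustration of what "an interface that maps easily to HTTP" can look like, here is a minimal sketch in Python. The `PageStore` class, its methods and the `/page/{title}` path layout are hypothetical and not part of MediaWiki. The point is that each operation is a standard verb applied to a resource, so an in-process call and an HTTP request become interchangeable, whether or not the component is ever actually distributed.

```python
from typing import Dict, Optional, Tuple


class PageStore:
    """Hypothetical narrow interface: pages are resources addressed by title."""

    def __init__(self) -> None:
        self._pages: Dict[str, str] = {}

    def get(self, title: str) -> Optional[str]:
        # Maps to: GET /page/{title}
        return self._pages.get(title)

    def put(self, title: str, text: str) -> bool:
        # Maps to: PUT /page/{title}; returns True if the page was newly created
        created = title not in self._pages
        self._pages[title] = text
        return created

    def delete(self, title: str) -> bool:
        # Maps to: DELETE /page/{title}; returns True if a page was removed
        return self._pages.pop(title, None) is not None


def handle(store: PageStore, method: str, path: str, body: str = "") -> Tuple[int, str]:
    """Translate an HTTP-style request into a call on the narrow interface."""
    prefix = "/page/"
    if not path.startswith(prefix) or len(path) == len(prefix):
        return 404, "not found"
    title = path[len(prefix):]
    if method == "GET":
        text = store.get(title)
        return (200, text) if text is not None else (404, "not found")
    if method == "PUT":
        return (201, "created") if store.put(title, body) else (200, "updated")
    if method == "DELETE":
        return (204, "") if store.delete(title) else (404, "not found")
    return 405, "method not allowed"
```

For example, `handle(store, "PUT", "/page/Main_Page", "Hello")` on an empty store returns `(201, "created")`, and a subsequent `GET` on the same path returns `(200, "Hello")`. Because the interface has only three operations on one kind of resource, putting a real HTTP server in front of `handle` would not change callers that already think in these terms.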

Incremental change
A complex system like MediaWiki can't be rewritten from scratch. We need a way to evolve the system incrementally. By developing parts of the system like Parsoid, Math or PDF rendering as services, we gain the ability to choose the most appropriate technology for each part. It is relatively easy to access these services from PHP code or to provide a public API for browsers, and there are proposals [TODO: link PHP interface RFC] to make this even more convenient. The narrow interface also makes it possible to evolve each component internally without affecting existing users of the interface.
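To make "relatively easy to access these services" concrete, the sketch below shows a client calling a rendering service over HTTP. Everything here is hypothetical: the `/render` path and the `{"tex": ...}` / `{"mathml": ...}` JSON shapes are assumptions, not the actual API of the Math service, and the client is written in Python rather than PHP purely to keep the example short and runnable. A tiny stub server started on a free local port stands in for the real service, so the client depends only on a URL and a small JSON contract.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class RenderHandler(BaseHTTPRequestHandler):
    """Stand-in for a hypothetical math rendering service."""

    def do_POST(self) -> None:
        if self.path != "/render":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length))
        # A real service would typeset the TeX; this stub only wraps it.
        payload = json.dumps({"mathml": "<math>" + request["tex"] + "</math>"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format: str, *args) -> None:
        pass  # keep the example quiet


def render_math(base_url: str, tex: str, timeout: float = 5.0) -> str:
    """Client side: depends only on the service URL and the JSON contract."""
    req = urllib.request.Request(
        base_url + "/render",
        data=json.dumps({"tex": tex}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["mathml"]


def start_stub_service() -> ThreadingHTTPServer:
    # Port 0 asks the operating system for any free port.
    server = ThreadingHTTPServer(("127.0.0.1", 0), RenderHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Since the client never sees how the service is implemented, the service could be rewritten in another language, or moved to another machine, without touching `render_math`.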

Performance and scaling
The performance of modern hardware grows mostly through parallelism and distribution. An architecture that makes it easy to process parts of a request in parallel is therefore likely to improve the performance of the application. Implementing this parallelism through distribution lets us scale to many machines and provides good fault isolation, without the problems common with naive use of shared state.
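A minimal sketch of processing independent parts of a request in parallel, with a failure in one part contained to that part. The part names and the plain Python callables are hypothetical stand-ins for calls to separate services; this is one possible approach, not how MediaWiki assembles pages. All parts share a single overall deadline, and a part that raises or misses the deadline is replaced by a placeholder instead of failing the whole response.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict


def fetch_parts(parts: Dict[str, Callable[[], str]], timeout: float = 2.0,
                fallback: str = "[unavailable]") -> Dict[str, str]:
    """Run independent parts concurrently and collect their results.

    A part that raises, or is not finished when the overall timeout
    expires, is replaced by `fallback`.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(parts)))
    futures = {name: pool.submit(fn) for name, fn in parts.items()}
    # Wait for all parts together, bounded by one overall deadline.
    done, _ = wait(futures.values(), timeout=timeout)
    results: Dict[str, str] = {}
    for name, future in futures.items():
        if future in done and future.exception() is None:
            results[name] = future.result()
        else:
            results[name] = fallback  # part failed or missed the deadline
    # Don't block on stragglers; calls already running finish in the background.
    pool.shutdown(wait=False)
    return results
```

For instance, if a hypothetical math part raises because its service is down, the other parts of the page are still returned and only that part shows the placeholder. Threads only approximate the isolation that separate processes or machines give: a part that hangs still occupies a worker thread until it returns.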

Interfaces between teams as a method of organizational scaling
Teams in the Foundation differ in their focus and area of expertise. While backend-minded teams might enjoy and excel at optimizing storage solutions and number crunching, feature teams would often prefer to have more time to polish the functionality they provide to users. Services can help leverage these different strengths by splitting bigger tasks between several teams. The interfaces defined between services are more likely to be narrow and informed by both implementer and user concerns. Concerns surface early, as part of defining the interfaces, rather than in final review.

The storage layer in particular seems to be a good candidate for a service abstraction. This is discussed in the storage service RFC [TODO: link to RFC!].

Packaging and small installs
A strength of MediaWiki has so far been the ability to install a very stripped-down version in a PHP-only shared hosting environment. Such an install might be insecure and slow, does not do much to balance its HTML, and does not include fancy features like Math or PDF rendering. But it provides an easy starting point for running your own wiki.

In a service architecture, the challenge is to provide a similar experience for small-scale deployments. One answer to this problem can be packaging. Virtual machines running popular Linux distributions like Debian are now available at prices similar to those of a shared hosting account. With good packaging, installing MediaWiki can be as easy as installing a single package, with optional packages for additional services readily available. While there are definitely small overheads associated with running a distributed system in a small VM, these are likely to be offset by choosing more efficient technologies for individual services. Another option is to provide alternative implementations of some services for resource-constrained environments. Again, narrow interfaces make such drop-in replacements relatively straightforward.
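The drop-in replacement idea can be sketched as two implementations behind one narrow interface. The names `MathRenderer`, `ServiceMathRenderer`, `PlainMathRenderer` and `make_renderer` are hypothetical. The "full" implementation delegates to a rendering service through an injected `post` callable (assumed to wrap an HTTP client, and injected here so the example runs without a network), while the lightweight fallback suits a constrained host with no such service and simply shows the escaped TeX source.

```python
import html
from typing import Callable, Optional, Protocol


class MathRenderer(Protocol):
    """Hypothetical narrow interface: TeX source in, HTML fragment out."""

    def render(self, tex: str) -> str: ...


class ServiceMathRenderer:
    """Full implementation that delegates to a separate rendering service.

    `post` stands in for an HTTP client call to that service.
    """

    def __init__(self, post: Callable[[str], str]) -> None:
        self._post = post

    def render(self, tex: str) -> str:
        return self._post(tex)


class PlainMathRenderer:
    """Lightweight fallback for constrained hosts: shows the escaped TeX source."""

    def render(self, tex: str) -> str:
        return '<code class="tex">' + html.escape(tex) + "</code>"


def make_renderer(post: Optional[Callable[[str], str]] = None) -> MathRenderer:
    # Use the service when one is configured, otherwise fall back.
    return ServiceMathRenderer(post) if post is not None else PlainMathRenderer()
```

Code that calls `render` does not need to know which implementation it received, so a small install and a full deployment can share the same calling code and differ only in configuration.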