Requests for comment/Data mapper

This RFC proposes adding a data mapper facility to core. Hopefully, this will promote domain-driven design and help isolate database access. It seems compatible with the recently approved move to a service-oriented architecture. An implementation with tests and examples is provided.

Implemented and merged on a WIP branch of the Campaigns extension for the Editor campaigns project

Problem statement
How to model the things a program is "about" and how to organize code for persistent storage are central problems of software architecture. A possible approach to the first problem is domain-driven design. This approach recommends creating effective domain models that aren't overly determined by the technology they're embedded in. The data mapper pattern is a persistent storage approach that may facilitate domain-driven design.

As the service-oriented architecture RFC notes, a lot of MediaWiki code has wide interfaces and is tightly coupled. Dividing this complex system into narrow, independent services and APIs will improve this situation. But it's not enough. We also need a pattern for persistent storage code and a route to expressive domain models.

Adding a data mapper facility to core will provide a standard way of isolating database access and should facilitate the development of domain models. Using the same patterns and toolkit within multiple service components, as appropriate, will make it easy for developers to move from one component to another, and will help avoid duplicate efforts to solve the same problems.

Domain-driven design, MediaWiki and data mappers
In domain-driven design, you build software around a model of what the software is “about” in the real world. As a large system that must support the changing needs of a multifaceted, global social movement with hundreds of thousands of participants, MediaWiki, it seems, would benefit from domain-driven design. According to Eric Evans:
 * “The goal of domain-driven design is to create better software by focusing on a model of the domain rather than the technology.”
 * “[...] the software constructs of the domain layer mirror the model concepts. It is not practical to achieve that correspondence when the domain logic is mixed with other concerns of the program. Isolating the domain implementation is a prerequisite for domain-driven design.”
 * “The domain objects, free of the responsibility of displaying themselves, storing themselves, managing application tasks, and so forth, can be focused on expressing the domain model. This allows a model to evolve to be rich enough and clear enough to capture essential business knowledge and put it to work.”
 * The problem with not following this methodology is that as a system grows, “more and more domain rules become embedded in query code or simply lost.” In such cases “[W]e are no longer thinking about concepts in our domain model. Our code will not be communicating about the business; it will be manipulating the technology of data retrieval.”

The data mapper pattern is a type of object-relational mapping (ORM). It supports domain-driven design by pushing object-relational mapping out of the domain model, into a lower, infrastructure layer. A dedicated facility, the data mapper, "handles all of the loading and storing between the database and the Domain Model and allows both to vary independently". The data mapper pattern contrasts with the active record pattern (in which domain objects have methods for inserting and updating themselves in a database).
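The contrast between the two patterns can be sketched in a few lines. This is an illustration only: it uses Python and the standard sqlite3 module as a stand-in for MediaWiki's PHP and database classes, and every name in it (`ActiveRecordPerson`, `Person`, `PersonMapper`, the `person` table) is hypothetical rather than taken from the proposed implementation.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


# Active record style: the domain object stores itself,
# so table and column names leak into the domain class.
class ActiveRecordPerson:
    def __init__(self, name: str, age: int) -> None:
        self.name = name
        self.age = age

    def save(self, db: sqlite3.Connection) -> None:
        db.execute(
            "INSERT INTO person (name, age) VALUES (?, ?)", (self.name, self.age)
        )


# Data mapper style: the domain object only expresses domain knowledge.
@dataclass
class Person:
    name: str
    age: int

    def label(self) -> str:
        return f"{self.name} ({self.age})"


# All loading and storing lives in a separate infrastructure-layer class.
class PersonMapper:
    def __init__(self, db: sqlite3.Connection) -> None:
        self._db = db

    def insert(self, person: Person) -> None:
        self._db.execute(
            "INSERT INTO person (name, age) VALUES (?, ?)", (person.name, person.age)
        )

    def find_by_name(self, name: str) -> Optional[Person]:
        row = self._db.execute(
            "SELECT name, age FROM person WHERE name = ?", (name,)
        ).fetchone()
        return Person(*row) if row else None


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE person (name TEXT UNIQUE NOT NULL, age INTEGER NOT NULL)")

mapper = PersonMapper(db)
mapper.insert(Person("Ada", 36))
found = mapper.find_by_name("Ada")
```

In the second style, the schema could change without touching `Person`, and `Person` could gain domain behaviour without touching any SQL.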

Service orientation and domain-driven design
Service orientation and domain-driven design seem compatible. Service orientation is about building a complex application from smaller, relatively independent units of functionality that expose APIs, often over a network. Depending on how functionality is divided up, it seems there could be one or more domain models, and one or more places to use a data mapper (or some other ORM facility).

Isolating database access
MediaWiki's database classes provide abstraction of low-level database calls. But another, higher level of abstraction and isolation of persistent storage-related code is often justified. Consider, for example, an API method that mixes logic for API parameters together with table and field names, an SQL join, and iteration through a complex database query result to build an API result. Such a method is coupled to the details of the API call, data storage, and API result generation. Some form of ORM could be used to separate out code that depends on data storage details; that would be a step towards greater separation of concerns.
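One possible shape for that separation is sketched below, under stated assumptions: Python and sqlite3 stand in for PHP and MediaWiki's database classes, and the tables (`editor`, `campaign`, `participation`), the `Participant` class and both functions are invented for illustration. Only `find_participants` knows about tables and the join; `build_api_result` works purely on domain objects.

```python
import sqlite3
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Participant:
    editor_name: str
    campaign_name: str


def find_participants(db: sqlite3.Connection, campaign: str) -> List[Participant]:
    """Storage side: the only code that knows table names and the join."""
    rows = db.execute(
        "SELECT e.editor_name, c.campaign_name "
        "FROM participation p "
        "JOIN editor e ON e.editor_id = p.editor_id "
        "JOIN campaign c ON c.campaign_id = p.campaign_id "
        "WHERE c.campaign_name = ? ORDER BY e.editor_name",
        (campaign,),
    ).fetchall()
    return [Participant(editor_name=e, campaign_name=c) for e, c in rows]


def build_api_result(participants: List[Participant]) -> Dict[str, object]:
    """API side: builds the result from domain objects, with no SQL in sight."""
    return {
        "count": len(participants),
        "participants": [p.editor_name for p in participants],
    }


db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE editor (editor_id INTEGER PRIMARY KEY, editor_name TEXT);
    CREATE TABLE campaign (campaign_id INTEGER PRIMARY KEY, campaign_name TEXT);
    CREATE TABLE participation (editor_id INTEGER, campaign_id INTEGER);
    INSERT INTO editor VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO campaign VALUES (1, 'welcome'), (2, 'other');
    INSERT INTO participation VALUES (1, 1), (2, 1), (3, 2);
    """
)
result = build_api_result(find_participants(db, "welcome"))
```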

Proposed implementation
The proposed implementation is a generic data mapping facility that is configured via a global variable and annotations in entity classes.

Setup
Let's say that you have this database table and unique index:
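As an illustration, a table of that kind might be sketched as follows. The table, column and index names are assumptions inferred from the person example discussed under Considerations (a name, an age, and no two people with the same name), and Python's sqlite3 module stands in for MediaWiki's PHP database classes.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Hypothetical schema: one row per person, with a unique index on the name.
db.executescript(
    """
    CREATE TABLE person (
        person_id   INTEGER PRIMARY KEY AUTOINCREMENT,
        person_name TEXT    NOT NULL,
        person_age  INTEGER NOT NULL
    );
    CREATE UNIQUE INDEX person_name_idx ON person (person_name);
    """
)

db.execute("INSERT INTO person (person_name, person_age) VALUES ('Ada', 36)")

# The unique index rejects a second person with the same name.
try:
    db.execute("INSERT INTO person (person_name, person_age) VALUES ('Ada', 40)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = db.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```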

Suppose you also have the following interface for objects that map to rows in that table:

Let's also say this is your implementation. (Here we've already added annotations on class variables as required by the data mapper.)
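A rough sketch of such an interface and implementation is given here in Python rather than the PHP of the actual proposal. `PersonInterface`, `Person` and the metadata keys (`column`, `unique`, `id`) are hypothetical, and dataclass field metadata is used as a stand-in for the annotations that the data mapper reads from class variables.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field, fields
from typing import Optional


class PersonInterface(ABC):
    """What the rest of the program may rely on; nothing about storage."""

    @abstractmethod
    def get_name(self) -> str: ...

    @abstractmethod
    def set_name(self, name: str) -> None: ...

    @abstractmethod
    def get_age(self) -> int: ...

    @abstractmethod
    def set_age(self, age: int) -> None: ...

    @abstractmethod
    def describe(self) -> str: ...


@dataclass
class Person(PersonInterface):
    # The metadata plays the role of annotations: it tells a mapper which
    # column each attribute maps to, without adding any storage logic here.
    name: str = field(metadata={"column": "person_name", "unique": True})
    age: int = field(metadata={"column": "person_age"})
    id: Optional[int] = field(default=None, metadata={"column": "person_id", "id": True})

    def get_name(self) -> str:
        return self.name

    def set_name(self, name: str) -> None:
        self.name = name

    def get_age(self) -> int:
        return self.age

    def set_age(self, age: int) -> None:
        self.age = age

    def describe(self) -> str:
        return f"{self.name} ({self.age})"


# A mapper could discover the attribute-to-column mapping by reflection.
column_map = {f.name: f.metadata["column"] for f in fields(Person)}

person = Person("Ada", 36)
person.set_age(37)
description = person.describe()
```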

Once you have that, you just define an enum class for your entity's fields, and set some values in a global variable:
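A minimal sketch of this step, assuming Python's `enum` module in place of whatever enum facility the implementation uses. The names `PersonField`, `MAPPER_CONFIG`, `column_for` and the configuration keys are all hypothetical and do not reflect the real global variable or its layout.

```python
from enum import Enum


class PersonField(Enum):
    """One member per mapped field of the hypothetical Person entity."""

    ID = "id"
    NAME = "name"
    AGE = "age"


# Hypothetical stand-in for the global configuration variable:
# it ties an entity type to its table and to its field enum.
MAPPER_CONFIG = {
    "Person": {
        "table": "person",
        "column_prefix": "person_",
        "field_enum": PersonField,
    }
}


def column_for(entity_type: str, entity_field: Enum) -> str:
    """Resolve a field enum member to a column name using the configuration."""
    config = MAPPER_CONFIG[entity_type]
    if not isinstance(entity_field, config["field_enum"]):
        raise TypeError(f"{entity_field!r} is not a field of {entity_type}")
    return config["column_prefix"] + entity_field.value


name_column = column_for("Person", PersonField.NAME)
```

Using enum members rather than bare strings means that a misspelled field name fails as soon as the code runs, instead of producing a bad query.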

Then you're good to go!

CRUD
Here are some fun things you can do:
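The following sketch shows the general flavour of create, read, update and delete through a generic mapper. It is not the proposed implementation's API: the `Mapper` class, its `save`, `find` and `delete` methods, the `Person` entity and the table layout are assumptions made for illustration, written in Python with sqlite3 instead of PHP and MediaWiki's database classes.

```python
import sqlite3
from dataclasses import dataclass, field, fields
from typing import Any, List, Optional, Type


@dataclass
class Person:
    name: str = field(metadata={"column": "person_name"})
    age: int = field(metadata={"column": "person_age"})
    id: Optional[int] = field(default=None, metadata={"column": "person_id", "id": True})


class Mapper:
    """Tiny generic mapper: all SQL lives here, none in the entity class."""

    def __init__(self, db: sqlite3.Connection, entity: Type, table: str) -> None:
        self.db, self.entity, self.table = db, entity, table
        self.columns = {f.name: f.metadata["column"] for f in fields(entity)}
        self.id_attr = next(f.name for f in fields(entity) if f.metadata.get("id"))

    def save(self, obj: Any) -> None:
        attrs = [a for a in self.columns if a != self.id_attr]
        values = [getattr(obj, a) for a in attrs]
        if getattr(obj, self.id_attr) is None:  # no id yet: insert a new row
            cols = ", ".join(self.columns[a] for a in attrs)
            marks = ", ".join("?" for _ in attrs)
            cur = self.db.execute(
                f"INSERT INTO {self.table} ({cols}) VALUES ({marks})", values
            )
            setattr(obj, self.id_attr, cur.lastrowid)
        else:  # id known: update the existing row
            sets = ", ".join(f"{self.columns[a]} = ?" for a in attrs)
            self.db.execute(
                f"UPDATE {self.table} SET {sets} WHERE {self.columns[self.id_attr]} = ?",
                values + [getattr(obj, self.id_attr)],
            )

    def find(self, attr: str, value: Any) -> List[Any]:
        names = list(self.columns)
        cols = ", ".join(self.columns[n] for n in names)
        rows = self.db.execute(
            f"SELECT {cols} FROM {self.table} WHERE {self.columns[attr]} = ?", (value,)
        ).fetchall()
        return [self.entity(**dict(zip(names, row))) for row in rows]

    def delete(self, obj: Any) -> None:
        self.db.execute(
            f"DELETE FROM {self.table} WHERE {self.columns[self.id_attr]} = ?",
            (getattr(obj, self.id_attr),),
        )
        setattr(obj, self.id_attr, None)


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE person (person_id INTEGER PRIMARY KEY, "
    "person_name TEXT NOT NULL UNIQUE, person_age INTEGER NOT NULL)"
)
people = Mapper(db, Person, "person")

ada = Person("Ada", 36)
people.save(ada)                        # create
ada.age = 37
people.save(ada)                        # update
loaded = people.find("name", "Ada")     # read
people.delete(ada)                      # delete
remaining = people.find("name", "Ada")
```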

Tests and example
Please see the Campaigns extension for unit tests and a longer example.

Considerations
In the above example, the entity class expresses domain knowledge about people: they must have a name and an age, no two people have the same name, the values of both properties can change, and you can create a string with a person's name and age that looks like this: "Name (age)". The class is not cluttered with framework-specific information or logic.

If you want to encapsulate logic related to the set of entities of a given type, it's easy to create repositories. For example, if we know we'll frequently have to retrieve people older than 70 whose names start with "W", we can create a repository class for people and put the logic for such queries there. Repositories are an established pattern in domain-driven design.
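A repository for that query might look roughly like the sketch below. `PersonRepository`, its method names and the table layout are hypothetical, and the code is Python with sqlite3 rather than PHP. In the proposed design a repository would presumably delegate to the data mapper; here it queries the database directly only to keep the example short.

```python
import sqlite3
from dataclasses import dataclass
from typing import List


@dataclass
class Person:
    name: str
    age: int


class PersonRepository:
    """Collects queries about the set of people behind intention-revealing methods."""

    def __init__(self, db: sqlite3.Connection) -> None:
        self._db = db

    def add(self, person: Person) -> None:
        self._db.execute(
            "INSERT INTO person (person_name, person_age) VALUES (?, ?)",
            (person.name, person.age),
        )

    def older_than_with_name_prefix(self, min_age: int, prefix: str) -> List[Person]:
        # substr() gives a case-sensitive prefix match (SQLite's LIKE is
        # case-insensitive for ASCII by default).
        rows = self._db.execute(
            "SELECT person_name, person_age FROM person "
            "WHERE person_age > ? AND substr(person_name, 1, length(?)) = ? "
            "ORDER BY person_name",
            (min_age, prefix, prefix),
        ).fetchall()
        return [Person(name, age) for name, age in rows]


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE person (person_name TEXT NOT NULL UNIQUE, person_age INTEGER NOT NULL)"
)
repository = PersonRepository(db)
for candidate in (
    Person("Walt", 72),
    Person("Wendy", 65),
    Person("Xavier", 80),
    Person("Wilma", 71),
):
    repository.add(candidate)

matches = repository.older_than_with_name_prefix(70, "W")
```

Callers ask for "people older than 70 whose names start with W" in domain terms and never see the query itself.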

This implementation takes several cues (not queues) from Doctrine, a much more complete data mapping library for PHP. It would probably be more fun to just use Doctrine! With the recent acceptance of the Composer-managed libraries RFC, this is definitely something to consider. A possible impediment is that for consistency and legacy support, we may well want low-level database access to continue to go through existing MW classes.

ORMTable and friends
MediaWiki already contains an ORM facility: ORMTable and related classes. These classes support the active record pattern, rather than the data mapper pattern.

Data persistence in Flow
The Flow extension encapsulates database access using its own object mapper facility.

DAO class in OAuth
The OAuth extension has a simple base class that handles CRUD, ACLs, and potentially caching (not done yet).

Proposed methodology
This RFC is not about refactoring existing MediaWiki classes to use the data mapper pattern, but about adding a data mapper facility to MediaWiki. Such a facility could be used with new and non-central MW code on the understanding that it is experimental and could change or even disappear at any time. It would be a sort of internal "beta feature". Reviewing how it is used and how it affects code quality would be a central task.

This same approach has been proposed for adding dependency injection to core.