Requests for comment/Data mapper

This RFC proposes adding a data mapper facility to core. The goal is to promote domain-driven design and help isolate database access. The proposal seems compatible with the recently approved move to a service-oriented architecture. An implementation with tests and examples is provided.

Problem statement
How to model the things a program is "about" and how to organize code for persistent storage are central problems of software architecture. A possible approach to the first problem is domain-driven design. This approach recommends creating effective domain models that aren't overly determined by the technology they are embedded in. The data mapper pattern is a persistent storage approach that may facilitate domain-driven design.

As the service-oriented architecture RFC notes, a lot of MediaWiki code has wide interfaces and is tightly coupled. Dividing this complex system into narrow, independent services and APIs will be a step in the right direction, but it won't be enough. We also need a pattern for persistent storage code and a route to expressive domain models.

Adding a data mapper facility to core will provide a standard way of isolating database access and should facilitate the development of domain models. Using the same patterns and toolkit within multiple service components, as appropriate, will make it easy for developers to move from one component to another, and will help avoid duplicate efforts to solve the same problems.

Domain-driven design, MediaWiki and data mappers
In domain-driven design, you build software around a model of what the software is “about” in the real world. According to Eric Evans:
 * “The goal of domain-driven design is to create better software by focusing on a model of the domain rather than the technology.”
 * “[...] the software constructs of the domain layer mirror the model concepts. It is not practical to achieve that correspondence when the domain logic is mixed with other concerns of the program. Isolating the domain implementation is a prerequisite for domain-driven design.”
 * “The domain objects, free of the responsibility of displaying themselves, storing themselves, managing application tasks, and so forth, can be focused on expressing the domain model. This allows a model to evolve to be rich enough and clear enough to capture essential business knowledge and put it to work.”
 * The problem with not following this methodology is that, as a system grows, “more and more domain rules become embedded in query code or simply lost.” In such cases, “[W]e are no longer thinking about concepts in our domain model. Our code will not be communicating about the business; it will be manipulating the technology of data retrieval.”

As a large system that must support the changing needs of a multifaceted, global social movement with hundreds of thousands of participants, MediaWiki, it seems, would benefit from domain-driven design.

The data mapper pattern is a type of object-relational mapping (ORM). It supports domain-driven design by pushing object-relational mapping out of the domain model, into a lower, infrastructure layer. A dedicated facility, the data mapper, "handles all of the loading and storing between the database and the Domain Model and allows both to vary independently". The data mapper pattern contrasts with the active record pattern (in which domain objects have methods for inserting and updating themselves in a database).
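
To make the contrast with active record concrete, here is a minimal sketch of the data mapper pattern. It is written in Python with the standard sqlite3 module purely for illustration; it is not the implementation provided with this RFC, and the Campaign class, the CampaignMapper class, and the campaign table with its cmp_ columns are all hypothetical names invented for the example. The domain object holds state and domain rules only, while the mapper is the single place that knows about tables and SQL.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Campaign:
    """Hypothetical domain object: knows nothing about SQL or tables."""
    name: str
    enabled: bool = False
    id: Optional[int] = None

    def enable(self) -> None:
        # A domain rule lives on the domain object, not in query code.
        self.enabled = True


class CampaignMapper:
    """Hypothetical mapper: the only class that knows the table layout."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS campaign ("
            "cmp_id INTEGER PRIMARY KEY, cmp_name TEXT, cmp_enabled INTEGER)"
        )

    def save(self, campaign: Campaign) -> None:
        # Insert new objects; update objects that already have an id.
        if campaign.id is None:
            cur = self.conn.execute(
                "INSERT INTO campaign (cmp_name, cmp_enabled) VALUES (?, ?)",
                (campaign.name, int(campaign.enabled)),
            )
            campaign.id = cur.lastrowid
        else:
            self.conn.execute(
                "UPDATE campaign SET cmp_name = ?, cmp_enabled = ? "
                "WHERE cmp_id = ?",
                (campaign.name, int(campaign.enabled), campaign.id),
            )

    def find(self, campaign_id: int) -> Optional[Campaign]:
        # Translate one row back into a domain object, or None if absent.
        row = self.conn.execute(
            "SELECT cmp_id, cmp_name, cmp_enabled FROM campaign "
            "WHERE cmp_id = ?",
            (campaign_id,),
        ).fetchone()
        if row is None:
            return None
        return Campaign(id=row[0], name=row[1], enabled=bool(row[2]))


conn = sqlite3.connect(":memory:")
mapper = CampaignMapper(conn)

campaign = Campaign(name="Spring fundraiser")
mapper.save(campaign)    # the mapper inserts the row
campaign.enable()        # pure domain logic, no database involved
mapper.save(campaign)    # the mapper updates the row
loaded = mapper.find(campaign.id)
```

Under active record, save() and find() would be methods of Campaign itself, which would tie the domain class to the table layout. With a mapper, the schema and the domain class can change independently, as long as the mapper is updated to translate between them.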

Service orientation and domain-driven design
Service orientation and domain-driven design seem compatible. Service orientation is about building a complex application from smaller, relatively independent units of functionality that expose APIs, often over a network. Depending on how functionality is divided up, it seems there could be one or more domain models, and one or more places to use a data mapper (or some other ORM facility).

Isolating database access
MediaWiki's database classes provide abstraction of low-level database calls. But another, higher level of abstraction and isolation of persistent storage-related code is often justified. Consider, for example, a method that mixes logic for API parameters with table and field names, an SQL join, and iteration through a complex database query result to build up an API result. Such a method is coupled to the details of the API call, data storage, and API result generation. Some form of ORM could be used to separate out code that depends on data persistence details; that would be a step towards greater separation of concerns.
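
The separation described above might look roughly like the following sketch. It is in Python with sqlite3 for illustration only and does not reproduce any existing MediaWiki method; the Banner and BannerMapper classes, the list_banners function, and the banner and campaign tables are hypothetical. Table names, the join, and row iteration sit in the mapper, while the API function only validates parameters and shapes the result.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Banner:
    """Hypothetical domain object; carries no table or column names."""
    name: str
    campaign: str


class BannerMapper:
    """Hypothetical mapper: owns table names, the join, and row iteration."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def find_by_campaign(self, campaign: str, limit: int) -> List[Banner]:
        rows = self.conn.execute(
            "SELECT b.bn_name, c.cmp_name "
            "FROM banner AS b JOIN campaign AS c ON b.bn_cmp_id = c.cmp_id "
            "WHERE c.cmp_name = ? ORDER BY b.bn_name LIMIT ?",
            (campaign, limit),
        )
        return [Banner(name=r[0], campaign=r[1]) for r in rows]


def list_banners(params: Dict[str, str], mapper: BannerMapper) -> Dict[str, Any]:
    """API layer: validates parameters and shapes the result, nothing else."""
    campaign = params["campaign"]
    # Clamp the requested limit to the range 1..50 (default 10).
    limit = max(1, min(int(params.get("limit", "10")), 50))
    banners = mapper.find_by_campaign(campaign, limit)
    return {"campaign": campaign, "banners": [b.name for b in banners]}


# Demo data in an in-memory database (the schema is made up for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE campaign (cmp_id INTEGER PRIMARY KEY, cmp_name TEXT);
    CREATE TABLE banner (
        bn_id INTEGER PRIMARY KEY, bn_name TEXT, bn_cmp_id INTEGER
    );
    INSERT INTO campaign VALUES (1, 'spring'), (2, 'autumn');
    INSERT INTO banner VALUES (1, 'B-top', 1), (2, 'A-side', 1), (3, 'C-foot', 2);
    """
)
result = list_banners({"campaign": "spring", "limit": "5"}, BannerMapper(conn))
```

With this split, a schema change touches only BannerMapper, and a change to the API result format touches only list_banners. The API function can also be tested against a stub mapper with no database at all.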