Editor campaigns/Persistence and domain layer notes

From mediawiki.org

Goals

  • Isolate all direct database access and code dependent on the database implementation.
  • Keep persistence logic separate from higher-level coordinating functions (logging, saving or restoring page revisions, checking user rights) and presentation logic (i18n messages, HTML construction, request details).
  • Expose an intuitive, opaque persistence PHP API to other layers of the system.
  • Keep open the possibility of switching out the DB implementation for a structured data store at a later date.
  • Optimize DB use:
  • Write tests as we go and aim for full code coverage.
  • Support the growth of the domain in a modular way for diverse use cases (edit-a-thons, courses, projects).

Adding general stuff that could go in core

  • Keep general additions compact.

Domain-driven design

In domain-driven design, architecture revolves around a model of what the software is “about” in the real world. That model goes in the domain layer. The main sources I've used for this are Patterns of Enterprise Application Architecture, by Martin Fowler, and Domain-Driven Design: Tackling Complexity in the Heart of Software, by Eric Evans. Here are some points from those books:

  • "With a Domain Model [...] we build a model of our domain which, at least on a first approximation, is organized primarily around the nouns in the domain." (Fowler, p. 26)
  • "The goal of domain-driven design is to create better software by focusing on a model of the domain rather than the technology." (Evans, p. 148)
  • "In a model-driven design, the software constructs of the domain layer mirror the model concepts. It is not practical to achieve that correspondence when the domain logic is mixed with other concerns of the program. Isolating the domain implementation is a prerequisite for domain-driven design." (Evans, p. 75)
  • "The domain objects, free of the responsibility of displaying themselves, storing themselves, managing application tasks, and so forth, can be focused on expressing the domain model. This allows a model to evolve to be rich enough and clear enough to capture essential business knowledge and put it to work." (Evans, pp. 70-71)
  • Domain-driven design makes a lot of sense when you expect a system to grow more complex (Evans, p. 78).
  • The problem with not following this methodology is that as a system grows, “more and more domain rules become embedded in query code or simply lost.” (Evans, p. 149) In such cases “[W]e are no longer thinking about concepts in our domain model. Our code will not be communicating about the business; it will be manipulating the technology of data retrieval.” (p. 150)

Justification

  • One of the problems with the last patch set (21) for the persistence layer was that some logic was hard to see. Separating domain logic from persistence code, as proposed here, should make that logic easier to follow.
  • While there's a lot of very nice code in MediaWiki, there's not enough mid-level structure and encapsulation. Editor Campaigns is almost completely greenfield (within its sphere of competence), so this is a good opportunity to do things differently. It's also a good opportunity to try out general facilities that might help us do so. Even if an approach like the one proposed here isn't adopted elsewhere in MediaWiki, work here can definitely feed into discussions of MediaWiki architecture.
  • While the domain is quite simple in our initial minimum viable product, it may well grow quickly and get hairy.

Domain layer outward-facing PHP interface

* Method to be implemented later on.

Domain layer design notes

  • Unlike in the previous version, the classes that directly implement outward-facing interfaces will also be isolated from direct database access. This will let them express domain-related logic more clearly.
  • For database access, we'll use the Data Mapper pattern, which Fowler recommends when there's a substantial domain layer.
  • IParticipationRepository and ICampaignRepository are repositories in Evans's terminology and, to some extent, data mappers in Fowler's. According to Evans, a repository "represents all objects of a certain type as a conceptual set (usually emulated). It acts like a collection, except with more elaborate querying capability. Objects of the appropriate type are added and removed, and the machinery behind the repository inserts them or deletes them from the database." (p. 151)
  • For actual database access, the repositories will use helper classes that encapsulate database access. The interface of these helper classes will be opaque but much more open-ended than the above interface. Unlike the above interface, it will not be for use outside the domain layer. (The helper classes will be the real data mappers, and will be the consolidated general persistence bit called for in code review.)
  • The purpose of the IPersistenceManager is to allow the interface consumer to control transaction scope. Consistency rules will only be enforced when flush() is called.
  • In the previous version of the Campaigns persistence classes, participations were treated as part of an aggregate with campaigns. In Evans's scheme, an aggregate is "a cluster of associated objects that we treat as a unit for the purpose of data changes" (p. 126). On that view, classes outside the aggregate should only access internal members via the aggregate root (which was a campaign in the previous version). However, sometimes we'll want to access participations without going through a campaign (for example, to find a user's participations). That's why the new design includes IParticipationRepository, and campaigns and participations are no longer an aggregate.
  • There are external libraries (like Doctrine) that could handle the data mapping function. However, it's not possible to use them with MediaWiki, since we should only access the database via the existing MW classes (like DatabaseBase).
  • This design involves pushing existing MW classes for database access out of domain logic and into a lower, infrastructure level.
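
To make the roles above concrete, here is a minimal, self-contained sketch of how a repository, a persistence manager and a data-mapper helper could fit together. Only the names ICampaignRepository, IPersistenceManager and flush() come from these notes. The method signatures, the queue() method and the in-memory mapper are assumptions made for illustration; real helper classes would wrap the existing MW database classes. For brevity the sketch checks only name uniqueness, not URL keys.

```php
<?php
// Sketch only: ICampaignRepository, IPersistenceManager and flush() are named
// in these notes; all other names and signatures are hypothetical.

interface IPersistenceManager {
	public function flush(): void;
}

interface ICampaignRepository {
	public function add( string $name, string $urlKey ): void;
	public function findByName( string $name ): ?array;
}

// Hypothetical stand-in for a data-mapper helper. It keeps rows in an array
// instead of talking to a database.
class InMemoryCampaignMapper {
	private $rows = [];

	public function insert( array $row ): void {
		$this->rows[$row['name']] = $row;
	}

	public function selectByName( string $name ): ?array {
		return $this->rows[$name] ?? null;
	}
}

class PersistenceManager implements IPersistenceManager {
	private $mapper;
	private $pending = [];

	public function __construct( InMemoryCampaignMapper $mapper ) {
		$this->mapper = $mapper;
	}

	// Called by repositories; nothing reaches the mapper until flush().
	public function queue( array $row ): void {
		$this->pending[] = $row;
	}

	public function flush(): void {
		// Take the batch and clear the queue, so a failed flush discards it.
		$pending = $this->pending;
		$this->pending = [];
		$seen = [];
		// Check consistency for the whole batch before writing anything.
		foreach ( $pending as $row ) {
			$name = $row['name'];
			if ( isset( $seen[$name] ) || $this->mapper->selectByName( $name ) !== null ) {
				throw new DomainException( "Duplicate campaign name: $name" );
			}
			$seen[$name] = true;
		}
		foreach ( $pending as $row ) {
			$this->mapper->insert( $row );
		}
	}
}

class CampaignRepository implements ICampaignRepository {
	private $pm;
	private $mapper;

	public function __construct( PersistenceManager $pm, InMemoryCampaignMapper $mapper ) {
		$this->pm = $pm;
		$this->mapper = $mapper;
	}

	public function add( string $name, string $urlKey ): void {
		$this->pm->queue( [ 'name' => $name, 'url_key' => $urlKey ] );
	}

	public function findByName( string $name ): ?array {
		return $this->mapper->selectByName( $name );
	}
}
```

In this sketch, add() only queues a change, findByName() sees a campaign only after flush(), and a flush() that would create a duplicate name throws without writing any part of the batch. That is one way to give the interface consumer control over transaction scope.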

Domain layer logic

Here is some of the logic that will be expressed by the domain layer in isolation from lower-level persistence code:

  • A campaign must have a name, a URL key and a time created.
  • A campaign's time created is immutable and is set automatically when it is created.
  • Campaign names should be valid wiki page names.
  • No two campaigns may have the same name or URL key.
  • Once a participation is created, its fields are immutable, except for the time left.
  • A participation's start time is immutable and is set automatically when it's created.
  • A participation is considered current if no time left has been set.
  • When a user changes status or re-joins a campaign after having left, a new participation is created.
  • For a given user and a given campaign, no two participations may overlap in time.
  • A corollary of the previous rule: for a given user and a given campaign, there may be only one current participation.
  • All the properties of campaigns and participations are also part of what is expressed by this layer.
  • In Evans's terminology, a campaign is an entity (since it has a persistent identity) and a participation is a value object (since all that matters about it are the values it contains).
  • To be determined: details of how campaigns are deleted, terminated, retired or something like that. In any case, whatever happens to a campaign in that regard will also happen to users' participations in it.
  • When a user is deleted, their participations are ended.
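
As an illustration of how some of the participation rules above could live in plain domain objects with no database code, here is a minimal sketch. All class and method names (Participation, ParticipationSet, leave(), overlaps(), findCurrent()) are hypothetical. Times are integer timestamps supplied by the caller so that the example is deterministic, whereas the rules above say the start time is set automatically. The check that a time left cannot precede the start time is also an assumption.

```php
<?php
// Sketch only: class and method names are hypothetical. Times are integer
// timestamps passed in by the caller to keep the example deterministic.

class Participation {
	private $userId;
	private $campaignName;
	private $timeJoined;
	private $timeLeft = null;

	public function __construct( int $userId, string $campaignName, int $timeJoined ) {
		$this->userId = $userId;
		$this->campaignName = $campaignName;
		$this->timeJoined = $timeJoined;
	}

	public function getTimeJoined(): int {
		return $this->timeJoined;
	}

	// A participation is current while no time left has been set.
	public function isCurrent(): bool {
		return $this->timeLeft === null;
	}

	// The time left is the only field that can change after creation.
	public function leave( int $time ): void {
		// Assumed sanity check, not stated in the notes.
		if ( $time < $this->timeJoined ) {
			throw new DomainException( 'Time left precedes start time' );
		}
		$this->timeLeft = $time;
	}

	public function isFor( int $userId, string $campaignName ): bool {
		return $this->userId === $userId && $this->campaignName === $campaignName;
	}

	// Treats each participation as the half-open interval [joined, left),
	// open-ended while the participation is current.
	public function overlaps( Participation $other ): bool {
		if ( !$other->isFor( $this->userId, $this->campaignName ) ) {
			return false;
		}
		$thisEnd = $this->timeLeft ?? PHP_INT_MAX;
		$otherEnd = $other->timeLeft ?? PHP_INT_MAX;
		return $this->timeJoined < $otherEnd && $other->timeJoined < $thisEnd;
	}
}

// Hypothetical in-memory collection that enforces the no-overlap rule.
class ParticipationSet {
	private $items = [];

	public function add( Participation $p ): void {
		foreach ( $this->items as $existing ) {
			if ( $existing->overlaps( $p ) ) {
				throw new DomainException( 'Participations may not overlap' );
			}
		}
		$this->items[] = $p;
	}

	public function findCurrent( int $userId, string $campaignName ): ?Participation {
		foreach ( $this->items as $p ) {
			if ( $p->isFor( $userId, $campaignName ) && $p->isCurrent() ) {
				return $p;
			}
		}
		return null;
	}
}
```

With this shape, a second current participation for the same user and campaign is rejected as an overlap, which yields the "only one current participation" corollary without a separate check. Re-joining after leave() adds a new Participation rather than reopening the old one.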

Also note three campaign property changes from the previous version:

  • use_only_event_logging was removed (since it seems there are no use cases).
  • campaign_wikipage_id was replaced with campaign_wikipage_title_text.
  • time_ended was (tentatively) added.