Requests for comment/Data mapper

From MediaWiki.org
Request for comment (RFC): Data mapper
Component: General
Creation date: 2014-07-07
Author(s): Andrew Green
Document status: in draft
See Phabricator: T383.

This RFC proposes adding a data mapper facility to core. The goal is to promote domain-driven design and help isolate database access. The proposal appears compatible with the recently approved move to a service-oriented architecture. An implementation with tests and examples is provided.

The implementation has been merged on a WIP branch of the Campaigns extension for the Editor campaigns project.

Problem statement

How to model the things a program is "about" and how to organize code for persistent storage are central problems of software architecture. A possible approach to the first problem is domain-driven design. This approach recommends creating effective domain models that aren't overly determined by the technology they're embedded in.[1] The data mapper pattern is a persistent storage approach that may facilitate domain-driven design.[2]

As the service-oriented architecture RFC notes, a lot of MediaWiki code has wide interfaces and is tightly coupled. Dividing this complex system into narrow, independent services and APIs will improve the situation, but that alone is not enough. We also need a pattern for persistent storage code and a route to expressive domain models.

Adding a data mapper facility to core will provide a standard way of isolating database access and should facilitate the development of domain models. Using the same patterns and toolkit within multiple service components, as appropriate, will make it easy for developers to move from one component to another, and will help avoid duplicate efforts to solve the same problems.

Rationale (more details)

Domain-driven design, MediaWiki and data mappers

In domain-driven design, you build software around a model of what the software is “about” in the real world. According to Eric Evans:

  • “The goal of domain-driven design is to create better software by focusing on a model of the domain rather than the technology.”[3]
  • “[...] the software constructs of the domain layer mirror the model concepts. It is not practical to achieve that correspondence when the domain logic is mixed with other concerns of the program. Isolating the domain implementation is a prerequisite for domain-driven design.”[4]
  • “The domain objects, free of the responsibility of displaying themselves, storing themselves, managing application tasks, and so forth, can be focused on expressing the domain model. This allows a model to evolve to be rich enough and clear enough to capture essential business knowledge and put it to work.”[5]
  • The problem with not following this methodology is that as a system grows, “more and more domain rules become embedded in query code or simply lost.”[6] In such cases “[W]e are no longer thinking about concepts in our domain model. Our code will not be communicating about the business; it will be manipulating the technology of data retrieval.”[7]

MediaWiki is a large system that must support the changing needs of a multifaceted, global social movement with hundreds of thousands of participants. A system like that seems likely to benefit from domain-driven design.

The data mapper pattern is a type of object-relational mapping (ORM). It supports domain-driven design by pushing object-relational mapping out of the domain model, into a lower, infrastructure layer. A dedicated facility, the data mapper, "handles all of the loading and storing between the database and the Domain Model and allows both to vary independently".[8] The data mapper pattern contrasts with the active record pattern (in which domain objects have methods for inserting and updating themselves in a database).
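
To make the contrast concrete, here is a minimal sketch in PHP. All class names in it (ActiveRecordPerson, PlainPerson, InMemoryPersonMapper) are hypothetical and are not part of the proposed implementation, and an array stands in for the database.

```php
<?php
// Hypothetical illustration only: arrays stand in for a database table.

// Active record style: the domain object knows how to store itself.
class ActiveRecordPerson {
	public $name;
	public static $table = [];

	public function save() {
		self::$table[$this->name] = [ 'person_name' => $this->name ];
	}
}

// Data mapper style: the domain object has no storage code at all.
class PlainPerson {
	private $name;

	public function __construct( $name ) {
		$this->name = $name;
	}

	public function getName() {
		return $this->name;
	}
}

// The mapper lives in an infrastructure layer and owns the column names.
class InMemoryPersonMapper {
	private $rows = [];

	public function save( PlainPerson $person ) {
		$this->rows[$person->getName()] = [ 'person_name' => $person->getName() ];
	}

	public function findByName( $name ) {
		if ( !isset( $this->rows[$name] ) ) {
			return null;
		}
		return new PlainPerson( $this->rows[$name]['person_name'] );
	}
}

$mapper = new InMemoryPersonMapper();
$mapper->save( new PlainPerson( 'Phil' ) );
$found = $mapper->findByName( 'Phil' );
```

In the second style, a change to the column name touches only the mapper, which is the kind of independent variation the pattern is meant to allow.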

Service orientation and domain-driven design

Service orientation and domain-driven design seem compatible. Service orientation is about building a complex application from smaller, relatively independent units of functionality that expose APIs, often over a network. Depending on how functionality is divided up, it seems there could be one or more domain models, and one or more places to use a data mapper (or some other ORM facility).

Isolating database access

MediaWiki's database classes abstract low-level database calls. But a further, higher level of abstraction and isolation for persistent storage code is often justified. Consider, for example, ApiQueryAllUsers::execute(). This method mixes logic for API parameters with table and field names, an SQL join, and iteration through a complex database query result to build an API result. It is coupled to the details of the API call, data storage, and API result generation. Some form of ORM could be used to separate out the code that depends on data storage details; that would be a step towards greater separation of concerns.
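
As a rough sketch of that kind of separation, the example below puts storage details behind an interface, so the API-facing code only deals with parameters and result shape. Every name here (IUserStore, ArrayUserStore, buildAllUsersResult) is made up for illustration; this is not how ApiQueryAllUsers is written, and an array stands in for the database.

```php
<?php
// Hypothetical names throughout; this is not MediaWiki's actual API code.

// The storage-facing interface: the only place that would know about
// tables, fields and joins.
interface IUserStore {
	/** @return string[] User names starting with $prefix, sorted. */
	public function getNamesByPrefix( $prefix, $limit );
}

// Stand-in implementation backed by an array instead of a database.
class ArrayUserStore implements IUserStore {
	private $names;

	public function __construct( array $names ) {
		$this->names = $names;
	}

	public function getNamesByPrefix( $prefix, $limit ) {
		$matches = array_filter( $this->names, function ( $name ) use ( $prefix ) {
			return strncmp( $name, $prefix, strlen( $prefix ) ) === 0;
		} );
		sort( $matches );
		return array_slice( $matches, 0, $limit );
	}
}

// The API-facing code handles parameters and result shape only.
function buildAllUsersResult( IUserStore $store, array $params ) {
	$prefix = isset( $params['prefix'] ) ? $params['prefix'] : '';
	$limit = isset( $params['limit'] ) ? (int)$params['limit'] : 10;
	$result = [];
	foreach ( $store->getNamesByPrefix( $prefix, $limit ) as $name ) {
		$result[] = [ 'name' => $name ];
	}
	return $result;
}

$store = new ArrayUserStore( [ 'Phil', 'Jill', 'Philippa' ] );
$result = buildAllUsersResult( $store, [ 'prefix' => 'Ph', 'limit' => 5 ] );
```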

Proposed implementation

The proposed implementation is a generic data mapping facility that is configured via a global variable and annotations in entity classes.

Setup

Let's say that you have this database table and unique index:

CREATE TABLE IF NOT EXISTS /*_*/person (
	person_id int unsigned NOT NULL PRIMARY KEY auto_increment,
	person_name varchar(255) NOT NULL,
	person_age int unsigned NOT NULL
) /*$wgDBTableOptions*/;

CREATE UNIQUE INDEX /*i*/person_name_idx ON
	/*_*/person (person_name);


Suppose you also have the following interface for objects that map to rows in that table:

interface IPerson {
	public function getId();
	public function getName();
	public function setName( $name );
	public function getAge();
	public function setAge( $age );
	public function makeNameAndAgeString();
}


Let's also say this is your implementation. (Here we've already added annotations on class variables as required by the data mapper.)

class Person implements IPerson {

	/**
	 * @var int
	 * @id
	 */
	private $id;

	/**
	 * @var string
	 * @unique
	 * @required
	 */
	private $name;

	/**
	 * @var int
	 * @required
	 */
	private $age;

	public function getId() {
		return $this->id;
	}

	public function getName() {
		return $this->name;
	}

	public function setName( $name ) {
		$this->name = $name;
	}

	public function getAge() {
		return $this->age;
	}

	public function setAge( $age ) {
		$this->age = $age;
	}

	public function makeNameAndAgeString() {
		return $this->name . ' (' . $this->age . ')';
	}
}
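
For readers wondering how annotations like these can be picked up at runtime, here is one possible approach using PHP's reflection API. It is only a sketch: readAnnotations() and AnnotatedPerson are hypothetical names, and the proposed data mapper may read annotations differently.

```php
<?php
// Sketch only: one way a mapper could read docblock annotations.
// AnnotatedPerson mirrors some of the annotations on Person; readAnnotations()
// is a hypothetical helper, not part of the proposed implementation.

class AnnotatedPerson {
	/**
	 * @var int
	 * @id
	 */
	private $id;

	/**
	 * @var string
	 * @unique
	 * @required
	 */
	private $name;
}

/**
 * Return a map of property name => list of tags found in the property's
 * docblock (for example 'var', 'id', 'unique', 'required').
 */
function readAnnotations( $className ) {
	$result = [];
	$class = new ReflectionClass( $className );
	foreach ( $class->getProperties() as $property ) {
		$doc = $property->getDocComment();
		$tags = [];
		if ( $doc !== false ) {
			// Match "@tag" at the start of a docblock line.
			preg_match_all( '/^\s*\*\s*@(\w+)/m', $doc, $matches );
			$tags = $matches[1];
		}
		$result[$property->getName()] = $tags;
	}
	return $result;
}

$annotations = readAnnotations( 'AnnotatedPerson' );
```

Note that this relies on doc comments being available at runtime, which is PHP's default behaviour.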


Once you have that, you just define an enum class (using TypesafeEnum) for your entity's fields, and set some values in a global variable:

class PersonField extends TypesafeEnum implements IField {
	public static $ID;
	public static $NAME;
	public static $AGE;
}

PersonField::setUp();

$wgDBPersistence['IPerson'] = array(
	'realization'   => 'Person',
	'table'         => 'person',
	'column_prefix' => 'person',
	'field_class'   => 'PersonField'
);
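
The call to setUp() suggests that the enum's static fields get filled in at runtime. The sketch below shows one way that could work, using reflection to assign one instance per static property. SketchEnum and SketchPersonField are hypothetical stand-ins; the real TypesafeEnum may be implemented differently.

```php
<?php
// Hypothetical sketch; the real TypesafeEnum may work differently.
abstract class SketchEnum {
	private $name;

	// Private, so instances can only be created by setUp() below.
	private function __construct( $name ) {
		$this->name = $name;
	}

	public function getName() {
		return $this->name;
	}

	// Fill every public static property of the calling class with an
	// instance named after that property.
	public static function setUp() {
		$class = new ReflectionClass( get_called_class() );
		foreach ( $class->getProperties( ReflectionProperty::IS_STATIC ) as $prop ) {
			if ( $prop->isPublic() ) {
				$prop->setValue( null, new static( $prop->getName() ) );
			}
		}
	}
}

class SketchPersonField extends SketchEnum {
	public static $ID;
	public static $NAME;
	public static $AGE;
}

SketchPersonField::setUp();
```

After setUp(), each field is a distinct object, so values such as SketchPersonField::$NAME can be passed around and compared by identity rather than as bare strings.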


Then you're good to go!

CRUD

Here are some fun things you can do:

// Get or instantiate the persistence manager
$persistence_mgr = new DBPersistenceManager( new DBMapper() );

// Create Phil
$phil = new Person();
$phil->setName( 'Phil' );
$phil->setAge( 25 );
$persistence_mgr->queueSave( $phil );
$persistence_mgr->flush();

// Phil now has an id (see the @id annotation in Person)
$id = $phil->getId();

// Retrieve Phil
$condition = new Condition( PersonField::$NAME, Operator::$EQUALS, 'Phil' );
$retrieved_phil = $persistence_mgr->getOne( 'IPerson', $condition );

// Update Phil
$phil->setAge( 26 );
$persistence_mgr->queueSave( $phil );
$persistence_mgr->flush();

// Tell Phil he's no longer welcome in your persistence store
$persistence_mgr->queueDelete( 'IPerson', $condition );
$persistence_mgr->flush();
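
The queue-then-flush style used above can be pictured with a small sketch. SketchQueueingStore is a hypothetical class, not the proposed DBPersistenceManager, and it writes to an array rather than a database; the point is only that queued operations have no effect until flush() runs.

```php
<?php
// Hypothetical sketch of the queue-then-flush idea; not the proposed
// DBPersistenceManager. An array stands in for the database.
class SketchQueueingStore {
	private $pending = [];
	private $rows = [];
	private $nextId = 1;

	// Record the intent to save; nothing is written yet.
	public function queueSave( array $row ) {
		$this->pending[] = $row;
	}

	// Write all queued rows in order, then empty the queue.
	// Returns the ids assigned to the rows written by this call.
	public function flush() {
		$ids = [];
		foreach ( $this->pending as $row ) {
			$id = $this->nextId++;
			$this->rows[$id] = $row;
			$ids[] = $id;
		}
		$this->pending = [];
		return $ids;
	}

	public function countStored() {
		return count( $this->rows );
	}
}

$store = new SketchQueueingStore();
$store->queueSave( [ 'name' => 'Phil', 'age' => 25 ] );
$store->queueSave( [ 'name' => 'Jill', 'age' => 27 ] );
$before = $store->countStored(); // queued rows are not stored yet
$ids = $store->flush();
$after = $store->countStored();
```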

Wait, there's more...

$jill = new Person();
$jill->setAge( 27 );
$persistence_mgr->queueSave( $jill );

// Throws a RequiredFieldNotSetException (see the @required annotation in Person
// and the NOT NULL in the database schema)
$persistence_mgr->flush();

// Try creating another Phil
$phil2 = new Person();
$phil2->setName( 'Phil' );
$phil2->setAge( 31 );
$persistence_mgr->queueSave( $phil2 );

// Throws a MWException due to the duplicate value (see the @unique annotation in
// Person and the unique index in the database schema)
$persistence_mgr->flush();

// Hmmm, let's try that again
$persistence_mgr->queueSave( $phil2, function ( $person, $index_name ) {
	print( 'Duplicate value on ' . $index_name . '.' );
} );

// Prints 'Duplicate value on NAME.'
$persistence_mgr->flush();

// Let's say we just want to make sure there's a 32-year-old Phil in our store. We
// don't know if there's currently a Phil there or not. If there's already a Phil,
// we want to set his age, and if there's no Phil, we want to insert him.
$phil_to_ensure = new Person();
$phil_to_ensure->setName( 'Phil' );
$phil_to_ensure->setAge( 32 );
$persistence_mgr->queueUpdateOrCreate( $phil_to_ensure, array( PersonField::$NAME ) );
$persistence_mgr->flush();

// Get an array of all the people in our repository, or as many as possible, ordered
// by name. Note that this method also accepts conditions and a continue key (which
// works like the MediaWiki web API's continue).
$people = $persistence_mgr->get( 'IPerson', PersonField::$NAME, Order::$ASCENDING );
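
For readers unfamiliar with continue keys, here is a rough sketch of the idea over a plain sorted array. getPageByName() is a hypothetical function and does not reflect the actual signature of get(); it also assumes names are unique, as the unique index on person_name guarantees.

```php
<?php
// Hypothetical sketch of continue-key paging over names sorted ascending.
// Returns [ page, continueKey ]; continueKey is null when no rows remain.
function getPageByName( array $names, $limit, $continueKey = null ) {
	sort( $names );
	if ( $continueKey !== null ) {
		// Keep only names at or after the continue key.
		$names = array_values( array_filter( $names, function ( $name ) use ( $continueKey ) {
			return strcmp( $name, $continueKey ) >= 0;
		} ) );
	}
	// Fetch one extra row: if present, it becomes the next continue key.
	$slice = array_slice( $names, 0, $limit + 1 );
	$next = count( $slice ) > $limit ? $slice[$limit] : null;
	return [ array_slice( $slice, 0, $limit ), $next ];
}

$all = [ 'Phil', 'Jill', 'Ann', 'Bob', 'Zoe' ];
list( $page1, $key1 ) = getPageByName( $all, 2 );
list( $page2, $key2 ) = getPageByName( $all, 2, $key1 );
list( $page3, $key3 ) = getPageByName( $all, 2, $key2 );
```

The caller keeps passing back the returned key until it comes back as null, at which point every row has been seen exactly once.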

Tests and example