Talk:Architecture guidelines

Removal notes

 * Removed the two sections about encouraging the use of commercial IDEs. To the extent that it is relevant to an architecture discussion, I think I am happy to veto it.
 * Removed the section on naming, since it is dealt with in the style guide, and also the duplicate comment about MWCryptRand naming.
 * Removed near-empty redundant section "Role of the architects in the code review process"

-- Tim Starling (talk) 08:08, 26 May 2013 (UTC)


 * I removed the sections "Templates for common requirements" & "Declaring unit tests in an extension", after thinking over and agreeing with Daniel's claim that they'd be more appropriate for a cookbook. --Ori.livneh (talk) 01:49, 2 June 2013 (UTC)

More complete removed sections have been copied to this talk page. -- 110.146.177.172 05:53, 5 August 2013 (UTC)

What is MediaWiki?
Before embarking on architecture guidelines and considering a dramatic change to the RFC process, I think it would be helpful to define what MediaWiki is. Has that be done anywhere? If so, where? If not, wouldn't that be a sensible starting point? --MZMcBride (talk) 22:21, 26 May 2013 (UTC)


 * I didn't realise that was subject to debate. -- Tim Starling (talk) 01:11, 31 May 2013 (UTC)


 * A statement of purpose or list of principles or a set of pillars would be good, I think. (I started some notes at principles, but I'm not sure I feel comfortable writing something of this nature.)
 * "MediaWiki is open source wiki software." This seems like a good principle, though... should it be open source? Is MediaWiki a CMS? "MediaWiki is written in PHP." Is this a principle? I think defining (or attempting to define) what MediaWiki is will help in shaping architecture guidelines, at a high level. For example, in the context of considering MediaWiki 2.0, what is negotiable and what is non-negotiable? --MZMcBride (talk) 06:00, 1 June 2013 (UTC)
 * It sounds like you want a definition of what MediaWiki will be, rather than what MediaWiki is. You're looking for properties of MediaWiki that will be conserved through change, which is really more about limiting future changes than describing the current situation. I think we're already doing that here, in terms of architecture. In terms of choice of features, that's not a subject for this document. -- Tim Starling (talk) 00:21, 3 June 2013 (UTC)
 * I was curious if there was any other document describing (what you call) how MediaWiki will be. I call it how MediaWiki is and what we strive for it to be (principles). I agree that it's outside the scope of this document, but it still seems to be missing and I think it's important to have.
 * It isn't about limiting future changes as much as it's about figuring out what the priorities of the software are. And yes, I think this will impact (not limit, but impact) guidelines like these; that's why I think it's important that defining what MediaWiki is and will be should be done first, if it hasn't been done already. (I searched around this wiki, but didn't find anything. There's likely something on Meta-Wiki, maybe.) --MZMcBride (talk) 00:25, 4 June 2013 (UTC)

Where are we going?
One thing I'd like to see come out of this would be a 10000-foot overview document to just show what we'd want the class structure in core to look like. E.g. "We have Title, and X, Y, and Z service classes. And User and X, Y, and Z service classes. And then we have MovePageControllerThing, etc.". Anomie (talk) 17:23, 29 May 2013 (UTC)
 * Thanks for the suggestion! Perhaps I'll set up a "pipe dream" page and collect some ideas of how mediawiki could work if we could re-design it from scratch now. Not as a proposal for a rewrite or even refactoring, just as food for thought. -- Duesentrieb ⇌ 13:21, 3 June 2013 (UTC)
 * Maybe we can pick one of our class and propose various way to refactor it. Maybe Title which everyone should be familiar with. Antoine &#34;hashar&#34; Musso (talk) 19:50, 5 June 2013 (UTC)

Known problems

 * Section moved from article


 * Testability is limited due to lack of separation of concerns
 * High levels of abstraction / complexity / non-conformity are leading to bad security (chunked uploads).
 * This seems to be more an example of a bad abstraction / missing separation of concerns -- Gabriel
 * I don't see what the chunked upload security issue had to do with abstraction or architecture or readability. -- Tim Starling (talk) 03:23, 4 June 2013 (UTC)
 * Lack of high-level documentation (how classes fit together)
 * Doxygen let us write documentation that is bound to the source code it documents. We can even build page out of a README file, see File backend doc. Antoine &#34;hashar&#34; Musso (talk)
 * Readability of the overall codebase (where do I find this object that is being used?)
 * Might standardize how we name class: compare AjaxResponse, DatabaseMysql, ParserOptions versus RedisBagOStuff, DuplicateJob, DeleteAction.  PHPUnit uses class names where underscore triggers a new subdirectory: PHPUnit_TextUI_Command is found in /PHPUnit/TextUI/Command.php . That is recommended by PSR-2. Antoine &#34;hashar&#34; Musso (talk)
 * Lot of inter-class dependencies, lack of separations and "do it all" class (ex: Title has user permissions handling).

Goals

 * Section moved from article


 * Self-documenting, easy to follow code
 * More maintainable code
 * Separable and generic elements (e.g., reuse just the Http class, or just the Parser, or just the DB classes, without pulling in the entire framework)
 * Easier, stabler third party development in the future
 * True unit tests (via, for example, making it easier to mock interfaces)
 * Warning: we will have to break some old interfaces and kill some deprecated things. This will make future versions have more stable interfaces for implementations.
 * Can we standardize the deprecation process too? i.e, when we branch a new MW version, delete anything tagged as deprecated in $version-2 or something like that? Anomie (talk) 14:31, 25 May 2013 (UTC)
 * Didn't that already get decided on the ML a few months ago? or was that only deciding on hwo they should be labeled?. Peachey88 (talk) 07:43, 26 May 2013 (UTC)
 * Make it easier to parallelize and distribute processing; potentially implement parts in other languages
 * Use the same code paths (controllers) from both UI and API code

Requests for comment (RFC)

 * Section moved from article

The requirements for writing an RFC should be made more stringent. An RFC should be written when you have a concrete implementation in mind and when you explicitly intend to realize it fully and deploy it if you get the go-ahead. If not a working prototype, you should at least have some pseudo-code to illustrate your intent.

The current set of RFCs on MediaWiki.org contain a number of RFCs that are: --Ori
 * 1) written by someone who is hoping to motivate other people to actually implement something but who has no intent of implementing it herself.
 * 2) are thin on concrete implementation details, preferring instead to express a vague "wouldn't it be nice if..." aspiration.
 * I think you mean "accepting an RFC" instead of "writing an RFC." It also seems fairly misguided to suggest that a requests for comment process only be used in cases where there's a clear execution path. --MZ

Our RFC workflow is mostly about dumping some text under RFC, we should probably look at the IETF workflow. Some randoms ideas:
 * designate a committee of persons responsible for the RFC process
 * write down: a glossary of terms, process and an overall workflow. Random ideas:
 * We definitely need deadlines for each step (ex: a draft with no activity after 3 months will be dismissed).
 * Acceptance criteria to enter the workflow
 * Each RFC should really have an owner in charge of driving it from idea to implementation/rejection.
 * review the current backlog and clear out the old scruff.
 * Ideally, each RFC should receives a number such as the IETF RFC or the PEP (Python Enhancement Proposal).

Clear separation of concerns

 * Section moved from article

This proposal suggests that most objects in MediaWiki should handle one (and only one) of the following concerns:


 * Values : Plain old PHP objects (POPO) - data is stored and validated internally, but no outside calls (especially network / DB) are done from this object. The object could be mutable or not. The object could contain other POPO objects, but should not reference services/controllers. Should easily serialize to JSON. Should not know or use services.
 * Examples SHOULD be Title, User, etc.


 * Services : Narrow interfaces, could in principle be web services with JSON (see value objects); long life time (application, request). Usually only one instance (but don't use singne instance (but don't use singletons). Stateless.
 * Examples SHOULD be: BlobStore, LinkStore, RCNotifier, JobQueue, etc.


 * Controllers : Models busines logic (typically a "process" or "action", like "rename page" or "block user"); usually local, short lifetime. Stateful. Uses services.
 * examples SHOULD be: Edit (with permission checks, conflict resolution, etc), MovePage, BlockUser, etc.

...with a catch-all for "'misc."' that doesn't fit into the first three.
 * Examples: exceptions, interators, callbacks, and other glue code.

Even with the current (3:07pm) definitions, these terms are somewhat opaque to me. -Ori
 * Me too. And what does this mean for users of these classes: if some function needs a Title object, how does it get the TitleService (or whatever it's called) to be able to create one? Anomie (talk) 14:36, 25 May 2013 (UTC)
 * If Title is a simple Value object, you just create it with new Title. If you need a service to do something to the title, e.g. a PageStore with a movePage operation, your class (which would probably be a controller) would require a PageStore in the constructor and store it as a member. -- Duesentrieb ⇌ 13:27, 26 May 2013 (UTC)
 * Isn't that exactly the opposite of the proposal here, where "Value" objects should not be accessing the database themselves? Anomie (talk) 13:59, 28 May 2013 (UTC)
 * You don't need a factory to create a Title, but yes, you'll need a service to load a Title. But that would not be a TitleService or TitleFactory, it would rather be a PageStore, corresponding to the page table and implementing more or less the functionality of WikiPage, just as a stateless service instead of an active record. -- Duesentrieb ⇌ 18:27, 30 May 2013 (UTC)

Keep aspects/concerns separate. Don't mix these in the same class:
 * Database / storage code
 * UI Code
 * Business Logic
 * HTTP handling
 * ...more?...

An example, though not perfect, is the separation between File objects, Store objects, and MediaHandler classes. This lets us use the same handlers regardless of storage backend, etc.

Where do process caches go? The File/FileRepo split was basically designed around the caches, i.e. File is basically a container for cached information about the file. It's hard to see how a "value object" can have references to rich and useful in-process caching.


 * How does this interact with the bit above about "values" and "services"? Do we end up with many services for each value, one for Database, one for UI, one for Business logic, and so on? Or is a "service" some sort of GenericValueFactory that handles getting all different types of Value objects? Anomie (talk) 14:36, 25 May 2013 (UTC)
 * You really don't need factories for value objects, because there is generally only one implementation. You may need factories for controllers or services when constructing them in a static context. But that should generally only happen in the set up phase.
 * Yes, there would be separate services for rendering, storing, etc, for different things. E.g. we could have a PageContentRenderer interface, and a PageTsore interface, as well as a RevisionStore and LinkStore interface. They may all be implemented by the same class though (e.g. SqlWikiStuffStore). Business logic would usually be covered by a controller, which can be instantiated locally with new ContentEditController or some such. Controllers can also be injected, directly or via a factory, but I don't think that this will often be necessary. -- Duesentrieb ⇌ 13:42, 26 May 2013 (UTC)
 * See above, isn't the point of "Value" objects that they don't access the database? So how do you get the Title object for a particular page without some "factory" that fetches the necessary information from the "storage" code? And from your second paragraph, I guess the answer to my other question is "Yes, there would be several service objects for each kind of Value object." Which seems like potentially a lot of classes. Anomie (talk) 13:59, 28 May 2013 (UTC)
 * Yes, as I said above, you need a service if you want to load a title from somewhere. You don't need a factory to create a title.
 * And yes, there would be different service interfaces for doing different things - having more granularity is the point, it makes code reuse and testing easier. However, also note that just because there may be a PageStore and a RevisionStore interface, that does not mean they can't be implemented by the same class, if that is convenient.
 * Also note that services are not bound to specific kinds of values. A serialization service might implement serialization for several kinds of values (that's quite OK if the values are strongly related, I think).
 * -- Duesentrieb ⇌ 18:27, 30 May 2013 (UTC)

What is the rationale for introducing value objects? This sounds very much like the Content/ContentHandler split, which I criticised during code review. I didn't manage to get a satisfactory answer out of Daniel at the time. To me, encapsulation, i.e. the close association of code with the data it uses, is fundamental to OOP design. It's certainly possible to separate code from data, and that is routinely done in non-OOP languages, but I hadn't heard it discussed as a defensible design principle in OOP. -- Tim Starling (talk) 23:52, 30 May 2013 (UTC)


 * "Value Objects" come from Domain-Driven Design (DDD makes another distinction between "Entities" and "Values", but that's not very relevant here; It does not differentiate between Services and Controllers, a distinction I find useful, but not critical). Keeping Services and Values/Entities separate ("newableas" vs. "injectable") seems to be a best practice for writing testable code
 * There should be two kinds of objects in your application. (1) Value-objects, these tend to have lots of getters / setters and are very easy to construct are never mocked, and probably don't need an interface. (Example: LinkedList, Map, User, EmailAddress, Email, CreditCard, etc...). (2) Service-objects which do the interesting work, their constructors ask for lots of other objects for colaboration, are good candidates for mocking, tend to have an interface and tend to have multiple implementations (Example: MailServer, CreditCardProcessor, UserAthenticator, AddressValidator). A value-object should never take a service object in its constructor (since than it is not easy to construct). 
 * To me, keeping service logic separate from value objects solves three problems:
 * A value object can be constructed on the fly. An "active record" otoh either global state internally, or access to services when constructing it. This means that it's hard to use them as input for tests cases and requires big and complicated fixtures for testing. The ORMRow class recently introduced into core had this problem, we are trying to fix this now.
 * A value object can easily be serialized, converted and passed around. An object bound to services can't. This is a serious problem with the Content interface - it doesn't allow the Content to be a simple value object. For Wikibase, this forced us to have a hierarchy of value objects (Entity, Item, Property) separate from the corresponding Content objects (EntityContent, ItemContent, PropertyContent). The value objects can be passed to and used on the client wiki, while the Content objects can't. This is quite anyoing.
 * A value object does not need to guess which operations are useful. Binding logic to values leads to a random collection of functionality on the record. Title is a good (bad) example for this: it has functionality for moving a page, while the functionality for deleting a page is in WikiPage. Also, it leads to a tangle of dependencies: If an objects knows how to store itself and render itself, it depends on the storage layer code and the output layer code. It seems that splitting the responsibility of objects along the axis of what is being done and how (and what is needed for that) is better than splitting along the axis of what is being acted on. It makes more sense to me for a booking service to be able to process credit cards than for a credit card to know how to process charges.
 * More generally, it seems to me that designing for testability, extensibility and maintainability means departing from some traditional OO principles. Most importantly:
 * Code sharing via inheritance is bad. Use composition. That makes it easier to replace individual components, to mix and match.
 * Binding logic to data (active record pattern) is bad. In traditional OO, the credit card would know how to process a charge, and hair would know how to comb itself. This is inflexible, since the behavior can't be replaced at runtime.
 * I'm still struggling with the old habits myself, though, especially with the inheritance bit. -- Duesentrieb ⇌ 09:53, 31 May 2013 (UTC)


 * When I refactored page moves, I asked Brion where the new UI-independent code should go. He said that Article deals with page contents (i.e. wikitext), and Title does not, so therefore the move page code should go in Title. The Article/Title split seemed arbitrary to me at the time, since an Article could be constructed simply from a Title, making them equivalent in some sense.


 * If it was me designing it from scratch (even the younger, stupider me from 2003), I would have put the split between backend and frontend, which would have led to a Title-like class which included the non-UI functions of Article, so that the choice of where to put storage logic would have been less arbitrary. Yes, it still would have been hair that knows how to comb itself, but you can progressively refactor it to delegate such tasks to service objects. Compare:


 * I'm concerned not just by the extra code length, but also by the potential for complexity in calling code to be daunting for new developers. Both examples above have service objects, both can easily support batch queries. Testability of the service object would be almost identical.


 * Perhaps, a more fundamental problem with the testability of Title is its dependence on localisation. If you provide both Title::newFromText and getNamespace, then you need localisation; or if you provide both Title::makeTitle and getNsText then you need localisation. If Title was a value object, as you describe it, it would be limited to the numeric namespace. Then, if there was a storage backend that needed text namespaces, it would need a different Title class. And there would be no way for a text-only Title class to validate its own contents.


 * I think the role of the "plain old Java object" in existing MediaWiki code is somewhat fulfilled by associative arrays, for example the parameter arrays passed to some constructors, and the data arrays stored in memcached. Plain strings and integers fill that role in other contexts, such as the message keys passed to Message/MessageCache/Language/LocalisationCache, or the URLs passed to ExternalStore::fetchFromURL or FileRepo::streamFile. -- Tim Starling (talk) 02:55, 3 June 2013 (UTC)


 * The issue I see with "active records" like Title is that in order to perform operations like localization, permission checks, or database access, it needs to know how these things are done. There are four options:
 * hardcode the bahavior (resp. hardcode the dependency on a specific service implementation). That's bad for extensibility, and may lead to an aggregation of lots of unrelated functionality in the same class. If we later decide that the dependency must be decided at run time, the only options we have are the ones described below, which are bad.
 * rely on a global registry of some sort for acquiring services. That's bad for testability.
 * require all dependencies to be injected in the constructor. That makes the on-the-fly construction of such objects cumbersome and requires code that needs a Title for any reason to know about all the services a Title may need for any operation. And if Title's dependencies change, all places that create a Title object would need to be updated.
 * inject a Context object instead of individual services. That's bad for re-usability but especially for unit testing: in order to test a Title object, I need to provide a mock context. How do I construct it? What does it need? How do I find out? What if the requirements change? How will the test code notice? The Context object couples the Title object to the rest of the system, making it depend on essentially everything. Here's some discussion of the context class (aka kitchen sink) issue: Breaking the Law of Demeter is Like Looking for a Needle in the Haystack. That approach is of course still better than global state, and can be acceptable in some cases - though not for value objects (because then, they wouldn't be value objects).
 * So: just factoring functionality out of the Title class itself into services doesn't help if the operations are available via the Title class, because this requires Title to somehow get the respective service instances. It would probably a good start and a viable way of slowly refactoring, yes, but it's not sufficient - I think, by itself, it wouldn't even help much with the current issues.


 * In my mind, Title could work like this:
 * Title itself is a value object, with a numeric namespace (and I think we'd also need a Link value object).
 * TitleParser is a service that turns a string into a Title, and knows about namespaces (LinkParser would know about interwiki prefixes, etc).
 * Page would be a value object that knows a pages Title and it's ID (and essentially everythign in the page table).
 * PageStore would be a service that can load Page objects by ID or Title, and could also perform other operations on pages, like update, move or delete. PageStore may use a RevisionStore store for some of these.
 * It may be useful to have a "raw" PageStore and a "high level" PageStore, where the latter would use a LogStore and ChangesStore for logging. Access control could also be performed on this level, using a PermissionManager service.
 * RevisionStore would load Revision objects, and allow for deleting/hiding reivsions, and loading revision content.
 * It would be nice if Revision could have a getContent method; But that would mean the content would always have to be available when a Revision is constructed (or Revision would need knowledge of the RevisionStore). Lazy loading is indeed somethign that does not work with value objects, and makes them a bit less convenient. OTOH, putting the loadContent method into the RevisionStore service allows for batching.
 * PermissionManager would be used for permission checkes on Title (and perhaps Page) objects. Currently, much of the permission system is hardwired in Title.


 * That's 6 classes for the functionality of what we currently have in 3: Title, WikiPage, and Revision. I have omitted some functionality, so let's say it turns out to be 12. Sounds like a lot (four times the classes!) but considering that these 3 classes together have about 10k lines, splitting them up to less than 1000 lines per class seems sensible (i have heard people use 100 (executable) lines per class as a rule of thumb, but I think that's overly strict). -- Duesentrieb ⇌ 13:01, 3 June 2013 (UTC)
 * One issue that you've glossed over in the above is that you just pushed the "how is the i18n service object obtained?" problem into still another service object and then ignored it. How does anything that needs to parse a string or obtain the string version of a title with namespace obtain this title service object, now? And the i18n service object that the title service object needs, for that matter? Anomie (talk) 14:52, 3 June 2013 (UTC)
 * Code that needs to turn a string into a Title needs a TitleParser. It doesn't need an i18n object, since TitleParser already has that, and the calling code shouldn't even know that it is needed. This means that the conversion between string and Title can not happen in a value object. But then, why should it?
 * Generally: Service objects are free to know about many service objects. They get them in their constructor. In the textbook case, the network of service objects is build in the initialization phase of the program by code that either reads or is the configuration. Construction of individual services may be delegated to Builder classes, but that's just so you don't have the entire setup code in one file.
 * This however means loading and instantiation all services always, for every request. Ugh. If we want to avoid that, we could to inject builders instead of proper services (or inject magic stub services). This can be useful and even necessary, but as a general principle, I don't much like this approach.
 * I think it would be OK to live with a limited number of "static entry points" that would indeed have access to a global service registry / factory (which is indeed somewhat like the context object) and that build the part of the service graph that is needed to handle that entry point. In the case of MedaWiki, these would be action classes, hook functions, special pages, and API modules. As things are, we don't have control over their instantiation anyway (or they are static), so the only way to inject anything is via a global registry.
 * This does violate the "no static stuff" and "inject everything" principles; the idea is to strike a balance between the architectural need to isolate components and the practical need for performance (and thus partial initialization) and backwards compatibility (static hook functions aren't going to go away soon).
 * I think strictly confining the access to the service registry to static entry points is a good compromise. And we should further try to reduce the number of such entry points, as far as this is sensible, and the split isn't needed for performance reasons. -- Duesentrieb ⇌ 15:37, 3 June 2013 (UTC)