Architecture guidelines

Known problems

 * Testability is limited due to lack of separation of concerns
 * High levels of abstraction / complexity / non-conformity are leading to bad security (chunked uploads).
 * This seems to be more an example of a bad abstraction / missing separation of concerns -- Gabriel
 * I don't see what the chunked upload security issue had to do with abstraction or architecture or readability. -- Tim Starling (talk) 03:23, 4 June 2013 (UTC)
 * Lack of high-level documentation (how classes fit together)
 * Doxygen let us write documentation that is bound to the source code it documents. We can even build page out of a README file, see File backend doc. Antoine &#34;hashar&#34; Musso (talk)
 * Readability of the overall codebase (where do I find this object that is being used?)
 * Might standardize how we name class: compare AjaxResponse, DatabaseMysql, ParserOptions versus RedisBagOStuff, DuplicateJob, DeleteAction.  PHPUnit uses class names where underscore triggers a new subdirectory: PHPUnit_TextUI_Command is found in /PHPUnit/TextUI/Command.php . That is recommended by PSR-2. Antoine &#34;hashar&#34; Musso (talk)
 * Lot of inter-class dependencies, lack of separations and "do it all" class (ex: Title has user permissions handling).

Goals

 * Self-documenting, easy to follow code
 * More maintainable code
 * Separable and generic elements (e.g., reuse just the Http class, or just the Parser, or just the DB classes, without pulling in the entire framework)
 * Easier, stabler third party development in the future
 * True unit tests (via, for example, making it easier to mock interfaces)
 * Warning: we will have to break some old interfaces and kill some deprecated things. This will make future versions have more stable interfaces for implementations.
 * Can we standardize the deprecation process too? i.e, when we branch a new MW version, delete anything tagged as deprecated in $version-2 or something like that? Anomie (talk) 14:31, 25 May 2013 (UTC)
 * Didn't that already get decided on the ML a few months ago? or was that only deciding on hwo they should be labeled?. Peachey88 (talk) 07:43, 26 May 2013 (UTC)
 * Make it easier to parallelize and distribute processing; potentially implement parts in other languages
 * Use the same code paths (controllers) from both UI and API code

Incremental change
The MediaWiki core should be changed incrementally. Folk wisdom has it that complete rewrites of large sections of code are more difficult than they appear, and are often abandoned. This is in line with how we have done most refactoring so far. The only branches we have been using were for the database schema refactoring in 1.5 and the content handler in 2012.

Introduction of new language features
Features introduced to the PHP language which impact on architecture (see examples below) should be discussed, and consensus reached, before they are widely implemented.

Rationale: acceptance into the PHP core is not a guarantee that a feature is suitable for use in a large extant codebase with many developers. In the past, features have been introduced with caveats or pitfalls that are relevant to us (for example Late Static Binding which can be worked around by using a better design). Experienced MediaWiki developers are in a good position to critically review new PHP features.

Examples of existing PHP features which impact on architecture:


 * jQuery-style method chaining (enabled by PHP 5)
 * Type hinting (5.0)
 * __get magic method
 * Late static binding (5.3)
 * Closures (5.3)
 * Namespaces (5.3)
 * Traits (5.4)
 * Generators (5.5)

Interface changes
An interface is the means by which modules communicate with each other. For example, two modules written in PHP may have an interface consisting of a set of classes, their public method names, the definitions of the parameters, and the definitions of the return values.

For interfaces which are known to be used by extensions, changes to those interfaces should retain backwards compatibility if it is feasible to do so. The rationale is:


 * To reduce the maintenance burden on extensions. Many extensions are unmaintained, so a break in backwards compatibility can cause the useful lifetime of the extension to end.
 * Many extensions are developed privately, outside of WMF Gerrit, so rectification of all extensions in the mediawiki/extensions/* tree does not necessarily address the maintenance burden of a change.
 * Some extension authors have a policy of allowing a single codebase to work with multiple versions of MediaWiki. Such policies may become more common now that we have Long Term Support (LTS) releases.
 * MediaWiki's extension distribution framework contains no core version compatibility metadata. Thus, the normal result of a breaking change to a core interface can lead to a PHP fatal error, which is not especially user-friendly.
 * WMF's deployment system has only rudimentary support for a simultaneous code update in core and extensions.

Requests for comment (RFC)
The requirements for writing an RFC should be made more stringent. An RFC should be written when you have a concrete implementation in mind and when you explicitly intend to realize it fully and deploy it if you get the go-ahead. If not a working prototype, you should at least have some pseudo-code to illustrate your intent.

The current set of RFCs on MediaWiki.org contain a number of RFCs that are: --Ori
 * 1) written by someone who is hoping to motivate other people to actually implement something but who has no intent of implementing it herself.
 * 2) are thin on concrete implementation details, preferring instead to express a vague "wouldn't it be nice if..." aspiration.
 * I think you mean "accepting an RFC" instead of "writing an RFC." It also seems fairly misguided to suggest that a requests for comment process only be used in cases where there's a clear execution path. --MZ

Our RFC workflow is mostly about dumping some text under RFC, we should probably look at the IETF workflow. Some randoms ideas:
 * designate a committee of persons responsible for the RFC process
 * write down: a glossary of terms, process and an overall workflow. Random ideas:
 * We definitely need deadlines for each step (ex: a draft with no activity after 3 months will be dismissed).
 * Acceptance criteria to enter the workflow
 * Each RFC should really have an owner in charge of driving it from idea to implementation/rejection.
 * review the current backlog and clear out the old scruff.
 * Ideally, each RFC should receives a number such as the IETF RFC or the PEP (Python Enhancement Proposal).

Design principles
The MediaWiki code base has evolved over many years. While it currently serves us reasonably well, there were many design decisions that have outlived their useful lifetime. We intend to evolve the codebase toward more modern practices over time, using this document as a focal point for choosing which design patterns and best practices we should adopt in a long term document.

Candidate design patterns for consideration
These are design patterns that have been discussed in the community that need further consideration among MediaWiki architects and experienced developers.

One way to do things
(Borrowed from the Zen of Python / http://www.python.org/dev/peps/pep-0020/ ).

Rationale: code review back log is an issue; things would go quicker if we had canonical patterns for implementating certain things. If we agree on this, we probably need to eliminate some duplication.

Practical consequences: it should be a requirement for merging bold redesigns of existing interfaces that the previous interface be removed or a clear deprecation plan be articulated.


 * I agree we should write MediaWiki 2.0 using python go .NET.Haskell BrainFuck.WhiteSpace. Machine code nodejs
 * Let's bikeshed about whitespace! yay, WhiteSpace!
 * Switch to "more common" schema:  if (blah(blah) && blah) {

Clear separation of concerns
This proposal suggests that most objects in MediaWiki should handle one (and only one) of the following concerns:


 * Values : Plain old PHP objects (POPO) - data is stored and validated internally, but no outside calls (especially network / DB) are done from this object. The object could be mutable or not. The object could contain other POPO objects, but should not reference services/controllers. Should easily serialize to JSON. Should not know or use services.
 * Examples SHOULD be Title, User, etc.


 * Services : Narrow interfaces, could in principle be web services with JSON (see value objects); long life time (application, request). Usually only one instance (but don't use singne instance (but don't use singletons). Stateless.
 * Examples SHOULD be: BlobStore, LinkStore, RCNotifier, JobQueue, etc.


 * Controllers : Models busines logic (typically a "process" or "action", like "rename page" or "block user"); usually local, short lifetime. Stateful. Uses services.
 * examples SHOULD be: Edit (with permission checks, conflict resolution, etc), MovePage, BlockUser, etc.

...with a catch-all for "'misc."' that doesn't fit into the first three.
 * Examples: exceptions, interators, callbacks, and other glue code.

Even with the current (3:07pm) definitions, these terms are somewhat opaque to me. -Ori
 * Me too. And what does this mean for users of these classes: if some function needs a Title object, how does it get the TitleService (or whatever it's called) to be able to create one? Anomie (talk) 14:36, 25 May 2013 (UTC)
 * If Title is a simple Value object, you just create it with new Title. If you need a service to do something to the title, e.g. a PageStore with a movePage operation, your class (which would probably be a controller) would require a PageStore in the constructor and store it as a member. -- Duesentrieb ⇌ 13:27, 26 May 2013 (UTC)
 * Isn't that exactly the opposite of the proposal here, where "Value" objects should not be accessing the database themselves? Anomie (talk) 13:59, 28 May 2013 (UTC)
 * You don't need a factory to create a Title, but yes, you'll need a service to load a Title. But that would not be a TitleService or TitleFactory, it would rather be a PageStore, corresponding to the page table and implementing more or less the functionality of WikiPage, just as a stateless service instead of an active record. -- Duesentrieb ⇌ 18:27, 30 May 2013 (UTC)

Keep aspects/concerns separate. Don't mix these in the same class:
 * Database / storage code
 * UI Code
 * Business Logic
 * HTTP handling
 * ...more?...

An example, though not perfect, is the separation between File objects, Store objects, and MediaHandler classes. This lets us use the same handlers regardless of storage backend, etc.

Where do process caches go? The File/FileRepo split was basically designed around the caches, i.e. File is basically a container for cached information about the file. It's hard to see how a "value object" can have references to rich and useful in-process caching.


 * How does this interact with the bit above about "values" and "services"? Do we end up with many services for each value, one for Database, one for UI, one for Business logic, and so on? Or is a "service" some sort of GenericValueFactory that handles getting all different types of Value objects? Anomie (talk) 14:36, 25 May 2013 (UTC)
 * You really don't need factories for value objects, because there is generally only one implementation. You may need factories for controllers or services when constructing them in a static context. But that should generally only happen in the set up phase.
 * Yes, there would be separate services for rendering, storing, etc, for different things. E.g. we could have a PageContentRenderer interface, and a PageTsore interface, as well as a RevisionStore and LinkStore interface. They may all be implemented by the same class though (e.g. SqlWikiStuffStore). Business logic would usually be covered by a controller, which can be instantiated locally with new ContentEditController or some such. Controllers can also be injected, directly or via a factory, but I don't think that this will often be necessary. -- Duesentrieb ⇌ 13:42, 26 May 2013 (UTC)
 * See above, isn't the point of "Value" objects that they don't access the database? So how do you get the Title object for a particular page without some "factory" that fetches the necessary information from the "storage" code? And from your second paragraph, I guess the answer to my other question is "Yes, there would be several service objects for each kind of Value object." Which seems like potentially a lot of classes. Anomie (talk) 13:59, 28 May 2013 (UTC)
 * Yes, as I said above, you need a service if you want to load a title from somewhere. You don't need a factory to create a title.
 * And yes, there would be different service interfaces for doing different things - having more granularity is the point, it makes code reuse and testing easier. However, also note that just because there may be a PageStore and a RevisionStore interface, that does not mean they can't be implemented by the same class, if that is convenient.
 * Also note that services are not bound to specific kinds of values. A serialization service might implement serialization for several kinds of values (that's quite OK if the values are strongly related, I think).
 * -- Duesentrieb ⇌ 18:27, 30 May 2013 (UTC)

What is the rationale for introducing value objects? This sounds very much like the Content/ContentHandler split, which I criticised during code review. I didn't manage to get a satisfactory answer out of Daniel at the time. To me, encapsulation, i.e. the close association of code with the data it uses, is fundamental to OOP design. It's certainly possible to separate code from data, and that is routinely done in non-OOP languages, but I hadn't heard it discussed as a defensible design principle in OOP. -- Tim Starling (talk) 23:52, 30 May 2013 (UTC)


 * "Value Objects" come from Domain-Driven Design (DDD makes another distinction between "Entities" and "Values", but that's not very relevant here; It does not differentiate between Services and Controllers, a distinction I find useful, but not critical). Keeping Services and Values/Entities separate ("newableas" vs. "injectable") seems to be a best practice for writing testable code
 * There should be two kinds of objects in your application. (1) Value-objects, these tend to have lots of getters / setters and are very easy to construct are never mocked, and probably don't need an interface. (Example: LinkedList, Map, User, EmailAddress, Email, CreditCard, etc...). (2) Service-objects which do the interesting work, their constructors ask for lots of other objects for colaboration, are good candidates for mocking, tend to have an interface and tend to have multiple implementations (Example: MailServer, CreditCardProcessor, UserAthenticator, AddressValidator). A value-object should never take a service object in its constructor (since than it is not easy to construct). 
 * To me, keeping service logic separate from value objects solves three problems:
 * A value object can be constructed on the fly. An "active record" otoh either global state internally, or access to services when constructing it. This means that it's hard to use them as input for tests cases and requires big and complicated fixtures for testing. The ORMRow class recently introduced into core had this problem, we are trying to fix this now.
 * A value object can easily be serialized, converted and passed around. An object bound to services can't. This is a serious problem with the Content interface - it doesn't allow the Content to be a simple value object. For Wikibase, this forced us to have a hierarchy of value objects (Entity, Item, Property) separate from the corresponding Content objects (EntityContent, ItemContent, PropertyContent). The value objects can be passed to and used on the client wiki, while the Content objects can't. This is quite anyoing.
 * A value object does not need to guess which operations are useful. Binding logic to values leads to a random collection of functionality on the record. Title is a good (bad) example for this: it has functionality for moving a page, while the functionality for deleting a page is in WikiPage. Also, it leads to a tangle of dependencies: If an objects knows how to store itself and render itself, it depends on the storage layer code and the output layer code. It seems that splitting the responsibility of objects along the axis of what is being done and how (and what is needed for that) is better than splitting along the axis of what is being acted on. It makes more sense to me for a booking service to be able to process credit cards than for a credit card to know how to process charges.
 * More generally, it seems to me that designing for testability, extensibility and maintainability means departing from some traditional OO principles. Most importantly:
 * Code sharing via inheritance is bad. Use composition. That makes it easier to replace individual components, to mix and match.
 * Binding logic to data (active record pattern) is bad. In traditional OO, the credit card would know how to process a charge, and hair would know how to comb itself. This is inflexible, since the behavior can't be replaced at runtime.
 * I'm still struggling with the old habits myself, though, especially with the inheritance bit. -- Duesentrieb ⇌ 09:53, 31 May 2013 (UTC)


 * When I refactored page moves, I asked Brion where the new UI-independent code should go. He said that Article deals with page contents (i.e. wikitext), and Title does not, so therefore the move page code should go in Title. The Article/Title split seemed arbitrary to me at the time, since an Article could be constructed simply from a Title, making them equivalent in some sense.


 * If it was me designing it from scratch (even the younger, stupider me from 2003), I would have put the split between backend and frontend, which would have led to a Title-like class which included the non-UI functions of Article, so that the choice of where to put storage logic would have been less arbitrary. Yes, it still would have been hair that knows how to comb itself, but you can progressively refactor it to delegate such tasks to service objects. Compare:


 * I'm concerned not just by the extra code length, but also by the potential for complexity in calling code to be daunting for new developers. Both examples above have service objects, both can easily support batch queries. Testability of the service object would be almost identical.


 * Perhaps, a more fundamental problem with the testability of Title is its dependence on localisation. If you provide both Title::newFromText and getNamespace, then you need localisation; or if you provide both Title::makeTitle and getNsText then you need localisation. If Title was a value object, as you describe it, it would be limited to the numeric namespace. Then, if there was a storage backend that needed text namespaces, it would need a different Title class. And there would be no way for a text-only Title class to validate its own contents.


 * I think the role of the "plain old Java object" in existing MediaWiki code is somewhat fulfilled by associative arrays, for example the parameter arrays passed to some constructors, and the data arrays stored in memcached. Plain strings and integers fill that role in other contexts, such as the message keys passed to Message/MessageCache/Language/LocalisationCache, or the URLs passed to ExternalStore::fetchFromURL or FileRepo::streamFile. -- Tim Starling (talk) 02:55, 3 June 2013 (UTC)


 * The issue I see with "active records" like Title is that in order to perform operations like localization, permission checks, or database access, it needs to know how these things are done. There are four options:
 * hardcode the bahavior (resp. hardcode the dependency on a specific service implementation). That's bad for extensibility, and may lead to an aggregation of lots of unrelated functionality in the same class. If we later decide that the dependency must be decided at run time, the only options we have are the ones described below, which are bad.
 * rely on a global registry of some sort for acquiring services. That's bad for testability.
 * require all dependencies to be injected in the constructor. That makes the on-the-fly construction of such objects cumbersome and requires code that needs a Title for any reason to know about all the services a Title may need for any operation. And if Title's dependencies change, all places that create a Title object would need to be updated.
 * inject a Context object instead of individual services. That's bad for re-usability but especially for unit testing: in order to test a Title object, I need to provide a mock context. How do I construct it? What does it need? How do I find out? What if the requirements change? How will the test code notice? The Context object couples the Title object to the rest of the system, making it depend on essentially everything. Here's some discussion of the context class (aka kitchen sink) issue: Breaking the Law of Demeter is Like Looking for a Needle in the Haystack. That approach is of course still better than global state, and can be acceptable in some cases - though not for value objects (because then, they wouldn't be value objects).
 * So: just factoring functionality out of the Title class itself into services doesn't help if the operations are available via the Title class, because this requires Title to somehow get the respective service instances. It would probably a good start and a viable way of slowly refactoring, yes, but it's not sufficient - I think, by itself, it wouldn't even help much with the current issues.


 * In my mind, Title could work like this:
 * Title itself is a value object, with a numeric namespace (and I think we'd also need a Link value object).
 * TitleParser is a service that turns a string into a Title, and knows about namespaces (LinkParser would know about interwiki prefixes, etc).
 * Page would be a value object that knows a pages Title and it's ID (and essentially everythign in the page table).
 * PageStore would be a service that can load Page objects by ID or Title, and could also perform other operations on pages, like update, move or delete. PageStore may use a RevisionStore store for some of these.
 * It may be useful to have a "raw" PageStore and a "high level" PageStore, where the latter would use a LogStore and ChangesStore for logging. Access control could also be performed on this level, using a PermissionManager service.
 * RevisionStore would load Revision objects, and allow for deleting/hiding reivsions, and loading revision content.
 * It would be nice if Revision could have a getContent method; But that would mean the content would always have to be available when a Revision is constructed (or Revision would need knowledge of the RevisionStore). Lazy loading is indeed somethign that does not work with value objects, and makes them a bit less convenient. OTOH, putting the loadContent method into the RevisionStore service allows for batching.
 * PermissionManager would be used for permission checkes on Title (and perhaps Page) objects. Currently, much of the permission system is hardwired in Title.


 * That's 6 classes for the functionality of what we currently have in 3: Title, WikiPage, and Revision. I have omitted some functionality, so let's say it turns out to be 12. Sounds like a lot (four times the classes!) but considering that these 3 classes together have about 10k lines, splitting them up to less than 1000 lines per class seems sensible (i have heard people use 100 (executable) lines per class as a rule of thumb, but I think that's overly strict). -- Duesentrieb ⇌ 13:01, 3 June 2013 (UTC)
 * One issue that you've glossed over in the above is that you just pushed the "how is the i18n service object obtained?" problem into still another service object and then ignored it. How does anything that needs to parse a string or obtain the string version of a title with namespace obtain this title service object, now? And the i18n service object that the title service object needs, for that matter? Anomie (talk) 14:52, 3 June 2013 (UTC)
 * Code that needs to turn a string into a Title needs a TitleParser. It doesn't need an i18n object, since TitleParser already has that, and the calling code shouldn't even know that it is needed. This means that the conversion between string and Title can not happen in a value object. But then, why should it?
 * Generally: Service objects are free to know about many service objects. They get them in their constructor. In the textbook case, the network of service objects is build in the initialization phase of the program by code that either reads or is the configuration. Construction of individual services may be delegated to Builder classes, but that's just so you don't have the entire setup code in one file.
 * This however means loading and instantiation all services always, for every request. Ugh. If we want to avoid that, we could to inject builders instead of proper services (or inject magic stub services). This can be useful and even necessary, but as a general principle, I don't much like this approach.
 * I think it would be OK to live with a limited number of "static entry points" that would indeed have access to a global service registry / factory (which is indeed somewhat like the context object) and that build the part of the service graph that is needed to handle that entry point. In the case of MedaWiki, these would be action classes, hook functions, special pages, and API modules. As things are, we don't have control over their instantiation anyway (or they are static), so the only way to inject anything is via a global registry.
 * This does violate the "no static stuff" and "inject everything" principles; the idea is to strike a balance between the architectural need to isolate components and the practical need for performance (and thus partial initialization) and backwards compatibility (static hook functions aren't going to go away soon).
 * I think strictly confining the access to the service registry to static entry points is a good compromise. And we should further try to reduce the number of such entry points, as far as this is sensible, and the split isn't needed for performance reasons. -- Duesentrieb ⇌ 15:37, 3 June 2013 (UTC)

Just-in-time abstraction
This is another way of saying "You ain't going to need it"

Writing unit tests for all code immediately requires more abstraction right away. -- Duesentrieb ⇌ 13:42, 26 May 2013 (UTC)


 * As Robert Martin says "An axis of change is only an axis of change if the changes actually occur. It is not wise to apply the SRP, or any other principle for that matter, if there is no symptom." Abstraction gives you flexibility in a particular direction, the challenge (often not met) is to identify the axes of change in advance. If you know that you need unit tests, fine. The point is to avoid abstraction that gives you flexibility in unnecessary directions. -- Tim Starling (talk)


 * But how do you condense this into concrete recommendations? Anticipating the directions in which some code will be extended is usually a matter of experience. -- Ori
 * Isn't that basically what this is saying? Until you have the experience in how this particular thing needs to be abstracted, you're just guessing. Robla told a story about some bus tunnel built in the 80s, where they thought they were eventually going to want to run rail through it as well so they ran track. Then a few years ago when they actually built the rail, they found the tunnel had a different gague... Anomie (talk) 14:39, 25 May 2013 (UTC)


 * "Do not introduce abstraction in advance of need unless the probability that you will need the flexibility thus provided in the short term is very high (>80%)" Obviously human judgement is needed at some point, but I think we are putting a foot in the ground by asking for a high probability of need rather than a conceivable need.-- Tim Starling (talk)


 * Yes, I think this works. ContentHandler is a good example: there was pre-existing evidence that the notion of "content" merited being made abstract -- namely, w/image content and stuff in the mw namespace. -- Ori


 * I agree that premature abstraction is bad. I would argue though that there are some boundaries that should not be crossed, and require decomposition per default (e.g. the value object / service boundary, or the database code / ui code boundary). Any abstraction beyond that should and can be done "just in time"; Note however that the requirement to write (actual) unit tests may require some abstraction right away that would not be needed otherwise. -- Duesentrieb ⇌ 13:46, 26 May 2013 (UTC)
 * OTOH, if your "unit" tests are forcing you to add layers of abstraction that are not otherwise necessary, then perhaps your units are too small. Anomie (talk) 14:06, 28 May 2013 (UTC)

There is a counterpoint, which is that we know from experience that abstraction left to be done "just in time" sometimes isn't done at the time it is actually needed. Instead, the second developer hacks something up on top of whatever the first developer left, in the ugliest possible way. Consider the introduction of ApiEditPage without appropriate refactoring of EditPage, or the introduction of action=render without introducing the non-static methods that would be needed to allow generation of absolute URLs for links, dependent on ParserOptions. Also, I think it's fair to say that some developers just don't have a head for architecture and can't reasonably be expected to rearchitect a module in the course of adding a feature to it. Maybe this implies a more moderate wording for this policy, along the lines of "avoid unnecessary abstraction" rather than "just-in-time abstraction". -- Tim Starling (talk) 14:02, 26 May 2013 (UTC)
 * I'd also hope that code review and RFCs actually being useful could help with the "don't have a head for architecture" problem. Anomie (talk) 14:06, 28 May 2013 (UTC)
 * "Just-in-time" is confusing anyways, since in the context of computing the term is now firmly associated with compilation. --Ori.livneh (talk) 02:32, 2 June 2013 (UTC)

Type Hinting

 * Type hinting (both in parameter lists and in doc comments) avoid errors
 * PHP provides __call which allows for proxy objects, i.e. objects which duplicate the methods of a class but do not actually inherit from that class. Consider awesome backwards compatibility hacks like WikiPage and StubObject. Type hinting breaks this. So I am hesitant to introduce it to public interfaces. -- Tim Starling (talk)
 * how does type hinting break this? -- Duesentrieb ⇌ 13:49, 26 May 2013 (UTC)

For example:

Breaks if RequestContext::getMain->getUser is migrated to return a StubUser instead of a User. -- Tim Starling (talk) 01:10, 30 May 2013 (UTC)
 * Oh right, subs can't "dynamically" implement interfaces. I want Java's proxy objects :) -- Duesentrieb ⇌ 18:19, 30 May 2013 (UTC)


 * Type hinting also prevents b/c migration hacks like

If a core class has child classes in extensions, the type hints cannot change without generating parse errors in the derived classes. So this prevents the type of a parameter from being migrated.

Since we are discussing renaming every class and completely changing all the interfaces, I don't think support for b/c hacks can be easily dismissed. The success of any renaming/refactoring project may depend on them being available. -- Tim Starling (talk)


 * Personally, I don't advocate any big rewrites. I'm mostly thinking of which guidelines should apply to new code.
 * But this is an important point: we should make clear which rules should apply to new code, and if and how they can be applied to legacy code, and which b/c strategies are to be employed, and which pit falls are to be considered. -- Duesentrieb ⇌ 13:49, 26 May 2013 (UTC)

We could do handle type hinting by using interfaces. The User object is probably not the best example since it both represents the values of a User and let you act on the values just like a controller would do. Anyway here is an example of such an interface:

-- Antoine &#34;hashar&#34; Musso (talk)
 * This is like the trick I used when I made WikiPage. Lots of "Article" type hints were just swapped out for "Page" interface hints. I'm not convinced the extra trick needed negates the utility of type hints. Aaron (talk) 05:26, 4 June 2013 (UTC)
 * Exactly. And I love how that makes the function self documenting and nicely throw errors whenever one mis use a function. Antoine &#34;hashar&#34; Musso (talk) 20:10, 5 June 2013 (UTC)

Refactoring best practices

 * It should be easy to identify what 'concern' a class is dealing with (without even reading the code). Naming convention or phpdoc markers ?
 * The historic lack of namespaces in PHP led to the proliferation of "bags o' vaguely related things" classes. Now we have namespaces, but they look a bit alien in the MediaWiki codebase -- can we decide on the role of namespaces in future code? Are they going to be a bit of cleverness that is tolerated, or a design pattern we encourage / enforce? Anonymous functions ditto -- there are a lot of MyExtensionHook classes that are more concisely written as  --Ori
 * For abstract classes, have clear documentation on what is expected of children
 * Along the lines of the comment about type hinting above: there is a set of tricks and techniques for performing complex migrations "in-flight" that should be documented, like using a temporary configuration var that selects the old or new implementation.

Let's please try to focus on a) guidelines for new code and b) guidelines for careful, local, just-in-time refactoring. Let's avoid planning or even specing out a huge refactoring of the entire codebase. -- Duesentrieb ⇌ 13:51, 26 May 2013 (UTC)

Context and object lifetime

 * Registration systems in which class names are used instead of object instances (ie special pages or API modules) prevent control over object lifecycle and thus proper dependency injection
 * On the other hand, using objects means that you have to actually instantiate a possibly large number of classes every time when only a few will actually be used—wastes time and memory. --Brad
 * I agree that it would be bad to instantiate the entire object graph in the setup phase. There should be a limited number of static entry points that just construct the things they need, using the relevant configuration, factories, etc. E.g. an API module or a hook function may construct/request the services and controller it needs just in time.
 * In other cases, lazy instantiation can be implemented using the holder or factory pattern, though I find that a bot awkward/annoying. -- Duesentrieb ⇌ 13:56, 26 May 2013 (UTC)
 * Singletons are problematic for unit testing, and lifetime control generally, but they are an improvement over global variables
 * [for an example we should refactor the Wiki class to actually encapsulate configuration state, so multiple object trees could exist in one process -- makes cross-wiki stuff more feasible]
 * how are they an improvement? For all intents and purposes, they are global variables. -- Duesentrieb ⇌ 13:56, 26 May 2013 (UTC)


 * Singletons are better than globals because:
 * They allow for lazy initialisation. This improves performance. Lazy initialisation can be used to reduce the overhead required for backwards compatibility, by deferring the load of backwards compatible interface code until it is actually required.
 * Singleton requests can easily be flagged with wfDeprecated.
 * A global declaration provides a reference, rather than a copy of an object handle, which has negative performance consequences as described in my recent wikitech-l post.
 * Singletons could in principle be registered in some process lifetime object, to allow them to easily be discarded in bulk at the start of each unit test run. By a singleton, we mean an object with a global scope which is retreived with a function or static method. Infinite lifetime is not inherent to the concept.


 * -- Tim Starling (talk) 01:06, 31 May 2013 (UTC)
 * Such a registry might be nice anyway, so things like ForkController could more reliably destroy anything that could cause trouble before fork. Unit testing would be another use, along with a code that wants to change wiki context (if we decide to do that, e.g. frwiki -> enwiki). Aaron (talk) 05:30, 4 June 2013 (UTC)

Impact on casual contributors

 * Some rules can be difficult for newbies, like requiring unit tests
 * help them write their tests!
 * don't be afraid to refactor in code review
 * Differ between (hard)core mediawiki in test case requirements vs for instance non-deployed extensions
 * Advice a good IDE/toolset, because that might help
 * Open source, please! --Brad


 * More separation makes code easier to understand and extend; understanding huge classes like User or Title is daunting.
 * On the other hand, mazes of tiny little classes that all depend on each other are difficult to understand too. We need a middle ground. --Brad
 * the win is usually when dependencies are one-way and limited. Avoid mazes but go for little clouds.
 * Tell users to be proactive about their changes when required (don't leave them hanging)
 * We can keep settings for most popuplar IDEs in source control, allowing contributors to start quickly.

Use dependency injection for external resources
If you initialize connection objects to external resources somewhere deep in your code, it makes writing unit tests very difficult and expensive. Possible pattern:

^ This makes it very easy to write unit tests that use PHPUnit mock objects --Ori


 * Passing in NULL like this is discouraged as it makes the signature less clear
 * The tight coupling is still there
 * Production code will end up being too lazy to properly inject dependencies
 * The assumption is that the dependencies are properly instantiated by the class itself and callers can and should be lazy; the ability to inject them is put in place specifically for unit tests. --Ori
 * In that case there indeed still is tight coupling and no poper dependency injection. So most problems are not solved
 * Yes, you're right: this partially solves the testing problem but misses an opportunity to make the implementation more flexible. --Ori

I'm not really on board with this and may need some more convincing. Daniel showed a code snippet from Wikibase yesterday with half a dozen constructor parameters. I am not sure it is worthwhile making unit tests easier by making everything else several times harder and uglier. I am happier at this point with the idea of a single context object passed to the constructor, even if it does make unit testing slightly more difficult.

The problem with passing many dependencies as formal parameters to constructors is that it is difficult to know in advance how many will be required, especially if you call hooks. Wrapping up all the context into a single object makes it easier to extend.


 * But it makes it hard(er) to substitute individual bits; you always have to provide everything that the object might use from the context.
 * Example: IContextSource has a getSkin method. So if I want to provide the contextual Title and User to something that takes an IContextSource, I also have to somehow provide a Skin. That doesn't make me happy...
 * It would still be an improvement over the current situation though -- Duesentrieb ⇌ 14:05, 26 May 2013 (UTC)

I'm not saying that all constructors should use the same context class, just that I prefer a single context parameter in constructors over a dozen.
 * I agree that having many parameters in the constructor is ugly. A retort I got when complaining about this is that a class needing many different services is an indication of the class doing too much. Split up the class and have less responsibilities per class, then you need fewer services in the constructor - or so they say. I can't say yet whether this always holds true, but so far, it seems like a reasonable rule of thumb. -- Duesentrieb ⇌ 14:05, 26 May 2013 (UTC)


 * A constructor that requires a dozen parameters is constructing a class that shouldn't exist. Collapsing the signature by using a Context object buries the problem without fixing it. A constructor's signature really is the appropriate place in which to enumerate a class's external dependencies, I think. --Ori.livneh (talk) 01:28, 2 June 2013 (UTC)


 * Code changes and grows. Functions which did simple things sometimes need to start doing more complex things. And sometimes even the simplest code needs several kinds of request state. I don't think that enumerating a class's dependencies in the constructor is flexible or conducive to growth. And I think that requiring even the smallest snippet of UI code with complex dependencies to be split up into 5 or 10 classes, each with voluminous doc comments and unit tests, would be a drain on staff resources and a barrier to volunteer entry. -- Tim Starling (talk) 06:40, 3 June 2013 (UTC)
 * Constructing service objects is hard because of all the dependencies, especially since these dependencies change change and grow, as you say. But this is solved by treating the services as "injectables": they are constructed in exactly one place, and then passed around. If you run into the problem of having to construct a service that asks for 12 parameters, and now need to figure out how to get all these objects, then you should probably not be constructing that object there, but just ask for it in the constructor of your class.
 * I undermined this principle somewhat by suggesting the notion of controller objects for modeling tasks/processes. Controllers may have a lot of dependencies, since they basically are services with smaller scope (and perhaps some state). If a controller ends up being very inconvenient to instantiate, it should either be created using a factory or turned into a request-scope service in order to bring the number of instantiation points down to one. -- Duesentrieb ⇌ 13:18, 3 June 2013 (UTC)
 * Isn't that exactly the opposite of what you complain about above, where having singletons or global factories or the like makes things impossible to test? Also, it seems to me that saying your class should take the already-constructed service object in its constructor instead of all that object's dependencies is just pushing the "how do I locate dependencies?" problem onto all your callers. And then, presumably, they'd push it to their callers. Where does this process end, and how does it end up different from the "context object" concept you argue against? Anomie (talk) 14:59, 3 June 2013 (UTC)
 * The service construction is indeed pushed up the stack as far as possible, ideally all the way into init.php. Well, perhaps that would be a bit too far. But at least into some function that can be considered "top level", a "static entry point" (see further up on this page). This way, all classes further down the stack have all their dependencies cleanly injected, and can easily be tested in isolation. The further "up" we can push this, the more "clean" classes we have. But of course, it has to end somewhere, see my comments about initialization in the section.
 * Services having "request-scope" is different from having global instances / singletons in two important respects: You can't (or at least shouldn't) access such an object at will in random code using some global variable, this is only done in a static entry point (ideally, only one: index.php). "request-scope" merely refers to the lifetime of the object (perhaps I should have just referred to "persistent services" or "long lived objects"). Long lived service objects don't violate the idea of modularity / isolation, where each dependency must (or should ideally) be declared in the constructor. -- Duesentrieb ⇌ 15:51, 3 June 2013 (UTC)
 * So instead of having request globals, you simulate them by having everything everywhere take a laundry list of paramaters. This doesn't strike me as much of an improvement. Anomie (talk) 13:57, 5 June 2013 (UTC)
 * Access to globals means everything depends on everything. Having a context object means everything depends on everything. Injecting a "laundry list" of dependencies means you list exactly what you need, when you need it.
 * Requiring all dependencies to be provided to the constructor (which is called in one place, in case of a service instance) means the service depends only on the things it actually needs and uses, and these are well defined and documented.
 * The "laundry list" can be kept short by reducing the responsibilities of the service. To which level this is taken is a matter of taste and practicality, but in general, the number of dependencies is lower when the responsibilities of the service are more narrow.
 * Basically - a service (or any class) needs to collaborate with others to do its job (that is, it depends on other classes). We can hardwire and hide these dependencies using globals, make them dynamic but still hidden (and effectively universal) using context objects, or we can declare them and make them explicit. Which option do you prefer? -- Duesentrieb ⇌ 14:29, 5 June 2013 (UTC)
 * Again, you're pushing problems up the call stack and then ignoring them. If you split one service into several to reduce its dependencies (and is this really a logical split, or is it just work around the huge constructor parameter list?), then callers that need those services have more dependencies. So at some level you're going to wind up with classes that have a constructor needing dozens or hundreds of these services (or you'll punt and "allow" these to use globals). You're also increasing the load on the programmer, as now they have even more service classes to know about.
 * Also, if changes mean your service-with-explicit-dependencies needs to add a dependency, you have to track down all of its callers, and then possibly all of their callers, etc to add it. Including third-party extensions that might be using the service. And, since the constructor is changing, you'll probably have to adjust any subclasses too. Or you might avoid all this by making yet another service. Or, I suppose, you might be able to have the new methods individually take the new dependency as a parameter.
 * Are the benefits worth the drawbacks? I'm not so sure. Anomie (talk) 13:52, 6 June 2013 (UTC)

(reset indent) Anomie, you seem to assume that "callers" need to instantiate services. They don't. Let me given an example:

So my PageStore implementation, DBPageStore, depends on a DBConnectionManager and a RevisionStore. Now I discover that this is not sufficient, I also need a BlobStore and a CompressionHandler. Two new dependencies for my DBPageStore. So, what needs to change? Only one place, namely ServiceRegistry::getPageStore, which needs to pass two more parameters to DBPageStore's constructor. That's it. The dependency of PageMover didn't change, it only depends on the three interfaces it originally depended on (since it checks permissions, changes a page and logs the result, it needs a permission manager, a page store and a log store).

Now, sublcasses of DBPageStore would need to change, but if we avoid code-reuse via subclassing, the hierarchy is going to be shallow and subclasses few.

If we decide to split a service interface, then yes, a lot of code needs to be updated (just like now). If we keep interfaces small, splitting an implementation along the interfaces it implements wouldn't require any code updates though.

With the design I outlined here, I do not see how callers have to know about so many services, or need frequent updates, or how dependencies would accumulate to huge constructor lists. This doesn't happen.

Of course, real life isn't this simple. But if you have control over the construction of the objects you are dealing with, then you can use this pattern to isolate them. If you don't have control over instantiation (e.g. in an API module or a SpecialPage), then you have a "static entry point" and need to "somehow" get a service manager to boostrap again. That's the only place where accessing a global would be ok, and that would only be done to get access to the service registry. And the respective object should offer an alternative way of providing the servies it needs, for use during testing. -- Duesentrieb ⇌ 19:25, 6 June 2013 (UTC)
 * So now anything that wants to move a page needs to have this "RequestDispatcher"? Which is basically a sort of "context object", exactly as you keep arguing against. I give up, it's impossible to discuss this if you can't even follow your own design. Anomie (talk) 13:14, 7 June 2013 (UTC)
 * So never mind whether I'm consistent in my arguments (I try to be, but perhaps I'm not). Let's just look at the proposed design. What problems do you see with it? what advantages? Do you think it would be a good model when starting a new project from scratch? Do you think it's applicable to MediaWiki?
 * Below are a few clarifications I feel could be helpful. But don't let them distract from the questions above. These are the important ones. The points below are merely meant to clear up unclear issues; maybe they are not helpful at all.
 * The request dispatcher is what handles http requests and dispatches them to the respective handlers, based on the desired action in the request params.
 * Code that wants to move a page as a high level operation (i.e. including permission checks, logs, etc) needs a PageMover. Ideally, this is provided to the constructor. This could be constructed directly if we have all knowledge needed for that, but this is bad for testability (tight coupeling). Or it could come from a Builder or Factory, if desired (the latter means more classes, but less dependencies for the original caller). But it's cleaner and probably simpler to just ask for it in the constructor.
 * Code that wants to move a page as a low level operation (just rename it) needs a PageStore.
 * The RequestDispatcher is not like a context object, but it uses something like a context object: the service registry. This is a concession to performance requirements: we only want to initialize stuff that we actually need to handle a specific kind of request.
 * Some sort of registry is needed on the top level for bootstrapping. Ideally that's only in index.php, but realistically, it's anywhere you have static methods or no control over instantiation. In MediaWiki, that's a lot of places. Which is exactly the problem.
 * The idea is to minimize the number of places that have or need access to the "kitchen sink", thus increasing the number of nicely isolated classes that can be easily understood, replaced, tested and reused. It well never be 100% (you always need some bootstrapping code), but 99% is doable. For MediaWiki, 50% would be a great improvement.
 * -- Duesentrieb ⇌ 20:47, 7 June 2013 (UTC)

If you don't think your class requires RequestContext::getUser, and don't want to set up a user in a test, why not use an IContextSource which throws an exception when getUser is called on it? That way, if you are proved wrong and a User is actually required, you can add one without adding constructor parameters. -- Tim Starling (talk)

I see your points, but now you have to step through the code to see what it does in fact require. I'll try to refactor a class that uses optional constructor parameters to take a single context object to see how well it works. --Ori

Perhaps a compromise would be to say that individual parameters are preferable, but a context object ("service bundle") is acceptable for convenience. The downsides should be made clear though. -- Duesentrieb ⇌ 14:05, 26 May 2013 (UTC)

Things to Read

 * Writing Testable Code on the Google Testing Blog. Concise overview of dos and don'ts.
 * Actually, we can just take that blog post as our architecture guide. It should do. -- Duesentrieb ⇌ 09:26, 31 May 2013 (UTC)