Architecture focus 2015

NOTE: THIS IS AN INCOMPLETE DRAFT! At the meeting at the Lyon Hackathon on 2015-05-24, we identified several key points that we think should serve as guiding principles and high-level tasks for the development of the MediaWiki platform for the foreseeable future.

Content Representation
How we represent wiki content is an essential question. On the one hand, support for more and more new kinds of content is being added to MediaWiki, using the ContentHandler mechanism as well as other means. On the other hand, the shift away from editing wikitext markup towards visual editing allows and requires us to rethink how we want to store and manage textual page content, as well as metadata such as page categories or media licenses.

Over the next months, we should survey the kinds of content we currently support and want to support in the future, and assess whether the current mechanisms we have for managing different types of content are sufficient. We also need to establish a way to manage multiple different kinds of content together as one page or revision (and perhaps add the notion of sub-revisions), see below.

Multi-Content Revisions
Making more kinds of wiki content machine-readable and machine-editable requires us to move storage of these kinds of content out of wikitext, where it is currently inlined, typically as template parameters, magic words, or magic links like categories. In addition, we need a place to store derived content, such as rendered HTML for different target platforms.

Over the next months, we should establish a generic storage interface for storing arbitrary blobs (ideally in a content-addressable way). On top of that, we should establish a lookup service that associates any number of such blobs, along with information about their role, content model and serialization format, with any given revision. This would allow us to manage multi-part content (attachments) as well as derived content with minimal disruption, though it may not be possible to avoid a breaking change to the XML dump format, if we want to include multiple content objects per revision there.
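The two layers described above can be illustrated with a minimal in-memory sketch. Everything in it is an assumption made for illustration: the names BlobStore, Slot and RevisionLookup, the use of SHA-256 as the content address, and the example roles are hypothetical and do not describe an existing MediaWiki interface.

```python
import hashlib
from dataclasses import dataclass


class BlobStore:
    """Hypothetical content-addressable store: the address is the SHA-256 of the bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data  # identical bytes always map to the same address
        return address

    def get(self, address: str) -> bytes:
        return self._blobs[address]


@dataclass(frozen=True)
class Slot:
    """One blob attached to a revision, plus the metadata needed to interpret it."""
    role: str                  # e.g. "main", "categories", "html-mobile" (assumed names)
    content_model: str         # e.g. "wikitext", "json"
    serialization_format: str  # e.g. "text/x-wiki", "application/json"
    address: str               # key into the BlobStore


class RevisionLookup:
    """Hypothetical lookup service: associates any number of slots with a revision."""

    def __init__(self):
        self._slots = {}  # revision id -> {role: Slot}

    def attach(self, revision_id: int, slot: Slot) -> None:
        self._slots.setdefault(revision_id, {})[slot.role] = slot

    def slots(self, revision_id: int) -> dict:
        # Return a copy so callers cannot mutate the internal mapping.
        return dict(self._slots.get(revision_id, {}))
```

In this sketch, a revision with wikitext in a "main" role and a JSON category list in a "categories" role would be two attach() calls against the same revision id, and derived HTML could be attached later under a third role without touching the first two.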

Generalized Transclusion
MediaWiki currently features several transclusion mechanisms (images, templates, special pages, parser functions, Wikidata usage, dynamic injection of graphs, etc.). The transclusion mechanism should be generalized, and the interfaces involved should be streamlined to allow a content representation based on composing elements of different kinds.


 * Improve current codebase
 * Allow late content assembly

Rationale: being able to render, store, and use bits and pieces of page content individually should improve performance, and make us more flexible with regard to which content can be used where, and how.
 * HTML-based transclusion
 * Late content assembly
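As a rough illustration of HTML-based transclusion with late assembly, the sketch below stores a page as HTML containing placeholders and resolves them only when the page is delivered. The placeholder syntax, the assemble() function and the resolver registry are all invented for this example; they are assumptions, not an existing MediaWiki or Parsoid format.

```python
import re

# Hypothetical placeholder syntax: <transclude kind="..." target="..."/>
PLACEHOLDER = re.compile(r'<transclude kind="(\w+)" target="([^"]*)"/>')


def assemble(html, resolvers):
    """Replace each placeholder with the output of the resolver registered for its kind.

    `resolvers` maps a kind (e.g. "template", "image") to a function that takes
    the target name and returns an HTML fragment.
    """
    def replace(match):
        kind, target = match.group(1), match.group(2)
        resolver = resolvers.get(kind)
        if resolver is None:
            # Unknown kinds are left in place so a later stage (or the client) can handle them.
            return match.group(0)
        return resolver(target)

    return PLACEHOLDER.sub(replace, html)
```

Because every kind of transclusion goes through the same resolver interface, the same stored HTML could in principle be assembled on the server, at the edge, or on the client, depending on where the resolvers run.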

Smart Caching
Pushing the assembly and rendering to the edge of the cluster, or even to the client, will improve our ability to scale horizontally.
 * Late content assembly / widgets
 * CDN

Modularity and Testability
Rationale: Modularity improves maintainability and reusability, as well as testability. Having better tests will allow more confident changes, and thus speed up development. Improving modularity and testability on all levels is key to achieving the other goals mentioned here.
 * Dependency Injection
 * Interface segregation
 * Unit testing vs. integration testing
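To make the first two points concrete, here is a small sketch of constructor-based dependency injection against a narrow interface. The names TitleLookup, LinkRenderer and FakeTitleLookup are hypothetical and chosen only for this illustration; they are not taken from the MediaWiki codebase.

```python
from typing import Protocol


class TitleLookup(Protocol):
    """Narrow interface: the renderer only needs to know whether a page exists."""

    def exists(self, title: str) -> bool: ...


class LinkRenderer:
    """Receives its dependency through the constructor instead of reaching for a global."""

    def __init__(self, titles: TitleLookup):
        self._titles = titles

    def render(self, title: str) -> str:
        css = "existing" if self._titles.exists(title) else "new"
        return f'<a class="{css}" href="/wiki/{title}">{title}</a>'


class FakeTitleLookup:
    """Test double: satisfies TitleLookup without touching a database."""

    def __init__(self, known):
        self._known = set(known)

    def exists(self, title: str) -> bool:
        return title in self._known
```

With this shape, a unit test can construct LinkRenderer(FakeTitleLookup(["Lyon"])) and check the output directly, while an integration test would pass in a lookup backed by real storage.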

Service Oriented Architecture
Rationale: the ability to move components to separate locations / hardware adds another degree of flexibility and scalability.
 * RESTBase & co.
 * No more LAMP?! What about 3rd party installs on shared hosting?

Client Diversity
Rationale: improving our handling of different locales and devices is key to making content available to more people in more regions and languages. We need to improve support for this aspect of content delivery, especially with respect to caching.
 * Different rendering for different devices
 * Localized renderings of neutral content
 * Multilingual content
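One way to think about the caching aspect is that each rendering is cached under a key that includes the device class and language alongside the revision. The sketch below assumes exactly that; VariantCache, its key layout and the render callback are hypothetical names used for illustration only.

```python
class VariantCache:
    """Hypothetical cache that keeps one rendering per (revision, device, language)."""

    def __init__(self, render):
        self._render = render  # callable(revision_id, device, language) -> str
        self._entries = {}
        self.misses = 0

    def get(self, revision_id, device, language):
        key = (revision_id, device, language)
        if key not in self._entries:
            # Render once per variant; later requests for the same variant are cache hits.
            self.misses += 1
            self._entries[key] = self._render(revision_id, device, language)
        return self._entries[key]
```

The trade-off this makes visible is that every additional dimension in the key multiplies the number of cached variants, which is one reason to keep the stored content neutral and localize as late as possible.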

Remove Assumptions

 * Do not assume wikitext (or any text)
 * Do not assume information is local
 * Do not assume information is static

Rationale: In general, dropping assumptions allows more freedom. In particular, dropping these assumptions is necessary to achieve the goals described above.