Wikimedia Engineering Architecture Principles

The following principles and requirements defined by the Wikimedia Technical Committee SHOULD guide all Wikimedia engineering endeavors. The principles are grouped by motivation.

Application
The principles below are intended to guide engineering decisions on all levels, from detailed code review to high-level RFCs. Any new code we write and any new system we design should be checked against these principles. It is however acknowledged that not all existing code is compliant with all the principles. Teams working on such code should make an effort to improve compliance as far as reasonably possible. Prototype projects do not have to comply with all principles immediately, but have to be made compliant before they are made available to the public, including as "beta" or "pilot" functionality.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119. Any proposed change to the code or design that violates a MUST principle is to be rejected. Any violation of a SHOULD principle requires documentation of the rationale for violating the principle. When existing code is discovered to violate a MUST or SHOULD principle, steps for making the code compliant with the architecture principles need to be planned.

Principles
 The architecture principles are derived from strategic and operational goals of the organization, as understood and interpreted by TechCom. This document refers to these goals to document the motivation for the architecture principles.

Product strategy documents, for reference:


 * Movement Strategy outcomes and report (Phase One, September 2017)
 * Product Directions (Wikimedia Technical Conference 2018, Toby Negrin)

Points of View Essays:

Trust:


 * Reliability
 * Transparency
 * Accountability

Experience:


 * Form Factor
 * Rich Content
 * Contributors
 * Customization
 * Discovery

Scale:


 * Community
 * Content
 * Resilience
 * Ubiquity

Augmentation:


 * Content Curation
 * Content Generation
 * Governance
 * Machine Translation

Culture:


 * Inclusion
 * Language
 * Content Gaps

Tools:

 * Tools For Developers
 * Tools For Organizers
 * Tools For Moderators

To allow users to consume, create, and interact in a form suitable for their devices, with the connectivity they have, in a language they speak,


 * software that interacts with users SHOULD be designed to make key functionality available on devices with varying capabilities and restrictions, as well as potentially limited connectivity.
 * software that interacts with users MUST support internationalization. Internationalization mechanisms SHOULD be consistent across platforms.
 * software that interacts with users MUST consider accessibility concerns and SHOULD follow accessibility guidelines.
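
As an illustration of the internationalization requirement above, the following is a minimal sketch of a message lookup with language fallback. The catalog contents, the `translate` function, and the fallback order are all assumptions made for this example; they do not describe MediaWiki's actual localization system.

```python
# Hypothetical message catalog; keys and languages are illustrative only.
MESSAGES = {
    "en": {"save": "Save", "cancel": "Cancel"},
    "de": {"save": "Speichern"},
}


def translate(key: str, lang: str, fallback_lang: str = "en") -> str:
    """Look up a message, falling back to another language, then to the key."""
    for candidate in (lang, fallback_lang):
        message = MESSAGES.get(candidate, {}).get(key)
        if message is not None:
            return message
    # No translation anywhere: return the key so the gap is visible.
    return key
```

Keeping user-visible strings behind a single lookup like this is one way to let the same mechanism be shared across platforms, as the principle asks.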

Motivation: These principles aim at the equity goal set by the Movement Strategy Outcomes and the Product Directions. They address issues discussed throughout the Point of View essays, especially the ones on Inclusion, Content, Ubiquity, and Form Factor, as well as the one on Tools for Moderators.

To empower contributors to collaboratively grow and curate content, and to build the tools that they need to do so,


 * our software SHOULD provide APIs and libraries that empower the community with ways to develop workflows using scripting languages, and ensure safety and maintainability of custom scripts.
 * software that manages or interacts with user-editable content SHOULD support close integration of various kinds of content.

Motivation: These principles aim at the equity goal set by the Movement Strategy Outcomes, especially the idea of an open and adaptable system, as well as the rich content types and content flexibility goal set by Product Directions. They address issues discussed in the Reliability, Resilience, Inclusion, and Contributors essays. The need to empower the community to contribute code and build their own tools is stressed especially in the Tools for Developers chapter, while the Content Curation chapter calls for communities to contribute to training AI systems.

To provide public APIs that allow efficient interaction with wiki content, as well as provide data that can be easily processed and reused in bulk,


 * for all interactions defined on the abstract domain model, stable public APIs MUST exist. APIs geared towards a specific user interface MUST be considered part of the component that implements that user interface, and MAY be considered private to that component.
 * our software SHOULD be designed in a way that makes all content and public meta-data available as structured data or media with semantic markup (annotated HTML).
 * data we offer for re-use MUST use clearly specified data schemas and SHOULD be based on widely used open standards.
 * public APIs SHOULD allow access to content with high granularity.
 * APIs SHOULD be designed to avoid any need for clients to process wikitext.
 * data formats and APIs that provide access to user generated content SHOULD be designed to ensure verifiability through the integration of provenance information.
 * data formats and APIs that provide access to user generated content SHOULD be designed to provide easy access to all necessary licensing information.
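
The data principles above can be sketched as a small, explicitly specified record type. This is a hedged illustration only: the `ContentFragment` class, its field names, and the sample values are hypothetical and are not taken from any existing Wikimedia API. It shows one way a fine-grained piece of content could carry annotated HTML together with provenance and licensing information.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class ContentFragment:
    """Hypothetical record for one reusable piece of user-generated content."""
    page_title: str
    section: str        # high granularity: one section, not the whole page
    html: str           # annotated HTML, so clients need not parse wikitext
    revision_id: int    # provenance: which revision the content came from
    contributor: str    # provenance: who made that revision
    timestamp: str      # provenance: when, as an ISO 8601 string
    license_name: str   # licensing: human-readable license name
    license_url: str    # licensing: where the full terms can be found


def to_json(fragment: ContentFragment) -> str:
    """Serialize a fragment to JSON with stable, sorted keys."""
    return json.dumps(asdict(fragment), sort_keys=True)


# Sample values are invented for illustration.
example = ContentFragment(
    page_title="Example",
    section="History",
    html="<p>Example text.</p>",
    revision_id=12345,
    contributor="ExampleUser",
    timestamp="2019-01-01T00:00:00Z",
    license_name="CC BY-SA 3.0",
    license_url="https://creativecommons.org/licenses/by-sa/3.0/",
)
```

Because every field is declared up front, a re-user can rely on the schema rather than on the shape of a particular response.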

Motivation: These principles aim at the Knowledge as a Service goal set by the Movement Strategy Outcomes and the Product Directions, specifically for tool maintainers and third party developers. They address issues around the creation of specialized user experiences mentioned throughout the Point of View essays, especially in the Form Factor chapter of the Experience topic. They also touch upon issues related to verifiability and licensing, as discussed in the Reliability essay, and the need for distributing our content through APIs and as dumps, as discussed in the Resilience essay.

To provide an open-source software stack that can be easily used, modified, and extended by others,


 * software components SHOULD be designed to be reusable, and be published for re-use.
 * software that exposes public interfaces (as libraries, frameworks, hooks, or APIs do) SHOULD be subject to release management with clear and consistent versioning. Any breaking changes to such interfaces then MUST be announced in a timely and predictable manner over relevant channels.
 * any elements scheduled for removal from a stable public interface MUST be documented to be deprecated beforehand, and SHOULD be kept for backwards compatibility for a reasonable time.
 * our software architecture SHOULD be modular, with components exposing narrow interfaces, to allow components to be replaced and refactored while maintaining a stable interface towards other components as well as 3rd party extensions.
 * our software SHOULD be built on top of reliable, documented, well-maintained open-source components.
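
The deprecation requirement above can be illustrated with a small sketch. The `deprecated` decorator and the two functions below are hypothetical names invented for this example, not an existing MediaWiki interface. The sketch keeps the old entry point working for backwards compatibility while emitting a documented warning that names the replacement.

```python
import functools
import warnings


def deprecated(since: str, replacement: str):
    """Mark a function as deprecated while keeping it callable."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated since {since}; "
                f"use {replacement} instead.",
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


def get_page_summary(title: str) -> str:
    """Current (hypothetical) public interface."""
    return f"Summary of {title}"


@deprecated(since="2.0", replacement="get_page_summary")
def fetch_summary(title: str) -> str:
    """Old (hypothetical) interface, kept for backwards compatibility."""
    return get_page_summary(title)
```

Callers of `fetch_summary` still get the same result, but they are told in advance which version introduced the deprecation and what to migrate to.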

Motivation: These principles aim at the equity goal set by the Movement Strategy Outcomes and the Product Directions in a broad sense, enabling third party developers to re-use our software components for their own purpose. The need for a modular architecture is also evident in the call for specialized user experiences throughout the Point of View essays, such as the one on Rich Content.

To maintain a code base that can be modified with confidence and readily understood,


 * our software architecture SHOULD follow explicitly specified domain models that define relevant entities and actions (nouns and verbs), to facilitate clear communication between software components as well as among people.
 * all code SHOULD be designed for testability.
 * the software we develop MUST provide a test suite, which SHOULD be sufficiently detailed, comprehensive, and noticeable to provide confidence for developers to see that their changes did not break anything.
 * comprehensive documentation SHOULD be maintained along with the code.

Motivation: These principles aim at the goal of being Extensible and Sustainable and the need for fast and confident deployment defined in the Product Directions. More generally, these principles reflect best practices of software engineering processes.

To provide a web application that can be freely used to collaboratively collect and share knowledge,


 * the MediaWiki stack SHOULD be easy to deploy on standard hosting platforms.
 * small MediaWiki instances SHOULD function in low-budget hosting environments.
 * MediaWiki SHOULD be easy to install and upgrade without much technical knowledge, and without the risk of losing content or disrupting operation.

Motivation: These principles aim at the equity goal set by the Movement Strategy Outcomes and the Product Directions, working to provide others with the tools to collect, curate, and share knowledge. More generally, they aim at the values of the Free Software Movement, to allow others to run, study, modify, and share our software.

To ensure availability and performance of WMF projects through scalable and resilient system design,


 * horizontal scalability SHOULD be a design goal on all levels of the architecture, especially for storage and query mechanisms.
 * our software and infrastructure SHOULD be designed to be resilient against spikes in demand and failure of backend systems.
 * services and APIs SHOULD be designed to allow the identification of read-only and read-write HTTP requests to optimize routing and caching.
 * storage and caching systems SHOULD be designed with a distributed multi-datacenter architecture in mind.
 * system design and technology choice SHOULD aim to reduce operational overhead and the cost of change management.
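
One way to make read-only and read-write requests identifiable, as called for above, is to rely on HTTP method semantics. The sketch below assumes that APIs never change state in response to a safe method such as GET; the backend labels `"replica-or-cache"` and `"primary"` are hypothetical names used only for this example.

```python
# Methods defined as "safe" (read-only) by the HTTP specification.
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "TRACE"})


def is_read_only(method: str) -> bool:
    """Return True if the HTTP method is safe, i.e. must not change state."""
    return method.upper() in SAFE_METHODS


def choose_backend(method: str) -> str:
    """Route read-only requests to replicas or caches, writes to the primary."""
    return "replica-or-cache" if is_read_only(method) else "primary"
```

This only works if the API design honors the assumption: a GET endpoint with side effects would be routed to a cache or replica and silently misbehave.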

Motivation: These principles aim to guarantee good platform uptime as required by the Product Directions, by addressing issues discussed in detail in the essays under the Scale topic.

To ensure the data integrity of the content on WMF systems, and protect the privacy of our users,


 * our software systems SHOULD be designed to only collect data we need, and retain it only as long as necessary.
 * our software and infrastructure MUST be designed in such a way to prevent unauthorized access to sensitive information.
 * our system architecture SHOULD isolate components to reduce attack surface, and to minimize the impact of individual components being compromised.
 * resilience against data corruption SHOULD be a design goal for our system architecture, and be built into the software we write.
 * tools and processes SHOULD be designed to allow us to be responsive as well as proactive in ensuring security.
 * our deployment infrastructure and dependency management SHOULD make it easy to keep system components up to date.
 * our deployment infrastructure SHOULD make it easy to change configuration settings without disruption.
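
The data retention principle above ("retain it only as long as necessary") can be sketched as a simple purge step. The 90-day retention period, the record layout, and the `drop_expired` function are assumptions made for illustration and do not reflect any actual Wikimedia retention policy.

```python
from datetime import datetime, timedelta, timezone

# Assumed retention period, for illustration only.
RETENTION = timedelta(days=90)


def drop_expired(records, now, retention=RETENTION):
    """Keep only records collected within the retention period."""
    cutoff = now - retention
    return [r for r in records if r["collected_at"] >= cutoff]


# Invented sample data: one recent record, one past the cutoff.
now = datetime(2019, 4, 1, tzinfo=timezone.utc)
records = [
    {"id": 1, "collected_at": datetime(2019, 3, 15, tzinfo=timezone.utc)},
    {"id": 2, "collected_at": datetime(2018, 12, 1, tzinfo=timezone.utc)},
]
kept = drop_expired(records, now)
```

Passing `now` explicitly keeps the function deterministic and easy to test, in line with the testability principle elsewhere in this document.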

Motivation: These principles aim to guarantee good platform uptime as required by the Product Directions, by addressing issues discussed in the essay about Scale. They also aim to protect our users' privacy and reflect the best practice of system administration.

Additional Notes and Considerations (not authoritative)
This section contains some notes and considerations that do not fit the scope of architecture principles, and are beyond the authority of this document. These include engineering practices, processes, and community engagement which seem relevant in this context. They are included here for consideration, but are not prescriptive.

Product Guidance and Requirements

 * MediaWiki SHOULD provide equitable access to knowledge, for contribution and consumption.
 * MediaWiki SHOULD be built in a way that allows different user interfaces to be built for different tasks and audiences.
 * MediaWiki SHOULD be easy to install and upgrade in a development environment.

Processes and Practices

 * See also Technical Collaboration Guidance/Principles
 * Development processes SHOULD ensure good coverage with unit tests that enforce compliance with the documented interface contracts.
 * Unit tests are not a replacement for integration tests. We SHOULD have unit tests as well as integration tests and compliance tests.
 * The creation of user interface tests SHOULD be well integrated into the development process.
 * Documentation on all levels SHOULD explain the rationale behind specific design decisions.
 * Technical debt SHOULD only be incurred consciously. New technical debt should be documented and tracked explicitly, with a clear plan and timeline for reduction or elimination.
 * We SHOULD contribute upstream any improvements we make to tools and libraries we use, and be part of the ecosystem of the tools and libraries we use.
 * Our software architecture and development processes SHOULD be geared towards building an ecosystem of 3rd party re-users and all kinds of contributors to the code base (programmers, designers, documenters, translators), volunteers and professionals alike.
 * Document the rationale of engineering decisions, and make the trade-off considerations explicit.
 * Features and non-functional requirements are constantly traded off against engineering constraints and the estimated cost of overcoming such constraints. This requires constant iteration of the decision making process between product owners and engineers, and should involve the respective communities when appropriate.
 * Failures on all levels SHOULD be followed by a post-mortem analysis and documentation.
 * Engineering solutions SHOULD be "eventually consistent": we should allow for experiments and make breaking changes where needed, but we should aim for architectural coherence and avoid diverging technologies.
 * We SHOULD aim to be inclusive towards technical contributions from people who do not speak English.
 * Communication in technical spaces SHOULD be welcoming and constructive, and MUST follow the Code of Conduct.