Wikimedia Engineering Architecture Principles

The following architecture principles and requirements guide all Wikimedia engineering endeavors. They are derived from the Wikimedia movement's strategic direction and the Wikimedia Foundation's product strategy as well as established best practices of the software industry. They are informed by past experience as well as present needs and constraints, and are expected to evolve when these needs and constraints change.

The architecture principles are defined by the Wikimedia Technical Committee (TechCom). Any substantial change to the normative part of this document (labeled "Principles") must be discussed and approved using the TechCom's RFC process.

Application
The principles below are intended to guide engineering decisions on all levels, from detailed code review to high-level RFCs. People with merge rights on software in production on WMF servers, as well as people responsible for technical decision making and planning for such systems, are expected to know and apply these principles. Any new component we design should be checked against these principles. It is acknowledged, however, that not all existing code complies with all the principles. Teams working on such code should make an effort to improve compliance as far as reasonably possible. Prototype projects do not have to comply with all principles immediately, but have to be made compliant before they are made available to the public, including as "beta" or "pilot" functionality.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119. Any proposed change to the code or design that violates a MUST principle is to be rejected. Any violation of a SHOULD principle requires documentation of the rationale for violating the principle. When existing code is discovered to violate a MUST or SHOULD principle, steps for making the code compliant with the architecture principles need to be planned.

Principles
 The architecture principles are derived from strategic and operational goals of the organization, as understood and interpreted by TechCom. This document refers to these goals to document the motivation for the architecture principles.

Product strategy documents, for reference:


 * Movement Strategy outcomes and report (Phase One, September 2017)
 * Wikimedia Foundation Goals and Priorities (Medium-term plan 2019)
 * Product Directions (Wikimedia Technical Conference 2018, Toby Negrin)

Point of View Essays:

Trust:


 * Reliability
 * Transparency
 * Accountability

Experience:


 * Form Factor
 * Rich Content
 * Contributors
 * Customization
 * Discovery

Scale:


 * Community
 * Content
 * Resilience
 * Ubiquity

Augmentation:


 * Content Curation
 * Content Generation
 * Governance
 * Machine Translation

Culture:


 * Inclusion
 * Language
 * Content Gaps

Tools:


 * Tools for Developers
 * Tools for Organizers
 * Tools for Moderators

To allow users to consume, create, and interact in a form suitable for their devices, with the connectivity they have, in a language they speak,


 * EQUITY/DEVICE: software that interacts with users MUST be designed to make key functionality available on devices with a variety of capabilities and restrictions, with tradeoffs explicitly considered for mobile form factors and connectivity.
 * EQUITY/LANGUAGE: software that interacts with users MUST support internationalization. Internationalization mechanisms SHOULD be consistent across platforms.
 * EQUITY/ACCESS: software that interacts with users MUST consider accessibility concerns and SHOULD follow accessibility guidelines.

Motivation: These principles aim at the equity goal set by the Movement Strategy Outcomes and reflected in the Wikimedia Foundation's Goals and Priorities: "Grow participation globally, focusing on emerging markets". They address issues discussed throughout the Point of View essays, especially the ones on Inclusion, Content, Ubiquity, and Form Factor, as well as the one on Tools for Moderators.

To empower contributors to collaboratively grow and curate content, and to build the tools that they need to do so,


 * EMPOWER/TWEAK: our software SHOULD provide APIs, libraries, and services that empower the community to develop workflows using programming languages, while ensuring the safety and maintainability of custom programs.
 * EMPOWER/COMBINE: software that manages or interacts with user-editable content SHOULD support close integration of different media and data types from local and remote sources.

Motivation: These principles aim at the equity goal set by the Movement Strategy Outcomes and reflected in the Wikimedia Foundation's Goals and Priorities: "Modernize our product experience", particularly by "integrating content from Commons, Wikidata, Wikisource and other projects into Wikipedia". They address issues discussed in the Reliability, Resilience, Inclusion, and Contributors essays. The need to empower the community to contribute code and build their own tools is stressed especially in the Tools for Developers chapter, while the Content Curation chapter calls for communities to contribute to training AI systems.

To provide public APIs that allow efficient interaction with wiki content, as well as provide data that can be easily processed and reused in bulk,


 * API/DOMAIN: for all interactions defined on the abstract domain model, stable public APIs MUST exist. APIs geared towards a specific user interface MUST be considered part of the component that implements that user interface, and MAY be considered private to that component.
 * API/STRUCTURED: our software MUST be designed in a way that makes all content and public meta-data available as structured data or media with semantic markup (e.g. as JSON or as annotated HTML).
 * API/SCHEMA: data we offer for re-use MUST use clearly specified data schemas and SHOULD be based on widely used open standards.
 * API/GRANULAR: public APIs SHOULD allow access to content with high granularity.
 * API/NOWIKI: APIs MUST be designed to avoid any need for clients to process wikitext.
 * API/SOURCE: data formats and APIs that provide access to user generated content SHOULD be designed to ensure verifiability through the integration of provenance information.
 * API/LICENSE: data formats and APIs that provide access to user generated content MUST be designed to provide easy access to all relevant information about authorship and licensing.

Motivation: These principles aim at the Knowledge as a Service goal set by the Movement Strategy Outcomes and the Wikimedia Foundation's Goals and Priorities: "enable people, institutions, sites and machines to create, share, and access the sum of all knowledge, on and off the Wikimedia sites". They address issues around the creation of specialized user experiences mentioned throughout the Point of View essays, especially in the Form Factor chapter of the Experience topic. They also touch upon issues related to verifiability and licensing, as discussed in the Reliability essay, and the need for distributing our content through APIs and as dumps, as discussed in the Resilience essay.

To provide an open-source software stack that can be easily used, modified, and extended by others,


 * OPEN/FLOSS: software we write MUST be published under a free license, so that third parties can audit our code and create forks of the code base.
 * OPEN/REUSE: software components that implement broadly applicable functionality SHOULD be designed for reusability and be published for re-use.
 * OPEN/RELEASE: software that exposes public interfaces (as libraries, frameworks, hooks, or APIs do) SHOULD be subject to release management with clear and consistent versioning. Any breaking changes to such interfaces then MUST be announced in a timely and predictable manner over relevant channels.
 * OPEN/STABLE: any elements scheduled for removal from a stable public interface MUST be documented to be deprecated beforehand, and SHOULD be kept for backwards compatibility for a reasonable time.
 * OPEN/MODULAR: our software architecture SHOULD be modular, with components exposing narrow interfaces, to allow components to be replaced and refactored while maintaining a stable interface towards other components as well as 3rd party extensions.
 * OPEN/GIANTS: our software SHOULD be built on top of reliable, documented, well-maintained open-source components.

Motivation: These principles aim at the equity goal set by the Movement Strategy Outcomes and the Product Directions in a broad sense, enabling third-party developers to re-use our software components for their own purposes. The need for a modular architecture is also evident in the call for specialized user experiences throughout the Point of View essays, such as the one on Rich Content.
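As a rough illustration of OPEN/STABLE, an element scheduled for removal can be kept as a thin alias that still works but warns callers and names its replacement. The sketch below is in Python only for brevity (MediaWiki itself is written in PHP); the `deprecated` decorator and the functions `get_page_text` and `fetch_page_text` are hypothetical names chosen for the example, not part of any Wikimedia interface.

```python
import functools
import warnings


def deprecated(replacement: str, since: str):
    """Mark a public function as deprecated while keeping it working.

    The wrapped function still runs (backwards compatibility), but every
    call emits a DeprecationWarning naming the replacement and the version
    in which the deprecation was announced.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated since {since}; "
                f"use {replacement} instead",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller, not here
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Hypothetical new interface.
def fetch_page_text(title: str) -> str:
    return f"text of {title}"


# Hypothetical old interface, kept as a forwarding alias until its removal.
@deprecated(replacement="fetch_page_text", since="1.2.0")
def get_page_text(title: str) -> str:
    return fetch_page_text(title)


if __name__ == "__main__":
    warnings.simplefilter("always", DeprecationWarning)
    # Still returns the page text, but also reports a DeprecationWarning.
    print(get_page_text("Main Page"))
```

The alias keeps existing callers working during the deprecation period, while the warning gives them the documented migration path before the element is removed.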

To maintain a code base that can be modified with confidence and readily understood,


 * SOLID/MODEL: our software architecture SHOULD follow explicitly specified domain models that define relevant entities and actions (nouns and verbs), to facilitate clear communication between software components as well as among people.
 * SOLID/TEST: all code SHOULD be designed for testability.
 * SOLID/COVER: the software we develop MUST provide a test suite, which SHOULD be sufficiently detailed, comprehensive, and noticeable to give developers confidence that their changes did not break anything.
 * SOLID/DOCS: comprehensive documentation MUST be maintained along with the code.

Motivation: These principles aim at the goal of being Extensible and Sustainable and the need for fast and confident deployment defined in the Product Directions and reflected in the Wikimedia Foundation's Goals and Priorities, which calls for a "fully automated and continuous code health and deployment infrastructure", requires tooling to be "easy to use, well-documented, and accessible", and states that "intentional focus on code quality and testing will allow for more innovative and faster experimentation". More generally, these principles reflect best practices of software engineering processes.

To provide a web application that can be freely used to collaboratively collect and share knowledge,


 * RUN/EVERYWHERE: the basic MediaWiki stack MUST be easy to deploy on standard hosting platforms.
 * RUN/CHEAP: small MediaWiki instances MUST function in low-budget hosting environments.
 * RUN/MORE: for every component and feature, the intended target audience and supported target platform MUST be clearly defined.
 * RUN/EASY: it SHOULD be possible to install and upgrade MediaWiki without much technical knowledge.
 * RUN/UPDATE: it MUST be possible to upgrade MediaWiki without the risk of losing content or disrupting operation.

Motivation: These principles aim at the equity goal set by the Movement Strategy Outcomes and the Product Directions, working to provide others with the tools to collect, curate, and share knowledge. More generally, they aim at the values of the Free Software Movement, to allow others to run, study, modify, and share our software.

To ensure availability and performance of WMF projects through scalable and resilient system design,


 * FAST/SCALE: horizontal scalability SHOULD be a design goal on all levels of the architecture, especially for storage and query mechanisms.
 * FAST/MEASURE: observability and analytic instrumentation SHOULD be explicitly considered in the design of new components and services.
 * FAST/LOAD: our software and infrastructure SHOULD be designed to be resilient against spikes in demand and failure of backend systems.
 * FAST/ROUTE: services and APIs SHOULD be designed to allow the identification of read-only and read-write HTTP requests to optimize routing and caching.
 * FAST/GEO: storage and caching systems SHOULD be designed with a distributed multi-datacenter architecture in mind.
 * FAST/CHANGE: system design and technology choice SHOULD aim to reduce operational overhead and the cost of change management.

Motivation: These principles aim to provide "infrastructure to serve [content] with high performance, high redundancy, and low latency to all parts of the world" as requested by the Wikimedia Foundation's Goals and Priorities, by addressing issues discussed in detail in the essays under the Scale topic.
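One way to satisfy FAST/ROUTE is to make the read/write distinction visible in the HTTP method itself, so that a router or cache can decide without inspecting the request body. The sketch below assumes that the safe methods defined by HTTP semantics (GET, HEAD, OPTIONS, TRACE) are read-only and treats everything else as read-write; the backend names, the `Route` type, and the `route` function are hypothetical, and in this sketch only GET and HEAD responses are marked cacheable.

```python
from dataclasses import dataclass

# Safe (read-only) methods as defined by HTTP semantics (RFC 9110).
READ_ONLY_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "TRACE"})
# Of those, this sketch only lets GET and HEAD responses be cached.
CACHEABLE_METHODS = frozenset({"GET", "HEAD"})


@dataclass(frozen=True)
class Route:
    backend: str     # hypothetical backend pool name
    cacheable: bool  # whether an edge cache may store the response


def route(method: str) -> Route:
    """Choose a backend using only the request method, not the body."""
    m = method.upper()
    if m in READ_ONLY_METHODS:
        # Read-only: may be served by a nearby replica.
        return Route(backend="read-replica", cacheable=m in CACHEABLE_METHODS)
    # Anything else (POST, PUT, PATCH, DELETE, ...) is treated as read-write:
    # send it to the primary and bypass the cache.
    return Route(backend="primary", cacheable=False)


if __name__ == "__main__":
    for method in ("GET", "post", "OPTIONS", "DELETE"):
        print(method, route(method))
```

Keying the decision on the method keeps routing cheap and predictable; an API that tunnels writes through GET would defeat it, which is why the principle asks for the distinction to be designed in.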

To ensure the data integrity of the content on WMF systems, and protect the privacy of our users,


 * SECURE/AUSTERE: our software systems MUST be designed to only collect data needed for a specific purpose, and retain it only as long as necessary.
 * SECURE/PRIVATE: our software and infrastructure MUST be designed to prevent unauthorized access to sensitive information.
 * SECURE/ISOLATED: our system architecture SHOULD isolate components to reduce attack surface, and to minimize the impact of individual components being compromised.
 * SECURE/INTEGRITY: resilience against data corruption MUST be a design goal for our system architecture, and be built into the software we write.
 * SECURE/ACT: tools and processes SHOULD be designed to allow us to be responsive as well as proactive in ensuring security.
 * SECURE/PATCH: our deployment infrastructure and dependency management SHOULD make it easy to keep system components up to date.
 * SECURE/CONFIG: our deployment infrastructure SHOULD make it easy to change configuration settings without disruption.

Motivation: These principles aim to guarantee good platform uptime as required by the Product Directions, by addressing issues discussed in the essays under the Scale topic. They aim to provide software "that ships with testing, analytics, monitoring, security and privacy built in" as specified in the Wikimedia Foundation's Goals and Priorities, and reflect best practices of system administration.

Additional Notes and Considerations (not normative)
This section contains some notes and considerations that do not fit the scope of architecture principles, and are beyond the authority of this document. These include engineering practices, processes, and community engagement which seem relevant in this context. They are included here for consideration, but are not prescriptive.

Product Guidance and Requirements

 * MediaWiki SHOULD provide equitable access to knowledge, for contribution and consumption.
 * MediaWiki SHOULD be built in a way that allows different user interfaces to be built for different tasks and audiences.
 * MediaWiki SHOULD be easy to install and upgrade in a development environment.

Processes and Practices

 * See also Technical Collaboration Guidance/Principles
 * See also the Architecture guidelines and Manual:Coding conventions/PHP for best practices when making changes to MediaWiki
 * Development processes SHOULD ensure good coverage with unit tests that enforce compliance with the documented interface contracts.
 * Unit tests are not a replacement for integration tests. We SHOULD have unit tests as well as integration tests and compliance tests.
 * The creation of user interface tests SHOULD be well integrated into the development process.
 * Documentation on all levels SHOULD explain the rationale behind specific design decisions.
 * Technical debt SHOULD only be incurred consciously. New technical debt should be documented and tracked explicitly, with a clear plan and timeline for reduction or elimination.
 * We SHOULD contribute upstream any improvements we make to tools and libraries we use, and be part of the ecosystem of the tools and libraries we use.
 * Our software architecture and development processes SHOULD be geared towards building an ecosystem of 3rd party re-users and all kinds of contributors to the code base (programmers, designers, documenters, translators), volunteers and professionals alike.
 * Document the rationale of engineering decisions, make explicit the trade-off considerations.
 * Features and non-functional requirements are constantly traded off against engineering constraints and the estimated cost of overcoming such constraints. This requires constant iteration of the decision making process between product owners and engineers, and should involve the respective communities when appropriate.
 * Failures on all levels SHOULD be followed by a post-mortem analysis and documentation.
 * Engineering solutions SHOULD be "eventually consistent": we should allow for experiments and make breaking changes where needed, but we should aim for architectural coherence and avoid diverging technologies.
 * We SHOULD aim to be inclusive towards technical contributions from people who do not speak English.
 * Communication in technical spaces SHOULD be welcoming and constructive, and MUST follow the Code of Conduct.