Wikimedia Engineering Architecture Principles

The Wikimedia Technical Committee believes the following principles and requirements SHOULD guide the Platform Evolution Program and other Wikimedia engineering endeavors.

These principles and requirements are derived from strategic and operational goals of the organization, as understood and interpreted by TechCom. This document is not intended to define these goals; it merely restates them to provide a rationale for the engineering principles we set forth. That is to say, each goal is the input from which the principles listed below it are derived.

Product Strategy, for reference:


 * Movement Strategy outcomes and report (Phase One, September 2017)
 * Product Directions (Wikimedia Technical Conference 2018, Toby Negrin)

Points of View Essays:

Trust:


 * Reliability
 * Transparency
 * Accountability

Experience:


 * Form Factor
 * Rich Content
 * Contributors
 * Customization
 * Discovery

Scale:


 * Community
 * Content
 * Resilience
 * Ubiquity

Augmentation:


 * Content Curation
 * Content Generation
 * Governance
 * Machine Translation

Culture:


 * Inclusion
 * Language
 * Content Gaps

Tools:


 * Tools For Developers
 * Tools For Organizers
 * Tools For Moderators

To allow users to consume, create, and interact in a form suitable for their devices, with the connectivity they have, in a language they speak:

 * Software that interacts with users SHOULD be designed to make key functionality available on devices with a variety of capabilities and restrictions, as well as potentially limited connectivity.
 * Software that interacts with users MUST support internationalization and SHOULD follow accessibility guidelines. Internationalization mechanisms SHOULD be consistent across platforms.

[Equity / Readers, Editors]
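
As an illustration of the internationalization requirement, the following is a minimal sketch of message lookup with language fallback. The message catalog, the fallback table, and the `message` function are hypothetical names chosen for this example; they do not describe any existing Wikimedia API.

```python
# Hypothetical message catalog: message key -> text, per language code.
MESSAGES = {
    "en": {"save": "Save changes", "greeting": "Hello, {name}!"},
    "de": {"save": "Änderungen speichern"},
}

# Hypothetical fallback chains, tried in order when a message is missing.
FALLBACKS = {"de-at": ["de", "en"], "de": ["en"]}


def message(key: str, lang: str, **params: str) -> str:
    """Return the message in `lang`, or in the first fallback that has it."""
    for code in [lang, *FALLBACKS.get(lang, ["en"])]:
        text = MESSAGES.get(code, {}).get(key)
        if text is not None:
            return text.format(**params)
    # No translation anywhere: return a visible placeholder, not an error.
    return f"<{key}>"


print(message("save", "de-at"))
print(message("greeting", "de", name="Ada"))
```

Keeping user-visible strings out of the code in this way is one approach that would let the same lookup mechanism be shared across platforms.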

To empower contributors to collaboratively grow and curate content, and to build the tools that they need to do so:

 * Our software SHOULD provide APIs and libraries that empower the community with ways to develop workflows using scripting languages, and that ensure the safety and maintainability of custom scripts.
 * Software that manages or interacts with user-editable content SHOULD support close integration of various kinds of content.

[Equity / Editors, Power-Users, Admins]

To provide public APIs that allow efficient interaction with wiki content, as well as provide data that can be easily processed and reused in bulk:

 * For all interactions defined on the abstract domain model, stable public APIs MUST exist. APIs geared towards a specific user interface MUST be considered part of the component that implements that user interface, and MAY be considered private to that component.
 * Our software SHOULD be designed in a way that makes all content and public meta-data available as structured data or media with semantic markup (annotated HTML).
 * Data we offer for re-use MUST use clearly specified data schemas and SHOULD be based on widely used open standards.
 * Public APIs SHOULD allow access to content with high granularity.
 * APIs SHOULD be designed to avoid any need for clients to process wikitext.
 * Data formats and APIs that provide access to user generated content SHOULD be designed to ensure verifiability through the integration of provenance information.
 * Data formats and APIs that provide access to user generated content SHOULD be designed to provide easy access to all necessary licensing information.

[KAAS, Sustainability / 3rd party developers]
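
To make the points about granularity, provenance, and licensing more concrete, here is a minimal sketch of what a reusable content record could look like. The `ContentFragment` type, its field names, and the example URL are assumptions made for illustration; they are not an existing schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ContentFragment:
    """Hypothetical re-use record: content plus provenance and licensing."""

    page_title: str
    section: str       # granularity: a single section, not the whole page
    html: str          # annotated HTML, so clients need not process wikitext
    revision_id: int   # provenance: the revision the content was taken from
    source_url: str    # provenance: where that revision can be verified
    license_id: str    # licensing: identifier of the content license
    attribution: str   # licensing: credit line a re-user would display


def to_json(fragment: ContentFragment) -> str:
    """Serialize with sorted keys so the output is stable and easy to diff."""
    return json.dumps(asdict(fragment), sort_keys=True)


fragment = ContentFragment(
    page_title="Example",
    section="History",
    html="<p>Example text.</p>",
    revision_id=12345,
    source_url="https://wiki.example.org/w/index.php?oldid=12345",
    license_id="CC-BY-SA-4.0",
    attribution="Example wiki contributors",
)
print(to_json(fragment))
```

Carrying the revision and license alongside the content means a re-user would not need a second request to verify or attribute what they received.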

To provide an open-source software stack that can be easily used, modified, and extended by others:

 * Software components SHOULD be designed to be reusable, and be published for re-use.
 * Software that exposes public interfaces (as libraries, frameworks, hooks, or APIs do) SHOULD be subject to release management with clear and consistent versioning. Any breaking changes to such interfaces then MUST be announced in a timely and predictable manner over relevant channels.
 * Any elements scheduled for removal from a stable public interface MUST be documented to be deprecated beforehand, and SHOULD be kept for backwards compatibility for a reasonable time.
 * Our software architecture SHOULD be modular, with components exposing narrow interfaces, to allow components to be replaced and refactored while maintaining a stable interface towards other components as well as 3rd party extensions.
 * Our software SHOULD be built on top of reliable, documented, well-maintained open-source components.

[FLOSS, Equity / professional 3rd party developers and volunteers]
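
The deprecation and versioning requirements above can be sketched as follows. This is an illustration only: the `deprecated` decorator, the `is_breaking_change` helper, the version numbers, and the assumption of MAJOR.MINOR.PATCH versioning are all hypothetical and are not prescribed by this document.

```python
import functools
import warnings


def deprecated(since: str, removal: str, replacement: str):
    """Mark a public function as deprecated while keeping it functional."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated since {since} and scheduled "
                f"for removal in {removal}; use {replacement} instead.",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller's code
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


def is_breaking_change(old_version: str, new_version: str) -> bool:
    """Assuming MAJOR.MINOR.PATCH versions, a major bump signals a break."""
    return int(new_version.split(".")[0]) > int(old_version.split(".")[0])


def render_page(title: str) -> str:
    """The current, stable entry point."""
    return f"<h1>{title}</h1>"


@deprecated(since="2.3.0", removal="3.0.0", replacement="render_page")
def render(title: str) -> str:
    # Old entry point, kept for backwards compatibility until removal.
    return render_page(title)
```

Callers of `render` keep working but see a warning that names the replacement and the release in which the old function is expected to disappear.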

To maintain a code base that can be modified with confidence and readily understood:

 * Our software architecture SHOULD follow explicitly specified domain models that define relevant entities and actions (nouns and verbs), to facilitate clear communication between software components as well as among people.
 * All code SHOULD be designed for testability.
 * The software we develop MUST provide a test suite, which SHOULD be sufficiently detailed, comprehensive, and visible to give developers confidence that their changes did not break anything.
 * Comprehensive documentation SHOULD be maintained along with the code.

[Sustainability, FLOSS / developers]

To provide a web application that can be freely used to collaboratively collect and share knowledge:

 * The MediaWiki stack SHOULD be easy to deploy on standard hosting platforms.
 * Small MediaWiki instances SHOULD function in low-budget hosting environments.
 * MediaWiki SHOULD be easy to install and upgrade without much technical knowledge, and without the risk of losing content or disrupting operation.

[FLOSS, Equity / wiki-owners]

To ensure availability and performance of WMF projects through scalable and resilient system design:

 * Horizontal scalability SHOULD be a design goal on all levels of the architecture, especially for storage and query mechanisms.
 * Our software and infrastructure SHOULD be designed to be resilient against spikes in demand and failure of backend systems.
 * Services and APIs SHOULD be designed to allow the identification of read-only and read-write HTTP requests to optimize routing and caching.
 * Storage and caching systems SHOULD be designed with a distributed multi-datacenter architecture in mind.
 * System design and technology choice SHOULD aim to reduce operational overhead and the cost of change management.

[Scalability, Resilience, KAAS / Users, Opsen]
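
One common way to identify read-only requests is by HTTP method. The sketch below assumes that services follow the HTTP convention that "safe" methods do not modify state; the backend names returned by `choose_backend` are hypothetical placeholders, not actual infrastructure.

```python
# Methods defined as "safe" (read-only) by the HTTP semantics specification.
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "TRACE"})


def is_read_only(method: str) -> bool:
    """Classify a request by method; only valid if handlers honor safety."""
    return method.upper() in SAFE_METHODS


def choose_backend(method: str) -> str:
    """Hypothetical routing rule: reads may be served from a nearby cache
    or replica, while writes go to the primary datacenter."""
    if is_read_only(method):
        return "nearest-datacenter-cache"
    return "primary-datacenter"


for method in ("GET", "POST", "head", "DELETE"):
    print(method, "->", choose_backend(method))
```

This only works if read operations are never exposed through state-changing methods and vice versa, which is why the distinction needs to be part of the API design.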

To ensure the data integrity of the content on WMF systems, and protect the privacy of our users:

 * Our software systems SHOULD be designed to only collect data we need, and retain it only as long as necessary.
 * Our software and infrastructure SHOULD be designed in such a way as to prevent unauthorized access to sensitive information, and to minimize the impact of individual components getting compromised.
 * Our system architecture SHOULD isolate components to reduce attack surface while minimizing system complexity.
 * Resilience against data corruption SHOULD be a design goal for our system architecture, and be built into the software we write.
 * Tools and processes SHOULD be designed to allow us to be responsive as well as proactive in ensuring security.
 * Our deployment infrastructure and dependency management SHOULD make it easy to keep system components up to date.
 * Our deployment infrastructure SHOULD make it easy to change configuration settings without disruption.

[Security, Privacy / Users]
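
The first principle above (collect only what is needed, retain it only as long as necessary) can be sketched as follows. The 90-day retention period, the allowed field names, and the event layout are assumptions chosen for the example; they do not reflect an actual retention policy.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)           # assumed retention period
ALLOWED_FIELDS = {"timestamp", "action"}  # assumed minimal set of fields


def minimize(event: dict) -> dict:
    """Drop every field that is not explicitly needed before storing."""
    return {key: value for key, value in event.items() if key in ALLOWED_FIELDS}


def purge_expired(events: list, now: datetime) -> list:
    """Keep only events that are still within the retention period."""
    cutoff = now - RETENTION
    return [event for event in events if event["timestamp"] >= cutoff]


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
old = {"timestamp": now - timedelta(days=100), "action": "edit", "ip": "192.0.2.1"}
recent = {"timestamp": now - timedelta(days=1), "action": "view", "ip": "192.0.2.2"}

stored = [minimize(old), minimize(recent)]
stored = purge_expired(stored, now)
print(stored)
```

Using an allow-list rather than a deny-list means that a newly added field is not stored unless someone decides it is needed.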

Additional Notes and Considerations
This section contains some notes and considerations that do not fit the definition of architecture principles, but seem relevant in this context nonetheless. These include engineering practices, processes, and community engagement.

Product Guidance and Requirements

 * MediaWiki should provide equitable access to knowledge, for contribution and consumption.
 * MediaWiki should be built in a way that allows different user interfaces to be built for different tasks and audiences.
 * MediaWiki SHOULD be easy to install and upgrade in a development environment.

Processes and Practices

 * See also Technical_Collaboration_Guidance/Principles
 * Development processes SHOULD ensure good coverage with unit tests that enforce compliance with the documented interface contracts.
 * Unit tests are not a replacement for integration tests. We should provide both.
 * The creation of user interface tests should be well integrated into the development process.
 * Documentation on all levels SHOULD explain the rationale behind specific design decisions.
 * Technical debt SHOULD only be incurred consciously. New technical debt should be documented and tracked explicitly, with a clear plan and timeline for reduction or elimination.
 * We SHOULD contribute upstream to tools and libraries we use, and be part of the ecosystem of the tools and libraries we use.
 * Our software architecture and development processes SHOULD be geared towards building an ecosystem of 3rd party re-users and all kinds of contributors to the code base (programmers, designers, documenters, translators), volunteers and professionals alike.
 * Document the rationale of engineering decisions, and make the trade-off considerations explicit.
 * Features and non-functional requirements are constantly traded off against engineering constraints and the estimated cost of overcoming such constraints. This requires constant iteration of the decision making process between product owners and engineers, and should involve the respective communities when appropriate.
 * Failures on all levels should be followed by a post-mortem analysis and documentation.
 * Engineering solutions should be "eventually consistent": we should allow for experiments and make breaking changes where needed, but we should aim for architectural coherence and avoid diverging technologies.
 * We should aim to be inclusive towards technical contributions from people who do not speak English.
 * Communication in technical spaces should be welcoming, constructive, and follow the Code of Conduct.