MediaWiki Product Insights/Artifacts/Unraveling Complexity: Mapping MediaWiki Software Components into User-Driven Workflows

This document describes the initial attempt to map MediaWiki's system behavior onto user-driven workflows, the challenges that this attempt exposed, and what those challenges reveal about our system and its complexity.

Author: Moriel Schottlender

Introduction

As part of developing a product strategy for MediaWiki, multiple research questions were raised about the shape of Wikimedia's technical systems. One of those questions is how the essential workflows of the Wikimedia projects are currently served by the MediaWiki software ecosystem. As an initial step, product management defined six user workflows that together encompass the majority of user interactions. These workflows describe actual use cases of the system, and the goal was to map the system according to them.

The mapping exercise is exploratory and ongoing, and is scoped to lead to actionable steps and opportunities for iteration. Along the way, several challenges surfaced that shed light on the complexity of the system. This document details those challenges and describes a modeling strategy that can help us discover patterns and boundaries within the system.

The Workflows

As part of the product exploration, user behavior was divided into six workflows representing the real-life usage of wikis in Wikimedia production systems. The workflows are defined as:

Consume: reading pages, viewing images, reusing content, etc.
Edit: editing articles and data, bulk editing, semi-automated editing, etc.
Upload: uploading media
Patrol: watchlist, admin and oversight features, etc.
Communicate: notification system, talk pages, etc.
Discover: content search, discoverability of existing tools, documentation, etc.

This document describes the initial attempt to map MediaWiki's system behavior onto these workflows, the challenges that this attempt exposed, and what those challenges reveal about our system and its complexity.

Initial Approach: Mapping components to workflows

Initially, the idea was to map MediaWiki software components onto the given user-driven workflows. To do this, we first had to decide what a "component" is, since there is no agreed-upon definition of components within the system (as explained later in this document).

Furthermore, the mapping experiment focused on workflows from the user's perspective, which meant that "components" were not restricted to the MediaWiki Core repository. Instead, they covered a more conceptual "core behavior", some of which lives in extensions or in default or widely used gadgets.

As an experimental compromise, we used the list of components in the Maintainers list on MediaWiki.org. This list isn't perfect, but its general shape is valid for this experiment: it lists pieces of the software that are fairly distinct in functionality and ownership. While the code itself may not be as cleanly separated between components as we'd like, the conceptual separation is useful for this type of exercise. The list was then copied into a spreadsheet, where each "component" was tagged with the workflows it affects.

This exercise helped surface some initial insights about the structure of the system and the relationships between its code pieces, but it also exposed several challenges, inherent in the system's complexity, that proved insurmountable and required a change in strategy.

Challenges Encountered

MediaWiki is a complex monolithic system with many interrelated components, most of which are hard to define and describe individually, which makes them difficult to map onto the workflows. The vast majority of the "components" from the Maintainers list ended up belonging to all workflows, to no workflows, or only partially to some workflows.
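A minimal sketch of the tagging spreadsheet, assuming each row is simply a component name plus the set of workflows it was tagged with. The component names and their tags are hypothetical illustrations, not actual rows of the spreadsheet or the Maintainers list, and `classify` is likewise a hypothetical helper. It shows why the outcome was hard to use: a component tagged with every workflow, or with none, says nothing about which workflow depends on it.

```python
# Hypothetical rows from the tagging spreadsheet. Component names and tags are
# illustrative only; they are not the real spreadsheet contents.
WORKFLOWS = frozenset({"Consume", "Edit", "Upload", "Patrol", "Communicate", "Discover"})

tags: dict[str, set[str]] = {
    "ResourceLoader": set(WORKFLOWS),            # front-end delivery touches everything
    "Diff view": {"Consume", "Edit", "Patrol"},  # serves several workflows at once
    "Media upload": {"Upload"},
    "Job queue": set(),                          # hypothetical backend-only component
}


def classify(component_tags: set[str]) -> str:
    """Describe how informative a component's workflow tags are."""
    if component_tags >= WORKFLOWS:
        return "all"     # tagged with every workflow: the tag carries no information
    if not component_tags:
        return "none"    # no user-facing workflow at all
    return "subset"      # tagged with some workflows only


summary = {name: classify(t) for name, t in tags.items()}
print(summary)
```

In the real exercise, most rows fell into the first two buckets or needed caveats, which is what pushed the analysis away from this structure.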

This exercise exposed several challenges inherent in the shape and nature of the system.

"Components" are not properly defined in MediaWiki

MediaWiki lacks a clear definition of components, which makes it hard to identify and describe them precisely. Depending on the perspective, components can be defined as low-level classes or as higher-level behavioral groupings of code, but both strategies tend to fail when we try to draw boundaries around the scope and dependencies of these components.

Even when components can be defined, they turn out to be internally complex, serving multiple workflows either inherently or because of overloaded, extended behavior. Distinguishing between these two cases was difficult, and raised the question of whether a given component is fundamental to the system or needs simplification or decoupling.

User-driven workflows are not distinct

One fundamental challenge surfaced during the initial modeling: the user-driven workflows may not be fully distinct in terms of the system's operation. If a single system action can serve several workflows, it is hard to map components onto any one of them.

For example, examining a diff page raises the question of whether it falls under the "Patrol" workflow, the "Edit" workflow (during an edit conflict), or the "Consume" workflow (for historical research).

Another example is the distinction between the "Consume" and "Edit" workflows. There are two main ways to interpret them:

"Consume" and "Edit" relate to system processes. "Consume" is any workflow in which output (any HTML/JS for the user) is displayed on the screen, while "Edit" is the internal backend action of processing an edit.
"Consume" and "Edit" relate to the intent of the user. That is, "Consume" is any action where the user intends to consume information and "Edit" is any action the user takes with the intent to edit.

The first option is simpler to define in terms of system operation: any output on the screen is "Consume", which means components like ResourceLoader, OOUI and Codex always belong to the "Consume" workflow.

However, that would mean that the loading of VisualEditor on the screen is also a "Consume" workflow, and the "Edit" workflow only triggers when the user clicks the Publish button and includes the system operation that follows.

The second option is more conceptually correct and follows a user-driven perspective. However, modeling components into workflows this way would mean that ResourceLoader, OOUI and Codex are used for everything, so assigning them to any particular workflow carries no meaning.

Neither approach is wrong, but each takes the modeling, and the conclusions drawn from it, in a slightly different direction, and each requires different analytical considerations depending on the goal and intended use of the output.
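To make the difference concrete, here is a hypothetical sketch that classifies the same steps of an editing session under both interpretations. The `Step` fields and both classifier functions are illustrative assumptions, not part of MediaWiki; they encode only the two definitions given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One step in a hypothetical editing session."""
    description: str
    renders_output: bool  # the system sends HTML/JS to the user's screen
    saves_edit: bool      # the backend processes an edit
    user_intent: str      # "consume" or "edit", from the user's point of view


def workflow_by_system_process(step: Step) -> str:
    """Option 1: the workflow follows what the system does."""
    if step.saves_edit:
        return "Edit"
    return "Consume" if step.renders_output else "Unassigned"


def workflow_by_user_intent(step: Step) -> str:
    """Option 2: the workflow follows what the user is trying to do."""
    return "Edit" if step.user_intent == "edit" else "Consume"


session = [
    Step("Read the article", renders_output=True, saves_edit=False, user_intent="consume"),
    Step("Load VisualEditor", renders_output=True, saves_edit=False, user_intent="edit"),
    Step("Click Publish", renders_output=True, saves_edit=True, user_intent="edit"),
]

for step in session:
    print(f"{step.description:<18} system={workflow_by_system_process(step):<8} "
          f"intent={workflow_by_user_intent(step)}")
```

Only the middle step (loading VisualEditor) is classified differently, which is exactly where the two interpretations pull the modeling in different directions.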

Difficulty in Use Case Definition

MediaWiki and the Wikimedia production environment are complex not only in how the system operates, but also in how users use it. As a result, similar use cases can diverge significantly in the system paths and components they involve. This makes the mapping exercise more complex and delicate, and requires us to document many potential complexities along the way, even for use cases that appear straightforward.

For example, some of the exercises started with seemingly simple use cases, such as "An unregistered user edits a typo". We had to refine the use case so that we could create a straightforward model that represented the internal complexities without introducing so much divergence that it became impossible to model. For instance:

There is a significant system difference between mobile and desktop.
The upcoming Temporary Accounts for Unregistered Editors project means there may be a significant difference between an "Unregistered" user and a "Temporary" user.
There is significant divergence in system actions between fixing a typo in a wikitext article, in an on-wiki JSON or JavaScript document, or in a widely used template.

As a result, the refined use case ended up being defined as "An unregistered (not yet temporary) user edits existing wikitext content (fixing typo) on desktop".
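The refinement can be viewed as pinning down a set of dimensions. This hypothetical sketch enumerates only the variations listed above (platform, account type, and content type) to show how quickly a "simple" use case multiplies; the real space of production variation has more dimensions than these three.

```python
from itertools import product

# Dimensions of variation taken from the list above; the values are examples,
# not an exhaustive inventory of production variation.
platforms = ["desktop", "mobile"]
account_types = ["unregistered", "temporary"]
content_types = ["wikitext article", "JSON page", "JavaScript page", "widely used template"]

# Every combination is a distinct use case that may take a different system path.
variants = [
    f"{account} user edits {content} ({platform})"
    for platform, account, content in product(platforms, account_types, content_types)
]
print(len(variants))  # 2 * 2 * 4 = 16

# The refined use case fixes every dimension to a single value.
refined = "unregistered user edits wikitext article (desktop)"
assert refined in variants
```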

While we can go back and model the other variations, the resulting over-specified use case highlighted the intricacies involved in modeling and mapping.

Bottom-Up Mapping Limitations

The bottom-up approach, mapping from code and components to workflows, proved challenging for several reasons. The main issue was the mismatch between bottom-up components and top-down workflows. The workflows were defined from a product and user perspective (the "top" view), which is hard to project onto a code-driven division of components (the "bottom" view), especially in a system this entangled and given the challenges described above: the lack of definition, the complexity, and the interconnectedness.

It confined the focus to arbitrary, ill-defined slices of the component list, which hindered exploration of the system's inherent behavior, since a single behavior could involve parts of several components. The bottom-up view resulted not only in components belonging to multiple workflows, but potentially in pieces of components, or combinations of components, belonging to different workflows.

Balancing Ambiguity in Engineering Context

Navigating ambiguity in an engineering context proved difficult. While engineering tasks typically favor precision, this exploratory exercise required a delicate balance between engineering specificity and exploratory ambiguity. The exercise is meant to support a product-driven strategic view, which calls for a holistic, top-down look at the system. Such a view is hard to model from an engineering perspective, which tends to deal with the "bottom-up" parts of the system and with very specific, targeted objects.

A New, Holistic Approach

It became clear that if the goal of this exercise is to identify the pieces of the system that are inherent and "core" to its use (based on the workflows), then we need a more holistic approach to the modeling. Regardless of how the system is physically built (code, classes, and repositories), its behavior is a better basis for defining conceptual components. Those components can then be identified as inherent to the system or to certain workflows, and can help describe how the system works.

Leveraging Insights from Bottom-Up Mapping

While the bottom-up mapping exercise did not yield the desired solution, it provided valuable insights into the complexity and entanglement of the system. It highlighted the challenge of low-level components serving multiple workflows, prompting questions about their nature and potential simplification.

The process of sorting components by workflow is valuable on its own as a way to expose the challenges and questions about the shape of the system, the dependencies of code pieces, and the possible opportunities for technical improvements and further explorations.

The exercise could be opened to larger groups of interested participants as a way to expose more perspectives and discuss the questions raised.

Strategic Top-Down Mapping

To overcome the challenges, we shifted the approach to a strategic top-down mapping.

Since the goal is to describe the system based on the system behavior, we could focus on the behavior of the system as high-level conceptual components that describe the behavior and actions of the system, regardless of the structure of the low-level code. This will allow us to describe and model system behavior while exposing where similar behavior "clusters" are shared between our different products. These clusters of similar behavior will not necessarily dictate the new code architecture directly. Instead, they will provide a method of discovering where we have actionable opportunities to improve the system, and where we need to investigate further.

With this strategy, conceptual components represent pieces of logical action in the system regardless of where the code that operates them lives, or whether the action is backed by multiple code pieces or repositories. Notes regarding complexity are added to clarify potential pitfalls, but the system behavior is represented at a higher level where the behavior is more important than the specific implementation.

This is done by first describing representative, valuable use cases for each workflow, and then modeling the system's behavior as it currently exists in our production systems.
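A minimal sketch of how a conceptual component and a use case might be represented as data, assuming a component records only its name, the code pieces that back it, and notes about complexity. The class names, component names, and `backed_by` entries are hypothetical placeholders; the point is that a use case becomes a sequence of behaviors rather than a list of repositories.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptualComponent:
    """A logical action of the system, independent of where its code lives."""
    name: str
    backed_by: list[str] = field(default_factory=list)  # code pieces, possibly across repos
    notes: list[str] = field(default_factory=list)      # complexity caveats and pitfalls


@dataclass
class UseCase:
    """A representative use case, modeled as a sequence of conceptual components."""
    workflow: str
    description: str
    steps: list[ConceptualComponent]


# Hypothetical components; the backing code pieces are placeholders.
show_page = ConceptualComponent("Display page", backed_by=["core", "skin"])
open_editor = ConceptualComponent("Open editor", backed_by=["core", "editor extension"])
save_edit = ConceptualComponent(
    "Save edit",
    backed_by=["core", "extensions hooked into saving"],
    notes=["may diverge once temporary accounts are enabled"],
)

typo_fix = UseCase(
    workflow="Edit",
    description="An unregistered (not yet temporary) user edits existing "
                "wikitext content (fixing typo) on desktop",
    steps=[show_page, open_editor, save_edit, show_page],
)

print(" -> ".join(step.name for step in typo_fix.steps))
```

Because a component such as `show_page` is referenced by behavior rather than by repository, it can be reused across use cases and across wikis, which is what lets shared behavior "clusters" become visible.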

This shift in perspective allows for a more holistic understanding of how components contribute to user-driven workflows.

Multi-Wiki Mapping Exercise

Mapping the entire set of production use cases is a huge task, so the exercise needs to be scoped and timeboxed, with realistic expectations about its outcomes and limitations. We will start by focusing on three production wikis: English Wikipedia, German Wikipedia, and Wikidata. This adds an initial comparative dimension to the model, showing where the system's behavior is shared and where it diverges.

While this does not represent all Wikimedia use cases, it is a good starting point that will enable us to expose opportunities for further iterations while limiting the scope to achieve actionable goals. We are considering adding more production use cases (like Commons) to expose more boundaries within core behaviors, while balancing scope and complexity. The focus on specific Wikipedias allows us to demonstrate specific differences as felt by users, such as the effects of extensions, configuration, or widely used gadgets.

The goal is to identify where conceptual behaviors diverge and converge between the wikis, pinpointing base and common behaviors versus those unique to specific cases.
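The convergence and divergence analysis can be sketched as plain set operations over per-wiki behavior inventories. The inventories below are placeholders, not results of the exercise; the sketch only shows how "base" behavior (shared by all sampled wikis) and wiki-specific behavior fall out of intersections and differences.

```python
# Hypothetical behavior inventories; entries are placeholders, not findings.
behaviors: dict[str, set[str]] = {
    "English Wikipedia": {"render page", "save revision", "watchlist", "local gadget X"},
    "German Wikipedia": {"render page", "save revision", "watchlist", "configured extension Y"},
    "Wikidata": {"render page", "save revision", "watchlist", "structured data editing"},
}

# Base behavior: present on every wiki in the sample.
common = set.intersection(*behaviors.values())

# Unique behavior: present on one wiki and on none of the others.
unique = {
    wiki: own - set.union(*(other for name, other in behaviors.items() if name != wiki))
    for wiki, own in behaviors.items()
}

print("shared by all:", sorted(common))
for wiki, only_here in unique.items():
    print(wiki, sorted(only_here))
```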

Conclusion

Mapping MediaWiki software components into user-driven workflows is a complex endeavor that demands a thoughtful balance between precision and exploration.

The challenges encountered during the initial iteration underscore the need for a strategic top-down approach, leveraging use cases and conceptual components. The proposed solution, coupled with a multi-wiki perspective, aims to expose complexities in the system, show us where to focus next, and pave the way for a more comprehensive understanding and potential architectural enhancements.

This analysis contributes to the ongoing efforts to optimize the functionality and coherence of the MediaWiki product definition and architecture. As we continue the mapping exercise, we will be publishing more artifacts about the process and output. For more information, follow the MediaWiki Product Insights Reports.