MediaWiki Product Insights/Artifacts/Platform Evolution: Holistic mapping of MediaWiki software

From mediawiki.org

This document describes the holistic technical mapping project conducted on the MediaWiki software: the process of the mapping exercise, some of the insights that were gained from it, and how it informs future explorations into evolving MediaWiki’s architecture.

Author: Moriel Schottlender

The purpose

As the Product strategy of MediaWiki was being formulated, we needed to explore the current shape of the system for two main reasons. The first was to get a holistic picture of what the software does, and how the technical side connects to the way users interact with our products. The second was to delve into the architecture so we could identify opportunities to evolve the system towards the goals that Product Strategy has outlined as MediaWiki’s product requirements and direction for the future.

For more understanding of the process, please refer to the previous artifact: "Unraveling Complexity: Mapping MediaWiki Software Components into User-Driven Workflows."

Change of approach

The previous document outlined the first attempt to map the system by trying to connect the underlying code pieces (and trying to define technical components in the system) into user-driven workflows. That exercise proved useful in showing the entanglement between our components and an overview of some of the challenges in the system, but it also proved lacking in providing higher level architecture insights regarding how to evolve the system, or how to properly inform product strategy.

This caused us to change the approach of the modeling: instead of mapping underlying code (which is often entangled and has fuzzy component boundaries), we changed course to map the overarching holistic behavior of the system. The same user workflows were used, but this time, we chose representative use-cases per workflow, and modeled the technical behavior.

The mapping process

The mapping process itself involved a holistic “higher level” exploration of the system's behavior, using user workflows from a product perspective, with the goal of representing the differences in behavior. We wanted to see where (and how) behavior of a use case is the same for our wikis, where it splits apart between the different wikis, and whether those divergent actions can give us some insight into the architecture, purpose and structure of the platform.

We chose three wikis for the mapping exercise: English Wikipedia, German Wikipedia, and Wikidata. This was meant to capture variations in user workflows on our deployed systems, and show where the system itself diverges internally to accommodate those behaviors.

Because this process concentrated on “high level” external behavior, there was a risk of glossing over some valid complexities. When that happened, we marked areas of complexity for further exploration. By selecting a representative set of use cases per workflow, we enabled the discovery of complexities not only between different types of wikis but also within individual wikis and user configurations.

The sessions

The mapping sessions were conducted collaboratively, with 2-3 engineers participating in each session. Participants included individuals with varying levels of experience and expertise, comprising both subject-matter experts and non-experts.

Using groups with varying experiences and contexts encouraged meaningful discussions and enabled participants to question system behavior, uncover complexities, and explore areas that may have been overlooked. It also helped us navigate the boundaries between “too broad” and “too specific”, which were among the concerns when the mapping was designed.

Observations and Initial Results

The mapping process produced a number of meaningful insights, most of which emerged through the discussions and questions that came up either within the mapping session or afterwards in the examination of the model.

Many of the questions raised led to discussions about whether some code distribution between components was still necessary or could be merged, and whether some of the system behavior suggests an alternative way to define components – both in terms of code organization and in terms of areas of maintenance and ownership.

One notable success of the mapping exercise was the establishment of a method for modeling workflows between different wikis, irrespective of their combination and complexity. We can use this methodology in the future beyond this exercise to uncover and clarify complexities in the system and how they relate to supporting user workflows.

Complexities and Divergences

Another interesting insight was that the complexities and divergences are not solely confined to differences between types of wikis (e.g., Wikidata vs. Wikipedia) but also exist within individual wikis and user configurations. This is not a new insight, but the model itself shed light on some of the divergences in a way that raised questions about the differences between behavior and code distribution, ownership, maintainability, configuration, and added complexity.

Conceptual actions often involve multiple pieces of code, raising questions about historical reasons and the potential need to reassess the placement and fragmentation of code.

Purpose and benefits

The main benefit of this exercise was in the conversations and insights that were gained into how the system operates, the places where behavior diverges, and how code fragmentation impacts elements like ownership, maintainability, and component boundaries.

This exercise informed both the product strategy work and some opportunities for evolving the system, but it is far from complete. This type of exercise can be used to continuously understand the system and identify leverage points for improvement – places where relatively small changes can bring impactful evolution of the software.

As an example, we can look at some insights from mapping the use case “Unregistered (not yet temporary) user editing existing wikitext content (fixing typo) on desktop”.

Insights

It’s worth noting that the first valuable insight was in the eventual framing of the use case itself. This was discussed in the previous document; the simple use case (“unregistered user editing content”) had to be made progressively more specific, because of the divergences in behavior that came up. An unregistered user that edits a mass-used template, for example, would result in several other system actions. For simplicity, and to avoid decision paralysis (and, frankly, to be able to represent the behavior in a legible way) we had to make the use case specific.

While this may be obvious to some, it does point to an interesting impact of how our users’ actions are represented in the system, and the amount of diverging options that can happen as a result.

You can see the full modeling of this use case here.

Another valuable insight happened when we looked at the action “Decide which editor to display”. In general, when a user clicks the “Edit” button, the system performs internal checks that, among other things, include whether the user is allowed to edit the page, and which editor to display.

This behavior showed an interesting divergence when we looked at MediaWiki generally, MediaWiki as deployed for the Wikipedias, and MediaWiki’s Wikidata instance.

A snippet from the holistic use case mapping of MediaWiki software for the use case of "Unregistered (not yet temporary) user editing existing wikitext content (fixing typo) on desktop"

This behavior is shared by all MediaWiki instances, but the code making the decision is split in both timing and operation. On the Wikipedias, the decision accounts for user preferences, wiki preferences, and the type of content; on Wikidata, it is based primarily on the data type of the content. On Wikipedia, the operation happens primarily after the user clicks the “Edit” button; on Wikidata, the considerations are split, with some performed on page load and some on clicking “Edit”. Extensions can also play a part in the decision, and there are multiple points where the system makes a decision that is then changed by external logic.

This shouldn’t be entirely surprising. Wikidata’s editable items are much simpler and more straightforward, and the decision of which editor to display is available “cheaply” on load. Wikipedia’s operation, however, requires us to perform some more “expensive” operations to decide which editor to display, which involve checking information about user permissions, blocking state, and other preferences that make sense to check only after the user indicates they’re intending to edit, rather than on every page load.

This insight, then, isn’t about whether this behavior is “correct” or not. This insight exposes another aspect of the system: component boundaries. The code itself is split up and fragmented between multiple locations; some of it makes base decisions and then allows for changes (through hooks), and some overrides other decisions in separate locations. Some fragmentation is understandable, as it can change based on different extensions (and the logic “lives” in the extension itself), but the way the logic interacts with core decisions is not entirely consistent.

This raises several interesting questions:

  • Is there a reason for the “early vs late” loads? While there’s a performance reason for Wikipedias’ behavior, delaying Wikidata’s decision to the same point may give us the ability to consolidate some of the operation.
  • Is this behavior shared broadly with all wikis? The answer here is yes, but the logic of the operation is fragmented quite a bit.
  • Is there an opportunity here for a component boundary? Abstracting some of the ground logic work into the platform, and allowing a consistent way for external code to provide the logic consideration may show a leverage point for improvement.
  • What should be the underlying platform behavior, vs the feature? The answer to this is not necessarily straightforward, but the act of asking it as we go proves meaningful. The idea of MediaWiki as the underlying platform that provides the means for other wikis and systems to perform their actions means we need to be more thoughtful about what the platform itself does (and does not do), and how to provide a proper internal API (be it hooks, providers, or anything else) so that features are easy to understand and implement over a robust platform that allows for extendable behavior.
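To make the component-boundary idea above more concrete, here is a minimal sketch (in Python, for illustration only) of what a single platform-owned decision pipeline could look like: the platform owns the ordering and the default, while external code registers providers instead of overriding each other at scattered hook points. All names here (`EditorContext`, `EditorDecider`, the provider lambdas) are invented for this sketch and do not correspond to MediaWiki’s actual code.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class EditorContext:
    """Hypothetical bundle of the inputs the decision depends on."""
    content_model: str                    # e.g. "wikitext", "wikibase-item"
    user_can_edit: bool
    user_preference: Optional[str] = None # e.g. "visualeditor"

# A provider inspects the context and either returns an editor name,
# or None to defer to the next registered provider.
Provider = Callable[[EditorContext], Optional[str]]

class EditorDecider:
    """Platform-owned pipeline: one place that orders the decision logic."""

    def __init__(self) -> None:
        self._providers: list[Provider] = []

    def register(self, provider: Provider) -> None:
        # An extension would call this once, at registration time.
        self._providers.append(provider)

    def decide(self, ctx: EditorContext) -> str:
        # Base platform checks happen first, in one known location...
        if not ctx.user_can_edit:
            return "none"
        # ...then external logic is consulted in a consistent way.
        for provider in self._providers:
            choice = provider(ctx)
            if choice is not None:
                return choice
        return "wikitext"  # platform default

# Example: a Wikibase-like provider keyed purely on content model,
# and a VisualEditor-like provider keyed on user preference.
decider = EditorDecider()
decider.register(lambda ctx: "wikibase" if ctx.content_model == "wikibase-item" else None)
decider.register(lambda ctx: "visualeditor" if ctx.user_preference == "visualeditor" else None)

print(decider.decide(EditorContext("wikibase-item", True)))            # wikibase
print(decider.decide(EditorContext("wikitext", True, "visualeditor"))) # visualeditor
print(decider.decide(EditorContext("wikitext", False)))                # none
```

The point of the sketch is not the specific interface, but that the early-vs-late timing question and the override question both become properties of one pipeline rather than emergent behavior of fragmented code.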

There’s benefit in thinking about this, since at the moment, the fragmentation of the code for the same behavior means fragmentation in ownership and maintainability. The pieces of logic are spread across unexpected places; fixing this would improve the structure, make it easier to identify bugs and follow stack traces, and help us decide on structures of ownership.

Other insights showed us the need to delve deeper into some of the operations, or exposed the level of fragmentation of both behavior and code in elements that provide the same general behavior.

These may not yet show us definite paths of action, but they definitely expose questions we should continue examining to find more leverage points for making the MediaWiki platform more modular and maintainable, and for enabling easier, more streamlined feature development.

The value of mapping: inside operation for outside workflows

This exercise emphasized the importance of considering both "inside" and "outside" perspectives of the system. The "outside" perspective represents user expectations and product requirements, while the "inside" perspective delves into the architectural workings of the system.

When we consider the evolution of the architecture, we can continue utilizing both perspectives to continue to help us understand the system, its needs, and the capabilities required to support the product vision. This type of exercise is never “done”; there is a continuing need to explore opportunities to evolve the system, particularly when dealing with a system that demands modern features and operations.

The purpose of the evolution of the software is to enable the long-term needs of users and the product vision. For this to happen, the technical and product strategies need to continue informing one another, considering the system’s essential operations, and ensuring the system remains flexible and adaptable for new products and features.

The future

This exercise showed us that there are leverage points for evolving the system. The next step is to experiment with ways to move towards a modular and more maintainable core platform.

In the next year, we will be looking into leverage points in the way that external code interacts with internal MediaWiki elements. This includes the hooks system and the extension registry, as well as some of the concepts we’ve been adding to the system in recent years, such as Providers.

The goal is to get a picture of how the current system works and discover opportunities for introducing modularization patterns. We’ll look into how hooks and extension registry parameters are used by extensions and map some of the challenges that we’re encountering with the current structure. With this, we hope to find opportunities to evolve the architecture to enable the future needs that are described in the product vision.
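One way this kind of survey could start is by mining the extension registration manifests themselves, since MediaWiki extensions declare their hook subscriptions in the "Hooks" section of each extension.json. The sketch below (Python, illustrative only; the directory layout and function name are assumptions, not an existing tool) counts how often each hook is subscribed to across a checkout of extensions:

```python
import json
from collections import Counter
from pathlib import Path

def count_hook_usage(extensions_dir: str) -> Counter:
    """Count hook subscriptions declared in extension.json manifests.

    Walks immediate subdirectories of `extensions_dir` (one per extension)
    and tallies the hook names listed under each manifest's "Hooks" key.
    """
    usage: Counter = Counter()
    for manifest in Path(extensions_dir).glob("*/extension.json"):
        data = json.loads(manifest.read_text(encoding="utf-8"))
        # "Hooks" maps hook name -> handler(s); we only need the names.
        for hook_name in data.get("Hooks", {}):
            usage[hook_name] += 1
    return usage

# Hypothetical usage against a local checkout:
# usage = count_hook_usage("extensions/")
# for hook, n in usage.most_common(20):
#     print(f"{n:4d}  {hook}")
```

A frequency table like this would not capture dynamic hook registration or the semantics of each handler, but it gives a first map of which extension points carry the most weight, and therefore where modularization changes would have the widest impact.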