Platform Evolution/Recommendations

From mediawiki.org

Purpose, background and scope

Purpose

The document aims to provide a technical direction for our platform in order to drive our products and technology forward to meet the goals of our Medium Term Plan.

It contains the findings of a collaborative review and analysis of the work and research performed under the Platform Evolution Program over the past two years. Its development was sponsored by the Core Platform Team, and it was created in collaboration with a consultant, Mentrix; that collaboration began in Spring 2019 and concluded in June 2019.

The main result of this analysis is the identification of several key opportunities that provide value to the WMF in supporting our goals, while at the same time requiring critical technical work and decisions that will enable us to modernize our platform.

As a next step, the opportunities in this document should be used as the basis for establishing a Frontend Architecture Working Group, to be sponsored by Technology and Product department leadership. This working group will evaluate the five specific challenges laid out below in order to generate solutions and well-scoped projects to be developed by cross-functional teams selected from both Product and Technology staff.

Background

Over the past two years, staff at the Wikimedia Foundation have been engaged in efforts to modernize our platform in order to achieve our Movement Strategy as well as address long-standing issues facing our engineers. These efforts began with staff at the WMF and at WMDE discussing priorities in the Audiences Technology Working Group (ATWG), which formed in the Winter of 2017. These discussions led to the formation of both the Platform Evolution Program (PE) and the Core Platform Team (CPT). Starting in Summer 2018, members of the CPT engaged in consultations with staff, volunteers and community members to define our technical direction, culminating with the first Wikimedia Technical Conference in Fall 2018. From there the CPT was able to turn the work and feedback generated from all these stakeholders into goals for the PE Program and input for the WMF Medium Term Plan and the Annual Plan for FY1920.

While it has been a long road, the result has been the comprehensive documentation of our technical goals and priorities, based on a collaborative process involving a cross-section of stakeholders both within and outside the WMF. We have put so much effort into creating these goals so that we can now prioritize projects and make technical decisions based on this shared understanding of our technical priorities.

The CPT has already begun the first phase of work based on these priorities, including building a REST API in MediaWiki, splitting RESTBase into separate components and beginning to librarize MediaWiki into components. Other teams at the WMF have done the same… The Product Department is leading the effort of unifying our parsers in order to simplify our technology stack.

This document signifies the beginning of the next phase of work stemming from the PE program and our shared priorities: work that truly drives our technology stack forward to meet our Medium Term Plan goals and achieve our organizational mission.

Scope

This document focuses on changes geared towards our user interface (“frontend”) and on the technical discussions and transformations that the recommendations will require in order to push forward our capabilities in both structured data and modularity. These two critical capabilities enable better knowledge equity, mobile support, modern user experiences and improvements to multimedia. This focus was chosen specifically because it provides visible and immediate user value through modernizing our user experience (one of our Medium-Term Goals). However, structured data and modularity also support our initiatives around machine learning, meaning that we further our medium-term goals in this critical area as well, simply by making the architecture changes to improve our frontend.

This document is focused on providing challenges to be solved that ultimately improve our architecture in order to support our long term goals. However, it does not:

  • Provide solutions to those challenges
  • Discuss “ilities” associated with any solutions, such as scalability, security, maintainability, etc…
  • Address constraints such as resourcing, time or budget.
  • Address community relations

While these considerations are important, they are not discussed in this document because they fall within the scope of the proposed working group, which will generate solutions and must consider these and other potential impacts on our infrastructure, organization, community and users.

Priorities and methods

Priorities

The Wikimedia Foundation's medium-term plan includes interdependent priorities whose success depends on modernizing the current system(s) architecture. Priorities include:

  • Delightful UI experiences across all devices
  • Integration of content from multiple projects (and other initiatives) to deliver new, dynamic knowledge experiences
  • Integration and discoverability of rich content including video, audio, and interactive media, as well as the infrastructure to serve it with high performance, high redundancy, and low latency to all parts of the world

Activities

The goal of this elaboration was to understand the current architectural impediments and collectively move towards those priorities. This document highlights architectural patterns that represent core challenges and opportunities. Activities we engaged in included:

  1. Interviewing the internal engineering team
  2. Reviewing previous documentation and discussions
  3. Modeling of current business process flows
  4. Discussing, debating and describing the findings
  5. Recommending opportunities for growth

Questions

During the exploration, we had questions in mind like:

  • How does content flow through the system now and where is it entangled? Where are the sticking points when considering system-level change?
  • How do we want to build user interfaces and how is that different from what we are doing now?
  • How do we make it easier for engineers to scale, maintain and test Wikimedia software projects and products?
  • What changes enable us to deliver more features and adjustments in less time?

This document does not try to answer these questions in isolation but instead establishes a foundation that supports the upcoming Working Group, which will dive deeper into the highest-priority patterns.

Summary of challenges

Many of the challenges the WMF faces when modernizing their system(s) are shared across the industry. Three are relatively unique and the most essential challenges for the Working Group to work through:

A "Frontend" must be defined

Although we do and will have frontends on many clients (apps, voice assistants and otherwise), it is useful to think of the frontend from the browser perspective, as it helps us define what we need to make these different experiences available to all.

In the browser context, the frontend can be described as:

Anything the browser touches. What a user sees, clicks or taps as well as what may be running invisibly within the browser.

This definition of a “frontend” can be easily extended to our other experiences by generalizing it to:

Anything the client touches. What a user sees, clicks or taps as well as what may be running invisibly within the client.

This gives us a good starting place. We can further delineate our frontends by the experience of the user. In this light, there are five different frontends that users of Wikimedia projects experience:

  • Reading: A highly cached, quickly loaded frontend for displaying pages. This category could be broken down into desktop, phone and app reader frontends because they are not the same.
  • Editing: A composition frontend that provides tools like Visual Editor, Wikitext editor and Content Translation.
  • Curating: A workflow-driven frontend that uses a suite of tools to interact with content and content events. Currently implemented in various WMF and community developed tools such as: Watchlists, bots, gadgets, special pages, etc…
  • Discussing: A conversational frontend providing users a means to interact with other readers, editors, and curators. Currently implemented with Talk Pages and Flow.
  • Administrating: A set of tools that allows administrators to setup and run a wiki. This is mostly implemented in scripts, command line utilities and some special pages.

While these frontends sometimes overlap and are conflated, true power comes from establishing clear boundaries between each. With such encapsulation, we can create more powerful interfaces by composing them in different ways (such as overlaying Discussing on top of Reading).

In addition to how you use it, the frontend is also defined by how you build it. A frontend can be defined by the workflow of frontend developers, though that flow differs depending on whether you are an on-wiki, tool or platform developer. Most importantly to our discussion, there is an emerging definition of frontend that defines frontend as “products built without building a new backend.” This type of frontend depends on a modern software and/or systems architecture.

Building modern software tools that improve one of those experiences complicates others. A challenge for the Working Group will be to balance the needs of these “frontends”, identify the highest-value modernization opportunities and outline the tradeoffs.  

Systems of software must be intentionally designed

The Wikimedia Foundation's projects are systems of software. Integrating multi-project content and building new product UI experiences depend on communication between parts of the system(s). The primary impediment to this communication is that many of these “parts” are entangled within the mechanics of the software itself. These entangled mechanics are both necessary and constraining.  

To move forward, the Working Group will explore opportunities for modularization, disentangling software from system capabilities. The goal is to map a progressive evolution towards better communication between projects and within the system itself. What roadmap for change delivers software parts that interact more easily with each other? What bridges between them need to be built?  

Software patterns don't (always) scale. Wikitext, for example, works well for the software but does not structure data for healthy system-level communication. Also, software activities are designed to react when pages change, whereas system activities can happen anytime for many reasons, including when embedded content or data from another page changes. Communication between software components, in their own time, is critical. The Working Group should consider ways to improve system communication while respecting the integrity of the software.

Essential note: Most individual MediaWiki software instances (3rd party use cases) do not share these challenges. The Working Group will need to define the tradeoffs between software simplicity serving those use cases and engineering modern architectural enhancements that can improve Wikimedia's systems and products in order to enable leadership to make the best possible decisions in these cases.

Page content must be modularized and composable

The priorities, challenges and modernizations discussed here depend on one thing above all else: the ability to “build” or compose a page of content from blocks of various content.

At present a rendered page is a blend of content, sometimes from different sources, but the current software architecture displays a fully-rendered page originating (wholly) from Wikitext. There are no component parts into which a “page” can be broken up and shared. Practically, this means the full content of our articles, whether it be text, media or data, is “trapped” inside Wikitext. When rendering a page on a desktop this isn’t problematic.

However, when features or products require modularity, semantic meaning or structured data, they become much harder to develop, more prone to errors or, even worse, impossible to create. This is in part because Wikitext is not easily interpreted by machine learning apps or content layout tools, so these use cases often require time-intensive brute-force techniques and unscalable heuristics. This limitation has impacted the development of modern UI features such as responsive layout, creating stackable content such as page summaries, and structured data use cases such as Structured Data on Commons.
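To make the idea of composing a page from blocks concrete, here is a minimal sketch. It is not a proposed design: the `Block` and `Page` types, the block kinds and the `LAYOUTS` table are all hypothetical names chosen for illustration. It only shows how the same stored blocks could be selected and ordered differently for each client context.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Block:
    """One self-contained piece of page content (hypothetical model)."""
    kind: str    # e.g. "summary", "infobox", "section", "media"
    source: str  # where the content comes from, e.g. "wikipedia", "wikidata"
    body: str


@dataclass
class Page:
    title: str
    blocks: list[Block] = field(default_factory=list)


# Hypothetical per-context layouts: which block kinds a client wants, in order.
LAYOUTS = {
    "desktop": ["summary", "infobox", "section", "media"],
    "mobile": ["summary", "section"],
    "voice": ["summary"],
}


def compose(page: Page, context: str) -> list[Block]:
    """Select and order the blocks of a page for one client context."""
    wanted = LAYOUTS[context]
    selected = [b for b in page.blocks if b.kind in wanted]
    # sorted() is stable, so blocks of the same kind keep their stored order.
    return sorted(selected, key=lambda b: wanted.index(b.kind))


page = Page(
    title="Example",
    blocks=[
        Block("section", "wikipedia", "Body text of a section"),
        Block("infobox", "wikidata", "Structured facts"),
        Block("summary", "wikipedia", "Short summary text"),
    ],
)

print([b.kind for b in compose(page, "desktop")])
print([b.kind for b in compose(page, "voice")])
```

The point of the sketch is that new combinations come from data (a new entry in the layout table) rather than from new rendering code.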

Wikitext has been critical to the proliferation of Wikimedia content by empowering hundreds of thousands of editors to publish millions of high quality articles on Wikipedia and other projects. Wikitext as a markup language has many upsides, but its use to manage layout and data has come with tradeoffs that have much deeper implications in 2019 than they did in 2009. To be successful with our recommendations, we need to always keep in mind our users and the need to provide them great authoring tools as we make architectural changes in pursuit of the goals of the Medium Term Plan.

This is an essential area of focus. Recommendations for the Working Group include multiple ways to experiment with decomposing a page without starting with a “Major Overhaul of Everything”.

Recommendations

Desired qualities

The goal of these recommendations is to develop three essential qualities in the system:

Modularity

  • There is some separation of concerns and activities.
  • Boundaries between software and bridges between software are defined. (See Systems of software above.)
  • Changes can be made to one area without impacting other areas.
  • Data shared between parts can be decomposed (pulled apart) and composed (put back together) based on the context. The process scales: new combinations can be made without writing more software.
  • Enables things like libraries, stores, encapsulation, services, etc.

Just-enough coupling

  • Currently, the architecture is highly coupled, which is another way of saying entangled. Decoupling is not an event but a process, a series of steps that change the overall pattern.

Timing: reduce temporal coupling

  • Create less procedural, sequential control of relationships and communication.
  • Page generation is a nondeterministic activity by nature. Communication generally requires more ability to predetermine responses.
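As a minimal sketch of what reduced temporal coupling can look like, the example below uses a tiny in-memory publish/subscribe queue. Everything in it is an assumption made for illustration: the `EventBus` class, the `data-changed` topic and the `EMBEDDED_IN` dependency map are hypothetical and stand in for whatever event system and dependency tracking a real solution would use. The publisher records that a shared data item changed, and pages embedding that item are marked stale later, when the consumer runs.

```python
from collections import defaultdict, deque


class EventBus:
    """Tiny in-memory publish/subscribe queue (illustrative stand-in)."""

    def __init__(self):
        self._handlers = defaultdict(list)
        self._queue = deque()

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, topic, payload):
        # The publisher only records that something happened. It does not
        # wait for, or know about, whoever reacts to the event.
        self._queue.append((topic, payload))

    def drain(self):
        # Consumers process queued events later, in their own time.
        while self._queue:
            topic, payload = self._queue.popleft()
            for handler in self._handlers[topic]:
                handler(payload)


# Hypothetical dependency map: which pages embed which shared data item.
EMBEDDED_IN = {"population-of-X": ["City X", "Country Y"]}
stale_pages = set()


def mark_dependents_stale(event):
    """Mark every page that embeds the changed item as needing a re-render."""
    for page in EMBEDDED_IN.get(event["item"], []):
        stale_pages.add(page)


bus = EventBus()
bus.subscribe("data-changed", mark_dependents_stale)

bus.publish("data-changed", {"item": "population-of-X"})
pending_before = sorted(stale_pages)  # still empty: nothing processed yet
bus.drain()
print(pending_before, sorted(stale_pages))
```

Here the change is not tied to an edit of either dependent page, which is the case the page-change-driven model handles poorly.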

Exploration Areas

Our elaboration effort exposed five areas of potential exploration. We recommend these areas of focus for the Working Group. They satisfy valuable priorities, test the waters for further disentanglement, and/or raise issues that need to be resolved, regardless of the path chosen.

1. Develop a user interface using modern frontend tooling

This effort is focused on two important goals for product development: the ability to build delightful, dynamic user interfaces, and an architecture that enables frontend engineers to quickly develop new user interfaces which are decoupled from the platform and do not require knowledge of how the platform works internally.

An essential discussion point for both “defining the frontend” and modularization is how to untangle JavaScript and bring it into the frontend. How do we enable JavaScript developers to keep the code as close to the browser as possible and reuse common functions? OOUI enables the integration of MediaWiki with JavaScript enhancements, but because it was developed prior to the current set of industry frameworks, it conceptually differs in fundamental ways from those tools.

These architectural differences mean that MediaWiki is not easy to integrate with industry-standard frameworks. They also make it difficult to share the localization and accessibility advances that have gone into OOUI with engineers attempting to use frameworks like Vue and React. Figuring out how to align with industry standards while integrating the work that has gone into OOUI represents a huge opportunity for Wikimedia to contribute to localization and accessibility of modern JS development for the web.

This area of focus lays the groundwork for moving further towards “frontend”. The Working Group will need to explore how the code contribution process might work, among other factors, which will engage JavaScript developers in the architecture process.

2. Create a service for rendering components from template content

Templates in the MediaWiki ecosystem are overloaded with different use cases: they provide content structure, reusable components, and allow contributors to modify the display of content with custom logic. This recommendation has strong potential for creating modularization that benefits the system as a whole. It is tricky to discuss though, because "templates" mean different things to different people and have different use cases.

In this case, we don't mean Mustache templates that govern look and feel. We mean encapsulating the "boxes" of content that are added to pages. These may (or may not) take parameters defining some of their content. A first step for the Working Group will be to define what is included in this "template" modularization effort and what isn't.

These “boxes” are an ideal focus area for creating modularity. They represent self-contained features and also an opportunity to enable equitable sharing of user features across projects and languages by establishing a cross-project service to share templates. This project will also force us to consider how to handle content layout and structure separately from composable pieces of content.
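One way to picture such a parameterized “box” is the sketch below. It is only an illustration under stated assumptions: the `COMPONENTS` registry, the `population-box` component, its parameters and its labels are all hypothetical, and a real service would return structured output rather than a plain string. It shows a single shared definition being rendered for two languages from the same parameters, with the layout kept separate from the content passed in.

```python
from string import Template

# Hypothetical shared registry: one "box" definition with labels per language.
COMPONENTS = {
    "population-box": {
        "params": {"place", "population"},
        "layout": Template(
            "[$label_place: $place | $label_population: $population]"
        ),
        "labels": {
            "en": {"label_place": "Place", "label_population": "Population"},
            "de": {"label_place": "Ort", "label_population": "Einwohner"},
        },
    }
}


def render_component(name: str, lang: str, params: dict) -> str:
    """Render one box from parameters; reject unknown or missing parameters."""
    spec = COMPONENTS[name]
    if set(params) != spec["params"]:
        raise ValueError(f"{name} expects parameters {sorted(spec['params'])}")
    # Layout comes from the shared definition, content from the caller.
    return spec["layout"].substitute(**spec["labels"][lang], **params)


box_params = {"place": "X", "population": "1000"}
print(render_component("population-box", "en", box_params))
print(render_component("population-box", "de", box_params))
```

Because the parameter set is declared up front, callers on any project can validate their input without knowing how the box is laid out.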

The patterns architected during this initiative will inform other types of modularization (libraries for frontend enhancements, for example).

3. Build a non-page based watchlist using APIs

The Watchlist page does not rely on content pages (or need to break them down) but instead reports on changes to pages. Curators are interacting with that page in ways that are ideal for prototyping a "frontend" tool that communicates with the system but does not need to be enmeshed in it. The stakeholder group is smaller and eager to provide feedback.

This area of focus lays the groundwork for architectural patterns that can apply to special pages and gadgets in general.
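As a rough sketch of a watchlist client that only talks to the system through an API, the example below builds a request for the MediaWiki Action API's `list=watchlist` query module and turns a response into a simple newest-first feed. The helper names (`watchlist_url`, `to_feed`) are hypothetical, the sample response is made up and only follows the general shape of such a response, and no network call is made. A real request would also need an authenticated session.

```python
from urllib.parse import urlencode


def watchlist_url(api_base: str, limit: int = 50) -> str:
    """Build an Action API request URL for the current user's watchlist."""
    params = {
        "action": "query",
        "list": "watchlist",
        "wlprop": "ids|title|timestamp|user|comment",
        "wllimit": limit,
        "format": "json",
    }
    return f"{api_base}?{urlencode(params)}"


def to_feed(response: dict) -> list:
    """Flatten a watchlist response into rows sorted newest change first."""
    entries = response.get("query", {}).get("watchlist", [])
    # ISO 8601 timestamps in the same timezone sort correctly as strings.
    ordered = sorted(entries, key=lambda e: e["timestamp"], reverse=True)
    return [(e["timestamp"], e["title"], e.get("user", "")) for e in ordered]


# Illustrative response only; titles, users and timestamps are invented.
sample = {
    "query": {
        "watchlist": [
            {"title": "Page A", "revid": 11, "user": "Alice",
             "timestamp": "2019-06-01T10:00:00Z", "comment": "fix typo"},
            {"title": "Page B", "revid": 12, "user": "Bob",
             "timestamp": "2019-06-03T08:30:00Z", "comment": "add source"},
        ]
    }
}

url = watchlist_url("https://www.mediawiki.org/w/api.php", limit=10)
feed = to_feed(sample)
print(url)
print(feed)
```

Nothing in the client depends on how pages are rendered, which is what makes the watchlist a convenient prototype for a decoupled frontend.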

4. Introduce page summaries as distinct, editable and curate-able content

In order to lay the groundwork for structured data in articles, we should develop a feature that:

  • Uses multiple types of data storage, such as text and images
  • Is composed from different sources, such as Wikipedia and Wikidata
  • Is easy to generate and populate with content
  • Has an immediate benefit for our product plans

Page summaries are designed to be an easily distributable “block” of content which can represent the full content of a page and can be displayed in different contexts. They are currently generated via a hardened, well tested service. The content from summaries can be used to create Open Graph markup to make our content machine readable to power machine learning applications.
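The step from a summary to Open Graph markup can be sketched as follows. This is a minimal illustration, not the existing service's code: the `open_graph_tags` helper is hypothetical, and the field names `title`, `extract` and `thumbnail.source` are assumptions about the shape of a summary. The `og:title`, `og:description` and `og:image` properties come from the Open Graph protocol.

```python
from __future__ import annotations

from html import escape


def open_graph_tags(summary: dict) -> list[str]:
    """Turn a page-summary dict into Open Graph <meta> tags."""
    fields = {
        "og:title": summary.get("title"),
        "og:description": summary.get("extract"),
        "og:image": (summary.get("thumbnail") or {}).get("source"),
    }
    return [
        f'<meta property="{prop}" content="{escape(value, quote=True)}">'
        for prop, value in fields.items()
        if value  # skip fields that this summary does not provide
    ]


# Illustrative summary; the values are invented.
summary = {
    "title": "Example",
    "extract": 'An "example" page.',
    "thumbnail": {"source": "https://example.org/thumb.png"},
}

tags = open_graph_tags(summary)
for tag in tags:
    print(tag)
```

Because the summary is already a self-contained block, the same dict could feed a link preview, a search result or a voice response without re-parsing the page.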

Moreover, page summaries are currently only generated heuristically and cannot be defined or modified by editors. Creating a modular architecture to store and curate summaries establishes a pattern that can apply to other elements of the page.

Summaries also begin the conversation with the community about the types of modernization inherent in the medium-term plan. These changes will eventually require a socio-technical process for change management, as they introduce new types of curatable content. This recommendation gives the Working Group an opportunity to consider and architect people processes that enable their technology strategies, and to include staff who are especially skilled at this aspect.

5. Introduce topic maps to ontologically categorize content

Contributors have used categories for years to create a taxonomy of pages, as a means to tag content for discovery, curation or other patrolling activities. This has been accomplished by adding Wikitext markup, which embeds the semantic meaning within the content. This also means that categories have no inherent structure or hierarchy and cannot be used across projects.

Topic maps present a new way to separate the taxonomy of the content from the content itself, providing structure to our wealth of content and enabling new means of discovery. With this first project we could establish a means to tag pages and other types of content with editable, curatable, standardized metadata as well as machine-generated metadata.
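A minimal sketch of keeping the taxonomy apart from the content might look like the following. The topic names, the `PARENT` hierarchy and the `(project, item)` tagging scheme are all hypothetical and chosen only for illustration. The hierarchy lives in its own structure, and content on any project is attached to it by reference rather than by markup inside the page.

```python
from collections import defaultdict

# Hypothetical topic hierarchy, stored apart from any page content.
PARENT = {
    "Volcanoes": "Geology",
    "Earthquakes": "Geology",
    "Geology": "Earth sciences",
}

# Tags point from a topic to content on any project: (project, item).
tagged = defaultdict(set)


def tag(topic: str, project: str, item: str) -> None:
    """Attach a content item to a topic without touching the content."""
    tagged[topic].add((project, item))


def ancestors(topic: str) -> list:
    """Walk up the hierarchy from a topic to its root."""
    chain = []
    while topic in PARENT:
        topic = PARENT[topic]
        chain.append(topic)
    return chain


def items_under(topic: str) -> set:
    """All content tagged with a topic or with any of its descendants."""
    found = set(tagged.get(topic, set()))
    for child, parent in PARENT.items():
        if parent == topic:
            found |= items_under(child)
    return found


tag("Volcanoes", "enwiki", "Etna")
tag("Earthquakes", "commons", "File:Seismograph.jpg")
tag("Geology", "dewiki", "Geologie")

print(ancestors("Volcanoes"))
print(sorted(items_under("Geology")))
```

Unlike a category embedded in Wikitext, the same topic here can hold items from several projects and can be queried through its hierarchy.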

Next steps

Forming a Frontend Architecture Working Group

The recommendations above are intended to drive the planning of the architecture changes needed to modernize our platform and serve the known and unknown use cases of the Wikimedia ecosystem for the next decade. As a next step, it is recommended that a Frontend Architecture Working Group be formed to further elaborate and scope each of the five explorations. Thoughtful composition, processes and facilitation will be critical to the Working Group’s success. These will be developed and defined in a document in order to provide structure to the work and help guide the group towards impactful outcomes.

Generating project proposals

The first goal of the working group should be to generate a proposal to solve one of the five explorations, with the output being a well-scoped project plan. The proposal should have a full project plan with product and technical requirements, as well as resources, milestones, risks and tradeoffs. Aspects of performance, reliability, scalability and security should be accounted for in solutions. Metrics should be established to measure success. Projects should be focused on delivering a user-facing feature and also be as short as possible (no longer than 3 months) to ensure they can be accomplished. The exact format and content of a proposal will be specified in another document.

Developing and delivering a project

For each proposal, a cross-functional team from the staff of the Product and Technology Departments should be created. Team members should be chosen not only for their skills, but also for their enthusiasm for exploring new ideas, eagerness to push the boundaries of our technology and focus on delivering value to our end users and community. These teams will focus on delivering the solution proposed by the working group and will work exclusively on the project for the duration. The result of the project should be a shippable user-facing feature or product. This focus on delivery is important as it keeps the team driving towards real needs and real users.

Retro and repeat

This is a living document and the recommendations within it represent our best current understanding. As we build, we learn, and this document should be updated to reflect those changes in understanding. Following the delivery of a project, both the working group and leadership should perform retrospectives on the process. Improvements and changes should be suggested and implemented. New projects should be proposed, existing projects should be re-evaluated and the next exploration should be chosen. At this point working group members can rotate in and out. This template should be followed until all five explorations have been researched and solutions have been developed.

Success

The success of this initiative is measured by the change in our architecture, products and processes. We should see real changes in engineer productivity and in the onboarding of new engineers (which we will likely need a way to quantify). We should see real value delivered to users, as measured by metrics we develop. And we should establish new ways for our engineering teams to work together by experimenting with cross-functional teams.

Appendix

Additional materials and further information can be found here:

Additional information to accompany the priorities, findings and recommendations