Wikimedia Maps/2021 modernization plan

Summary
'''Wikimedia Maps are transitioning towards a more modern architecture.''' The first phase of this transition will replace Tilerator with Tegola as our vector tile server. This is a change to the Maps infrastructure only, so end users should see little to no impact on their experience.

Before we can guarantee a reliable user experience, we need software that is sustainable to support. Wikimedia Maps aims to give Wikimedia users a consistent experience contributing to and learning about geoinformation. To achieve this goal, we will enable the engineers who maintain the Wikimedia Maps infrastructure to do so easily and with low effort.

Context
The Wikimedia Foundation has supported geoinformation by creating and maintaining Wikimedia Maps. These map services can render both dynamic and static images of geographic locations, which are embedded in Wikimedia projects as well as in standalone projects run by third parties.

The Wikimedia 2030 Movement Strategy states that our strategic direction is to become the essential infrastructure of the ecosystem of free knowledge, and that anyone who shares our vision will be able to join us. Although the Foundation wants to enable people to learn about and contribute geoinformation knowledge, we have to consider our constraints from both an organizational and a technological perspective. This has led us to make a conscious decision to maintain our geoinformation capabilities rather than grow them beyond the experience we provide today.

Goal
Having come to understand our technology and capacity constraints, we have set the following target outcome for the maps infrastructure:

Wikimedia users will have a reliable and consistent experience contributing to and learning about geoinformation.

To achieve this goal, we have a few short-term objectives that we plan to address, as well as some longer-term objectives that we need to explore further to understand how to resolve the problems we have today.

Short-term Objectives

 * 1) Reduce map latency with OpenStreetMap (OSM)
 * 2) Empower SRE to support maps-related incidents and maintenance
   * 2.1) Define SLOs for both map consumers and SRE
   * 2.2) Provide efficient monitoring capabilities to support maps (Prometheus instead of Graphite)

Longer-term Objectives

 * 1) Reduce SRE dependency and empower client-side autonomy (Phase 1)

Problem
Wikimedia Maps has had the most outages of any service at the Foundation to date. When outages occur, there is no defined performance metric to indicate success or service degradation; our current monitoring for maps is poor and unhelpful; and the related code is complex and hard to understand, which makes it difficult to support and maintain.

Hypothesis
We believe that modernizing the maps infrastructure will reduce complexity, enable monitoring capabilities, and better empower SRE to resolve issues quickly and intuitively.

The rationale behind the following phased approach is to make atomic changes that do not break current functionality and cause minimal disruption. We also want to evaluate the changes and collect feedback along the way. We plan to approach this modernization iteratively, starting by replacing our current vector tile server, Tilerator, with the open-source vector tile server Tegola. We hope that moving from server-side raster rendering to client-side rendering will reduce our dependency on SRE and allow our team to be autonomous in supporting and maintaining the maps stack.

By modernizing our maps infrastructure, we empower SREs to support maps-related incidents and maintenance by:

 * Moving services from static allocation on bare metal to Kubernetes
 * Reducing the complexity of the infrastructure by removing legacy and deprecated dependencies
 * Using technologies in which our SREs have a lot of expertise

Timeline
January 2021 - June 2021 (pending architecture review)

 * Phase 1: Modernize Vector Tile Infrastructure

Future (To Be Determined)

 * Phase 2: Modernize Raster Tile Infrastructure
 * Phase 3: Sunset Legacy Maps Infrastructure

What are these changes about?
Maps will get a stronger, more modern architecture over the next several months. The first step will affect the vector tile pipeline, where we will adopt Tegola. End users should notice no change.

Does this mean you are resuming major work on Maps?
Not at this time. Until the end of the fiscal year, we will be busy modernizing the vector tile infrastructure. We hope this will make it easier for the Site Reliability Engineering team to support maps-related incidents and maintenance.

How can people request a new Maps feature?
Please continue to submit such requests through the Community Wishlist Survey. You are also welcome to report bugs in Phabricator using the Maps tag.

Who will be leading this project?
The Product Infrastructure team is currently in charge of the work.

How can we contact the team?
Feel free to use the Maps-l mailing list for questions. For urgent matters, you can find us on IRC at #wikimedia-infrastructure.