Wikimedia Technology/Annual Plans/FY2019/TEC2: Modern Event Platform

TEC2: Modern Event Platform

This program focuses on building an event data platform that both data-driven production features and analytics can use.

Our goal is to make it easier to build interoperable systems for both production and analytics purposes by encouraging event-first services. We will build the backend systems and conventions that support this architecture.

EventLogging is homegrown and was not designed for purposes other than low-volume analytics in MySQL databases. However, the ideas it was based on are solid, and the industry has since converged on them as a standard, often called a Stream Data Platform. Over the last two years, we have been developing the EventBus sub-system with the aim of standardizing events, both for internal use (propagating changes to update dependent artifacts) and for exposing them to clients. While this has been a success, integrating these events with different systems requires a lot of cumbersome custom glue code. Open source technologies exist for integrating and processing streams of events. This program is about modernising our event production and collection systems with strong open source technologies and best practices.

We often have use cases that depend on the same 'event' happening. For example, a cache purge needs to know that a page was edited, and in the analytics arena we track the same edits for various purposes. With a comprehensive event-based architecture, all consumers, analytics or otherwise, can share streams of events and act on them as appropriate. This is somewhat possible with our current system, but it is cumbersome, static, and coarse-grained. We need a more robust solution that allows us to build services that can both consume and produce standardised data in a predictable fashion. We are already moving towards this type of system design with EventBus and Change Prop, but slowly and without a larger vision. To do this right, we need a standardised, organization-wide way of producing, transforming, and consuming events. This will make it easier to share data for production features and to integrate analytics systems for querying and dashboarding.
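The shared-stream idea can be illustrated with a minimal in-process sketch. Everything in it is hypothetical: `EventStream`, the `page_edit` stream name, the event fields, and both consumers are illustrative names, not part of EventBus or Change Prop, and a real platform would sit on a durable, distributed log rather than a Python object.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = Dict[str, object]
Consumer = Callable[[Event], None]


class EventStream:
    """Hypothetical in-memory stand-in for a shared event stream."""

    def __init__(self) -> None:
        self._consumers: Dict[str, List[Consumer]] = defaultdict(list)

    def subscribe(self, stream: str, consumer: Consumer) -> None:
        """Register a consumer for every event produced to `stream`."""
        self._consumers[stream].append(consumer)

    def produce(self, stream: str, event: Event) -> None:
        """Deliver one event to all consumers subscribed to `stream`."""
        for consumer in self._consumers[stream]:
            consumer(event)


purged_pages: List[str] = []
edit_counts: Dict[str, int] = defaultdict(int)


def purge_cache(event: Event) -> None:
    # Production consumer: record which page's cache entry to purge.
    purged_pages.append(str(event["page_title"]))


def count_edit(event: Event) -> None:
    # Analytics consumer: count edits per wiki from the same event.
    edit_counts[str(event["wiki"])] += 1


stream = EventStream()
stream.subscribe("page_edit", purge_cache)
stream.subscribe("page_edit", count_edit)

# The producer emits the edit once; both consumers react to it.
stream.produce("page_edit", {"wiki": "enwiki", "page_title": "Kafka"})
```

The producer knows nothing about either consumer, which is the property that lets production and analytics uses share one stream without custom glue code between each pair of systems.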

Engineering teams should be able to quickly develop features that are easy to instrument and measure, and that can react to events from other systems.

Additionally, experience with our existing infrastructure has shown the need for ordering and de-duplication of interdependent event-driven tasks, so as part of this work we plan to explore an implementation of fine-grained dependency tracking. For example, when we use an event stream to purge entities from caches, we often issue many unnecessary purges out of an abundance of caution, because the relationships between entities are unknown. A dependency tracking system will allow these relationships to be known and kept up to date. Event Data Platform components can support a dependency tracking service that intelligently updates dependencies in real time. As such, the choice of a stream processing system is tightly coupled with a dependency tracking solution. As part of this program, we will collaboratively research and choose a stream processing system that enables complex dependency tracking, and make architecture decisions about how to eventually build a dependency tracking system.
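To make the purge example concrete, the following is a minimal sketch of what dependency tracking could buy, assuming relationships are stored as a simple in-memory graph. `DependencyGraph`, its methods, and the entity names are all hypothetical; the program has not chosen a storage or processing technology, and this is not a proposed design.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set


class DependencyGraph:
    """Hypothetical tracker: maps an entity to the entities derived from it."""

    def __init__(self) -> None:
        self._dependents: Dict[str, Set[str]] = defaultdict(set)

    def add_dependency(self, source: str, dependent: str) -> None:
        """Record that `dependent` must be refreshed when `source` changes."""
        self._dependents[source].add(dependent)

    def affected_by(self, changed: str) -> List[str]:
        """Return every transitive dependent exactly once, breadth-first."""
        seen: Set[str] = {changed}
        queue = deque([changed])
        order: List[str] = []
        while queue:
            current = queue.popleft()
            # Sorted only to make the result deterministic for this sketch.
            for dependent in sorted(self._dependents.get(current, ())):
                if dependent not in seen:
                    seen.add(dependent)
                    order.append(dependent)
                    queue.append(dependent)
        return order


graph = DependencyGraph()
graph.add_dependency("Template:Infobox", "Page:A")
graph.add_dependency("Template:Infobox", "Page:B")
graph.add_dependency("Page:A", "Feed:Recent")
graph.add_dependency("Page:B", "Feed:Recent")

# Feed:Recent is reachable through both pages but is listed only once.
to_purge = graph.affected_by("Template:Infobox")
```

With known relationships, a template edit purges only its actual dependents, and an entity reachable by several paths is purged once rather than once per path.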

Program outline

Teams contributing to the program

Site Reliability Engineering, Analytics, Services

Annual Plan priorities

Primary Goal: 3. Knowledge as a Service - evolve our systems and structures

How does your program affect annual plan priority?

By building a comprehensive event data platform, we will reduce the friction involved in building analytics and production services that need to reliably share data with current and future systems.

Program Goal

A modern event data platform will make it easier for engineers to build infrastructure for Knowledge as a Service. It will enable measuring the effectiveness of engineering projects, and also provide a base for smart reactive services, such as dependency tracking.

Outcomes

Outcome 1

Wikimedia engineers have a reliable, scalable, and comprehensive platform for building services that produce and consume event data for analytics and production.

Output 1.1

Events can easily and reliably be produced by internal and external clients and consumed by other internal services.

Output 1.2

Analytics systems can easily consume events for aggregation, querying, and dashboarding.

Output 1.3

It is clear to engineers how to design event schemas that support analytics and production features and that ease future maintenance and evolution of those systems.

Output 1.4

Events can easily be consumed into any system or state store for analytics and production features.
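As one illustration of the kind of schema convention Output 1.3 refers to, the sketch below checks a rule that is common in event platforms: a new schema version may add optional fields but may not remove existing fields, change their types, or add required ones. This rule is an assumption made for illustration, not a decision recorded in this plan, and `Field`, `Schema`, and `is_backward_compatible` are hypothetical names.

```python
from typing import Dict, NamedTuple


class Field(NamedTuple):
    kind: str       # e.g. "string" or "integer"
    required: bool  # whether producers must always set the field


Schema = Dict[str, Field]


def is_backward_compatible(old: Schema, new: Schema) -> bool:
    """Check the assumed rule: keep old fields as-is, add only optional ones."""
    for name, field in old.items():
        # Every existing field must survive with the same type.
        if name not in new or new[name].kind != field.kind:
            return False
    for name, field in new.items():
        # Newly added fields must be optional so old producers stay valid.
        if name not in old and field.required:
            return False
    return True


page_edit_v1: Schema = {
    "wiki": Field("string", True),
    "page_title": Field("string", True),
}
# v2 adds an optional field, which the assumed rule allows.
page_edit_v2: Schema = {**page_edit_v1, "rev_id": Field("integer", False)}
# This variant drops a field that existing consumers may rely on.
page_edit_bad: Schema = {"wiki": Field("string", True)}

ok = is_backward_compatible(page_edit_v1, page_edit_v2)
broken = is_backward_compatible(page_edit_v1, page_edit_bad)
```

The sketch deliberately ignores changes to the `required` flag of existing fields; a real convention would need to state how those are handled.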

Outcome 2

A stream processing system has been chosen, and a dependency tracking system has a conceptual design.

Output 2.1

A set of use cases for stream processing and dependency tracking is identified, and their requirements are established.

Output 2.2

Open source stream processing systems are evaluated and one is chosen based on requirements. Stretch: the chosen stream processing system is deployed to production.

Output 2.3

Scalable and reliable dependency graph storage solutions investigated and chosen.

Output 2.4

Prototype dependency tracking architecture designed.

Targets

Outcome 1 Measurement

The Analytics team has deployed at least one event-based service or automated dashboard using the Event Data Platform. WMF engineers are satisfied with the Event Data Platform and willing to use it to build services.

Outcome 2 Measurement

Consensus on stream processing system choice and preliminary dependency tracking technologies.

Resources

People             FY2017–18   FY2018–19
Analytics          none        Engineer (reallocation)
Services           none        1.5 ✕ Engineer (reallocation)
SRE                none        0.5 ✕ SRE (reallocation)
CapEx              none        none
Travel and Other   none        none

Dependencies

  • Output 1.3 requires upfront input from many Audiences and Technology engineers, to ensure confidence in system architecture choices.
  • Output 2.2, stretch goal and all, requires that Streamlined Service Delivery (Kubernetes) succeeds and is usable.

Other Docs

See also