Jump to content

Platform Engineering Team/Event Platform Value Stream

From mediawiki.org

What is it?

[edit]

Engineers from across Technology — including Platform, Data Engineering, Search, and Enterprise — collaborate on a shared event streaming platform that provides a unified capability for producing and consuming events across the Wikimedia ecosystem.

Existing event streams indicate that a change has occurred, but often lack critical context needed to interpret that change (see phab:T291120). The Event Platform enables the creation of enriched, structured, and semantically meaningful event data, supporting both Wikimedia Foundation initiatives and community-driven innovation.

What we aim to achieve

[edit]
  • Evaluation of event streaming platforms
  • Implementation of the selected event streaming solution as a proof of concept (no SLOs)
  • Development of initial stream processing services:
    • Simple Enrichment — transform a single stream with context from MediaWiki APIs
    • Research Use Case — transform a single stream to provide data for a validated research need
    • Data Integration — integrate streaming data with existing databases
  • Documenting the pathway and considerations required to take the chosen solution to production readiness
  • Creating tooling and pathways to enable other engineering groups to build and operate streaming services or processors

How does this benefit the movement?

[edit]
  • Knowledge as a service — Publishing enriched event streams enables developers, researchers, and partners worldwide to build new knowledge experiences on top of Wikimedia’s open infrastructure.
  • Knowledge equity — By making enriched, contextual data streams openly available, we reduce technical barriers to accessing and using Wikimedia data, supporting innovation across languages, regions, and communities that are currently underserved.
[edit]

Creating the first service — T307959

[edit]

With Flink selected as the solution, the first service will consolidate existing streams, enrich messages with page content (wikitext, JSON, etc.), and publish the output to a new topic.

More details are documented in this Phabricator task.

As part of the proof-of-concept work, we have also developed tooling to simplify consumption of existing Event Platform streams — see T308356.

Milestone: A demonstration video is available here.

[edit]

To be refined and groomed:

Ticket Title Description Lead / Backup Timebox Status
T310082 State Changelog schema design Streaming event data represents changes to an entity (e.g. a page). If we can represent these changes so that the current state becomes materializable solely via consumed past events, then the event stream functions as a changelog. Flink can automatically consume changelog streams and present them as materialized views of current state. This task explores designing event streams in this way. Andrew Otto / David Causse 1 week Planned / Needs grooming
T309784 Consolidated and ordered page change stream Proof-of-concept service to simulate a consolidated single stream with correctly ordered events. David Causse / Gabriele Modena 4 weeks? Planned
T306627 Integrate Image Suggestions Feedback with Cassandra Design, implement, and deploy a service that listens for image suggestion feedback and writes it to the Cassandra schema for persistent storage. Thomas Chin / Group Unbounded In progress
To be created Research Use Case Demonstrate and discuss with Research potential event stream use cases, such as diffs and enrichments. Collaborate on a proof-of-concept driven by Research needs. Group (initially) ~4 weeks (TBD) Planned

Future Phases: Tooling and abstractions

[edit]

To be further defined and groomed:

Ticket Title Description Lead / Backup Timebox Status
T310218 Flink output support for Event Platform A Table API abstraction now exists for consuming Event Platform streams as a Table source. This task explores creating an automated way to emit events as well — likely by wrapping the existing JsonEventGenerator. Andrew Otto / David Causse ~4 weeks (TBD) Planned / Needs grooming
To be created AsyncLookupTable for the MediaWiki API Investigation into whether an AsyncLookupTable abstraction should be built for the MediaWiki API — to provide built-in retry handling and simplify usage within Flink. Andrew Otto / TBD ~4 weeks (TBD) Planned / Needs grooming
T309699 Retry logic / error handling Define patterns and tooling for resilient error handling and retry strategies for streaming services. To be assigned TBD Planned