Platform Engineering Team/Event Platform Value Stream
|
Event Platform
|
What is it?
[edit]Engineers from across Technology — including Platform, Data Engineering, Search, and Enterprise — collaborate on a shared event streaming platform that provides a unified capability for producing and consuming events across the Wikimedia ecosystem.
Existing event streams indicate that a change has occurred, but often lack critical context needed to interpret that change (see phab:T291120). The Event Platform enables the creation of enriched, structured, and semantically meaningful event data, supporting both Wikimedia Foundation initiatives and community-driven innovation.
What we aim to achieve
[edit]- Evaluation of event streaming platforms
- Implementation of the selected event streaming solution as a proof of concept (no SLOs)
- Development of initial stream processing services:
- Simple Enrichment — transform a single stream with context from MediaWiki APIs
- Research Use Case — transform a single stream to provide data for a validated research need
- Data Integration — integrate streaming data with existing databases
- Documenting the pathway and considerations required to take the chosen solution to production readiness
- Creating tooling and pathways to enable other engineering groups to build and operate streaming services or processors
How does this benefit the movement?
[edit]- Knowledge as a service — Publishing enriched event streams enables developers, researchers, and partners worldwide to build new knowledge experiences on top of Wikimedia’s open infrastructure.
- Knowledge equity — By making enriched, contextual data streams openly available, we reduce technical barriers to accessing and using Wikimedia data, supporting innovation across languages, regions, and communities that are currently underserved.
Links
[edit]- Assess what is required for the enrichment pipeline to run on k8s
- Build simple stateless service using Flink SQL
- Build simple stateless service using PyFlink
- Contribution
- Evaluate a pyflink version of Mediawiki Stream Enrichment
- Event Catalog
- Event Driven Use Cases
- PoC Mediawiki Stream Enrichment
- Pyflink Enrichment Service Deployment
- Stream Processing Framework Evaluation
- Use case: Event Platform SDLC practices
- Use case: compute needs for streaming pipelines
With Flink selected as the solution, the first service will consolidate existing streams, enrich messages with page content (wikitext, JSON, etc.), and publish the output to a new topic.
More details are documented in this Phabricator task.
As part of the proof-of-concept work, we have also developed tooling to simplify consumption of existing Event Platform streams — see T308356.
Milestone: A demonstration video is available here.
(In Progress): Building on Flink Learnings and the POC Service
[edit]To be refined and groomed:
| Ticket | Title | Description | Lead / Backup | Timebox | Status |
|---|---|---|---|---|---|
| T310082 | State Changelog schema design | Streaming event data represents changes to an entity (e.g. a page). If we can represent these changes so that the current state becomes materializable solely via consumed past events, then the event stream functions as a changelog. Flink can automatically consume changelog streams and present them as materialized views of current state. This task explores designing event streams in this way. | Andrew Otto / David Causse | 1 week | Planned / Needs grooming |
| T309784 | Consolidated and ordered page change stream | Proof-of-concept service to simulate a consolidated single stream with correctly ordered events. | David Causse / Gabriele Modena | 4 weeks? | Planned |
| T306627 | Integrate Image Suggestions Feedback with Cassandra | Design, implement, and deploy a service that listens for image suggestion feedback and writes it to the Cassandra schema for persistent storage. | Thomas Chin / Group | Unbounded | In progress |
| To be created | Research Use Case | Demonstrate and discuss with Research potential event stream use cases, such as diffs and enrichments. Collaborate on a proof-of-concept driven by Research needs. | Group (initially) | ~4 weeks (TBD) | Planned |
Future Phases: Tooling and abstractions
[edit]To be further defined and groomed:
| Ticket | Title | Description | Lead / Backup | Timebox | Status |
|---|---|---|---|---|---|
| T310218 | Flink output support for Event Platform | A Table API abstraction now exists for consuming Event Platform streams as a Table source. This task explores creating an automated way to emit events as well — likely by wrapping the existing JsonEventGenerator. | Andrew Otto / David Causse | ~4 weeks (TBD) | Planned / Needs grooming |
| To be created | AsyncLookupTable for the MediaWiki API | Investigation into whether an AsyncLookupTable abstraction should be built for the MediaWiki API — to provide built-in retry handling and simplify usage within Flink. | Andrew Otto / TBD | ~4 weeks (TBD) | Planned / Needs grooming |
| T309699 | Retry logic / error handling | Define patterns and tooling for resilient error handling and retry strategies for streaming services. | To be assigned | TBD | Planned |