Platform Engineering Team/Event Platform Value Stream/Stream Processing Framework Evaluation

=Shared Event Platform Project=

What is it?
Engineers from across Technology (Platform, Data Engineering, Search and Enterprise) will collaborate on a shared event streaming platform capability that is beneficial to each group and the overall foundation.

Existing event streams signal a change of state but lack many of the details required to make sense of that change (see T291120). The event platform will enable us to build enriched data streams that allow the foundation and community to build and share better knowledge experiences.

What do we aim to achieve?

 * Evaluation of event streaming platforms
 * Implementation of the chosen event streaming solution as a proof of concept (no SLOs)
 * Implementation of the following services/stream processors:
    * Simple Enrichment - transform a single stream by enriching it with calls to MediaWiki APIs
    * Research Use Case - transform a single stream to provide data for a Research Use Case
    * Data Integration - integrate streams and databases
 * Understanding the pathway and considerations to take the chosen solution to production
 * Creating tooling and pathways for other engineering groups to build streaming services/processors
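
As an illustration of the "Simple Enrichment" processor listed above, here is a minimal sketch in plain Java. It does not use any streaming framework, since none has been chosen yet. The class name, the field names (`page_title`, `rev_id`, `content_model`) and the `lookup` function are all hypothetical; the lookup stands in for a call to a MediaWiki API and is not a real client.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Sketch of a "simple enrichment" step; every name here is hypothetical.
class EnrichmentSketch {

    // Returns a copy of the event with the looked-up fields merged in.
    // The lookup stands in for a MediaWiki API call keyed by page title.
    // Looked-up fields overwrite event fields that share the same key.
    static Map<String, String> enrich(Map<String, String> event,
                                      Function<String, Map<String, String>> lookup) {
        Map<String, String> enriched = new HashMap<>(event);
        Map<String, String> details = lookup.apply(event.get("page_title"));
        if (details != null) {
            enriched.putAll(details);
        }
        return enriched;
    }

    // Applies the enrichment to each event of a finite, in-memory batch.
    static List<Map<String, String>> enrichAll(List<Map<String, String>> events,
                                               Function<String, Map<String, String>> lookup) {
        return events.stream()
                .map(e -> enrich(e, lookup))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // In-memory stand-in for the API; a real processor would call the service.
        Map<String, Map<String, String>> fakeApi = new HashMap<>();
        fakeApi.put("Example", Map.of("content_model", "wikitext"));

        Map<String, String> event = Map.of("page_title", "Example", "rev_id", "42");
        System.out.println(enrich(event, fakeApi::get));
    }
}
```

Passing the lookup in as a function keeps the transformation testable without network access. In a real stream processor the same step would typically be an asynchronous call with timeouts and retries, which this sketch leaves out.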

How does this benefit the movement?

 * Knowledge as a service - Publishing enriched event streams to the world will allow anyone to build on them to create new knowledge experiences
 * Knowledge equity - By publishing enriched streams we break down technical barriers in navigating and accessing data that could be used to build new knowledge experiences

Phase 1: Evaluating Solutions
The analysis of Flink and Kafka Streams draws on the evaluation conducted by the Search team.

Flink

 * The Java API is somewhat limited because of type erasure (doc), so Scala seems a better choice.
 * The testing API enables both stateless and stateful testing. The same applies to timely UDFs (user-defined functions) (doc)
 * There is a script to launch a Scala Flink REPL, which seems useful
 * There are a few different levels of API, ranging from SQL analytics to low-level stateful stream processing (1.10 Documentation: Dataflow Programming Model)
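
The type erasure point above can be shown with a few lines of plain Java. This is a minimal sketch with illustrative names, not Flink code: it only demonstrates that generic type arguments are not available at runtime. A framework that derives element types from user functions therefore cannot always do so on its own, which is presumably the limitation the linked doc describes.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal demonstration of Java type erasure; the class name is illustrative.
class TypeErasureDemo {

    // Returns the runtime class name of a list; the element type is not part of it.
    static String runtimeClassOf(List<?> list) {
        return list.getClass().getName();
    }

    // True when two differently parameterized lists share one runtime class.
    static boolean sameRuntimeClass() {
        List<String> titles = new ArrayList<>();
        List<Integer> revisionIds = new ArrayList<>();
        return titles.getClass() == revisionIds.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());                      // prints "true"
        System.out.println(runtimeClassOf(new ArrayList<String>())); // prints "java.util.ArrayList"
    }
}
```

Because `List<String>` and `List<Integer>` are the same class at runtime, the element type has to be supplied some other way when it cannot be inferred.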

Kafka Streams

 * It leans more heavily on a SQL-like approach, called KSQL, when it comes to data wrangling
 * It looks good for simple operations on Kafka topics, but the philosophy is to augment existing applications with a dash of data processing (the Kafka Streams API is a library) rather than to create standalone processing applications. The first introductory video says essentially this (1. Intro to Streams | Apache Kafka® Streams API)
 * It is difficult to find code examples in the Kafka Streams documentation; Apache Flink’s is much better in that regard.

Knative

 * To be done

Final Decision Record
To be decided