Platform Engineering Team/Event Platform Value Stream/Event Driven Use Cases

Here we collect a list of projects that would benefit from a stream processing platform. This page categorizes these use cases according to the following criteria.

Enrichment? - Is the pipeline a one-to-one event transformation? Many use cases are, and they all have very similar deployment requirements, which makes them good candidates for the platform library and tooling abstractions. If not, the pipeline likely consumes multiple input streams and/or performs some kind of aggregation over them.
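As a rough illustration of what "one-to-one" means here, the sketch below is plain Python, not the Event Platform library. It assumes events are dicts with a hypothetical `page_title` field, and a hypothetical `lookup` callable stands in for whatever external call adds the extra data. Each output event depends only on its own input event, which is why such pipelines share deployment requirements.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Event = Dict[str, Any]


def enrich(event: Event, lookup: Callable[[str], Any]) -> Event:
    """Hypothetical 1:1 enrichment: one input event in, one output event out.

    `lookup` is an illustrative stand-in for an external call (e.g. an API
    request); `page_title` and `extra` are assumed field names.
    """
    enriched = dict(event)  # copy so the input event is not mutated
    enriched["extra"] = lookup(event["page_title"])
    return enriched


def run_enrichment(events: Iterable[Event], lookup: Callable[[str], Any]) -> Iterator[Event]:
    # No state is carried between events: each output depends only on its input.
    for event in events:
        yield enrich(event, lookup)
```

For example, `list(run_enrichment([{"page_title": "Foo"}], len))` yields one output event per input event, with `extra` set to `3`.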

Streaming State? - If implemented with stream processing, does the stream processor need to maintain state? If so, the pipeline is more complex to implement and operate. We'd still like Event Platform streams to be easy to use without a lot of boilerplate, but abstracting away stateful pipeline logic will be difficult: developers will have to understand more about the streaming framework and how to operate it.
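For contrast with a one-to-one transformation, here is a minimal sketch of a stateful operator, assuming the same hypothetical dict events with a `page_title` key. It keeps a running count of events per page, so each output depends on all earlier events with the same key. In a real streaming framework this state would also have to be partitioned by key, checkpointed and restored on failure; this in-memory version deliberately omits all of that.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, Tuple

Event = Dict[str, Any]


class EditCounter:
    """Hypothetical stateful operator: running count of events per page.

    The in-memory dict below is the "streaming state". A production
    framework would need to checkpoint and restore it; this sketch does not.
    """

    def __init__(self) -> None:
        self.counts: Dict[str, int] = defaultdict(int)

    def process(self, event: Event) -> Tuple[str, int]:
        key = event["page_title"]  # assumed field name used as the state key
        self.counts[key] += 1
        return key, self.counts[key]


def run_counter(events: Iterable[Event]) -> Iterator[Tuple[str, int]]:
    counter = EditCounter()
    for event in events:
        yield counter.process(event)
```

Feeding events for pages `A`, `B`, `A` produces `("A", 1)`, `("B", 1)`, `("A", 2)`. The last output is only correct if the state from the first event survived, which is exactly what makes stateful pipelines harder to operate.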

Needs Backfill? - Once deployed, does the output data need to be backfilled? E.g., perhaps your job should have recommendations available for all MW pages, not just the ones that have been edited recently. If so, ideally we can run the same pipeline logic on an input dataset in batch mode.

Ingestion to serving datastore? - Will the output of the pipeline need to be written to a storage system or database to serve real users? We'd like to automate streaming ingestion, similar to what Data Engineering has done for the Hive database. Ideally, all a user would need to do is add some configuration to have their output stream automatically ingested into their datastore.
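The backfill criterion is easiest to meet when the pipeline logic is written once, independent of how events arrive. The sketch below assumes a hypothetical `recommend` function and a `page_id` field (neither comes from any real Wikimedia API). It applies that same function to a bounded dataset, such as a snapshot of all pages, and lazily to an unbounded stream.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Event = Dict[str, Any]


def recommend(event: Event) -> Event:
    """Hypothetical pipeline logic, written once as a pure function."""
    return {"page_id": event["page_id"], "recommendation": f"related-to-{event['page_id']}"}


def run_batch(dataset: Iterable[Event], logic: Callable[[Event], Event]) -> List[Event]:
    # Backfill: apply the logic to a bounded dataset, e.g. a snapshot of all pages.
    return [logic(row) for row in dataset]


def run_stream(stream: Iterable[Event], logic: Callable[[Event], Event]) -> Iterator[Event]:
    # Streaming: apply the same logic lazily to a possibly unbounded sequence of events.
    for event in stream:
        yield logic(event)
```

Because both runners share `recommend`, a backfill over a page snapshot and the live stream produce records of the same shape. This sharing is only this simple for stateless, one-to-one logic; stateful pipelines need framework support for replaying history.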

Use Cases
Some of these use cases already exist in some form, and others are nascent product ideas.

Examples of Wikimedia one-off data pipelines
Original source of table