Jump to content

Flink SIG/Meetings/2025-05-22

From mediawiki.org

Agenda

[edit]
  • Review past action points.
  • Flink <> Iceberg integration, Enterprise meeting (TODO).
  • Peg operator update to k8s cluster update.

Action points

[edit]
  • Test Flink 2
    • Can wikimedia-event-utilities compile?
    • How can we replace WDQS-updater scala APIs use?
  • XC: to meet with Enterprise and discuss Iceberg integration with streaming tech
  • XC: to check in with ML Platform re discussing their streaming use cases in next deep dive.

Notes

[edit]

Action points from last time:    

https://www.mediawiki.org/wiki/Flink_SIG/Meetings/2025-02-20

AH: start a google doc / wiki to collect existing and future use cases

[edit]
  1. https://docs.google.com/document/d/172F4gBGN64fGWCb37XcH6lbZUVpxz-9wW6pRI15dXTU/edit?tab=t.0 GM: can we use flink? DC: let of tech debt. Would like to move on the same approach of article topics output predictions to stream, consumer uses it XC: Asked Chris for a DPE Deep Dive for ML team. Should we cover what infra they are using? BT: they have ml-serve, ml-staging, airflow instance on dse (with GPUs), mllab boxes. GM: changeprop? DC: they are triggerd from changeprop to predict article level predictions (topics), they send back an event to EventGate (or calling liftwing - returns a prediction in response). DC: they push a stream for weighted tags via EventGate, DC: would like the same approach for similar use cases.

 

BK: Document operator dev and maintenance on wikitech. https://phabricator.wikimedia.org/T386950

[edit]
  • no update

BT: when we upgrade k8s we'll review operators

BT: should test on staging

DC: if staging is OK, nbd to upgrade

GM: this upgrade would bring in 1.20

DC: k8s operator is a blocker. We are not far, but we need to find time to do it

BT: is there a hello world we can use for integration test?

DC: couple of real apps running in staging

GM: tutorial has a hello world that streams from SSE

  XC: does operator run on app basis?

  DC: no you need something compatible with the op. The op runs in its own namespace and deploys flink jobs.

  XC: can you have multiple version running at the same time?

  DC: op needs to know which namespaces to listen to

  BT: list of namespaces. Listens for an API call to make deployments

  DC: op can handle multiple flink version, but there's an upper bound in version. We might be able to run multiple ops, but requires k8s multi tenancy.

  XC: we'll have this convo with Enterprise. Iceberg <> Flink connector that does work towards what we do with refine. Enterprise is doing this in prod. (Slack link: https://wikimedia.slack.com/archives/CSV483812/p1745935435566159 )

  GM: Flink PoC should be nice to have

  XC: could be in scope for Dumps 2

  BT: update tied with Hadoop 3 update

DC: Research what's coming with FIlnk 2.

[edit]

       DC: worked to try to make flink more cloud native. Now state is stored locally and pushed to s3. Instead of having full state, cache only what we need locally and keep full state elsewhere. No hit on latency, but increased state capacity

       DC: Paimon + laehouse

       BT: helped with accessing mariadb binlog

       DC: schema change on the fly

       DC: stream batch unification. Iceberg (old data), Kafka (fresh data), use flink for etl

       DC: big rounds of deprecation. WDQS will be affected, scala has been derpecated

       GM: shall we try to update mediawiki-event-utilities? can we estimate WDQS?

       GM: does WDQS use the bundled scala API?

       GM:

GM: wrap up the work on ListState and EXACTLY_ONCE semantics.

[edit]
  1.    https://gitlab.wikimedia.org/repos/data-engineering/eventutilities-python/-/merge_requests/83