Flink SIG/Meetings/2025-05-22
Agenda
[edit]- Review past action points.
- Flink <> Iceberg integration, Enterprise meeting (TODO).
- Peg operator update to k8s cluster update.
Action points
[edit]- Test Flink 2
- Can wikimedia-event-utilities compile?
- How can we replace WDQS-updater scala APIs use?
- XC: to meet with Enterprise and discuss Iceberg integration with streaming tech
- XC: to check in with ML Platform re discussing their streaming use cases in next deep dive.
Notes
[edit]Action points from last time:
https://www.mediawiki.org/wiki/Flink_SIG/Meetings/2025-02-20
AH: start a google doc / wiki to collect existing and future use cases
[edit]- https://docs.google.com/document/d/172F4gBGN64fGWCb37XcH6lbZUVpxz-9wW6pRI15dXTU/edit?tab=t.0 GM: can we use flink? DC: let of tech debt. Would like to move on the same approach of article topics output predictions to stream, consumer uses it XC: Asked Chris for a DPE Deep Dive for ML team. Should we cover what infra they are using? BT: they have ml-serve, ml-staging, airflow instance on dse (with GPUs), mllab boxes. GM: changeprop? DC: they are triggerd from changeprop to predict article level predictions (topics), they send back an event to EventGate (or calling liftwing - returns a prediction in response). DC: they push a stream for weighted tags via EventGate, DC: would like the same approach for similar use cases.
BK: Document operator dev and maintenance on wikitech. https://phabricator.wikimedia.org/T386950
[edit]- no update
BT: when we upgrade k8s we'll review operators
BT: should test on staging
DC: if staging is OK, nbd to upgrade
GM: this upgrade would bring in 1.20
DC: k8s operator is a blocker. We are not far, but we need to find time to do it
BT: is there a hello world we can use for integration test?
DC: couple of real apps running in staging
GM: tutorial has a hello world that streams from SSE
XC: does operator run on app basis?
DC: no you need something compatible with the op. The op runs in its own namespace and deploys flink jobs.
XC: can you have multiple version running at the same time?
DC: op needs to know which namespaces to listen to
BT: list of namespaces. Listens for an API call to make deployments
DC: op can handle multiple flink version, but there's an upper bound in version. We might be able to run multiple ops, but requires k8s multi tenancy.
XC: we'll have this convo with Enterprise. Iceberg <> Flink connector that does work towards what we do with refine. Enterprise is doing this in prod. (Slack link: https://wikimedia.slack.com/archives/CSV483812/p1745935435566159 )
GM: Flink PoC should be nice to have
XC: could be in scope for Dumps 2
BT: update tied with Hadoop 3 update
DC: Research what's coming with FIlnk 2.
[edit]DC: worked to try to make flink more cloud native. Now state is stored locally and pushed to s3. Instead of having full state, cache only what we need locally and keep full state elsewhere. No hit on latency, but increased state capacity
DC: Paimon + laehouse
BT: helped with accessing mariadb binlog
DC: schema change on the fly
DC: stream batch unification. Iceberg (old data), Kafka (fresh data), use flink for etl
DC: big rounds of deprecation. WDQS will be affected, scala has been derpecated
GM: shall we try to update mediawiki-event-utilities? can we estimate WDQS?
GM: does WDQS use the bundled scala API?
GM: