Kubernetes SIG/Meetings/2023-04-25

  • Introductions for new members (if any)
  • SIG administrivia:
    • Nothing to report
  • Topic: Kubernetes versioning discussion (continuation)
    • Action items from previous meeting:
    • Rough summary from previous meeting
      • Production use cases should converge to the same policies.
      • Testing is a use case that might be hampered by overly strict convergence; that use case could also be served in different ways.
      • Overall agreement to support at most 2 versions (the one currently in use and the one to upgrade to).
      • We need management buy-in to make sure we all allocate time to upgrade our clusters, and plan to upgrade each year.
      • For point releases (specifically security): shared understanding that things won’t break when executing such upgrades in place; we will do them on an as-needed basis (https://kubernetes.io/releases/version-skew-policy/).
    • Parts not discussed in the previous meeting
      • Do we need to support an update method other than re-init (for clusters that are not multi-DC and can’t easily be depooled)?
      • Should we try to update components “off-band” (e.g. updating Calico without updating Kubernetes, to unclutter the Kubernetes update)?
    • Off the predefined topics
      • How far have we got with Spark and user-generated workloads on DSE?
        • How can we manage secrets better for users sharing namespaces? Or should we not?
        • A draft document detailing progress with Spark/Hadoop/Hive is already open for comments.

Notes

[AO] We can do downtime for DSE, I believe; we might want to check with Luca and the ML team.

[CD] We can also do downtime for aux-k8s.

[SD] If we could create a business case for upgrading more often, would it lead to less work overall compared to the big, invasive upgrades?

[AK] So far this hasn’t been possible. Upgrading once a quarter hasn’t happened up to now because we never managed to do so.

[CD] This doesn’t answer the question. It’s orthogonal: it hasn’t happened yet, but we still could.

[JM] I think we might get into a situation where we could do in-place upgrades, if we prove that we’re able to keep up with Kubernetes upgrades (and don’t have to do big version jumps).

[AK] We did in-place upgrades in the past (and they worked), but mostly without workloads.

[JM] “Off-band” updates of Kubernetes components might help keep the actual Kubernetes upgrades smaller and less scary (and might even help with in-place Kubernetes updates).

[AK] Maintaining the compatibility matrix in an ongoing fashion will be hard because of all the interdependencies, but overall it has the potential to pay off by letting us do part of the work piecemeal.

[AK] Action item: Create a PoC for a Compatibility Matrix of Kubernetes versions vs Cluster components.
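
A minimal sketch of what such a PoC could look like, assuming a hand-maintained matrix kept as data; the component names and version ranges below are hypothetical placeholders, not the real cluster inventory:

 # compat_matrix_poc.py -- illustrative sketch only; all versions and components are made up.
 # Map each Kubernetes minor version to the component version ranges believed compatible with it.
 COMPATIBILITY_MATRIX = {
     "1.23": {"calico": (">=3.21", "<3.24"), "coredns": (">=1.8", "<1.10")},
     "1.26": {"calico": (">=3.24", "<3.27"), "coredns": (">=1.9", "<1.11")},
 }

 def compatible_components(k8s_version):
     """Return the recorded component version ranges for a given Kubernetes version."""
     if k8s_version not in COMPATIBILITY_MATRIX:
         raise ValueError(f"no compatibility data recorded for Kubernetes {k8s_version}")
     return COMPATIBILITY_MATRIX[k8s_version]

 if __name__ == "__main__":
     # Example: list what would need checking before an upgrade to 1.26.
     for component, (lower, upper) in compatible_components("1.26").items():
         print(f"{component}: {lower}, {upper}")

Keeping the matrix as data rather than prose could make it easy to diff against what is actually deployed and to flag which components would block an “off-band” or in-place upgrade.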

  • Big topics at KubeCon + CloudNativeCon 2023:

[All] Action Item: Read (and edit) Kubernetes_Infrastructure_upgrade_policy

[All] Action Item: Talk with managers about cluster upgrade resources (roughly one engineer for one quarter, each year)