Kubernetes SIG/Meetings/2023-07-26

Meeting notes:

Introductions for new members (if any):
- Welcoming Maryum
SIG administrivia:
- Janis will be running these meetings in September-November
Misc:
- [BT] Progress has been made towards getting Ceph into production. The storage is ready to be commissioned (numbers were mentioned, but the note-taker missed them). One of the goals is to see how to make it available to the DSE cluster. It might be nice to have that as an enabling technology. Also looking at S3 support (external to k8s, though). Still moving forward at a relatively slow pace.
Topic: Alpine
- [OD] The usage of Alpine images would greatly benefit Datahub deployment procedures. WMCS reportedly already supports Alpine images, but it has different considerations and needs.
- [BT] Provided description of how the Datahub installation works currently in the production realm. Datahub requires 5 different container images to run. Current situation is:
  - Significant improvements in the process of updating Datahub to the latest versions. Still maintaining a parallel build process, but very lean now.
  - Datahub is released exclusively as container images on Dockerhub. It’s mainly Java, but not exclusively.
  - When speccing the MVP approach, DE decided to run all of Datahub in the production realm and built it from source using the Deployment Pipeline. DE decided to keep Blubber files, maintaining a feature branch with all of them in order to get them working with the way things were done in the Datahub images. It was a time-consuming process for DE: it took 6 months to upgrade from 0.9 to 0.10.
  - One idea to solve the problem was to download the Alpine images, grab the JAR files, and put them in Debian images.
  - Another would be to have our own Alpine images and use their build process, meaning we’d need to support Alpine in production.
  - Another was to just streamline the build process. In the end, DE went with the third option, making significant changes and effectively un-forked, building now from pristine datahub sources and the process is much much cleaner.
  - The immediate operational need for running Alpine in production has gone away. The 0.10.5 upgrade is expected to show whether that holds (while also moving the pipeline to Gitlab).
- [OD] There is still some work to be done
- [BT] That’s right, e.g. one of the containers does database/elasticsearch migrations. So, adding a new container shouldn’t be very difficult.
- [OD] We can shelve this discussion for now, or we can continue it, as there might be more applications in the future that could benefit from using Alpine.
- [GL] Kudos to Ben for making the Datahub work easier. An ask to share that information. Splitting the rest of the discussion into 2 different tracks:
  - Usage of Dockerhub images
    - Supply chain attacks are a thing, but we are exposed to them anyway.
    - The actual problem with using upstream Dockerhub images is that when a big vulnerability shows up in one of those images, we end up being hostages of the upstream image builders. We always wanted to have full control over our image building process to avoid exactly that and allow us to fix problems anywhere in the production realm. This is a general problem for everything.
  - Usage of a different base distro than Debian
    - Not opposed in principle, but there is a whole toolchain that reports e.g. packages that are vulnerable, images that are vulnerable, etc. We’d have to reinvent and reimplement all of this toolchain if we wanted to move to a different base distro. We’d need a strong business case for that. Personal opinion on Alpine is that its use isn’t justified, e.g. DNS resolution is broken in Alpine due to the libc used, building Python packages is very slow, etc. Case-by-case evaluation may make sense. Hoping that our pipelines will become way easier to operate when we move to Gitlab.
- [JM] Adding a few extra data points. We’ve already seen Dockerhub images of very bad quality in the past, and we know that some won’t work in our environment anyway due to how they are executed. Regarding Alpine, many of the benefits aren’t going to materialize: due to how our image tree is constructed, everything in the base images is already cached on all Kubernetes nodes by design, making gains from Alpine irrelevant.
- [BT] Never tried to say “I think we should run images from Dockerhub”. Rather “download the image and copy binaries from the images to our images”. Agreeing on the low quality thing.
- [JM] We already copy binaries out of images, including for things I’ve written. It depends on the type of software; there can be situations where we’ll have a hard time anyway. Golang mostly works fine with this. Envoy is a different beast.
- [AK] Extra mention of Envoy and how glibc was a problem and that we already do the “copy binaries” thing in the cases we know that it doesn’t offer.
- [AK] Would like to discuss this further, such as the recent rise of ‘distroless’ images. These are very lightweight images that don’t contain anything except the application and its runtime environment.
- [GL] Interesting and useful in e.g. Golang cases, but in many of our use cases, e.g. Mediawiki/Shellbox, it can be very difficult to work with.
- [BT] Distroless sounds very interesting. Wonder whether it could/should be used for eventgate?
- [GL] The “no images from Dockerhub” rule is written somewhere, but it is probably good to write it down as accepted by the group.
- [JM] Document the reasons why we think that introducing other distributions would require quite a bit of work. Add security-related considerations as well (Moritz is focused on Debian right now).

Action Items:

[AK] I'll post meeting notes and summarize the reasoning behind the learnings under https://wikitech.wikimedia.org/wiki/Kubernetes/Images