Talk:Platform Engineering Team/Event Platform Value Stream/Use case: Event Platform SDLC practices

About this board

GLederrey (WMF) (talkcontribs)

It seems that we want to allow dependencies on SNAPSHOT artifacts. This seems slightly problematic to me, as by definition SNAPSHOTs don't identify a clear state of the code base. I suspect that the idea is to speed up integration by not requiring a release before cross-project integration. A bit more discussion of the needs / problem we are trying to solve, and the drawbacks of the different approaches, would be welcome!

GModena (WMF) (talkcontribs)

> I suspect that the idea is to speed up integration by not requiring a release before cross project integration.

To a degree. There are currently cases where we have WIP code (e.g. a flink pipeline) depending on third party WIP (e.g. upstream changes in eventutils). The integration strategy (to the best of my knowledge) is to build and integrate manually (on local checkouts/builds).

That gets a bit annoying when ultimately we need to test both changes in a remote environment (e.g. a YARN cluster). The need to track changes manually (checkout and build upstream, manually integrate & deploy) adds some overhead and lengthens the feedback loop.


IMHO there are instances where not having a clean state (SNAPSHOT) might be ok, *if* we start with the assumption that things are expected to break. E.g. working in a development environment.


I worked in teams where we would automatically publish ephemeral artifacts on branch, and allow cross project development deps. This assumed:

  1. a degree of env separation (no integration testing in prod).
  2. a Gitflow-ish set of guidelines to enforce conventions (not endorsing).

Neither applies to us though, and I can see things becoming messy (SNAPSHOT state spilling over to prod).
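For context, this is a minimal sketch of what such a cross-project development dependency looks like in a Maven POM; the coordinates here are hypothetical, not the actual eventutils ones. The `-SNAPSHOT` suffix is what tells Maven to re-resolve the dependency against the latest ephemeral build published to the snapshot repository, which is exactly why it never pins a clear state of the code base:

```xml
<!-- Hypothetical coordinates for illustration only.
     A -SNAPSHOT version re-resolves on each build to the newest
     artifact published to the configured snapshot repository. -->
<dependency>
  <groupId>org.wikimedia</groupId>
  <artifactId>eventutilities</artifactId>
  <version>1.2.0-SNAPSHOT</version>
</dependency>
```

The "SNAPSHOT state spilling to prod" risk above is essentially a release cut while a dependency like this is still present, which is why tooling such as the maven-release-plugin refuses to release with SNAPSHOT dependencies by default.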
GModena (WMF) (talkcontribs)

Does the above make sense? How would you model this scenario?

Reply to "Releases vs Snapshots"

Additional documentation on a Maven / Java CI workflow

GLederrey (WMF) (talkcontribs)
GModena (WMF) (talkcontribs)

This is great and I would be keen in adopting it as baseline doc for JVM projects. Any chance we could move it to a wiki page?

Reply to "Additional documentation on a Maven / Java CI workflow"

Dependency arrows direction in Figure 2

GLederrey (WMF) (talkcontribs)

Nit: I tend to view arrows in a dependency diagram as meaning <<depends on>>. Figure 2 seems to use them with the meaning of <<is used by>>. This is slightly confusing to me.

GModena (WMF) (talkcontribs)

Ack.

Reply to "Dependency arrows direction in Figure 2"

SonarCloud integration

GLederrey (WMF) (talkcontribs)

SonarCloud has been integrated with most (if not all) of our projects. I think keeping this integration on the new platform should be part of the requirements.

GModena (WMF) (talkcontribs)

Agree. We'll add SonarCloud as a requirement.

Reply to "SonarCloud integration"
DDuvall (WMF) (talkcontribs)

On the topic of existing workflows in our Gerrit/Zuul/Jenkins-based system, you might want to have a look at the documentation for PipelineLib. PipelineLib depends on Jenkins, so we won't be using it in its current state from GitLab, but we do plan to provide a GitLab CI library that will provide the same functions (albeit using native GitLab CI configuration) as well as tooling for easy migration.

GModena (WMF) (talkcontribs)

Thanks for the pointer @DDuvall (WMF).

I've read through https://wikitech.wikimedia.org/wiki/PipelineLib/Concepts, and I wanted to understand the following points a bit better:

  1. What is the time horizon for a Gitlab-compatible PipelineLib?
  2. Will it support deployments to environments other than production k8s (e.g. yarn, DSE cluster)?
  3. The doc says "We only support microservices". What's the definition of microservice in this context? Our value stream will develop artifacts that are not microservices (e.g. data pipelines, flink jobs), but that will adhere to deployment-charts norms for k8s targets. Would these fit PipelineLib use cases?

In general, how would you advise we proceed with new Gitlab projects? Is it safe to start building workflows around current Gitlab CI capabilities (e.g. by piggybacking on efforts like https://gitlab.wikimedia.org/repos/data-engineering/workflow_utils/-/tree/main/gitlab_ci_templates)?
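To make the piggybacking idea concrete: GitLab CI supports pulling shared templates from another project with the `include:` keyword, so a new repo could reuse the workflow_utils templates along these lines. This is only a sketch; the `file:` path and template name here are assumptions, so check the linked repo for the actual filenames:

```yaml
# Sketch of reusing a shared CI template from another GitLab project.
# The file name below is hypothetical; see the workflow_utils repo
# for the templates that actually exist.
include:
  - project: 'repos/data-engineering/workflow_utils'
    ref: main
    file: '/gitlab_ci_templates/maven.yml'
```

The upside of this pattern is that pipeline conventions live in one place and downstream projects pick up fixes by bumping `ref`; the downside is that a moving `ref: main` reintroduces the "no clear state" problem discussed in the snapshots thread above.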

Reply to "Workflows"
TCipriani (WMF) (talkcontribs)

> Deployments should be automated and predicated upon code review.

vs.

> We lack well defined development, staging and production environments.

This makes it sound like you'll need to deploy not just to standard wiki-production, but also to other environments, is that right?

What does manual deployment to those other environments look like at the moment?

For services destined for wiki-production we tend to use helm deploy—is that still what you're targeting? Or is something more complex needed?

Also, if you're deploying into other environments, are you still deploying artifacts from the Wikimedia docker registry?

GModena (WMF) (talkcontribs)

> This makes it sound like you'll need to deploy not just to standard wiki-production, but also to other environments, is that right?

That's correct. Right now we need to target systems other than wiki-production, for example Hadoop. You can find an example of manual deployment here: https://gitlab.wikimedia.org/repos/data-engineering/mediawiki-stream-enrichment#deploy-on-yarn. I'm aware that other teams have bespoke deployment jobs that target that system (see reference to Airflow in the page), but to the best of my knowledge there is no off-the-shelf reusable solution.

The reason we target Hadoop is a mix of development and integration testing needs that require access to data stored in Kafka and HDFS. We are not married to this platform though, and would be open to following guidelines on alternative systems. Actually, having disjoint platforms (k8s vs yarn) for production and devel/test is a pain point we identified.


> For services destined for wiki-production we tend to use helm deploy—is that still what you're targeting? Or is something more complex needed? Also, if you're deploying into other environments, are you still deploying artifacts from the Wikimedia docker registry?


For services destined for wiki-production we'll follow deployment-charts guidelines. We will be partnering with Search and SRE to streamline how we deploy shared stacks (e.g. apache flink). We are relying on images from the Wikimedia docker registry, and on a mix of Gitlab and Archiva for publishing jars and wheels.
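For the jar-publishing side of this, the standard Maven mechanism is a `distributionManagement` section in the POM, which `mvn deploy` uses to pick the release vs snapshot repository based on the project version. The repository ids and URLs below are assumptions for illustration; the actual Archiva (or GitLab package registry) coordinates would come from the project's parent POM:

```xml
<!-- Hypothetical repository ids/URLs; the real coordinates are
     project-specific. `mvn deploy` publishes to <repository> for
     release versions and <snapshotRepository> for -SNAPSHOT ones. -->
<distributionManagement>
  <repository>
    <id>wmf-releases</id>
    <url>https://archiva.wikimedia.org/repository/releases/</url>
  </repository>
  <snapshotRepository>
    <id>wmf-snapshots</id>
    <url>https://archiva.wikimedia.org/repository/snapshots/</url>
  </snapshotRepository>
</distributionManagement>
```

Credentials for the matching `<id>` entries would live in CI-provided `settings.xml` `<server>` blocks rather than in the POM itself.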

Reply to "Deployment questions"
There are no older topics