Talk:Platform Engineering Team/Event Platform Value Stream/Use case: Event Platform SDLC practices

About this board

GLederrey (WMF) (talkcontribs)

It seems that we want to allow dependencies on SNAPSHOT artifacts. This seems slightly problematic to me, as by definition SNAPSHOTs don't identify a clear state of the code base. I suspect that the idea is to speed up integration by not requiring a release before cross-project integration. A bit more discussion of the needs / problem we are trying to solve, and the drawbacks of the different approaches, would be welcome!

GModena (WMF) (talkcontribs)

> I suspect that the idea is to speed up integration by not requiring a release before cross project integration.

To a degree. There are currently cases where we have WIP code (e.g. a flink pipeline) depending on third party WIP (e.g. upstream changes in eventutils). The integration strategy (to the best of my knowledge) is to build and integrate manually (on local checkouts/builds).

That gets a bit annoying when ultimately we need to test both changes in a remote environment (e.g. a YARN cluster). The need to track changes manually (checkout and build upstream, manually integrate & deploy) adds some overhead and lengthens the feedback loop.


IMHO there are instances where not having a clean state (SNAPSHOT) might be ok, *if* we start with the assumption that things are expected to break. E.g. working in a development environment.


I worked in teams where we would automatically publish ephemeral artifacts on branch, and allow cross project development deps. This assumed:

  1. a degree of env separation (no integration testing in prod).
  2. a Gitflow-ish set of guidelines to enforce conventions (not endorsing).

Neither applies to us though, and I can see things becoming messy (SNAPSHOT state spilling over to prod).
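For context, this is a minimal sketch of what such a cross-project development dependency looks like in a Maven POM; the coordinates here are hypothetical, not the actual eventutils ones. The `-SNAPSHOT` suffix is what tells Maven to re-resolve the dependency against the latest ephemeral build published to the snapshot repository, which is exactly why it never pins a clear state of the code base:

```xml
<!-- Hypothetical coordinates for illustration only.
     A -SNAPSHOT version re-resolves on each build to the newest
     artifact published to the configured snapshot repository. -->
<dependency>
  <groupId>org.wikimedia</groupId>
  <artifactId>eventutilities</artifactId>
  <version>1.2.0-SNAPSHOT</version>
</dependency>
```

The "SNAPSHOT state spilling to prod" risk above is essentially a release cut while a dependency like this is still present, which is why tooling such as the maven-release-plugin refuses to release with SNAPSHOT dependencies by default.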
GModena (WMF) (talkcontribs)

Does the above make sense? How would you model this scenario?

Reply to "Releases vs Snapshots"

Additional documentation on a Maven / Java CI workflow

GLederrey (WMF) (talkcontribs)
GModena (WMF) (talkcontribs)

This is great and I would be keen in adopting it as baseline doc for JVM projects. Any chance we could move it to a wiki page?

Reply to "Additional documentation on a Maven / Java CI workflow"

Dependency arrows direction in Figure 2

GLederrey (WMF) (talkcontribs)

Nit: I tend to view arrows in a dependency diagram as meaning <<depends on>>. Figure 2 seems to use them with the meaning of <<is used by>>. This is slightly confusing to me.

GModena (WMF) (talkcontribs)

Ack.

Reply to "Dependency arrows direction in Figure 2"

SonarCloud integration

GLederrey (WMF) (talkcontribs)

SonarCloud has been integrated with most (if not all) of our projects. I think keeping this integration on the new platform should be part of the requirements.

GModena (WMF) (talkcontribs)

Agree. We'll add SonarCloud as a requirement.

Reply to "SonarCloud integration"
DDuvall (WMF) (talkcontribs)

On the topic of existing workflows in our Gerrit/Zuul/Jenkins-based system, you might want to have a look at the documentation for PipelineLib. PipelineLib depends on Jenkins, so we won't be using it in its current state from GitLab, but we do plan to provide a GitLab CI library that will provide the same functions (albeit using native GitLab CI configuration) as well as tooling for easy migration.

GModena (WMF) (talkcontribs)

Thanks for the pointer @DDuvall (WMF).

I've read through https://wikitech.wikimedia.org/wiki/PipelineLib/Concepts, and I wanted to understand the following points a bit better:

  1. What is the time horizon for a Gitlab-compatible PipelineLib?
  2. Will it support deployments to environments other than production k8s (e.g. yarn, DSE cluster)?
  3. The doc says "We only support microservices". What's the definition of microservice in this context? Our value stream will develop artifacts that are not microservices (e.g. data pipelines, flink jobs), but that will adhere to deployment-charts norms for k8s targets. Would these fit PipelineLib use cases?

In general, how would you advise we proceed with new Gitlab projects? Is it safe to start building workflows around current Gitlab CI capabilities (e.g. by piggybacking on efforts like https://gitlab.wikimedia.org/repos/data-engineering/workflow_utils/-/tree/main/gitlab_ci_templates)?
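To make the piggybacking idea concrete: GitLab CI supports pulling shared templates from another project with the `include:` keyword, so a new repo could reuse the workflow_utils templates along these lines. This is only a sketch; the `file:` path and template name here are assumptions, so check the linked repo for the actual filenames:

```yaml
# Sketch of reusing a shared CI template from another GitLab project.
# The file name below is hypothetical; see the workflow_utils repo
# for the templates that actually exist.
include:
  - project: 'repos/data-engineering/workflow_utils'
    ref: main
    file: '/gitlab_ci_templates/maven.yml'
```

The upside of this pattern is that pipeline conventions live in one place and downstream projects pick up fixes by bumping `ref`; the downside is that a moving `ref: main` reintroduces the "no clear state" problem discussed in the snapshots thread above.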

Reply to "Workflows"
TCipriani (WMF) (talkcontribs)

> Deployments should be automated and predicated upon code review.

vs.

> We lack well defined development, staging and production environments.

This makes it sound like you'll need to deploy not just to standard wiki-production, but also to other environments, is that right?

What does manual deployment to those other environments look like at the moment?

For services destined for wiki-production we tend to use helm deploy—is that still what you're targeting? Or is something more complex needed?

Also, if you're deploying into other environments, are you still deploying artifacts from the Wikimedia docker registry?

GModena (WMF) (talkcontribs)

> This makes it sound like you'll need to deploy not just to standard wiki-production, but also to other environments, is that right?

That's correct. Right now we need to target systems other than wiki-production, for example Hadoop. You can find an example of manual deployment here: https://gitlab.wikimedia.org/repos/data-engineering/mediawiki-stream-enrichment#deploy-on-yarn. I'm aware that other teams have bespoke deployment jobs that target that system (see reference to Airflow in the page), but to the best of my knowledge there is no off-the-shelf reusable solution.

The reason we target Hadoop is a mix of development and integration testing needs that require access to data stored in Kafka and HDFS. We are not married to this platform though, and would be open to following guidelines on alternative systems. Actually, having disjoint platforms (k8s vs yarn) for production and devel/test is a pain point we identified.


> For services destined for wiki-production we tend to use helm deploy—is that still what you're targeting? Or is something more complex needed? Also, if you're deploying into other environments, are you still deploying artifacts from the Wikimedia docker registry?


For services destined for wiki-production we'll follow deployment-charts guidelines. We will be partnering with Search and SRE to streamline how we deploy shared stacks (e.g. apache flink). We are relying on images from the Wikimedia docker registry, and on a mix of Gitlab and Archiva for publishing jars and wheels.
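For the jar-publishing side of this, the standard Maven mechanism is a `distributionManagement` section in the POM, which `mvn deploy` uses to pick the release vs snapshot repository based on the project version. The repository ids and URLs below are assumptions for illustration; the actual Archiva (or GitLab package registry) coordinates would come from the project's parent POM:

```xml
<!-- Hypothetical repository ids/URLs; the real coordinates are
     project-specific. `mvn deploy` publishes to <repository> for
     release versions and <snapshotRepository> for -SNAPSHOT ones. -->
<distributionManagement>
  <repository>
    <id>wmf-releases</id>
    <url>https://archiva.wikimedia.org/repository/releases/</url>
  </repository>
  <snapshotRepository>
    <id>wmf-snapshots</id>
    <url>https://archiva.wikimedia.org/repository/snapshots/</url>
  </snapshotRepository>
</distributionManagement>
```

Credentials for the matching `<id>` entries would live in CI-provided `settings.xml` `<server>` blocks rather than in the POM itself.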

Reply to "Deployment questions"
There are no older topics