Platform Engineering Team/Event Platform Value Stream/Use case: Event Platform SDLC practices

This page is an RFC.

Use case: Event Platform SDLC practices
Author: Gabriele Modena <[mailto:gmodena@wikimedia.org gmodena@wikimedia.org]>, 2022-09-05

Internal draft (read-only): https://docs.google.com/document/d/1okEcCs2qiWJOtB8iHldX1mkoesgzB3dZK5xM9jEfIgE/edit#

In Event Platform we need to establish software development life cycle (SDLC) processes to orchestrate the build and release of our artifacts (applications, libraries and services).

We are in the process of taking ownership of a number of codebases hosted on Gerrit and Gitlab. As part of our Team Charter we are reviewing the maturity and support level of CI/CD capabilities and processes currently in place.

Desired workflow and current state
A push (or merge) to a git remote should trigger a CI pipeline. On a successful build, an artifact should be generated with a version or tag that indicates its status (e.g. RELEASE vs SNAPSHOT). Deployments should be automated and predicated upon code review. Eventually, software artifacts (at different stages of the release cycle) should be published to a repository (e.g. Gitlab maven or docker registries).
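As an illustration of the RELEASE vs SNAPSHOT distinction, the following is a minimal sketch of how a CI step might derive an artifact version. The tag format (`v1.2.3`), the `-SNAPSHOT` suffix convention, and the names `artifact_version` and `target_repository` are assumptions made for this example; they are not an agreed Event Platform convention.

```python
import re
from typing import Optional

# Assumption: release builds are triggered by git tags shaped like v<major>.<minor>.<patch>.
RELEASE_TAG = re.compile(r"^v(\d+\.\d+\.\d+)$")


def artifact_version(base_version: str, git_tag: Optional[str] = None) -> str:
    """Return the version string a CI build would stamp on its artifact.

    A build triggered by a release tag yields a RELEASE version; any other
    build yields a SNAPSHOT of the version currently under development.
    """
    if git_tag is not None:
        match = RELEASE_TAG.match(git_tag)
        if match is None:
            raise ValueError(f"unrecognised release tag: {git_tag!r}")
        return match.group(1)
    return f"{base_version}-SNAPSHOT"


def target_repository(version: str) -> str:
    """Pick a (hypothetical) repository name based on the artifact status."""
    return "snapshots" if version.endswith("-SNAPSHOT") else "releases"
```

Under these assumptions, `artifact_version("1.4.0")` describes a build of an ordinary merge, while `artifact_version("1.4.0", git_tag="v1.4.0")` describes a tagged release build. The resulting string can then be used to decide where the artifact is published.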

Repositories
Currently our codebases are stored in Gerrit (existing projects) and Gitlab (newly created projects). We lack uniform processes for contribution, code review and deployment. Some teams use a +2 model for approval in Gerrit. We should evaluate how this scales with our team norms and establish practices for Gitlab repositories. Other models might be a better fit (e.g. a MR must receive two distinct +1 votes).
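To make the difference between the two approval models concrete, here is a small sketch that evaluates both rules over the same set of votes. It is a simplified reading of the models described above, not the actual submit rules of Gerrit or Gitlab. The `Vote` type and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Vote:
    reviewer: str
    score: int  # e.g. +1 or +2


def approved_plus_two(votes: Iterable[Vote], author: str) -> bool:
    """+2 model: a single +2 from someone other than the author is enough."""
    return any(v.score >= 2 and v.reviewer != author for v in votes)


def approved_two_plus_ones(votes: Iterable[Vote], author: str) -> bool:
    """Alternative model: at least two distinct non-author reviewers gave +1 or more."""
    reviewers = {v.reviewer for v in votes if v.score >= 1 and v.reviewer != author}
    return len(reviewers) >= 2
```

For example, two +1 votes from different reviewers satisfy the second rule but not the first, while a lone +2 satisfies the first but not the second.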

Continuous Integration
We have established CI practices that rely on internal docker images and build steps. Integration testing, however, still requires developers to either run software locally or manually deploy artifacts to remote environments.

''Figure 2. Cross project dependency. Service B might depend on different versions of Library A at different stages of its lifecycle.''

Lack of automation for artifact publishing might break cross project dependencies. It also impacts development velocity and code review effectiveness. It is essential to improve the feedback loop by shortening the time between code being committed and it being available to test (or, in the case of a library, available as an artifact).
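The cross project dependency problem can be sketched as a simple check: given the version of Library A that Service B pins at each lifecycle stage, report the stages whose pinned version has not been published. The stage names, version numbers, and the function `missing_dependencies` are hypothetical and only illustrate the idea.

```python
from typing import Dict, List, Set


def missing_dependencies(pins: Dict[str, str], published: Set[str]) -> List[str]:
    """Return the stages whose pinned library version is not yet published."""
    return sorted(stage for stage, version in pins.items() if version not in published)


# Hypothetical pins: Service B depends on a different Library A version per stage.
service_b_pins = {
    "development": "2.0.0-SNAPSHOT",
    "staging": "1.5.0",
    "production": "1.4.2",
}

# Hypothetical repository state: the snapshot build was never published.
published_library_a = {"1.4.2", "1.5.0"}

blocked_stages = missing_dependencies(service_b_pins, published_library_a)
```

Here `blocked_stages` contains only `"development"`: without automated publishing of snapshot artifacts, the development build of Service B cannot resolve its dependency until someone publishes Library A by hand.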

Deployments
We lack well defined development, staging and production environments. Different components might target different technology stacks, with different levels of support and process maturity. This is a particular pain point for streaming applications built atop Apache Flink [1].

Next steps
To improve work efficiency and how we interface with other teams (e.g. by means of SLOs), our Team Charter should provide well documented SDLC practices. As a next step we should document Gerrit processes and compare them with other workflows for Gitlab codebases [2][5][6][7]. Questions we seek to answer:


 * How do we carry out deployments?
 * How do we build the capability of running multiple versions of a service/application/library at different stages of its lifecycle (development, staging, production)?
 * What does our contribution model look like?
 * Do we provide standard development environments (tooling, automation)?
 * What is our code review process? We should have a CONTRIBUTION.md template for our repos.
 * What is the versioning and release process for software artifacts?
 * What is the versioning and release process for services?

Appendix
Components that are owned / might be owned by Event Platform. WIP.