User:LarsWirzenius/NewCI

= META =

This is v3, being written, work in progress. Feedback to wikitech-l or lwirzenius@wikimedia.org. Please don't edit this wiki page directly.

TODO
Things that need changing in this page:


 * fix/write scenarios for the various use cases
 * write missing chapters (marked FIXME)
 * review/fix the security embargo use case
 * copyedit

Changes from v2

 * Added open questions (see below), based on comments to previous versions.
 * Clarified that “CI implementation” and “Transitioning to new CI system” are waiting for RelEng to experiment.
 * Clarified what production-like environments are. Combined the two sections discussing environments into one.
 * Added the ArtifactStorageManagement, BuildRecursively, and DebianPackages requirements.
 * Wrote proposal for handling embargoed security fixes.
 * Added suggestion that credentials-holding components protect against Rowhammer, Spectre, etc.
 * Added Cucumber/Gherkin-style scenarios for some use cases, outlined some more.
 * Various minor improvements to spelling, grammar, style.
 * Converted from Google Docs to a wiki page for better maintainability and software freedom.

Changes from v1
This is a summary of the changes compared to the previous version circulated for comment. It seemed easier to make a new version than edit the previous one while people were reading and commenting on it.


 * Added section on what CI/CDel/CDep are
 * Added SELFSERVE/MIGRATION requirement.
 * Renamed POSTMERGETESTS to REPOGROUPS and reworded it to explicitly consider concurrent changes to multiple repositories.
 * Added MAINTAINED/KNOWNLANG requirement.
 * Combined GERRIT and OTHERGITORTICKETING under new WORKFLOWTOOLING requirement. Renamed OTHERGITORTICKETING to PHABRICATOR.
 * Added AUTOMATEDEPLOYMENT/STOPCHANGES.
 * Added Scala to PROGLANGS.
 * Reworded CACHEDEPS requirement to specify some package repositories, and not get into details.
 * Dropped HA requirement, as the description doesn't really seem useful.
 * Added TESTINSTANCES wishlist.
 * Raised EMBARGO up from wishlist.
 * Added an optional code health stage.
 * Added section about projects that don't target WMF production.
 * Added section on not being production-like.
 * Added a section clarifying environments.
 * Many minor tweaks and improvements.
 * Added some clarification about security reviews still being needed.
 * Changed requirement ids to be CamelCase instead of ALLUPPERCASE.

Open questions

 * How should the Understandable requirement be tested? Specifically, how do we measure that our developers can use CI productively?
 * How can we protect the dependency cache (CacheDeps requirement) against poisoning with vulnerable versions?
 * How can we do security patches (Embargo requirement) securely, without breaking embargo, in an automated fashion?
 * If we can’t convert all deployments to Kubernetes containers, do we keep manual deployments?
 * Automated deployments need automated test suites of sufficient quality that they give us confidence to deploy if they pass. Alternatively, we could have automated rollbacks at the first sign of trouble. What’s the best approach for MediaWiki and its extensions?

= Introduction =

The CI WG plans to replace the current WMF CI system with one of Argo, GitLab CI, or Zuul v3. These were selected in the first phase of the CI WG.

We aim to do “continuous deployment”, not only “continuous integration” or “continuous delivery”. The goal is to deploy changes to production as often and as quickly as possible, without compromising on the safety and security of the production environment.

This document goes into more detail of how the new CI system should work, without (yet) discussing which replacement is chosen. A meta architecture if you wish.

It is assumed as of the writing of this document that future CI will build on and deploy to containers orchestrated by Kubernetes.

An important change is that, in the long run, we aim to make as many software deployments as possible to containers orchestrated by Kubernetes. There will by necessity be a long transition period when other kinds of deployments will continue to be necessary. Also, some software only needs to be built and published, rather than deployed, such as Debian packages and Docker base images.

On continuous integration, delivery, and deployment
Somewhat simplistically, continuous integration (CI) is the practice of frequently integrating changes from all developers in a project to the main line of development. Crudely, changes get merged at least once a day, provided the software still builds and its tests pass. The benefit of the practice is that merging rarely takes much effort, because changes are small and merge conflicts are rare. If something breaks, debugging is usually easy, again because the change is small. In WMF, we use Gerrit to make sure changes are not even merged unless the software builds and tests pass. Further, it is less likely that different parts of the development community spend time on features or other changes that will conflict. This encourages developers to communicate about their intentions and collaborate on significant changes before committing effort on something that’s going to require a different approach to work with other changes that are happening at the same time.

Continuous delivery builds on top of CI to deliver the software after every change. This might, for example, mean publishing release tarballs after every successful integration. Effectively, the software is released after every successful change. The benefit of this is that those who want can easily use the very latest of the software, giving fast feedback to the developers about the direction of the software.

Continuous deployment builds on top of CI to also deploy the software to production. This gives some of the same benefits as continuous delivery, but is more suited for web applications and web sites.

Vision for CI
This is Lars’s personal opinion, for now, but it’s based on discussions with various people while at WMF. It’s not expected to be new, radical, or controversial, compared to the status quo.

In the future, CI at WMF serves WMF, its developers, and the Wikipedia movement by making software development more productive, more confident, and faster. The cycle time of changes (the time from idea to running in production) is short: for a trivial change, as little as five minutes. At the same time, the safety and security of production is protected: malicious changes do not get deployed, mistakes are rare, and can easily be fixed or the problematic change reverted.

A major change here is to start doing continuous deployment instead of a weekly train.

Production here means all the software needed to run all the sites (Wikipedias in different languages, Commons, etc), as well as supporting services, including tooling and services that support development.

Overall solution approach
The overall approach to the architecture of the CI system, and the workflow supported by it, is to make all changes via version control (git), which includes code, configuration, and scripts for building and deploying. When a change is pushed to version control, CI builds and tests the change, humans review the change, and if all seems to be in order, CI deploys to production.

It should be pointed out explicitly that continuous deployment relies on an automated test suite that the movement, its developers, Release Engineering, and SRE are comfortable with. Everyone needs to trust that the tests catch problems with sufficient likelihood that if the tests pass, it is safe to deploy to production. This may take some time to achieve, but it will enable us to develop software with much greater productivity.

Regardless of how good the tests will be, we will need to have a good way to roll back any changes to a previous working version. A good test suite and rolling back will enable us to deploy boldly.

On environments
The new CI system will need to provide various environments, which are sufficiently production-like for testing or running the software. These environments will include all the components that are needed for simulating production, for the tests in question. The specific components depend on the test, and a mechanism for specifying them will be designed in detail during the CI implementation phase.

For example, an environment might include MediaWiki, MariaDB, and the microservices we have supporting wikis, all configured to work together, and independently from other environments. The various components might all run in the same container or VM, or in several, depending on what makes the most sense.

It will be possible to specify that a production-like environment has multiple wikis, to test things that require interaction between multiple wikis.

The environments for running acceptance and other integration tests will be implemented in a way that makes them indistinguishable from a newly built fresh environment.

The production-like environments will be created automatically. They will be sufficiently similar to production to allow tests to happen, in such a way that tests passing in a production-like test environment gives us reasonable confidence that the software will work in the real production environments as well.

Sometimes there’s a need to use different versions of software in test environments than in production, for example, when preparing a migration to a new version of PHP. Production will stay with the currently best version of PHP, but there’s a need for test environments with the next version of PHP so that tests and related debugging can be done using that. This will be supported.

On security and security reviews
Nothing in this document is meant to lessen how security reviews are handled. Just like with the current status quo, if a MediaWiki extension embeds (vendors) a dependency into its own source tree, the vendored version needs to be acceptable to those responsible for reviewing software running in production from a security perspective.

Stakeholders
Stakeholders in the WMF CI system include:


 * RelEng, who are responsible for keeping CI running
 * SRE, who are responsible for the infrastructure on which CI runs
 * MediaWiki developers, who develop MW and its extensions, and will (eventually) be doing MW releases for external MW deployers
 * staff and volunteer developers of anything else built, tested, and deployed by CI
 * WMF, who pays for CI using donations
 * The Wikimedia movement, who use sites and services operated by WMF
 * all of humanity, who benefit from having knowledge freely disseminated

= Requirements =

This chapter lists the requirements we have for the CI system and which we design the system to fulfill.

Each requirement is given a semi-mnemonic unique identifier, so it can be referred to easily.

The goal is to make requirements be as clear and atomic as possible, so that the implementation can be more easily evaluated against the requirement: it’s better to split a big, complicated requirement into smaller ones so they can be considered separately. Requirements can be hierarchical: The original requirement can be a parent to all its parts. Further, a way to objectively check if a requirement is met should be outlined.

Note that these requirements are not meant to constrain or dictate implementation. The requirement specifies what is needed, not how it is achieved.

These requirements were originally written up in the [https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/CI_Futures_WG/Requirements WG wiki pages] and have been changed compared to that (as of the 21 March 2019 version): [https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/CI_Futures_WG/Requirements Phase 1 report]. This document is now the maintained version of the requirements.

Scope of work
These are non-negotiable requirements that must all be fulfilled by our future CI system.


 * SelfHostable Must be hostable by the Foundation. It’s not acceptable for WMF to rely on an outside service for this, to achieve the security, reliability, and privacy required of Wikimedia.


 * FreeSoftware Must be free software / open source. “Open core” like GitLab can be good enough, as long as we only need the parts that provide software freedom. This is partly due to the SelfHostable requirement, but also because free software is a form of free knowledge, and it’s a WMF value to prefer open source.


 * GitSupport Must support git. We’re not switching version control systems for CI.
 * SelfServe Must support self-serve CI, meaning we don’t block people if they want CI for a new repo. Due to ProtectProduction, there will probably be some human approval process for new projects, but as much as possible, people should be allowed to do their work without having to ask permission.


 * SelfServePipeline Should allow the developers to define or declare at least parts of the pipeline jobs in the repository: what commands to run for building, testing, etc.
 * Understandable Must be understandable without too much effort to our developers so that they can use CI/CD productively. Acceptance criteria: our developers can use CI productively, after being given a short, 1-page tutorial on how to specify build instructions for their software.
 * Migration There needs to be a migration strategy to move existing repositories to the new CI.


 * RepoGroups All automated tests must pass both before and after merging a change, for each repository separately and for all the repositories together that get deployed together to production. For example, tests for each extension must pass, but also for MediaWiki with all the extensions that we deploy to production, and any supporting services. This can be achieved by various means (run tests twice, or make sure the target branch doesn’t change in between). The justification is that there can be changes to another repository while tests for one repository run, and those changes may break things for this repository.
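
One of the means mentioned for RepoGroups is making sure the target branches don't change between testing and merging. A minimal sketch of that check follows, assuming CI records the head commit of every repository in the group when tests start; the function name and the data layout are hypothetical, not part of any selected CI system.

```python
def stale_repos(tested_against, current_heads):
    """Return the repositories whose head moved since the tests ran.

    tested_against: dict of repository name -> commit id used during testing
    current_heads:  dict of repository name -> commit id right now
    An empty result means the tested combination is still current, so the
    merge can go ahead; otherwise the tests need to be rerun.
    """
    return sorted(
        repo
        for repo, commit in tested_against.items()
        if current_heads.get(repo) != commit
    )
```

If the returned list is non-empty, the alternative named in the requirement (running the tests a second time after merging) still applies.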

Requirements
These are not absolute requirements, and can be negotiated, but only to a minor degree.


 * Rollback: If a change is deployed to production and turns out to cause problems, it is easy and quick to revert the change.
 * Fast Must be fast enough that it isn’t perceived as a bottleneck by developers. We will need a metric for this.


 * ShortCycleTime Must enable us to have a short cycle time (from idea to running in production). CI is not the only thing that affects this, but it is an important factor. We probably need a metric for this.


 * Transparent Must make its status and what-is-going-on visible so that its operation can be monitored and so that our developers can check the status of their builds themselves. The overall status of CI must also be visible so that, for example, developers can see if their build is blocked by waiting on others.
 * Feedback Must provide feedback to the developers as early as possible for the various stages of a build, especially the early stages (“got the source from git”, “building”, “running unit tests”, etc.). The goal is to give feedback as soon as possible, especially in the case of the build failing.


 * Feedback2 Must support providing feedback via Gerrit, IRC, and email, at the very least. These are our current main feedback channels. We don’t want to replace these other tools at this time.


 * Secure Must be secure enough that we can open it to community developers to use without too much supervision.
 * ProtectProduction Must protect production by detecting problems before they’re deployed, or revert troublesome changes after deployment, and must in general support a sensible CI/CD pipeline. This is necessary for the safety and security of our production systems, a higher speed of development, and higher productivity. The protection brings developer confidence, which tends to bring even more speed and productivity.
 * EnforceTests Must allow Release Engineering team to enforce tests on top of those specified by system consumers, to allow us to set minimal technical standards. The repository owner mustn’t be able to disable or skip the enforced tests.
 * MaxBuildtime Should have build timeouts so that a build may fail if it takes too long. Among other reasons, this is useful to automatically work around builds that get “stuck” indefinitely.
 * Isolation Builds must be done in isolated environments or sandboxes. Jobs should not be able to affect the environment of other jobs.
 * Maintained Must be maintained and supported upstream, and we should expect that to be the case for at least the next five years. The CI system should not require substantial development from the Foundation. Some customization is expected to be necessary.
 * Relationship We should have a good relationship with the CI system’s upstream.
 * KnownLang It would be preferable for the CI system to be implemented in a language known by Foundation staff to make customizations, debugging, fixing, etc, easier.
 * Metrics Must enable us to instrument it to get metrics for CI use and effectiveness as we need. Things like cycle times, build times, build failures, etc. FIXME: Start collecting a list of metrics we must have, want to have, or would like to have.
 * WorkflowTooling CI system must not tie us to specific tooling for workflow. We must have the freedom to change tooling in the future without it being a major project.
 * Gerrit Must work with Gerrit as that’s what we have now for change review and will continue to use, unless we want to change at a later date.
 * Phabricator Must work with Phabricator for ticketing, as that’s what we have and will continue to use, unless we want to change at a later date.
 * Promotion Must promote (copy) Docker images and other build artifacts from “testing” to “staging” to “production”, rather than rebuilding them, since rebuilding takes time and can fail or produce a different result. Once a binary, Docker image, or other build artifact has been built, exactly that artifact should be tested, and eventually deployed to production.
 * LocalTests Must allow a developer to replicate locally the tests that CI runs. This is necessary to allow lower friction in development, as well as to aid debugging. For example, if CI builds and tests using a Docker container, a developer should be able to download the same image and run the tests locally.
 * AutomatedDeployment Must allow deployment to be fully automated.
 * StopChanges The developers must have a way to stop changes from being merged into a repository, and therefore from being deployed. This is needed, for example, if there’s a problem in the way deployments work, and new changes should wait until the deployment is fixed.


 * AutomatedSelfDeployment The CI system itself must be automatically deployable by us or SRE, onto a fresh server.
 * DeployCIBuilt Only build artifacts built and tested by CI may be deployed to production by CI itself.


 * Scalable Must be scalable to the projects and changes we work with. This includes number of projects, number of changes, and kind of changes.
 * HScalable Must be horizontally scalable: we need to be able to add more hardware easily to get more capacity. This is particularly important for build workers, which are the most likely bottleneck, and probably also for environments used for testing.
 * ProgLangs Must be able to support all programming languages we currently support or are likely to support in the future. These include, at least, shell, Python, Ruby, JavaScript, Java, Scala, PHP, Puppet and Go. Some languages may be needed in several versions.
 * OutputLinks Must support HTTP linking to build results for easier reference and discussion. This way a build log, or a build artifact, can be referenced using a simple HTTP (or HTTPS) link in discussions.
 * ArtifactArchive Should allow archiving build logs, executables, Docker images, and other build artifacts for a long period.
 * ArtifactStorageManagement The artifact store should delete old artifacts in order to avoid filling all storage.


 * Retention The retention period should be configurable based on artifact type, and whether the build ended up being deployed to production.


 * ConfigVC Must keep configuration in version control. This is needed so that we can track changes over time.
 * Gating Must support gating / pre-merge testing. FIXME: This needs to be explained.
 * PeriodicBuilds Must support periodic / scheduled testing. This is needed so that we can test that changes to the environment haven’t broken anything. An example would be changes to Debian, upon which we base our container images.
 * CIMerges Must support tooling to do the merging, instead of developers. We don’t want developers merging by hand and pushing the merges. CI should test changes and merge only if the tests pass, so that the branches for main lines of development are always releasable.
 * TestVC Must support storing tests in version control. This is probably best achieved by having tests be stored in the same git repository where the code is.
 * BuildDeps The repository must declare its build dependencies, and CI must make sure they are provided in the build environment.
 * BuildRecursively After a repository (or project) has been built successfully, build and test anything that depends on it. For example, if MediaWiki core uses a library to generate cat videos randomly, build MW after the library has built successfully.
 * TestServices Must support services for tests; e.g., some PHPUnit tests require MySQL. These are most important for integration tests. Proper unit tests do not depend on any external stuff. However, integration tests may well need MediaWiki, some specific extensions, and backing services, such as databases, “oid” services, and possibly more. CI needs to be able to provide such environments for testing as defined by developers.
 * CacheDeps Must support dependency caching of popular package repositories: Debian, npm, pypi, and probably more. This is needed for speed, if nothing else. However, this needs to be done securely.
 * GoodOps: The CI system needs to be maintained in a proper way, with monitoring and with multiple people who can be notified of problems and can fix things.


 * Monitoring: The CI system needs to have monitoring to automatically alert of problems.
 * OnCall: There must be multiple, named people who can be alerted to deal with problems in CI that users can’t deal with themselves. These people need to cover enough time zones to be available at almost all times, and have enough overlap to cover for each other when they’re not available due to illness, vacation, travel, or other such reasons.
 * SLA: RelEng provides a service-level agreement, for how well CI serves WMF and the movement.


 * CleanState Builds and tests must run in a clean, well-defined environment.
 * CleanBuildEnv Must provide a build environment with only explicitly declared build dependencies installed.
 * CleanWorkspace Must provide a clean workspace for each build or test run - either a clean VM or container. The workspace should have the source code of the project to be built, with the right commit checked out from version control, and anything else explicitly declared, but nothing else.
 * Secrets Must support secure storage of credentials / secrets.
 * AllTests When merging a change, all tests for all components must pass. For example, when changing one MediaWiki extension, which is depended on by another extension, the change should be rejected if either extension’s tests fail.
 * Embargo Would be nice to be able to run for secret/security patches. This means CI should be able to build and deploy changes that can’t be made public yet, for security embargo reasons.
 * DebianPackages Support building and publishing Debian packages (.deb) for software.
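
The ArtifactStorageManagement and Retention requirements above could be met by a purge job along the following lines. This is only a sketch: the artifact types, the retention periods, and all names are made-up assumptions for illustration, and a real artifact store would supply its own metadata.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical retention periods, keyed by (artifact type, was it deployed?).
# The numbers are placeholders, not a proposal.
RETENTION = {
    ("log", False): timedelta(days=30),
    ("log", True): timedelta(days=365),
    ("docker-image", False): timedelta(days=14),
    ("docker-image", True): timedelta(days=180),
}
DEFAULT_RETENTION = timedelta(days=30)


@dataclass
class Artifact:
    name: str
    kind: str          # artifact type, e.g. "log" or "docker-image"
    created: datetime
    deployed: bool     # did the build end up in production?


def is_expired(artifact, now):
    """True if the artifact is older than its configured retention period."""
    keep_for = RETENTION.get((artifact.kind, artifact.deployed), DEFAULT_RETENTION)
    return now - artifact.created > keep_for


def purge_candidates(artifacts, now):
    """Names of the artifacts that the store may delete."""
    return [a.name for a in artifacts if is_expired(a, now)]
```

Keeping the policy in a plain table like this would also let it live in version control, in line with ConfigVC.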

Wishlist items
These requirements are easily negotiated and can be left unfulfilled if hard to achieve.


 * LiveLog Should have live console output of build so it can be viewed while the build is running, without having to wait until the build is finished.
 * RateLimit Should have rate limiting - one user/project can not take over most/all resources.
 * CheckSig Should support validation and creation of GPG/PGP-signed git commits as well as artifacts and generated containers.


 * LimitBoilerplate Would be nice for test abstractions to limit boiler-plate, i.e., all of our services are tested roughly the same way without having to copy instructions to every repository.
 * PrioritizeJobs Would be nice to prioritize jobs.


 * Use case: if there is a queue of jobs, there should be some mechanism of jumping that queue for jobs that have a higher priority.
 * We currently have a Gating queue that is a higher priority than periodic jobs that calculate Code Coverage.
 * We could run jobs that have historically been fast in a separate queue.


 * PostMergeBisect Would be nice to post-merge git-bisect to find the patch that caused a particular problem with, say, a Selenium test.
 * DeployWherever Would be nice to have a mechanism for deployment to staging, production, toollabs. We could do with a way to deploy to any of several possible environments, for various use cases, such as bug reproduction, manual exploratory testing, capacity testing, and production.
 * TestInstances Should provide test instances for proposed changes. If a developer pushes change 12765 to Gerrit, CI should build something like https://12765.test.wikimedia.org with the change running in a sufficiently production-like environment that the change can be tested “for real” by testers and volunteering users. This might only happen for changes marked specially.
 * Mobile Would be nice to support building and testing mobile applications (at minimum for iOS and Android).
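
The PrioritizeJobs use case above (gating jobs jumping ahead of periodic ones) can be illustrated with a small priority queue. This is a sketch under assumptions: the job kinds and their ordering are hypothetical, and none of the candidate CI systems is claimed to work this way.

```python
import heapq
import itertools

# Hypothetical priority levels; a lower number runs first.
PRIORITY = {"gating": 0, "fast": 1, "periodic": 2}


class JobQueue:
    """Queue in which higher-priority jobs jump ahead of lower-priority ones."""

    def __init__(self):
        self._heap = []
        # The counter breaks ties, keeping first-in-first-out order
        # among jobs of the same priority.
        self._counter = itertools.count()

    def submit(self, name, kind):
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._counter), name))

    def next_job(self):
        """Remove and return the name of the next job to run."""
        return heapq.heappop(self._heap)[2]
```

For example, a coverage job submitted as "periodic" before two "gating" jobs is handed out only after both gating jobs.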

= The CI pipeline =

FIXME: insert pipeline image of the default pipeline

CI will provide a default pipeline for all projects. Projects may use that or specify another one. Each project will be able to pick the stages it wants in its pipeline, though this may be overridden by Release Engineering or SRE to protect the sites (ProtectProduction requirement).

The pipeline will be divided into several stages. Mandatory stages for all changes and all projects are the commit and acceptance test stages. Other stages may be added to specific changes or projects as needed. Most projects will need a stage either to publish releases or to deploy to production.

The goal is that if the commit and acceptance test stages pass, the change is a candidate that can be deployed to production, unless the project is such that it needs (say) manual testing or other human decision for the production deployment decision. Likewise, if the component or the change is particularly security or performance sensitive, stages for checking those aspects may be required. CI will have ways of indicating the required stages per component, and also per change. (FIXME: It is unclear how this will be managed.)

If the commit or acceptance stage fails, there is not a production candidate. The pipeline as a whole fails. Any artifacts built by the pipeline will not be deployable to production, but they may be deployable to test environments, or downloaded by developers for inspection.
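
The stage rules described above can be sketched as follows. The stage names "commit" and "acceptance" come from this document; the function, its arguments, and the choice to treat every configured stage as required for candidacy are assumptions made for illustration.

```python
MANDATORY_STAGES = ("commit", "acceptance")


def run_pipeline(stages, runners):
    """Run the stages in order and stop at the first failure.

    stages:  list of stage names, in execution order
    runners: dict of stage name -> callable returning True on success
    """
    missing = [s for s in MANDATORY_STAGES if s not in stages]
    if missing:
        raise ValueError(f"missing mandatory stages: {missing}")

    results = {}
    for stage in stages:
        results[stage] = bool(runners[stage]())
        if not results[stage]:
            break  # the pipeline as a whole fails

    # Assumption: the change is a production candidate only if every
    # configured stage ran and passed.
    candidate = all(results.get(stage, False) for stage in stages)
    return {"results": results, "production_candidate": candidate}
```

Artifacts from a failed run would stay in the artifact store for inspection or test deployments; only the production_candidate flag decides whether they may go to production.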

The commit stage
The commit stage builds any deployable artifacts, such as executable binaries, minimized JavaScript, translation files, or Docker images. It is important that artifacts don’t get rebuilt by later stages, because rebuilding does not always result in bitwise identical output, and any change in the artifacts may invalidate any testing that has been done. The goal is to build once, test the artifacts, and deploy the tested artifacts, rather than rebuilding and possibly deploying something different from what was tested.
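
One way to make "build once, deploy what was tested" checkable is to identify each artifact by a content digest and let later stages refer only to that digest. The sketch below uses an in-memory stand-in for the artifact store; the class and method names are hypothetical.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content address of an artifact: its SHA-256 hex digest."""
    return hashlib.sha256(data).hexdigest()


class ArtifactStore:
    """In-memory stand-in for an artifact store (illustration only)."""

    def __init__(self):
        self._blobs = {}
        self._tested = set()

    def publish(self, data: bytes) -> str:
        """Called once by the commit stage; returns the artifact's digest."""
        d = digest(data)
        self._blobs[d] = data
        return d

    def mark_tested(self, d: str) -> None:
        """Called when the acceptance tests passed for exactly this artifact."""
        if d not in self._blobs:
            raise KeyError(d)
        self._tested.add(d)

    def is_deployable(self, d: str) -> bool:
        return d in self._tested

    def fetch_for_deploy(self, d: str) -> bytes:
        """Return the tested bytes, refusing anything that was not tested."""
        if not self.is_deployable(d):
            raise PermissionError("artifact has not passed testing")
        return self._blobs[d]
```

A rebuild that differs by even one byte gets a different digest, so it cannot inherit the test results of the original artifact.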

The commit stage also runs lint and unit tests, and any other tests that can be run in isolation from other parts of the system, and that are also quick. The commit stage does not have access to backing services, such as databases or other components of the overall system. For example, when the pipeline processes a change to a MediaWiki extension, the commit stage doesn’t have access to a running MediaWiki core or the MariaDB MediaWiki uses. Integration or system tests should be done in the acceptance test stage.

The commit stage also runs code health checks. (Unless they are too slow.)

The commands to build (compile) or run automated tests are stored in the repository, either explicitly, or by indicating the type of build needed. There might, for example, be a .pipeline/config.yaml file in the repository, which specifies that make is the command that builds the artifacts. Otherwise, the file may specify that it’s a Go project, and CI would know how to build a Go project. In this case RelEng can change the commands to build a Go project by changing CI only, without having to change each git repository with a Go program.
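
The two styles described above, explicit commands or a declared project type, could be resolved roughly as shown below. The keys "commands" and "type", and the default command lists, are assumptions for illustration; the actual format of .pipeline/config.yaml has not been decided here.

```python
# Hypothetical defaults maintained centrally by RelEng, keyed by project type.
DEFAULT_BUILD_COMMANDS = {
    "go": ["go build ./...", "go test ./..."],
    "make": ["make"],
}


def resolve_build_commands(config):
    """Pick the build commands for a repository.

    config: dict as it might be parsed from the repository's pipeline
    configuration file. Explicit commands win; otherwise the declared
    project type selects the centrally maintained defaults.
    """
    if "commands" in config:
        return list(config["commands"])
    project_type = config.get("type")
    if project_type in DEFAULT_BUILD_COMMANDS:
        return list(DEFAULT_BUILD_COMMANDS[project_type])
    raise ValueError("config must give either 'commands' or a known 'type'")
```

With the declarative style, changing the entry for "go" in the central table changes how every Go project is built, without touching any repository.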

Only the declarative style will be possible for building Docker images (e.g., .pipeline/blubber.yaml), as we want control over how that is done (Secure requirement).

CI may enforce specific additional commands to run, to build or test further things; this can be used by RelEng to enforce certain things. For example, we may enforce code health checks, or enable (or disable) debug symbols in all builds. Such enforcement will be done in collaboration with our developers.

Any build dependencies needed during the commit stage must be specified explicitly. For example, the minimum required version of Go that should be installed in the build environment would be a build dependency. If a project build-depends on another project, it needs to specify which project, and which artifacts it needs installed from the other project. Explicit build-dependencies are more work, but result in fewer problems due to broken heuristics.
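
Explicit build-dependencies also make it straightforward to compute a build order. A minimal sketch using Python's standard graphlib module (Python 3.9 or later) follows; the project names are invented for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical declarations: project -> projects it build-depends on.
BUILD_DEPS = {
    "cat-video-lib": set(),
    "mediawiki-core": {"cat-video-lib"},
    "some-extension": {"mediawiki-core"},
}


def build_order(deps):
    """Return the projects in an order where dependencies are built first.

    Raises graphlib.CycleError if the declared dependencies form a cycle.
    """
    return list(TopologicalSorter(deps).static_order())
```

Here build_order(BUILD_DEPS) yields the library first, then core, then the extension. The same graph, read in the other direction, tells CI what to rebuild after a project has been built successfully.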

A code health stage
If the commit stage is too restrictive for running some code health tools, they can be run in a separate stage instead, possibly one that runs in parallel with other stages, or they can be moved to the acceptance test stage. We’ll figure it out during implementation when we see what’s needed.

An integration stage
We may want to optionally have an integration stage. This would be like the acceptance stage, but run tests that are more developer oriented than acceptance tests are. For example, some of the browser-based (Selenium) tests may fit here. But if they fit well into a project’s acceptance stage, they can go there.

The acceptance test stage
During the acceptance test stage CI deploys artifacts built in the commit stage to an isolated production-like system that has the same versions of all software as production, except for the changes being processed by the pipeline. CI will then run automated acceptance tests, and other integration and system tests, against the deployed software. The test environment is clean and empty, and well-known, unless and until the test suite inserts data or makes changes.

The deployment stage
If prior stages have passed successfully, and manual code review (“Gerrit CR:+2 vote”) has approved the change, this stage deploys the change to production.

Manual tests
Testers may instruct CI to deploy any recently built set of artifacts to a dedicated test environment, and can use the software in that environment where it is isolated from others, and won’t suddenly change underneath them. The details of how this will be implemented are to be determined later.

This feature of the CI can also be used to demonstrate upcoming features that are not yet ready to be deployed to or enabled in production.

(“Recent” means any artifacts not yet purged from the artifact store. See ArtifactArchive requirement.)

Capacity tests, non-functional requirements
Capacity tests, and other tests for non-functional requirements, will also be done in dedicated, isolated production-like environments. RelEng will work with the performance team to sort out the details.

On repositories, components, projects not targeting WMF production
We can support projects, components, and repositories that don’t target WMF production. An example is the MediaWiki tarball releases. For such projects, we can have the commit stage build the tarball (or whatever is released for the project), and acceptance or other stages test that. We can even deploy the projects to test environments to test the code in a production-like environment. We can test things with many versions of PHP, many Linux distributions, and any other variant we want to support. However, most of this document aims at improving things in WMF production, so other things are given less attention, at least for now.

= CI Architecture =

The WMF development ecosystem
FIXME: Insert pic of the WMF development ecosystem

The figure above is simplistic, but gives a general idea of what happens when a developer has finished with a change:


 * 1) developer pushes a change to Gerrit, which triggers CI
 * 2) CI builds and tests change (commit stage)
 * 3) CI deploys to a test environment, runs tests against that (acceptance test stage); if everything is OK, Gerrit is notified and requests code reviews from relevant parties
 * 4) testers can request CI to deploy the change to an environment dedicated for manual testing
 * 5) after a successful code review, CI merges changes to the branch, runs all automated tests again, and deploys to the production environment

The commit and acceptance test stages are triggered as soon as a developer pushes changes to be reviewed. Human reviews won’t be requested automatically until the two stages pass, as there’s no point in spending human attention on things that are not going to be candidates for deployment to production. Reviews for changes marked work-in-progress won’t be requested automatically. Developers may request reviews of work-in-progress changes when they want.

Other stages may run in parallel with code review, and if they fail they may nullify the release candidacy of the change. For example, stages for manual and capacity testing, and security test/review; depending on the change and the component in question, some or all of these may be necessary.
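The rule for when reviews are requested automatically can be sketched in a few lines of Python. This is only an illustration of the prose above, not a design: the `Change` class and its field names are hypothetical, and it assumes that an explicit request from the developer overrides the automatic gating.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """Hypothetical summary of a change's state as CI sees it."""
    commit_stage_passed: bool
    acceptance_stage_passed: bool
    work_in_progress: bool = False
    review_requested_by_developer: bool = False


def should_request_review(change: Change) -> bool:
    """Decide whether reviewers should be notified about this change."""
    # Assumption: a developer can always ask for a review explicitly.
    if change.review_requested_by_developer:
        return True
    # Work-in-progress changes never trigger automatic review requests.
    if change.work_in_progress:
        return False
    # Otherwise both pre-merge stages must have passed.
    return change.commit_stage_passed and change.acceptance_stage_passed
```

For example, `should_request_review(Change(True, False))` is false, because the acceptance test stage has not passed and human attention would be wasted on the change.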

CI internal components
FIXME: Insert pic of CI internal components

Note: this is speculative for now, until we actually implement at least one of the candidates.

The CI system consists, at least conceptually, of several internal components, depicted in the architecture diagram above. The actual implementation may end up using different components, but the conceptual roles will probably still be there.


 * CI interacts with Gerrit and production externally. It also interacts with test environments, but those are ephemeral. Gerrit is the code review and git server, and produces an event stream to which CI reacts.
 * The controller listens to the Gerrit event stream and commands the various workers to do things as appropriate. It is the conductor of the symphonic CI orchestra.
 * The VCS worker interacts with Gerrit in its role as a git server. The VCS worker is the only CI component which has credentials for accessing Gerrit. The VCS worker retrieves source code via git, does merge and other git operations, pushes changes back to Gerrit (when it’s OK to do that), and publishes snapshots of the working tree to the artifact store. Build workers get their source code from the artifact store, not from Gerrit directly. This avoids having to give build workers credentials for, or network access to, Gerrit, and having to push speculative changes to Gerrit.
 * The artifact store stores blobs of things that CI needs or produces: git working trees, built artifacts, etc.
 * The build worker runs builds and tests. It gets the source code from the artifact store and publishes built artifacts back to the artifact store. There can be any number of build workers running in VMs or containers. The build worker sets up (or installs) build dependencies in the build environment, before it starts the actual build, based on information provided by the repository being built. Depending on the way the build is configured, the actual build may prevent network access.
 * The deployment worker deploys built artifacts to an environment, such as a test environment or the production environment.
 * The log store stores build logs, what Jenkins calls “console output”: the stdout and stderr of the command run to do a build, and to run tests. Also the commands to deploy, and to interact with Gerrit.
 * A test environment is an isolated environment that is sufficiently like production for running tests. The tests may be automated or manual. There can be any number of test environments, and they might be ephemeral, if that turns out to be useful.
 * The production environment is what the world sees. It’s the actual Wikipedia in all the various languages, and all the other sites we run.
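The controller's role can be illustrated with a minimal Python sketch that maps one event from the Gerrit event stream to commands for the workers. The command names (`vcs-worker:snapshot` and so on) are hypothetical labels for the conceptual components above, not a real API. The sketch assumes events arrive as one JSON object per line, with a `type` field such as `patchset-created` or `comment-added`, and that votes appear in an `approvals` list, which is how Gerrit's stream-events output is usually shaped.

```python
import json
from typing import List


def plan_actions(event_line: str) -> List[str]:
    """Map one Gerrit stream event (a JSON line) to worker commands."""
    event = json.loads(event_line)
    kind = event.get("type")
    if kind == "patchset-created":
        # New or updated change: snapshot it, then build, deploy, and test it.
        return [
            "vcs-worker:snapshot",
            "build-worker:commit-stage",
            "deployment-worker:test-environment",
            "build-worker:acceptance-stage",
        ]
    if kind == "comment-added":
        approved = any(
            a.get("type") == "Code-Review" and str(a.get("value")) == "2"
            for a in event.get("approvals", [])
        )
        if approved:
            # Approved change: merge, re-run the tests, deploy to production.
            return [
                "vcs-worker:merge",
                "build-worker:commit-stage",
                "build-worker:acceptance-stage",
                "deployment-worker:production",
            ]
    # Anything else is ignored by this sketch.
    return []
```

Only the VCS worker and the deployment worker appear in commands that need credentials, which matches the split of responsibilities in the list above.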

The components that have security-sensitive credentials (VCS worker, deployment worker) should probably run on separate physical hardware from the rest, to avoid exposing the credentials to Rowhammer, Spectre, and other hardware-level security issues. Alternatively, the software implementing the components might store the credentials encrypted in RAM with a random key, and only decrypt them while a credential is being used.

= CI use cases =

This chapter contains some scenarios that are intended as (sketches for) acceptance tests for the CI system. To achieve the necessary precision for unambiguity and clarity, they describe what is supposed to happen using a format inspired by Cucumber/Gherkin-style tests.

Some services are simple (one service, running one application, which comes from one git repository). Others are complex (one service, running code from multiple repositories; one service, running two applications as determined by config; three services which interact with each other; …). We need to consider each case.

One change to a simple service
This section describes what is meant to be the simplest possible case for CI/CD: a change to a project, which runs a single application (and supporting operating system), built from a single git repository. An example might be the Blubberoid service built out of the blubber.git repository.

It all starts with a change being pushed to Gerrit, which triggers a build and test of the change. This happens in a "pre-merge" pipeline, which consists of the commit and acceptance test stages. For simplicity, this scenario assumes there are no other changes happening at the same time.

 given some software in hello.git, with branch master resolving to commit REF0
 and a service foo consisting of hello.git
 and a configuration that builds and deploys a new version of foo from hello.git
 when a developer pushes a change to hello.git, giving commit REF and change CHANGE
 then Gerrit triggers the CI pre-merge pipeline for service foo for change CHANGE
 and the commit stage finishes successfully for CHANGE
 and a Docker image IMG is built for service foo from CHANGE
 and the Docker image IMG contains software from hello.git, commit REF
 and the Docker image IMG is deployed to a test environment ENV
 and the acceptance test stage finishes successfully against ENV
 and the Docker image IMG is NOT deployed to production
 and the pre-merge pipeline for CHANGE finishes successfully
 and the change is marked Verified+2 on Gerrit
 and a code reviewer is notified of the change CHANGE

At this point, the change passes all the automated checks and a human code reviewer needs to approve the change by voting Code-Review+2 in Gerrit.

 when a code reviewer votes CR+2 in Gerrit for the change CHANGE
 then the commit and acceptance and Docker image build stages get re-run
 then the commit REF is merged into hello.git master
 and CI runs the post-merge pipeline for CHANGE
 and the post-merge pipeline for CHANGE finishes successfully
 and only the deployment stage to production was run
 and the Docker image IMG was deployed to production
 and production runs version REF of hello in the foo service
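The behaviour the scenario expects from a pipeline, namely that stages run in order and a failed stage stops everything after it, can be sketched as follows. This is an illustrative sketch, not the implementation: `run_pipeline` and the stage names are hypothetical, and each stage is modelled as a function that returns true on success.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first one that fails.

    Returns whether the pipeline succeeded, and the names of the
    stages that were actually started.
    """
    started: List[str] = []
    for name, action in stages:
        started.append(name)
        if not action():
            return False, started
    return True, started


# Hypothetical stage names for the two pipelines in the scenario.
PRE_MERGE = ["commit", "build-image", "deploy-to-test-environment", "acceptance"]
POST_MERGE = ["deploy-to-production"]
```

Keeping the production deployment out of `PRE_MERGE` is what makes the step "the Docker image IMG is NOT deployed to production" hold by construction.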

Two changes to a simple service
NOTE: This section needs updating.

This is otherwise similar to the simplest change scenario, but two developers push changes at almost the same time. Note that this is still one service, with software from one repository. The goal here is that both changes are built and tested individually. Both get merged after CR+2 voting, but the build+test stages get re-run. If those stages fail, the change fails, and it is up to the developer to fix the problem and push a new change revision (patch set) to Gerrit.

 given some software in hello.git, with branch master resolving to commit REF0
 and a service foo consisting of hello.git
 when one developer pushes a change to hello.git, giving commit REF1 and change CHANGE1
 and another developer pushes a change to hello.git, giving commit REF2 and change CHANGE2
 then Gerrit triggers the CI pre-merge pipeline for service foo with CHANGE1
 and Gerrit triggers the CI pre-merge pipeline for service foo with CHANGE2

The two pipeline runs are run to completion sequentially. They are independent of each other.

 then the commit stage finishes successfully for CHANGE1
 and the commit stage finishes successfully for CHANGE2
 and a Docker image IMG1 is built for service foo from CHANGE1
 and a Docker image IMG2 is built for service foo from CHANGE2
 and the Docker image IMG1 contains software from hello.git commit REF1
 and the Docker image IMG2 contains software from hello.git commit REF2
 and the Docker image IMG1 is deployed to test environment ENV1
 and the Docker image IMG2 is deployed to test environment ENV2
 and the acceptance test stage finishes successfully against ENV1
 and the acceptance test stage finishes successfully against ENV2
 and the Docker image IMG1 is NOT deployed to production
 and the Docker image IMG2 is NOT deployed to production
 and the pre-merge pipeline for CHANGE1 finishes successfully
 and the pre-merge pipeline for CHANGE2 finishes successfully
 and code reviewers are notified of change CHANGE1
 and code reviewers are notified of change CHANGE2

Code reviewers have been notified of both changes. At this point we have two changes that both pass tests. They might conflict, but this will be noticed after they get re-built and re-tested after a merge.

 when a code reviewer votes CR+2 in Gerrit for change CHANGE1
 then the commit REF1 is merged into hello.git master
 and CI re-runs the build and test stages for CHANGE1
 and CI runs the post-merge pipeline for CHANGE1
 and the post-merge pipeline for CHANGE1 finishes successfully
 and only the deployment stage to production was run
 and the Docker image IMG1 was deployed to production
 and production runs version REF1 of hello in the foo service

The second change can now be reviewed at leisure.

 when a code reviewer votes CR+2 in Gerrit for change CHANGE2
 then the commit REF2b is merged into hello.git master
 and CI runs the post-merge pipeline for CHANGE2
 and the post-merge pipeline for CHANGE2 finishes successfully
 and only the deployment stage to production was run
 and the Docker image IMG2b was deployed to production
 and production runs version REF2b of hello in the foo service

At this point, both changes have been tested and reviewed and deployed.

Note that changes are not tested together, the way Zuul does. This may mean some changes will have to be built and tested up to two times (once before code review, once after CR+2), but this can also happen in grouped changes in Zuul, if they don't all work together. Further, this approach seems simpler and easier to understand than the Zuul grouping one.

Conflicting changes to a simple service
Simple service. This needs to check that a CR+2'd change that causes a merge conflict during rebase is treated as undeployable.

Change to one component of a complex service
Complex here means that the source code for a service comes from multiple git repositories. The acceptance criterion is that a change to either repository of a service that consists of two repositories causes a build, test, and deployment.

Concurrent changes to two components of a complex service
This should be the same as for concurrent changes to a simple service, except it needs to handle the case of refs changing in several repos at the same time.

Conflicting changes to a complex service
Similar to conflicting changes to a simple service. For changes to conflict syntactically, they need to be made to the same repository.

Two changes, second depends on first
Simple or complex service. Two changes, C1 and C2, such that C2 requires C1, but doesn't include it. C2 can't go through the pipeline until C1 is merged.
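The gating rule for dependent changes is small enough to state as code. In this hedged sketch, `can_enter_pipeline` is a hypothetical helper, `depends_on` maps each change to the changes it requires, and `merged` is the set of changes already merged; how CI would actually learn about dependencies is not decided here.

```python
from typing import Dict, List, Set


def can_enter_pipeline(change: str, depends_on: Dict[str, List[str]], merged: Set[str]) -> bool:
    """A change may go through the pipeline only when all the changes
    it requires have already been merged."""
    return all(dep in merged for dep in depends_on.get(change, []))


depends_on = {"C2": ["C1"]}
print(can_enter_pipeline("C2", depends_on, merged=set()))    # C1 not merged yet
print(can_enter_pipeline("C2", depends_on, merged={"C1"}))   # C1 merged
```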

Two changes, which depend on each other
Simple or complex service. Possibly multiple services. Two changes, C1 and C2, such that they must either both be merged (and deployed) at the same time, or neither.

I'm not sure we should support this. It complicates things, and seems to me like bad software development practice.

Security embargoed change
NOTE: There is no good solution described here yet. Further thought is needed.

A general goal of the future CI system is to make all changes go via git, so that they can be tracked. At the same time, we must not break an embargo on security fixes. The embargo means the fixes probably need to be in private git repositories, unless we can in the future have secure private branches, which Gerrit currently doesn't provide in a way that we trust. The embargo also means we can't publish a Docker image with the fixes applied, since the PHP, JS, and other code is in cleartext in the Docker image. Alternative approaches that fulfill these requirements are listed here for discussion.


 * We could not publish Docker images at all. This makes it harder for our developers to develop, test, and debug issues locally. Probably not acceptable.


 * We could only publish Docker images when they don't have embargoed security fixes. This would still mean local development etc is hampered, but also leaks the information that there is an embargoed fix in the pipeline. Probably not acceptable.


 * We could build, test, and publish a Docker image from public sources, and build, test, and deploy to production a second one, with the security changes applied. The fundamental problem here is that the published Docker image, which our developers would use for local development, is different from what actually runs in production. However, this would be a temporary problem that only lasts as long as the embargo. This is effectively what we currently have: security patches are applied by the train conductor during deployment, on top of the code that is in public git repositories.

If we go for the last option, someone with access to security patches would need to be responsible for dealing with problems found in the second Docker image, but not in the first one. Basically, the process would be something like:


 * build public Docker image
 * deploy public image to a test environment
 * run acceptance tests against that test environment
 * build private Docker image by applying security patches
 * deploy private image to a second test environment
 * run acceptance tests against second test environment
 * deploy private image to production

If anything goes wrong with the building, testing, or deploying of the second image, it needs to be debugged by someone with access to the security patches. Ideally, this would be the person or people who wrote the security patch in the first place, possibly backed by the release engineering team.
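The seven-step process above, together with the rule about who debugs a failure, can be sketched like this. It is an illustration only: `run_embargo_process` and the step names are hypothetical, and `step_ok` stands in for whatever actually builds, deploys, and tests an image.

```python
from typing import Callable, Optional, Tuple

# The seven steps from the list above, as (image, step) pairs.
STEPS = [
    ("public", "build"),
    ("public", "deploy-to-test-environment"),
    ("public", "acceptance-tests"),
    ("private", "build"),
    ("private", "deploy-to-test-environment"),
    ("private", "acceptance-tests"),
    ("private", "deploy-to-production"),
]


def run_embargo_process(
    step_ok: Callable[[str, str], bool],
) -> Optional[Tuple[str, str, bool]]:
    """Run the steps in order.

    Returns None if everything succeeded. Otherwise returns the failing
    (image, step) and whether debugging it needs access to the
    embargoed security patches.
    """
    for image, step in STEPS:
        if not step_ok(image, step):
            return image, step, image == "private"
    return None
```

A failure in any step for the private image is routed to someone with access to the security patches; a failure for the public image can be debugged by anyone.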

Log storage
FIXME: This needs to be written fully.


 * We want to capture the build log or “console output” (stdout, stderr) of the build and store it. This is an invaluable tool for developers to understand what happens in a build, and especially why it failed.
 * Ideally, the build log is formatted in a way that’s easy for humans to read.
 * It’d also be nice if the build log can be easily processed programmatically, to extract information from it automatically. More than nice, having log files that programs can automatically extract information from would be so good it’s almost a requirement. But it may be difficult to achieve for some programs and build systems.
 * We may want to store build logs for extended periods of time so that we can analyze them later.
 * FIXME: We should have a list of metrics we want to gather from CI logs.
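As an illustration of why machine-readable logs matter, here is a minimal sketch that extracts per-stage durations from a build log. It assumes a hypothetical convention in which the build emits some lines as JSON objects with `event`, `stage`, and `seconds` fields, mixed in with ordinary console output; no such convention has been agreed on.

```python
import json
from collections import defaultdict
from typing import Dict


def stage_durations(log_text: str) -> Dict[str, float]:
    """Sum the seconds reported by 'stage-finished' records in a log."""
    totals: Dict[str, float] = defaultdict(float)
    for line in log_text.splitlines():
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ordinary console output, not a structured record
        if isinstance(record, dict) and record.get("event") == "stage-finished":
            totals[record["stage"]] += float(record["seconds"])
    return dict(totals)
```

Lines that are not JSON are skipped rather than rejected, so the same log stays readable for humans while programs pick out only the structured records.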

Artifact storage
FIXME: This needs to be written fully.


 * Artifacts are all the files created during the build process that may be needed for automated testing or deployment to production or any other environment: executable binaries, minimized JavaScript, automatically generated documentation from source code.
 * We basically need to store arbitrary blobs for some time. We need to retrieve the blobs for deployment, and possibly other reasons.
 * We may want to store artifacts that get deployed to production for a longer time than other artifacts so that we can keep a history of what was in production at any recent-ish point in time.
 * We will want to trace back from each artifact which git repository and commit, and CI job/build, it came from.
 * We can de-duplicate artifacts (a la backup programs) to save on space. Even so, we will want to automatically expire artifacts on some flexible schedule to keep storage needs under control.
 * We need to decide when we can make these artifacts publicly accessible.
 * Artifact storage must be secure, as everything that gets deployed to production goes via it.
 * There are some artifact storage systems we can use.
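Three of the points above (de-duplication, tracing an artifact back to its origin, and expiry) can be illustrated with a toy in-memory store. This is a sketch under assumptions, not a proposal for the real system: the `ArtifactStore` class is hypothetical, blobs are keyed by their SHA-256 digest, and expiry times are plain numbers supplied by the caller.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ArtifactStore:
    blobs: Dict[str, bytes] = field(default_factory=dict)   # digest -> content
    records: List[dict] = field(default_factory=list)       # provenance entries

    def put(self, data: bytes, repo: str, commit: str, build_id: str,
            expires_at: float) -> str:
        """Store a blob once per distinct content and record where it came from."""
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # identical content is stored once
        self.records.append({"digest": digest, "repo": repo, "commit": commit,
                             "build_id": build_id, "expires_at": expires_at})
        return digest

    def provenance(self, digest: str) -> List[dict]:
        """Trace an artifact back to the repositories, commits, and builds."""
        return [r for r in self.records if r["digest"] == digest]

    def expire(self, now: float) -> None:
        """Drop expired records, then any blob no live record refers to."""
        self.records = [r for r in self.records if r["expires_at"] > now]
        live = {r["digest"] for r in self.records}
        self.blobs = {d: b for d, b in self.blobs.items() if d in live}
```

Giving artifacts that were deployed to production a later `expires_at` than others would be one way to keep them for longer.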

Credentials management and access control
Credentials and other secrets are needed to allow access to servers, services, and files. They are highly security sensitive data. The CI system needs to protect them, but allow controlled use of them.

Example: a CI job needs to deploy a Docker image with a tested and reviewed change as a container orchestrated by production Kubernetes. For this, it needs to authenticate itself to the Kubernetes API. This is typically done by a username/password combination, but might be an API token of some kind (though it doesn’t really matter; it’s all just secret bits at some level). How will the future CI system handle this?

Example: for tests, and in production, a MediaWiki container needs access to a MariaDB database, and MW needs to authenticate itself to the database. MW gets the necessary credentials for this from its configuration, which CI will install during deployment. The configuration will be specific to what the container is being used for: if it’s for testing a change, the configuration only allows access to a test database, but for production it provides access to the production database.

Builds are done in isolated containers. These containers have no credentials. Build artifacts are extracted from the containers and stored in an artifact storage system by the CI system, and this extraction is done in a controlled environment, where only vetted code is run, not code from the repository being tested. The build environment can’t push artifacts directly to the artifact store.

Deployments happen in controlled environments, with access to the credentials needed for deployment. The deployment retrieves artifacts from the artifact storage system. The deployments are to containers, and the deployed containers don’t have any credentials, unless CI has been configured to install them, in which case CI installs only the credentials for the intended use of the container.

Note that credentials should not come directly from the source code of the deployed program. CI deploys configuration when it deploys the software. This way, the same software (build artifacts) can be deployed to different environments. (This may be complicated by the way MediaWiki is configured, using a PHP file in the source tree. This will need discussion.)

Tests run against software deployed to containers, and those containers only have access to the backing services needed for the test, and may even be firewalled to not have access to any other network locations.

Suggestion: Deployments will be done in dedicated deployment environments, which run a “pingee” service. When a pipeline executes a deployment stage, deploying to any environment, the stage runs in a suitable container, but doesn’t actually do the deployment itself. Instead, it “pings” a deployment service, with information on who is deploying, what, and where. The deployment service then inspects the change, and if it looks acceptable, does the actual deployment to the desired environment. The deployment service has access to the credentials it needs for accessing the artifacts and doing the deployment. There may be several deployment services, for deploying to environments with different security needs.
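A minimal sketch of what the “pingee” might check before deploying follows. Everything in it is an assumption made for illustration: the `DeployRequest` fields, the `inspect` function, and the three rules (the artifact must be in the artifact store, the requester must be allowed to deploy to that environment, and production requires an approved code review). What “looks acceptable” really means is still to be decided.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class DeployRequest:
    """The 'ping': who is deploying, what, and where."""
    who: str
    artifact: str
    environment: str
    code_review_approved: bool = False


def inspect(request: DeployRequest, stored_artifacts: Set[str],
            allowed: Dict[str, Set[str]]) -> Tuple[bool, str]:
    """Decide whether the deployment service should act on a request."""
    if request.artifact not in stored_artifacts:
        return False, "unknown artifact"
    if request.who not in allowed.get(request.environment, set()):
        return False, "requester not allowed for this environment"
    if request.environment == "production" and not request.code_review_approved:
        return False, "production requires an approved code review"
    return True, "ok"
```

The pipeline stage that sends the request never needs deployment credentials; only the service that runs `inspect` and then deploys does.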

= CI implementation =

FIXME This needs to be written, but it needs a lot of thinking and experimentation first. RelEng is prototyping with GitLab, Argo, and Zuul v3.

= Transitioning to new CI system =

FIXME This needs to be written. The migration plan will depend on the details of the new system, and so will need to wait until we’ve chosen what to implement. As it stands, there are at least the following questions to solve:


 * MediaWiki isn’t going to be ready to deploy to a container for many months. How will we deal with this? Continue doing manual deployments? Automate the deployment to non-containers?
 * MediaWiki does not currently seem to have a test suite that gives us sufficient confidence to do automated deployments. What can and should we do about this?

= Maintenance =

FIXME this needs to be written.

This chapter discusses how maintenance of the new CI system will be divided between various teams.