Draft:Wikimedia Release Engineering Team/Seakeeper proposal (FY2019-20Q4 rework)

= Introduction =

This document serves as an up-to-date (as of FY 2019-20 Q4) compendium of goals, architecture, and requirements for the Seakeeper project, which aims to replace the Wikimedia Foundation's existing general-purpose continuous integration (CI) system with one that is more performant, capable, secure, and user-friendly.

== Purpose ==
The purpose of this document is twofold: to organize the goals and requirements of a viable replacement CI system, and to propose a path forward for its adoption and implementation. While it is highly informed by (and may outright borrow from) work in this domain over the past year, along with the lessons learned from it, it is not bound by any specific assertions of that work. It should be considered the latest iteration of, and the primary reference document for, CI replacement, and it is meant to be exhaustive in its representation of goals, requirements, and architecture.

== Scope ==
While prior documents covered requirements and architecture for a general-purpose CI replacement as well as some aspects of a prospective continuous delivery system, this document covers only the former. The reasoning for this narrowing of scope is summed up by the following comment from the Security Concept Review task T240943:

We recognize that some of our documentation and process has conflated the requirements and policy of a general purpose CI system with that of the Deployment Pipeline project or another form of continuous delivery/deployment that we are working towards in the long term. While these systems are highly interrelated, they are also distinct and therefore can (and should) be reasoned about separately, for the sake of clarity in forming security policy, modeling threat, and proposing implementation.

The Deployment Pipeline is another important and ambitious project that will no doubt benefit from the success of this one; it both hinges on the success of a well-planned and well-implemented CI platform and deserves its own properly scoped process of planning, review, and implementation.

From its outset, this project has been driven by a very real need to replace our aging CI system, which handles mostly general-purpose workloads, is critical in supporting the daily work of WMF staff and volunteers, and is composed of unmaintained (and in some cases fully deprecated) components. Narrowing the scope to accomplish a timely replacement seems self-evidently justifiable.

== Audience ==
Wikimedia Site Reliability Engineering, the Wikimedia Release Engineering Team, and the Wikimedia Security Team are the primary intended audience of this document, as they have been the most involved in its deliberation. Additional audiences may include management and product owners from Wikimedia Technology and Wikimedia Product, as well as third-party vendors, should we engage in any formal procurement of, or consultative process for, a PaaS.

Individual users of our existing CI system are not the intended audience of this document. However, feedback is welcome from any and all stakeholders.

= Overview =

This section describes the problems we aim to solve with the Seakeeper project by replacing our existing CI system, as well as our specific goals in implementing a replacement.

== Statement of need ==
Our existing CI system is heavily relied upon by staff and volunteer contributors for the static analysis, functional testing, and integration of patchsets across more than 2,200 projects.

The overall performance of our CI system is highly dependent on our ability to scale its capacity to meet demand.

Below are figures for the monthly 90th percentile of per-minute concurrency, along with estimated capacity and time spent at capacity.

...
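To make the metrics above concrete, the following is a minimal sketch of how a monthly 90th percentile of per-minute concurrency and time at capacity could be computed. It assumes per-minute samples of `(timestamp, concurrent_jobs)` and a single fixed `capacity` value; the function names, sample data, and the choice of the nearest-rank percentile (one of several common definitions, so other tools may report slightly different values) are illustrative assumptions, not a description of how the figures above were actually produced.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta


def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile of values using the nearest-rank method."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    # Nearest rank: the smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def monthly_summary(samples, capacity):
    """Summarize hypothetical per-minute (timestamp, concurrent_jobs) samples by month.

    Returns {"YYYY-MM": (p90_concurrency, minutes_at_capacity, fraction_at_capacity)},
    where a minute counts as "at capacity" when concurrency >= capacity.
    """
    by_month = defaultdict(list)
    for ts, running in samples:
        by_month[ts.strftime("%Y-%m")].append(running)

    summary = {}
    for month, values in sorted(by_month.items()):
        at_capacity = sum(1 for v in values if v >= capacity)
        summary[month] = (
            percentile_nearest_rank(values, 90),
            at_capacity,
            at_capacity / len(values),
        )
    return summary


if __name__ == "__main__":
    # Hypothetical data: ten one-minute samples with 1..10 concurrent jobs.
    start = datetime(2020, 4, 1)
    samples = [(start + timedelta(minutes=i), i + 1) for i in range(10)]
    print(monthly_summary(samples, capacity=9))  # {'2020-04': (9, 2, 0.2)}
```

Because each sample represents one minute, the count of samples at or above capacity doubles as "minutes at capacity"; a real data source with a different sampling interval would need that count scaled accordingly.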

== Assumptions ==
= Architecture =

== Security architecture ==
= Scenarios =

== [Role3] scenarios ==
= Design =

== Security design ==
= Risks =

== Cost ==
= References =