User:LarsWirzenius/NewCI/threats



= Threat modelling new CI =


See the attached abstract system diagram of the new CI. This section sketches out a number of possible threats, by target. Mitigation strategies are suggested. Mitigation techniques are left for later.

This is a first preliminary draft. It's meant to provide a basis for discussion. It does not represent any final decisions.

We'll start with an abstract design of what a CI system might look like, and evolve a threat model from that. We will then iterate to improve the abstract design, and perhaps make it more concrete by applying it to possible CI implementations we're considering, and then evolve the threat model accordingly.

== Sub-diagrams ==
Insecure part of CI
 * These nodes are accessible to the developer or run code provided by the developer. Gerrit is included since it has the biggest exposed surface towards the developer.

Secure part of CI
 * These nodes are not directly accessible by developers, and run no unvetted code.

Production
 * These nodes run the sites, or provide binaries, Docker images, etc., that get deployed to production.

== Nodes ==
Volunteer
 * A volunteer developer. Basically anyone online. Pushes code to CI via Gerrit, has read-only access to the CI web UI, and has full HTTP and API access to test environments to test their changes.

Trusted developer
 * A trusted developer. Might be staff or volunteer. Can do anything an untrusted developer can, plus things that we decide require more trust.

Staff
 * Employed by WMF, has been trusted with admin level access, possibly even Unix root access.

Gerrit
 * Code hosting via git, code review via web UI. Triggers builds on build nodes on changes.

Build node
 * Builds the code from Gerrit. Runs code provided by the developer.

CI web UI
 * Provides read-only access for viewing web logs, seeing what builds are happening.

Test environment
 * Runs the code provided by the developer in an environment more or less like production, so that developers can test changes that need more than their personal machines.

Artifact store for temporary blobs
 * Stores build artifacts from build nodes: binaries, Docker images, translation files, etc. Deployments to test environments happen from here. Build logs will be stored here.

Deployment node
 * Retrieves build artifacts from the temporary store, deploys them to test environments, or promotes them to the persistent store.

Artifact store for persistent blobs
 * Like the temporary store, but these are meant to be deployed to production.

Production nodes
 * These provide the sites and services we exist to provide, or are supporting production infrastructure for that, such as DNS and Puppet servers.
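
As a discussion aid, the nodes and sub-diagrams above can be written down as data. The following is a minimal Python sketch, not part of the design. Only Gerrit's placement in the insecure part is stated explicitly above; the zone of every other node is an assumed reading of the descriptions, as is the flow from the persistent store to production nodes. The names `Zone`, `ZONE_OF`, `FLOWS` and `boundary_crossings` are hypothetical. The sketch lists the data flows that cross from one part to another, which may be a useful starting point when looking for threats.

```python
from enum import Enum


class Zone(Enum):
    INSECURE = "insecure part of CI"
    SECURE = "secure part of CI"
    PRODUCTION = "production"


# Zone membership. Only the Gerrit entry is stated in the text;
# every other assignment is an assumption made for illustration.
ZONE_OF = {
    "Gerrit": Zone.INSECURE,
    "Build node": Zone.INSECURE,
    "CI web UI": Zone.INSECURE,
    "Test environment": Zone.INSECURE,
    "Temporary artifact store": Zone.SECURE,
    "Deployment node": Zone.SECURE,
    "Persistent artifact store": Zone.PRODUCTION,
    "Production nodes": Zone.PRODUCTION,
}

# Data flows taken from the node descriptions, as (source, destination).
# The last flow is implied rather than stated outright.
FLOWS = [
    ("Gerrit", "Build node"),
    ("Build node", "Temporary artifact store"),
    ("Temporary artifact store", "Deployment node"),
    ("Deployment node", "Test environment"),
    ("Deployment node", "Persistent artifact store"),
    ("Persistent artifact store", "Production nodes"),
]


def boundary_crossings(flows, zone_of):
    """Return the flows whose two endpoints sit in different zones."""
    return [(src, dst) for src, dst in flows if zone_of[src] is not zone_of[dst]]


for src, dst in boundary_crossings(FLOWS, ZONE_OF):
    print(f"{src} ({ZONE_OF[src].name}) -> {dst} ({ZONE_OF[dst].name})")
```

Changing an entry in `ZONE_OF` and re-running shows how moving a node between parts changes the set of crossings to review.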

== Threats ==
For now, this just lists possible threats, not mitigations. We can discuss those together.

=== Low severity ===

 * Deny service by using all build node capacity.
 * Deny service by filling Gerrit storage.
 * Deny service by filling temporary artifact storage.
 * Deny service by filling persistent artifact storage.
 * Deny service by filling production node storage.
 * Deny service by using all test environment capacity.
 * Deny service by using all production node capacity.

=== Medium severity ===

 * Spoof developer to Gerrit web UI.
 * Spoof developer to test environment, via HTTP.
 * Spoof developer to CI web UI.

=== High severity ===

 * Tamper with code by modifying it in Gerrit.
 * Tamper with the code operating the build node itself.
 * Disclose information about production site users.
 * Disclose secrets from build nodes, e.g., those needed to push artifacts to store.
 * Disclose security fixes under embargo, from production environment.
 * Elevate privilege by impersonating SRE/admin on Gerrit host (shell), over ssh.
 * Elevate privilege by impersonating SRE/admin on Gerrit UI/API, over HTTP.
 * Elevate privilege by impersonating SRE/admin on test environment, over ssh.
 * Elevate privilege by impersonating SRE/admin on test environment, over HTTP.
 * Elevate privilege by impersonating SRE/admin on CI web UI node, over ssh.
 * Elevate privilege by impersonating SRE/admin on CI web UI node, over HTTP.
 * Elevate privilege by impersonating SRE on build nodes, over ssh.
 * Elevate privilege by breaking out of build sandbox on build nodes.
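
The introduction describes the threats as organised by target, while the lists above are grouped by severity. As a rough sketch in Python (the names `Threat`, `THREATS` and `by_target` are hypothetical), the same entries can be recorded once and regrouped either way. The list below is a hand-transcribed subset of the entries above, and the target names are shortened for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2}


@dataclass(frozen=True)
class Threat:
    action: str    # leading verb of the entry, e.g. "spoof" or "tamper"
    target: str    # node the threat is aimed at (shortened name)
    severity: str  # "low", "medium" or "high", as in the lists above


# Hand-transcribed subset of the lists above, for illustration only.
THREATS = [
    Threat("deny service", "build node", "low"),
    Threat("deny service", "Gerrit", "low"),
    Threat("deny service", "test environment", "low"),
    Threat("spoof", "Gerrit", "medium"),
    Threat("spoof", "test environment", "medium"),
    Threat("spoof", "CI web UI", "medium"),
    Threat("tamper", "Gerrit", "high"),
    Threat("tamper", "build node", "high"),
    Threat("disclose", "build node", "high"),
    Threat("elevate privilege", "Gerrit", "high"),
    Threat("elevate privilege", "test environment", "high"),
    Threat("elevate privilege", "CI web UI", "high"),
    Threat("elevate privilege", "build node", "high"),
]


def by_target(threats):
    """Group threats by target, most severe first within each target."""
    groups = defaultdict(list)
    for threat in threats:
        groups[threat.target].append(threat)
    for entries in groups.values():
        entries.sort(key=lambda t: SEVERITY_RANK[t.severity], reverse=True)
    return dict(groups)


for target, entries in by_target(THREATS).items():
    print(target)
    for t in entries:
        print(f"  [{t.severity}] {t.action}")
```

Keeping one flat list and deriving the groupings avoids maintaining the same threat in two places once mitigations are added to the discussion.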

= Meetings =

2020-01-15
 * Dan: This can be done via K8s and Argo, without a need for separate K8s clusters.
 * Joe has an updated graph, similar to Lars's.
 * We want physically separate machines for building and deploying.
 * We'll try to have an in-person meeting at All Hands.
 * Joe and Dan to talk about Argo specifics for threat modelling before All Hands.

Actions:


 * Lars to send these notes to everyone.
 * Lars to talk to managers about an in-person meeting at All Hands.
 * Joe and Dan to talk about Argo specifics.