User:LarsWirzenius/NewCI/threats

From mediawiki.org
Simplified architecture of possible new CI system, for threat modelling

Threat modelling new CI

See <https://phabricator.wikimedia.org/T240679>.

See the attached abstract system diagram of the new CI. This section sketches out a number of possible threats, by target. Mitigation strategies are suggested. Mitigation techniques are left for later.

This is a first preliminary draft. It's meant to provide a basis for discussion. It does not represent any final decisions.

We'll start with an abstract design of what a CI system might look like, and evolve a threat model from that. We will then iterate to improve the abstract design, and perhaps make it more concrete by applying it to possible CI implementations we're considering, and then evolve the threat model accordingly.

Sub-diagrams

Insecure part of CI

These nodes are accessible to the developer or run code provided by the developer. Gerrit is included since it has the biggest exposed surface towards the developer.

Secure part of CI

These nodes are not directly accessible to the developer, and run no unvetted code.

Production

These nodes run the sites, or provide binaries, Docker images, etc, that get deployed to production.

Nodes

Volunteer

A volunteer developer. Basically anyone online. Pushes code to CI via Gerrit, has read-only access to the CI web UI, and has full HTTP and API access to test environments to test their changes.

Trusted developer

A trusted developer. Might be staff or volunteer. Can do anything an untrusted developer can, plus things that we decide require more trust.

Staff

Employed by WMF and trusted with admin-level access, possibly even Unix root access.

Gerrit

Code hosting via git, code review via web UI. Triggers builds on build nodes on changes.

Build node

Builds the code from Gerrit. Runs code provided by the developer.

CI web UI

Provides read-only access for viewing web logs, seeing what builds are happening.

Test environment

Runs the code provided by the developer in an environment more or less like production, so the developer can test their changes when their personal machines are not enough for that.

Artifact store for temporary blobs

Stores build artifacts from build nodes: binaries, Docker images, translation files, etc. Deployments to test environments happen from here. Build logs will be stored here.

Deployment node

Retrieves build artifacts from the temporary store, deploys them to test environments, or promotes them to the persistent store.

Artifact store for persistent blobs

Like the temporary store, but these are meant to be deployed to production.

Production nodes

These provide the sites and services we exist to provide, or are supporting production infrastructure for that, such as DNS and Puppet servers.
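To make the sub-diagram rules easier to discuss, the nodes can be written down as data. The following is a minimal Python sketch, not a design decision. The names `Zone`, `Node`, `NODES` and `misplaced_nodes` are hypothetical, and the zone chosen for each node is this sketch's own reading of the descriptions above (in particular, placing the temporary artifact store and the deployment node in the secure part is an assumption).

```python
from dataclasses import dataclass
from enum import Enum


class Zone(Enum):
    INSECURE = "insecure part of CI"
    SECURE = "secure part of CI"
    PRODUCTION = "production"


@dataclass(frozen=True)
class Node:
    name: str
    zone: Zone
    runs_unvetted_code: bool  # True if the node runs code provided by the developer


# Zone assignments are assumptions made for this sketch, open to discussion.
NODES = [
    Node("Gerrit", Zone.INSECURE, False),
    Node("Build node", Zone.INSECURE, True),
    Node("CI web UI", Zone.INSECURE, False),
    Node("Test environment", Zone.INSECURE, True),
    Node("Artifact store for temporary blobs", Zone.SECURE, False),
    Node("Deployment node", Zone.SECURE, False),
    Node("Artifact store for persistent blobs", Zone.PRODUCTION, False),
    Node("Production nodes", Zone.PRODUCTION, False),
]


def misplaced_nodes(nodes):
    """Return nodes that run unvetted code outside the insecure part of CI."""
    return [
        n for n in nodes
        if n.runs_unvetted_code and n.zone is not Zone.INSECURE
    ]


if __name__ == "__main__":
    for node in misplaced_nodes(NODES):
        print(f"{node.name} runs unvetted code in the {node.zone.value}")
```

A check like `misplaced_nodes` only encodes the rule that the secure part and production run no unvetted code. It says nothing about network reachability, which would need to be modelled separately.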

Threats

For now, this just lists possible threats, not mitigations. We can discuss those together.

Low severity

  • Deny service by using all build node capacity.
  • Deny service by filling Gerrit storage.
  • Deny service by filling temporary artifact storage.
  • Deny service by filling persistent artifact storage.
  • Deny service by filling production node storage.
  • Deny service by using all test environment capacity.
  • Deny service by using all production node capacity.

Medium severity

  • Spoof developer to Gerrit web UI.
  • Spoof developer to test environment, via HTTP.
  • Spoof developer to CI web UI.

High severity

  • Tamper with code by modifying it in Gerrit.
  • Tamper with the code operating the build node itself.
  • Disclose information about production site users.
  • Disclose secrets from build nodes, e.g., those needed to push artifacts to store.
  • Disclose security fixes under embargo, from production environment.
  • Elevate privilege by impersonating SRE/admin on Gerrit host (shell), over ssh.
  • Elevate privilege by impersonating SRE/admin on Gerrit UI/API, over HTTP.
  • Elevate privilege by impersonating SRE/admin on test environment, over ssh.
  • Elevate privilege by impersonating SRE/admin on test environment, over HTTP.
  • Elevate privilege by impersonating SRE/admin on CI web UI node, over ssh.
  • Elevate privilege by impersonating SRE/admin on CI web UI node, over HTTP.
  • Elevate privilege by impersonating SRE on build nodes, over ssh.
  • Elevate privilege by breaking out of build sandbox on build nodes.
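Each threat above starts with a verb phrase (deny service, spoof, tamper, disclose, elevate privilege), which makes the lists easy to group mechanically. The following Python sketch shows one way to do that, as an aid for discussion only. The category labels, the names `CATEGORY_BY_PREFIX`, `THREATS`, `categorize` and `count_by_category`, and the choice of sample entries are all assumptions of this sketch. `THREATS` holds only a few entries adapted from the lists above, not the full set.

```python
from collections import Counter

# Leading phrase of a threat -> category label (labels are this sketch's own).
CATEGORY_BY_PREFIX = {
    "Deny service": "denial of service",
    "Spoof": "spoofing",
    "Tamper": "tampering",
    "Disclose": "information disclosure",
    "Elevate privilege": "elevation of privilege",
}

# A small sample of the threats, keyed by the severity they are listed under.
THREATS = {
    "low": [
        "Deny service by using all build node capacity.",
        "Deny service by filling Gerrit storage.",
    ],
    "medium": [
        "Spoof developer to Gerrit web UI.",
    ],
    "high": [
        "Tamper with code by modifying it in Gerrit.",
        "Disclose information about production site users.",
        "Elevate privilege by breaking out of build sandbox on build nodes.",
    ],
}


def categorize(threat):
    """Map a threat description to a category by its leading phrase."""
    for prefix, category in CATEGORY_BY_PREFIX.items():
        if threat.startswith(prefix):
            return category
    raise ValueError(f"unrecognized threat wording: {threat!r}")


def count_by_category(threats):
    """Count threats per category across all severity levels."""
    return Counter(
        categorize(t) for items in threats.values() for t in items
    )


if __name__ == "__main__":
    for category, count in sorted(count_by_category(THREATS).items()):
        print(f"{category}: {count}")
```

Raising on unrecognized wording is deliberate: a new threat that does not fit an existing category should prompt a discussion rather than be silently dropped.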

Meetings

2020-01-15

  • Joe has an updated graph, similar to Lars's.
  • We want physically separate machines for building and deploying.
      • Dan: This can be done via K8s and Argo, without a need for separate K8s clusters.
  • We'll try to have an in-person meeting at All Hands.
  • Joe and Dan to talk about Argo specifics for threat modelling before All Hands.

Actions:

  • Lars to send these notes to everyone.
  • Lars to talk to managers about an in-person meeting at All Hands.
  • Joe and Dan to talk about Argo specifics.