Continuous integration/Architecture/Castor

From mediawiki.org

Castor is an umbrella term for caching dependencies and package manager material for the isolated CI instances.

The CI jobs start up in a fresh environment and have to retrieve dependencies over the internet and then, for native dependencies, compile them. The download phase can be arbitrarily long, since package managers such as Maven download a long list of dependencies, and it carries the risk of upstreams blacklisting our network for abusing bandwidth. The installation and compile phases can be quite slow as well, and it does not make sense to compile the same material again and again.

We introduced a very crude system based on rsync. Whenever a change succeeds in the Zuul gate-and-submit pipeline, it copies a list of directories from the instance to a central place. When a job starts, it first attempts to retrieve the material from the central cache, thus warming up the local cache before invoking the package manager. The cache itself is namespaced by:

Variable      Description
ZUUL_PROJECT  The git project name
ZUUL_BRANCH   The git branch the patch has been made against
JOB_NAME      The Jenkins job name

Mechanism

For reference, see integration/config.git:jjb/castor.yaml

  • Instance: integration-castor05.integration.eqiad.wmflabs, configured in Jenkins via the CASTOR_HOST environment variable
  • Location: /srv/castor/

When a job in gate-and-submit is successful, it triggers the Jenkins job castor-save, which runs on the Castor instance. That job connects to the instance the original gate job ran on and rsyncs the package manager caches to the Castor instance.
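The save condition described above can be sketched as a small predicate. This is only an illustration: the function name is hypothetical, and the use of Jenkins' "SUCCESS" result string as the success marker is an assumption; the real trigger is defined in jjb/castor.yaml.

```python
def should_save_cache(pipeline: str, build_result: str) -> bool:
    """Return True when a finished build should trigger castor-save.

    Hypothetical helper: the text only states that a successful job in
    the gate-and-submit pipeline triggers the save.
    """
    # Assumption: success is reported as the string "SUCCESS".
    return pipeline == "gate-and-submit" and build_result == "SUCCESS"
```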

The cache is namespaced by the Gerrit project name with / replaced by - (e.g. mediawiki-core), the target branch (e.g. master) and the job name (e.g. rake-jessie).
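As a minimal sketch of that namespacing, the three components can be combined as follows. The function name is hypothetical, and joining the components with / into a relative path is an assumption; the text only lists the components, not how they are laid out on disk.

```python
def castor_namespace(zuul_project: str, zuul_branch: str, job_name: str) -> str:
    """Build a cache namespace from the three values listed in the text."""
    # Documented rule: "/" in the Gerrit project name becomes "-".
    project = zuul_project.replace("/", "-")
    # Assumption: components are joined into a relative path. The branch
    # and job name are used as-is.
    return "/".join([project, zuul_branch, job_name])


print(castor_namespace("mediawiki/core", "master", "rake-jessie"))
# prints "mediawiki-core/master/rake-jessie"
```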

The jobs have a builder macro that attempts to rsync the cache from Castor into the home directory, thus populating the local cache. When the package manager installer runs (e.g. npm install), it hits the local cache, which saves it from downloading packages over the internet.
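The restore step can be sketched as a function that only builds the rsync command line, without running it. Several details here are assumptions rather than the actual macro: the host:path remote syntax, the --archive and --compress flags, the directory layout under /srv/castor, and the helper name itself. The authoritative version is the macro in jjb/castor.yaml.

```python
import os

CASTOR_ROOT = "/srv/castor"  # cache location given in the text


def restore_command(namespace, local_dir, env=None):
    """Return an rsync argv that pulls a cache namespace into local_dir.

    Hypothetical helper: transport, flags and remote layout are assumed.
    """
    env = os.environ if env is None else env
    host = env["CASTOR_HOST"]  # set globally on the Jenkins controller
    # Trailing slash on the source: copy the directory's contents,
    # not the directory itself.
    source = f"{host}:{CASTOR_ROOT}/{namespace}/"
    destination = local_dir.rstrip("/") + "/"
    return ["rsync", "--archive", "--compress", source, destination]
```

Since the macro only attempts the restore, a caller would presumably run this command without failing the build when the cache is missing, for example with subprocess.run(cmd, check=False).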

The JJB macro refers to the host using the CASTOR_HOST environment variable which is configured as a global variable on the Jenkins controller.

/srv/castor is a Cinder volume mounted on the instance.

References