Continuous integration/Architecture/Castor

Castor is an umbrella term for the caching of dependencies and package manager material for the isolated CI instances.

The CI jobs start up in a fresh environment and have to retrieve dependencies over the internet and, for native dependencies, compile them. The download phase can be arbitrarily long with package managers such as Maven, which download a long list of dependencies, and it carries the risk of upstream blacklisting our network for abusing bandwidth. The installation and compile phase can be quite slow as well, and it does not make sense to compile the same material again and again.

We introduced a very simple system based on rsync. Whenever a change succeeds in the Zuul gate-and-submit pipeline, it copies a list of directories from the instance to a central place. When a job starts, it first attempts to retrieve the material from the central cache, thus warming up the local cache before invoking the package manager. The cache itself is namespaced by Gerrit project name, target branch and job name.

Mechanism
For reference see integration/config.git:jjb/castor.yaml


 * Instance: castor.integration.eqiad.wmflabs
 * Location: /mnt/jenkins-workspace/caches

When a job is in gate-and-submit and is successful, it triggers the Jenkins job castor-save, which runs on the instance castor.integration.eqiad.wmflabs. That job connects to the instance that ran the successful job, then rsyncs the package manager caches to the central castor instance.

The cache is namespaced by: Gerrit project name with / replaced by - (e.g. mediawiki-core), target branch (e.g. master) and job name (e.g. rake-jessie).
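
The namespacing rule can be sketched as a small function. This is only an illustration: the function name castor_namespace is hypothetical, and joining the three parts with "/" into a relative path is an assumption, since only the components and the "/" to "-" replacement are documented here. See jjb/castor.yaml for the actual definition.

```python
def castor_namespace(project: str, branch: str, job: str) -> str:
    """Build a cache namespace from Gerrit project, target branch and job name.

    Hypothetical helper for illustration; not taken from jjb/castor.yaml.
    """
    # Documented rule: "/" in the Gerrit project name is replaced by "-".
    safe_project = project.replace("/", "-")
    # Assumption: the three components are joined with "/" as a relative path.
    return "/".join([safe_project, branch, job])


print(castor_namespace("mediawiki/core", "master", "rake-jessie"))
# prints: mediawiki-core/master/rake-jessie
```

Because the job name is part of the namespace, two jobs for the same project and branch keep separate caches.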

Each job has a builder macro that attempts to rsync the cache from castor into the home directory, thus populating the local cache. When the package manager is then run (e.g. npm install), it hits the local cache, which saves it from having to download packages over the internet.
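
The save and restore steps could look roughly like the following sketch, which only builds the rsync command lines and does not run them. The host and cache location come from this page. Everything else is an assumption made for illustration: the function names, the use of plain host:path rsync syntax, the chosen rsync options, the layout of namespaces under the cache location, and the example instance name, home directory and cache directories. The real commands are defined in jjb/castor.yaml.

```python
from typing import List

CASTOR_HOST = "castor.integration.eqiad.wmflabs"
CACHE_ROOT = "/mnt/jenkins-workspace/caches"


def rsync_restore_command(namespace: str, home_dir: str) -> List[str]:
    """Command a job could run to warm its home directory from castor."""
    # Trailing slash on the source: copy the contents of the namespace
    # directory, not the directory itself.
    source = "{}:{}/{}/".format(CASTOR_HOST, CACHE_ROOT, namespace)
    return ["rsync", "--archive", "--compress", source, home_dir.rstrip("/") + "/"]


def rsync_save_command(instance: str, cache_dirs: List[str], namespace: str) -> List[str]:
    """Command castor-save could run on castor to pull caches from an instance."""
    # No trailing slash on the sources: each directory is recreated by name
    # under the namespace, so a later restore puts it back in the home dir.
    sources = ["{}:{}".format(instance, d.rstrip("/")) for d in cache_dirs]
    destination = "{}/{}/".format(CACHE_ROOT, namespace)
    return ["rsync", "--archive", "--compress"] + sources + [destination]


# Hypothetical values, for illustration only.
namespace = "mediawiki-core/master/rake-jessie"
print(" ".join(rsync_restore_command(namespace, "/home/jenkins")))
print(" ".join(rsync_save_command("ci-instance.example", [".npm", ".m2"], namespace)))
```

Since a fresh namespace has nothing to restore, the restore step should be allowed to fail without failing the job; the package manager then simply downloads everything as it would without castor.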