Continuous integration/Documentation generation

We automatically generate the documentation published at https://doc.wikimedia.org/, which is hosted on a production machine. Since the docs are often generated on labs instances, we had to set up a two-step process to move the material from the labs instance that runs the job to the production server that makes it publicly available. This page documents the workflow, part of the technical implementation, and how to define a new job.

Zuul
Our Zuul configuration reacts to two kinds of Gerrit events, change-merged and ref-updated, which are matched by two different pipelines.

ref-updated could also cover the changes being merged, but that event is not associated with a Gerrit change number, which prevents us from reporting back to the Gerrit interface. We thus use two pipelines: postmerge, which reports back in Gerrit so the user knows the documentation has been generated, and publish, which handles reference updates matching ^refs/tags/.

In both cases (change-merged or ref-updated) we trigger the same job to generate the documentation for any branch or tag. We thus need to namespace the documentation under doc.wikimedia.org based on the branch name or the tag name. The information is carried differently by each event, and the reference differs slightly between branch updates and tags. The conversion logic is carried by a function in the Zuul configuration which is associated with all the publish jobs. It injects a variable DOC_SUBPATH, which represents the version, into the Gearman function (and thus into the Jenkins job environment). Example:


 * change merged on REL1_24 branch: DOC_SUBPATH = REL1_24
 * refs/tags/1.24.0 updated: DOC_SUBPATH = 1.24.0
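
The conversion described above can be sketched in Python. This is a minimal illustration rather than the actual Zuul function: the names doc_subpath and inject_doc_subpath, their arguments, and the handling of non-tag references are all assumptions made for the example.

```python
import re

# Matches tag references such as refs/tags/1.24.0 and captures the tag name.
TAG_REF = re.compile(r"^refs/tags/(.+)$")


def doc_subpath(branch=None, ref=None):
    """Return the version used to namespace the documentation.

    branch: target branch of a merged change (change-merged event).
    ref: updated reference (ref-updated event).
    Both parameter names are hypothetical.
    """
    if ref is not None:
        match = TAG_REF.match(ref)
        # Tags lose their refs/tags/ prefix. Keeping any other reference
        # unchanged is an assumption of this sketch.
        return match.group(1) if match else ref
    if branch is not None:
        return branch
    raise ValueError("neither a branch nor a ref was provided")


def inject_doc_subpath(params, branch=None, ref=None):
    """Add DOC_SUBPATH to the parameters handed over to the job."""
    params["DOC_SUBPATH"] = doc_subpath(branch=branch, ref=ref)
    return params


print(inject_doc_subpath({}, branch="REL1_24"))
print(inject_doc_subpath({}, ref="refs/tags/1.24.0"))
```

Both events go through the same entry point, so the job only ever has to read DOC_SUBPATH.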

We can thus reuse that parameter to easily namespace the jobs per version.

Jenkins job builder definitions
Most of the logic is defined directly in the Jenkins Job Builder doc.yaml configuration file.

In a job definition, a builder defines the command that generates the documentation, which ends up being written under a build_path. A doc-publish macro then takes care of publishing the documentation to the production server. It takes two parameters:
 * 1) docsrc: the place where the doc has been generated (build_path from above)
 * 2) docdest: the final destination path under https://doc.wikimedia.org/

Example job definition:

This will invoke make doc and publish the content of doc/build/html at https://doc.wikimedia.org/myproject/.
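
The relation between docdest and the public URL can be sketched in Python. This is only an illustration: the helper published_url is hypothetical, and the shell-style $DOC_SUBPATH placeholder inside docdest is an assumption about how a version would be inserted.

```python
from string import Template

DOC_BASE_URL = "https://doc.wikimedia.org/"


def published_url(docdest, env=None):
    """Return the public URL for a docdest value (hypothetical helper).

    Placeholders such as $DOC_SUBPATH are expanded from env. A missing
    variable raises KeyError, so a job cannot publish to a half-built path.
    """
    expanded = Template(docdest).substitute(env or {})
    return DOC_BASE_URL + expanded.strip("/") + "/"


print(published_url("myproject"))
print(published_url("myproject/$DOC_SUBPATH", {"DOC_SUBPATH": "REL1_24"}))
```

Failing early on an unset variable is a deliberate choice in this sketch: publishing with an empty version would write to the wrong directory.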

To namespace the documentation based on the Zuul-generated DOC_SUBPATH, simply insert it in the docdest parameter. You will also need to invoke the builder assert-env-doc_subpath. Example for mediawiki/core (mediawiki-core-doxygen-publish job):

Architecture
The documentation is ultimately published on doc.wikimedia.org, which is a production machine (gallium as of Sep. 2014). It is generated on labs instances that are part of the integration labs project, and those instances are not allowed to communicate with production machines.

To solve this, we created an intermediary instance, integration-publisher.eqiad.wmflabs. The doc-publish macro running on the labs instance rsyncs the generated content to that instance, under the doc rsync container, in a uniquely named subdirectory (reusing ZUUL_UUID). The macro then triggers the publish-doc job with the unique identifier. That job rsyncs from the intermediary instance to the production machine, thus publishing the doc.
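
The two hops can be sketched in Python as a pair of rsync command lines. This is a rough illustration under stated assumptions, not the actual macro: the function publish_commands, the rsync options, the production root /path/to/docroot, and the way the destination is passed to the second hop are all hypothetical. The sketch only builds the commands and does not run them.

```python
PUBLISHER = "integration-publisher.eqiad.wmflabs"


def publish_commands(docsrc, docdest, zuul_uuid, prod_root="/path/to/docroot"):
    """Build the rsync command for each hop (hypothetical helper).

    Hop 1 runs on the labs instance and pushes the generated content to
    the "doc" rsync container, in a directory named after ZUUL_UUID.
    Hop 2 runs on the production side and pulls that directory.
    prod_root is a placeholder, not the real document root.
    """
    staging = "rsync://{}/doc/{}/".format(PUBLISHER, zuul_uuid)
    push = ["rsync", "--recursive", docsrc.rstrip("/") + "/", staging]
    destination = "{}/{}/".format(prod_root.rstrip("/"), docdest.strip("/"))
    pull = ["rsync", "--recursive", staging, destination]
    return push, pull


push, pull = publish_commands("doc/build/html", "myproject", "abc123")
print(" ".join(push))
print(" ".join(pull))
```

Because the staging directory is named after the unique identifier, concurrent jobs do not overwrite each other's content on the intermediary instance.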

The integration-publisher.eqiad.wmflabs rsync daemon is reachable by other integration labs instances since they are in the same project. The production slave gallium is allowed to connect since it has a public IP and can reach labs. The other production slave, lanthanum, has a private IP and thus cannot reach labs per policy. Hence all jobs should be tied to the <tt>contintLabsSlave</tt> label.