Topic on Talk:Continuous integration

Proposal for continuous integration (post-Git migration)

Summary by Krinkle

This post has been copied to a wiki page for further editing and finalization: Continuous integration/Workflow

Krinkle (talkcontribs)

Hey all,

I've been thinking and chatting about unit testing at random intervals over the last few weeks, and also about the new Gerrit workflow. Here's a short overview of what I think would be a good plan. I believe most (if not all) of these points have already been discussed, but I couldn't find a definitive plan, so I wrote it up here.


EDIT: Check Continuous integration/Workflow instead

Catrope (talkcontribs)

Some comments:

  • There shouldn't be separate linters and separate comments for each lint type, because AIUI one V+1 is enough to clear a revision for merging.
    • Diederik had this idea about a universal lint script that traverses the directory tree and invokes the right linter for each file based on the extension; this sounds like a better idea to me (a rough sketch of such a dispatcher follows this list).
  • You'd need a separate review category or a different permissions setup in Gerrit to get the Jenkins-initiates-the-merge workflow to work. Talk to Ryan about that.
  • IIRC the OpenStack people told me that the master+cherrypick race condition couldn't occur in practice because Jenkins is single-threaded, but I haven't verified this.
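
For illustration, here is a minimal sketch of what such a universal lint dispatcher could look like; the extension-to-linter mapping and linter commands are placeholders for this example, not a description of an existing script:

    #!/usr/bin/env python
    # Hypothetical "universal lint" dispatcher: walk the tree and run the
    # linter registered for each file extension. The mapping is illustrative.
    import os
    import subprocess
    import sys

    LINTERS = {
        '.php': ['php', '-l'],                    # PHP syntax check
        '.js': ['jshint'],                        # JS lint (assumed to be installed)
        '.pp': ['puppet', 'parser', 'validate'],  # puppet manifests
    }

    def lint_tree(root):
        failures = 0
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                linter = LINTERS.get(os.path.splitext(name)[1])
                if not linter:
                    continue  # no linter registered for this file type
                if subprocess.call(linter + [os.path.join(dirpath, name)]) != 0:
                    failures += 1
        return failures

    if __name__ == '__main__':
        sys.exit(1 if lint_tree(sys.argv[1] if len(sys.argv) > 1 else '.') else 0)

A single bot running something along these lines could then leave one combined comment and score per revision, rather than one per lint type.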
Krinkle (talkcontribs)

Some answers:

  • Lint: Having only 1 lintbot makes perfect sense indeed. We could add support for more file extensions as time goes on (and use the same bot for operations repositories, so it also supports stuff like puppet manifests, *.ini files and what not). I'm not sure what you mean by "V+1 is enough to clear a revision for merging", though. AFAIK the scores do not add up, and the lint check (+1) does not affect merging. If either of those is true then we still need to figure out how to let multiple bots score. Although the lint bots can be merged, there is still the separate jenkinsbot (which serves a different purpose). It goes like:
    lintbot (+/- 1) > human reviewer approving (+/- 2) > Jenkins run > Jenkins performing or rejecting the merge (not sure how that is reflected in the score). A sketch of how a bot could post such a score follows this list.
  • Jenkins listening to merge approvals and initiating them afterwards is possible. Ryan is the one who told me about that concept as he saw another project doing it like this (OpenStack?)
  • Yeah, I wasn't sure whether there can be a difference between master+cherrypick and post-merge at all. But I think it is still useful and important to have a Jenkins project running on the actual master that only includes builds for what was actually merged, so that there is a clear linear overview of the state of the repository, and as an additional self-assurance.
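
For reference, a rough sketch of how a single bot could post its score back to Gerrit over the standard "gerrit review" SSH command; the host, port and account name are placeholders, and whether the bot should set Verified (+/-1) or some other category is exactly the open question above:

    # Sketch: a bot reporting its result as a Gerrit "Verified" score via the
    # "gerrit review" SSH command. Host and account are placeholders.
    import subprocess

    GERRIT = ['ssh', '-p', '29418', 'lintbot@gerrit.example.org', 'gerrit', 'review']

    def post_score(commit_sha, passed, message):
        score = '+1' if passed else '-1'
        # The message is quoted once more because it passes through the remote shell.
        subprocess.check_call(
            GERRIT + ['--verified', score, '--message', "'%s'" % message, commit_sha])

    # e.g. post_score('1234abcd...', True, 'php -l / jshint: no errors')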
Krinkle (talkcontribs)

Am I understanding this proposal correctly that there's a wait state for human review before automated testing fully kicks in? ("If it is rejected then the commit story ends until further notice. If it was marked OK, however, then the story continues.") If so, shouldn't we aim to run as many automated tests as early as possible, to reduce human reviewer workload (flag changes that clearly break things ASAP so they can be rejected immediately)?--Eloquence (talk) 17:10, 22 March 2012 (UTC)

This post was posted by Krinkle, but signed as Eloquence.

Krinkle (talkcontribs)

Lint checks happen right away. The unit testing, however (sketched as a build script at the end of this comment):

  • cloning the MediaWiki repo
  • setting up several databases and installing MediaWiki multiple times for different database backends
  • executing the PHP unit tests
  • sending the QUnit test suites to all TestSwarm clients
  • etc.

... all of that should not happen right away, due to the security risk of arbitrary code execution (both for PHP and for JS). The swarm would function like a little botnet (JS can practically take over the browser and make as many requests as it wants, for as long as it wants, to whatever domain), and executing arbitrary PHP code on a machine in the production cluster doesn't sound good either (right now Jenkins runs the unit tests on the same machine it runs on itself).

Even if we could limit the security issues (which I don't believe we can), I think it is reasonable that there should be agreement over a revision before it is tested.
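
For concreteness, here are the build steps listed above, roughly as the script Jenkins would run after approval; the repository URL, install.php flags and backend list are assumptions for illustration, not the actual job configuration:

    # Sketch of the post-approval build: clone, install per database backend,
    # run the PHP unit tests. URL, flags and backends are illustrative assumptions.
    import subprocess

    REPO = 'https://gerrit.wikimedia.org/r/mediawiki/core'  # assumed URL
    BACKENDS = ['mysql', 'postgres']                        # placeholder list

    def run(cmd, cwd=None):
        subprocess.check_call(cmd, cwd=cwd)

    def build(workdir):
        run(['git', 'clone', REPO, workdir])
        for dbtype in BACKENDS:
            # Fresh MediaWiki install for each database backend.
            run(['php', 'maintenance/install.php',
                 '--dbtype=' + dbtype, '--dbname=ci_' + dbtype,
                 '--dbuser=ci', '--dbpass=ci', '--pass=adminpass',
                 'ci-wiki', 'admin'], cwd=workdir)
            # PHPUnit suite against this install.
            run(['php', 'tests/phpunit/phpunit.php'], cwd=workdir)
        # The QUnit suites would be handed off to TestSwarm here (omitted).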

Eloquence (talkcontribs)

I understand the concern, but if we're creating a world where tests have to wait for developer review, we're doing CI the wrong way around. The whole point of automated testing is to minimize the need for human review, so we should aim to run as many tests as possible as early as possible. Both test performance and security are problems to be resolved gradually as we aim for the highest possible test coverage of all submitted code, minimizing the need for human review of code that is obviously broken.

So to have a gated trunk where the gate consists of humans performing reviews before tests get run is in my opinion not an acceptable outcome -- it's making people do busywork which could be avoided.

A few points:

  • I understand that we're very liberal with account creation now, and may at some point indeed want to fully automate the process. This is probably the single most important vector through which any kind of deliberate attack could proceed. Would it therefore be feasible to have tests run automatically against any changeset authored by devs on a whitelist (see the sketch after this list)? We could be fairly liberal about adding people to that whitelist, as long as they have a track record of good-faith contribution. This seems like the quickest win to get tests to run before review.
    • As a side note, it's my understanding that we ran the previous CI tests against any revision in trunk, code reviewed or not. Is that not the case?
  • Creating a test environment which provides sufficient security isolation is a solvable problem. True, this may be harder if we use TestSwarm. But we don't have to use TestSwarm, and if we're not comfortable with the security implications of shipping random code to random users, let's not do it. There are alternatives, e.g. SauceLabs (which is optimized for Selenium tests). If we're worried about security implications of shipping random JS code to a swarm of random people, let's investigate alternatives and options to create security isolation of the code that we test.
  • Likewise, we should be able to use Wikimedia Labs for running both unit and integration tests separately from the production cluster.
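
For illustration, the whitelist gate could be as simple as the sketch below; the file format, field names and event hook are assumptions, not an existing mechanism:

    # Sketch of the proposed whitelist gate: trigger the full test run
    # automatically only when the change owner is on an allow-list of
    # trusted developers. File format and event fields are assumptions.

    def load_whitelist(path='trusted-devs.txt'):
        with open(path) as f:
            return set(line.strip() for line in f
                       if line.strip() and not line.startswith('#'))

    def should_autotest(change_owner, whitelist):
        # Untrusted authors would still get lint checks; only the heavier
        # PHP/QUnit runs wait for a human reviewer first.
        return change_owner in whitelist

    # e.g., in whatever listens to Gerrit's event stream:
    # if should_autotest(event['change']['owner']['email'], load_whitelist()):
    #     trigger_full_build(event['patchSet']['revision'])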
Krinkle (talkcontribs)

Thanks for the input!

  • I'd support having integration.mediawiki.org be linked to a (yet to be created) WMF Labs instance instead of running it in the production cluster. That would solve the concern about executing code server-side.
  • SauceLabs (from a quick first impression), however, doesn't look like an alternative to TestSwarm but rather like an alternative to BrowserStack. TestSwarm distributes test suites to connected clients and aggregates the results from unit tests. We can simply set up TestSwarm to only accept BrowserStack clients (or SauceLabs clients), as illustrated below. That solves two problems: firstly, arbitrary JS is no longer a problem; secondly, it would avoid issues like the situation we had last year where someone joined the swarm with an unsupported browser whose user agent looked like a supported browser and messed up the build reports.
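
Purely as an illustration of the "only accept BrowserStack clients" idea (not actual TestSwarm code or configuration), the join check could treat accepted clients as an explicit allow-list instead of trusting user-agent sniffing:

    # Illustrative only: accept swarm join requests solely from clients we
    # launched ourselves (e.g. via BrowserStack), identified by a shared
    # token, instead of trusting whatever user agent the browser reports.
    ALLOWED_TOKENS = set(['browserstack-worker-1', 'browserstack-worker-2'])  # placeholders

    def accept_client(join_request):
        return join_request.get('client_token') in ALLOWED_TOKENS

A spoofed user agent then no longer matters: unknown clients simply never receive test jobs, so they can't skew the build reports.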