Talk:Continuous integration

Proposal for continuous integration (post-Git migration)

Summary by Krinkle

This post has been copied to a wiki page for further editing and finalization: Continuous integration/Workflow

Krinkle

Hey all,

Over the last few weeks I've been thinking and chatting about unit testing now and then, as well as about the new Gerrit workflow. Here's a short overview of what I think would be a good plan. I believe most (if not all) of these points were already discussed, but I couldn't find a definitive plan, so I wrote it up here.

EDIT: Check Continuous integration/Workflow instead

Catrope

Some comments:

  • There shouldn't be separate linters and separate comments for each lint type, because AIUI one V+1 is enough to clear a revision for merging.
    • Diederik had this idea about a universal lint script that traverses the directory tree and invokes the right linter for each file based on the extension; this sounds like a better idea to me.
  • You'd need a separate review category or a different permissions setup in Gerrit to get the Jenkins-initiates-the-merge workflow to work. Talk to Ryan about that.
  • IIRC the OpenStack people told me that the master+cherrypick race condition couldn't occur in practice because Jenkins is single-threaded, but I haven't verified this
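Diederik's universal lint script idea could look roughly like the following minimal Python sketch. The extension-to-linter table (`LINTERS`) and the function names are hypothetical, and the commands shown are placeholders rather than an agreed configuration. The script walks the tree, skips `.git`, and folds every linter result into one pass/fail exit code, so a single bot casts a single vote.

```python
import os
import subprocess
from typing import Dict, List

# Hypothetical mapping from file extension to linter command prefix.
# Which linters (and flags) to use per repository is still to be decided.
LINTERS: Dict[str, List[str]] = {
    ".php": ["php", "-l"],
    ".js": ["jshint"],
    ".pp": ["puppet", "parser", "validate"],
}


def plan_lint_commands(root: str) -> List[List[str]]:
    """Walk `root` and return one linter command per file with a known extension."""
    commands: List[List[str]] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune VCS metadata in place so os.walk never descends into it;
        # sorting keeps the traversal order deterministic.
        dirnames[:] = sorted(d for d in dirnames if d != ".git")
        for name in sorted(filenames):
            linter = LINTERS.get(os.path.splitext(name)[1].lower())
            if linter is not None:
                commands.append(linter + [os.path.join(dirpath, name)])
    return commands


def run_all(root: str) -> int:
    """Run every planned linter; return 0 only if all of them passed."""
    failed = [cmd for cmd in plan_lint_commands(root)
              if subprocess.run(cmd).returncode != 0]
    return 1 if failed else 0
```

A job would call `run_all(".")` and use the return value as its exit status; files with unknown extensions are simply skipped, so support for new file types is added by extending the table.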
Krinkle

Some answers:

  • Lint: Having only 1 lintbot makes perfect sense indeed. We could add support for more file extensions as time goes on (and use the same bot for operations repositories, so it also supports stuff like puppet manifests, *.ini files and whatnot). I'm not sure what you mean by "V+1 is enough to clear a revision for merging", though. AFAIK the scores do not add up, and the lint check (+1) does not affect merging. If either of those is true, then we still need to figure out how to let multiple bots score. Although the lint bots can be merged, there is still the separate jenkinsbot (which serves a different purpose). The sequence goes like this:
    lintbot (+/- 1) > human reviewer approving (+/- 2) > Jenkins run > Jenkins performing or rejecting the merge (not sure how that is reflected in the score)
  • Jenkins listening to merge approvals and initiating them afterwards is possible. Ryan is the one who told me about that concept as he saw another project doing it like this (OpenStack?)
  • Yeah, I wasn't sure whether there could be a difference between master+cherrypick and post-merge at all. But I think it is still useful and important to have a Jenkins project running on the actual master that only includes builds for what was actually merged, so that there is a clear linear overview of the state of the repository, and as an additional self-assurance.
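The lintbot > reviewer > Jenkins ordering above can be written down as a tiny state function. This is a hypothetical Python sketch: the stage names are made up, the lint vote is treated as advisory (following the understanding above that it does not affect merging), only a +2 review lets Jenkins start, and how Gerrit actually records these votes is not modeled.

```python
from enum import Enum
from typing import Optional


class Step(Enum):
    RUN_LINT = "run lint"
    AWAIT_REVIEW = "wait for a human reviewer"
    RUN_TESTS = "Jenkins runs the tests"
    MERGE = "Jenkins performs the merge"
    STOP = "change rejected"


def next_step(lint_vote: Optional[int], review_vote: Optional[int],
              tests_passed: Optional[bool]) -> Step:
    """Return the next stage for a change; None means that stage has not reported yet."""
    if lint_vote is None:
        return Step.RUN_LINT
    # The lint vote (+/-1) is advisory in this sketch, so it never blocks here.
    if review_vote is None or 0 <= review_vote < 2:
        return Step.AWAIT_REVIEW
    if review_vote < 0:
        return Step.STOP
    if tests_passed is None:
        return Step.RUN_TESTS
    return Step.MERGE if tests_passed else Step.STOP
```

For example, `next_step(1, 2, None)` asks Jenkins to run the tests, while a -2 review stops the change before any tests run.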
Krinkle

Am I understanding this proposal correctly that there's a wait state for human review before automated testing fully kicks in? ("If it is rejected then the commit story ends until further notice. If it was marked OK, however, then the story continues.") If so, shouldn't we aim to optimize to run as many automated tests as early as possible, to reduce human reviewer workload (flag changes that clearly break things ASAP so they can be rejected immediately)?--Eloquence (talk) 17:10, 22 March 2012 (UTC)

This post was posted by Krinkle, but signed as Eloquence.

Krinkle

Lint checks happen right away, however the unit testing:

  • cloning the MediaWiki repository
  • setting up several databases and installing MediaWiki multiple times for different database backends
  • executing the tests in PHP
  • sending the QUnit test suites to all TestSwarm clients
  • etc.

... all that should not happen right away, due to the security risk of arbitrary code execution (both for PHP and for JS). The Swarm would function like a little botnet (JS can practically take over the browser and make as many requests as it wants, for as long as it wants, to whatever domain), and executing arbitrary PHP code on a machine in the production cluster doesn't sound good either (right now Jenkins runs the unit tests on the same machine that it runs on itself).

Even if we could limit the security issues (which I don't believe we can), I think it is reasonable that there should be agreement over a revision before it is tested.

Eloquence

I understand the concern, but if we're creating a world where tests have to wait for developer review, we're doing CI the wrong way around. The whole point of automated testing is to minimize the need for human review, so we should aim to run as many tests as possible as early as possible. Both test performance and security considerations are problems to be gradually resolved in aiming for highest possible test execution for all code that gets submitted, and minimizing the need for human review on code which is obviously broken.

So to have a gated trunk where the gate consists of humans performing reviews before tests get run is in my opinion not an acceptable outcome -- it's making people do busywork which could be avoided.

A few points:

  • I understand that we're very liberal with account creation now, and may at some point want to indeed fully automate the process. This is probably the single most important vector through which any kind of deliberate attack could proceed. Would it therefore be feasible to have tests get run automatically against any changesets authored by devs on a whitelist? We could be fairly liberal about adding people to that whitelist, as long as they have a track record of good faith contribution. This seems like the quickest win to get tests to run before review.
    • As a side note, it's my understanding that we ran the previous CI tests against any revision in trunk, code reviewed or not. Is that not the case?
  • Creating a test environment which provides sufficient security isolation is a solvable problem. True, this may be harder if we use TestSwarm. But we don't have to use TestSwarm, and if we're not comfortable with the security implications of shipping random code to random users, let's not do it. There are alternatives, e.g. SauceLabs (which is optimized for Selenium tests). If we're worried about security implications of shipping random JS code to a swarm of random people, let's investigate alternatives and options to create security isolation of the code that we test.
  • Likewise, we should be able to use Wikimedia Labs for running both unit and integration tests separately from the production cluster.
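The whitelist suggestion in the first point amounts to a single gating rule. A minimal Python sketch, where `TRUSTED_AUTHORS` and `run_tests_now` are hypothetical names and the whitelist entries are placeholders:

```python
from typing import AbstractSet

# Hypothetical whitelist of Gerrit usernames with a track record of
# good-faith contributions (stored lowercase); in practice this would
# live in CI configuration rather than in code.
TRUSTED_AUTHORS: AbstractSet[str] = frozenset({"alice", "bob"})


def run_tests_now(author: str, reviewer_approved: bool,
                  trusted: AbstractSet[str] = TRUSTED_AUTHORS) -> bool:
    """Trusted authors get tests on upload; everyone else waits for a human approval."""
    return author.casefold() in trusted or reviewer_approved
```

The whitelist only moves tests earlier for known contributors; it does not replace review, since their changes would still need approval before merging.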
Krinkle

Thanks for the input!

  • I'd support having the test execution linked to a (yet to be created) WMF Labs instance instead of running it in the production cluster. That would solve the concern of executing code server-side.
  • SauceLabs (from a quick first impression), however, doesn't look like an alternative to TestSwarm but rather an alternative to BrowserStack. TestSwarm distributes test suites to connected clients and aggregates the results of the unit tests. We can simply set up TestSwarm to only accept BrowserStack clients (or SauceLabs clients). That solves two problems: first, arbitrary JS is no longer a problem; second, it avoids issues like the situation we had last year, where someone joined the swarm with an unsupported browser whose user agent looked like a supported one, messing up the build reports.

Works on Phab Diffusion/Differential?

MarcoAurelio

Sorry if this has been asked before, but I wonder whether, with the Gerrit-Migration project in mind, Jenkins and all the CI infrastructure are able to work with Phabricator Diffusion or Phabricator Differential. Thanks.

Categorisation of unit tested extensions

Seb35
Legoktm

Can you explain why you want such a category? Just checking whether an extension has tests doesn't mean much; we'd really want coverage statistics. Also, we run core's structure tests on many extensions, which adds some coverage as well.

Seb35

I find it useful to create basic statistics about extension quality. Last year I compiled some stats about extensions (here; I referenced this page very badly, it's a bit better now), and having unit tests (even basic ones) is an indicator of "good quality". I agree that code coverage stats would give finer statistics, but a threshold of "has unit tests or not" gives a first approximation, which could be highlighted in Template:Extension, for instance, to show quality, e.g.:

unit tests | registration | git | i18n | i18n-qqq | user doc

It would be quite easy to read and understand for somebody searching for an extension (the more checks there are, the better), quite easy to maintain (once an extension has unit tests, it keeps its "check" forever), and it could encourage extension developers to improve quality. Obviously I chose these criteria myself, but they are open for discussion.

Questions and wishlist items regarding the TestSwarm setup

Peachey88 (Flood)
  • Can we bring back the leaderboards / projects links from Timo's setup? Either way, they are not discoverable from the main page right now.
  • It would also be nice for the page to have a Wikimedia-appropriate skin and intro text.
  • Browsers: How about adding Android 3.0/4.0 and all versions of Firefox Mobile?

Looks great overall, and I look forward to seeing the new JSTesting branch deployed as well.

This post was posted by Peachey88 (Flood), but signed as Eloquence.

Peachey88 (Flood)

SeaMonkey (Gecko 5.0, Windows 7) isn't tested either, but I doubt many people use it.

This post was posted by Peachey88 (Flood), but signed as Sumurai8.

Krinkle

I think we should keep our install clean of local patches. I have commit access to the TestSwarm repository, and there is a wiki and an issue tracker for it, as well as maintainers who actively handle pull requests.

  • The leaderboards I made for the Toolserver install do not scale well. Some of them were once part of TestSwarm but were removed for performance reasons, as they were causing time-outs on jQuery's TestSwarm install. I copied them to my install, but they are now starting to slow down load time there as well. This needs a more cached solution, probably a QueryCache-like table that is periodically refreshed, with an option to skip the cache for small installs. There are proposals on the jQuery Testing Team wiki, and testswarm issue #71 covers the scores leaderboard.
  • The list of projects was hardcoded on the Toolserver. TestSwarm doesn't distinguish between projects and users: every registered user who knows its authentication token can submit jobs. A simple solution would be a config variable with an array of usernames to promote as projects on the home page.
  • There is no support for skins, i18n or custom texts yet.
  • User agents are in useragents.sql.
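TestSwarm itself is written in PHP, so the following Python only illustrates the logic of the proposed config variable and is not a patch; `PROMOTED_PROJECTS` and `home_page_projects` are hypothetical names and the usernames are placeholders. Filtering against registered users keeps a stale config entry from producing a dead link on the home page.

```python
from typing import Iterable, List, Sequence

# Hypothetical config value: TestSwarm usernames to list as "projects"
# on the home page. The names are placeholders.
PROMOTED_PROJECTS: Sequence[str] = ("mediawiki", "jquery")


def home_page_projects(registered_users: Iterable[str],
                       promoted: Sequence[str] = PROMOTED_PROJECTS) -> List[str]:
    """Return promoted usernames that actually exist, keeping the configured order."""
    existing = set(registered_users)
    return [name for name in promoted if name in existing]
```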
Krinkle

Note that TestSwarm does not intend to be like Jenkins.

Features around "overview", "timeline" and "notification" are really a job for Jenkins, and there is a TestSwarm-Jenkins plugin in the works that allows Jenkins to fetch build information from TestSwarm. jQuery was using it as well, but right now it's offline for further development because there were some bugs. I think we should install that plugin too as soon as it's usable.

TestSwarm is very much focused on distributing and automating the unit tests, and doing that very well. The front-end isn't much of a priority: it does have some overviews, but they're not very elaborate.

Status QUnit integration

MWJames

As of March 2013, QUnit tests do not run through Jenkins. What is the status/timeline for having QUnit tests run through Jenkins (also for non-core extensions), similar to PHPUnit?

Krinkle

We spent a lot of time coming up with a good workflow last year. This is documented on Continuous integration/Workflow specification.

  • In January we added linting for JavaScript, and as of last week the output of jshint is also consumed by Jenkins as a "Checkstyle" report (which is easier to view than Jenkins' raw text output).
  • Yesterday I added running of QUnit tests (in PhantomJS) for MediaWiki core (announcement). This is being prepared for extensions as we speak, likely next week.
  • Next up is testing in multiple browsers through testswarm-browserstack or grunt-saucelabs.
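For readers unfamiliar with the "Checkstyle" report mentioned above: it is an XML format with one `<file>` element per source file and one `<error>` element per finding, which Jenkins can render as a report. A minimal Python sketch of producing that shape from generic lint results; `LintIssue`, the `version` value and the `source` label are assumptions, and the actual setup may rely on the linter's own output options instead.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, NamedTuple


class LintIssue(NamedTuple):
    path: str
    line: int
    column: int
    severity: str  # e.g. "error" or "warning"
    message: str


def to_checkstyle(issues: List[LintIssue], source: str = "jshint") -> str:
    """Serialize lint issues as Checkstyle XML: one <file> per path, one <error> per issue."""
    root = ET.Element("checkstyle", version="4.3")
    by_file: Dict[str, List[LintIssue]] = defaultdict(list)
    for issue in issues:
        by_file[issue.path].append(issue)
    for path in sorted(by_file):
        file_el = ET.SubElement(root, "file", name=path)
        for issue in by_file[path]:
            ET.SubElement(file_el, "error", line=str(issue.line),
                          column=str(issue.column), severity=issue.severity,
                          message=issue.message, source=source)
    return ET.tostring(root, encoding="unicode")
```

Grouping by path keeps each file's findings together, which is how the report is browsed in Jenkins.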
MWJames

Fantastic, thanks for the effort. I'll try to have SMW-core running QUnit tests as soon as it is enabled for non-core extensions.

also deploy from Jenkins to beta cluster?

Cmcmahon

One thing I would very much like to have is an automated mechanism to update the labs cluster VM wikis reliably to some known version of MW for those configurations. Right now I think it would be best to update those nightly from Jenkins rather than after each commit, or whatever. (I've discussed this with Antoine somewhat)

Ultimately this serves two purposes: for one thing, it offers a chance to bring in the global testing community at will for ongoing testing of new features. For another thing, in the longer term I would like to add some builds downhill from the deploy-to-labs-cluster builds that would run a suite of cross-browser UI tests against the labs cluster environments nightly.

I've put up a QA/testing status page to track this and other items.

Krinkle

I like this idea a lot!
I thought about proposing an additional check in this Jenkins hook to also check the code review status of the revisions, so that it would update the wikis to the "latest reviewed revision that passes the unit tests". However, after the Git migration mediawiki-core/master will only contain reviewed commits anyway, so such a check is no longer necessary. Nice :)

Labs' "beta" project (renamed from "deployment-prep") was originally meant to show the next release, not alpha or trunk ("master" or "svn trunk").

I don't think that "beta" should be running on trunk. However, it should be relatively easy to set up another Labs project just like "beta" (e.g. named "mw-alpha") which would indeed run on trunk.

Both "mw-alpha" and "mw-beta" would be automatically updated by this Jenkins hook.

Which reminds me, right now we don't have a project for REL1_19 in Jenkins (we should!), previously when we still used phpUnderControl we did monitor the release branches.

Cmcmahon

My purpose here is to have a reliable world-shareable test environment, running the latest reliable MediaWiki code, in test wikis configured closely to production.

A reliable test environment is important for almost every aspect of the QA/testing work I want to accomplish.

Hashar
Krinkle wrote:
Which reminds me, right now we don't have a project for REL1_19 in Jenkins (we should!), previously when we still used phpUnderControl we did monitor the release branches.

I will have to test how the Gerrit git plugin works with branches. IIRC, it just listens for any change proposal made in Gerrit, checks out the branch, applies the patch, then runs the tests. So we most probably only need one git fetching job to handle all branches.
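The single-job idea can be sketched as turning one Gerrit event into a fixed command sequence, with the branch taken from the event instead of from per-branch job configuration. A minimal Python sketch, assuming the event follows the JSON shape of Gerrit's `stream-events` `patchset-created` messages (`change.branch`, `patchSet.ref`); the function name is hypothetical, and it only builds the commands rather than running them.

```python
from typing import Any, Dict, List


def commands_for_patchset(event: Dict[str, Any], remote: str = "origin") -> List[List[str]]:
    """Build the git commands one branch-agnostic job could run for a new patch set."""
    branch = event["change"]["branch"]   # target branch, e.g. "master" or "REL1_19"
    ref = event["patchSet"]["ref"]       # e.g. "refs/changes/34/1234/2"
    return [
        ["git", "fetch", remote, branch],
        # Reset a local branch of the same name to the freshly fetched tip.
        ["git", "checkout", "-B", branch, "FETCH_HEAD"],
        ["git", "fetch", remote, ref],
        # Apply the proposed change on top of the branch; tests would run next.
        ["git", "cherry-pick", "FETCH_HEAD"],
    ]
```

Because a branch such as `REL1_19` arrives as data, monitoring a release branch would not need a separate job.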
