User:Sumanah/Launchpad-dev-process

Summary

In Launchpad, all developers are also code reviewers for about a day a week. All developers write "merge proposals" and attach them to their branches, and their colleagues review those branches and proposals within a day. Code reviewing is not primarily a gate to keep bugs out of code, but a social step, to get a second set of eyes on code, and to spread knowledge around. After code review, and automated testing on an EC2 instance, the developer lands the branch in trunk, a developer QAs it, and it gets deployed onto production in a matter of days or hours.

Differences from MediaWiki

  • No volunteer reviewers, small volunteer contributor community
  • Review trainees have already seen what a good code review looks like, so they may need less training
  • Started with a backlog of less than a week's worth of commits

Launchpad development team's code review process

The Help section on Launchpad has a very brief overview of Launchpad's code review process with more information in the Launchpad development wiki. Brad Crittenden also wrote a description of Launchpad's code review workflow on his blog.

In short: Launchpad has about twenty developers, all of whom are paid by Canonical and also do code reviews. Submitters write code, put it in a Bazaar branch, and attach a "merge proposal": an English-language description (basically a more thorough commit summary) written within Launchpad itself. The merge proposal is part of Launchpad the application: it includes that description, a generated diff of the code changes, and the reviewers' comments and votes.

Submitters look for a real-time code review in [irc:#launchpad-dev] (a Freenode IRC channel), but if no one is available, the merge proposal sits in a public queue (the act of creating a merge proposal adds it to that queue) and is reviewed within a day. (It is the submitter's responsibility to find a reviewer.) The reviewer can comment and vote as follows: comment only, approve, needs fixing, needs information, abstain, disapprove, or resubmit.

Most of the developers are scheduled to do "on-call review" (OCR) for one day a week, responding to review requests in IRC and taking care of any review backlog.

A few developers (project managers and squad leads) are exempt from on-call review duty. But even so, Launchpad developer Francis Lacoste says, the wait for a review is now usually less than a day: "1.5 days if you don't ping an OCR explicitly. The average review queue is usually around 10 reviews long and probably 15 more in-progress. I don't think we ever want above 30 reviews to go."

Launchpad wasn't always so fast about reviews. In 2007, they realized that their review process was bottlenecking them. At the time, people added their proposals onto a wiki page, and the project managers looked at the backlog and assigned reviews to individual reviewers. Launchpad developer Brad Crittenden says the process was "very troublesome"; "by today's standards, it was archaic."

The total is usually 3-10 reviews per day, per reviewer.

The review system for the Landscape team has different requirements; they require two approving reviews before something can be considered approved. The machinery in Launchpad (the application) for merge proposals doesn't enforce any of that, so teams can put their own culture & rules on top of it.
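
As a small illustration, here is a Python sketch of the vote options listed above with a team rule layered on top. This is an illustrative model, not Launchpad's actual code, and the "two approvals and nothing blocking" rule is my reading of the Landscape convention rather than a documented policy:

    # Illustrative model of merge-proposal votes (names follow the list
    # above) plus a hypothetical team-specific approval rule on top.
    from enum import Enum

    class Vote(Enum):
        COMMENT_ONLY = "comment only"
        APPROVE = "approve"
        NEEDS_FIXING = "needs fixing"
        NEEDS_INFORMATION = "needs information"
        ABSTAIN = "abstain"
        DISAPPROVE = "disapprove"
        RESUBMIT = "resubmit"

    def landscape_style_approved(votes):
        """Hypothetical rule: at least two approvals and no blocking votes."""
        blocking = {Vote.NEEDS_FIXING, Vote.NEEDS_INFORMATION,
                    Vote.DISAPPROVE, Vote.RESUBMIT}
        approvals = sum(1 for v in votes if v is Vote.APPROVE)
        return approvals >= 2 and not blocking.intersection(votes)

    print(landscape_style_approved([Vote.APPROVE, Vote.COMMENT_ONLY]))  # False
    print(landscape_style_approved([Vote.APPROVE, Vote.APPROVE]))       # True

The point is that the merge-proposal machinery only records votes; any rule about what counts as "approved" lives in the team's own conventions (or scripts) outside it.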


Transition to new system

Francis Lacoste: "The OCR transition happened very rapidly. It probably took a few weeks to get everything working smoothly.

"Getting all developers reviewers took longer to accomplish. I'd say that took about a year or a year and a half. We were going from a team of about twelve reviewers to getting ~30 engineers. We were training about 2-4 reviewers at a time. I don't think we ever had more than 4 mentees at a time, 2-3 being more the regular workload."

Brad Crittenden agrees: "Yes, not everyone was ready to start that process. Some were reluctant to become reviewers. And we didn't want to have more than approximately 2-3 at a time in training."

While being mentored, the mentat does reviews, and they're marked "review-type: code*" (not "code") to indicate, "this is a review from a trainee, do not consider it approved till my mentor looks & follows it up with a final review."

It used to be that people in Launchpad reviewer training had usually been active developers for a long time, so they had been on the receiving end of reviews and knew what a good review looks like. Lacoste notes: "Now that the whole team has review responsibilities, we only have to mentor new people. They usually start the mentoring process after 2-4 months on the team."

Brad mentioned that the Launchpad team also has other community-related processes, and that team members are assigned Community Help Rotation (CHR) for those community tasks; see help.launchpad.net/HelpRotation and dev.launchpad.net/MaintenanceRotationSchedule. CHR has changed over time as the Launchpad team structure has changed, but it includes:

  • IRC support
  • bug triage
  • responding to incoming questions (email queue)
  • cleaning out spam
  • approving & importing new projects, bugbases, translations, etc.

Philosophy of code review

In general, the Launchpad team thinks code reviewing is not primarily a gate to keep bugs out of code, but a social step, to get a second set of eyes on code and to transfer knowledge. Code review is a much more natural way than sending out an email to keep colleagues informed about changes to areas they care about; developers everywhere are drowning in email. (Of course, if someone creates a new test methodology or wants to do something big, there should be appropriate communication about that.)

Historically, at least in large, old-school, non-distributed engineering environments, code reviews looked very different: code review used to mean that everyone got together in a conference room & helped each other. Launchpad has always done individual reviews. Still, learning from each other, just seeing different techniques, is one of the greatest benefits of this approach.

For example, the Launchpad testing structure is a moving target, constantly improved & added to. A code review lets you see how others use the testing infrastructure, such as test results and test matchers, in ways you might not even have known about; there is not much communication or documentation about those details, so reviewing others' code makes a reviewer a better coder.

One good example: Brad made some API changes. Francis extended what Brad had done & changed how some of it worked. So Francis asked Brad to review those changes. He knew Brad had domain knowledge, & knew Brad would want to keep up with evolution of that part of the code.

It's important to talk to the team, and understand what people are trying to accomplish. And this helps overcome generational differences (people's understanding differs based on when they joined the team).

Writing merge proposals made former Launchpad developer Leonard Richardson a better coder, he said. Crittenden adds: it depends on how you do it. There's a wide variety of quality in merge proposals. Some are meticulous about documenting big changes; some are minimalistic. Brad's in the middle, and tends not to keep very good notes while working on a branch. In contrast, his colleague Curtis Hovey starts writing his merge proposal as he starts a branch. It's a good habit and tends to mean very high quality work, Crittenden says.

To make the good habit easier, Bazaar's Launchpad plugin includes some tool support to open your email client and create a skeleton merge proposal for you to send with your branch. The template tells you to include: Summary, Proposed Fix, Pre-implementation notes, Implementation details, Tests, and Demo and Q&A.
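
For illustration, such a skeleton might look roughly like the following. The section names come from the template described above; the prompts in parentheses are illustrative, not the plugin's actual wording:

    Summary:
      (What does this branch change, and why?)

    Proposed fix:
      (How does it address the bug or feature?)

    Pre-implementation notes:
      (Who did you talk to before writing the code, and what was agreed?)

    Implementation details:
      (Anything a reviewer should know about how the change is structured.)

    Tests:
      (Which tests cover this, and how to run them.)

    Demo and Q&A:
      (How someone can verify the change, e.g. on staging.)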

Brad finds the plugin very useful, as it helps him remember to get all the sections taken care of. But the Launchpad team has not been successful in getting it adopted widely; most developers seem to use the web interface to write merge proposals.

The Launchpad code review process doesn't seem to have a bikeshedding problem. People are fine with rubber-stamping reviews, especially for small branches. There used to be a tendency to want to say something, as proof you did the review & got to the end, but now there isn't. Richardson suggests that code review bikeshedding decreases over asynchronous, text-based channels, since it takes effort to write something. And minor things that come out of a code review are optional: if the reviewee doesn't agree or doesn't think they're worth doing, they don't have to do them, & the reviewer should be OK with that.

One social aspect of reviews: developers recognize each other's work and respect the time and effort it takes to write a branch. So many reviews start with a small note like, "hey, Robert, nice branch, I appreciate the work you put into it." And without being contrived, people do make an effort to point out positive things; that is a part of the team's culture. People work hard on these things, and there is usually something praiseworthy in code being reviewed.

There's a custom of not being too pedantic; cosmetic coding standards, such as the more nitpicky bits of PEP 8, are not something to fuss about in code review, and shouldn't slow down the process. But the downside there is that if you don't enforce them, they aren't part of your culture & standards. So that's a balancing act, and a discussion going on now in the Launchpad dev community.

On-call review has a useful immediacy, and reviewers are encouraged to have conversation in IRC. So often, in a review, you'll see a note to "do foo that we talked about in IRC". This is bad for the historical record, but immediacy is better & more important than what's lost. After all, even though it's helpful to be able to look at old code reviews, very few people do that. The really helpful thing is that, when you're confused or a bug pops up, you can trace things back to individual merge proposals to give perspective on that revision.

Challenges & problems with code review process

The 20% commitment is indeed heavy. And developers find that having a heavily interrupt-y day means they don't get much code work done that day. On the other hand, they tend to use that day to write blog and wiki posts, and do other community-facing work that doesn't require such sustained attention.

Some reviews take place outside OCR, like intra-team reviews. If Brad is working on an unreleased feature, it's easy for him to ask someone who already knows about it, so he may ask his team lead or colleague to review, and not use the On-Call Review reviewer. He'd do this for efficiency, but of course this is opposed to the goal of bringing people up to speed.

One challenge is that there are areas of the code that few people know well. If the developer or reviewer doesn't ask for a review from these people, things might be missed. But the team is trying to address that, partly by the new team structure where everyone is now expected to work in all application areas of Launchpad.

For UI reviews: for a time Launchpad had a subset of reviewers who were UI-specific. Two very strong personalities were in charge of UI changes, and they ensured consistent presentation across the web app. These two people did most of the UI reviews. The team tried to grow a UI review team, but that never actually worked that well. Developers did not consistently ask for UI reviews, and when they did, the smaller team became a bottleneck. So now Launchpad has no UI review team; everyone tries to review everything to the best of their abilities. Tips: https://dev.launchpad.net/UI/Reviews

And similarly with JavaScript: for a while Launchpad had a small number of people who felt comfortable writing & reviewing JS, but now that has spread across all the teams. Now everyone is expected to be up to speed on the team's use of JS, and everyone should be able to review it.

That being said, the team has tried to encourage an atmosphere where there's no fault and no shame in someone claiming review work to do, finding themselves in over their head, and backing out and asking someone else to do it. In Crittenden's opinion, they don't want someone to do a cursory review because they don't understand the subject matter. Stopping a task and asking for help like that doesn't necessarily come easily to engineers, "but we try to encourage it."

And then there are application- and domain-specific review issues. For example, Soyuz is the portion of the Launchpad codebase that manages the build farm. It was one of the first components in Launchpad, and has lots of complicated legacy code. It has to follow Debian packaging rules and manages how all the builders work. People who aren't as familiar with Soyuz will do Soyuz reviews, "but it creeps them out," and they are more apt to hand such a review to a Soyuz expert or ask for a second review.

So there's a balance to be struck here. Brad would prefer to see people not do a review for an area they don't understand; of course, the only way they'll learn some areas is by doing reviews. So the Launchpad code review process addresses that by combining knowledge-sharing and review sharing. There's tool support in the merge proposal section of Launchpad to claim a review and, after you claim it, to yield it: reassign it to the Launchpad review team and subscribe to it to watch how the review goes. You can do this if you want to learn to be a better reviewer.

As for mentoring — it's really left up to the mentor to handle it as they see fit, and Crittenden says, "we don't really have a good process for that." The mentoring process is not very well documented, and "sorta squishy."

A few lessons:

  • Mentors & mentats (term for people being mentored, "a silly internal thing to Launchpad") need to overlap at least a few hours a day.
  • The training process can be burdensome for the mentor. Many people find that reviewing someone else's review is much more work than simply doing the review themselves. The team tries to give mentors a break in between mentoring stretches, and spread the work around.
  • One big factor: geography. Launchpad had one mentat in Australia, & it's hard to be an OCR in that time zone. It took a long time for him to train up, because he didn't have enough reviews to do as practice. The team encouraged people, if they had branches that didn't have time constraints, to pass them to him & have the reviews done offline. But most people didn't follow up, so it took a long time to get him enough practice.

Merge/Deployment process

A brief overview of the deployment process (a rough code sketch follows the list):

  • Developer writes code & submits it with merge proposal
  • Reviewer reviews branch, approves
  • Reviewer sends the code to automated testing via Amazon EC2 https://dev.launchpad.net/EC2Test
  • EC2 test suite passes
  • Developer lands the branch in trunk, and a tool automatically tags the bug in Launchpad to alert QA to look at it & at the branch on a staging machine
  • A developer checks the branch and marks it QA OK, and then it's ready to be deployed. While Launchpad does have one QA engineer, he is not expected to do all of the QA of individual branches.
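
The toy Python sketch below just restates that pipeline as an ordered series of stages; the stage names are made up for illustration and are not Launchpad's actual tooling or database states:

    # A toy model of the branch lifecycle described in the list above.
    # Stage names are illustrative, not Launchpad's actual states.
    STAGES = [
        "proposed",   # merge proposal written and queued for review
        "approved",   # a reviewer has approved the branch
        "tested",     # the full test suite passed on EC2
        "landed",     # merged into trunk; the bug is tagged for QA
        "qa-ok",      # a developer has checked it on staging; deployable
    ]

    def next_stage(stage):
        """Return the next stage in the pipeline, or None once deployable."""
        i = STAGES.index(stage)
        return STAGES[i + 1] if i + 1 < len(STAGES) else None

    stage = "proposed"
    while stage is not None:
        print(stage)
        stage = next_stage(stage)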

Some details:

Anyone on the team can revert a landed branch. Reversions of something that's been OK'd & merged are rare, but they happen.

For example, tech architect Robert is into efficiency & DB performance, and often has good ideas that neither the reviewer nor the developer thought of, so sometimes he'll come in & suggest a new approach or changes. In these cases he would add a comment to a merge proposal, after it's been approved but before it lands in trunk, so it would not go forward.

But the team has a process so that even stuff landed onto trunk can be reverted before deploy, although they try to avoid it. It's part of the deployment strategy, and not considered part of the code review process. The nice QA tools Launchpad folks have created to manage this stuff are part of the Launchpad suite: https://launchpad.net/launchpad-project

Branches landed in trunk are marked in sequential order, and the team can only deploy up to the latest branch number whose QA status is "ok" or "not testable". So there's peer pressure to do QA and to write good code; you don't want to be the person blocking deployment. If one is marked QA BAD, it is not deployable, and someone needs to manually revert it.
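
Put another way, the deployable point is the end of the initial unbroken run of revisions whose QA status is OK or not-testable. Here is a small Python reconstruction of that rule from the description above; the status strings are invented for illustration:

    # Reconstruction of the "deploy up to the last ok / not-testable revision"
    # rule described above. Status names are invented for illustration.
    def deployable_revision(revisions):
        """revisions: list of (revno, status) pairs in landing order.

        Returns the highest revision number that can be deployed: the end
        of the initial unbroken run of "qa-ok" or "qa-untestable" revisions.
        Any other status (still awaiting QA, or QA BAD) blocks every
        revision after it until it is QA'd or manually reverted.
        """
        last_deployable = None
        for revno, status in revisions:
            if status in ("qa-ok", "qa-untestable"):
                last_deployable = revno
            else:
                break
        return last_deployable

    history = [(101, "qa-ok"), (102, "qa-untestable"),
               (103, "qa-needed"), (104, "qa-ok")]
    print(deployable_revision(history))  # 102 -- revision 103 blocks 104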

Usually the developer lands the changes. The reviewer only does it for community members.

(Launchpad doesn't have packaged releases. "Launchpad is not packaged and there are no plans to do so. Launchpad deployment is done straight from Bazaar branches and is quite complex.")

Pre-deploy automated testing (EC2)

Launchpad uses EC2 for all its premerge testing. Crittenden raves: "The tool support is fantastic. I can just tell EC2 to run tests against this URL (merge proposal) and that's all it needs. It finds the referenced branch and does everything from there." If all tests pass, it goes to the Patch Queue Manager for committing: https://launchpad.net/pqm/

You can grab this codebase: https://dev.launchpad.net/EC2Test . In "utilities" there's the EC2 script, which references code elsewhere in the tree; you can follow that path & find it. That bit of code has evolved, and is now more sophisticated & more complicated. The tool used to have robustness problems: stuff from EC2 would disappear and you would not get results from test runs. But it's now fairly solid; users haven't had those problems for about a year or so.

Launchpad moved to EC2 because of "the shameful fact that our test suite is so large, unwieldy, & slow." It takes 3.5 hours to run the full test suite. So EC2 testing addresses a few issues. A reviewer can fire it off & forget about it; before that, each developer needed a dedicated test machine, or had their own machine's resources eaten up for several hours. Also, instead of having a really long testing queue on the one test box for PQM, the test infrastructure can parallelize and avoid queueing. Another benefit: developers can manage the EC2 environment & make it as close as possible to the production environment, far more closely than each developer could customize their own dev box.

Canonical currently does not extend commit rights to community members. One fairly nice side-effect: EC2 usage costs money, and Canonical doesn't want to ask community contributors to pay to get code committed, so Canonical reviewers send code for testing and the company foots the bill instead.

Thanks

Sumana Harihareswara wrote this document, with information provided by Brad Crittenden, Francis Lacoste, and Leonard Richardson; thanks. All errors are the author's. :)