MediaWiki 1.17/Release postmortem

Introduction
We released MediaWiki 1.17 on June 22. In the interests of doing better next time, a small group of us (Tim, Chad, Sam, Sumana, and RobLa) got together to brainstorm what went right and what we need to look at. RobLa then wrote up this summary of that discussion. Any first-person references are probably me (RobLa), and any references to "we" are probably the group above. See the history for this page for the raw notes.

Note: this is specifically about the MediaWiki 1.17.0 release, rather than the 1.17 deployment.

Timeline
Here is the timeline, derived from SVN commit logs:
 * 2010-07-28 - MediaWiki 1.16.0 released
 * 2010-12-07 - REL1_17 branched. This is the branch that MediaWiki 1.17.0 was based on.
 * 2011-02-03 - 1.17wmf1 branched
 * 2011-05-05 - MediaWiki 1.17.0beta1 tagged
 * 2011-06-14 - MediaWiki 1.17.0rc1 released
 * 2011-06-22 - MediaWiki 1.17.0 released

How it went
We started by brainstorming "what went well" and "what to look at". In the initial brainstorming, the original group had many more items in the "what to look at" section than in the "what went well" section. I then set about organizing things, and settled upon four categories: substance, polish, timing, and process. What became clear was that we felt pretty good about the substance and polish of the release (where positives and negatives balanced out pretty well), but the timing and process categories had the most that we needed to look at.

Substance and polish
As for the substance, it went very well. We had three large features (ResourceLoader, category sorting and the new installer) that complicated this release. As of this writing, it looks like these features are in pretty good shape, and we can be pretty proud of releasing them in the state that they're in. We fixed a lot of bugs (207 noted in the release notes[1]), and made many smaller improvements to the codebase. Everyone was right to be very eager to get this release out.

Things of substance that didn't go so well: our PostgreSQL support suffered until quite late in the process, and our command line installer is incomplete in some frustrating ways. On PostgreSQL: the developers who fixed the last of the bugs aren't people that use PostgreSQL on a day-to-day basis. The folks that normally develop our PostgreSQL support had other engagements, and we don't have a very deep list of people to fall back on. We need to work out a plan for engaging PostgreSQL users as developers in this area, or it will be very difficult to continue support for this DB. The command line interface to the installer just needs a little more time to mature; there are many ways of solving this problem without delaying a release, but I won't get overly prescriptive in this writeup.

The polish of 1.17 was superb. The release notes were well-written, and there hasn't been an urgent need for a rapid 1.17.1 release. We'll do one anyway, since there were a couple of niggly bugs that can be fixed easily enough.

Timing
As noted, the biggest area for improvement is around the timing and release process. It wasn't all bad; we did (just barely) manage to keep the release cycle under one year. Still, that's much longer than our aspiration of quarterly releases, or even the previous historic norm of 2-3 releases per year. Moreover, it has been a long time since branching 1.17, so we already have seven months' worth of work backed up for future releases. 1.18 was branched in early May, so in addition to the five months of changes we have backed up for that release, we already have two more months of changes backed up for 1.19.
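
As a rough check on the intervals discussed here, the following Python sketch computes the gaps between the dates listed in the timeline. The dates are copied from that timeline; the `MILESTONES` labels and the `days_between` helper are illustrative names chosen for this example, not part of any MediaWiki tooling.

```python
from datetime import date

# Dates copied from the timeline section; the labels are illustrative only.
MILESTONES = {
    "1.16.0 released": date(2010, 7, 28),
    "REL1_17 branched": date(2010, 12, 7),
    "1.17wmf1 branched": date(2011, 2, 3),
    "1.17.0beta1 tagged": date(2011, 5, 5),
    "1.17.0rc1 released": date(2011, 6, 14),
    "1.17.0 released": date(2011, 6, 22),
}


def days_between(start: str, end: str) -> int:
    """Return the number of days from one milestone to another."""
    return (MILESTONES[end] - MILESTONES[start]).days


if __name__ == "__main__":
    pairs = [
        ("1.16.0 released", "1.17.0 released"),      # full release cycle
        ("REL1_17 branched", "1.17.0 released"),     # branch to release
        ("1.17.0beta1 tagged", "1.17.0rc1 released"),  # length of the beta
    ]
    for start, end in pairs:
        print(f"{start} -> {end}: {days_between(start, end)} days")
```

On these dates the full cycle from 1.16.0 to 1.17.0 comes to 329 days, which matches "just barely" under one year, and the gap from branching REL1_17 to the release is 197 days.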

The biggest thing that delayed this release (and the 1.17 deployment in March) was the code review backlog. That topic has been covered in many earlier threads, but a brief recap: after the 1.16 release, we fell way behind on code review, having relied solely on Tim up until that point. We added more reviewers in October, which helped us get the backlog down to a reasonable level by December. We branched, finished off the 1.17-specific review, and deployed. Further minor review work was needed prior to the 1.17 release. With more Wikimedia Foundation developers spending 20% of their time on review, we're optimistic we'll be able to finish off the backlog and stay on top of the review process.

As we drew closer to the 1.17 release, we issued 1.17 beta 1. This beta unintentionally lasted several weeks as we tried to finish off the last of the release blockers. In particular, a security bug we worked on during this time created an awkward situation, since we had to iterate multiple times to fully plug the hole. The good news, though, is that the period was long enough for us to get some good end-user testing and bug reporting prior to the final release.

Process
Process is where we need the most work. The actual logistics of putting up the tarball and other bits are working well (these haven't changed in years), but everything leading up to that point could use a lot of streamlining.

The first issue is purely one of scoping. Right now, we're not terribly deliberate about what goes in and what stays out. Other projects that are better about sticking to a regular release cycle (e.g. GNOME, Ubuntu) are also better about deciding at the beginning what's in and what's out, and more importantly, about having early deadlines for work being done or being dropped. Opinions vary as to what a reasonable release interval is. The range of opinion seems to be anywhere from "multiple times a day" to "every six months".


 * The release process doesn't have clear phases like other projects (e.g. Ubuntu)
 * We should expand the checklist then. Make it a bunch of small incremental steps that anyone in the group can tackle.

During this release, we tagged many things "1.17" for backporting from trunk to the release branch.


 * "1.17" and other similar tags for noting "things to backport still" was very useful -- as long as people remember to untag once they've merged
 * "Who is doing the backports" was in question several times. It switched between Roan, Chad, Tim and random hangers-on.


 * Communication/momentum at the end: do we need a daily scrum in the last 2 weeks or so?

As with the code review process last year, we're probably too reliant this year on Tim, not only to drive but also to execute many of the steps.


 * Release notes -- the process of finding release notes that weren't added and then backporting them was a huge pain for Tim
 * Communication, in the last few weeks, among Tim & Roan & other key personnel?


 * Wasn't this mainly for the backports etc?
 * Yes, but people were backporting stuff without backporting the release notes too. This ended up being a huge time-waster that should've been handled by the people doing the backports.


 * Some unreviewed changes were backported (or directly applied) to the release branch, causing confusion and delay
 * How much could we fairly limit general users backporting stuff during stabilisation?
 * Actually we already have a policy on this (http://www.mediawiki.org/wiki/Commit_access_requests#Guidelines_for_applying_patches - bullet points 4 & 5). Might be time to refresh everyone's memory on the list.

To reduce our reliance on Tim, we will probably experiment with other team members (e.g. Chad) performing at least alpha or beta releases.

We have a wonderful Release checklist, but that list was too focused on the last steps before the release. Many steps before the actual publication of the tarball were missing, so they've been added to that document.

To jog your memory: http://www.mediawiki.org/wiki/Release_checklist


 * Plus: the release didn't mop up a lot of time from people outside of GenEng

[1] Release notes: http://www.mediawiki.org/wiki/Release_notes/1.17