MediaWiki 1.17/Release postmortem

From mediawiki.org

We released MediaWiki 1.17 on June 22. In the interests of doing better next time, a small group of us (Tim, Chad, Sam, Sumana, and RobLa) got together to brainstorm what went right and what we need to look at. RobLa then wrote up this summary of that discussion. Any first-person references are probably me (RobLa), and any references to "we" probably mean the group above. See this page's history for the raw notes.

Note: this is specifically about the MediaWiki 1.17.0 release, rather than the 1.17 deployment.

Timeline

Here is the timeline, derived from SVN commit logs:

  • 2010-07-28 - MediaWiki 1.16.0 released
  • 2010-12-07 - REL1_17 branched. This is the branch that MediaWiki 1.17.0 was based on.
  • 2011-02-03 - 1.17wmf1 branched
  • 2011-05-05 - MediaWiki 1.17.0beta1 tagged
  • 2011-06-14 - MediaWiki 1.17.0rc1 released
  • 2011-06-22 - MediaWiki 1.17.0 released

How it went

We started by brainstorming "what went well" and "what to look at". In the initial brainstorming, the original group had many more items in the "what to look at" section than in the "what went well" section. I then organized the items and settled on four categories: substance, polish, timing, and process. What became clear was that we felt pretty good about the substance and polish of the release (where positives and negatives balanced out pretty well), but that the timing and process categories held most of what we needed to look at.

Substance and polish

As for the substance, it went very well. We had three large features (ResourceLoader, category sorting, and the new installer) that complicated this release. As of this writing, these features look to be in pretty good shape, and we can be proud of releasing them in their current state. We fixed a lot of bugs (207 noted in the release notes) and made many smaller improvements to the codebase. Everyone was right to be very eager to get this release out.

Things of substance that didn't go so well: our PostgreSQL support suffered until quite late in the process, and our command-line installer is incomplete in some frustrating ways. On PostgreSQL: the developers who fixed the last of the bugs aren't people who use PostgreSQL day to day. The people who normally develop our PostgreSQL support had other engagements, and we don't have a very deep list of people to fall back on. We need to work out a plan for engaging PostgreSQL users as developers in this area, or it will be very difficult to continue supporting this database. The command-line interface to the installer just needs a little more time to mature; there are many ways of solving this problem without delaying a release, but I won't get overly prescriptive in this writeup.

The polish of 1.17 was superb. The release notes were well-written, and there hasn't been an urgent need for a rapid 1.17.1 release. We'll do one anyway, since there were a couple of niggly bugs that can be fixed easily enough.

Timing

As noted, the biggest area for improvement is the timing and the release process. It wasn't all bad; we did (just barely) manage to keep the release cycle under one year. Still, that's much longer than our aspiration of quarterly releases, or even the previous historic norm of two to three releases per year. Moreover, it has been a long time since we branched 1.17, so we already have seven months' worth of work backed up for future releases. 1.18 was branched in early May, so in addition to the five months of changes backed up for that release, we already have two more months of changes backed up for 1.19.

The biggest thing that delayed this release (and the 1.17 deployment in March) was the code review backlog. That topic has been covered in many earlier threads, so here is only a brief recap: after the 1.16 release, we fell far behind on code review, having relied solely on Tim until that point. We added more reviewers in October, which helped bring the backlog down to a reasonable level by December. We branched, finished the 1.17-specific review, and deployed. Some further minor review work was needed before the 1.17 release. With more Wikimedia Foundation developers spending 20% of their time on review, we're optimistic that we'll be able to clear the backlog and stay on top of the review process.

As we drew closer to the 1.17 release, we issued 1.17 beta 1. This beta unintentionally lasted several weeks as we tried to finish off the last of the release blockers. In particular, a security bug we worked on during this time created an awkward situation, since we had to iterate multiple times to fully plug the hole. The good news, though, is that the period was long enough for us to get some good end-user testing and bug reporting prior to the final release.

Process

Process is where we need the most work. The actual logistics of putting up the tarball and other bits are working well (these haven't changed in years), but everything leading up to that point could use a lot of streamlining.

The first issue is purely one of scoping. Right now, we're not terribly deliberate about what goes in and what stays out. Part of the problem is that opinions vary as to what a reasonable release interval is; the range of opinion seems to run anywhere from "multiple times a day" to "every six months". It's difficult to plan without consensus on this point, and it's difficult to reach consensus without first proving that we can get on top of the code review backlog and stay there. If we go with a longer cycle, we can consider adopting a process similar to GNOME[1], Ubuntu, or another project with a good track record of sticking to a regular release schedule. The most interesting practices there involve clear deadlines for proposing new features, deadlines by which features must be done or pulled, and other strategies for mitigating schedule risk.

As with the code review process last year, this year we're probably too reliant on Tim to both drive and execute many steps. One way we can speed up the process is to document it, making it clear where we are in the process and, more importantly, how people can help. "Help" can mean explicitly doing the work, but it can also simply mean "don't do things that delay the release further" or "stop others from delaying the release". We have a wonderful Release checklist, but that list was too focused on the final steps before the release. Many steps preceding the actual publication of the tarball were missing, so they've been added to that document, and more work can be done there. Additionally, we will probably experiment with having other team members (e.g. Chad) perform at least the alpha or beta releases.

During this release, we marked many trunk revisions for backporting by tagging them "1.17". This process was useful, as long as people remembered to remove the tag once a revision had been merged. At various times there was confusion about who was responsible for doing this work; it switched a few times between Roan, Chad, Tim, and others. Additionally, pretty much everyone felt empowered to tag things for backporting, but there probably wasn't enough discipline in trimming that list before actually making the changes. Some unreviewed changes were backported (or applied directly) to the release branch, causing confusion and delay. We have a policy about backporting [2], but that policy wasn't followed very closely.

Finding changes whose release notes were never added, and then backporting those notes, was work that could have been done by people other than Tim, but Tim ended up doing most of it. This work needs to happen earlier in the process and in a more distributed fashion. One way to avoid this extra work altogether is to keep backporting to a minimum in the first place.

This points to the larger issue of communication and momentum at the end of the process. With timezone differences, it's not sustainable to hold daily scrums all the time, but holding them during the last couple of weeks of the process may help keep things moving to the end.

Recommendations

This section is intentionally left unfinished. The goal of this writeup was to establish and document what happened. To the extent anything above is incorrect or misleading, corrections are encouraged. Recommendations for new things to try, based on lessons learned from this release, should be added below:

  • your recommendation here

...and possibly discussed on the talk page (suggestions above may be ruthlessly edited; talk page is better for attribution and preservation).

References