Git/Gerrit evaluation

This is the homepage for the Gerrit review taskforce created at the 2012 Berlin Hackathon.

Timeline

 * July - build this page out with all serious alternatives
 * July - wikitech-l discussion about alternatives.
 * Week of August 6 - final review discussion

Brion Vibber will lead this evaluation, with help from Chad Horohoe and David Schoonover.

Reference Materials

 * Our Gerrit, in all its glory.
 * List of Gerrit Bugs That Matter
 * In order to properly understand Gerrit, you might want to try performing a few common tasks (please do add more here):
 * Attempt to find a link to the source for a project
 * Search for a username

Criteria by which to judge a code review tool
We need criteria by which to judge the validity of arguments in favor / opposed to Gerrit. It seems to me that these criteria should be derived by the workflow(s) that we want to support. Obviously, the platform team and Ops wants to have a pre-commit review workflow. But looking at other teams, there seem to be other workflows in place as well. For example, the Mobile and Analytics Teams have post-commit review workflows. Ideally, a code review tool should allow different kinds of workflows.

Criteria
(please add / removeas you see fit, this is condensend opinion from the Analytics Team)


 * Support for pre-commit and post-commit review workflows (Unopinionated / does not impose a particular workflow)
 * Easy and clear diffs between commits
 * It has to be beautiful and moddable
 * It has to be intuitive and user-friendly
 * It should make source code easy discoverable
 * It should be fast and responsive
 * Support for pull requests
 * Contrast what needed to be coded vs what was coded
 * It should enable deep integration with bugzilla

The case for Gerrit

 * Primary editor: Chad Horohoe. Others should feel free to contribute.


 * LDAP integration. LDAP is a requirement for tie-ins with Labs and other developer/operations projects. Gerrit supports this out of the box with minimal configuration.
 * Robust ACL. Gerrit has the ability to grant/deny permissions on the repository level and even down to the branch level. This is important because different teams have different workflows for their branches.
 * Additionally, Labs requires a private repository for puppet.
 * Automatic merge on approve, unlike Phabricator that only works with patches (which then must be applied to the repository).
 * Open source, under active development. Using free and open source software is very important to our community, and Gerrit is licensed under the Apache License. Additionally, the Gerrit project has a very active development community that is highly responsive to feedback, questions, and contributions. Google develops Gerrit to support the Android Open Source Project (AOSP), but there are many other contributors, including Qt and OpenStack.
 * Quick release cycle. 2.2.x, 2.3.x and 2.4.x have all been released in the last year (with 2.3 and 2.4 both in the last several months). The branching of 2.5 is currently being discussed.
 * I pushed the Gerrit repository to GitHub as a way of getting some stats about commit activity. The data does indeed show that Gerrit is under active development, with the number of commits per month increasing slightly but steadily over the course of the past year. Ori.livneh (talk) 23:00, 13 July 2012 (UTC)
 * Handles 'pre-commit' style review. One of the major motivations for switching from Subversion was for the ability to review changes before they landed in trunk/master. This has stabilized our codebase and enabled the rolling release cycle to WMF wikis (currently every 2 weeks, working towards every week).
 * Command line access. It's possible to interface with Gerrit from the command line, which makes scripting Gerrit actions relatively easy, and provides a less hideous alternative to the Gerrit UI.
 * SSH handled by gerrit From a security perspective, it's nice that Gerrit doesn't use the system SSH, which means users don't have shell access to the gerrit host, and we don't need hacks like sillyshell.
 * You could simply use a second ssh instance for that. Platonides (talk) 20:44, 13 July 2012 (UTC)
 * Don't want to change a system users _just_ got used to a little bit and introduce another one that will have it's own issues, and scare away beginners who are overwhelmed already by the number of tools and UIs they are supposed to use for labs. Mutante (talk) 09:15, 12 July 2012 (UTC)
 * Git, Gerrit and Bugzilla are all supported by eclipse.
 * This hints that some of the criticism stated is not be 100% relevant.
 * Every time there is a major change in the critical toolsets (bugzilla/git/gerrit/jenkins/labs) this is a major setback in time -- I need to keep abreast of changes in the Java ecosystem (maven/lucene/solr/haddoop/open relavance/carrot/uima/apertium/gate/etc...)

Known issues slated for improvement

 * Slowness due to the current deployment configuration -- should be fixable by moving the database closer to app server (currently in different datacenters, sorry!)
 * Ease of extending Gerrit -- in 2.5, Gerrit is adding a plugin/extension interface that should make it easier for us to add MediaWiki/WMF-specific features we want or need (in fact, "Project Deletion" is already being written as a plugin!)

The case against Gerrit

 * Primary editor: David Schoonover. Others should feel free to contribute.


 * Gerrit accounts must be requested. This slows down all contributions, and is particularly egregious in that it slows down onboarding of new WMF staff. Weeks have been lost waiting for all necessary rights to be given to new staff.
 * Is this an inherent issue or is that something we can reasonably change? What limitations exactly prevent us from allowing open registration? --brion (talk) 17:47, 11 July 2012 (UTC)
 * Alternately, we should be able to create a guest account with password guest for anonymous comments
 * Gerrit does not have an official API. Existing tools work by reverse-engineering Gerrit's internal APIs, which are largely undocumented.
 * It is not possible to alter the sort order in which changes are listed. Changes are listed by last update date, so the changes that have been neglected the longest are also the hardest to see. A feature request for parametrizing the sort order was accepted in 2010, but no progress has been made as of July 2012..
 * I think this is a rather serious issue. Not being able to sort columns in Gerrit coupled with the fact that recently added/commented on patchsets jump to the top of the list makes it easy for older/unreviewed commits to languish in the backlog. I've made it practice when doing code review to start from the bottom of the stack to help combat this, but it's a total PITA and it is not likely common practice. Arthur Richards (talk) 22:01, 13 July 2012 (UTC)
 * Gerrit's developers want to move away from using a relational database as a data store. As a result, they are reluctant to invest time in improving Gerrit's query interface. Such a change would entail a costly migration for us.
 * Gerrit's codebase is written in Java and Prolog(!) and uses Google Web Toolkit. Few (if any) engineers at the foundation are proficient with this stack. Large swaths of code in the Gerrit codebase are undocumented, making it hard to understand or modify.
 * I dont see this as a problem necessarily unless we expect to modify the codebase. As for Prolog and Java, I have coded in both languages (it has been a while using Prolog, but it is fairly easy to pick up if ever this became necessary).  Ssastry (talk) 15:58, 11 July 2012 (UTC)
 * I disagree, its a very very minor point to be sure, but things are better when most people can read/edit the code, since its easier to fix issues when they arise, and customize to our needs. (I also know prolog, and like that language, but it is an obscure language). Bawolff (talk) 19:48, 11 July 2012 (UTC)
 * The ability to add a repository requires administrator privileges on Gerrit. However, administrator privileges also grant unlimited access to gsql, a raw SQL interface to the Gerrit database. As a result, administrator rights cannot be extended beyond a narrow group of people (currently five). This creates a frustrating bottleneck for developers, who must compete for the attention of the overburdened administrator group for routine Git requests.
 * Creating a gerrit project does not require admin access, it requires Project Creation access. Which was granted to a larger group of users. Additionally, there is currently discussions going on to implement a "request queue" type of system so users can say "I want a repo called foo" and the approver can just come along and press a button.
 * Gerrit is slow.  The "Groups" listing takes almost a minute to render.
 * "Gerrit is slow" is partially due to some mistakes on our part (putting the database and app server in different datacenters). This is being worked on and will be resolved soon. The groups listing page is annoyingly slow, but I'm working on profiling the problems here and submitting a fix upstream for it.
 * There is no way to delete projects. A request for this feature was opened in 2009 and has been starred by over a hundred people.
 * In Gerrit 2.5, there will be a new plugin that makes this possible.
 * As of 19:59, 10 July 2012 (UTC) There are 164 issues on the Gerrit bug tracker that are status:accepted and have no owner. Of those, two are set as "critical" and twelve are set as "major".
 * I don't neccesarily see that as a bad thing. Have you seen our bug database? Better they track issues that they don't have resources to or otherwise aren't working on now, then to just simply sweep them under the rug. Bawolff (talk) 19:48, 11 July 2012 (UTC)
 * Post-merge review is impossible. Comments just get lost and bugs are not effective.
 * Lost how? Do they not save, or are there error messages? What do you mean "bugs are not effective"? --brion (talk) 17:49, 11 July 2012 (UTC)
 * I think that may be the same thing I was going to post. Once a change is merged, it's sacred. You can't mark it -2 even if it's a php error. You can provide a comment, but nothing marks it as requiring attention (how many mails reminding fixmes were sent by hexmode?).
 * With a "magic MW VCS" it would be "unmerged", and leave pending for a new merge after it gets fixed. As git doesn't support it (without producing merge conflicts to git users), the most sensible approach would probably be to "freeze" master until a patchset which solves it gets merged. Or even without such strict ruling, if it keeped a track of broken ranges that would be useful (obviously show a big warning when master or a branch is inside such range...).
 * Platonides (talk) 20:36, 13 July 2012 (UTC)
 * post-merge workflows are almost impossible; self-review is a hacky and tedious workaround, see the recent discussion about the operations/puppet repository and the percentage of self-reviews.

border-color: #FCF #F0E #F0E #FCF; color: black !important; text-decoration: none !important;" title="changes/14/3714/1">14/3714/1 label gets exposed in gitweb... Looks like a case number in some Kafkesque trial.
 * It is difficult find or track projects unless someone walks you through it. For example, there is only one very small, hidden link to view the list of projects. This makes the state of any particular branch pretty opaque and potentially creates an issue for managers or others who want to have a window into our code.
 * There is little to no ability for non-technical staff or volunteers to meaningfully contribute (e.g. to processes like code review or simply commenting). Compare to GitHub or even Bugzilla, where FOSS projects see a large amount of comments or other contributions from designers, product managers, or users whether they have familiarity with git or commit access.
 * Can you articulate more clearly what the issue is here? Anyone with an account can comment or even -1/+1 commits. And we have a very large and complex codebase (MediaWiki + extensions + ops stuff + tools and misc), no matter where we host it, which of course makes it harder for people to jump in and comment. Does GitHub offer specific tools that would help engage non-technical folks, or are you just saying "Gerrit's UI sucks" one more time?.--Eloquence (talk) 17:41, 11 July 2012 (UTC)
 * I suspect the main problem comes down to 'accounts must be requested' up above -- you can't make comments until you've been placed in the system, whereas github, bugzilla etc allow open registration. --brion (talk) 17:46, 11 July 2012 (UTC)
 * It's not just having to request an account. Unlike tools such as GitHub, Gitorious, or Bugzilla, it is really really hard for anyone who's not a developer to find the relevant projects, relevant commits, and then contribute comments that are not strictly part of the code review process. For example, I had to go ask RobLa in IRC about how to find a simple list of projects. In GitHub, a repository is not just easier to navigate through the GUI (AKA the issue Erik is talking about) but is very tightly integrated with tools that anyone can use to track what is happening in a project (i.e. watching users and repos) and then contact someone when they have a comment. I find it was actually better when we were on SVN and the only Web interface was the comment-less ViewVC, because it forced conversation either on to Bugzilla or on-wiki. One of the largest weaknesses about the MediaWiki development community is the fact that we have historically not done much to include designers, product people, and users in the collaboration on code. This is how we ended up with the MediaWIki we have today, which everyone agrees has deep issues with usability. We need to strike at the heart of this problem by choosing tools that work for everyone who has a stake in the project, not just people committing or reviewing code from a purely technical standpoint. Steven Walling (WMF) &bull; talk   20:38, 17 July 2012 (UTC)
 * UI sucks profoundly: too wide, clumsy, no visual comparison for image changes (ol' good CodeReview did this!), no "view this file" link, just "download as zip" (WTF????). Almost no flexibility when it comes to searching for anything other than the most recent activity overall.
 * This was a configuration issue, we've had image diffs available for a couple of weeks now.
 * Operaphobic
 * Per the UI sucks, It's not desinged to encurage volenteer development, its geered towards a single project, single team proccess for deployable code. Look at the main "landing page" ... there is little concept of what projects are represented, no per project pages with descriptions, no agragate views of which projects are more active than others, no person icons, no sense who is working on what, too focused on individual commits rather then the final change state represented by a pull request for a particualr feature or refactor. There is a false sense of linear pipeline rather than a distributed collection of interconected activity. There is no big "fork me" button! ( there is no per-project interface views, so not clear where you would put that ). Aside from getting deployable code the tools for feature develop should also encurage expeirmentations and be fun. Github is good at doing this, makes coding and sharing "fun". Gerrit is not fun.  --Mdale (talk) 23:28, 11 July 2012 (UTC)
 * Sometimes, development requires a number of commits before a feature/bug/change/refactoring can be fully worked out. So, code review requires a review of a set of commits rather than individual commits.  I may not be fully aware of Gerrit's abilities, but a commit-based review model gets in the way of such review flows.  Alternatively, it prevents developers from committing something and amending commits till all work on that phase is complete.  This may also encourage work in smaller reviewable chunks, so, I am not going to make this a strong case against gerrit.  I prefer frequent and regular commits to let others work on it or take over partially done work -- and the in-progress commits may not necessarily be reviewable or need to be reviewed.  Ssastry (talk) 03:32, 12 July 2012 (UTC)
 * This is not a hypothetical situation. I am right now reworking some code and I dont see a way to get everything reviewed at once.  I have to either rebase all commits to squash them into one giant commit (which is totally undesirable), or I have to submit all commits as a chain of dependent commits, and hope that I dont have to amend any of the earlier commits as part of a review, or work out an arrangement with the reviewer to accept my commits en masse and that I'll made any fixes as part of a new set of commits which could be reviewed.  Am I missing some Gerrit feature which will make this less of an issue?  There was an email recently on wikitext-l where Mark Bergsma was asked to squash his commits (http://osdir.com/ml/general/2012-07/msg20847.html) -- I personally think this is bad practice.  -- Ssastry (talk) 21:54, 13 July 2012 (UTC)
 * Related comment: dependent commits are problematic because amending an earlier commit breaks the commit sequence and I have to either amend all later commits or break up the commits into independent branches by rebasing (which I used recently when I forgot to use topic branches). -- Ssastry (talk) 21:54, 13 July 2012 (UTC)
 * Can't search for file names / changes that touch a file. 'We no longer have a database table that enumerates every file modified in a change, which makes it impossible to query for the changes that touch a file. Mutante (talk) 09:12, 12 July 2012 (UTC)
 * It's related to Ssastry's comment a bit above this one, but simplified and generalized: Gerrit suggests using lots of amends. The more natural Git way of improving on an idea is adding another commit in the branch. That's how Github works. It allows to see how the idea developed. As far as I understand it, amends are supposed to be used only for fixing commit messages or embarrassingly mistaken commits, but Gerrit takes amends to the extreme. --Amir E. Aharoni (talk) 11:15, 12 July 2012 (UTC)
 * In some cases Gerrit shows unrelated files as files that were changed in this commit. I never quite understood which cases are these exactly, but it has something to do with rebasing. --Amir E. Aharoni (talk) 11:15, 12 July 2012 (UTC)
 * This has to do with rebasing between changes, and work is being done in master right now to resolve this issue.
 * A changeset can't be applied to several branches. If the fix landed on master has to be committed to REL1_19, REL1_20, wmf/1.20wmf7 you need to submit other three independent changesets. Expected: you could request a merge to several branches on one commit, and perform the merge on that same page (assuming it merges fine, which it probably does). Platonides (talk) 20:42, 13 July 2012 (UTC)
 * UI: I wanted to look up one of my older reviews on gerrit so I could paste a link here for documenting the dependent commits amend problem, but my dashboard only shows me the most recent X actions and I cannot go back in history. -- Ssastry (talk) 21:54, 13 July 2012 (UTC)
 * LDAP support in gerrit is kind of afterthought. HTTP, SSH and other interfaces are highly decoupled inside of Gerrit and there is hardly a common, application-wide, notion of the user and user authentication. That's why SSH supports only keys stored in some SSH-specific store; that's why issue 1124: Use LDAP for ssh keys is not easy to solve.
 * Baroque permissions. It is difficult to find out why something is not working and which permissions are exactly needed to achieve something without defailed analysis of global, group and per-project permissions.
 * Gerrit defeats git capability to support off-line and detached work. While, with some effort it is possible to download refs for all changes pending in gerrit, it is impossible to  review and comment on the code off-line and send review results upstream. So, no off-line work. Also difficult to work from command-line only (only some operations are possible via SSH). See also Git/Conversion.
 * Confusing naming of git refs: You have to use  (or   is supposed to take it over in future - TBC) to submit a change (  seems to be born to simplify this for end user). Then the patchset gets named   but you can still push changes to  . Looks like the need to hash   directory and ability to clone the Gerrit-specific ref tree  forced Gerrit developers to expose that confusing naming to the end users. Pity that <span xmlns="http://www.w3.org/1999/xhtml" class="change plainlinks" style="padding: 0px 4px; font-size: 70%; font-weight: normal; border: 1px solid; background-color: #FAF;
 * produces a messy git history, impossible to read e.g. with gitk or bisect, due to the significant amount of (useless) merges.
 * Gerrit's email notifications are close to useless. There's a lot of repetition between mails, little context about the changes and no way to actually get the unified diff. Extending it to do it (something that could be a three-line shell script with git) seems non-trivial.

Overview of Alternatives
This section is a light overview of all of the options we're considering or should consider.

Integrated options
Integrated option means it manages the repositories as well as the review process.


 * Github
 * Gitorious
 * Gitlabs
 * Gerrit

Standalone code review tools
These tools profess to do the job of code review completely independently of repository management
 * Phabricator - PHP-based tool from Facebook with integrated issue tracker, wiki, etc. The review component (Differential) is the piece we'd be interested in.
 * Comparison of Gerrit and Phabricator by a mediawiki dev
 * Barkeep
 * Review Board

Standalone repo management tools
These tools don't provide code review, but only provide access control for the repository.
 * Gitosis
 * Gitolite

Git repo viewers

 * gitweb (default with Git and Gerrit)
 * cgit

Serious Alternatives
This section is to enumerate the set of complete alternatives to Gerrit. If there is an option you believe deserves serious consideration, please add a new section for it.

How GitHub would be configured for Wikimedia use
Deploy and development repositories:
 * Maintain 2 repositories: one "deploy" repository that has very few admins, and another "development" repository that has a lot more trusted developers.
 * Code gets into the deploy repository only via pull requests from the development repository -- this can only be done when "trusted" developers/admins do necessary code reviews and issue pull requests to the deploy repository.
 * Periodic downstream rebasing/merging from the deploy repo into the development repo, if necessary (to account for any direct commits into the deploy repo).

To ensure bad code does not get into the deploy repository, development repository might have to be managed as follows:
 * Topic branches are used for code that requires review.
 * Commits on the master branch that are not reviewed, and cannot be part of a pull request are moved to a branch.
 * Problematic commits into the development repo either get fixed or reverted.
 * Protocols required for who can issue pull requests into the deploy repository.

Code gets into the development repository in 2 ways:
 * trusted developers (staff, volunteers, aliens) get commit rights and can directly push code.
 * anyone else in the world (who dont have commit access to the development repo) forks the development repo and issues a pull request into the development repo.
 * pull requests are used to do code review.

Reviewing development repo commits:
 * Via github comments -- drawback is that there is no way to track reviews. Given current review requirements via Gerrit, unlikely to be an acceptable model.
 * Via an external standalone review tool like barkeep.

The case for Github

 * Better usability all around
 * Anyone can participate in development -- creating user accounts, forking, commenting, submitting issues, etc. are all easy and open to anyone and removes WMF from the loop as far as these areas are concerned.
 * Commit rights determine who gets to push code to the repo.
 * Open source projects on GitHub are accessible to thousands of the most skilled, professional Web developers around, making it the best location to recruit new participants without spending donor funds on outreach
 * Already includes outstanding materials for introducing developers to git, providing better documentation than we currently have
 * Strong cross-platform support (at the desktop and browser level)

The case against Github
Drawbacks:
 * Depending on review requirements, could require a standalone review tool like barkeep to track reviews.
 * Reviews are not enforced, unlike Gerrit.
 * Depends on developers acting trustworthily and following protocols.
 * Requires MW contributors to get an account at a third party site which is time consuming
 * (discussion) Really? How is that time consuming compared to having to manually request an LDAP account and waiting for someone to review and create it. At GitHub you create an account in under a minute and off you go without needing dedicated user rights. Krinkle (talk) 05:08, 18 July 2012 (UTC)
 * I for one cannot read the github terms of service in under a minute, let alone also understand them. 81.10.140.176 12:58, 18 July 2012 (UTC)
 * Unnecessarily bringing in a further foreign authority is against hacker ethics, as hacker ethics call for decentralization.
 * Github's terms of service contain items that allow to disconnect users at GitHub's sole discretion: “If your bandwidth usage significantly exceeds the average bandwidth usage (as determined solely by GitHub) of other GitHub customers, we reserve the right to immediately disable your account or throttle your file hosting until you can reduce your bandwidth consumption.”
 * github.com's web ui itself is not free software nor open source. We want to use free tools to develop free software, so use alternatives like gitlabs hosted on our own infrastructure. (independent from the gerrit evaluation). There is an Enterprise product to have github on your own servers. but we should not pay if there are free alternatives. Mutante (talk) 09:37, 12 July 2012 (UTC)
 * While I understand and appreciate the impetus behind this, 'free tools to develop free software' principle also means all mediawiki developers would have to stop using macs, ibooks, iphones, ipads during development. That means, Wikipedia could not be supported on iphones.  No using Amazon EC2 either.  I am being facetious, but only to make a point.  You have to draw the line somewhere.  To me, more pertinent is the ownership of code, data, and interactions on the site.  There is something to be the said for the collaborative and social nature of Github.  -- Ssastry (talk) 12:16, 12 July 2012 (UTC)
 * It's one thing to allow people to work with/for us using non-free tools (e.g. many Wikimedia Foundation employees prefer using some non-free stuff because they feel more productive and happier using those tools). It's another thing to make use of non-free software (and agreement to a corporate EULA) a requirement to work with us. The latter is much dicier, especially for volunteers. -- RobLa-WMF (talk) 18:54, 18 July 2012 (UTC)
 * This is not inevitably a minus, but at least a flag: GitHub recently received $100M in VC funding. Companies that receive large amounts of funding often pivot in order to meet investor objectives, so there may be large and sudden changes in GitHub's strategy affecting open source developers. While one can't strictly speak of vendor lock-in given that Git itself is open, moving our entire tooling to GitHub and then moving away from it if they do something we don't like is likely to be very difficult (given how difficult it is today to move away from Gerrit, which is entirely open source). While we can at least fix Gerrit if it bugs us, we will have limited recourse with GitHub. IMO it would be strategically unwise and inconsistent with our values to support a semi-open ecosystem and put our eggs in the GitHub basket.--Eloquence (talk) 17:37, 17 July 2012 (UTC)
 * Note though that because of the nature of Git, we don't truly rely on GitHub for anything. The repository can be pushed anywhere else, so in case of an emergency decision to move away from GitHub there is nothing that keeps one from doing that. "Everybody" has a local mirror of all public history. I agree, though, that it would be inconsistent with our values. Krinkle (talk) 05:13, 18 July 2012 (UTC)
 * It appears that github does not like giving access to bots (TOS: “You must be a human.”). We'd have to work around this item in a legally sound way to get things like L10n-bot.
 * No true pseudonyms on github. Github at least knows your real name, even if it might not show it (TOS: “You must provide your legal full name, [...]”)