GitLab consultation

Outcome
2020-10-23: The working group decided that migrating our code repositories from Gerrit to a self-hosted GitLab Community Edition installation is the right decision. The GitLab page is a portal for more details.

Why
For the past two years, our developer satisfaction survey has shown some level of dissatisfaction with Gerrit, our code review system. This dissatisfaction is particularly evident in our volunteer communities. The dissatisfaction with code review, coupled with an internal review of our CI tooling and practice, makes this an opportune moment to revisit our code review choices.

While Gerrit’s workflow is in many respects best-in-class, its interface suffers from usability deficits, and its workflow differs from mainstream industry practices. This creates barriers to entry for the community and slows onboarding for WMF technical staff. In addition, a growing number of individuals and teams (both staff and non-staff) are opting to forgo Gerrit and instead use a third-party hosted option such as GitHub. Reasons for choosing third-party hosting vary but, based on informal communication, fall into three main groupings:


 * lower friction to create new repositories
 * easier setup and self-service of Continuous Integration configuration
 * more familiarity with pull-request style workflows

All these explanations point to friction in our existing code-review system slowing development rather than fostering it. The choice to use third-party code hosting hurts our collaboration (both internal and external), adds to the confusion of onboarding, and makes it more difficult to maintain code standards across repositories. At the same time, there is a requirement that all software deployed to Wikimedia production be hosted in and deployed from Gerrit.

If we fail to address the real usability problems that users have with Gerrit, people will continue to launch and build projects on whatever system they prefer. Wikimedia's GitHub already contains 152 projects, and the Research team has 127 projects.

Gerrit improvements
This raises the question: if Gerrit has identifiable problems, why can't we solve those problems in Gerrit? Gerrit is open source (Apache licensed) software; modifications are a simple matter of programming.

We are unique in the community of Gerrit users, which includes large companies such as SAP, Ericsson, Qualcomm, and Google. Google, in particular, is singular in its use of Gerrit for projects like Android and Chromium. To support these large, open projects, multi-site capabilities are needed; however, much of that work is either closed-source or does not support multi-site writes (correction as of 2020-10-28: the documentation of the multi-site plugin now indicates support for multi-site writes using a global ref database).

Upstream has improved the UI in recent releases, and releases have become more frequent; however, upgrade path documentation is often lacking. The migration from Gerrit 2 to Gerrit 3, for example, required several upstream patchsets to avoid the recommended path of several days of downtime. This is the effort required just to maintain the status quo. Even small improvements require effort and time because our use case often differs greatly from that of the rest of the Gerrit community.

What
The Wikimedia Release Engineering team has taken an initial look at GitLab.

GitLab is a capable and scalable code review system written in Ruby. GitLab is available for self-hosting, as required for parity with the rest of our development tooling infrastructure and to alleviate concerns about data privacy or usage restrictions of third-party hosting. As GitLab offers an MIT licensed community edition (CE), it adheres to the Foundation's guiding principle of Freedom and open source.

Finally, GitLab was evaluated as part of the Release Engineering team's Continuous Integration working group. While GitLab's continuous integration systems were found to be adequate for our needs, code review was out of scope for the charter of that working group. GitLab's capabilities were therefore weighed against the social friction and added complexity that integrating GitLab's CI with Gerrit would likely introduce. Replacing Gerrit with GitLab would significantly change the equation for our CI tooling, allowing us to use an industry-standard workflow and the self-serve CI tooling built into GitLab. For these reasons, we'd like to limit the scope of this discussion to evaluating whether it is feasible and advisable to move from Gerrit to GitLab for code review. Things within scope include: code review workflows and processes; tools and bots involved in the code review process; merge, branch, and deployment strategies; and unforeseen blockers to adopting a new code review system within our technical community. For the avoidance of doubt, continuous integration (CI) and task/project tracking are out of scope of this evaluation.

Definitions and comparisons
A few useful terms and definitions for the comparison of GitLab and Gerrit code review:


 * Mainline branch
 * A single, shared branch that acts as the current state of the repo.


 * Integration
 * The process of combining a new piece of code into a mainline codebase.


 * Patchset
 * The method of integration in Gerrit. A single commit under review. It is very common to amend a commit during the code review process.


 * Merge request
 * The method of integration in GitLab. A request to merge one branch into another.


 * Feature branching
 * Put all work for a feature on its own branch, integrate into mainline when the feature is complete. These branches are often short-lived.


 * Continuous integration
 * Developers do mainline integration as soon as they have a healthy commit they can share.


 * Squashing
 * Combining a series of Git commits into a single Git commit.


 * Fast-forward
 * Merging that retains the existing Git commit history and does not create a merge commit.
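
The squashing and fast-forward definitions above can be illustrated with a toy model of commit history. This is only a minimal sketch in Python and not how Git, Gerrit, or GitLab implement these operations. The `Commit` class, the function names, and the example commit ids are all hypothetical, chosen purely for illustration, and the sketch assumes a linear feature branch that shares history with the mainline branch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Hypothetical stand-in for a Git commit: parent ids plus a message."""
    parents: tuple
    message: str


def ancestors(commits, tip):
    """Return the set of commit ids reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(commits[cid].parents)
    return seen


def fast_forward(commits, branches, target, source):
    """Move target to source's tip; history is kept and no merge commit is made."""
    if branches[target] not in ancestors(commits, branches[source]):
        raise ValueError("branches have diverged; fast-forward is not possible")
    branches[target] = branches[source]


def squash(commits, branches, target, source, new_id):
    """Collapse the commits unique to source into one new commit on target."""
    base = ancestors(commits, branches[target])
    messages = []
    cid = branches[source]
    # Walk back along the (assumed linear) source branch until shared history.
    while cid not in base:
        messages.append(commits[cid].message)
        cid = commits[cid].parents[0]
    commits[new_id] = Commit((branches[target],), "; ".join(reversed(messages)))
    branches[target] = new_id


def example_repo():
    """Mainline 'main' at commit a; 'feature' adds commits b and c on top."""
    commits = {
        "a": Commit((), "initial"),
        "b": Commit(("a",), "feature: part 1"),
        "c": Commit(("b",), "feature: part 2"),
    }
    return commits, {"main": "a", "feature": "c"}


if __name__ == "__main__":
    commits, branches = example_repo()
    fast_forward(commits, branches, "main", "feature")
    print("fast-forward:", branches["main"], "commits:", len(commits))

    commits, branches = example_repo()
    squash(commits, branches, "main", "feature", new_id="s")
    print("squash:", branches["main"], commits["s"].parents, commits["s"].message)
```

In this model, a fast-forward leaves the three original commits untouched and only moves the mainline pointer, while a squash adds one new commit to mainline whose single parent is the previous mainline tip.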

Decision making
Using the RACI responsibility assignment framework:

Responsible (the primary driver of this decision):


 * Tyler Cipriani, Engineering Manager for the Wikimedia Release Engineering team

Accountable (the final decider based on the Responsible’s recommendation):


 * Grant Ingersoll, CTO of the Wikimedia Foundation

Consulted (those who will have a 2-way communication relationship with this evaluation):


 * Technical community members and organizations that use Gerrit (via on-wiki and mailing list requests)
 * Wikimedia Foundation Technology and Product departments management and staff
 * TechCom members

Informed (those who will have a 1-way communication relationship):


 * Broader Wikimedia movement (via the consultation’s talk page, mailing list messages to wikitech-l@ and wikitech-ambassadors@, and the TechNews newsletter)

Working Group
The Working Group is responsible for engaging with the discussion, surfacing and weighing feedback from consulted groups, providing recommendations, and directly informing the outcome of the consultation. Members of the Working Group:


 * SRE/Service Ops - Effie Mouzeli
 * Release Engineering - Brennen Bearnes
 * Security - Chase Pettet and Scott Bassett
 * Technical Engagement - Andre Klapper
 * Product - Daniel Cipolleti
 * Wikimedia Germany (WMDE) - Lucas Werkmeister and Conny Kawohl
 * HalloWelt - Markus Glaser
 * General Community - Derk-Jan Hartman
 * TranslateWiki - Niklas Laxström

Communications
Communication between the various groups involved in this consultation will be important to achieving an equitable outcome of this process.

Announcements
Announcements will be made on-wiki, via mailing lists, and via the TechNews newsletter. The beginning and end of each stage of the process outlined below will be announced. Additionally, a summary of feedback will be compiled at the end of the entire consultation process.

With those being consulted
Feedback will be solicited on-wiki, via mailing lists, and via TechNews. Feedback will be collected for the duration of the consultation period via on-wiki talk pages. Members of the working group will strive to engage in discussions that happen on-wiki during the consultation period. At the end of the entire process, a summary of the feedback collected during the consultation will be shared.

When
As this decision has secondary effects on our continuous integration system, which is also planned to be replaced in the near future, we hope to follow the timeline below:

Evaluation Considerations
The Wikimedia technical community has many workflows and participants from all over the world with differing levels of technical knowledge and expertise. To arrive at an equitable decision concerning the adoption of GitLab there's a need to weigh valid concerns against the longevity of the technical projects that the community stewards while also considering the needs of all contributors. These are the considerations that will be used in the evaluation. The evaluation will be completed by the members of the Working Group initially and shared during the consultation period. Topics of discussion include:


 * Workflows
 * ACL and privilege policy
 * Repository creation and deletion/archival
 * Workflows and Code Review
 * Branching and merging
 * Deployment
 * Bot workflows (e.g.: localization, bot comments)
 * UI/UX
 * Administration
 * WMF Privacy Policy
 * Software ecosystem
 * Operations (backups, upgrades, maintenance)


 * Upstream
 * Free/Open Source Software
 * Other Free Knowledge/Software community users

FAQ

 * Would we use GitLab Enterprise or Community Edition?
 * Community Edition (CE). It is the Free Software release of GitLab. It can optionally run non-free software, such as Google reCAPTCHA to block abuse, which we do not plan to use. This is similar to other Free Software/Culture groups that use GitLab, for example Debian, GNOME, and KDE (each of which uses the non-free reCAPTCHA tool).
 * Why is Continuous Integration (CI) out of scope?
 * CI is out of scope because that infrastructure is secondary to code review. In other words, ideally the details of how CI is set up and maintained are not important to developers on a day-to-day basis.
 * Why is GitHub not considered?
 * GitHub would be the first tool required to participate in the Wikimedia technical community that is neither Free Software nor self-hosted.
 * GitHub also does not meet all of our needs. For example, GitHub grants little control over metadata; no influence over privacy policy, data retention, sanctions, or bans; little control over backups and data integrity checks; and no long-term guaranteed access to underlying repository settings and configuration.
 * What happens to repositories developed on GitHub if we move to GitLab?
 * Given that GitLab provides a very similar workflow and feature set, we will strongly encourage all developers to use GitLab instead of GitHub for all development. Repositories will still be mirrored to GitHub, for visibility purposes.
 * What about using other built-in GitLab features like Issues, Wikis, Pages, etc., if we decide to use GitLab for code review?
 * This consultation is focused only on code review, not issue tracking or replacing other functionality already provided to Wikimedia developers. To avoid fragmenting our issue tracking, we would turn off issue tracking on GitLab. In addition, we would turn off repository wikis, GitLab Pages, and other features that overlap with currently provided tooling.