Topic on Talk:GitLab consultation

What are the major differences?

DKinzler (WMF) (talkcontribs)

Having never used GitLab, I have no idea how to decide whether it would be a good idea to use it. What features does it offer that Gerrit is lacking? Which features is it lacking that we use on Gerrit? What does it do differently?

A consultation should be about making an informed choice. I'm lacking information. I can of course go and google, but wouldn't it be good to have an overview here, so we can discuss it directly, and add to it as relevant things come up? Some kind of decision matrix would perhaps be useful.

I can find a couple of comparisons online, but none of them talks about the things that would be most relevant to us.

Kizule (talkcontribs)

Hello @DKinzler (WMF), this is a very good question, which I also wanted to ask. :)

SBassett (WMF) (talkcontribs)

@DKinzler (WMF) - there is a labs test instance currently set up at https://gitlab-test.wmcloud.org/ where many folks have been playing around, which might be helpful to you in assessing differences between GitLab and Gerrit. It's a volatile instance though, so there is no expectation for any data to persist. Several features such as issue tracking are also likely to be disabled for any Wikimedia installation - I believe the working group is attempting to compile these soon so as to help guide expectations.

DKinzler (WMF) (talkcontribs)

@SBassett Thank you, I'll have a look. Still, a side-by-side overview would be extremely helpful for this consultation.

Tgr (WMF) (talkcontribs)

+1 that it would be nice to see a feature comparison. Also, the test instance does not have any CI set up I believe, and CI UX is a pretty crucial factor - it is a big part of the first experience of new code contributors.

BBearnes (WMF) (talkcontribs)

Also, the test instance does not have any CI set up I believe

On that one, see the .gitlab-ci.yml on this merge request for a rough working example on mediawiki/core. There's a job runner instance configured and users should be able to define CI pipelines for any project using docker images from the WMF registry.

Hashar (talkcontribs)

A few drivers are listed on the GitLab consultation page. Roughly, they boil down to:

ease of creating a new project

In GitLab, as soon as you are logged in there is a nice shiny button that lets you create a new project. It can be placed under your personal namespace or under the namespace of one of your groups. For example, at https://gitlab-test.wmcloud.org/projects/new , as `hashar` and being a member of the `release-engineering` group, I can create a new `wikioid` project at either of:


https://gitlab-test.wmcloud.org/hashar/wikioid https://gitlab-test.wmcloud.org/release-engineering/wikioid

The first will be managed by myself, the second by any member of the group. The access list is set on creation.

In Gerrit, creating a project is a global capability. It lets one create a project anywhere in the hierarchy, for example under another team's hierarchy. It can't really be given to everyone, since the hierarchy is shared by all users, unlike GitLab, which namespaces it per person/group. So essentially the Gerrit platform is shepherded by a restricted group, which is very typical in the corporate world. Our process is:

The fix in Gerrit would be to make the project creation capability namespace-based instead of global, possibly mapping hierarchies to groups and individual users. So theoretically, if I am in the LDAP group `releng`, I would be granted the right to create a project under `/releng/` and additionally under a new hierarchy such as `user/hashar/`. That would address this specific concern.
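To make the namespace idea concrete, here is a minimal sketch of such a check. It is not how Gerrit or GitLab implement permissions; the function name `can_create_project` and the exact rule (personal projects under `user/<username>/`, group projects under `<group>/`) are assumptions for illustration only.

```python
from __future__ import annotations


def can_create_project(username: str, ldap_groups: set[str], project_path: str) -> bool:
    """Hypothetical namespace-based rule for project creation.

    Assumed rule: a user may create projects under user/<username>/
    and under <group>/ for each LDAP group they belong to.
    """
    parts = project_path.strip("/").split("/")
    # A project needs at least a namespace and a name, with no empty segments.
    if len(parts) < 2 or not all(parts):
        return False
    if parts[0] == "user":
        # Personal namespace: user/<username>/<project>
        return len(parts) >= 3 and parts[1] == username
    # Group namespace: <group>/<project>
    return parts[0] in ldap_groups


print(can_create_project("hashar", {"releng"}, "releng/wikioid"))       # allowed
print(can_create_project("hashar", {"releng"}, "user/hashar/wikioid"))  # allowed
print(can_create_project("hashar", {"releng"}, "mediawiki/wikioid"))    # denied
```

The point of the sketch is only that the decision depends on the path prefix rather than on a single global capability.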


easier setup and self-service of Continuous Integration configuration

Also known as self-serve CI. On GitHub you can add Travis integration and immediately benefit from CI. On our infrastructure, by contrast, CI setup is shepherded by a restricted group, just like Gerrit project creation. The CI configuration is done independently from Gerrit and relies on some standardized entry points such as running npm test.

Which means that to benefit from CI you first need to know it exists, reach out to the proper people (release engineering) and get it configured. An advantage, though, is that CI maintenance is looked after by a team and is more or less consistent across repositories. That is especially true for MediaWiki core and for the extensions and skins deployed to Wikimedia production, for which we enforce a set of rules and do not let developers deviate from them.

We at some point had an ambitious plan to overhaul our current CI:

  • One of the outcomes is the deployment pipeline, which still requires central configuration and initial setup for the entry point (running Blubber to craft a container) and is geared toward automatically packaging a repository as a Docker container that we can then deploy.
  • The other intent was to shift the CI workload from WMCS instances to a Kubernetes cluster. The primary reason is that the workload is a bit challenging for the WMCS infrastructure: it often comes in spikes, is CPU intensive and has sometimes brought the infra to its knees. There are other limitations here and there as well. Eventually that got de-prioritized in favor of another project; there is only so much you can do at any point in time given our limited resources.
  • The Zuul version we use is dated; the next major one does come with self-serve CI. I can't remember exactly why I haven't got the upgrade prioritized. Most probably I wanted the new Zuul to be fully based on Kubernetes, and when that plan fell through, the Zuul upgrade went with it.


workflow familiarity

For the old-timers who have been using Gerrit for years and years, it has become second nature. Amending commits, having the git hook set up and pushing to `refs/for/<branch>` is trivial enough once you get into the habit.

Wikimedia employees and contractors are hopefully onboarded by their new teammates, who help them get on the rails and explain the basics of Gerrit. A few would rather stick to GitHub, since they are familiar with it or because they never got trained to use Gerrit in the first place. For outsiders, that is a bit more challenging, although we have plenty of documentation available, be it Gerrit's own documentation https://gerrit.wikimedia.org/r/Documentation/intro-user.html or our tutorials at https://www.mediawiki.org/wiki/Gerrit

The GitHub/GitLab and Gerrit workflows are exactly the same on the functional level: send one or more commits to a staging area, get reviews, modify your commits, get them approved and ultimately merged to make them available to others.

The implementations though are "slightly" different.


In GitLab/GitHub you fork a repository, clone it, push your commits to a branch, then head to the web interface to open a merge request against the upstream repository. You are in your own personal space with whatever branches you want, and until the code is ready it is essentially isolated.

Code updates are done in your local branch, which gets pushed to your forked repository, and that ultimately updates the merge request.

The way your commits are associated with a merge request is by using the quadruplet made of:

  • your forked repository
  • your branch in the forked repository
  • the target repository
  • the branch in the target repository
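As a toy model of that quadruplet (not GitLab's actual data model), the four fields can be treated as a lookup key: pushing more commits to the same source branch lands on the same merge request. The class name `MergeRequestKey`, the in-memory `merge_requests` store and the commit ids are all made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MergeRequestKey:
    """The four fields that tie pushed commits to one merge request."""
    source_repo: str
    source_branch: str
    target_repo: str
    target_branch: str


# Hypothetical in-memory store: key -> commit ids pushed so far.
merge_requests: dict[MergeRequestKey, list[str]] = {}


def push(key: MergeRequestKey, commit_id: str) -> None:
    """Record a pushed commit against the merge request identified by key."""
    merge_requests.setdefault(key, []).append(commit_id)


key = MergeRequestKey("hashar/wikioid", "fix-typo", "release-engineering/wikioid", "master")
push(key, "abc123")  # first push; the merge request would be opened from this branch
push(key, "def456")  # follow-up push to the same branch updates the same merge request
print(len(merge_requests), merge_requests[key])
```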


In Gerrit, you clone the upstream repository and craft your commits. You send them directly to the upstream repository (instead of your fork), targeting a special reference that names the target branch: refs/for/master. That creates a change for each commit, and the changes live directly in the upstream repository.

Code updates are done by retrieving the change (or series of changes), amending it and sending it again to the same special reference (refs/for/master).
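For readers who have never pushed to Gerrit, the special reference translates into a plain `git push` with a refspec. The sketch below only builds the command line and does not run it; the helper name `gerrit_push_command` and the remote name `origin` are assumptions.

```python
from __future__ import annotations

import shlex


def gerrit_push_command(target_branch: str = "master", remote: str = "origin") -> list[str]:
    """Build the git command that sends HEAD for review on target_branch.

    The remote name is an assumption; use whatever your clone calls upstream.
    """
    return ["git", "push", remote, f"HEAD:refs/for/{target_branch}"]


# The same command is used for the first upload and for every amended version.
print(shlex.join(gerrit_push_command()))
```

Since the command is identical for new and amended commits, it is the Change-Id in the commit message, not the command, that tells Gerrit which change is being updated.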

The way commits are associated with changes uses a triplet made of:

  • The repository
  • The target branch
  • The Change-Id meta header
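A rough sketch of how that triplet could be derived from a commit, assuming the usual `Change-Id: I<40 hex digits>` footer written by Gerrit's commit-msg hook. The names `ChangeKey` and `change_key` are hypothetical, and this is a model of the idea, not Gerrit's implementation.

```python
from __future__ import annotations

import re
from typing import NamedTuple

# Assumed footer format: "Change-Id: I" followed by 40 lowercase hex digits.
CHANGE_ID_RE = re.compile(r"^Change-Id: (I[0-9a-f]{40})$", re.MULTILINE)


class ChangeKey(NamedTuple):
    repository: str
    target_branch: str
    change_id: str


def change_key(repository: str, target_branch: str, commit_message: str) -> ChangeKey | None:
    """Return the key identifying a change, or None if no Change-Id footer is found."""
    matches = CHANGE_ID_RE.findall(commit_message)
    if not matches:
        return None
    # If several appear, take the last one, since the footer sits at the end.
    return ChangeKey(repository, target_branch, matches[-1])


cid = "I" + "0123456789abcdef" * 2 + "01234567"  # made-up Change-Id
first = change_key("mediawiki/core", "master", f"Fix typo\n\nChange-Id: {cid}\n")
amended = change_key("mediawiki/core", "master", f"Fix typo in docs\n\nChange-Id: {cid}\n")
print(first == amended)  # the amended commit maps to the same change
```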

Gerrit merely skips the need to fork the repository; your personal changes / work in progress are effectively shared in the same repository as upstream.

The two workflows are really the same. The Gerrit one is just a bit more intimidating when you come from GitHub/GitLab. Gerrit has an advantage: it deals mostly with individual commits. Its drawback is that retrieving a series of changes might prove difficult when some commits in the series have been updated. In GitHub/GitLab, by contrast, commits are grouped in a single branch and, afaik, cannot be individually changed outside of that branch.


Note that you can also create branches on repositories for anything that requires a significant amount of work. That is used in some cases but is certainly not the norm.

Hashar (talkcontribs)

I only took the few entries that are listed at GitLab consultation. There are obviously a lot more differences.

GitLab is essentially a clone of GitHub and offers a full suite of development tooling, such as release hosting, an issue tracker, CI, a wiki, spinning up a live environment based on a branch, design assets and web hosting (like GitHub Pages).

Gerrit only covers git hosting and code review. It allows extremely fine-grained permission settings and has a rather simple infrastructure: one big Java process, some managed caches and Lucene indices, and git repositories.

BBearnes (WMF) (talkcontribs)

In gitlab/github you fork a repository, clone it, send your commits in a branch then head to the web interface to emit a merge request to the upstream repository. One is on its own personal space with whatever branches they want and until the code is ready it is essentially isolated. Code updates are done in your local branch which get pushed to your forked repository and that ultimately update the merge requests.

It's incomplete, but note the docs at GitLab consultation/Workflows - we'd probably adopt a model similar to that used by KDE and many projects on GitHub, where regular contributors are able to create branches directly on the mainline repository and thus skip the forking step for most work. KDE's convention of branches named like work/user/feature-name seems a good one.
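As an illustration of that naming convention, a small helper could derive the branch name from a user name and a free-form feature description. The helper `work_branch_name` and its slug rule (lowercase, non-alphanumerics collapsed to hyphens) are assumptions, not something KDE or the working group has specified.

```python
import re


def work_branch_name(user: str, feature: str) -> str:
    """Build a branch name of the form work/<user>/<feature-name>.

    The slug rule is an assumption made for this sketch.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", feature.lower()).strip("-")
    if not user or not slug:
        raise ValueError("both user and feature must be non-empty")
    return f"work/{user}/{slug}"


print(work_branch_name("hashar", "Fix login form"))
```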

DKinzler (WMF) (talkcontribs)

Is the idea that we would start using feature branches that contain weeks and months of work, and then need to be consolidated with the main line?

I was very happy to see that we were slowly moving to smaller and smaller patches going directly into core. I'd hate to see that trend reversed.

On that note - what about code review / approval? Can we keep the refine -> approve -> check -> merge workflow?
