
Talk:GitLab/2020 consultation

From mediawiki.org
(Redirected from Talk:GitLab consultation)

Outsourcing and requirements


I strongly suggest not repeating the mistake we made with Phabricator, i.e. just hoping that all the features we needed would appear at some point. Make a list of everything we need and pay some established business to develop it in the open, as fully free software, before the migration starts for real.

This applies to any software, but especially to open core software like GitLab, where you're walking on shifting sands and bound to get surprises along the way.

Make sure you have a suitable budget, which will most likely be on the order of 1 M$. (The Phabricator migration cost WMF much more than that, if you consider the employee time it required.) Tell the community beforehand that you have the resources to actually handle the migration. Share an actual detailed comparison of all the requirements and use cases considered, the budget for each area (based on real bids), who else is planning to use that software for that thing, and what the alternatives are. Nemo 04:46, 3 September 2020 (UTC)Reply

@Nemo bis Heja, thanks for the comment!
In 2014, everyone was invited to test Phabricator in a test instance and to create tickets about potentially missing functionality. A lot of these tickets were resolved, some were considered not crucial to have before the migration, and quite a few got implemented after it. One of the harder parts of product management is making the cut when to consider something "good enough" and "in total better than what we had before" (in Bugzilla). Also covering use cases which are far less common would have postponed a better issue tracking system for the vast majority of folks by a long time (see :w:en:Pareto principle).
(Furthermore, thinking back on who and how many people worked on the Phabricator migration for how long and salary levels in 2014 I honestly don't think that the 2014 migration cost "much more than 1M$".)
I very much agree with your last sentence. AKlapper (WMF) (talk) 05:40, 3 September 2020 (UTC)Reply

Software freedom


The consultation mentions the «guiding principle of Freedom and open source». Good, but it is still missing a clear statement that only solutions based on fully free software will be considered, including whatever SaaSS they rely on. Is it considered obvious? If so, there is no reason to avoid spelling it out. Is it not obvious? Then we need a frank conversation about it.

This is not a theoretical issue; we already know that some things can be problematic, for instance ReCaptcha. In other software migrations by WMF in the past, features to handle spam and abuse proved inadequate far too late in the process, and we ended up disabling core workflows for months, sometimes years.

Please don't ignore what other people are doing; write a summary of other free software orgs' experiences and studies. Otherwise, the only available resource remains https://libreplanet.org/wiki/Fsf_2019_forge_evaluation (which is rather negative about GitLab) and it will look like you're dismissing freedom as a concern, even if that's not true. Nemo 05:07, 3 September 2020 (UTC)Reply

While I don't fully agree with some of the evaluation criteria chosen by the FSF or with the relevance of some of the listed bullet points for Wikimedia, the specific topic of using ReCaptcha in some situations bothers me too.
Regarding ReCaptcha, the related upstream ticket is probably https://gitlab.com/gitlab-org/gitlab/-/issues/21998; see also https://gitlab.com/gitlab-org/gitlab/-/issues/25312 and https://gitlab.com/gitlab-org/gitlab/-/issues/26254. AKlapper (WMF) (talk) 18:41, 4 September 2020 (UTC)Reply
Re: ReCaptcha, it doesn't appear to be enabled by default in the Community Edition. Thanks for raising the larger point about needing to mitigate spam/abuse without relying on unacceptably invasive 3rd-party services. BBearnes (WMF) (talk) 22:15, 8 September 2020 (UTC)Reply

Tangent: GitHub open core?


Personally, as far as usability for development independent of the Travis ecosystem, I'm fine with any code review system, be that GitLab, Gerrit, or GitHub. But I understand why vendor lock-in and lack of access to source (plus some other items) rules out GitHub. Silly question: does anyone know, though, whether GitHub plans to go head-to-head with GitLab with an open core model? Their business approach seems to be oriented around their platform via freemium, so I'm guessing not, but wondered if that had been explored. I realize the question here is to GitLab or not to GitLab, but wanted to understand that assumption. ABaso (WMF) (talk) 11:29, 3 September 2020 (UTC)Reply

I think Microsoft intends to use github to drive customers to their cloud business. MModell (WMF) (talk) 10:59, 5 September 2020 (UTC)Reply
GitHub has an on-premises offering: they ship you a VM to run on your own infrastructure, but the code is still proprietary. That is a pretty common request in big corporations which have specific requirements to isolate different business units from each other, enforce network policies, etc.
I don't think GitHub compares with GitLab business-wise. GitHub is now part of Microsoft's offering and will surely be closely integrated with their cloud (as Mukunda stated). I don't think they have much incentive to become open source. Fun thing: GitLab is going public on November 18th 2020. Antoine "hashar" Musso (talk) 20:12, 15 September 2020 (UTC)Reply

Tangent: can teams dependent on GitHub cut over?


I know CI is out of scope, but for the very small number of WMF teams dependent on GitHub, if we adopt GitLab, can they cut over to GitLab and still operate efficiently? If not, it seems like we may have some exceptional cases. And I personally think that's fine, but I'd like to check my assumption that cutting over isn't too big a deal for them. ABaso (WMF) (talk) 11:34, 3 September 2020 (UTC)Reply

Determining whether those teams can operate effectively in GitLab is a good use of the GitLab test instance: https://gitlab-test.wmcloud.org/
There is a CI runner there. I was able to set up a CI job for one of my repos to experiment with GitLab: https://gitlab-test.wmcloud.org/release-engineering/tools-release/-/pipelines/12 TCipriani (WMF) (talk) 18:58, 3 September 2020 (UTC)Reply
Nice. ABaso (WMF) (talk) 11:01, 5 September 2020 (UTC)Reply
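For anyone else experimenting: a pipeline on the test instance only needs a `.gitlab-ci.yml` at the repository root. A minimal sketch (the image and commands here are illustrative, not what the linked pipeline actually ran):

```yaml
# Minimal illustrative pipeline: one lint job that runs on every push.
stages:
  - test

lint:
  stage: test
  image: python:3.8
  script:
    - pip install flake8
    - flake8 .
```

Once this file is committed, the instance's shared runner should pick up the job automatically.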

Support topic labels that span multiple repositories


I'm pretty confident people can learn to use GitLab just fine, as the shift to a pull request model is doable. Personally, I find the mental model of pull requests tricky at times, but have come to be okay with it; I know many love it!

But I digress. One potential wrinkle is support for source code topic labels that span multiple repos. I had heard there's a workaround for GitLab to achieve this useful functionality of Gerrit, but would someone please shed some light on how this would work? ABaso (WMF) (talk) 11:46, 3 September 2020 (UTC)Reply

I took the liberty to rephrase this topic title (from General support on code review, but one potential requirement risk). Antoine "hashar" Musso (talk) 22:00, 15 September 2020 (UTC)Reply
Yeah, Gerrit's global topic / hashtag feature is very useful for complex feature development, especially when it involves public interface changes in MediaWiki core. I imagine it would be very useful for dashboards / workflows for volunteer patch review too, if we were to properly resource volunteer support. Tgr (WMF) (talk) 03:12, 1 October 2020 (UTC)Reply

Packages (Composer etc)


I know GitLab provides a built-in package repository. Will we be using this?

Also, will it be possible to ping external package repositories (Packagist etc.) from GitLab, to keep packages up to date elsewhere? Sam Wilson 01:06, 4 September 2020 (UTC)Reply

Both of these questions might be a touch out of scope for the code review discussion, but then again they do seem to impinge on deployment, for some values of "deployment". Which is just to say that I don't have detailed answers but did want to make sure they're addressed all the same...

I know GitLab provides a built-in package repository. Will we be using this?

We don't currently have any plans around that feature. I believe it's available in the Community Edition, and I can imagine finding uses for it at some point in the future, but it's not something we've discussed or tested extensively.

Also, will it be possible to ping external package repositories (Packagist etc.) from GitLab, to keep packages up to date elsewhere?

While I haven't explored this in detail, I think the answer is probably yes, one way or another. There are probably security policy considerations around credential storage and the like, but at minimum it's something you could do in a CI job if it wasn't supported any other way. BBearnes (WMF) (talk) 16:06, 8 September 2020 (UTC)Reply
Thanks Brennen, that's sort of what I thought, but it's good to get it clarified. Certainly, the status quo with Gerrit is that it's frustrating to update Packagist, so it doesn't matter if it's also a bit tricky in Gitlab! :-) Sam Wilson 02:19, 9 September 2020 (UTC)Reply
A bit off topic, but we can simplify publishing to packagist.org. Package hosting is really a different topic; I will give some context below.
Publishing to package repositories will remain equally tricky until we tackle the issue and add whatever logic is needed to update Packagist after a tag is created :] That is definitely doable right now:
  • Gerrit might be enhanced with webhooks, but I don't see how credentials would get handled.
  • We could have a CI job running after a tag is created which would poke packagist.org and trigger a crawl of the repository. I guess the devil is in creating a user for this, adding that user to the repository we maintain on Packagist, and crafting the job that does a POST to packagist.org.
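The CI-job approach could be quite small. A sketch, assuming credentials are stored as protected CI variables (`PACKAGIST_USER` and `PACKAGIST_TOKEN` are hypothetical variable names, and the endpoint is Packagist's update-package API as I understand it):

```yaml
# Hypothetical job: ask packagist.org to re-crawl the repo when a tag lands.
update-packagist:
  stage: deploy
  rules:
    - if: $CI_COMMIT_TAG
  script:
    - >
      curl -fsS -X POST
      "https://packagist.org/api/update-package?username=${PACKAGIST_USER}&apiToken=${PACKAGIST_TOKEN}"
      -H 'Content-Type: application/json'
      -d "{\"repository\":{\"url\":\"${CI_PROJECT_URL}\"}}"
```

`CI_COMMIT_TAG` and `CI_PROJECT_URL` are predefined GitLab CI variables, so the job only runs on tag pipelines and reports the right repository URL.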
We had some discussions about self-hosting our packages (e.g. running our own instances of packagist.org, npmjs.com) but instead went with snapshotting dependencies in a vendor.git repository or relying on lock files when building Docker images. We have one for the Java/Maven world ( https://wikitech.wikimedia.org/wiki/Archiva ) and use that to deploy to production.
Having our own package hosting might let us deploy to production from it instead of relying on vendor.git snapshots, but with the move toward Docker images and lock files I guess it is less compelling now. For Composer/npm, we would surely also have to publish to packagist.org / npmjs.com, which are the default indices, to save people the trouble of having to refer to our repositories. So I guess we collectively gave up on the idea :] Antoine "hashar" Musso (talk) 20:42, 15 September 2020 (UTC)Reply
As a quick update re: Packagist specifically, I just noticed there's a Packagist integration out of the box. BBearnes (WMF) (talk) 22:30, 30 September 2020 (UTC)Reply

Not convinced; let's stay with Gerrit

I am a (not particularly experienced) MediaWiki core and extension developer with +2 capabilities. I have been involved in code review for some years, contributing patches and reviewing others'. I am also a Gerrit Manager, which grants me some admin-level capabilities like creating new repos and groups. My experience with Gerrit is overall positive, and with all due respect I do not find the arguments offered for the migration particularly compelling or strong enough. Some thoughts:
  • GitLab seems by design a copy of GitHub, also in its workflow; that includes pull requests, which are very confusing and hard to work with in my experience. This sentiment seems to be shared by others below, and this would be a disruptive change.
  • The numbers do not seem to support a migration either: as stated in File:Developer Satisfaction Survey - 2020 vs 2019 comparison.png, code review satisfaction was higher in the last survey. While File:Developer Satisfaction Survey - 2020 - Averages by role.png indicates that volunteers might be unsatisfied (missing question: for how long have you been contributing code via Gerrit? Usually new developers have difficulties because the docs ain't great), overall the graphics show that our code review model is satisfactory. If we take the data from File:Developer Satisfaction Survey - 2020 - Code Review - satisfaction.png, 63 out of 75 respondents (84 %) are satisfied with Gerrit code review. And that was before we updated Gerrit.
  • We recently upgraded Gerrit to version 3. See phab:T254158. I don't think we should throw away the huge amount of work done by the many people involved in making that happen without giving this new Gerrit at least a try. Let's wait a year or two and see how things go with our new install?
  • We have built and maintained a huge infrastructure around Gerrit (and Phabricator) over the years: plugins, tools, integration. All of that would need to be created ex novo, rewritten or ditched, which would also require huge amounts of work to restore previous functionality.
  • I am also not convinced about the alleged friction in creating new repositories. I count 26 people able to create repositories (more if you add non-overlapping ldap/gerritadmin folks). Granted, requests at Gerrit/New repositories/Requests are mostly processed by QChris and sometimes myself only; but with that many people able to create repositories, it should not be a problem to find someone to create one for you. Gerrit also allows creating repositories and groups directly from the UI, and this has improved in the new Gerrit version.
  • Also, while the documents mention GitLab would be self-hosted, I think I should mention that as a GitLab user I've just received an email from them saying that they're cutting CI/CD minutes for non-paid users from 2,000 minutes per month to 400 minutes per month, and charging 10 bucks per 1,000 extra minutes. Would that affect us?
Overall while GitLab might be more visually appealing, I do not think the "looks better" argument is enough, nor that at this stage a migration is justified and therefore respectfully oppose this proposal. —MarcoAurelio (talk) 18:20, 7 September 2020 (UTC)Reply
I'll pick up some points and share my views and opinions. :)
Re "which are very confusing and hard to work with in my experience.": I'd say that Github is quite popular these days when it comes to hosting code (having seen things like Sourceforge, Google Code, or Gitorious). It feels like a lot of people outside Wikimedia will say "confusing and hard to work with" about Gerrit instead. See mw:Gerrit/Tutorial (which has been no fun to update) for some obstacles: For example, why am I supposed to install software like "git-review" in addition to git itself?
Re "We recently upgraded Gerrit to version 3": The pure "we invested in this in the past" argumentation feels like a psychological trap when deciding about the future: the past is past anyway; you cannot change it. In consequence this either argues that the longer a system has been running, the less it should ever be replaced by another system, or it argues that if there are any vague thoughts of re-evaluation around, then you should already start to let existing systems rot and not maintain them. (IIRC Gerrit 2.x was unsupported, so the update to 3.x was needed no matter what comes after 3.x.)
Re "We have built and maintain a huge infrastructure around Gerrit (and Phabricator) over the years": Until 2014 we had built and maintained a huge infrastructure around Bugzilla, including hacks like Bingle or Bugello etc, plus I remember breakage of tools such as GerritBot a few times when we upgraded Bugzilla to a minor (!) new version. The maintenance burden of all that complexity was one argument why to drop Bugzilla, RT, Mingle, Trello, and go for Phabricator. (If we succeeded in reducing that complexity is another discussion though.)
Re "I am also not convinced about the alleged friction creating new repositories.": Personally speaking, I gave up requesting repos for my (very limited, as I'm not a dev) stuff in Wikimedia Gerrit. I want to quickly drop a code repo (of which in my case I'm likely the only user) in public, without having to ask someone first and wait. Which is what some people have been doing in Category:Extensions_which_host_their_code_in-wiki (and will continue to do if they don't want to learn using a version control system which would allow others to pull code changes programmatically, but I guess that the current repo request process adds to that reluctance). AKlapper (WMF) (talk) 08:40, 8 September 2020 (UTC)Reply
I've never been a fan of the "trending" or "popularity" argument, which has always fallen flat for me; otherwise, let's just start using GitHub instead of crafting our own pseudo-GitHub install while breaking things in the process. We should be using what is best for us and fulfils our goals. When 84% of the respondents to the survey showed satisfaction with the (old) code review system, I'd say we are achieving that goal.
Actually, as a volunteer, my main concern with code review is not Gerrit but the time it sometimes takes to have a patch reviewed, even in mediawiki/core.git (a sentiment shared by many of my fellow volunteer code contributors in talks I've had with them); and that's something that neither Gerrit, Differential, GitHub, GitLab nor BitBucket, to name a few, can fix. We've had initiatives like the Code Review Office Hours and WMDE's technical advice IRC sessions; both of them sadly gone, but they were helpful for volunteers to find the right someone to review their patches, get second opinions, and even get real-time code review for complex patches. If we rescued some of those, maybe volunteer satisfaction would increase again.
As Ariel said above, git-review is optional and you can simply git push origin HEAD:refs/for/$branch instead. In any case, getting git-review up and running is just a matter of pip install git-review; hardly a burdensome task IMHO. Also, for some time now Gerrit has allowed you to create and edit patches directly via the UI, so you don't even need to install git to contribute minor or simple patches.
This is also not about being stagnant. I pushed for the update from Gerrit 2 to Gerrit 3 several times, and I'm pushing for the upgrade from Mailman 2 to Mailman 3 too. What makes little to no sense to me is the timing of this proposal, when not even a year has elapsed since the Gerrit 2 to 3 upgrade, and people seem to be happy with the new version of Gerrit, its renewed UI and new features. I think it'd be more sensible to wait and see how things go before making breaking changes.
I am also not sure I follow you when you mention the difficulty of deleting repos. Gerrit repositories can be deleted by clicking a "delete project" button in the UI, but that option is restricted to Gerrit Managers and Administrators for obvious security reasons. Maybe Git(Hub|Lab) allows individual users to delete repos they "own", but that's a pretty minor thing to care about IMHO. Creating projects in Phabricator is several orders of magnitude less dangerous than deleting a repository, and (rightly) not everyone is allowed to do so.
Apart from that, it is our policy not to delete repositories but to archive them. I guess that's not only due to licensing issues, but also to preserve historical knowledge. Any project owner can mark a Gerrit repo as read-only to archive it. While it is true that GitHub/GitLab might allow specific users to delete specific repos, in my experience as one of the most active users in the phab:tag/projects-cleanup project, deletions are rarely requested or even granted.
Best regards. —MarcoAurelio (talk) 10:01, 8 September 2020 (UTC)Reply
> When 84% of the respondents to the survey showed satisfaction with the (old) Code Review system I'd say we are achieving that goal.
About the 84% number -- I can't remember if 3 was explicitly defined but if I understand it, that score is neutral, neither satisfied nor dissatisfied. So for me, I'd look at the 4 and 5 scores as an indicator of whether we are doing well, and then it's 57.3% of respondents or 43 people who feel this way.
A broader point is that the survey doesn't have answers from those who didn't fill it out, because they never became contributors. Personally I don't think gerrit is the main problem here but I also don't doubt that the Gerrit UX and workflow (different from the pull request model that many developers are familiar with) has turned away casual contributors from contributing to Wikimedia projects.
> and that's something that neither Gerrit, Differential, GitHub, GitLab or BitBucket to name a few can fix.
One technical improvement that GitLab has that I think could help make a difference with volunteer code review is the ability to use its Web IDE to quickly type up an inline suggestion for a block of code, which the submitter of the patch can then approve. It also seems that in general multiple people collaborating on the same change request is easier done in GitLab than it is via Gerrit.
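For reference, the inline-suggestion feature works by writing a fenced `suggestion` block inside a review comment on a diff line; the patch author can then apply it with one click, which creates a commit on the source branch. A sketch from memory (GitLab normally uses backtick fences; tildes are shown here only to avoid nesting, and the replacement line is invented):

```markdown
This should be escaped before output:

~~~suggestion
$out = htmlspecialchars( $in );
~~~
```

When applied, the suggested line replaces the commented line in the merge request's branch, with the reviewer credited as a co-author.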
> We've had initiatives like the Code Review Office Hours and the WMDE's technical advice IRC sessions; both of them sadly gone but that were helpful for volunteers to find the right someone to review their patches, get second opinions and even real-time code review for complex patches. If we rescued some of those, maybe volunteer satisfaction would increase again.
Yes, I think this is something we should be talking about as well, just not sure where exactly to start that conversation. KHarlan (WMF) (talk) 10:30, 8 September 2020 (UTC)Reply
About the 84% number -- I can't remember if 3 was explicitly defined but if I understand it, that score is neutral, neither satisfied nor dissatisfied.
Checked the 2020 survey: the question was "Satisfaction with code review", it was a required question, 1 was defined as "Very dissatisfied", and 5 was defined as "Very satisfied".
My assumption was 3 was neither satisfied nor dissatisfied -- almost skipping the question. TCipriani (WMF) (talk) 22:55, 8 September 2020 (UTC)Reply
Thanks for the reply! Hmm, I didn't mention deleting repos. Was that part maybe meant as a reply to another comment? AKlapper (WMF) (talk) 10:15, 8 September 2020 (UTC)Reply
Sorry, I understood the last paragraph of your comment above that you wanted an easy way to delete/archive repositories. Apologies if I misunderstood. —MarcoAurelio (talk) 10:18, 8 September 2020 (UTC)Reply
Ah, I see now! I should have written "drop code into a new repo" instead. Sorry, my fault! AKlapper (WMF) (talk) 10:40, 8 September 2020 (UTC)Reply
> I gave up requesting repos for my (very limited, as I'm not a dev) stuff in Wikimedia Gerrit. I want to quickly drop a code repo (of which in my case I'm likely the only user) in public, without having to ask someone first and wait.
But do you really expect that in Gitlab all users will just "quickly create new repos" at their own will without having to ask and wait for anyone? Isn't it much more likely that we will have some "Wikimedia" organization umbrella like on Github and inherit privileges and we don't want people to create repos with random names, just like we don't want people to just create random project tags on Phabricator? Mutante (talk) 16:30, 18 September 2020 (UTC)Reply
Right. I don't expect (and don't want) that everyone can create random repos in GitLab, but given Groups in GitLab I could imagine a repo creation self-service to broader number of people than currently in Gerrit (plus a more obvious UI to offer that). AKlapper (WMF) (talk) 16:44, 18 September 2020 (UTC)Reply
A quick counterpoint: I've never installed git-review, and I know other folks who have never bothered either. We could add documentation of workflows with just git, where by "we" I mean $someone. ArielGlenn (talk) 08:45, 8 September 2020 (UTC)Reply
I too have never used git-review. Using gerrit from the CLI is really as simple as remembering one command:
git push origin HEAD:refs/for/$branch/$topic
It's the gerrit GUI which leaves a lot to be desired. MModell (WMF) (talk) 21:04, 8 September 2020 (UTC)Reply
The workflow is documented in Gerrit own documentation https://gerrit.wikimedia.org/r/Documentation/intro-user.html which covers uploading a change, a new patchset or setting a topic.
The push command is rather simple and I often rely on it.
git-review is merely a helper so you don't have to remember the URL or chain commands. To cherry-pick a change I would use git-review -x 12345 instead of copy-pasting the command from the Gerrit web UI or using the long git fetch origin refs/changes/45/12345/8 && git cherry-pick FETCH_HEAD. It also comes with a helper to easily compare two patchsets: git-review --compare 12345,2 would fetch patchset 12345,2 and the latest patchset of change 12345, rebase both of them against the tip of the branch, and show their difference. Though really, comparing patchsets is usually better done via the web UI :]
If I remember properly, the main blocker for using git-review is having to install Python and git-review itself, and having it in the command PATH. For the record, the installation is extensively documented at https://www.mediawiki.org/wiki/Gerrit/git-review Antoine "hashar" Musso (talk) 07:32, 18 September 2020 (UTC)Reply
Thanks for mentioning this, I probably should have written in my previous comment that I'm aware that it is possible not to use git-review.
Let me elaborate on the rabbit hole:
It is possible to document more "you can also do XYZ instead" options. This makes instruction pages longer and requires new contributors to make more decisions themselves, instead of following one list of instructions. Longer pages often lead to someone creating a "the original page got too long, so here is a nutshell version" page. Then often one of these several pages does not receive updates and gets outdated and/or out of sync. Furthermore, I've seen it several times that some folks run into the "new" page, they think that something what they consider important is missing, they add it, the page gets longer and longer, and the whole game starts again. Welcome to what I personally call "the vicious documentation circle". AKlapper (WMF) (talk) 08:52, 8 September 2020 (UTC)Reply
Yeah, that's why I used the universally recognized handwaving $someone instead of "you" (or of course "me"). Maintaining docs is hard, and kudos to those who do it. ArielGlenn (talk) 09:02, 8 September 2020 (UTC)Reply
For the record:
Also, while the documents mention GitLab would be self-hosted; I think I should mention that as a GitLab user I've just received an email from them telling me that they're cutting CI/CD minutes to non-paid users from 2,000 minutes per month to 400 minutes per month; and charging 10 bucks per 1,000 extra minutes. Would that affect us?
No it wouldn't. We'd be self-hosting GitLab CE and run CI jobs on our own hardware, just as we do now. Greg (WMF) (talk) 05:10, 14 September 2020 (UTC)Reply

GitLab seems in design a copy of GitHub, also in their workflow; that includes Pull Requests, which are very confusing and hard to work with in my experience. This sentiment seems to be shared by others below and would be a disruptive change.

I quite agree with this sentiment. The pull request workflow is not only confusing but also several orders of magnitude inferior to Gerrit's advanced workflow. – Ammarpad (talk) 19:34, 7 September 2020 (UTC)Reply
> The pull request workflow is not only confusing but is also several orders of magnitude inferior to Gerrit advanced workflow.
@Ammarpad Do you mind elaborating on this, i.e. what you find inferior / confusing? (Maybe in a different topic on this page) KHarlan (WMF) (talk) 09:25, 8 September 2020 (UTC)Reply
@KHarlan (WMF) Sorry, I hadn't seen this (and I'm not sure where to reply apart from here). I am glad TCipriani (WMF) wrote what he linked below. That's partly what I'd have said. In short, the 'feature branching' is very confusing and extremely demanding on the patch author (create branch, upload, delete branch, re-fork next time), and for reviewers there is the need to review multiple commits for each review.
Any of these on its own is inferior to the patchset workflow of Gerrit, and in combination I am not sure what to call them.
The latter part, about the obvious disadvantages and even problematic aspects of the GitLab workflow, has been explored more by @Michael in Talk:GitLab/2020 consultation#h-Squash_Merge_considered_harmful-2020-09-29T19:27:00.000Z just today. – Ammarpad (talk) 07:31, 30 September 2020 (UTC)Reply
Tried to capture a little bit of the concern about merge requests in the Integration: Merge requests and patchsets topic. TCipriani (WMF) (talk) 15:41, 9 September 2020 (UTC)Reply
I want to add some extra stats from the qualitative section of the Dev. Satisfaction Survey from 2019 for your consideration. I did not analyze that portion of the 2020 survey so I don't have numbers for those.
There were about 23 negative comments on gerrit. You can see the sanitized comments and their counts here under the "developing" section: Developer Satisfaction Survey/2019#Analysis (Survey)
There were about 11 positive comments about gerrit. I went back to the original data to count these. JHuneidi (WMF) (talk) 23:17, 16 September 2020 (UTC)Reply

Integration: Merge requests and patchsets

One aspect of a migration to GitLab that has been touched on in other discussions is that of integration.
Integration is the process of combining a new piece of code into a mainline codebase. Our mainline codebase under Gerrit and, presumably, under GitLab is a mainline branch: a shared sequence of commits that records changes.
The mechanism by which code is integrated under Gerrit is a patchset. A patchset is a single commit that represents the difference between itself and a target branch; typically the target branch is the mainline branch. The mechanics of combining patchsets with mainline vary by repo: a patchset may either be merged (creating a merge commit) or fast-forwarded (no merge commit), meaning the patchset is the only commit added to the target branch. In addition to a git commit SHA-1, each patchset has a "Change-Id", an ID that is searchable in Gerrit and points to a given patchset. Patchsets may be chained: when one commit is the parent of another commit, pushing those commits to Gerrit creates a chain of patchsets. The child patchsets may not be merged independently without the parent patchsets having been merged. The mechanism to update a patchset is to push a new commit (or chain of commits) with the same Change-Ids as the patchsets you wish to update to the special refs/for/[target branch] reference.
The mechanism by which code is integrated under GitLab is the merge request. Each merge request consists of a source branch and a destination branch. The source branch contains one or more commits not contained in the destination branch, along with a description of the intent of the merge request. The mechanics of combining merge requests are a combination of the repository settings and the merge request's settings. A merge request may either be squashed (creating a single commit) or each commit may be left separate. Additionally, a merge request may be combined by merging (creating a merge commit) or fast-forwarded: adding the commits in the merge request to the tip of the mainline branch without creating a merge commit. The mechanism to update a merge request is to push a new commit to the source branch, or to --force push to the source branch. Generally, force pushing a source branch is not recommended, as review discussion may become confusing.
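The difference in update mechanics is easy to demonstrate with plain git. A minimal sketch, using a throwaway local bare repository as "origin" (no real Gerrit or GitLab server involved; remote, branch, and file names are illustrative):

```shell
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/origin.git"      # stand-in for the server
git init -q "$tmp/work"
cd "$tmp/work"
git config user.email dev@example.org
git config user.name Dev
git remote add origin "$tmp/origin.git"

echo one > file && git add file && git commit -qm "Feature: first draft"

# Gerrit style: amend the same commit (on a real server, the unchanged
# Change-Id footer maps this to a new patchset of the same change) and
# re-push to the refs/for/ namespace.
echo two >> file
git commit -qa --amend --no-edit
git push -q origin HEAD:refs/for/master

# GitLab style: add a follow-up commit to the merge request's source branch.
echo three >> file
git commit -qam "Address review comments"
git push -q origin HEAD:refs/heads/my-feature
```

Note the resulting histories: the Gerrit-style update leaves a single (rewritten) commit, while the GitLab-style update accumulates commits on the source branch until they are merged or squashed.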
The branching pattern most frequently used with merge-requests is feature branching; that is, putting all work for a particular feature into a branch and integrating that branch with the mainline branch when the feature is complete.
The branching pattern most frequently used with patchsets is what Martin Fowler has termed continuous integration with reviewed commits. That is, there is no expectation that a patchset implements a full feature before integrating it with the mainline branch, only that it is healthy and has had its commits reviewed.
The branching pattern is not necessarily tightly coupled with the tooling, for example, a merge-request could be created with a single commit that itself does not implement an entire feature: this is a merge-request that is not using feature branching. Each tool does, however, encourage using their respective branching mechanisms.
There are various aspects to explore here:
  1. Workflow/branching pattern changes
  2. Review changes
  3. Integration frequency changes
  4. Necessary tooling changes TCipriani (WMF) (talk) 00:27, 9 September 2020 (UTC)Reply
Per working group discussion I've added a few definitions from this topic to the main page. TCipriani (WMF) (talk) 17:04, 21 September 2020 (UTC)Reply
For context, the previous discussion in Talk:Wikimedia Release Engineering Team/GitLab#h-Feature_requests-2020-07-01T15:48:00.000Z is probably relevant here. It describes some of the current use cases about strings of patches and potential ways of having similar workflows in Gitlab (but it looks like currently there isn't an obvious way to implement a similar workflow). GLederrey (WMF) (talk) 12:31, 14 September 2020 (UTC)Reply
The way I've seen this work in Gitlab and Github is that CI tooling will automatically check the head of the branch. You can choose to test the head with or without a rebase. If three patches are submitted at once, only the final patch is tested. If a new patch is submitted as a test is running, that test is canceled and the new patch is tested.
Testing can also be configured to gate the branch merge to master.
The norm is to squash all patches on a branch anyway, but TCipriani's question highlights that we might need to *enforce* squashing, otherwise we could end up with non-passing commits and potentially someone might roll back to an inconsistent point. But maybe this is already handled by a simple merge commit, which makes it clear that the intermediate patches are *not* on master. Adamw (talk) 09:23, 18 September 2020 (UTC)Reply

The norm is to squash all patches on a branch anyway, but TCipriani's question highlights that we might need to *enforce* squashing, otherwise we could end up with non-passing commits and potentially someone might roll back to an inconsistent point. But maybe this is already handled by a simple merge commit, which makes it clear that the intermediate patches are *not* on master.

This emphasizes a really important point here: Should every commit to the master / main branch represent a known-good, deployable state (as far as we're capable of achieving that)? We do our best to achieve that currently, at least on repos like core, which does seem like it militates in favor of squashing commits by default. BBearnes (WMF) (talk) 20:40, 21 September 2020 (UTC)Reply
To me squashing depends on what those patches were. If the patch chain is the following, it should probably be squashed:
Primary feature implementation -> typo in Foo.php -> correct comments in Bar
If the patch chain is the following, it needs to not be squashed because this is a useful separation point for future bisecting:
Refactor to allow implementation -> Implement feature EBernhardson (WMF) (talk) 20:54, 21 September 2020 (UTC)Reply
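Git's autosquash machinery can express exactly this distinction locally: fixup commits get folded into their target while real separation points survive as their own commits. A minimal local sketch (repository, file, and messages are invented):

```shell
git init -q demo2 && cd demo2
git config user.email dev@example.org && git config user.name Dev
echo a > Foo.php && git add Foo.php
git commit -qm 'Refactor to allow implementation'
echo b >> Foo.php && git commit -qam 'Implement feature'
# A later typo fix recorded as a fixup of the feature commit:
echo c >> Foo.php && git commit -qa --fixup=HEAD
# Replay the branch non-interactively; the fixup is folded into
# 'Implement feature' while the refactor stays a separate commit:
GIT_SEQUENCE_EDITOR=true git rebase -i --autosquash --root
git log --format=%s
```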
Unfortunately, in my experience, in the real world in systems where force-rewrite of open PRs isn't available (most FLOSS GitHub and GitLab repos), people end up mushing multiple feature commits and fixups into the same chain.
A 'simple' example, with committer shown in square brackets ahead of each commit:

[A] First feature > [B] typo fix > [B] addition of test suite > [C] Second, dependent feature > [A] failing expansion and modification of test suite based on feedback from the second feature > [C] fix of first feature

Squashing this stack is pretty bad, throwing away the separation of the two main features, and the authorship blame. Not squashing this stack is also pretty bad, littering the history with nonsense commits, making git bisect vastly harder, and creating toxic, test-failing branch points.
Theoretically you can dump the MR, git rebase -i the stack to make history "sensible" with just two commits, and then re-push it as a pair of MRs (one with the first feature commit, the other with both), the second of which screams "DO NOT MERGE UNTIL YOU MERGE MR X FIRST!" manually. But this loses the history of the discussion on what's best to do, still loses the kudos/blame of some of the contributors, and is an epic extra piece of work.
Of course, GitLab could be extended (either by us or upstream) to add features to manage this (turning the 'squash' button into a complex form to let the merge select arbitrary squash/fixup/rebase actions on a per-commit basis), but that's a huge undertaking, taking GitLab quite far away from the simple branch model it's built around so upstream may well not be keen, and said code has to be written and supported by someone.
This workflow is one that I personally do up to a few times a day, pretty much every day. It's the core of MW's development model. I know that a few areas of our codebase don't use this model and don't have the complexity of multi-feature inter-related development, but they're the simple exceptions, and it feels like we're focussing on toy development rather than the main stream of our work in all the analysis. It's not an "oh well" to lose it; it's going to be pretty seriously disruptive. Jdforrester (WMF) (talk) 11:19, 22 September 2020 (UTC)Reply
I haven't run into the issue of force-rewrite on open PRs being disabled, but indeed that would make my current experiments with finding a reasonable workflow completely useless. If the only option in a PR is to continually add more code that will be squashed into a single patch, I worry the development history and general experience of performing code review is going to suffer for anything of complexity. EBernhardson (WMF) (talk) 15:02, 22 September 2020 (UTC)Reply

If the patch chain is the following, it needs to not be squashed because this is a useful separation point for future bisecting:

Refactor to allow implementation -> Implement feature

Good point, in this case with a squash workflow the feature would have to be split into two branches. Adamw (talk) 07:54, 22 September 2020 (UTC)Reply
How does that work though? As far as I can tell gitlab has no affordance to split a PR into two branches. If branch A is the refactor, and branch B is the new feature, then as far as gitlab is concerned a PR on B is a PR for A+B and it can be merged without consideration of the A PR. EBernhardson (WMF) (talk) 14:53, 22 September 2020 (UTC)Reply
The premium version has a merge request dependencies feature that addresses what's needed here.
I'm not entirely satisfied with any other mechanisms (aside from merge request dependencies) for splitting merge-requests and having them depend on one another. The "smartest" thing I could think to do is to have dependent merge requests target other merge-request branches. For example, I split !4 into !4 and !5. In !5 I targeted the master branch and in !4 I targeted the work/thcipriani/beautiful-soup-dependency branch (the branch from !5). After merging !4 the merge showed up in !5 rather than in master where it could cause issues. I suppose that's desirable in terms of behavior, but there are a few problems with this:
  1. History becomes messy. Maybe this could have been avoided had I used some other options in merging.
  2. It's non-obvious that it's not merged to master
  3. I wasn't prevented from merging the dependent patchset, it merely mitigated any risk of merging it
With the general advice on getting speedy code review being to split your patchsets it'd be nice to have this be a more supported path. It's noteworthy that there are many open issues about creating a merge-request splitting tool. TCipriani (WMF) (talk) 17:19, 22 September 2020 (UTC)Reply
We're just talking about the gitlab UI, I think? From the commandline, let's say you have patches (1, 2, 3, 4) that make up a branch "A", and you want to split (1, 2) into its own merge request. To do that, check out patch 2 then "git branch <new name>" or "git checkout -b", and push that.
Agreed that stacking merge requests can get tricky--but you can usually get the desired effect by carefully choosing the merge target for your PR. If I have branches A and B stacked on each other, then A will be merged to master but B will be "merged" to A. This prevents the UI from showing all of the branch A commits as if they were part of B. Adamw (talk) 18:25, 22 September 2020 (UTC)Reply
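The command-line split described above can be sketched entirely locally (repository, file, and branch names are invented):

```shell
git init -q demo3 && cd demo3
git config user.email dev@example.org && git config user.name Dev
for n in 1 2 3 4; do
  echo "$n" > "patch$n.txt"
  git add "patch$n.txt"
  git commit -qm "patch $n"
done
git branch A                # full branch: patches 1-4
git branch A-part1 HEAD~2   # new branch holding only patches 1 and 2
git log --format=%s A-part1 # -> "patch 2", "patch 1"
# Each branch would then be pushed and opened as its own merge request,
# with A targeting A-part1 so only patches 3 and 4 show in its diff.
```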
Let me add a workflow that SRE uses in gerrit and is pertinent I believe to the integration topic.
An SRE pushes a topic branch in the puppet repo. Every single one of the commits in the topic branch needs to be merged and deployed individually, after having been reviewed (hopefully) and CI has +2ed it. Rebasing might be needed, but that's also expected in the current workflow. The reason is that every single one of those commits has state-changing consequences for at least part of the server fleet, and the SRE in question is expected to merge it, "deploy" it, and perhaps even trigger multiple puppet runs (alternatively, they can wait the full 30 minutes it currently takes for puppet changes to be reliably distributed to the entire fleet).
The most recent example I can think of is https://gerrit.wikimedia.org/r/q/topic:%22623773%22+(status:open%20OR%20status:merged).
How will SRE have to adapt that workflow for gitlab? Create a separate MR per change? Using a single MR clearly doesn't cut it (right?), but on the other hand having to go through the process of manually creating 4 or 5 MRs for something that is automatic in Gerrit isn't great either. AKosiaris (WMF) (talk) 20:20, 22 September 2020 (UTC)Reply
I made a concrete example of this on our gitlab-test instance
Of Note
  • I used merge request labels in the place of topics
  • This is a series of patchsets, but they have no semantic relationship to one-another
  • My interaction with this repo was purely through the git client and no other programs
From my side the steps were:
  1. Create my work locally as a series of commits
  2. Use push options to make a merge-request for each patchset
This looked like:
$ echo '5' > COUNTDOWN
$ git commit -a -m 'Start countdown (1/5)'
$ echo '4' > COUNTDOWN
$ git commit -a -m 'Decrement countdown (2/5)'
...
$ git push \
  -o merge_request.create \
  -o merge_request.target=production \
  -o merge_request.remove_source_branch \
  -o merge_request.title="COUNTDOWN (1/5)" \
  -o merge_request.label='T1234' \
  gitlab-test \
  HEAD~4:work/thcipriani/T1234
Enumerating objects: 4, done.
Counting objects: 100% (4/4), done.
Delta compression using up to 4 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 327 bytes | 327.00 KiB/s, done.
Total 3 (delta 1), reused 0 (delta 0), pack-reused 0
remote:
remote: ========================================================================
remote:
remote:     A test instance of GitLab for the Wikimedia technical community.
remote:   Data may disappear. Consider everything here potentially public, and
remote:                     do not push sensitive material.
remote:
remote: ========================================================================
remote:
remote: View merge request for work/thcipriani/T1234:
remote:   https://gitlab-test.wmcloud.org/thcipriani/operations-puppet/-/merge_requests/1
remote:
To gitlab-test.wmcloud.org:thcipriani/operations-puppet.git
 * [new branch]            HEAD~4 -> work/thcipriani/T1234
As is already mentioned, this could be wrapped in a friendlier interface. TCipriani (WMF) (talk) 13:53, 23 September 2020 (UTC)Reply
I've iterated a bit on your take. Mostly a for loop to go through all the changes. What I got is at https://gitlab-test.wmcloud.org/akosiarisgroup/simplegroupstuff/-/merge_requests?label_name%5B%5D=T42
$ for i in 5 4 3 2 1 ; do \
git push -o merge_request.create \
-o merge_request.target=main \
-o merge_request.remove_source_branch \
-o merge_request.title="We are leaving together ($i/5)" \
-o merge_request.description="And still we stand tall ($i/5)" \
-o merge_request.label="T42" \
origin HEAD~$((i - 1)):refs/heads/Europe$((i - 1)) ; done
Couple of notes:
  • The gitlab label for this specific use case seems to supplant the gerrit topic branch adequately
  • CI is run on every MR, which is what we want
  • While some merge_request git push options are static, some others like title and description aren't. A wrapper tool will need to extract them from the git commit messages I guess. That adds a bit to the complexity of the tool that needs to be written, but it's arguably doable and probably maintainable. It will be however akin to git-review (does anyone cringe already?) albeit only for those that require this workflow
  • The big issue I see is the fact that we don't get dependent MRs in this case. So any of the later MRs can be merged at any point in time by mistake, causing the expected mayhem. And it seems like this is a paid feature and not a Community Edition feature per Wikimedia Release Engineering Team/GitLab/Features, which isn't a great sign. The notes in that table say "Important for our workflows, but can be re-implemented as a CI job." Not sure what that means? We'll create CI that checks something, but what exactly? A "Depends-on"? That's opt-in (as in the user must actively write it), so it will probably not safeguard us much. AKosiaris (WMF) (talk) 14:51, 23 September 2020 (UTC)Reply
I would imagine that doing a wrapper that emulates `git review` behavior in a way that creates a separate merge request for each commit wouldn't be too hard. The real issue is lack of nice UI in GitLab to automatically rebase merge requests on top of the target branch. Nikerabbit (talk) 06:48, 23 September 2020 (UTC)Reply
Gitlab will show you when a merge request is going to conflict with master, and a successful merge includes rebase. Is there a reason we need to explicitly rebase the merge request before merging, or maybe that's just a habit carried over from our Gerrit workflow? Adamw (talk) 06:54, 23 September 2020 (UTC)Reply
If the merge requests depend on the previous one, at least the merge base needs to be updated.
If there are conflicts during a rebase, I would like the test pipeline to run again on the rebased merge request before it is merged to the master. Nikerabbit (talk) 08:16, 23 September 2020 (UTC)Reply
I just want to chime in to agree that losing, or making more complicated, the ability to separate out commits into logical units of work seems like a bad thing for our ongoing code health. A workflow that forces us to squash semi-related commits together when we merge looks unambiguously bad.
I'd be less concerned if we had the premium merge dependencies, though it looks like they lack some of the convenience features of gerrit's topic chains.
Writing a `git review`-like tool seems like a maybe-viable compromise... but if we're writing custom tooling and will expect any contributor who makes more than an utterly trivial change to be using it, are we gaining so much from a migration any more? DLynch (WMF) (talk) 15:15, 28 September 2020 (UTC)Reply
Given the importance of this stack-of-patches workflow to many current developers, I would have liked to see a more concrete plan, including timeline, for implementing this in WMF's gitlab migration. Especially as the required gitlab features for this are not expected to be included in the Community Edition of gitlab WMF is planning to use? More clarity would be helpful. cscott (talk) 17:01, 26 October 2020 (UTC)Reply
In terms of timeline the GitLab/Roadmap speaks to the chronology of events. Tying the chronology to real world times: the "Utility project migration" heading in that roadmap is where we hope to be in 8 months. I've called out the dependent patchsets explicitly in that step.
We've raised this with GitLab as well. So far they've provided some workarounds that are a bit clunky. I'd encourage developers that care about this feature to poke the upstream ticket as a signal about the feature's importance: https://gitlab.com/gitlab-org/gitlab/-/issues/251227 TCipriani (WMF) (talk) 19:44, 26 October 2020 (UTC)Reply

Sweet!

[edit]

It's about time that the WMF is considering trashing the dumpster fire that is Gerrit.

I have tried to contribute three separate times to MediaWiki Core. Every time I've tried, I've spent more time arguing with Gerrit than actually writing code. You will notice according to my Gerrit stats that I've submitted 13 patch sets while I've abandoned 6. Of those six, I would estimate 4 were Gerrit or git-review screwing up in some way. Side bar, my favorite merged commit was 298121, which required a force merge because Jenkins gave up halfway through.

The best workflow I've found to prevent any Gerrit issues is to maintain two repositories: one "clean copy" for each Gerrit changeset that I apply patches to, and one dirty copy where I actually write my code. Otherwise, something gets screwed up. And with this workflow, I give up many modern Git features, like shelving changes or the simple "pull, branch, change, push" workflow that any sane git repository provides. This problem would be immediately resolved by moving to GitLab.

There was also a brief discussion about moving XTools away from GitHub and onto Gerrit when I began rewriting it. I was opposed as I wanted XTools to actually be accessible when someone wanted to contribute. This does appear to be working, as we receive pull requests regularly on GitHub. I have no confidence this would be the case were we using Gerrit.

So yes, the sooner we ditch Gerrit the better. It will lower the bar to new contributors and allow a proper code review workflow to happen. ~ Matthewrbowker Talk to me 05:12, 10 September 2020 (UTC)Reply

Try before you buy?

[edit]
A few observations from discussions here and on mailing lists:
  • Gerrit is a (mostly) known thing already (as in -- we know what it looks like, how it functions, what the strengths and weaknesses are, although we could debate them)
  • Not a ton of people use GitLab in their daily workflow, so it's harder to make judgments about UX, strengths and weaknesses for code review, etc.
  • A lot of people use or have used GitHub and are familiar with its code review model, the UX, etc
  • It seems that many people assume (maybe not correctly?) that GitLab would be more or less just like GitHub
Given the above, it seems like it would be useful to have some small pilot projects that use the test instance to help us better assess how well GitLab's code review tooling would work for us. So far https://gitlab-test.wmcloud.org/explore shows 1 merge request among the repos that have been created. I think if we (i.e. collectively most everyone who has commented or cared about this proposed move, especially those of us who are skeptical of switching to GitLab) tried to use the test instance for a few tasks, for core or an extension, it would provide a lot more concrete information on the strengths/weaknesses of using it. @EGardner (WMF) and @ATomasevich (WMF) did a similar workflow with GitHub, while working on the Vue port of MachineVision; you could make your merge request in the GitLab test instance for a feature / bug you're working on, then at the end clean it up and resubmit to Gerrit for a final merge. It's duplication of work but could provide us useful experience IMHO.
In theory it wouldn't be too hard to have Quibble running on merge requests for core or extensions.
Anyway, curious to hear what others think about doing something like this, and how we'd go about organizing it. KHarlan (WMF) (talk) 08:26, 10 September 2020 (UTC)Reply
This seems like a great way to surface more of the experience. I'd be happy to try it out for stuff we're working on within RelEng, though I'm guessing it'd be more useful for folks who work directly on MediaWiki and extensions to trial some code reviews.
Let me know if I can help in any way from the administrative end, or if bugs crop up in the configuration / setup. BBearnes (WMF) (talk) 17:23, 11 September 2020 (UTC)Reply
I second this proposal, and would happily volunteer to be a guinea pig at some point in the future (assuming I'm working on a feature amenable to this approach). There are aspects of Gerrit I appreciate, but it was nice to return to the more familiar GitHub/GitLab workflow for the MachineVision project. For one thing, I think that the tools GitLab will provide for actually browsing and reading through code are much better than what Gerrit offers. I have never been a fan of the Gerrit / GitTiles integration, and generally rely on GitHub mirrors when I actually want to navigate a codebase. EGardner (WMF) (talk) 17:53, 10 September 2020 (UTC)Reply
If you could be a guinea pig during the consultation window that'd be very helpful -- happy to setup CI/repos/ACLs: whatever's needed over the short term. TCipriani (WMF) (talk) 15:46, 28 September 2020 (UTC)Reply
Would you actually have to pay for GitLab even if we use our own servers to host/run it? If so, how much? Do we have to pay for Gerrit? —MarcoAurelio (talk) 10:33, 11 September 2020 (UTC)Reply
@MarcoAurelio the proposal would be to use the test instance we already have set up https://gitlab-test.wmcloud.org/ . AIUI we don't pay a license for GitLab as we are using the community edition, and neither do we pay a license for Gerrit. The time/resources needed to maintain the Gerrit/Jenkins/Zuul setup is probably another story though :) (and I have no idea how that would compare with maintaining code review + CI tooling for GitLab) KHarlan (WMF) (talk) 12:20, 11 September 2020 (UTC)Reply
@KHarlan (WMF): Thanks for clarifying. Have a nice day. —MarcoAurelio (talk) 13:37, 11 September 2020 (UTC)Reply
So, a rough proposal for how to proceed here:
  1. We add fresh clones and at least basic CI on gitlab-test for:
    1. mediawiki/core
    2. some appropriate extension
  2. Interested parties use these for reviewing a handful of real changes and experiment with workflow
It seems like there are a couple of broadly viable workflows:
  1. Developer pushes to a branch on the primary copy of the repo which then becomes a merge request
  2. Developer forks the primary copy of the repo to their account, pushes to a branch on their fork, and creates a merge request from there
In practice #1 might be the default for known, trusted contributors who have write access to the repo (the equivalent of +2 rights), while #2 might be the path for new volunteer contributions. (More fine-grained distinctions are possible with roles, but I haven't explored how that works in depth yet.)
Within those broad approaches, we've identified some unanswered questions:
  • How should we configure merge commits?
    • Merge commit for every merged MR
    • Merge commit with semi-linear history
    • Fast-forward merge, no merge commits, rebase required if no FF is possible
  • Should multi-commit MRs be squashed into a single commit by default?
    • Should this be mandated or optional?
  • Matters of culture/convention
    • How granular should MRs be in general?
    • There's a feature that allows pushing changes to a MR if you're allowed to merge, even if it's on another user's fork - echoing the ability in Gerrit to update someone else's patchset. Worth enabling by default?
I can imagine most of those answers varying by project/team, but they seem like the sort of thing that's worth exploring on the test instance.
We got a basic CI pipeline with Quibble working for mediawiki/core earlier. If this seems like a fruitful way to proceed, we can start with a clean copy of the repo and do the same for an extension.
Thoughts? Volunteers to work through a few changes on gitlab-test? What would make a good extension to use for this exercise? BBearnes (WMF) (talk) 23:04, 16 September 2020 (UTC)Reply
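For reference, the three merge-commit configurations above map roughly onto plain git operations; a local sketch with invented repository and branch names (the first two strategies are shown only as comments):

```shell
git init -q demo4 && cd demo4
git config user.email dev@example.org && git config user.name Dev
echo base > f && git add f && git commit -qm 'base'
git checkout -qb feature
echo change >> f && git commit -qam 'feature work'
git checkout -q -
# 1. Merge commit for every merged MR:
#      git merge --no-ff feature
# 2. Semi-linear history: rebase the source branch onto mainline first,
#    then still create a merge commit with --no-ff.
# 3. Fast-forward only (no merge commit); fails if a rebase is needed:
git merge --ff-only feature
git log --format=%s   # linear history, no merge commit
```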
Thanks Brennen.
Given this is an important part of the conversation and we want people to engage and test out and help us determine how we can make GitLab workflows work best for us, I've cross-posted this on the old talk page's "feature requests" thread. Greg (WMF) (talk) 17:08, 17 September 2020 (UTC)Reply
My team is trying something like this at the moment. We've forked https://gitlab.com/wmde/mediawiki-extensions-VisualEditor/ , and are actively developing on branches. We're collecting our impressions internally and will come back here to report our shared conclusions. Adamw (talk) 09:18, 18 September 2020 (UTC)Reply
We've started roughing out some workflow documentation at GitLab consultation/Workflows. At the moment it's much more basic than I'd like, I think partly because it's hard to make preemptive recommendations about optimal workflows for people coming from Gerrit without more experimentation. BBearnes (WMF) (talk) 20:44, 25 September 2020 (UTC)Reply

Confidential merge requests?

[edit]

Gitlab appears to support confidential merge requests, but only within the context of its issue-tracking component, at least that's my reading of https://docs.gitlab.com/ee/user/project/issues/confidential_issues.html#merge-requests-for-confidential-issues and having tried to find this feature or similar outside of the issue-tracking component within the wmfcloud test instance. I understand that Gitlab's issue-tracking will likely be disabled for any potential, initial Wikimedia deployment of Gitlab. It would be nice if such a feature could be supported in some way outside of the issue-tracking component - even if it were just a simple way to hide merge requests or security feature branches behind some kind of group-based authorization. In a dream world, it would be great to implement features along the lines of Gitlab automatically protecting merge requests and branches related to protected Phabricator bugs, security-related keyword AI to detect suspicious requests, commits, branches, etc. and protect them, but all of that would depend upon our installation of Gitlab supporting the basic protection of confidential merge requests and branches. (I understand that projects/repositories can be made private, but that likely wouldn't be the default for nearly all Wikimedia code and is a blunt instrument regardless.) SBassett (WMF) (talk) 16:12, 11 September 2020 (UTC)Reply

Just so I can clarify: this is a request to find a replacement feature for the current practice of attaching .patch files to the private security tasks in Phabricator, right? Greg (WMF) (talk) 16:20, 11 September 2020 (UTC)Reply
That would be one issue such a feature could solve or at least improve upon. Another would be to better handle the accidental pushing of change sets to gerrit before security issues have been made public (which sadly happens more than I'd like it to) as there is no convenient way in gerrit to do this AFAIK. And I would imagine such a feature could potentially improve the largely manual security patch deployment process which is quite vulnerable to human error. SBassett (WMF) (talk) 16:30, 11 September 2020 (UTC)Reply
Gerrit allows private patches but so far we've never enabled the feature on our gerrit install. The docs mention some pitfalls but shall we give them a try in the meanwhile? —MarcoAurelio (talk) 20:25, 11 September 2020 (UTC)Reply
It is "private" only in the sense of not being fully public. As soon as a private change is merged, the flag is dropped and the merged patchset becomes public. The private flag is not inherited: public children of a private change expose the private change.
The easiest is probably to have a security branch which is restricted to a subset of people. Then there might be an ACL misconfiguration which ends up exposing the branch. Upstream recommends to use a copy of the repository with repo wide restrictions.
Anyway, how to embargo security patches is not so much a tooling issue as it is a workflow one. It has to be carefully thought through. Antoine "hashar" Musso (talk) 19:26, 15 September 2020 (UTC)Reply
For now I'd recommend against changing production security patch workflows and supporting services. If the outcome of this consultation is sticking with Gerrit we can certainly look into it. Greg (WMF) (talk) 07:54, 12 September 2020 (UTC)Reply
Only for completeness: There is also https://docs.gitlab.com/ce/user/project/protected_branches.html , but a protected branch is not the same as a protected MR. AKlapper (WMF) (talk) 16:22, 11 September 2020 (UTC)Reply
An approach that comes to mind is to maintain separate forks for security work. Likely cumbersome for various reasons, but maybe worth thinking about. BBearnes (WMF) (talk) 17:09, 11 September 2020 (UTC)Reply
I will plan to reach out to other FLOSS groups (gnome, kde, free desktop, debian, fedora et al) to see how they are addressing (or plan to) these issues. SBassett (WMF) (talk) 16:21, 14 September 2020 (UTC)Reply

Lowering the barrier to contributing

[edit]

While Gerrit may have features that are arguably superior in terms of code review (depending on your workflow), to me, it poses too great of a barrier to contributing, and is a constant source of confusion. I've been using it for 4 years and I still find myself occasionally having to ask for help. I can't help but wonder just how many volunteer developers we've lost because of this. Let's say as a new developer I wanted to fix a simple typo, or add a new line to a config file -- why do I need to read a manual on how to do this? Unless our goal is to increase the barrier to contributing, I'd say there's really no contest here... GitLab/GitHub/BitBucket are all scores more user-friendly. Sure, once you are familiar with Gerrit, its powerful features start to shine, but I think we should do our best to foster open-source development by keeping the barrier to contributing as low as possible, just like we try to do on the wiki. It's for these reasons that I would never host my own Wikimedia tools on Gerrit.

That said, if we do stay with Gerrit, I think there are some small improvements we could make to improve the user experience. For instance, I had +2 rights when I first started using Gerrit. On my first attempt at reviewing code, I of course hit the pretty blue "Code Review +2" button, as it would seem that would 'start' the code review process. Two members of my team at WMF did the same thing when they first joined. I think the button should instead say "+2 Merge", and perhaps have a confirmation modal. Or, say the build gets stuck. You might see another pretty blue "Submit" button. I would have expected that to re-submit the jobs, or something, not merge and bypass CI entirely! Again, "Merge" might be the better wording. It's weird that all the buttons have tooltips except the one that actually can cause problems, and the problematic buttons are so easy and inviting to click on. These are just minor examples. I also struggle to navigate the codebase through the UI, can't ever remember how to follow projects, not to mention those secret commands to control CI via comments... the list goes on and on. Left to my own devices, I always use the GitHub mirrors to browse and share code.

I hope my wording does not come off as too strong. A lot of people have put immense work into Gerrit, and I know it works exceedingly well for some people. Perhaps GitLab seems like a toy to some. I suppose it's just a trade-off between power and usability, and I hope we don't neglect the usability aspect when making our final decision. MusikAnimal talk 16:09, 13 September 2020 (UTC)Reply

I fully agree that we should lower the barrier to contributing, but we should be conscious about the trade-offs. If we switch:
  • productivity of some developers, like me, would likely decrease temporarily as we learn and adapt.
  • productivity of some developers, like me, could possibly decrease permanently, if GitLab does not support certain kinds of workflows as fluently.
In addition, a lower barrier to entry has to be balanced with managing the incoming stream of contributions, not all of them valuable. We know from Wikipedia that it can only work if sufficient tooling and resourcing is present to filter out spam and vandalism and to improve contributions that do not quite meet the requirements. Are we prepared to fight the spam, vandalism and drive-by contributions that are not mergeable without further work? Do we have sufficient guidance for contributors so that they can work with us, and not (unknowingly) against us?
I don't have answers to any of these questions, but I hope that there will be by the end of this consultation. Personally, I will try to figure out the first part, how much would my productivity be affected by the switch. Nikerabbit (talk) 15:52, 14 September 2020 (UTC)Reply
From my limited experience here, managing the flow of inbound work is already a significant issue at least for our team. This involves making hard choices and trying to balance resources. On Platform Engineering, we've tried to adopt processes that give clear interfaces for other teams but the volume is already quite high.
I do not at all mean to discount this point, I think it's valuable and prescient but above all something we should already have impetus to address. Building up a better experience both for our internal teams and external contributors should absolutely be a focus.
I'm not sure if it's possible but it might be worth reviewing the practices of other large scale groups and seeing what we can adopt or if there is a willingness to knowledge share with us. I know our own team had an excellent experience working with Envoy recently to contribute upstream changes. WDoran (WMF) (talk) 18:52, 15 September 2020 (UTC)Reply
I am pretty convinced it is a social problem rather than a tooling issue. We had the same problem in the CVS/Subversion era: new commits were sent to a mailing list and reviewed after the fact. In 2008, Brion built Extension:CodeReview in a sprint (GitHub was just starting at that time), which at least made it easier to process the backlog. I came back as a volunteer in 2010 and went on a review frenzy, but we still had glitches.
Others may correct me, but the main incentive was the switch to git. Gerrit came with the nice addition of holding the flood of patches as pending changes, which fitted MediaWiki nicely: patches were on hold until reviewed, thus protecting production.
Gerrit surely has its flaws, but I don't think the review issue is a tooling issue; it is entirely social, related to our "bad" (but improving) development practices and to the community as a whole.
For the tooling consultation, we might be able to look at repositories maintained by Wikimedia on GitHub and see whether reviews are better handled there. But the corpus of repositories is vastly different (in my experience, interactions for a given GitHub repository are mostly from a single WMF team). Antoine "hashar" Musso (talk) 19:45, 15 September 2020 (UTC)Reply
Will GitLab login require a Wikimedia developer account, like Gerrit does? If so, I think that alone would cut out a lot of drive-by garbage, at least spam and vandalism. I can't imagine it'd be much worse than what we see on Phabricator, no? Even if there was an approval process to get access, that might be okay... my issue is good-faith, competent developers (volunteer and staff alike) who already have access still struggle to use the software. It's not just about making patches, but participating in code review, and doing basic things like watching projects and navigating the code, or even finding the command to clone a repository (though downloading an individual patch I think is easy enough to figure out). Or say I click on a Change-Id: it forwards me to the patch, and all of a sudden my browser's history is polluted with redirects, making it hard to get back to the previous page. It's all the little things that, combined with the confusing CI system, can turn routine tasks into headaches. This all is of course just my opinion/experience. I am fairly confident these days with Gerrit, but it took a long time for me to get here. MusikAnimal talk 21:08, 15 September 2020 (UTC)Reply

Will GitLab login require a Wikimedia developer account, like Gerrit does?

Yeah, that's the plan.
(Edit: Well, that's my assumption as to what the plan would be. Specifics will need work, but GitLab CE supports LDAP.) BBearnes (WMF) (talk) 00:02, 16 September 2020 (UTC)Reply
Like others, I'm worried we are misidentifying the problem here. I agree in theory that we should prioritize a low barrier to entry and a gentle learning curve above power-user-friendliness - both for pragmatic reasons (we can always use more hands, and the Wikimedia open source projects seem very far below the potential that being a top-10 website and the top free knowledge management tool should grant them) and because it fits well with our values of openness and equity.
In practice, though, I agree with Hashar that the main bottleneck is human. This is something the "why" section of the consultation doesn't engage with as well as it should - yes, surveys have shown code review to be the biggest pain point, but we don't have any good reason to think Gerrit was the main reason for that. Resoundingly, the biggest complaint is the lack of reviewer response; the WMF has so far chosen not to invest significant resources into fixing that. So I worry that 1) this will be a distraction (we feel good that we are now doing something about developer retention, so addressing the real problem is delayed even further); 2) it may even be harmful if GitLab is worse at supporting efficient code review (one thing Gerrit excels at is finding patches; as such it's reasonably okay at supporting our somewhat unusual situation of a huge pile of repos with unclear or lacking ownership, and some repos which are too large for repo-level ownership to be meaningful); 3) it will just lead to more churn (if you have a social system with a limited capacity for supporting newcomers which is already overloaded, and you make the technical means of joining that system easier, you'll end up with the same number of successfully integrating users but many more deflected ones, who have negative experiences with the Wikimedia developer community and will be harder to reach later once we have improved things).
To phrase things more actionably, I'd really like to see Gerrit and GitLab compared specifically in terms of their ability to support code review if it remains a largely voluntary activity, not incentivized or rewarded by management. Will it become easier or harder to find unreviewed patches across repos, by various criteria like "recently registered user" or "productive volunteer contributor"? Will it be easier or harder to track code review health on a global or repo level? Will code review take less or more time? Tgr (WMF) (talk) 03:59, 1 October 2020 (UTC)Reply
I'd add that CI is IMO the one area where tooling can efficiently support code reviewers - tests and linters basically provide automated code review, and they reduce the reviewer burden as long as they provide it in a comprehensible format. This is something our current system is really bad at - patch authors need to figure out what went wrong by parsing dozens of pages of console logs, a terrible experience for new developers (and an annoyance for experienced ones). I'm not sure how much of that is an issue with Gerrit, though. It has had the ability for years to filter out bot noise from review conversations, for example, and we hadn't bothered to make use of it until recently. It has recently gained the ability to assign test errors to specific lines and show them in context, and there is no organized, resourced effort to convert our test tooling. So again I don't know if the switch would address the real issue there. Does GitLab even support inline CI comments? From speed-skimming the docs, my impression is it does not (interactive CI debugging OTOH sounds like a really cool feature, but it is not for beginners). Making sure all of our major test/lint tools play nice with Gerrit features like inline comments and fix suggestions could IMO be more impactful for new developer retention while being a less ambitious (i.e. less risky) project. Tgr (WMF) (talk) 05:35, 1 October 2020 (UTC)Reply
We have the SonarCloud job reporting inline comments for issues it detects (via https://github.com/kostajh/sonarqubebot ). For other linters, we would need the glue that processes a linter report and emits the comments. That is T209149 Have linters/tests results show up as comments in files on gerrit. Antoine "hashar" Musso (talk) 14:21, 1 October 2020 (UTC)Reply
@Hashar yes, and it is not on any team's roadmap (much less on the annual plan) to do so. Kosta has done an amazing job with SonarCloud, and there is a working group doing great work, but it's mostly a personal effort that is happening due to the dedication of the participants, and to the extent they can find free time for it. Meanwhile we are considering this moonshot project to address a problem when there are bigger problems that could be addressed with far less effort.
I don't want to downplay Gerrit's UX weaknesses, it is certainly a serious problem for developer retention. I find the arguments that we should at some point migrate away from it convincing, and as a superficial first impression GitLab seems like a decent place to move to. But given there are problems which are more severe and can be addressed with less cost and less risk, it feels a bit like a prioritization fail. Tgr (WMF) (talk) 19:19, 1 October 2020 (UTC)Reply
I have no comment on all the nuances described elsewhere on this talk, but I can say that Gerrit is a huge bar to contributing. I don't understand any of it (to be fair, I haven't tried, and don't intend to learn) -- I know two commands and I get by on them. So maybe it's not the biggest bar in practice, but it's a psychological / "can I really be bothered" bar. Versus just knowing what to do, and being able to spend your time on the code rather than on learning Gerrit. Most devs, especially volunteer ones, will not be exclusively contributing to MW. And I would hypothesise it's likely most other projects they contribute to are on GitHub, or using the GH flow. Hence it's more intuitive and a lower barrier to entry.
I think it would certainly help improve contributions. Admittedly, last I used GitLab I didn't have that much love for it (many years ago now), but it is certainly a big improvement, and I think it's better in the long term. I do not think Gerrit is sustainable if we think about the years ahead, when I think these kinds of tools will become more and more forgotten. My opinion: the quicker MediaWiki moves on from Gerrit, the better. And I hope one day something is done about phab too, although that is more a preference than a problem.
Btw, respect for everyone who has made Gerrit work this long and tried to abstract away the barrier to entry. Not trying to diminish that work, by any means. But I think there's only so far you can go. ProcrastinatingReader (talk) 16:39, 17 October 2020 (UTC)Reply

Can we wait for a while?


I want to second what @Tgr has said in the previous feedback. We just upgraded Gerrit to Gerrit 3, and it's the first time they have actually had UX in their product. Even though there's still a lot to improve, I find the current Gerrit much better than the previous ones. Amir Sarabadani (WMDE) (talk) 06:21, 15 September 2020 (UTC)Reply

How much do you think we should wait and what should we wait for?
Gerrit 3 upgrade happened on 2020-06-27. At the end of September, that will be 3 months. I feel like that is sufficient for active developers to form their opinion. I'm not sure how many new developers have started contributing after the upgrade, but I think that is the group we are most interested in hearing from.
Another Wikimedia Developer Satisfaction Survey would be nice, but it does not align with the consultation timeline. Interpolating from previous years, next such survey would be in first quarter of 2021. Nikerabbit (talk) 13:33, 15 September 2020 (UTC)Reply
> How much do you think we should wait and what should we wait for?
I have no idea, I don't think I know enough to say with confidence how much is okay and how much is not. Amir Sarabadani (WMDE) (talk) 07:51, 17 September 2020 (UTC)Reply
Personally, I'm not sure more time would be needed here. I assume that most of the people who are familiar with Gerrit already and are active contributors (and thus likely to be aware of this consultation) are already aware of the upgrade and the included improvements. Similarly, I assume the people who do not fit in that group will not be able to become active users of Gerrit enough to make a real comparison (or, rather, that number is very small). Greg (WMF) (talk) 16:29, 17 September 2020 (UTC)Reply
My opinion is that the frosting color changed, but the functionality is the same. Gerrit 3 did nothing to support feature branches, for example. Adamw (talk) 09:14, 18 September 2020 (UTC)Reply
What do you find in GitHub/GitLab feature branches that is lacking in a gerrit patchset (or a stack of gerrit patchsets)? KHarlan (WMF) (talk) 09:17, 18 September 2020 (UTC)Reply
> What do you find in GitHub/GitLab feature branches that is lacking in a gerrit patchset (or a stack of gerrit patchsets)?
Thanks for the prompt, I'll expand a bit:
  • It's difficult for multiple people to collaborate on a Gerrit topic. Adding new patches goes against our amend model, so the "correct" workflow is to amend patches in the middle of a chain. However, the consequence of rewriting history is that Gerrit doesn't have the information needed to prevent you from doing bad things, such as overwriting someone else's amended patchset with your own changes to the same patch. Therefore, what I've seen is that we either limit collaboration (only one person coding) or work around it (create temporary patches that the branch "owner" must squash manually).
  • As EMcnaughton mentioned, the conversation becomes fragmented across multiple patches, often repeating the same points. A feature branch is a very natural scope for conversation, a patch not necessarily so.
  • Feature branches are merged atomically. Gerrit patches are merged individually. In Gerrit, it's possible to accidentally get into a situation where half of a feature is merged and the remainder is pending discussion. With feature branches, you can create the same outcome by splitting the branch into a predecessor branch which is safe to merge independently.
  • Visualizing branching can be very helpful. Git includes this functionality, but AFAIK none of our Gerrit web tooling can do this. For example: https://gitlab.com/wmde/mediawiki-extensions-VisualEditor/-/network/wmde-alpha-deploy Adamw (talk) 09:51, 18 September 2020 (UTC)Reply
  • Rebasing an entire branch is very loud under Gerrit, test results are sent for every patch. In a merge-request paradigm we would only alert for the branch head. Adamw (talk) 10:36, 18 September 2020 (UTC)Reply

Let's do this!


As a Debian Developer I use Debian's Gitlab (https://salsa.debian.org/public) regularly and it's just so much more pleasant to work with compared to Gerrit and our CI systems. Creating repos, pull requests, defining test pipelines, everything is intuitive and muscle-memory friendly. Conversely, after 4 years of using Gerrit on a daily basis I still forget to "Publish edits" at times. It seems I'm not alone, judging by the experience @Matthewrbowker and others shared here.

To add another piece of anecdotal evidence: when I'm sharing repos with folks outside WMF I give them our GitHub mirrors instead of the Gitiles equivalent (eg: https://gerrit.wikimedia.org/g/operations/software/fifo-log-demux) because the clone URL on Gitiles is broken, and has been for about 2 years: https://phabricator.wikimedia.org/T206049. Although I suspect this could be fixed, it's not clear to me why we should deal with this sort of issue instead of going with a system that just looks and feels better from all points of view, and is well known and understood by people outside our community. ERocca (WMF) (talk) 15:49, 15 September 2020 (UTC)Reply

+1 to the non-intuitive experience - though I have only been here a year or so, I don't think a year should be how long it takes to become comfortable with an integral system that you interact with on a continuous basis.
+1 to external integration: as an outsider to Gerrit, I still use our GitHub mirrors to navigate code because of familiarity. Gerrit is not a widely adopted system, and it creates a real obstacle to onboarding for volunteers and staff alike. Though by no means insurmountable, it requires a conscious shift in developers' expectations. WDoran (WMF) (talk) 18:44, 15 September 2020 (UTC)Reply
Every system has its own little learning curve. My main editor was Microsoft Edit when I was a teenager; when I was introduced to vi I happily dismissed it after a few courses, but something stuck with me: it is in the POSIX standard and is thus available on any UNIX-like system. A few years later, as a telecom professional, I was pleased to find vi on a network router. I eventually started to use it on a daily basis, and almost 30 years later it is now my editor of choice. I should have a look at a modern editor such as Microsoft Visual Studio, a far descendant of my beloved childhood Microsoft Edit.
Gerrit is a complicated system for sure. It has a rather unique workflow (albeit not that different from the GitHub pull request), and for sure our instance's UI was stuck in the past until we finally got it upgraded in June 2020. It definitely has a steep learning curve, since habits and tutorials usually tell you to just git push origin, which is possible on GitHub because you act on your personal repository. On Gerrit the repository is shared, and you obviously can't change a reference branch that is collaboratively updated by everyone; so instead of forking a repository, there is that unique flow of pushing to a special reference to create or update a change.
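A runnable sketch of that special-reference flow, for readers who have never seen it (a local bare repository stands in for Gerrit here; repository and file names are illustrative, and only a real Gerrit server actually turns such pushes into reviewable changes):

```shell
# A bare repository stands in for Gerrit; with a real Gerrit server,
# pushing to refs/for/<branch> opens a change for review instead of
# updating the branch directly.
git init -q --bare gerrit-standin.git
git clone -q gerrit-standin.git workdir && cd workdir
git config user.email demo@example.org && git config user.name demo

# Commit on top of the shared branch as usual -- no personal fork needed
echo fix > fix.txt && git add fix.txt && git commit -qm "Fix a typo"

# Push to the special review ref rather than to the branch itself
git push -q origin HEAD:refs/for/master

# The commit now sits in the review ref; master itself is untouched
git ls-remote origin refs/for/master
```

With real Gerrit, amending the commit and re-pushing to the same ref updates the existing change (matched via its Change-Id footer) rather than creating a new one.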
Is that complicated? Not really: under the hood it works very similarly to the GitHub pull request model. Is that unfamiliar? If you are used to GitHub it is definitely confusing, but the first time you tried contributing to a GitHub repository, you might have tried to push your first feature branch directly to the canonical repository, only to be rejected and left wondering what to do. That was my first experience with GitHub, and it took me a while to figure out the flow: I was forced to create a full fork, push to it, then head to the web interface to file a pull request, which forced me to write a summary/copy-paste my carefully crafted message. Was it non-intuitive to me? Yes, for sure.
My point is that each system has a learning curve, and some are steeper than others. The vi editor, when used in a terminal, is definitely a lot of fun in that regard, since you cannot trivially quit it.
A year is certainly a lot. When we moved from Subversion to git and Gerrit, I was in the task force to train and level up people on git and Gerrit. It surely proved challenging for a few, but eventually I believe most people adjusted just fine after a couple of weeks, or maybe a month or so. For sure, the support requests dried up rather quickly. We were all new to the system, made a lot of mistakes, and eventually became familiar with it.
I guess the summary is that good tutorials and training are important for any new system, and onboarding is something we should work on (and that applies to any system we currently have or will have in the future).
As for code browsing: our Gerrit had two other browsing systems that did not please us. In 2014 we had a plan to migrate to Phabricator's code review system (Differential), and Phabricator has its own code browsing system. Eventually that plan fell apart. We replaced Gerrit's code browser with Gitiles (which is used by upstream), and that seemed to address most of our concerns. We haven't quite looked back at it since, though, leading to oddities such as Emanuele's request ( https://phabricator.wikimedia.org/T206049 ) idling for two years. GitLab at least seems to address that point (or just use GitHub, which we mirror to). Antoine "hashar" Musso (talk) 21:58, 15 September 2020 (UTC)Reply
Gerrit fails hard when working with feature branches. I would say it's a very different model from a pull request, where incremental patches can be appended to a branch, and branches can be stacked on top of one another or provisionally merged for experimental deployment.
Here's an example of a complicated, experimental branch-and-merge which was almost trivial in Gitlab, but would have caused my untimely demise if I had tried it under Gerrit. I did try and found myself churning through the entire patch chain many times, and unable to show how some branches are independent and others dependent.
I believe that Gerrit has been stifling our development workflow. Adamw (talk) 09:12, 18 September 2020 (UTC)Reply
I guess just like you did on gitlab:
  • Create branches
  • Send patches for each of the branches
  • Get them reviewed and merged
  • Craft a merge commit for all the branches and send the merge commit for review. Yes, the merge action becomes its own change that itself has to be reviewed.
Gerrit comes with per-commit review, but that does not prevent one from using feature branches. Antoine "hashar" Musso (talk) 14:16, 1 October 2020 (UTC)Reply
I am sorry, but you might want to explain what happens in that feature branch and what your workflow was, because that link does not help me understand at all. AKosiaris (WMF) (talk) 15:44, 22 September 2020 (UTC)Reply
We have several proof-of-concept features built on top of upstream master. Each branch is kept independent if possible (based on origin/master), or stacked head-to-tail if there are dependencies between the features. Then, we create an ephemeral octopus merge putting all the features together, so we can deploy to an internal testing server.
None of that is fun with Gerrit. Stacking branches on top of each other creates a long chain of patchsets, and updating any one change requires rebasing and pushing all the other changes, creating a flood of IRC and CI messages. If I'm working on feature "B", and a colleague is working on "A", we're likely to destroy each other's patchsets because Gerrit can't tell the difference between a meaningless rebase of a parent patch, and a substantive edit. Adamw (talk) 18:17, 22 September 2020 (UTC)Reply
It's now clearer to me, thanks for explaining it. I've experienced the Gerrit pain regarding constant rebasing as well, so I sympathize. That being said, I'll admit I have no idea how that works better in GitLab; I'll try to expose myself to it in our test instance. AKosiaris (WMF) (talk) 19:00, 22 September 2020 (UTC)Reply
Gitiles has one thing going for it: it is fairly fast (for large files, can be much faster than Github) which is important for a repo browser. But otherwise it is a pretty crappy experience, yeah. Github is on an entirely different level (recently they added code navigation where you can just click on any method name to jump to the function definition, which is a killer feature). GitLab is of course not as good but still seems pretty pleasant at a first glance (they copied many obvious wins from Github, like multi-line permalinks; and in general the layout and navigation is much better). Even if we ended up staying with Gerrit, setting up a GitLab instance just as a repo browser and replacing Gitiles links with GitLab links would be an improvement.
Re: Adamw, yeah, the inability to support octopus merges is an occasional pain point in Gerrit (when I write a patch that needs a number of independent supporting changes which I want to make into separate patches for cleaner git history, Gerrit forces me to serialize them in a single patch tower, which forces a particular order on reviewers for no good reason, and also increases the number of rebases).
Gerrit can tell the difference between a rebase and an edit (it will say so in the comments), you could probably filter out rebases from email/IRC if you wanted. Tgr (WMF) (talk) 06:21, 1 October 2020 (UTC)Reply

How to best preserve the current commit message standards in the GitLab system?


As a third-party developer who works with the MediaWiki codebase on a daily basis (and occasionally contributes back), I am very grateful for the commit message guidelines that have been present and enforced for most of the MediaWiki codebase for several years now. I find it very helpful both when browsing historical changesets to understand the exact context and rationale of a given change, and when contributing code via Gerrit, where it helps reviewers understand the patch being submitted. Crucially, in the Gerrit review interface, the commit message is displayed prominently and can be reviewed like any other code, which helps in maintaining the aforementioned high quality standards and results in a clean and understandable revision history.

Unfortunately, GitLab's PR facilities seem to suffer from the same shortcoming as GitHub's in that they treat the commit message(s) as a separate entity from the pull request title and description. A Gerrit patch, when merged, will generate a single commit in the revision history that will have a commit message that was clearly visible to (and thus most likely reviewed by) the reviewer(s) of that patch; a merged GitLab PR, by contrast, is liable to pollute the revision history with multiple irrelevant checkpoint commits whose commit messages cannot be reviewed via the web interface. This seems like a step back for me, and I worry the quality of commit messages and the structure of the revision history will suffer as a result. It seems the only way to get the same output here requires reviewers to take the extra step of browsing the commit view for each PR they review, make sure it matches the PR title/description, and ask contributors to squash commits as needed. This is lots of manual work both for the reviewers and the contributors, while it is significantly simpler in the Gerrit/git-review workflow.

So, my question is—how can we make sure our commit messages and revision histories continue to be as readable and useful as they are today in a PR-centric world? TK-999 (talk) 19:39, 17 September 2020 (UTC)Reply

So, my question is—how can we make sure our commit messages and revision histories continue to be as readable and useful as they are today in a PR-centric world?

This is an important question, thanks for framing it so concisely.
I do want to qualify this understanding a bit:

It seems the only way to get the same output here requires reviewers to take the extra step of browsing the commit view for each PR they review, make sure it matches the PR title/description, and ask contributors to squash commits as needed. This is lots of manual work both for the reviewers and the contributors, while it is significantly simpler in the Gerrit/git-review workflow.

It's worth noting that GitLab provides the option to squash commits and edit the commit message on merge, as well as some per-repo settings about how merges should be performed (merge commits, optionally with MR description included, fast-forward merges only with mandated rebasing, etc.). These aren't equivalent features to the Gerrit approach, per se, but they're probably useful tools if we adapt our practices to this model. See also the Try before you buy? topic elsewhere in this discussion. BBearnes (WMF) (talk) 20:54, 17 September 2020 (UTC)Reply
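As a rough local illustration of what the squash-on-merge setting produces (plain git commands standing in for GitLab's button; branch and file names are illustrative): the checkpoint commits on the merge-request branch collapse into a single commit whose message is written, and therefore reviewable, at merge time.

```shell
git init -q squash-demo && cd squash-demo
git config user.email demo@example.org && git config user.name demo
git commit -q --allow-empty -m "base" && git branch base

# A merge-request branch with messy checkpoint commits
git checkout -q -b mr-branch base
echo one > f.txt && git add f.txt && git commit -qm "wip"
echo two > f.txt && git commit -qam "fixup typo"

# Squash-merge: a single clean commit lands on the target branch,
# and its message is written at merge time rather than inherited
# from the checkpoint commits.
git checkout -q base
git merge -q --squash mr-branch
git commit -qm "Add feature f with a properly reviewed commit message"
git rev-list --count HEAD   # 2
```

The history on the target branch ends up with just the base commit and the squashed commit, which is close to what a merged Gerrit patch produces today.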

Lowering the barrier to contribution is part of the mission


I believe reducing barriers to contribution (which Gerrit is, despite its great parts) is worth any potential productivity impact. Building an industry-best-practice replacement for ORES would be a lot faster if we used closed-source industry tooling like AWS or SaaS model-management systems. However, we don't, because how we build things is itself a reflection of our mission: open source tools, because we believe in open knowledge.

Similarly, reducing the barriers to volunteer contribution is also a reflection of our mission. It might reduce productivity, but I believe that is just the price we pay to make Wikipedia as open (to contributors, to lurkers, to students, etc) as possible. CAlbon (WMF) (talk) 19:56, 17 September 2020 (UTC)Reply

I think it's probably variable whether switching from gerrit to gitlab would have a productivity cost. I see gerrit as reducing my productivity whereas your comment implies others see it as increasing theirs. (I currently use github, gitlab and gerrit on a daily basis) EMcnaughton (WMF) (talk) 22:05, 17 September 2020 (UTC)Reply
Indeed, we have to balance the cost of anything we do here against the effect it has on staff productivity. We wouldn't, for example, tell every member of staff to spend 50% of their time on volunteer outreach. ESanders (WMF) (talk) 00:06, 18 September 2020 (UTC)Reply
Switching from gerrit to (any other system) will definitely have a high productivity cost (at first). Any change of an existing UI will slow down development overall while users spend time learning the new system. The question is just how long that slow-down lasts and at which point the advantages of the new system start to outweigh the impact. Mutante (talk) 16:40, 18 September 2020 (UTC)Reply
The cost in not just learning a new UI (many will already be comfortable with GitHub/Lab UIs already), it's a different workflow. See Talk:Wikimedia Release Engineering Team/GitLab#h-Feature_requests-2020-07-01T15:48:00.000Z. ESanders (WMF) (talk) 18:01, 18 September 2020 (UTC)Reply
(and to repeat it here: if you have concerns about workflows, please help us do some experimentation using our test instance: https://www.mediawiki.org/w/index.php?title=Talk%3AGitLab/2020%20consultation#c-BBearnes_%28WMF%29-2020-09-16T23%3A04%3A00.000Z-Try_before_you_buy%3F ) Greg (WMF) (talk) 18:23, 18 September 2020 (UTC)Reply
Indeed - I second the general point that easing the barrier to entry for new code contributors is important and is part of the mission.
(I came across this consultation via the fediverse and wave hello and best wishes to my past colleagues! (I used to work at Wikimedia Foundation, and was one of the people who helped with the Subversion-to-Git transition.)) Sumana Harihareswara (talk) 15:16, 23 September 2020 (UTC)Reply
I was an outreachy intern for the round 20 and I would like to share my experience with Gerrit.
Initially, the project (gdrive-to-commons) for which I was selected was hosted on GitHub. My first task was to migrate it to Gerrit. I had no prior experience with Gerrit, and the whole UI and workflow seemed very strange at first.
However, when I went through the Gerrit documentation and followed the tutorial, it took me just a couple of hours to understand the workflow and get accustomed to the UI.
After using Gerrit continuously for 3 months, I have developed a fondness for the platform. I really like the code review procedure; in many ways, I even find it superior to GitHub. For example, unless commits are squashed before merging, it is a mammoth task to trace back the changes made on a GitHub repository. The patch-set-based review system seems really intuitive. Every patch set has comments which help in understanding what change was made and why. I also believe that since there is no concept of forking the repository, it saves a lot of storage, which makes a lot of sense to me.
Personally, I would like to WMF to continue with Gerrit. I don't think Gerrit creates a barrier to volunteer contributions at all. Abbasidaniyal (talk) 16:19, 24 September 2020 (UTC)Reply

Re: Why "not" Gerrit?

[edit]

I think some folks have rightly pointed out reasons not to switch, but those have to be viewed in the context of the future: how long can we keep using Gerrit, with its annoyingly high learning curve, is the primary question. Is the transitional, painful period for present Gerrit developers worth the potential new contributors? I believe that has a nuanced answer, but I still think it's a positive one. Some people like GitHub and some people like GitLab, but what we do know is that a lot of people definitely don't like Gerrit, and I think we are in a good place to start moving towards something better (at least relatively). QEDK (talkenwiki) 20:58, 17 September 2020 (UTC)Reply

As someone who has used gerrit for around 5 years, github for a bit longer & gitlab for a lot less, I still find gerrit the hardest to work with by a strong margin - I suspect I still don't know how to use it 'properly', but the number one thing I hate is the way discussions on commits don't really flow. Since I'm often involved in upstreaming I often link to upstream PRs, with screenshots & all the UI niceness around discussions on PRs (I prefer the github UI but will go with gitlab at a pinch), and try to redirect discussion there rather than try to parse it out of gerrit.
I do like the +1 & +2 system in gerrit & the download links EMcnaughton (WMF) (talk) 21:43, 17 September 2020 (UTC)Reply
As a counterpoint, I also use Github and Gerrit regularly and find Gerrit much easier for actually managing my commits. It would be nice if the discussion system was better, but you can always use Phabricator. ESanders (WMF) (talk) 00:10, 18 September 2020 (UTC)Reply
I find working with Gerrit much easier than working with Github. If you forced me to start using Github tomorrow it would be an "annoyingly high learning curve". Mutante (talk) 16:38, 18 September 2020 (UTC)Reply
Isn't that to some extent muscle memory? I'm asking because I still have to open Gerrit/Tutorial for all the commands to run, every time I plan to push something for review into Gerrit.
It is the same in GitLab for me, but the number of steps isn't very different: If I don't fork I'd end up in GitLab with git checkout -b mybranch origin/master, edit, git add, git commit, and git push origin mybranch, and then create a merge request in the web UI. AKlapper (WMF) (talk) 16:35, 21 September 2020 (UTC)Reply
Personally, the annoyingly high learning curve of gitlab isn't just about muscle memory (but it is in part!). It is about a completely different workflow for patches beyond a certain level of complexity. EBernhardson (WMF) (talk) 15:46, 22 September 2020 (UTC)Reply
Gerrit concepts map to git concepts very well, so if you understand git, Gerrit is super intuitive to use (not the UI, of course, but the patch wrangling part). Gerrit is basically just a git repo with a fancy web interface; changesets are basically just reflogs etc. The same is true for Github as well, except the Gerrit workflow models git rebase, and the Github workflow models git merge, and rebase is the one you actually want for any large projects because otherwise history becomes a mess and your ability to understand the codebase suffers. So with Github, you end up using a merge-like workflow while making sure that internally it's actually not merge-based at all. That mismatch makes it unintuitive, IMO. Tgr (WMF) (talk) 07:56, 1 October 2020 (UTC)Reply
I don't really agree with what Tgr said. If you're familiar with Git, Gerrit goes and adds git-review on top of that, making it all that much more complicated. I personally don't think it's "too" complicated, but the notion that Gerrit is somehow more in line with standard Git workflow is misplaced. (And I'm saying this as someone who became aware of all of Gerrit/GitLab/GitHub in the last 5 years.)
If anything, GitLab/GitHub makes simple commits, or even multiple simple commits, that much easier to handle. For more complicated scenarios, I believe the experience is pretty much on par, but fixing even a few of the myriad issues with using Gerrit makes it that much easier for our new folks. Furthermore, I don't agree that GitHub has to be a merge-based workflow; it's just more common. It differs from maintainer to maintainer, and some repos even have settings to disallow merge commits in protected branches; most OSS repos will require you to rebase your pull request commits before they are squashed/rebased in. QEDK (talkenwiki) 07:56, 2 October 2020 (UTC)Reply
Every git-review command (except for the one setting up the commit hook) is a convenience wrapper around a basic git command. git-review -x does a cherry-pick. git-review -d does a checkout. git-review without an option does a push. It's more things to memorize, sure (especially with the short options not being particularly mnemonic), but conceptually each is still a simple git command. You don't have to juggle multiple repositories, either (although in the end repositories and remotes are also simple concepts if you properly understand git's graph model).
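That mapping can be replayed end to end with plain git against a throwaway local repository standing in for Gerrit. A sketch only: the change number 101 and all names are made up, and on a real Gerrit the server allocates the refs/changes/* ref for you when you push to refs/for/*.

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q --bare gerrit.git

# Alice uploads a change.
git clone -q gerrit.git alice; cd alice
git config user.email alice@example.org; git config user.name Alice
git commit -q --allow-empty -m "base"; git push -q origin HEAD:master
echo fix > f.txt; git add f.txt; git commit -qm "change 101"
# `git review`        ~= git push origin HEAD:refs/for/master
#   (a real Gerrit rewrites that into refs/changes/01/101/1; plain git
#    just stores whatever ref we name, which is enough for this demo)
git push -q origin HEAD:refs/changes/01/101/1
cd ..

# Bob downloads the change for review.
git clone -q gerrit.git bob; cd bob
git config user.email bob@example.org; git config user.name Bob
# `git review -d 101` ~= fetch the change ref and check it out
git fetch -q origin refs/changes/01/101/1
git checkout -q -b change-101 FETCH_HEAD
# `git review -x 101` would instead cherry-pick FETCH_HEAD onto the
# current branch: git cherry-pick FETCH_HEAD
git log -1 --format=%s
```

Bob ends up on a local change-101 branch containing Alice's commit, all through ordinary fetch/checkout plumbing.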
As for rebase workflows in GitHub, the UX didn't really support them the last time I checked (you can do it, but you won't have a sensible audit trail, old comments won't really be useful...). And GitLab Community Edition does not seem to support stacked merge requests, much less rebasing them, but that has already been discussed in plenty of other threads. Tgr (WMF) (talk) 02:34, 5 October 2020 (UTC)Reply
You're definitely correct in saying so. But I believe the short-term cost of transitioning is much lower than the long-term churn rate of potential new contributors because of Gerrit. QEDK (talkenwiki) 08:55, 19 September 2020 (UTC)Reply
I think the main difference is that we know Gerrit pretty well, so we are sort of able to estimate how big an investment it would need to fix certain issues (rather big). With GitLab we don't know yet how many issues and how much friction we'll encounter in the first few years after adopting it, so it's easy to underestimate the costs. Nemo 15:58, 22 September 2020 (UTC)Reply
I find GitHub to have a much more pleasing, inviting interface & workflow compared to Gerrit. However, once things become a little more complex, I (currently - possibly biased because I still use it way more often) heavily favour Gerrit.
It's easy to advertise a GitHub(-like) interface & workflow as being more inclusive & newcomer-friendly. It obviously is, but that means nothing unless those contributions end up getting merged (if not, we're only creating more frustration). Being more friendly to newcomers is not a selling point until we can be assured that the workflow has no negative implications for repo maintainers (or that they are outweighed by the positives).
I.e. pushing commit after commit onto a feature branch sure does seem simpler than having to carefully maintain, amend & rebase a few dependent patches, but it's simply moving the cost up to the repository maintainer: code review becomes harder (code spread all over the place), and history becomes broken (unless commits get squashed after all).
Someone has to pay for complexity: if it's not newcomers, then it's the maintainers (and if they are not willing or able to do it, the extra patches still aren't going anywhere.) I'm not currently sure how much of this "newcomer cost" GitLab would actually remove, and how much it would simply move the burden... Mmullie (WMF) (talk) 14:13, 28 September 2020 (UTC)Reply
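The "maintain, amend & rebase" cost described above can be made concrete. A minimal local sketch (commit messages and file names are hypothetical) of folding a review fix into the middle of a three-patch stack with git's autosquash rebase:

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q demo; cd demo
git config user.email dev@example.org; git config user.name Dev

# A stack of three dependent patches.
echo "step 1" > a.txt; git add a.txt; git commit -qm "refactor: part 1"
echo "step 2" > b.txt; git add b.txt; git commit -qm "refactor: part 2"
echo "feature" > c.txt; git add c.txt; git commit -qm "feature: behaviour change"

# Review feedback lands on the middle patch: record it as a fixup commit,
# then let autosquash fold it back into "refactor: part 2".
echo "step 2, revised" > b.txt; git add b.txt
git commit -qm "fixup! refactor: part 2"
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~3

git log --format=%s
```

After the rebase the history again shows exactly the three logical commits, with the fix absorbed into "refactor: part 2" - the kind of curation a squash-everything merge request never has to do, but also never records.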

Breaking links?

[edit]

Would our switching away from gerrit for work involve us actually shutting down the gerrit instance at gerrit.wikimedia.org? If the plan is to just switch it to read-only, the rest of this comment can be ignored.

I ask because there's a lot of links out there to patches on our gerrit instance, and it'd be unfortunate if we just 404'd them all, particularly since they often contain useful discussions and context for how a particular changeset reached its final form.

You can fetch this information from the `refs/changes` part of the gerrit repo, so presumably at least that part of the repo could be preserved. That said, although the information is there it's not particularly usable to have to browse through JSON blobs in hidden metadata commits like this.
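To illustrate that the change refs really are ordinary git refs that can outlive any particular UI, here is a sketch using a throwaway local bare repository standing in for Gerrit (the change number 610907 is borrowed from an example elsewhere on this page; the commit contents are invented):

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q --bare gerrit.git

# Seed the "Gerrit" repo with a patchset ref and a metadata ref.
git clone -q gerrit.git seed; cd seed
git config user.email dev@example.org; git config user.name Dev
git commit -q --allow-empty -m "patchset 1 of change 610907"
git push -q origin HEAD:refs/changes/07/610907/1
git commit -q --allow-empty -m "review metadata for change 610907"
git push -q origin HEAD:refs/changes/07/610907/meta
cd ..

# A later reader (even of a read-only mirror) can pull the change in,
# because a plain clone does not bring refs/changes/* along by itself.
git clone -q gerrit.git reader; cd reader
git fetch -q origin refs/changes/07/610907/1
git log -1 --format=%s FETCH_HEAD
```

So the raw data survives as long as the repository does; the open question raised above is only whether it stays conveniently browsable.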

It seems like a useful requirement for this (or any other) switch would be a static dump of our existing gerrit instance for posterity. 2600:6C40:5800:12FA:B01F:CBA1:751F:D130 (talk) 23:01, 17 September 2020 (UTC)Reply

Sorry, that was me. Forgot I had a fresh Chrome install so I'm not logged in anywhere. :D DLynch (WMF) (talk) 23:02, 17 September 2020 (UTC)Reply
Support for this. Gerrit links are all over the place, please let's not break them. Also all the comments on Gerrit changesets are valuable information that should be kept, pleaaaase. Mutante (talk) 23:53, 17 September 2020 (UTC)Reply
All the Gerrit comments are stored directly into the git repository. I crafted an example for this reply, a couple inline comments in the web interface at https://gerrit.wikimedia.org/r/c/test/gerrit-ping/+/610907/1//COMMIT_MSG are stored as an entry under refs/changes/07/610907/meta which can be seen at https://gerrit.wikimedia.org/r/plugins/gitiles/test/gerrit-ping/+/17d1ffec299ff8284e7fdb80d55aa3ba70847d23%5E%21/
One could theoretically just spawn Gerrit on their local machine and browse the change comments from there. Or potentially some utility could be written just for that purpose.
As we did for the Bugzilla to Phabricator migration, it is surely possible to extract all those comments from Gerrit and reproduce them as merge requests in GitLab. For URLs it might not be as trivial though, since a Gerrit change number / patchset would not translate 1:1 to a GitLab merge request id :-\ If we migrate to GitLab, most probably we would keep a read-only Gerrit instance just for the purpose of easily accessing history, much like we are still keeping around our old code review tool: https://www.mediawiki.org/wiki/Special:Code/MediaWiki/statuschanges Antoine "hashar" Musso (talk) 07:17, 18 September 2020 (UTC)Reply
We are actually working to undeploy the CodeReview extension: https://phabricator.wikimedia.org/T116948
We've done a dump of the data and are waiting on url redirects: https://phabricator.wikimedia.org/T205361
https://static-codereview.wikimedia.org/MediaWiki/1212.html
So, doing something similar for Gerrit seems reasonable (a static dump of a crawl of Gerrit, with url rewrites). Greg (WMF) (talk) 17:14, 21 September 2020 (UTC)Reply
My concern was that this would be considered too much work / too hard (as you already said, for URLs it's not trivial) and in the end it wouldn't happen. But if that's not the case and existing URLs will keep working, that's great. Mutante (talk) 16:36, 18 September 2020 (UTC)Reply
I would agree, keeping links working to a static copy (as was done bugzilla back when) or somesuch would be very valuable. Especially as tasks/patchsets can linger here for a long time and then actually bubble up and get worked on. CPettet (WMF) (talk) 16:49, 21 September 2020 (UTC)Reply
One of the trickier parts is partial change IDs - those are used quite a bit in commit summaries (like this). Would be nice if those could keep working. Tgr (WMF) (talk) 05:46, 1 October 2020 (UTC)Reply

Stay with "Gerrit"

[edit]

I am not going into details but I suggest "Gerrit". Volunteers like me heavily depend on the "Gerrit" interface and workflows. I really like its new version. It will not be easy for some of us to migrate. Jayprakash12345 (talk) 02:38, 18 September 2020 (UTC)Reply

Can you please go into details after all? Who do you mean when you refer to “voluntary like me” – any volunteers, or specifically volunteers already familiar with Gerrit? We’ve heard from other volunteers that our use of Gerrit interferes with their ability to contribute. Lucas Werkmeister (WMDE) (talk) 10:52, 18 September 2020 (UTC)Reply
The "already familiar with" seems to apply to any software that could be used ever. Users who are already familiar with something will consider it "easy" and those who have never used something before will find it hard. A common fallacy seems to be that people think there is some universal truth to what is "easy" and what is not that isn't related to being familiar with it. There are plenty of people who have not used gitlab before. What, imho, is hurting the most is if the interface keeps changing too often or right after users got familiar with an existing tool. Mutante (talk) 16:19, 18 September 2020 (UTC)Reply

Packaging all changes under one commit

[edit]

One thing that I've really liked about Gerrit is the git commit --amend && git review -R workflow, where even though I've made multiple changes based on comments, I can look back at a particular commit and identify exactly what was changed and which files were affected in a particular change without having to even get out of the current context (command line, Gitiles, GitHub). The PR workflow on the other hand allows (and in fact encourages) users to make multiple commits for one PR, where one change can get spread over multiple commits, some of them affecting other files as the code review progresses (and/or the user realizing any mistakes they have made). This can make it sort of difficult to isolate particular changes, since I don't think we will have much context as to which PR a particular commit relates to outside the GitLab instance. Sohom Datta (talk) 16:14, 18 September 2020 (UTC)Reply

Please see the "squash merge" comments elsewhere on this page, I think it offers a compelling substitute. At the same time, it makes it easier to follow the evolution of a larger patch by looking at the original feature branch. Adamw (talk) 19:33, 21 September 2020 (UTC)Reply
Aren't feature branches usually deleted after they are merged? Nikerabbit (talk) 14:25, 22 September 2020 (UTC)Reply
Feature branches are typically deleted after a merge (there's a checkbox for this on the merge request in GitLab), but their history is still viewable on the merge request itself. See here for an example. BBearnes (WMF) (talk) 16:33, 22 September 2020 (UTC)Reply
That's only visible in GitLab itself, right? If so, that's a less than ideal solution for the case of doing squash merge on big changes. For example, you cannot git bisect them. Nikerabbit (talk) 06:54, 23 September 2020 (UTC)Reply
Squash on merge is wonderful. You lose a little detail but you definitely get a similar behavior. Without squash on merge I would have thrown my computer out the window many times looking for a particular feature change. CAlbon (WMF) (talk) 14:44, 22 September 2020 (UTC)Reply

Improving Gerrit

[edit]
One topic that has come up in this consultation is Gerrit improvements. Why can't we make improvements to Gerrit to fix our problems with it? Isn't Gerrit upstream making improvements?
I've added some additional rationale that I felt was missing from this proposal to a subsection to the "Why" section of the document. This hopefully explains my perspective as someone involved with both the GitLab proposal and as someone who's involved with the maintenance of our Gerrit install.
The core of the issue is that we are rather singular in our use of Gerrit. The other companies running Gerrit are Gerrit developers -- this is the level of familiarity required to perform administrative tasks and add features. Maintenance and upgrades (in my limited experience) require non-trivial interaction with upstream to perform.
Simple improvements like avatars, renaming users, or renaming repos are either not supported at all or implemented via plugins with varying levels of support. Bigger improvements I'd like to see, like two-factor authentication and anti-vandalism tools, are not on the upstream roadmap at all, as many installs are either protected by Google's authentication or behind firewalls. We are the only large open install using LDAP authentication, which I worry about constantly.
Further, upstream technical decisions (as is likely true for any project) are sometimes questionable. One technical decision that, collectively, the Gerrit community is still realizing the ramifications of is forgoing a traditional RDBMS in favor of NoteDB. This has led to multiple problems ("external id already in use" for example) and is somewhat broken for our use by dint of an upstream bug that has been open for 2 years at this point.
All of the above is to say that while I see Gerrit improving in its UI and UX, administration is difficult and key features are lacking and I see little reason to believe they'll ever be addressed. Should we stay with Gerrit, we need a non-trivial dedication of resources to drive forward any new features we'd like to see. TCipriani (WMF) (talk) 18:14, 18 September 2020 (UTC)Reply
Thanks for this context and rationale, @TCipriani (WMF). Do you feel like we could get to the point where our Gerrit instance has the features used on https://gerrit-review.googlesource.com or is this an example of what you're talking about where there is not a lot of documentation or guidance for non-Gerrit developers to reach this level of polish and integration? Specifically, looking at a patchset from their gerrit instance https://gerrit-review.googlesource.com/c/gerrit/+/282695 there are some things that I think could make the developer / user experience quite a bit nicer for us:
  • Using a "Checks" tab to report back the Jenkins job results, with the ability to re-run
This is part of the checks plugin as I understand it, which is part of a CI system that upstream has been using for a bit. I think we'd have to write our own plugin if we didn't move to that CI system to get our own version of that.
  • Separating "Code Style" from "Code Review" in the votes
This is a custom label, and could be done quickly/easily.
Actionsets are a cool feature that are present in an as yet unreleased version.
  • the performance seems substantially better on their instance (sorry for such a vague metric, looking at console it seems like there are a few hundred ms of difference in various loading times but it seems to add up subjectively into a more pleasant experience when navigating files and patches)
This could be a few things:
  1. This could be a Gerrit instance closer to you than our instance in Virginia
  2. There is an experimental cache that was recently discussed for the aforementioned unreleased version
These features are a good illustration that most of the other large gerrit installs are run by gerrit developers/have proprietary features (in the case of a multi-DC Gerrit, which is speculation from me). We don't run unreleased versions or experimental caches. TCipriani (WMF) (talk) 13:46, 30 September 2020 (UTC)Reply
"We are the only large open install using LDAP authentication which I worry about constantly." +1 CPettet (WMF) (talk) 16:47, 21 September 2020 (UTC)Reply
We really want to move authentication to OAuth ( https://phabricator.wikimedia.org/T147864 ) so we can benefit from two factor authentication, global deactivation of account etc. Eventually that will mean using https://idp.wikimedia.org/ a single sign-on portal based on Apache CAS which was recently introduced. Antoine "hashar" Musso (talk) 14:53, 30 September 2020 (UTC)Reply
Could you shine some light on what specific security/anti-vandalism improvements you are thinking of? A topic further above seemed to indicate that the most common of those features that GitLab offers are proprietary and thus unusable to us. Michael Große (WMDE) (talk) 18:22, 29 September 2020 (UTC)Reply
Could you shine some light on what specific security/anti-vandalism improvements you are thinking of?
Sure! As I mentioned many of the comparable Gerrit installs are behind firewalls, so there aren't a lot of features that are built for instances where registration is open.
  • Up until recently banning a user required logging out all users. This took 5 years to get fixed upstream.
  • There are no plugins for any form of 2-factor authentication.
  • There is currently no upstream tool to revert all the actions of a single user.
  • There is no upstream tool to revert the actions of a single IP.
  • There is no tool to see all the activity of a single user or IP.
  • Upstream plugins that enable rate-limiting only rate-limit SSH actions, not HTTPS actions; however, you can achieve the same things via HTTPS as via SSH.
  • There's no mechanism for individual users to report abuse, and likewise no abuse queue.
  • There is no "sync" mechanism between our ldap and Gerrit: gerrit fetches information from ldap on first login and that's it. If we delete a user in ldap, we must also delete that user in Gerrit.
the most common of those features that GitLab offers are proprietary and thus unusable to us.
This refers to using recaptcha for registration? If we continue to use ldap (I'm assuming we would), then we wouldn't register people via this mechanism but rather via Wikitech. TCipriani (WMF) (talk) 20:49, 29 September 2020 (UTC)Reply
Being able to edit and delete your own comments would be very useful. ESanders (WMF) (talk) 13:48, 30 September 2020 (UTC)Reply
There is no tool to see all the activity of a single user or IP.
Yeah, lack of any kind of user audit functionality is a big problem, not only in terms of anti-abuse but also for things like community management (is this user a good CR+2 candidate?), mentoring and productivity (it would be nice to be able to do a quick review of what work I did this week or month, but in Gerrit that's pretty much impossible). Upstream is not super interested in fixing it, either.
(In general, in my very superficial impression Gerrit upstream is rather unresponsive.) Tgr (WMF) (talk) 06:41, 1 October 2020 (UTC)Reply

Gerrit workflow

[edit]
I prefer the Gerrit workflow, where I can clone a repository, create changes & commit them and then push the change for review. I don't even have to create a branch! It is a really effective workflow for users who create patches against multiple repositories without being very involved in the development of those repositories. A GitHub workflow (and from what I understand a GitLab workflow too) requires the user to fork the repository, create a branch and then a merge/pull request in the UI. If they (like I am prone to) forget to fork first, they'll have to fork, adjust the origin of their local repository, and then go through the other steps. This seems more work than with Gerrit currently.
It especially impacts users who do a lot of maintenance work on extensions and skins. I'd rather not have to maintain dozens upon dozens of forks just because I'm trying to deprecate a feature (for which all production usages have to be removed first, across the dozens of repositories used by WMF's MediaWiki). Mainframe98 talk 18:22, 18 September 2020 (UTC)Reply
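The two flows can be put side by side with throwaway local repositories (all names are hypothetical, and a bare clone stands in for clicking "Fork"; the point is the number of steps, not the exact commands):

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q --bare upstream.git
git clone -q upstream.git seed; cd seed
git config user.email dev@example.org; git config user.name Dev
git commit -q --allow-empty -m "base"; git push -q origin HEAD:master; cd ..

# Gerrit-style: clone, commit, push for review. No fork, no branch.
git clone -q upstream.git gerrit-style; cd gerrit-style
git config user.email dev@example.org; git config user.name Dev
echo x > x.txt; git add x.txt; git commit -qm "Deprecate wfFoo()"
git push -q origin HEAD:refs/for/master   # what `git review` does; a real
cd ..                                     # Gerrit turns this into a change

# Fork-style: fork, clone the fork, branch, push, then open an MR in the UI.
git clone -q --bare upstream.git fork.git   # stands in for the "Fork" button
git clone -q fork.git fork-style; cd fork-style
git config user.email dev@example.org; git config user.name Dev
git checkout -qb deprecate-wffoo
echo x > x.txt; git add x.txt; git commit -qm "Deprecate wfFoo()"
git push -q -u origin deprecate-wffoo
# ...then create the merge request in the web UI
```

Multiplied across dozens of repositories, the extra fork and branch per repo is exactly the maintenance overhead described above.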
GitLab also allows a workflow which does not require forking a repo before creating a branch. That requires permissions, though (e.g. being in a developer group). See e.g. https://wiki.gnome.org/GitLab#GitLab_workflow_for_code_contribution AKlapper (WMF) (talk) 12:38, 19 September 2020 (UTC)Reply
If existing developers (perhaps trusted-contributors) are granted such permissions (with master protected) then that would make things easier, especially if, as mentioned, creating a branch from the CLI is an option. It seems a bit wasteful to have multiple branches littering the repository, as it blurs the distinction between feature/merge-request branches and canonical branches (e.g. release branches, collaboration feature branches) more than it is now. Mainframe98 talk 14:32, 19 September 2020 (UTC)Reply
I concur with @Mainframe98 here that I prefer Gerrit's workflow. I've used gitlab in the past to contribute patches to other projects and I found the UI difficult at first and still kind of find it difficult.
I must admit I do like that GitLab includes a user interface to actually browse the code, but I dislike its workflow. I like Gerrit's workflow because you just create one commit, rather than a merge commit with a ton of different commits because you decided to edit each file using the web UI. Gerrit creates one commit while using the web UI, so you're not creating lots of commits.
I also like the fact that Gerrit shows all recent commits regardless of project on the main page (this doesn't look possible in GitLab?). This allows really anyone to go to status:open and either provide input or merge without needing to know which project to look at.
Users get their own dashboard with Gerrit (I don't think this is possible with gitlab).
Gerrit also provides the command you need to amend your commit (if you want to do it locally) or to test it out. It's also easy to create a change (e.g. git push origin HEAD:refs/for/<branch>).
You also don't have to fork in order to contribute, granted that when you use the inline editor it forks automatically for you. The disadvantage is that you have a duplicate repo; the advantage is that the user can do whatever they want with the repo. So, for example, forking MediaWiki more than once would use a lot of storage (please do tell me if I got this wrong).
Gerrit makes the life of the reviewer and owner easier with a simple user interface: being able to comment, being able to contribute easily through the interface, and being able to find comments that you needed to address but missed.
Wikimedia has had a great relationship with Gerrit upstream. I'm sure if we engage we could get the features/UI changes done that our users request. Gerrit is very extensible via plugins, and when it's not, upstream is quite accommodating about adding the extension/plugin points if you give a valid use case. My work with the Gerrit frontend team has been quite good; they are willing to accept any requests as long as there is a use case (though they've rejected some stuff). Paladox (talk) 21:14, 18 September 2020 (UTC)Reply
@Paladox Could you elaborate on the actual problem with "creating lots of commits"? I'm asking because you can squash them into a single commit when merging.
Forking: As I wrote in my previous comment in this thread there are workflows which do not require forking.
Showing all recent commits regardless of projects: GitLab offers that across projects too, within a group. See for example https://gitlab.gnome.org/groups/GNOME/-/merge_requests?state=opened.
Dashboard: Which specific functionality do you use and expect from your "own dashboard"? It's hard to discuss if we don't know which problem a dashboards solves for you.
(Furthermore, I personally disagree that Gerrit "creates a simple user interface". Random example: Try to create a code change within the web interface, without reading the docs. Hint: It is possible.) AKlapper (WMF) (talk) 13:12, 19 September 2020 (UTC)Reply
A commit is meant to be atomic in nature, most code reviewers prefer multiple commits with each commit dealing with one thing only - in fact, the Gerrit way of having patchsets with one commit doing a lot of things is in the minority and only works because of the patchset-based workflow.
Furthermore, Andre's point in the brackets is what's most important, (almost?) every newcomer to Gerrit has found it opaque in nature and contrarian to the pull-request-based workflows that most new contributors are used to. QEDK (talkenwiki) 18:37, 19 September 2020 (UTC)Reply
I strongly concur with @Paladox. Squash merge is not useful if you have multiple atomic commits that depend on each other, e.g., two small refactoring commits and one commit that does the behavior change. If any changes are needed then you need to either 1. create follow-up commits that survive the merge and lead to a very messy and useless git history, or 2. create follow-up commits and squash them all on merge, leading to a single huge commit that does a ton of things, or 3. rewrite git history in your merge branch, for which GitLab has absolutely crappy UI support. I might create an extra section for just that.
The dashboard in Gerrit shows me the status of: my WIP patches, my outgoing patches awaiting review by peers, and my incoming patches from peers awaiting my review. I use this quite a lot. Can GitLab do that?
As a newcomer you have a bit of a learning curve for the first month or two. That's it. Afterward you get to enjoy the benefits of a much better code review experience for years. You also get to work with a git history that is actually meaningful and usable. I think that's worth it. Michael Große (WMDE) (talk) 18:15, 29 September 2020 (UTC)Reply
Squash merge is not useful if you have multiple atomic commits that depend on each other, e.g., two small refactoring commits and one commit that does the behavior change. If any changes are needed then you need to either 1. create follow-up commits that survive the merge and lead to a very messy and useless git history, or 2. create follow-up commits and squash them all on merge, leading to a single huge commit that does a ton of things, or 3. rewrite git history in your merge branch, for which GitLab has absolutely crappy UI support. I might create an extra section for just that.
In the commit history for the change you suggest (2 refactoring commits, 1 behavior change commit) the git history created by GitLab (if you choose to merge rather than squash) is more semantically rich than the information in the git history created by Gerrit.
In Gerrit the fact that the behavioral change was the impetus for the refactoring is lost, and there's no way to represent it in the git history since changes are all separate patchsets. Either you use the default merge strategy (merge-if-necessary, which is what we use for mediawiki/core), which may or may not (depending on the state of master vs. the state of these 3 patchsets) create 3 separate merge commits for each of these 3 changes, OR (alternatively) add these 3 commits in order in history. There's no information in git history indicating these 3 changes were developed together; that information only exists in the Gerrit UI (it's likely also findable on the individual refs/changes/).
I think rewriting git history is only a bad thing for a shared branch (like a release branch or a mainline branch); I don't think rewriting git history is a bad thing for merge requests. In the GitLab UI you can compare revisions after a force-push in a merge request, just like you can in Gerrit when you git commit --amend and push to the refs/for/x special ref. TCipriani (WMF) (talk) 22:41, 29 September 2020 (UTC)Reply
Re Dashboards:
Apart from my still open question which specific functionality you use and expect from your "own dashboard", in my understanding it is only possible to filter across repos via a label (like /dashboard/merge_requests?state=opened&label_name[]=LABELNAME) or within a group, but not on a custom subset of repos. The term "dashboard" in GitLab terminology seems to be about metrics. AKlapper (WMF) (talk) 12:00, 23 September 2020 (UTC)Reply

Some notes from KDE

[edit]

(With my KDE hat on - not everything applies to Wikimedia - hope it's a helpful perspective nonetheless.)

We migrated our code review and tasks from Phabricator to GitLab. Migration of more pieces from Phabricator and other systems to GitLab will follow. We've been in a trial-period for a while where individual projects could move and help tweak our configuration and point out bugs or missing features before everyone migrated. Last week as part of our annual conference we celebrated the migration as an important milestone towards modernising our infrastructure and making it more accessible as well as consolidating a number of systems. A few things I observed:

  • Our developers are very happy. Several of them pointed out that they usually hate change and were very skeptical about the move but are now very happy the transition was made. They're looking forward to integrating more of our systems.
  • GitLab has been a very responsive upstream (and even came to the celebration ;-)). We've asked for several critical things to be moved into the community edition that were necessary for us to make the switch and this happened.
  • Our designers, promo people, office assistant and others were very happy they could finally contribute in a system they understood. It was called out explicitly that this seamless cross-team collaboration is a key thing that was missing before. Lydia Pintscher (WMDE) (talk) 18:47, 18 September 2020 (UTC)Reply

Workflow report: force-push

[edit]

I tried to replicate a Gerrit-like workflow, where you force-push new versions of a single-commit change, in mediawiki/core!5 on the test instance. Since the test instance might go away, I’ll copy/summarize my experience here. (Also, I didn’t yet know how comment threading in GitLab works, so you can watch me figure that out in real time over there ^^)

I like that there is a “compare with previous version” link, like a PS(n-1)..PS(n) diff in Gerrit, which is something that GitHub doesn’t have, as far as I’m aware. To me, this means that a force-push-based workflow is at least not completely unfeasible.

That said:

  • I don’t see any way in the UI to compare a larger range of pushes, say PS2..PS5 (supposing that PS2 was the last patch set you looked at). In fact, so far I haven’t even found a way to do this by hand-editing the URL.
  • The commit message is not considered part of the commit (unlike the COMMIT_MSG “file” in Gerrit), so the second force-push, where I removed a Change-Id from the commit message, shows up as “no changes”.

Overall, my impression is that force-pushing is doable, but likely not the best way to go, or not how GitLab “wants” you to do it. I’d like to try out other workflows as well, e.g. a “squash everything on merge” one. Lucas Werkmeister (WMDE) (talk) 17:30, 21 September 2020 (UTC)Reply
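To make the force-push mechanics discussed above concrete, here is a minimal sketch of the Gerrit-style "one commit, many revisions" flow, using a local bare repository as a stand-in for the GitLab remote. All paths, names, and messages are illustrative; on a real GitLab instance you would push to the project URL, and branch protection settings may forbid force-pushes.

```shell
set -e
tmp=$(mktemp -d)
git init -q --bare -b main "$tmp/origin.git"     # stand-in for the GitLab remote
git clone -q "$tmp/origin.git" "$tmp/work"
cd "$tmp/work"
git config user.email dev@example.org
git config user.name Dev

# First revision of the change: one commit, pushed to the MR branch.
echo v1 > feature.txt
git add feature.txt
git commit -qm "Add feature"
git push -q origin HEAD:refs/heads/my-feature

# Address review feedback by amending the same commit, then force-push.
# GitLab keeps the old version around for its "compare with previous
# version" view, much like a new Gerrit patch set.
echo v2 > feature.txt
git commit -qa --amend -m "Add feature (v2 after review)"
git push -q --force-with-lease origin HEAD:refs/heads/my-feature
git ls-remote origin refs/heads/my-feature
```

`--force-with-lease` is used instead of plain `--force` so the push is refused if someone else updated the branch in the meantime.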

This is a helpful investigation, good idea!
My understanding is that rewriting git history is not recommended anywhere outside of the Gerrit workflow. Of course, rewriting is fine locally but as soon as the changes are pushed to a public repo and potentially downloaded by others, rewriting causes problems. I agree that if we were to switch to Gitlab, we should also change our workflows to the natively supported ones rather than try to make our old workflow fit the new system. Adamw (talk) 19:30, 21 September 2020 (UTC)Reply
Pushing each revision as a new commit, while squashing everything on merge, is a replication of the Gerrit workflow:
  • Gerrit patch sets = GitLab commits
  • Merging a Gerrit patch = Squash merging a GitLab pull request
In both cases, you have the ability to review across "patch sets." No force-pushing is needed, and it's in my opinion a good thing to get rid of it as rewriting history is generally a Bad Idea™ in git, and the requirement to force-push is one of the primary reasons I hate the Gerrit workflow. Skizzerz 20:27, 21 September 2020 (UTC)Reply
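The "new commit per revision, squash on merge" flow described above can be sketched with plain git; GitLab's squash option performs the equivalent server-side when the merge request is accepted. Repository paths and messages here are throwaway examples.

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q -b main repo && cd repo
git config user.email dev@example.org
git config user.name Dev
git commit -q --allow-empty -m "Initial commit"

# The "merge request" branch: each review round is a new commit, no amend.
git checkout -qb fix-bug
echo one > fix.txt && git add fix.txt && git commit -qm "Fix bug"
echo two > fix.txt && git commit -qam "Address review comments"

# "Squash merge": main receives a single commit; the round-by-round
# history stays visible on the branch (and in the MR UI).
git checkout -q main
git merge --squash -q fix-bug
git commit -qm "Fix bug (squashed)"
git log --oneline
```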
I have never force-pushed with Gerrit. What are you talking about? Nikerabbit (talk) 07:40, 22 September 2020 (UTC)Reply
Force-push is the Gitlab equivalent of pushing amended patches for review in Gerrit, but yes to be most accurate we can call the Gerrit workflow "rewriting history". Adamw (talk) 07:50, 22 September 2020 (UTC)Reply
Even the merge request workflow is rewriting the history if you use the squash or rebase option. Nikerabbit (talk) 12:59, 22 September 2020 (UTC)Reply
The comment that the commit message is not a part of the merge-request is a good note.
I don’t see any way in the UI to compare a larger range of pushes, say PS2..PS5 (supposing that PS2 was the last patch set you looked at). In fact, so far I haven’t even found a way to do this by hand-editing the URL.
FWIW, I note that the changes tab has version drop-downs: https://gitlab-test.wmcloud.org/mediawiki/core/-/merge_requests/5/diffs?diff_id=19&start_sha=8c643bedd7f9e191c8041d3eee682075b4803040
Another example is https://gitlab-test.wmcloud.org/release-engineering/tools-release/-/merge_requests/2/diffs?diff_id=11 where I added some code review to the mix.
I liked that the comparison between mainline and each revision retained comments on specific code lines even after a force-push. TCipriani (WMF) (talk) 22:14, 21 September 2020 (UTC)Reply
For me as a developer, this workflow question is the most important one.
I've been playing around with the GitLab test instance today. I really like it... up until the point I tried to get actual development done.
It is not practical to properly chunk out big changes for review with GitLab without history rewriting (i.e. force-push). My requirements are:
  • chaining dependent changes together
  • independent iteration of those changes based on code review
  • approving and preferably merging changes independently
Gerrit deals with the problem by having append-only patchset versions and a nice way to link them together (no force-push needed).
GitLab seems to primarily support doing it at merge time. The UI has very limited support for it (only squash and rebase or merge commit); the full power of git rebase -i is needed. If we only allow history rewriting at merge time, we have multiple issues: the person doing the merge now has the burden of rewriting the history, which in Gerrit is done by the author (except for trivial rebases); they must be able to push directly to the target branch (this would likely be disabled for the master branch); and thirdly, the worst thing in my opinion, the result would not be reviewed. All this goes away if we allow history rewriting and force-pushed updates to the merge request (with sufficient UI support so that reviewers can do their job).
Things get even worse if we also want to merge the changes independently. This means they have to be split into multiple merge requests. I found no support for rebasing merge request chains. In my testing you either have to work backwards and merge the latest merge request into the earlier one first (frankly, nobody does that), or do a manual force-push for each merge request. Gerrit provides a handy button that does it for most simple cases. When you merge the first merge request from a chain in GitLab, the UI gives up. You can change the merge target manually (away from the now [hopefully] non-existing target branch) in the UI, but then, if you squashed or rebased the first merge request, it forces you to do a manual rebase and force-push to update the subsequent merge requests.
GitLab seems a good solution for work that produces reasonably sized merge requests. However, its suitability for me depends on its support for non-trivial history rewriting, which is required to develop and review work that consists of chained changes. Moreover, the only way to do that in GitLab is to force-push to a merge request. So the suitability of GitLab essentially hinges on its support for a force-push type of workflow.
Lack of good support for such a workflow would, in my opinion, lead to:
  • messy histories with irrelevant and untestable commits, and/or
  • slower development and code-review burnout due to difficult-to-review mega-changes, and/or
  • slower development due to having to wait for the author or code reviewer to complete previous steps before being able to proceed. Nikerabbit (talk) 14:18, 22 September 2020 (UTC)Reply
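The "manual rebase and force-push for each merge request" step described above for chained merge requests can be sketched with `git rebase --onto`, which transplants only a branch's own commits onto the new mainline after the first change in the chain is squash-merged. All branch names here are illustrative; on a real GitLab instance each branch would back a merge request and the final step would be a force-push.

```shell
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q -b main repo && cd repo
git config user.email dev@example.org
git config user.name Dev
git commit -q --allow-empty -m "Initial commit"

# A chain of two dependent changes, each backing its own merge request.
git checkout -qb step-1
echo a > a.txt && git add a.txt && git commit -qm "Step 1"
git checkout -qb step-2
echo b > b.txt && git add b.txt && git commit -qm "Step 2"

# Merge request 1 is squash-merged into main.
git checkout -q main
git merge --squash -q step-1
git commit -qm "Step 1 (squashed)"

# step-2 still carries the pre-squash "Step 1" commit. Transplant only
# step-2's own commits onto the new main; a force-push of step-2 would
# then update merge request 2.
git rebase -q --onto main step-1 step-2
git log --oneline main..step-2
```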
As an update to my previous comment, the force-push workflow does look promising to me. Discussions and comments do not disappear as if they never existed – they are just marked as being based on an outdated version of the merge request. I kind of even like this more than Gerrit, where, if you need to iterate, it's unclear whether to continue the discussion in the old patchset or start a new one in a new patchset. In GitLab you would logically continue the existing discussion until it is marked as resolved. There is even an option to require resolving all discussions before merging. Also, the navigation through issues is better than in Gerrit, imho.
As pointed out above, there is a UI to compare different versions of merge requests. I haven't used it much, so it is not clear whether it is as intuitive and powerful as Gerrit's. For example, I am not sure yet whether it shows which differences are due to a rebase. I need to explore this more.
I currently think that I could adapt my workflow to GitLab, but I would really miss one feature from Gerrit: the rebase button with an option to choose a target. Without it, working on chained merge requests is cumbersome. Nikerabbit (talk) 07:13, 23 September 2020 (UTC)Reply
I prefer chronological work branches. I think "commit --amend" is an awkward workflow because it leaves you without a local record of your change history. Storing a history of changes seems like a core feature of a VCS, but Gerrit seeks to emulate this with server-side comparison across patchsets. So I'd rather see forced pushes used only for rebases and niche applications rather than importing the complete Gerrit workflow into GitLab.
It's true that instead of having two levels of prospective change management (patchsets and topic branches) we will have one: potentially enormous merge requests of feature branches. I don't think that's infeasible. Simplifying the model in this way is what developers were asking for in the developer satisfaction survey. Tim Starling (talk) 00:36, 24 September 2020 (UTC)Reply
For what it's worth, I've mostly given up on chronological work branches. I just commit early and often and then very liberally and aggressively edit git history (mostly with interactive git rebase and heavy use of squash/fixup/amend) to produce a series of commits that makes sense (to me at least) as logical units. Then I push for review.
One thing that I would need force-push for would be if I realized I had a typo somewhere after creating the MR, or realized I'd made some bad decision when grouping the commits. Not the end of the world if I couldn't do it, but it would be nice to be able to fix them nonetheless.
As far as topic branches go, I think that Gerrit topic branches can in some cases (not sure how many) be replaced by GitLab labels (at least I managed to do so in my limited test and for the workflow I was testing). AKosiaris (WMF) (talk) 08:48, 25 September 2020 (UTC)Reply
Pre-Gerrit, I was strongly averse to the use of --amend and non-chronological commit history, but Gerrit effectively changed my habit on that (once you start using --amend to fix problems based on review, you start using rebase more heavily if you have a chain of Gerrit patches, and reordering commits now feels normal and natural).
So, in the Gerrit workflow, it makes sense, but I imagine once we transition to GitLab, we are more likely to move back to chronological commits (assuming a chain of commits in a merge request is squashed pre-merge). But, anyway, I imagine we'll have to reconfigure our workflow habits in a GitLab world. SSastry (WMF) (talk) 15:12, 25 September 2020 (UTC)Reply

What are the major differences?

[edit]

Having never used GitLab, I have no idea how to decide whether it would be a good idea to use it. What features does it offer that Gerrit is lacking? Which features is it lacking that we use on Gerrit? What does it do differently?

A consultation should be about making an informed choice. I'm lacking information. I can of course go and google, but wouldn't it be good to have an overview here, so we can discuss it directly, and add to it as relevant things come up? Some kind of decision matrix would perhaps be useful.

I can find a couple of comparisons online, but none of them talks about the things that would be most relevant to us. DKinzler (WMF) (talk) 08:42, 22 September 2020 (UTC)Reply

Hello @DKinzler (WMF), this is a very good question which I wanted to ask as well. :) Kizule (talk) 13:01, 22 September 2020 (UTC)Reply
@DKinzler (WMF) - there is a labs test instance currently set up at https://gitlab-test.wmcloud.org/ where many folks have been playing around, which might be helpful to you in assessing differences between gitlab and gerrit. It's a volatile instance though, so there is no expectation for any data to persist. Several features such as issue-tracking are also likely to be disabled for any Wikimedia installation - I believe the working group is attempting to compile these soon so as to help guide expectations. SBassett (WMF) (talk) 18:50, 22 September 2020 (UTC)Reply
@SBassett Thank you, I'll have a look. Still, a side-by-side overview would be extremely helpful for this consultation. DKinzler (WMF) (talk) 08:27, 23 September 2020 (UTC)Reply
+1 that it would be nice to see a feature comparison. Also, the test instance does not have any CI set up I believe, and CI UX is a pretty crucial factor - it is a big part of the first experience of new code contributors. Tgr (WMF) (talk) 06:46, 1 October 2020 (UTC)Reply

Also, the test instance does not have any CI set up I believe

On that one, see the .gitlab-ci.yml on this merge request for a rough working example on mediawiki/core. There's a job runner instance configured and users should be able to define CI pipelines for any project using docker images from the WMF registry. BBearnes (WMF) (talk) 08:19, 1 October 2020 (UTC)Reply
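For illustration, a minimal `.gitlab-ci.yml` in the spirit of the example referenced above might look like the following. This is a hypothetical sketch, not the actual mediawiki/core configuration; the registry path, job name, and composer scripts are placeholder assumptions.

```yaml
# Hypothetical sketch only; not the real mediawiki/core pipeline.
stages:
  - test

phpunit:
  stage: test
  # Placeholder image path; a real pipeline would reference an actual
  # image from the WMF docker registry.
  image: docker-registry.example.org/releng/php-test:latest
  script:
    - composer install
    - composer test
```

Each repository carries its own file like this, which is what makes the CI configuration self-serve: any project member can edit the pipeline alongside the code it tests.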
There are a few drivers listed on the GitLab consultation page; roughly, that boils down to:
easiness to create new project
In GitLab, as soon as you are logged in there is a nice shiny button that lets you create a new project. It can be placed under your personal namespace or one of your groups' namespaces. https://gitlab-test.wmcloud.org/projects/new , as `hashar` and being a member of the `release-engineering` group, I can create a new `wikioid` project such as:
https://gitlab-test.wmcloud.org/hashar/wikioid
https://gitlab-test.wmcloud.org/release-engineering/wikioid
The first will be managed by myself, the second by any member of the group. The access list is set on creation.
In Gerrit, creating a project is a global capability. It lets one create a project anywhere in the hierarchy, for example under another team's hierarchy. It can't really be given to anyone since the hierarchy is shared by everyone, unlike GitLab, which namespaces it per person/group. So essentially the Gerrit platform is shepherded by a restricted group, which is very typical in the corporate world. Our process is:
The fix in Gerrit would be to change the project creation capability to be namespace-based instead of global, possibly mapping hierarchies to groups and individual users. So theoretically, if I am in the LDAP group `releng`, I would be granted the right to create a project under `/releng/`, and additionally under a new hierarchy such as `user/hashar/`. That would address this specific concern.
easier setup and self-service of Continuous Integration configuration
Also known as self-serve CI. Similar to how on GitHub you can add Travis integration and immediately benefit from CI; on our infrastructure it is shepherded just like Gerrit project creation. The CI configuration is done independently from Gerrit; it relies on some [https://www.mediawiki.org/wiki/Continuous_integration/Entry_points standardized entry points] such as running npm test.
Which means that to benefit from CI you need to first know it exists, reach out to the proper people (Release Engineering) and get it configured. An advantage, though, is that the maintenance of CI is babysat by a team and is more or less consistent across repositories. That is especially true for MediaWiki core, extensions and skins deployed to Wikimedia production, for which we enforce a set of rules and do not let developers deviate from them.
We at some point had an ambitious plan to overhaul our current CI:
  • One of the outcomes is the [https://wikitech.wikimedia.org/wiki/Deployment_pipeline deployment pipeline], which still requires central configuration and initial setup for the entry point (running Blubber to craft a container) and is geared toward automatically packaging a repository as a Docker container we can then deploy.
  • The other intent was to shift the CI workload from WMCS instances to a Kubernetes cluster. The primary reason is that the workload is a bit challenging for the WMCS infrastructure: it often comes in spikes, is CPU intensive and sometimes brought the infra to its knees. There are other limitations here and there as well. Eventually that got de-prioritized in favor of another project; there is only a limited amount of things you can do at a point in time given our limited resources.
  • The Zuul version we use is dated; the next major one does come with self-serve CI. I can't remember exactly why I haven't got the upgrade prioritized; most probably I wanted the new Zuul to be fully based on Kubernetes, and when that plan fell through, the Zuul upgrade went with it.
workflow familiarity
For the old-timers that have been using Gerrit for years and years, it has become second nature. Amending commits, having the git hook set up and pushing to `refs/for/<branch>` is trivial enough once you get in that stance.
Wikimedia employees and contractors are hopefully onboarded by their new teammates, who help them get on the rails and explain the basics of Gerrit. There are a few who would rather stick to GitHub since they are familiar with it, or because they never got trained to use Gerrit in the first place. For outsiders, that is a bit more challenging; still, we have plenty of documentation available, be it Gerrit's own documentation https://gerrit.wikimedia.org/r/Documentation/intro-user.html or our tutorials at https://www.mediawiki.org/wiki/Gerrit
The GitHub/GitLab and Gerrit workflows are exactly the same on the functional level: get one or more commits sent to a staging area, get reviews, modify your commits, get them approved and ultimately merged to make them available to others.
The implementations though are "slightly" different.
In GitLab/GitHub you fork a repository, clone it, send your commits to a branch, then head to the web interface to open a merge request against the upstream repository. You are in your own personal space with whatever branches you want, and until the code is ready it is essentially isolated.
Code updates are done in your local branch, which gets pushed to your forked repository and ultimately updates the merge request.
The way your commits are associated with a merge request is by using the quadruplet made of:
  • your forked repository
  • your branch in the forked repository
  • the target repository
  • the branch in the target repository
In Gerrit, you clone the upstream repository and craft your commits. You send them directly to the upstream repository (instead of your fork), targeting a special reference which encodes the target branch: refs/for/master. That creates a change for each commit, and the changes live directly in the upstream repository.
Code updates are done by retrieving the change (or series of changes), amending it and sending it again to the same special reference (refs/for/master).
The way commits are associated with changes uses a triplet made of:
  • The repository
  • The target branch
  • The Change-Id meta header
Gerrit just skips the need to fork the repository, and your personal changes / work in progress are effectively shared in the same repository as upstream.
The two workflows are really the same. The Gerrit one is just a bit more intimidating when you come from GitHub/GitLab. Gerrit has an advantage: it deals mostly with individual commits. Its drawback is that retrieving a series of changes might prove difficult when some commits in the series got updated. Whereas in GitHub/GitLab they are grouped in a single branch, and afaik commits can't be individually changed outside of the branch.
A note is that you can also create branches on repositories for anything that requires a significant amount of work. That is used in some cases but is certainly not generalized. Antoine "hashar" Musso (talk) 08:24, 1 October 2020 (UTC)Reply
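The "push to a special reference" transport described above can be demonstrated with plain git, using a local bare repository as a stand-in for the Gerrit server. A real Gerrit intercepts pushes to refs/for/* and turns each commit into a reviewable change; plain git just stores the ref, which is enough to show the mechanism. All paths and the Change-Id value are throwaway examples.

```shell
set -e
tmp=$(mktemp -d)
git init -q --bare -b master "$tmp/gerrit.git"   # stand-in for the Gerrit server
git clone -q "$tmp/gerrit.git" "$tmp/work"
cd "$tmp/work"
git config user.email dev@example.org
git config user.name Dev

# One commit = one change; the Change-Id trailer (normally added by the
# commit-msg hook) is what lets Gerrit match amended commits to the
# existing change.
echo hi > file.txt && git add file.txt
git commit -qm "My change" -m "Change-Id: I0123456789abcdef0123456789abcdef01234567"

# Push to the magic ref for the target branch, no fork needed.
git push -q origin HEAD:refs/for/master
git ls-remote origin refs/for/master
```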
I only took the few entries that are listed at GitLab consultation. There are obviously a lot more differences.
GitLab is essentially a clone of GitHub and offers a full suite of development tooling such as release hosting, an issue tracker, CI, a wiki, spinning up a live environment based on a branch, design assets, and web hosting (like GitHub Pages).
Gerrit only covers git hosting and code review. It allows extremely fine-grained permission settings and has a rather simple infrastructure: one big Java process, some managed caches and Lucene indices, and git repositories. Antoine "hashar" Musso (talk) 08:41, 1 October 2020 (UTC)Reply

In gitlab/github you fork a repository, clone it, send your commits in a branch then head to the web interface to emit a merge request to the upstream repository. One is on its own personal space with whatever branches they want and until the code is ready it is essentially isolated.

Code updates are done in your local branch which get pushed to your forked repository and that ultimately update the merge requests.

It's incomplete, but note the docs at GitLab consultation/Workflows - we'd probably adopt a model similar to that used by KDE and many projects on GitHub, where regular contributors are able to create branches directly on the mainline repository and thus skip the forking step for most work. KDE's convention of branches named like work/user/feature-name seems a good one. BBearnes (WMF) (talk) 17:01, 1 October 2020 (UTC)Reply
Is the idea that we would start using feature branches that contain weeks and months of work, and then need to be consolidated with the main line?
I was very happy to see that we were slowly moving to smaller and smaller patches going directly into core. I'd hate to see that trend reversed.
On that note - what about code review / approval? Can we keep the refine -> approve -> check -> merge workflow? DKinzler (WMF) (talk) 18:44, 1 October 2020 (UTC)Reply

Workflow for translation updates

[edit]

Current translation updates in Gerrit have one benefit and one drawback. The benefit is that they are merged automatically. The drawback is that they skip most of the tests and thus can cause merge blockers.

Currently Gerrit requires explicit +2 to have a change merged (if it passes tests). In GitLab this is rather the other way around: you can require tests to pass first, then you can merge (if you have sufficient access). To prevent merging unwanted code to the master branch, I assume we will be enabling branch protections to disallow direct push to the branch. This means that we would not be able to push translation updates directly anymore.

I think GitLab allows setting exceptions at the per-repo level, but I do not think that is sustainable: extra work to configure each repository, and we could end up with different repos doing it in different ways, making configuration on the translatewiki.net side more complicated.

Is there a way to set such exceptions globally, or perhaps at the group level?

The other option would be that translation updates are sent as merge requests, and merged immediately if tests pass. This avoids the current system's drawback of skipping tests. I also think it's possible to give the translation updater account global permissions to do that across repos, and I saw GitLab even provides a merge_request.merge_when_pipeline_succeeds option to do that easily. I think this would be the best workflow for translation updates. Nikerabbit (talk) 14:57, 22 September 2020 (UTC)Reply
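GitLab exposes this as git push options, so a translation-update bot could create and auto-merge its merge request in a single push. Here is a sketch using a local bare repository with push options enabled as a stand-in for the GitLab server: the `merge_request.*` option names are real GitLab push options, but a plain git receiver only transports them without acting on them, and the bot identity and branch name are made up for illustration.

```shell
set -e
tmp=$(mktemp -d)
git init -q --bare -b main "$tmp/origin.git"     # stand-in for the GitLab remote
git -C "$tmp/origin.git" config receive.advertisePushOptions true
git clone -q "$tmp/origin.git" "$tmp/work"
cd "$tmp/work"
git config user.email l10n-bot@example.org
git config user.name "L10n Bot"

# A translation update commit (empty here just to keep the demo small).
git commit -q --allow-empty -m "Localisation updates"

# On a real GitLab, these options open an MR for the pushed branch and
# set it to merge automatically once the pipeline passes.
git push -q -o merge_request.create \
            -o merge_request.merge_when_pipeline_succeeds \
            origin HEAD:refs/heads/l10n-update
git ls-remote origin refs/heads/l10n-update
```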

I agree that running localisation updates through the regular merge process would be a good idea; we don't with the current architecture to avoid too much extra load on the CI servers, at the cost of the occasional test-violation pain, as you say. Switching to just "normal" C+2/gate is something we could do now, and is the easiest thing for the new GitLab world, I imagine. Jdforrester (WMF) (talk) 09:58, 23 September 2020 (UTC)Reply

Lower friction to create new repositories in Gitlab

[edit]

Just to be clear, we are referring here to the fact that new developers will be creating personal new repos and issue PRs to the group repos, right?


Cause I am guessing that creation of group repos (which will be the "canonical" versions) will still be limited to people with the rights to do so (which people those are going to be, and how we avoid ending up in the current situation with barely a couple of people tending to repo creation, is going to be an interesting discussion in its own right, I guess). AKosiaris (WMF) (talk) 16:14, 22 September 2020 (UTC)Reply

Metrics for new contributors "acquisition"?

[edit]

Since the issue of Gerrit being a barrier to entry for new contributors has been raised repeatedly in this discussion, I think we would benefit from having some long-term metrics for understanding whether the move to GitLab actually helped or not. I say this because we could very well have structural problems in our movement that make it difficult for people to contribute (lack of code review comes to mind), and we could end up in a similar situation in the future despite the transition. What those metrics could be is an interesting discussion. Off the top of my head I'd say:

  • Rate of new contributors (with a twist for an extra metric counting only non-WMF, perhaps even non-WMDE members)
  • Number of active contributors (with > X PRs over a period of time, again perhaps with an extra metric for those with non-WMF, non-WMDE affiliation)
  • Number of very active contributors (same as above, just Y > X)
  • Number of drive by contributors ( < X PRs over a period of time)


I am guessing people can suggest others as well. AKosiaris (WMF) (talk) 16:22, 22 September 2020 (UTC)Reply

IMHO this would be in scope for Community metrics - there are currently some retention boards in the making for Gerrit (see phab:T254566, review welcome). Would have to enable and configure the GitLab backend once some GitLab instance is being used. AKlapper (WMF) (talk) 17:03, 22 September 2020 (UTC)Reply
Noteworthy that some metrics are being collected/worked on as part of tuning sessions: this page shows %change in # of independent developers; code review times change; change in # of outstanding code reviews. This page lists additional metrics: time to first merge, time to first review, average time to merge, average time to review, and cycle time. Source for whatever exists now is Community metrics as @AKlapper (WMF) mentions. TCipriani (WMF) (talk) 17:50, 22 September 2020 (UTC)Reply

Gitlab's community edition relies on nonfree proprietary software to combat spam & abuse

[edit]
It relies on the proprietary Akismet and Google's reCAPTCHA. It is a known target for spammers; without turning those on, it will quickly be overloaded with spam. The main page mentions "GitLab is a system used successfully by many other members of the Free Software community (Debian, freedesktop.org, KDE, and GNOME)." freedesktop.org and Debian turned on reCAPTCHA, so their instances cannot be used in freedom: they require users to run proprietary Google code. KDE and GNOME don't allow user registration. I've looked around, and there is no instance that runs the community edition and is open to the public for general use other than gitlab.com (which is running a proprietary version). It has been this way for several years. GitLab has paid lip service toward at least removing reCAPTCHA, but so far has done nothing. It also optionally "integrates" each repo with over 10 different nonfree programs or services ("settings, integrations"), so unless you trusted all your users to avoid using those, you would need to patch the software to use it in freedom. So, where the main page says "it adheres to the foundation's guiding principle of Freedom and open source", I don't think that is correct.
Then you have what some might consider more minor issues: people who want to contribute will have to do it upstream and run nonfree reCAPTCHA to register, and they will have to do it in a repo containing all the nonfree code and make sure their contribution fits in with the nonfree parts of GitLab. There is only one version of the documentation, and it includes the docs for all their nonfree features. Most instances of GitLab use nonfree code (including gitlab.com, Debian and freedesktop.org), so calling your instance a GitLab instance would have the effect of promoting GitLab and proprietary software use. GitLab's new-repo license recommendation UI is at odds with the FSF's recommendations: see https://libreplanet.org/wiki/FSF_2020_forge_evaluation. Ian Kelling (talk) 22:46, 22 September 2020 (UTC)Reply
> Most instances of gitlab use nonfree code (including gitlab.com, debian and freedesktop.org), so calling your instance a gitlab instance would have an effect of promoting gitlab and proprietary software use. Gitlab's new repo license recommendation UI are at odds with the FSF's recommendations: see https://libreplanet.org/wiki/FSF_2020_forge_evaluation.
Hello Ian. I have looked at the instances for Debian ( https://salsa.debian.org/help ), KDE ( https://invent.kde.org/help ) and GNOME ( https://gitlab.gnome.org/help ); they all list the community edition. Do you have any hints as to whether they are using nonfree code, or was that referring solely to recaptcha? We would most certainly not use that :) Antoine "hashar" Musso (talk) 13:42, 1 October 2020 (UTC)Reply
> Do you have any hints as whether they are using nonfree code or was that referring solely to recaptcha?
All I can see is the nonfree captcha. Hopefully that is all. All the gitlab "integrations" that call out to other nonfree services are still available for their users to use. Ian Kelling (talk) 05:17, 15 October 2020 (UTC)Reply
These issues were raised in the thread Talk:GitLab/2020 consultation#h-Software_freedom-2020-09-03T05:07:00.000Z. Recaptcha is not going to be enabled if we setup a gitlab instance. Nikerabbit (talk) 06:44, 23 September 2020 (UTC)Reply
Indeed, all past migrations of big projects to GitLab have been a failure for software freedom so far. If we manage to keep the service running properly without proprietary software, we'll be a first. It might be possible, but it will require a big investment. Nemo 08:22, 23 September 2020 (UTC)Reply
As discussed elsewhere (e.g. Talk:GitLab/2020 consultation#h-Improving_Gerrit-2020-09-18T18:14:00.000Z), we'd keep using our own SSO system, so at least login captchas are not a concern. (Captchas for rate throttling, maybe. But then Gerrit doesn't have anything like that, so it won't be worse than the status quo.) Tgr (WMF) (talk) 06:44, 1 October 2020 (UTC)Reply

Zuul

[edit]

Gerrit is much less complicated for me, but Gerrit will be replaced with GitLab even if many people don't think that this should be done (I believe). Eh, now, will we still use Zuul/Jenkins for CI tests, as GitLab has these features? Kizule (talk) 03:30, 23 September 2020 (UTC)Reply

A big part of the motivation in moving to GitLab is to replace the current Zuul/Jenkins infrastructure with GitLab CI, yes.
GitLab CI is a really great system, and is one of the things I'm really positive about. Jdforrester (WMF) (talk) 10:02, 23 September 2020 (UTC)Reply
@Jdforrester (WMF) - I'm not certain that assumption is true. Not that GitLab CI is really nice - it is - but that we would be using it within an initial deployment of GitLab. SBassett (WMF) (talk) 14:58, 23 September 2020 (UTC)Reply
One of the three main grounds listed is

easier setup and self-service of Continuous Integration configuration

… so I assumed that was still the case, but fair point. It indeed later says:

For the avoidance of doubt, continuous integration (CI) and task/project tracking are out of scope of this evaluation.

Jdforrester (WMF) (talk) 15:48, 23 September 2020 (UTC)Reply
To reduce the cost (time) of migration, we do not plan to write CI-glue-ware between GitLab and Zuul/Jenkins and instead use GitLab's built in CI for any repositories moving to GitLab for code review. Greg (WMF) (talk) 16:26, 23 September 2020 (UTC)Reply
Thanks for the responses! I've been playing with GitLab on a test instance, and I've honestly changed my mind: I agree that this should be done, and it's great that most things will be in one place. Kizule (talk) 18:22, 23 September 2020 (UTC)Reply

The Heffalump in the room

[edit]

Are Gerrit's interface usability deficits the real reason for the dissatisfaction with Gerrit?

There is something about Gerrit that I haven't seen discussed explicitly here, and I'd like to mention: it's the alienation between Gerrit and the wikis.

For most wiki editors, their home is wiki pages. Although it's managed by the same foundation and has "wikimedia" in its URL, Gerrit is a different website. It has a completely different interface from the wikis, even before we discuss whether this interface is good or bad. It requires a separate username, and to send even one line of code, you need to set up SSH keys, install, configure, and run lots of command line tools, clone the repo, and then, if all of that wasn't difficult enough, you have to go through the worst part: code review, which is often infinite.

And that's why many editors on Wikimedia wikis don't even want to hear about Gerrit: it's not a wiki. At Wikimania 2011, there was a talk about The Site Architecture You Can Edit, and as far as I can recall, Gerrit is the thing that was presented as the solution. Did it actually allow a site architecture that anyone can edit? Not really. It probably made it more accessible, but not really for everyone. On a wiki, editors can publish a template, a module, or a gadget, and it gets deployed immediately. Sure, they can break stuff along the way, and they can get reverted, but it's incomparably faster and more familiar than going through Gerrit.

By itself, migration to GitLab doesn't solve any of that.

And sure, GitLab is more popular, and it looks much more similar to GitHub, which is the most popular thing, so it will probably be more welcoming to experienced developers. It's also quite possible that GitLab is easier to integrate with CI tools. The problem is that these are things that interest people who are already experienced with the Git and deployment work, but by themselves they won't make it much easier for members of the editing community to update the code or the server configuration, and the gap will remain, with content editors and developers of on-wiki templates, modules, and gadgets on one side, and developers of core, extensions, and site configuration on the other side. There are some people who participate on both sides, but not enough.

Moving all PHP and JavaScript code to being stored on wiki pages is probably not the solution, and I'm not proposing it. Integrating version control with wiki pages, as we had in the SVN era, is probably not so important either. But changing the deployment policies and allowing the deployment of at least some code much more quickly and with less strict code review, or with no code review at all, can go a long way to making Wikimedia projects editors feel that it's worth the effort to learn these tools and commit code. Learning Git and everything else and then making a code commit that will get forever stuck in code review is not very attractive.

This change will probably have to be more social and administrative than technical. Can such a change happen along with the move to GitLab? Amir E. Aharoni {{🌎🌍🌏}} 09:09, 23 September 2020 (UTC)Reply

Hi @Amire80, thanks for joining the conversation.
I guess every tool and workflow has its pros and cons, and it's unfortunately a bit unclear to me what you might propose.
Personally speaking, trying to lower the learning curve and remove obstacles is something I'm interested in. We have quite a few areas (Toolforge, Pywikibot scripts, modules, templates, gadgets, user scripts) in which folks can already immediately see results, with the cost being no code review and no quality assurance.
Regarding the current situation, https://meta.wikimedia.org/wiki/Small_wiki_toolkits intends to provide all info needed if you are technically interested and want to help your communities plus broaden your skills. If you spot something missing, please join the discussion there.
Editing wiki pages is for sure "incomparably faster"; however, being fast also needs to be balanced with the goal of providing a stable, working website. As you already wrote, some editors can edit a gadget or site JS, and the immediate deployment "can break stuff along the way". Numerous times, default on-wiki code has loaded external resources and hence violated user privacy. There are a number of tickets in Phabricator about broken gadgets: folks copied code (instead of loading it) from some other wiki many years ago, the code rotted and broke at some point, and now nobody knows how to fix it. When I last checked in June 2019 in phab:P8687, out of those 509 public wikis which had gadgets and/or a page MediaWiki:Common.js, 242 wikis had zero people editing these files in the last 12 months. (Random examples: fa.wikiquote.org had 56 gadgets and zero editors on them. ur.wikipedia.org had 235 gadgets and four editors on them.) What I want to say here is probably: making it easier for folks to deploy more and more code that may end up unmaintained doesn't solve the problem in the long run; on the contrary, a lower barrier for deploying code might solve some problems but also creates new ones.
https://integration.wikimedia.org/ci/job/audit-resources/ would also list dozens of broken MediaWiki:Common.js files across our wikis if that job wasn't currently broken. (For background, see phab:T71519.)
Tickets like phab:T71445 were created to discuss a code-review process for MediaWiki JS/CSS pages on Wikimedia sites. In my very personal understanding such things need to happen first before thinking in potential "on-wiki vs Git repos" terms (if I understood your post correctly, I'm not sure).
As you mentioned "allowing the deployment of at least some code much more quickly and with less strict code review" I'm wondering if you have some thoughts or changes in mind about wikitech:Backport windows.
Also, could you provide a link to a specific example of a code change which you think could have taken place with "no code review at all"? AKlapper (WMF) (talk) 10:40, 23 September 2020 (UTC)Reply
Let's start with the gadgets that aren't edited by anybody: They aren't edited by anybody for one of the following two reasons:
  1. Because they are stable and don't need any edits.
  2. Because on that wiki there is nobody who has the skills to edit them.
If it's number two, this means that they were copied from another wiki. Let's follow your example, and think of MediaWiki:Gadget-HotCat.js in fa.wikiquote: Somebody at some point thought that fa.wikiquote needs that gadget, and copied it. Why did HotCat have to be copied, while RevisionSlider didn't have to be copied, and works everywhere? Because HotCat is a gadget and RevisionSlider is an extension. No one "edits" RevisionSlider on any wiki, neither fa.wikiquote nor de.wikipedia, because it's an extension. There's nothing fundamentally different between RevisionSlider and HotCat—both provide some extra functionality to users. They are different only in how their code is stored and deployed.
If we had a global repository of gadgets, HotCat could be stored globally, and automatically deployed to fa.wikiquote. No one would have to edit the gadget on fa.wikiquote.
But you know what's the craziest part? In practice, we already have a global repository of gadgets. Gadgets can be stored on another wiki, e.g. Commons, and loaded using mw.loader.load() and a URL, as it is already done with HotCat and a few other gadgets. So, HotCat is a huge gadget, with thousands of lines of code and tens of thousands of users on the English Wikipedia alone, and many thousands more in other wikis. Technically, any Commons admin, of which there are more than 200, can edit it without any more code review. I heard that in the particular case of HotCat there's a convention that every edit is supposed to be checked by someone, but this is a social convention and not a technical constraint. And for some gadgets there is no such convention, and they can just be edited. And there are even some user scripts that aren't stored in the MediaWiki space, so anyone can edit them, and not just admins.
Why is HotCat a gadget and not an extension? No particular reason. Just history. I asked some people involved in its development, and they all said that it can be converted to an extension, and no one bothered to do it. And probably no one really feels that it's necessary, because the current way to develop it, as a wiki page, is good enough. Besides, converting it to an extension has some disadvantages: code review will get harder and changes that do pass code review will be deployed after a week and not immediately.
You also asked about Backport windows (formerly known as SWAT). Backport windows are not fun. You have to be at a certain (online) place at a certain time, connect to IRC (IRC!!!!!! In 2020!!!!!!), wait for your turn, and then wait a few minutes more for scap and all that stuff to run (at least that's how it was a few months ago, last time I did it; maybe things changed since then). It's far, far more difficult than editing a wiki page and pushing "Publish".
Does bad code sneak into gadgets sometimes? Yes, it does. If it was intentional and malicious, the user who did it is usually desysopped or blocked. But bad things can also sneak into code on Gerrit. Does it happen much more often with gadgets than with Gerrit? I honestly don't know, I didn't count. But consider this: reverting bad code in gadgets is actually much easier and faster than reverting code that is deployed through Gerrit—yet again, you just edit a wiki page and push "Publish". No code review, no scheduling a backport window, no waiting for scap. In any case, what's really important is that the possibility of breaking things shouldn't be the top reason to stop every conversation that compares the easiness of deploying code immediately to the difficulty of going through long code review and waiting for several days to deploy.
There is a lot of code that could be deployed immediately after merging. Dozens of extensions could be like that. Just looking at Special:Version: CodeEditor, WikiEditor, Score, SyntaxHighlight, WikiHiero, TimedMediaHandler, WikimediaMessages, and many others. Some of them are more complex than HotCat, but not by orders of magnitude. I can even imagine some parts of more complex extensions, such as VisualEditor, ContentTranslation, or Echo, that could be deployed more quickly.
So, my particular proposals:
  • Increase the deployment train frequency from once a week to once a day or even more often.
  • Change some repositories' configuration so that their code would be deployed immediately after merging, without waiting for train or backport window.
  • Give merge rights to many more people who are trusted editors on Wikimedia projects, at least on some repositories.
  • Allow self-merging in some repos (both technically and socially).
Most of these things will require some development work, but for all of them, someone first needs to decide that they are desirable. I mean, someone decided about eight years ago that code will be deployed every week; is that decision still good in 2020? Who can decide that it should be more frequent?
And most of these things can probably be done with Gerrit, without waiting for the migration to GitLab. As I mentioned already, I totally realize that these thoughts are not really about the migration from Gerrit to GitLab, but about something else. They are about rules and power structures. However, I am bringing them up here because I want to draw everyone's attention to the notion that, by itself, migration to GitLab will at most resolve some technical issues that the CI people have, and it won't resolve the bigger issues that the Wikimedia editors' community has with the way we manage and deploy source code. Amir E. Aharoni {{🌎🌍🌏}} 13:18, 23 September 2020 (UTC)Reply
Editing code in GitLab via the web interface is actually a very pleasant experience. There is a "web IDE" (a single-page web app with an IDE-like arrangement, file browser, file tabs, edit + diff mode - [1], [2]) which is not all that dissimilar from the wiki editing experience, other than you can edit multiple files at the same time. The only bad impression I had is that creating your own fork is slow for large repos (cloning MediaWiki core took 5+ min, which makes me wonder whether there will be a resource problem when thousands of people do it), otherwise it was great. (Gerrit also has a web editor, but it's anything but great.) If we wanted to engage wiki editors more, or even just for low-effort projects like Google Code-in, it would be quite useful.
(IMO the reason gadgets are not edited much is the lack of any kind of sane testing and review workflow. Providing that workflow on GitLab seems viable to me and preferable to the current on-wiki workflow. There has been some discussion on that in T187749.) Tgr (WMF) (talk) 07:13, 1 October 2020 (UTC)Reply
Hi Amire. The [https://wikimania2011.wikimedia.org/wiki/Submissions/The_Site_Architecture_You_Can_Edit The Site Architecture You Can Edit] presentation has slides published on wikitech: https://upload.wikimedia.org/wikipedia/labs/f/f0/The_Site_Architecture_You_Can_Edit.pdf
What Ryan Lane presented was to open up the production configuration management to the world. That resulted in:
  • publishing the Puppet files (until that they were only on the puppet master)
  • OpenStack being configured through MediaWiki (the OpenStackManager extension)
  • ultimately started the deployment-prep project, which at the start was entirely volunteer driven ( https://en.wikipedia.beta.wmflabs.org/ )
Gerrit was really just a little brick in the proposal, and the intent was indeed to have the same account as used on Wikitech. So in that regard, the wiki account is the same. Note that all of that came even before we unified accounts across the Wikipedia projects (which was done in 2015); before that, you had one account per wiki project!
The perspective is that if you want to act on the infrastructure or code, the portal is https://wikitech.wikimedia.org/ which brings you access to the WMCS infrastructure and Gerrit with the same login and ssh keys. So that is at least consistent on that little island. Whether to unify with the SUL wiki account is indeed an entirely different issue and using Gitlab or whatever is not going to solve it.
The code review delay is not going to be solved by any amount of tooling. It all boils down to social responsibility, clear ownership of code, and making code review an actual priority. Even inside a Wikimedia team it can be troublesome to get code reviewed, and that is with peers we interact with on a daily basis.
So essentially I agree: changing the tooling is not going to magically fix the code review issue. This consultation is merely about filling technical gaps in the current tooling. Antoine "hashar" Musso (talk) 07:39, 1 October 2020 (UTC)Reply

Workflow documentation / tutorial

[edit]

Based partly on docs from other projects (e.g. KDE) and on discussion here, we've started roughing out some documentation at GitLab consultation/Workflows. Aiming for something similar to Gerrit/Tutorial with more workflow guidance. It needs much more work, but figured I should link what's there now. BBearnes (WMF) (talk) 21:38, 25 September 2020 (UTC)Reply

Patches on Gerrit

[edit]

I have another question, in case it is decided to switch from Gerrit to GitLab... What will we do with the patches on Gerrit which aren't merged? Kizule (talk) 22:21, 26 September 2020 (UTC)Reply

First, there would be a period of time where Gerrit would still be online but GitLab would be the canonical location for a repository. So any active developer will have the chance to migrate their open changes to GitLab if they haven't already done so.
We would also plan to provide a read-only archive of Gerrit to prevent the breakage of URLs (see the "Breaking Links?" topic on this). This would also preserve the open changes, so for developers who are no longer active (or only active on a yearly+ basis) we can resurrect their changes as needed. Greg (WMF) (talk) 03:06, 27 September 2020 (UTC)Reply

Feature requests

[edit]
This is a summary of Talk:Wikimedia Release Engineering Team/GitLab#h-Feature_requests-2020-07-01T15:48:00.000Z, which has been linked to in this discussion, but which I would like to be formally part of the consultation:
"Here are some features of Gerrit our team use regularly that we would very much like to see in any future system:
  • Being able to easily create a stack of dependent commits, each one representing a single commit in the final tree, and being able to modify those commits, and review changes between those modifications.
  • Being able to easily re-order that stack using an interactive rebase
  • Creating cross-repo dependencies using the "Depends-On" tag or similar."
Some of the +1's:
  • EBernhardson (WMF): "I extensively use the ability to create a stack of commits that are separately reviewed, and re-ordering that stack is not so uncommon. My experience attempting this sort of workflow in GitHub (no GitLab experience, but it seems pretty similar) was pretty bad"
  • GLavagetto (WMF): "SRE tend to break down important changes in subsequent patchsets to better control application of a change (so for instance - first on one server, then on the canary pool, then everywhere). If we lose the ability to do so with ease, and to rebase/resubmit those stacks of changes, our workflow will need to change radically compared to what we have today."
  • ArielGlenn: "when I'm looking back through puppet or MediaWiki core or other repos, small commits are easier to eyeball, search through the commit history, etc to find the source of a specific change or narrow down when a particular behavior changed. And especially in the case of commit messages, I look at the related patchset to be sure the code does what the change says, or to understand what is being said; that's much more onerous with a thousand line squashed commit."
And these follow ups:
  • Roan Kattouw (WMF) "This means any solution that addresses the "stack of multiple commits" use case also needs to address the "amendments in response to code review" use case. You also need to be able to rebase a change onto an updated version of the change it depends on. I suppose it's possible that some of these things are already supported, and support can be added for the others, but I'm somewhat skeptical that this will work as well as it does in Gerrit."
  • Roan Kattouw (WMF) "whether you end up merging one big change or several smaller ones also impacts what the history looks like. That's not just a cosmetic thing, it has real-life impacts on how useful tools like blame, bisect and revert are." ESanders (WMF) (talk) 14:03, 28 September 2020 (UTC)Reply
  • Being able to easily create a stack of dependent commits, each one representing a single commit in the final tree, and being able to modify those commits, and review changes between those modifications.
  • Being able to easily re-order that stack using an interactive rebase
Merge requests create a stack of independent commits, you can do git rebase -i locally to reorder them, if you don't squash on merge then each can be a commit in the final tree, you can also force push a merge request branch and still be able to view the modifications between revisions as in a quick example I made that has 1 commit, but 3 revisions.
  • Creating cross-repo dependencies using the "Depends-On" tag or similar.
Is this to prevent accidentally merging dependent patchsets? To group patchsets? My main use-case is to prevent accidental merges. I find Gerrit's UI for this confusing. TCipriani (WMF) (talk) 16:01, 28 September 2020 (UTC)Reply
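The local reorder workflow described above can be sketched as follows. This is a generic illustration with a throwaway repository and made-up commits, not a Wikimedia-specific recipe; `GIT_SEQUENCE_EDITOR` stands in for the interactive editor so the same reorder you would do by hand in `git rebase -i` becomes scriptable:

```shell
# Create a throwaway repo with three commits (hypothetical content).
set -e
git init -q demo && cd demo
git config user.email dev@example.org
git config user.name Dev
for n in 1 2 3; do
  echo "$n" > "file$n.txt"
  git add "file$n.txt"
  git commit -qm "Commit $n"
done

# A tiny non-interactive "editor" that swaps the second and third
# pick lines in the rebase todo file -- the same edit you would make
# by hand when reordering the stack in `git rebase -i`.
cat > reorder.sh <<'EOF'
#!/bin/sh
tmp=$(mktemp)
sed -n 1p "$1" > "$tmp"
sed -n 3p "$1" >> "$tmp"
sed -n 2p "$1" >> "$tmp"
mv "$tmp" "$1"
EOF
chmod +x reorder.sh

# Reorder the stack; each commit stays a separate commit in the tree.
GIT_SEQUENCE_EDITOR=./reorder.sh git rebase -i --root
git log --format=%s
# History is now (newest first): Commit 2, Commit 3, Commit 1
```

On a merge request branch you would follow this with `git push --force-with-lease`, and, as noted above, GitLab keeps the previous revisions of the branch viewable for comparison.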
"Merge requests create a stack of independent commits, you can do git rebase -i locally to reorder them"
Two things:
  • This only works within a single pull request, you can't easily:
    • Rebase onto commits in another pull request (often authored by someone else in our use cases, e.g. developer A will write the API, and developer B will implement the frontend, dependent on developer A's patch)
    • Merge a sub-section of the commits. I will often separate a stack and put it in order of how easy it is to merge, e.g. "Doc fixes" > "Minor refactor" > "Change API slightly" > "Implement X feature". The first three can be merged, tested, and QA'd separately, and probably sooner than the more complex last commit.
  • As mentioned elsewhere, the tools in the web UI are lacking for:
> Is this to prevent accidentally merging dependent patchsets? To group patchsets? My main use-case is to prevent accidental merges. I find Gerrit's UI for this confusing.
Something that happens fairly often for Growth team:
  • We are building a user-facing feature that the user interacts with on the mobile site. Sometimes, that requires us to make a patch to Extension:MobileFrontend to add new functionality, another patch to Skin:Minerva Neue to load that functionality in the front-end skin code, and then a third patch in Extension:GrowthExperiments that needs the code from MobileFrontend and Minerva to work. The code reviewer might +2 the GrowthExperiments patch but need changes on the MobileFrontend one or the MinervaNeue one.
  • More straightforward: adding some functionality to core (for example a convenience method to the MediaWikiTestingTrait) and then using that method in new tests added to a GrowthExperiments patch. And then that GrowthExperiments patch might be the first refactoring patch done in advance of adding new functionality (so those two patches are stacked, and only the first explicitly depends on the patch added to core) KHarlan (WMF) (talk) 19:24, 29 September 2020 (UTC)Reply
Same here. We also regularly use Depends-On for dependent patches between VisualEditor and various extensions that implement VE modules (Cite, Citoid, Math, Syntax_Highlight, etc.), also other extensions that implement VE surfaces (DiscussionTools, MobileFrontend, ContentTranslation), and also sometimes upstream in mediawiki-core or skins. ESanders (WMF) (talk) 13:40, 30 September 2020 (UTC)Reply
@Arlolra, @Cscott, and I often have this workflow that Ed laid out.
Ex: https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/628910 and the chain it is part of. The individual patches in that chain have seen many changes, and I ended up re-ordering patches, squashing two of them (causing me to abandon this patch), and later on adding a new patch into that chain. At that point, one of the patches got merged after I moved it to the head of the chain. So compare that dependency chain with https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/628908/4 which reflects the status as of Friday evening.
I personally don't end up with cross-repo dependencies via Depends-On tag but I know Scott does rely on that feature a lot more (as he works across Core, Parsoid, and various extension repos, sometimes CI repos). SSastry (WMF) (talk) 14:49, 28 September 2020 (UTC)Reply
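For readers unfamiliar with the cross-repo dependency mechanism discussed in this thread: it is expressed as a commit-message footer that the CI system (Zuul) reads, referencing the Change-Id of the change being depended on. A hypothetical example (the subject, bug number, and Change-Ids below are all made up for illustration):

```
Add VE module for the Foo extension

The frontend half of the feature; the API half lives in the
MobileFrontend change referenced below.

Bug: T12345
Depends-On: I7c8e3f2a9b1d4e5f6a7b8c9d0e1f2a3b4c5d6e7f
Change-Id: I0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b
```

Any GitLab replacement would need an equivalent way to link merge requests across repositories so that CI can test them together and block accidental merges.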

Voting on Gitlab

[edit]

Voting is part of almost every project's workflow. We have the "Verified" votes, where a +2 might have a hook to automatically merge a change, and we have the code-review votes, where a -2 and a -1 have a special meaning and can prevent a merge. Are there ideas on how we can replicate that behaviour in Gitlab CE? After today's workgroup meeting, so far there are two features that are similar, in a way, but still do not provide that exact functionality:

  • Resolve all discussions before merging
  • Thumbs up, thumbs down, although those do not seem to be binding when it comes to merging

Something similar is available in Gitlab CE though: https://docs.gitlab.com/ee/user/project/merge_requests/merge_request_approvals.html. I think it is worth looking into it. Effie Mouzeli (WMF) (talk) 17:21, 28 September 2020 (UTC)Reply

In Gitlab core (the open source version), users with the developer permission (or greater) approve the merge requests. The permission also grants access to a wide range of actions (such as force pushing, creating a release, changing tags, ...). So it is a bit of an all-or-nothing separation; at least I haven't seen a way to create custom groups with different permissions.
The idea is thus that people would review the merge requests using the discussions and thumbs-up emoji. Eventually it reaches a good state and the merge request can be assigned to one of the developers, who will ultimately approve it. I am not sure how that behaves when a merge request is updated; I guess a developer would have to look again at the comments to figure out whether it is ready to go.
The approvals are available in Gitlab Starter, which is proprietary (although the source is readable) and subject to a license. That rules them out.
At least the next Gitlab core version will have an approve button available to developers to mark their approval without having it merged. For operations/puppet, that would be the equivalent of voting code-review+2 which unlocks the right to vote verified +2 and ultimately submit the change. https://gitlab.com/gitlab-org/gitlab/-/issues/27426
The required approvals will NOT be backported to Gitlab core, its primary audience is a single user for which it probably does not make much sense to require a self approval. https://gitlab.com/gitlab-org/gitlab/-/issues/20696#note_335594095
Additional references:
(edit: because I missed linking to the Gitlab issues referring to approvals and managed approvals) Antoine "hashar" Musso (talk) 20:20, 28 September 2020 (UTC)Reply

Notes from Product Analytics

[edit]

Hello, I'm writing on behalf of Product Analytics. From our discussion:

  • THE biggest differentiator from Gerrit for us is GitHub's ability to render Jupyter notebooks (example); GitLab can do this and we just want to make sure that this feature is enabled (and maybe coupled with an internally-hosted nbviewer service for the actual rendering).
  • We frequently need to read and search code, and Gerrit has extremely poor support for this. Many of us use GitHub to search the mirrored repositories.
  • We have generally chosen to use GitHub for our code/analysis repositories since we find it much easier to use, and creating repositories is much easier (since we can do it ourselves without requesting).
  • Conversations on Gerrit can be difficult to navigate since comments are tied to specific patchsets, so there may be an active discussion happening about something in patchset 3 while the patch is already on patchset 9. If CR in GitLab is similar to GitHub (in terms of how comments/conversations happen and are displayed), that is nice.
  • In the past we've used GitHub Pages for sharing reports. For example, when generating an HTML document from the R Markdown source document where the analysis is done, it's easy to enable GH Pages to have the "rendered" version of the report available via a URL (example); GitLab appears to also have this feature and we'd like it available if possible.

From my own perspective, as the author & maintainer of several R packages the team uses in our workflows, GitLab's support for CI for R packages (more info) is very appealing. There have been efforts made in the past (T153856), but modern CI tools (especially with the availability of the r-base Docker image) will make it possible for us to have proper CI (which I have on my personal R packages on GitHub). MPopov (WMF) (talk) 21:21, 28 September 2020 (UTC)Reply
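As a rough illustration of what that could look like, here is a minimal `.gitlab-ci.yml` sketch for checking an R package with the `r-base` Docker image. This is an assumption about how such a pipeline might be configured, not an existing Wikimedia setup:

```yaml
# Hypothetical CI sketch: run R CMD check on every push.
image: r-base:latest

check:
  script:
    # Install package dependencies declared in DESCRIPTION.
    - R -e 'options(repos = "https://cloud.r-project.org"); install.packages("remotes"); remotes::install_deps(dependencies = TRUE)'
    # Build the source tarball and run the standard package checks.
    - R CMD build .
    - R CMD check --no-manual --as-cran *.tar.gz
```

A real configuration would likely also cache the installed packages and pin the image version, but the point is that this is a small self-service file in the repository rather than a centrally managed Jenkins job.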

Since Gerrit 3 we have a Comment Threads tab which is fairly similar to how conversations are displayed in Github.
The consultation page says In addition [to issue tracking] we would turn off repository wikis, GitLab Pages, and other features overlapping with currently provided tooling. (which I find a bit confusing: sure, we have a - probably superior - existing alternative for issue tracking and wikis, but what's the currently provided tooling for GitLab Pages-like functionality? people.wikimedia.org is only available to a few people, and using Toolforge for this purpose would have a ridiculous level of overhead. Doc page generation via CI, maybe? It's not quite the same thing - you can use Pages to generate a webpage from your repo code, but also in a number of other ways. And in any case, doc generation via CI seems even more arcane and complex to set up than Toolforge.) Tgr (WMF) (talk) 09:15, 1 October 2020 (UTC)Reply
Re: Pages:

which I find a bit confusing: sure, we have a - probably superior - existing alternative for issue tracking and wikis, but what's the currently provided tooling for GitLab Pages-like functionality? people.wikimedia.org is only awailable to a few people and using Toolforge for this purpose would have a ridiculous level of overhead

FWIW, I don't think we in the consultation WG have analyzed that particular aspect of things deeply. If there's a strongly felt use case for a Pages-like feature, then I think that's probably a reasonable discussion to have. We've called out wikis and issue tracking explicitly to prevent fragmentation in those domains, and I don't have a strong feeling as to whether Pages presents a similar risk. Would be curious what others think. BBearnes (WMF) (talk) 16:45, 1 October 2020 (UTC)Reply
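For context on the mechanics: GitLab Pages is not a separate service from the repository's point of view; it is just a CI job named `pages` that publishes a `public/` directory as an artifact. A minimal sketch (the `docs/` directory here is a hypothetical example of pre-rendered content):

```yaml
# Hypothetical .gitlab-ci.yml fragment: publish static files via GitLab Pages.
pages:
  script:
    # Anything could generate the site here; this simply copies
    # pre-rendered files from a docs/ directory into public/.
    - mkdir -p public
    - cp -r docs/* public/
  artifacts:
    paths:
      - public
  only:
    - master
```

So enabling or disabling Pages is largely a question of whether the Pages serving infrastructure is deployed and whether `pages` jobs are permitted, rather than a per-repository feature toggle.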
@MPopov (WMF) said above:
We have generally chosen to use GitHub for our code/analysis repositories since we find it much easier to use, and creating repositories is much easier (since we can do it ourselves without requesting).
To expand on that, it's not just that we have the rights to create GitHub repositories in the wikimedia and wikimedia-research organizations. It's also that we can create repositories under personal GitHub accounts and later move them effortlessly to the main organization.
For example, I originally created wmfdata-python to streamline my personal analysis workflows, so I naturally stored it in my personal GitHub namespace. Over time, others on my team and, later, researchers on other teams started using it too. Eventually, we decided we should move it to a more official location. With GitHub's move-repo feature, it literally took one minute to accomplish this, and the automatic redirection (for both web and Git access) made it completely seamless for users.
From what I understand, GitLab has these exact same abilities natively. Some comments here have pointed out that it would be theoretically possible to create user namespaces in Gerrit, which would be an improvement on the current situation, but as @BBearnes (WMF) said it would be "fighting the design of the system" and wouldn't be nearly as good as the GitLab/GitHub model. Neil Shah-Quinn (WMF) (talk) 08:55, 5 October 2020 (UTC)Reply
Also let me emphasize another point that Mikhail made:
THE biggest differentiator from Gerrit for us is GitHub's ability to render Jupyter notebooks (example); GitLab can do this and we just want to make sure that this feature is enabled (and maybe coupled with an internally-hosted nbviewer service for the actual rendering).
Jupyter notebooks have nearly become the standard format for data science (for example, GitHub's State of the Octoverse report says that their use on GitHub has grown more than 100% in each of the last three years).
Gerrit can only display Jupyter notebooks as long JSON blobs, but GitLab can show them in their rich, rendered format. This is a hugely important feature for us; if we switch to GitLab, we can start using it to host our analysis code, but if we stick with Gerrit, we will have no choice but to continue the fractured status quo ("production"/"library" code on Gerrit, analysis code on GitHub). Neil Shah-Quinn (WMF) (talk) 09:05, 5 October 2020 (UTC)Reply

Gerrit usability problems

[edit]

When I started working for the Wikimedia Foundation Gerrit was all new to me. I've grown to feel that the Gerrit workflow has a lot to recommend it; however, it took me a lot of daily Gerrit usage to understand how everything fits together.

Certain Gerrit behavior is still opaque to me. Until recently I hadn't reflected on what, in particular, is hard to use about Gerrit. Many of these things are configuration settings; some are how Gerrit itself works. This is my list.

Voting on Labels

[edit]

This is a UX problem. Explicit code review is a good thing. GitHub allows for three types of review: "Comment only", "Approve", or "Request changes", which is exactly what I want when reviewing code. The confusion about voting on labels is two-fold:

  1. What does each label (CR or Verified) mean?
  2. What does a vote for each label (+1, -1, +2, -2) mean?

You can break down each of these points to allow for multiple levels of misunderstanding: what does "Verified" mean in a semantic sense vs what happens if I set something to "Verified"?

Likewise, to unpack what +1, -1, +2, -2 means we have to separate what they do (i.e., CR+2 merges code) from their semantic meaning. CR+1 can mean anything from, "I generally approve of this change, but don't have the time or the permissions to merge it" to (the old joke) "Reviewer has working mouse".

Adding numbers to this situation only serves to confuse matters more. I cannot be the only person to have asked and answered, "how many +1s makes a +2?".

It's also confusing that the answer to, "how do I merge this?" is typically to vote CR+2; however, it is likewise true that something that has CR+2 isn't necessarily merged.

All this is to say that the basis of code review in Gerrit is confusing. Additionally, the fact that voting on labels and commenting on tasks trigger opaque actions means that, as a new user, I was scared to touch anything.

Repo discoverability

[edit]

There is a list of all the repos at https://gerrit.wikimedia.org/r/plugins/gitiles/. There is, as far as I'm aware, no way to get to this page from the Gerrit homepage. You can also see a paginated list two clicks off the Gerrit homepage if you look under "Browse" > "Repos". This list is mostly useless. Which of these repos are active/have any recent activity? There are projects that are explicitly container projects here; they appear to be mostly empty repositories (https://gerrit.wikimedia.org/r/plugins/gitiles/analytics/ for example). This list is not useful unless you know what you're here to find.

It's noteworthy that the information above is only a few clicks away, assuming that the word "gitiles" means anything to you. Doing my best to mimic a person who doesn't know the word "gitiles", I end up going down the route: Home > Browse > Repositories > repository name > Branches > ... — I was wrong — you just have to know that "gitiles" means "browse repositories", since it's the only clickable word (this is configurable, IIRC).

What repositories are my team responsible for? What repositories do people on my team contribute to a lot? What repositories get contributed to a lot generally? These are findable in Gerrit, but discoverable in other systems. Unless a newcomer really thinks about the questions that a typical GitLab or GitHub dashboard answers, they won't know what questions to ask.

URLs

[edit]

Gerrit URLs are bad. Sometimes there's an /r or a /p or an /a or a /c or some other weird fiddly detail to remember in a URL that people ignore.

Committing useful Gerrit URLs to memory or guessing the right URL is difficult in Gerrit.

https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core vs https://github.com/wikimedia/mediawiki

To be fair, the above Gerrit URL could be shortened to https://gerrit.wikimedia.org/g/mediawiki/core but as soon as you click on something it will expand back to https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core

Also, that doesn't actually link to code, that links to the README and a list of commits and branches. A more equivalent link to the GitHub link is https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+/refs/heads/master

A pro-tip for Gerrit is that each change is globally unique, so that https://gerrit.wikimedia.org/r/630637 and https://gerrit.wikimedia.org/r/c/mediawiki/core/+/630637 are actually the same change. Going to https://gerrit.wikimedia.org/630637 (vs https://gerrit.wikimedia.org/r/630637 [/me squints]) is, of course, the generic Apache 404 page without a path to the home page.

Repository creation

[edit]

The repository creation right is reserved in our Gerrit for Administrators or Gerrit Managers. Expanding this has proven difficult. There are cases where Gerrit Managers need Gerrit Administrators to change things.

There is no user namespace where people can develop ideas without first having to figure out where they live in the hierarchy of existing projects and what permissions they should inherit.

Sign in

[edit]

When you sign up for a developer account on Wikitech you provide 3 potential usernames: Wikitech username, shell name, and email. Which of these do I use to log in?

The answer here is that you log in with your Wikitech username, you SSH in with your shell username, and your email is forever associated with your Gerrit account. The Gerrit interface will let you change your email, but doing so shows an error — yet applies the change anyway.

I, personally, had a hard time remembering which username was which and which one to use where.

Merging

[edit]

In order to merge in many repos you click CR+2. In some repos you click CR+2 and V+2 and wait for the Submit button to show up and click Submit. If you mistake a repo of the first type for a repo of the second type you've broken CI.

Local branches

[edit]

Your local repository state sometimes matters in Gerrit and sometimes doesn't. Gerrit doesn't care about your local branches at all, but it cares deeply about your local commits. The fact that the local state of your git repository only matters in certain circumstances is confusing to newcomers.

If you have a local feature branch with two commits, running git review -R creates two patchsets and your local branch is immaterial. At first this seems like not at all what you meant to do: those two patches are related! Gerrit takes care of that for you and no one can merge one without the other. Unless they *really* want to. This relation is shown in the UI in an order I can never remember (starting at the top or bottom? Dunno) alongside other possibly (un)related things like code submitted with the same topic -- for example, if you've helpfully selected "docs" as the topic of this code you end up listing a number of changes in the same area as your related patchset.

What if those two changes weren't on a feature branch and were in fact orthogonal patchsets that can be merged independently? Then your local repository must reflect that.

The fact that Gerrit cares about commits but doesn't care about branches makes sense if you understand that Git is a DAG, but should you have to understand that to intuit about submitting doc changes?

Also, since the local state of your code doesn't match the state of your changes on Gerrit: how do you update your existing changes on Gerrit? Do you need to force push? To what URL? How does Gerrit know to update your existing change vs creating a new change, since the commit SHA-1 changed? When you push a merge commit for review: what happens?

Change-Ids

[edit]

You have to run a shell script hook on your local machine to generate a unique ID for a change you want to submit to Gerrit. This is how Gerrit knows how to update your existing changes vs creating new changes.

Reverts don't work with the hook install instructions provided by our Gerrit. To revert, you run git revert followed by git commit --amend --no-edit; otherwise you still won't be able to push, even though you know you've got the hook installed correctly. This hits all new deployers at some point.
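A throwaway repository can demonstrate that revert-then-amend flow end-to-end. The stub commit-msg hook below merely stands in for the real Gerrit-provided hook (which appends a proper Change-Id footer), and the final git review push is omitted since no actual Gerrit is involved; all names and IDs here are made up:

```shell
# Demo of the revert-then-amend flow in a throwaway repo. The stub
# commit-msg hook stands in for the Gerrit-provided hook that appends
# a Change-Id footer.
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
git config user.email you@example.org && git config user.name you
printf '#!/bin/sh\ngrep -q "^Change-Id:" "$1" || echo "Change-Id: I0123456789abcdef" >> "$1"\n' > .git/hooks/commit-msg
chmod +x .git/hooks/commit-msg
echo base > f && git add f && git commit -q -m "initial commit"
echo bad >> f && git add f && git commit -q -m "bad change"
git revert --no-edit HEAD       # depending on git version, the revert may skip the hook
git commit --amend --no-edit    # amending re-runs the commit-msg hook
git log -1 --format=%B          # the message now carries a Change-Id footer
```

After the amend, the revert commit is guaranteed to have a Change-Id footer regardless of whether the revert itself ran the hook, so the push to Gerrit is accepted.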

Change-Ids are linkable, globally unique IDs. Patchset URLs contain a number which is also a linkable, globally unique ID. Likewise the SHA1 of your change is a linkable ID that is most likely globally unique. Gerrit has all of these and they are all different links to the same patchset.

To be fair, it's possible that Change-Ids will prove genius when git repos move to SHA256 hashes. It's also possible/likely that we'll just have yet another globally unique ID in Gerrit.


I haven't thought about a lot of these issues in a while. GitLab probably has its own laundry list of issues and isn't a panacea. This post was meant as a response to all of the questions about what, precisely, are Gerrit's usability issues: these are the ones I've run into while learning to use the system and helping others do likewise -- there are undoubtedly more that I didn't cover here.

</rant> TCipriani (WMF) (talk) 22:14, 28 September 2020 (UTC)Reply

Inability to edit or delete comments is also a problem: first in that it's not obvious you can't edit your comment before you press "Send", and second in that once you do realize it's impossible to edit/delete, commenting becomes a more intimidating thing to do (since your message is there "forever" to highlight your lack of insight into what is happening in some piece of code, for example). KHarlan (WMF) (talk) 16:07, 30 September 2020 (UTC)Reply
Wrt labels, they have some seemingly arbitrary behaviors which are relevant to their purpose - for example, C-2 is copied to new patchsets but C-1 or V+2 is not; C+1 is copied to new rebases but V+1 is not. In general, yeah, labels are one of the bigger Gerrit UX fails.
Wrt repo discoverability, yeah it sucks, but is it better in GitLab? On https://gitlab-test.wmcloud.org/explore I can't even filter out other people's forks.

Change-Ids are linkable, globally unique IDs. Patchset URLs contain a number which is also a linkable, globally unique ID. Likewise the SHA1 of your change is a linkable ID that is most likely globally unique. Gerrit has all of these and they are all different links to the same patchset.

Not quite. SHA-1 IDs link to the specific commit (patchset). Changeset IDs link to a changeset (collection of patchsets). Change-Ids link to all versions of that patchset in all branches. Tgr (WMF) (talk) 11:01, 1 October 2020 (UTC)Reply
> Wrt labels, they have some seemingly arbitrary behaviors which are relevant to their purpose - for example, C-2 is copied to new patchsets but C-1 or V+2 is not; C+1 is copied to new rebases but V+1 is not. In general, yeah, labels are one of the bigger Gerrit UX fails.
That was done on purpose (T43074 / All-Projects #124891) and the lengthy explanation is at https://lists.wikimedia.org/pipermail/wikitech-l/2014-April/075918.html
The uniqueness of a Gerrit change is determined by the triplet: (repository, branch, Change-Id). Different changes can thus have the same Change-Id; that is typically the case when cherry-picking a change to a different branch. The unique thing is the change number. Antoine "hashar" Musso (talk) 16:44, 1 October 2020 (UTC)Reply

That was done on purpose (T43074 / All-Projects #124891) and the lengthy explanation is at https://lists.wikimedia.org/pipermail/wikitech-l/2014-April/075918.html

Sure. My point is, it adds to the mysteriousness of the code review interface, and thus to the learning curve. Tgr (WMF) (talk) 18:22, 1 October 2020 (UTC)Reply

Cross-repository search / browsing

[edit]

I've read a few comments that bring up the purported strengths of GitLab's cross-repository browsing and search. Does cross-repository search work, though? If I search for fast_finish: true, which shows up in MediaWiki core's .travis.yml file, the search doesn't find that file unless I am searching specifically in the MediaWiki core repo (https://gitlab-test.wmcloud.org/search?utf8=%E2%9C%93&snippets=false&scope=&repository_ref=master&search=+fast_finish%3A+true&group_id=37&project_id=21). If I search across all groups/projects, the result isn't found: https://gitlab-test.wmcloud.org/search?utf8=%E2%9C%93&snippets=false&scope=&repository_ref=master&search=+fast_finish%3A+true Maybe I'm missing something?

This appears to work differently from GitHub, where cross-repository search does tend to work quite well. KHarlan (WMF) (talk) 12:57, 29 September 2020 (UTC)Reply

Search through multiple repositories seems to not be available in GitLab Starter? That needs the proprietary version, which comes with an Elasticsearch-backed search: https://gitlab.com/gitlab-org/gitlab-foss/-/issues/14597 . Their doc about integrating Elasticsearch: https://docs.gitlab.com/ee/integration/elasticsearch.html
That being said, we have https://codesearch.wmcloud.org/ to process any git repository. Antoine "hashar" Musso (talk) 13:37, 29 September 2020 (UTC)Reply
@Hashar thanks, I feel like this is important to highlight then, as several people have mentioned this feature as an argument in favor of moving to GitLab. (As an aside, it also seems like https://codesearch.wmcloud.org/search/ doesn't have enough prominence in onboarding / documentation for developers.) KHarlan (WMF) (talk) 13:40, 29 September 2020 (UTC)Reply
I have further edited my comment to mention codesearch. I had replied too fast. Antoine "hashar" Musso (talk) 13:42, 29 September 2020 (UTC)Reply
An outcome of this discussion is that Code Search is now linked from Gerrit top menu: Browse > Code Search. Antoine "hashar" Musso (talk) 16:35, 1 October 2020 (UTC)Reply

The "what am I looking at" test for GitLab

[edit]

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


First, I'm relatively new to GitLab, and I also accept that for powerful tools, you often need to spend time to get acquainted before you can use them well.
That said, I'm not sure the GitLab merge request UX wins over Gerrit's patchset view for clarity on what is happening in a given change.
Looking at https://gitlab-test.wmcloud.org/mediawiki/core/-/merge_requests/5, some questions as a newbie:
  • Overview (12) -- what is the 12 counter indicating?
  • Commits (23) -- there is discussion in another topic about what our workflow would be like, so maybe this is an open question, but as a reviewer I'm not sure whether this means I'm squashing 23 commits or there are 23 separate commits to review. When I click on the tab, it looks like there's really only one commit to look at from this merge request (https://gitlab-test.wmcloud.org/mediawiki/core/-/merge_requests/5/diffs?commit_id=d9bcde7b543fcdd3222137d4cb84ac3a7914a137)
  • Pipelines (3) -- what does this mean? Are there three different jobs that are running for this merge request? When I click on the tab, it looks like it's a single job that ran three times and failed each time?
  • Changes (5) - this is always the number of touched files?
  • 2 unresolved threads -- when I press the button to navigate, it's really unclear to me what is resolved and unresolved, especially as there is a UX component indicating that all threads were resolved 1 week ago?
  • Deletes source branch -- that seemed a little unnerving but hovering over the ? mark made it clear that nothing bad would happen with this.
  • "Ask someone with write access to this repository to merge this request" -- would be cool if we were able to modify this somehow to provide a list of usernames who could be pinged per repo; otherwise this seems like a dead end, similar to Gerrit patches where new contributors have to figure out on their own who can review their code.
  • As a non-member of the MediaWiki/Core group on gitlab-test (e.g. the equivalent of a Gerrit reviewer without +2), is my vote a thumbs up emoji?
I am not going to defend Gerrit's UX (see this list :) ), but I am uncertain whether the Merge Request UX in GitLab is easier to use or more intuitive to grok. Some questions/suggestions:
  • Is there a way to switch off features that clutter up the chrome and cause more cognitive load, like "Milestone", "Time tracking"?
  • Similarly, for the sidebar, can we remove things like "Operations", "Packages and Registries", "Analytics" all of which detract focus from the code review process and don't seem like things we'd be using? KHarlan (WMF) (talk) 13:26, 29 September 2020 (UTC)Reply

Is there a way to switch off features that clutter up the chrome and cause more cognitive load, like "Milestone", "Time tracking"?

Good questions. I'll investigate. BBearnes (WMF) (talk) 17:35, 29 September 2020 (UTC)Reply
I suspect that merge request is a bad place to start… I don’t know what happened to it since I left it, but it seems to have at least a backdated comment from Greg (displays as “1 week ago” but on hover reveals “Sep 17, 2020”, whereas I only opened the merge request on “Sep 21, 2020”) and also these 22 mystery commits. My best guess is that too many people have been trying out different things at the same time…?
As for the pipelines, I think one “pipeline” equals one “build”, and may contain multiple “jobs” (that term, at least, seems to be the same). In this setup, each pipeline only contains one job, but a more complex example can be seen at e. g. https://gitlab.gnome.org/lucaswerkmeister/gnome-shell/-/pipelines/215178, which is a pipeline with a total of 8 jobs distributed between three… “stages”, apparently. Lucas Werkmeister (WMDE) (talk) 15:19, 29 September 2020 (UTC)Reply

I suspect that merge request is a bad place to start… I don’t know what happened to it since I left it, but it seems to have at least a backdated comment from Greg (displays as “1 week ago” but on hover reveals “Sep 17, 2020”, whereas I only opened the merge request on “Sep 21, 2020”) and also these 22 mystery commits. My best guess is that too many people have been trying out different things at the same time…?

Ah, yeah, there's some weirdness here that is my fault. I force-pushed to mediawiki/core somewhat recently to do a reset there, which would leave any forks from its pre-updated state out of sync. This serves to highlight the axiom that one generally shouldn't rewrite shared history, and in retrospect was a bad idea while people were already actively testing, but it's also not a situation that should arise in real-world use. BBearnes (WMF) (talk) 17:34, 29 September 2020 (UTC)Reply
Ah, thanks… I assume merge requests don’t count as shared history here, and those can still be force-pushed if necessary? (Only while they’re not merged, of course.) Lucas Werkmeister (WMDE) (talk) 17:45, 29 September 2020 (UTC)Reply
Yeah, I think that's a reasonable understanding of it. BBearnes (WMF) (talk) 17:47, 29 September 2020 (UTC)Reply
It seems like foreshadowing that even in such a simple trial we're already running into inexplicable behavior and bugs.
In the "Changes" tab I can select between which "versions" (???) of a Merge Request I want to see a diff. The latest version, "version 3", refers to commit d9bcde7b and contains 1 commit. But there is also "latest version" that also refers to commit d9bcde7b but contains 23 commits. What?
https://imgur.com/a/3UQ3iKy Michael Große (WMDE) (talk) 18:36, 29 September 2020 (UTC)Reply
This behavior is entirely explicable in terms of the target branch having changed in the interim, and although undeniably confusing, does not appear to surface any bugs. Because I force-pushed to master on mediawiki/core after the merge request was submitted, it no longer contains the extra experimental commits upon which d9bcde7b was based.
It would be nice if the interface made it clearer what's going on, but in practice the primary branch on a repository is protected by default, "don't force-push to shared branches" is generally standard policy, and thus I doubt it's come up enough to receive much development attention. BBearnes (WMF) (talk) 18:55, 29 September 2020 (UTC)Reply
OK, I think I should try the "What am I looking at test" on a more correct merge request that doesn't have a force push involved. KHarlan (WMF) (talk) 19:17, 29 September 2020 (UTC)Reply
The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.

Code review experience on a mobile device

[edit]

I have positive first impressions from interacting with GitLab via a mobile device, much better than the status quo with Gerrit (at least on iOS where it seems like there are various bugs with the commenting interface). Commenting and voting on Gerrit via iOS Safari is a pain.

I have no idea how much overall impact a more mobile-friendly code review UX (with GitLab) would provide, but it seems like a good consideration to keep in mind. KHarlan (WMF) (talk) 19:07, 29 September 2020 (UTC)Reply

More human UX: support for users avatars

[edit]

I appreciate that in GitLab there are avatars where you can see the face (or whatever image someone chooses) behind the screen name, which can help center participants around the idea that they are collaborating with other humans (so, be nice, constructive with criticism, etc). Also nice that you could hover over a user's name to see more about their affiliation, role, etc.

Sure, it's kind of a minor issue in comparison with branch workflows, dependent merge requests, and other topics discussed on this page, but the ability to humanize participants in code review would be a welcome addition to a space that can feel pretty intimidating and sharp, especially to newcomers. KHarlan (WMF) (talk) 19:13, 29 September 2020 (UTC)Reply

This is possible in gerrit, BTW, but is disabled because the default implementation violates our privacy policy; enabling a work-around for it just for gerrit was rejected in favour of doing so for OTRS and other Wikimedia services too. Jdforrester (WMF) (talk) 19:28, 29 September 2020 (UTC)Reply
To be clear, the initial idea for a workaround was rejected due to lack of a complete solution. The workaround was to let people store arbitrary image data in git alongside our repositories, which seemed like a fraught path aside from any UX issues.
Avatars are possible to have in Gerrit (see also Gerrit's upstream instance), but (as I understand things which is often subtly incorrect) we need to come up with the UX on our own. TCipriani (WMF) (talk) 20:29, 29 September 2020 (UTC)Reply
Well, GitLab will not really provide a "more human UX" here, since Gerrit already offers the same capability; it is just not enabled. If it were deemed essential for code review, it could be enabled in Gerrit too. – Ammarpad (talk) 06:36, 30 September 2020 (UTC)Reply
Support for avatars would be great. It is a bit concerning on the privacy side of things since we do not want to leak any private information to a third party when users visit our websites. So we would need our own service or at least a proxy.
> Also nice that you could hover over a user's name to see more about their affiliation, role, etc.
The devil is that we do not really have that information available. If there were some API to retrieve a user's "social profile", surely we could add a bit of JavaScript in Gerrit to display it when hovering over a username. Or we could use Facebook for SUL.. Hm.. No! Wait. Antoine "hashar" Musso (talk) 19:34, 1 October 2020 (UTC)Reply

Squash Merge considered harmful

[edit]

GitLab is substantially lacking compared to Gerrit in its central, most important feature: The Code Review Workflow.


GitLab's code review workflow is centered around its concept of reviewing merge requests, i.e., the entirety of usually multiple commits. That is to be contrasted with Gerrit's approach of reviewing patches individually.


This has consequences for how the review and merge of a chain of multiple atomic commits happens, e.g., two refactoring commits and one commit that does the behavior change. (Which is the by-the-book best-practice thing to do.)


In such a scenario, if any changes are needed then you have three options:

1. Create follow-up commits, and review all commits collectively, and have all commits survive the merge, leading to a very messy, fractured, red and basically useless git history or

2. Create follow-up commits, and review all commits collectively, and squash them all on merge, leading to one single huge commit that does a ton of things and is hard to revert or

3. Rewrite the git history in your merge branch for which GitLab has absolutely crappy UI support and still doesn't allow you to merge individual commits out of that chain


All of those options are bad, and we will see _all_ of them in practice in our repositories all the time. We'll see chains of commits in our history that are all broken individually (and without tests, because those are added in a follow-up) next to monster commits from feature branches squashed down into a single commit. We will _not_ be able to rely on master being green for every individual commit. All the while, some devs will get even more frustrated as they dutifully rewrite the history of their merge requests in a pointless effort to maintain a meaningful and useful git history.


I really hope the benefits for Ops and CI are truly awesome to be worth all this.


PS: Did I mention that this will lead to reverts being close to impossible, because behavior changes will be spread across multiple commits, since only the very last commit (i.e. only the MR as a whole) has to pass CI? Michael Große (WMDE) (talk) 19:27, 29 September 2020 (UTC)Reply

Thank you Michael. I completely agree with you. – Ammarpad (talk) 06:32, 30 September 2020 (UTC)Reply
To rephrase: the idea is that, given a merge request with 3 commits:
  • refactor 1
  • refactor 2
  • behavior changes
> 1. Create follow-up commits, and review all commits collectively, and have all commits survive the merge, leading to a very messy, fractured, red and basically useless git history
I would imagine that the three commits already got reviewed and one adds an adjustment commit on top of those. Let's name it 'fix typo in variable'. You could approve the merge request for merging and the history would look like:
$ git log --oneline --decorate --graph
* (master) Merge branch 'feature' into 'master'
|\
| * fix typo in variable
| * behavior changes
| * refactor 2
| * refactor 1
|/
* some base commit
And the pile of commits can be filtered out by instructing git to only follow the first parent of the merge commit:
$ git log --oneline --decorate --graph --first-parent
* (master) Merge branch 'feature' into 'master'
* some base commit
And you then have to rely on the feature branch having a meaningful name, because the merge commit doesn't hold any further information. That is the famous rant by Linus Torvalds regarding pull requests: https://github.com/torvalds/linux/pull/17#issuecomment-5654674 . It is from 2012 though, and I guess GitHub did not have the squash feature yet.
This first case is fairly typical on GitHub: people just pile up their commits and push to their feature branch until the reviewer is pleased. You then get a single merge commit but still retain all the history of the iterations. When they are nice isolated commits, that is actually helpful: it is a proper feature branch. When they are a pile of meaningless commits ('fix stuff', 'oops', 'grr did not pass', 'fix spaces'), that is indeed a lot of noise.
That leads me to the second case: squash!
> 2. Create follow-up commits, and review all commits collectively, and squash them all on merge, leading to one single huge commit that does a ton of things and is hard to revert or
That depends. If the branch has been used to quickly iterate over rather well-isolated code, it might make sense to consolidate it into a single commit, and I guess that is what most newcomers would end up doing: git add / git commit / git add / git commit etc. It is a linear iteration.
If the branch mixes concerns, surely one would NOT want to squash it, for the reason you stated: a single huge non-atomic commit. In this case the commits would need to be rephrased / cleaned up, which can be done locally with git rebase -i to squash / beautify the commits making up the branch, then sent again to update the merge request. It is definitely doable on GitHub (force-push to your branch and that updates the pull request), and I would imagine it is supported by GitLab.
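As a concrete local sketch of that cleanup: squash the noisy commit with an interactive rebase, then force-push so the merge request is rebuilt from the new history. A plain bare repository stands in for the GitLab remote, GIT_SEQUENCE_EDITOR makes the rebase non-interactive for demo purposes, and all names are made up:

```shell
# Squash a noise commit into its parent, then force-push the branch.
tmp=$(mktemp -d) && cd "$tmp"
git init -q --bare origin.git                  # stand-in for the GitLab remote
git clone -q origin.git work && cd work
git config user.email you@example.org && git config user.name you
echo base > f && git add f && git commit -q -m "base" && git push -q origin HEAD:main
git checkout -q -b feature
echo v1 > f && git add f && git commit -q -m "behavior change"
echo v2 > f && git add f && git commit -q -m "oops, fix typo"
git push -q -u origin feature                  # imagine this opened a merge request
# turn the second "pick" line of the rebase todo into "fixup" (stands in for
# editing the list by hand in `git rebase -i`)
GIT_SEQUENCE_EDITOR="sed -i -e '2s/^pick/fixup/'" git rebase -i HEAD~2
git push -q --force-with-lease                 # the MR would be updated in place
git log --oneline origin/feature               # one clean commit on top of base
```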
> 3. Rewrite the git history in your merge branch for which GitLab has absolutely crappy UI support and still doesn't allow you to merge individual commits out of that chain
Oh, or maybe GitLab does not after all? Besides the UI, isn't it possible to rewrite the git history in your local branch and force-push to get GitLab to rebuild the merge request?
If one of the commits is worth merging immediately, the developer can surely extract it to a new branch/merge request and get that merged. When the older branch gets merged, I am not sure what would happen though: maybe the commit will be redundant, unless the old branch/merge request gets rebased.
Extracting a quote:
> We will _not_ be able to rely on master being green for every individual commit.
If you only follow the first parent of the branch then those would be green. But indeed individual commits merged in might be broken assuming CI only runs against the tip of the merge request.
I guess if one requires a revert, the whole merge request gets rolled back, which leads to the git how-to for reverting a faulty merge, i.e.: git revert -m 1 <merge commit>
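A throwaway demonstration of that how-to (all names are made up): -m 1 picks the first parent (the mainline), so a single revert commit undoes everything the merged branch added.

```shell
# Build a tiny history with a merged feature branch, then revert the merge.
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
git config user.email you@example.org && git config user.name you
echo base > f && git add f && git commit -q -m "some base commit"
git checkout -q -b feature
echo r1 > refactor && git add refactor && git commit -q -m "refactor 1"
echo bc > behavior && git add behavior && git commit -q -m "behavior changes"
git checkout -q - && git merge -q --no-ff --no-edit feature
git revert -m 1 --no-edit HEAD   # one commit removes everything the branch added
git log --oneline --graph
```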
----
I agree those are valid concerns. I assume they can be addressed by guidelines and contribution advice, and surely people approving merge requests would be more sensitive to having a clean git history; they are more likely than others to often have to dig through the history. Antoine "hashar" Musso (talk) 19:26, 1 October 2020 (UTC)Reply
> To rephrase the idea is that given a merge request with 3 commits:
I was hoping we could replace the "stack of patches" model in Gerrit (see here for random example) with a stack of merge requests, so rather than a Merge Request with three commits (only the latest of which would be tested with CI, and where the merge request needs all commits to be merged together), you could have a stack of merge requests where the first MR is to merge into `main`, the second MR is made with the first MR as the target branch, and the third MR is made with the second MR as the target branch.
In the GitLab UI, you end up with a similar diff view to what you see in Gerrit when you are looking at a patch in a stack of patches. So that's nice. The downside is that in the UI there's no way to specify that this chain of merge requests is related and that they should be reviewed together (as in Gerrit, where you can see the stack of patches at the top right of the window). At least, I couldn't figure it out over at https://gitlab.com/kostajh/mediawiki-extensions-growthexperiments/-/merge_requests KHarlan (WMF) (talk) 03:25, 3 October 2020 (UTC)Reply
A stack of changes in Gerrit is the grouping of related commits in a chain of dependencies. In GitLab the grouping is done in the form of a branch which is being requested for a merge.
In Gerrit you can't really express that a stack depends on another one. You would have to chain them in the same stack, eventually differentiated by setting a topic.
In GitLab core it is not possible either, but the proprietary version offers merge request dependencies which span across projects: https://docs.gitlab.com/ee/user/project/merge_requests/merge_request_dependencies.html Antoine "hashar" Musso (talk) 06:33, 3 October 2020 (UTC)Reply
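For what it's worth, a stack of merge requests can at least be created from the command line with GitLab's push options (-o merge_request.create, -o merge_request.target=<branch>). In the sketch below a plain local repository (with push options enabled) stands in for GitLab, so the pushes succeed but no real merge requests open; against an actual GitLab instance the same -o options would create an MR per branch, each targeting the one beneath it. Branch names are made up:

```shell
# Stand-in "GitLab" remote that accepts push options.
tmp=$(mktemp -d) && cd "$tmp"
git init -q --bare gitlab.git
git -C gitlab.git config receive.advertisePushOptions true
git clone -q gitlab.git work && cd work
git config user.email you@example.org && git config user.name you
echo base > f && git add f && git commit -q -m "base" && git push -q origin HEAD:main
# MR 1: refactor-1 -> main
git checkout -q -b refactor-1
echo r1 > r1.txt && git add r1.txt && git commit -q -m "refactor 1"
git push -q origin refactor-1 -o merge_request.create -o merge_request.target=main
# MR 2: refactor-2 -> refactor-1, reviewed as its own small diff
git checkout -q -b refactor-2
echo r2 > r2.txt && git add r2.txt && git commit -q -m "behavior changes"
git push -q origin refactor-2 -o merge_request.create -o merge_request.target=refactor-1
```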

Integration with Cloud Services and web hooks

[edit]

Apologies if much of this is already answered. I am very curious whether this would be open to integration with Striker (e.g. new Toolforge tool creation could create a new repo in GitLab), or if we are considering a model similar to the manual creation of repos in Gerrit.

In addition to that, will webhooks be enabled that we could potentially use in future Cloud Services continuous deployment workflows that are likely to be maintained by Toolforge admins inside the Cloud. BStorm (WMF) (talk) 18:52, 30 September 2020 (UTC)Reply

Apologies if much of this is already answered. I am very curious if this would be open to integration with Striker (eg. new Toolforge tool creation can create a new repo in Gitlab)

I haven't investigated this deeply, but glancing over the API it at least seems possible.
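As a sketch of what such an integration might look like: GitLab's REST API does expose project creation as `POST /api/v4/projects`, which Striker could call when a new tool is created. The host, token variable, and tool name below are placeholders; the script only prints the request it would make rather than sending it.

```shell
# Hypothetical sketch of Striker creating a repo for a new Toolforge tool.
# POST /api/v4/projects is a real GitLab endpoint; everything else here
# (host, token variable, tool name) is a placeholder.
GITLAB_HOST="gitlab.example.org"
TOOL_NAME="my-toolforge-tool"

request="curl --request POST \
  --header 'PRIVATE-TOKEN: \$STRIKER_GITLAB_TOKEN' \
  --data 'name=${TOOL_NAME}&visibility=public' \
  https://${GITLAB_HOST}/api/v4/projects"

# Printed instead of executed, since this is only an illustration.
echo "$request"
```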

In addition to that, will webhooks be enabled that we could potentially use in future Cloud Services continuous deployment workflows that are likely to be maintained by Toolforge admins inside the Cloud.

We experimented with having a webhook post to a Toolforge endpoint yesterday. There were some rough edges (namely, GitLab webhooks don't currently include a User-agent header for some reason, which Toolforge doesn't seem to like), but apart from that it was pretty easy to configure on a per-repo basis and seems like it'd be an obvious benefit to support. BBearnes (WMF) (talk) 17:24, 1 October 2020 (UTC)Reply
In the past I had tried to set up a Gerrit webhook to auto-deploy wikibugs to Toolforge, but we ran into the problem that production services (intentionally) cannot talk to Cloud Services and Toolforge. We worked around this by having a simple CI job use curl to trigger the "webhook".
Is the setup for Gitlab with respect to webhooks going to be any different? If they ran in CI's network space where prod can talk to cloud services, that would be nice. Legoktm (talk) 19:08, 1 October 2020 (UTC)Reply
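For illustration, the CI-side workaround described above might look like the sketch below: a CI job (running in a network zone that can reach Cloud Services) uses curl to trigger a deploy endpoint on Toolforge. The URL and token variable are placeholders, and the command is printed rather than executed.

```shell
# Hypothetical CI job step that pokes a Toolforge deploy endpoint.
# The URL and token variable are placeholders.
DEPLOY_HOOK="https://example-tool.toolforge.org/deploy-hook"

request="curl --fail --request POST \
  --header 'User-Agent: wikimedia-ci-deploy-hook' \
  --header 'X-Deploy-Token: \$DEPLOY_TOKEN' \
  $DEPLOY_HOOK"

echo "$request"
```

Setting an explicit User-Agent sidesteps the missing-header issue with native GitLab webhooks mentioned earlier in this thread.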

medium/long term viability

[edit]
One thing that is not super clear to me is what the longer term viability of Gerrit is compared to GitLab.
Looking at merged patches to Gerrit you can see there's a lot of activity, and quite a bit more if you look at unmerged patches. Looking at the GitHub mirror of Gerrit (ha) to see contributor information, it looks like most commits in the last two years are coming from fewer than 10 people, and of the top ten contributors, half stopped contributing in 2018/2019. GitHub says there has been a total of 285 contributors in the 12 year history. On the plus side, there is a roadmap and it looks to have been updated recently with plans for a 3.4 release in Q2 of 2021.
Whereas GitLab is a mega-project by comparison, with 3,395 contributors (excluding GitLab employees; 6,534 including them). The GitLab company comprises 1,279 people. Of the contributions to GitLab, 82.75% come from GitLab itself, so there is, at least for now, an ecosystem around their products that is accepting of contributions from outside.
So just looking at the scale of the ecosystems around the two different products, if I had to say which code review system is going to be more mature and include more features that would make developer lives easier in, say, the next 3-5 years, then GitLab seems like a more certain bet. (Of course this doesn't erase the concerns about open core / enterprise editions, or that the company could decide to move things in a direction that cause issues for us, etc) KHarlan (WMF) (talk) 09:13, 1 October 2020 (UTC)Reply
Having written the above, personally I would favor investing resources in improving our current setup and working on the various social issues around code review, then reassessing the viability of open core GitLab in the future. The open core edition of GitLab is limited enough that I'm not sure I see the overall benefit of switching to it in comparison to the tools and features we have in our current setup. KHarlan (WMF) (talk) 03:17, 3 October 2020 (UTC)Reply
Came here to ask/say this as well as add another wrinkle: by standardizing on a well maintained, large ecosystem, we lower our cost of maintenance. Lowering our cost of maintenance means we *should* be able to free up more people to work on things that are core to where we add value. The less we maintain one-off solutions that don't add value to our mission as a movement, the more we have to focus on the things we do care about: creating an awesome platform for open knowledge. We don't need to innovate (or even glue) in the Continuous Integration/Continuous Delivery/Code Review space. Let's pick "industry standards" so that we don't have to train and maintain individuals on non standard things. This goes way beyond onboarding of new devs or attracting new talent and is as much about the experienced devs who have to regularly help onboard new folks, update the documentation, etc. GIngersoll (WMF) (talk) 14:38, 2 October 2020 (UTC)Reply

OpenCore / our use case does not match GitLab business cases

[edit]

I worked on the migration from Subversion to Git/Gerrit. My primary responsibility is maintaining the CI infrastructure, and a secondary one is administering our Gerrit.


I would like to dig into GitLab's business model (open core), how it splits features between free-to-use and proprietary ones, list the features we would definitely want, and expose the workarounds we will have to implement. My thesis is that if we stick solely to the open source version, we do not fit GitLab's vision and will endure a long journey of implementing our own tooling on top of their limited open source offering. It might not even be cost effective, and it would not address the maintenance and upgrade needs.

My conclusion is that we should rather adopt their full offering, which implies relying on proprietary code but with the benefit of a fully integrated environment. If we do not want to compromise on the open source principle, the alternative is to seek an alternate code review system, or to stick with the status quo and invest in it.

GitLab open core

business model

One thing I really like about the GitLab company is that they are extremely transparent (see for example their staff handbook). There is thus no surprise in their offering.

GitLab is a company; it has to make money somehow, which it achieves by selling a product to consumers. Their business model is open core: the main features of their product are released as open-source software and are thus free to use (as the license grants you the ability to use and modify them however you want AND the license is not subject to a financial fee). Extended features are offered under a proprietary license which comes with a fee; in GitLab's case the code is additionally available for reading, but you can not use it without agreeing to their proprietary license.

My understanding is that the open core business model emerged as a way to fund open source software. Restricting the availability of some features effectively forces corporate users to pay for them, which in turn funds the company and the open source part. Wikimedia already uses the '''open source part''' of such open core projects: Kafka, Cassandra, Elasticsearch and Redis, to name a few.

open core and Wikimedia

One of the Wikimedia Foundation's guiding principles is freedom and open source [3]. All software written by the Foundation is open-sourced. My opinion is that this allows anyone to fork our projects and have the whole software stack needed to do so without requiring any licence.

For the organization itself, we do prefer using open-source software, but do not forbid proprietary software. When there is no effective open-source alternative, or the infrastructure would be too challenging to maintain, we adopt a proprietary solution. As an example, the Wikimedia Foundation uses the Google suite for email, calendar, some documents and video calls. Open source alternatives do exist, but integrating them all into a user-friendly suite is probably not achievable.

I don't think GitLab's proprietary code goes against the Foundation's guiding principle about freedom and open source, as long as it is shown that those proprietary features can't be fulfilled effectively by open source tools. After all, the code review system is not essential in order to fork a wiki project, just as Gmail or a proprietary code editor does not prevent producing open source software.

features tiers

The way GitLab determines whether a feature should be open source or in one of the proprietary tiers is described on their stewardship page and especially on the pricing page. In summary, when a feature is introduced, they ask themselves who is going to be the likely buyer:

  • a single developer: open source
  • a team manager, director, executive: proprietary tiers.

I will take as examples two features that were discussed on this talk page previously:

merge approval
when you are a single user, it does not make much sense to request a self-approval or to ask yourself to review your own code. The feature is thus not intended for a single developer and lands in a paid tier.
searching
as a single developer, you already have all the repositories on your local machine and can just search through them on disk. You would probably not bother setting up an ElasticSearch backend and running the indexer either. It is thus a paid tier.

Based on community feedback, they might move some features to the open-source tier. Then, quoting GitLab: the premium product needs to hold value. They can't just open-source everything, or there would be little incentive for corporate users to pay for a license, which in turn would financially dry up the company.

Features requirements

GitLab's features by tier are listed at https://about.gitlab.com/pricing/self-managed/feature-comparison/ . We have already listed the features we are after and how the gap could be filled: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/GitLab/Features. The bottom of the table lists them as likely non-blockers; I thus assume the top ones are the hard requirements.

There are a few deal breakers left:

merge approvals
it is the feature that enforces code review. The lack of it prevents us from implementing our privilege policy and would make it challenging to figure out whether a merge request is suitable for approval. A proposed workaround is marge-bot, but its documentation mentions that you have to turn on the required-approval proprietary feature in GitLab. marge-bot seems to be more of a CI helper that runs tests of the branch against the target branch to ensure nothing breaks, the same feature we have in our current CI / Zuul. I could not find any lead as to how marge-bot could replace the proprietary feature; it actually seems to rely on it.
Merge Request Dependencies
for a single project in Gerrit, you would chain your commits in the order you want them merged; in GitLab that would be a branch and a single merge request. We often have to express a dependency between projects, such as a breaking change in mediawiki/core that affects several extensions: we want to make sure the extensions get updated before merging the breaking change. It is not really a blocker for GitLab, since Gerrit does not manage cross-repository dependencies either; that is enforced by the CI system (Zuul). Still, it would be nice to rely on the built-in GitLab feature.
permissions
GitLab comes with five roles: Guest, Reporter, Developer, Maintainer, Owner. The model seems to have a clear separation of concerns, but it is not clear to me whether it would fit our current permission schemes, which are based on groups of users rather than generic roles. As an example, the mediawiki/core fundraising branches can only be acted on by the Wikimedia fundraising team; GitLab offers a way to protect branches, but that is based on one of the existing roles (for example Developers). We would thus be unable to restrict that branch to a subset of people.
Cross repositories search
I could not find a system to find related merge requests across projects. Similar to how in Gerrit I can look for any change I am a reviewer for (is:open reviewer:self) or any change related to a given bug (bug:T12345).
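To illustrate the permissions point above: GitLab's protected-branches API is a real endpoint, but in the open-source edition the allowed pushers are expressed as a role level (30 = Developer, 40 = Maintainer, 0 = no one), not as an arbitrary group of users. The host and project ID below are placeholders, and the request is printed rather than sent.

```shell
# Sketch of protecting a branch pattern by role level via the GitLab API.
# Host and project ID are placeholders; 40 means "Maintainers only".
request="curl --request POST \
  --header 'PRIVATE-TOKEN: \$TOKEN' \
  'https://gitlab.example.org/api/v4/projects/42/protected_branches?name=fundraising/*&push_access_level=40'"

echo "$request"
```

Restricting a branch to a named team such as fundraising, rather than to everyone holding a given role, is exactly the part the open-source tier does not express.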

GitLab's core / open-source set of features is strong; however, there are several proprietary features that would also be very nice to have to replace existing tooling. Without them we prevent ourselves from offering a nicely integrated solution. A partial list of such features would be:

Code Owners
It assists in finding reviewers. We currently run a bot configured via https://www.mediawiki.org/wiki/Git/Reviewers; our Gerrit installation does have a similar feature (the reviewers plugin), although it is not used. We could surely port our existing bot to GitLab but would still lack the nice integration.
global search
currently worked around with https://codesearch.wmcloud.org/search/ and mirroring to GitHub.
All the contributions metrics served currently by Bitergia.
Security analysis of dependencies, currently implemented as a custom script run by CI or relying on GitHub security analysis.


Benefit from the full suite

GitLab offers a well-integrated suite and could even replace a lot of our custom tooling, which would enhance our overall experience. Unfortunately, restricting ourselves to the sole set of open-source features prevents us from benefiting from the whole experience, which in my opinion should have been the main driver toward migrating to GitLab beside just git hosting and code review. Security audits, code search and metrics are all features that are currently badly exposed to developers and would be more prominently shown via a higher tier of GitLab, in turn leading people to use them and enhancing our daily work.

The compromise of restricting ourselves to just the open source features might address the usability issues that people encounter with Gerrit. But it comes with important costs that we should not underestimate:

  • developers workflow will be disrupted, and we must accept that some of the existing workflow would not be implementable in GitLab.
  • the GitLab architecture involves several components we would have to maintain. Whereas Gerrit is a single Java application with flat-file storage, GitLab involves many more components (Rails, Redis, PostgreSQL, diff storage at least). We should not underestimate the resources that would need to be allocated to sustaining it; this has already proven to be troublesome for Gerrit.
  • a lot of our tools do not have a drop-in replacement and would have to be migrated.

Getting the most out of GitLab

I would like to suggest we evaluate GitLab's proprietary paid tiers. Oh, I can see the pitchforks being raised. As alluded to above, the freedom and open source guidance applies first and foremost to the software we write and the wiki projects. It still allows use of proprietary software when no alternatives exist. There does not seem to be any proper open source alternative for a modern code forge (beside Phabricator/Differential, which we ended up rejecting).

I strongly believe in open source, but we also have to be pragmatic and understand that not all open-source software can be heavily funded via fundraising. People end up having to make a living out of it somehow, and open core is a compromise between economic reality and the purity of open source. If we were to adopt the proprietary features, we would relieve ourselves of the burden of reinventing the wheel and in turn allocate the saved resources toward producing more open source software and better sustaining our very own projects.

One might as well ask whether it still makes sense to self-host the application. The Gerrit upgrades proved to be somewhat problematic due to lack of funding and/or active participation with upstream (though we had at least two volunteers dramatically helping on that front). I would guess we would suffer from the same trouble with GitLab, whose architecture is an order of magnitude more complicated, or at least involves more components. Using a SaaS or a managed on-premises appliance would free us from the maintenance burden.


Conclusion

Hypothetically, if we were to agree to use proprietary software and SaaS, we would have a state-of-the-art code hosting solution with all the bells and whistles that make the lives of developers so much easier. Under that hypothesis, we might as well consider using GitHub, which is already the canonical place for several repositories. GitHub does have an on-premises offering which would fit some of our privacy requirements. After all, GitLab and GitHub offer very similar experiences in the end.

The cost (time and money) of migration will be fairly large and I don't think it is offset by the limited set of features offered. We will still need to deal with the infrastructure maintenance and add a lot of new and custom tools on top of GitLab to make it fit our requirements.

There are alternatives, though. One is to elevate ourselves from being a Gerrit consumer to an actor in its open-source community. A lot of the usability concerns can be addressed by proposing code and enhancing the software. The UI is now using a JavaScript templating engine instead of Java. Changing the project creation capability to no longer be global but instead be a regular permission is probably not that complicated to implement if we had a couple of our Java developers look into it. But that needs resourcing on our behalf, either with our own developers or by contracting developers familiar with Gerrit.

We could also consider other forms of hosting for Gerrit, be it through a company such as GerritForge or via a like-minded organization: OpenStack and their OpenDev project. The latter offers more or less the same stack of tooling we use, is entirely open source, and we borrowed our CI system from it.

The GitLab full suite is a good fit, but the limited subset of features in the open source version only gives us the branch workflow, a somewhat more pleasant UI and self-service repository creation. That is, in my opinion, too limited to be considered worth migrating to.

Antoine "hashar" Musso (talk) 12:40, 1 October 2020 (UTC)Reply

Notes from Platform Team

[edit]

The following is consolidated feedback from members of the Platform Team (the team until recently known as Core Platform). We recognize that many of our questions and concerns have already been communicated by others outside Platform Team. Consider our duplication to be +1s.


Summary:

Based on the feedback we got from the team, there is a feeling that the move to GitLab will lower the barrier to entry because of its similarity to GitHub and the addition of an easy-to-use UI to edit on the web (Web IDE). We anticipate this will increase contributions from volunteers and improve onboarding of new employees.

We are also excited about the move towards a chronological merge request branch to track changes. However, there is a worry that Gitlab may not offer all the features we have become accustomed to with Gerrit.


Now, the more critical part. We had a number of specific questions and concerns:

PROCESS QUESTIONS

- This consultation feels more like an initial exploration than solicitation of feedback on a concrete proposal. What question is actually being considered, beyond "should we move to GitLab?" When? How? How fast? What happens to repos currently on Github? Will Gerrit be shut down? Are we talking about "invest into exploring further?" or "commit for better or worse?" If we're not yet prepared to answer those sorts of questions, will there be another round of evaluation where they are considered?

- The document says an initial evaluation has been performed. What output did it produce? Did it produce a pro/con list, feature comparison, or decision matrix? It would be helpful to see that, as other feedback has mentioned Gerrit workflows that at least some Platform Team members were unfamiliar with, and which should be considered. Also, it seems unreasonable to ask all consultation participants to find out similarities and differences by themselves. Starting with the output of the previous stage would allow the consultation to focus on raising additional questions, discussing discrepancies, or highlighting the importance of certain features. Will there be a follow-up phase where this occurs?

- A switch of this magnitude is costly and risky. It may also be worthwhile, but the reasons presented do not seem compelling. From what is in the document, it isn't clear how GitLab is sufficiently better than Gerrit (or other alternatives, for that matter). Some of the reasons stated seem off. For instance, self-service repo creation is something we could turn on in Gerrit. While GitLab has advantages, and we are not opposed to this switch, the consultation in its present form doesn't seem fit to provide sufficient certainty to justify the change.


WORKFLOWS

- Right now to make a change one would make a local branch, make changes and push it up to Gerrit. How would that change with GitLab? Would people have to directly interact with the UI to make a pull request? Would people have to fork each repo before starting?

- How would we make +1/+2 possible in GitLab? Would everyone be able to approve a pull request and then only some could merge? Is there some other sign-off mechanism?

- How do we feel about the handling of large and complex changes? A Gerrit topic branch naturally becomes a feature branch in GitLab, the main difference being that a Gerrit topic branch can be merged incrementally, whereas a GitLab feature branch is typically merged all at once. Importing a Gerrit workflow into GitLab defeats the ostensible point of taking advantage of developer familiarity with Github's PR model. However, some see feature branches as an anti-pattern. Should we accept bigger changes as part of the transition and consider the consequences? Or aim for the opposite?

- Can MRs be chained, like Gerrit changes can? Can we model cross-repo dependencies? The hard part about implementing this is in CI, but the question is about CR: Can we prevent changes that have unmerged dependencies from going in? This page calls it a "premium feature", does that mean it's not in the community edition? https://docs.gitlab.com/ee/user/project/merge_requests/merge_request_dependencies.html

- Does GitLab support private branches? A better workflow for security patches would be great (having CI, in particular)


BRANCHES / COMMITS

- When someone updates their code via the Gerrit method (no new commit/just amends) GitLab tends to force push and the previous changes aren’t easily accessible. How would we ensure we can see previous versions of the code and comments?

- In Github if someone makes a pull request and then later needs to rebase, all of the commits from the rebase will show up in the commit changes. For pull requests is there a way to disable rebase for users and instead suggest git merge master?

- The PR/MR model leads to dirty history, if we don't squash. It would be nice to automate/force squashing.

- Would we keep the current development model, trunk-based with deployment branches and release branches?


OTHER

- Although CI is not part of this discussion, would moving to GitLab allow repos to be able to use external CIs like Travis CI or Circle CI?

- GitLab has a built-in issue tracking mechanism. Would that be disabled or would there be a way to automatically send them to Phabricator? Issue tracking is not in scope, but it's important to make sure we do not end up with bugs being lost because they are being tracked on GitLab.

- We finally have decent integration of Gerrit into Phab. How hard would it be to create the same for GitLab?

- Does GitLab integrate with LDAP?

- How's GitLab's security? This doesn't sound so great https://devclass.com/2020/03/05/gitlab-provides-remedies-for-slew-of-potential-risks/

- The open core model is concerning, because it can lead to odd architectural decisions which put the company's needs over user interests and code quality.

- By the sheer virtue of changing something (anything), would some people become fixated on any regressions (perceived or otherwise), unnecessarily draining energy from everyone involved, including the foundation and the wider community? BPirkle (WMF) (talk) 16:58, 1 October 2020 (UTC)Reply

The MR/PR model is probably inevitable

[edit]
There's a point I've argued in conversation that I'm not sure has been articulated explicitly as part of this consultation, so I'll do my best to lay it out here.
Briefly: It seems likely to me that we're getting the PR/MR model whether we want it or not. My thinking is as follows:
  • The current status quo is not that everything lives on Gerrit. Per the "Why" section, it's Gerrit plus 150-odd repos on GitHub.
  • If we didn't have a requirement that things deployed to production be hosted on Gerrit, the GitHub number would almost certainly be higher.
  • If we don't provide standard code review & CI tooling that meets some basic expectations, projects and teams will continue drifting to other platforms.
  • Eventually, we're going to reach a crisis point with Gerrit. It'll be brought to us by one or more of:
    • Our ability to maintain a public Gerrit instance (already stretched to the breaking point in terms of people and resources)
    • The upstream health / responsiveness of Gerrit as a project
    • Pressure from developers and projects/teams to ratify the de facto migration away from Gerrit which is already underway
And at that point, my expectation is that we're going to wind up scrambling to adapt, locked into a fully-proprietary monopoly platform (GitHub) with little control over the decision, and cleaning up a few years' worth of additional fragmentation. We'd still be adapting to PR-style workflows and tooling, just less deliberately, not on our own terms, and at a greater remove from the path taken by other projects that share a great many of our values and concerns.
In thinking this through, it's also become clear that if we elect not to migrate away from Gerrit at this time, we're still going to have to spend substantial money and person-hours on the technical problems of our code review infrastructure. There's just not a viable option to do nothing here. (I specify "technical problems" because this consultation is first and foremost about improving an unsustainable software situation, not about whether our culture and priorities around code review need help. The latter is a very important question, but it is not the problem we set out to solve with this process.) BBearnes (WMF) (talk) 06:19, 2 October 2020 (UTC)Reply
In addition to the 152 GitHub projects you mention there are several additional GitHub organizations that contain repositories used in people's day-to-day. Not to mention the tools that exist under individual user accounts that folks are using for day-to-day work.
Many repos are created outside Gerrit because it's easier to create them elsewhere. Or easier to set them up elsewhere. Or easier to access them elsewhere. I, personally, don't put small projects on Gerrit because I don't want to think about where they fit in the giant hierarchy of things in Gerrit before I can even start on a README.
I am a Gerrit workflow fan, but I worry that if we don't address the real issues with Gerrit, we'll just end up slouching into whatever's easiest without regard for guiding principles or preserving workflows or CI or deployment or anything other than what's expedient. TCipriani (WMF) (talk) 21:33, 2 October 2020 (UTC)Reply
I'm slowly coming to the same realization, if for different reasons. We discovered that force-pushing to a branch leaves no record of the previous history. This is a dangerous situation because an accidental push could irreversibly destroy work and break auditing. If the branch is associated with a merge request however, the patchset comparison tools become available. We very much would want to use this workflow, since most of us have been conditioned by years of force pushing and I expect that we'll find ourselves continuing to do so. Adamw (talk) 12:31, 2 October 2020 (UTC)Reply
> Pressure from developers and projects/teams to ratify the de facto migration away from Gerrit which is already underway
Isn't this a social issue, in that teams are largely free to pick whatever code review platform they like to do their work -- similar to how different teams used different chat mediums, or I think in the not too distant past there were various combinations of Asana / Trello and perhaps other bug trackers in use by team. Similar to how in theory everyone is supposed to use phabricator to organize and document their work, there should probably be a similar effort to have people use the same code review tooling. Otherwise I could easily see, of all those repositories listed as being used on GitHub, the majority staying on GitHub since GitHub !== GitLab.
> Our ability to maintain a public Gerrit instance (already stretched to the breaking point in terms of people and resources)
My understanding is that GitLab is more complex to host and maintain, would it require fewer resources? KHarlan (WMF) (talk) 03:11, 3 October 2020 (UTC)Reply

Repository creation

[edit]

One of the three listed "Why"s is repository creation. I think it's undeniable that repository creation has friction today. But, is this relevant to the Gerrit discussion?

The status quo is that we have intentionally restricted in Gerrit the permission to create repos, and granted it only to a select few admins who respond to requests.

Personally, I think that's needlessly complex and bureaucratic. But, I think it's important to recognise that this was intentionally set up, and has little to do with the hosting platform. If we're confident that we can just document clearly what should and shouldn't be hosted on Gerrit, and that people will find and remember this and know how repos should be named etc, then we should just hand out this right to whomever we choose (e.g. starting by adding ldap/wmf to Project Managers).

I think it would be a mistake to change this policy implicitly as part of a hosting platform transition. The process implications of that would I think be worth seeing through on its own first. If a decision has already been made on this, can we document and work out the details of that, and apply it to Gerrit as well?

(The interface itself is not meaningfully different in GitLab, as it's a single input field with a button.) Krinkle (talk) 19:00, 2 October 2020 (UTC)Reply

But, is this relevant to the Gerrit discussion?

Yes.
The point that there are policy decisions involved is a good one, and it is likely that we could to some extent reduce the friction of creating new repos in Gerrit. I think that'd be effort worth making if we stick with Gerrit.
Nevertheless, Gerrit is designed for the enterprise, and giving each user their own namespace is non-trivial and fights the design of the system. Meanwhile it's built directly into GitLab's basic design.
I'm on the team that administers our Gerrit instance, and I have administrative privileges. There are fewer barriers to me creating a repo there than for all but a handful of other people who use the service. I never even consider it for anything less than long-lived official software projects with more than a couple of collaborators. For experiments, one-offs, pairings, presentation materials, and so on, I'm more likely to use any of my self-hosted Gitea instance, GitLab, or GitHub than I am the official code hosting / review platform for the organization I work for. I think that's pretty telling. BBearnes (WMF) (talk) 20:01, 2 October 2020 (UTC)Reply

Self-service continuous integration

[edit]

The second of three listed "Why"s is easy and self-service continuous integration configuration.

This has indeed been a point of friction for many years. This wasn't related to Gerrit, but rather because we didn't resource/prioritise setting up something that could securely run unreviewed code for installing arbitrary packages and running arbitrary shell commands.

Between 2013 and 2015 we invested in this. We got rid of the hardcoded Jenkins jobs, and instead defer all package and command selection to a file in the source repository/branch, just like Travis CI and GitLab. These files are package.json, composer.json, Gemfile. Just like Travis, the entry point commands are just "npm install + npm test" or "composer install + composer test". Fully self-serviced.
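As a minimal sketch of the convention just described: the repository carries a manifest whose scripts are the only CI entry points. The package name and test command below are illustrative, not taken from any real repo.

```shell
# Write an illustrative package.json whose "scripts" are the CI entry points.
cat > package.json <<'EOF'
{
  "name": "example-extension",
  "private": true,
  "scripts": {
    "test": "eslint ."
  }
}
EOF

# CI then runs the same two commands for every repo:
#   npm install && npm test
cat package.json
```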

There are some cases where, for security or performance reasons, we bypass the base image and instead provision something custom ahead of time for a specific repository. I assume this will still be possible in GitLab, and would require similar effort either way.

From an end-user perspective, what is the difference?

(I do want to recognise that RelEng currently spend significant time maintaining the Docker base images that drive this. I believe GitLab has similar preset images, that would save RelEng time. However, the consultation lists ease of use for end-users. And, of course, changing the backend of CI to GitLab was already approved months ago and is out of scope here. Also, whether we can/should use GitLab's base images remains to be seen since I believe we generally prefer to match OS and package backports with prod.) Krinkle (talk) 19:30, 2 October 2020 (UTC)Reply

From an end-user perspective, what is the difference?
I might take issue with your characterization of our current CI as "Fully self-serviced". Only 19 people out of all users of Gerrit can fully set up CI without any help.
----
I'm just going to stumble through getting something running as an experienced person at stumbling through the process.
===GitLab===
  • Click "Set up CI/CD" on the repo
  • Click the "Apply a template" dropdown
  • Click "Commit changes" button
  • Jobs run in CI
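For comparison, the file that the template flow above commits is a short .gitlab-ci.yml. This is a sketch along the lines of GitLab's stock Node.js template (the image tag and job names are illustrative), not a proposed Wikimedia configuration:

```yaml
# Illustrative Node.js pipeline; the image and stage layout follow
# GitLab's generic template, not any eventual Wikimedia CI setup.
image: node:latest

stages:
  - test

test:
  stage: test
  script:
    - npm install
    - npm test
```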
===Current CI===
  • git clone ssh://gerrit.wikimedia.org:29418/integration/config.git
  • $EDITOR zuul/layout.yaml -- grep around for "npm" and find repos using "reponame"-npm-node-6-docker
  • git grep 'node-6-docker'
  • $EDITOR jjb/job-templates.yaml -- There's a job-template that seems to do what I want...ok...got to use that template -- I'll git grep for projects using that template
  • $EDITOR jjb/mediawiki-services.yaml -- there appear to be a lot of projects using the template I want here...maybe this is where I add my project:
    - {project: {name: 'tyler-test', jobs: {name}-npm-node-6-docker}}
  • So that should create the job, now I need to add the job to the repo
  • $EDITOR zuul/layout.yaml (again)
    - {name: tyler-test, test: tyler-test-npm-node-6-docker}
  • Send for code review
  • self-merge (27 people can do this currently, including you and me)
  • Deploy the job (19 people can do this currently, including you and me)
  • Deploy a new zuul configuration (19 people can do this currently, including you and me)
This is the perspective when we already have the functionality to do something simple. As you mention, adding new Docker images (something only the same 19 contint-admins can do) adds complexity to this step. You currently need to add a new Docker image if you want to, say, install a library that your node project is using -- it's not uncommon.
I stumbled my way through to a working CI in GitLab without reading any documentation. I've been maintaining CI via zuul/jjb for 5 years and I still had to do a lot of grepping.
Once you've got your CI set up you can, it's true, change things through the npm test entry point, but this is different from what I mean by self-service CI. TCipriani (WMF) (talk) 20:47, 2 October 2020 (UTC)Reply
I did not find the CI self-service setup easy on the GitLab test instance. If you look at https://gitlab-test.wmcloud.org/translatewiki.net/repong/-/commits/master it took me four commits to get it working. I could not do it without looking for a working example from another repo in the test instance. A possible caveat is that on the actual instance the images might be unrestricted, so the premade templates would actually work.
I found no way to actually test the pipeline without committing it to the repo first. Now there are a bunch of useless and broken commits in the history there. Even if there is a way to test before committing, it is definitely not obvious, as I spent a lot of time trying to find it. Nikerabbit (talk) 16:23, 3 October 2020 (UTC)Reply

Did you consider git-hosting platforms not linked to commercial entities?

[edit]

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


Looking back, corporate-sponsored FOSS projects seem to be somewhat at risk of getting abandoned (Gerrit by Google Inc. not being atypical?). Did you consider and evaluate grassroots, community-driven open-source alternatives like Gogs or Gitea, so that future development does not depend on a single commercial sponsor? Platforms like NotABug or Codeberg.org seem to prove that these are approaching maturity and scale easily to several thousand repos, and would easily meet the requirements listed above. Have such alternatives been discussed and evaluated? 78.54.178.98 (talk) 16:15, 6 October 2020 (UTC)Reply

I'll preface this by noting that I use Gitea personally, and find it to be pretty good software. That said, though this consultation is specifically about whether or not to use GitLab for code review, we initially evaluated GitLab in the context of looking at alternatives for our continuous integration system, and that's still a problem we need to solve. Gogs/Gitea is essentially a lightweight replication of the GitHub-style code forge, not a platform with components like the full-fledged CI system that motivated us to investigate GitLab in the first place.
The shorter version of this answer is: Not really, but not for lack of awareness. BBearnes (WMF) (talk) 17:36, 6 October 2020 (UTC)Reply
The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.

Don't create new Github repos

[edit]

I don't really understand why some of the code made by the WMF is not hosted on its own git platform, currently Gerrit. So I hope that all the reasons used to justify the exceptions (for example notebook previews) will be addressed when the official Wikimedia GitLab instance is created.

Might it be possible to define a rule saying that all development done during WMF employees' work time must happen exclusively on our GitLab instance? Of course excepting pull requests for improvements to external repositories hosted elsewhere. Framawiki (talk) 17:55, 10 October 2020 (UTC)Reply

Okay, but how is this related to the GitLab consultation...? AKlapper (WMF) (talk) 18:29, 10 October 2020 (UTC)Reply
When we migrated from Subversion to git, we selected Gerrit as the code review system. As part of the project we also had the repositories mirrored to GitHub: https://phabricator.wikimedia.org/T37429
Why? Well, I am not quite sure, but most likely to open the possibility of submitting a pull request via GitHub: https://phabricator.wikimedia.org/T37497 . At the time (2012), some wanted additional tooling to make it very easy to contribute. I would argue the complexity of the reviewing workflow itself is more to blame as a barrier to entry than the tooling, but that is really just my point of view.
Before that Subversion migration, we already had repositories on GitHub, mostly for mobile applications:
And after the migration to git/Gerrit, we still had repositories created on GitHub instead of Gerrit. For example: Limn, a data visualization framework https://github.com/wikimedia/limn . Groups got created, people were added to them, and eventually more repositories were created.
In short, we do not have a policy enforcing Gerrit as the canonical code hosting place. Although anything touching MediaWiki in production is definitely on Gerrit (we do not deploy from GitHub-hosted repositories), anything else is a gray area at the discretion of the team, and sometimes due to technical limitations such as testing the iOS-based applications.
The point you have raised to have a rule to exclusively host on Gitlab is covered on the consultation page:
  • What happens to repositories developed on GitHub if we move to GitLab?
    • Given that GitLab provides a very similar workflow and feature set, we will strongly encourage all developers to use GitLab instead of GitHub for all development. Repositories will still be mirrored to GitHub, for visibility purposes.
So essentially the same situation: still mirroring, and GitHub is not explicitly forbidden. Then, given GitLab and GitHub have essentially the same workflow, one can imagine that repositories might want to migrate from GitHub to GitLab unless they rely on tooling which is only available on GitHub (such as the issue tracker; see https://github.com/issues?q=is%3Aissue+org%3Awikimedia for currently open issues in the GitHub organization). Antoine "hashar" Musso (talk) 09:39, 12 October 2020 (UTC)Reply

Gerrit is multi-site and its implementation is open-source

[edit]

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


What is written in the conclusion is not accurate, with regards to the multi-site capabilities:

"We are unique in the community of Gerrit users which include large companies such as SAP, Ericsson, Qualcomm, and Google. Google, in particular, is singular in their use of Gerrit for projects like Android and Chromium. To support these large, open projects multi-site capabilities are needed; however, much of that work is either closed-source or does not support multi-site writes".

If you follow the multi-site link you will see that the multi-site plugin is open-source, supports multiple writes from all sites and is able to prevent split-brains.

GerritHub.io has been multi-site for over one year. Lucamilanesio (talk) 21:55, 26 October 2020 (UTC)Reply

I even wrote a task on phabricator about this: https://phabricator.wikimedia.org/T217174 Paladox (talk) 21:57, 26 October 2020 (UTC)Reply
Looking at the README on the multi-site page, the quote on the page is, "Currently, the only mode supported is one primary read/write master and multiple read-only masters but eventually the plan is to support multiple read/write masters."
We currently have one main Gerrit synced with a replica. It seems that the multi-site plugin will not currently support two Gerrits being written to simultaneously without partitioning: is that accurate? TCipriani (WMF) (talk) 22:09, 26 October 2020 (UTC)Reply
Seems this has been fixed with https://gerrit-review.googlesource.com/c/plugins/multi-site/+/285782 Paladox (talk) 22:27, 26 October 2020 (UTC)Reply
That comment in the README.md is stale and misaligned with the DESIGN.md. The multi-site plugin supports multiple sites in read/write and correctly prevents split-brains in case two users push concurrently to the same branch of the same repo from two remote sites across the globe.
I have addressed the stale comment with a change for review, thanks for pointing that out :-)
With regards to the problems in migrating to newer versions of Gerrit, I do recognise that it was difficult until v3.0. You guys are not far from the "tipping point" and I would be more than happy to help, as I did with the Eclipse Foundation and as I am doing with the OpenStack project.
Also, the multi-site setup allows Gerrit canary deployments because, from v3.0 onwards, it supports different sites running different versions of Gerrit (typically the version +1).
Since the introduction of multi-site on GerritHub.io, we went from 99.9% uptime to > 99.99% uptime, and never declared a "planned outage" for any of our upgrades.
I would be more than happy to help the Wikimedia Foundation to get there as well.
Luca. Lucamilanesio (talk) 22:28, 26 October 2020 (UTC)Reply
Hi @TCipriani (WMF) the multi-site README.md has been updated, thanks for the reviews. Can you also update the relevant section in the GitLab consultation? Thanks a lot for pointing this out.
Luca. Lucamilanesio (talk) 22:49, 28 October 2020 (UTC)Reply
Done. Thanks for the update. TCipriani (WMF) (talk) 13:46, 29 October 2020 (UTC)Reply
The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.