Talk:GitLab consultation



Workflow documentation / tutorial

BBearnes (WMF) (talkcontribs)

Based partly on docs from other projects (e.g. KDE) and on discussion here, we've started roughing out some documentation at GitLab consultation/Workflows. Aiming for something similar to Gerrit/Tutorial with more workflow guidance. It needs much more work, but figured I should link what's there now.

Reply to "Workflow documentation / tutorial"

Try before you buy?

KHarlan (WMF) (talkcontribs)

A few observations from discussions here and on mailing lists:

  • Gerrit is a (mostly) known thing already (as in -- we know what it looks like, how it functions, what the strengths and weaknesses are, although we could debate them)
  • Not a ton of people use GitLab in their daily workflow, so it's harder to make judgments about UX, strengths and weaknesses for code review, etc.
  • A lot of people use or have used GitHub and are familiar with its code review model, the UX, etc
  • It seems that many people assume (maybe not correctly?) that GitLab would be more or less just like GitHub

Given the above, it seems like it would be useful to have some small pilot projects that use the test instance to help us better assess how well GitLab's code review tooling would work for us. So far https://gitlab-test.wmcloud.org/explore shows 1 merge request among the repos that have been created. I think if we (i.e. collectively most everyone who has commented or cared about this proposed move, especially those of us who are skeptical of switching to GitLab) tried to use the test instance for a few tasks, for core or an extension, it would provide a lot more concrete information on the strengths/weaknesses of using it. @EGardner (WMF) and @ATomasevich (WMF) did a similar workflow with GitHub, while working on the Vue port of MachineVision; you could make your merge request in the GitLab test instance for a feature / bug you're working on, then at the end clean it up and resubmit to Gerrit for a final merge. It's duplication of work but could provide us useful experience IMHO.

In theory, I don't think it would be too hard to have Quibble running on merge requests for core or extensions.

Anyway, curious to hear what others think about doing something like this, and how we'd go about organizing it.

BBearnes (WMF) (talkcontribs)

This seems like a great way to surface more of the experience. I'd be happy to try it out for stuff we're working on within RelEng, though I'm guessing it'd be more useful for folks who work directly on MediaWiki and extensions to trial some code reviews.


Let me know if I can help in any way from the administrative end, or if bugs crop up in the configuration / setup.

EGardner (WMF) (talkcontribs)

I second this proposal, and would happily volunteer to be a guinea pig at some point in the future (assuming I'm working on a feature amenable to this approach). There are aspects of Gerrit I appreciate, but it was nice to return to the more familiar GitHub/GitLab workflow for the MachineVision project. For one thing, I think that the tools GitLab will provide for actually browsing and reading through code are much better than what Gerrit offers. I have never been a fan of the Gerrit/Gitiles integration, and generally rely on GitHub mirrors when I actually want to navigate a codebase.

MarcoAurelio (talkcontribs)

Would you actually have to pay for GitLab even if we use our own servers to host/run it? If so, how much? Do we have to pay for Gerrit?

KHarlan (WMF) (talkcontribs)

@MarcoAurelio the proposal would be to use the test instance we already have set up https://gitlab-test.wmcloud.org/ . AIUI we don't pay a license for GitLab as we are using the community edition, and neither do we pay a license for Gerrit. The time/resources needed to maintain the Gerrit/Jenkins/Zuul setup is probably another story though :) (and I have no idea how that would compare with maintaining code review + CI tooling for GitLab)

BBearnes (WMF) (talkcontribs)

So, a rough proposal for how to proceed here:

  1. We add fresh clones and at least basic CI on gitlab-test for:
    1. mediawiki/core
    2. some appropriate extension
  2. Interested parties use these to review a handful of real changes and experiment with workflows

It seems like there are a couple of broadly viable workflows:

  1. Developer pushes to a branch on the primary copy of the repo which then becomes a merge request
  2. Developer forks the primary copy of the repo to their account, pushes to a branch on their fork, and creates a merge request from there

In practice #1 might be the default for known, trusted contributors who have write access to the repo (the equivalent of +2 rights), while #2 might be the path for new volunteer contributions. (More fine-grained distinctions are possible with roles, but I haven't explored how that works in depth yet.)
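
A rough command-line sketch of the two paths (the repository URL and branch names here are illustrative; merge_request.create is GitLab's push option for opening an MR as part of a push):

# Workflow 1: push a work branch straight to the primary repo; the
# push option opens the merge request in the same step.
$ git checkout -b work/my-feature
$ git push -o merge_request.create origin work/my-feature

# Workflow 2: fork first, push the branch to the fork, then open the
# merge request from the fork (here "myfork" is a hypothetical remote).
$ git remote add myfork git@gitlab-test.wmcloud.org:myuser/mediawiki-core.git
$ git push myfork work/my-feature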

Within those broad approaches, we've identified some unanswered questions:

  • How should we configure merge commits?
    • Merge commit for every merged MR
    • Merge commit with semi-linear history
    • Fast-forward merge, no merge commits, rebase required if no FF is possible
  • Should multi-commit MRs be squashed into a single commit by default?
    • Should this be mandated or optional?
  • Matters of culture/convention
    • How granular should MRs be in general?
    • There's a feature that allows pushing changes to an MR if you're allowed to merge, even if it's on another user's fork - echoing the ability in Gerrit to update someone else's patchset. Worth enabling by default?

I can imagine most of those answers varying by project/team, but they seem like the sort of thing that's worth exploring on the test instance.

We got a basic CI pipeline with Quibble working for mediawiki/core earlier. If this seems like a fruitful way to proceed, we can start with a clean copy of the repo and do the same for an extension.

Thoughts? Volunteers to work through a few changes on gitlab-test? What would make a good extension to use for this exercise?

Greg (WMF) (talkcontribs)

Thanks Brennen.

Given this is an important part of the conversation and we want people to engage and test out and help us determine how we can make GitLab workflows work best for us, I've cross-posted this on the old talk page's "feature requests" thread.

BBearnes (WMF) (talkcontribs)

We've started roughing out some workflow documentation at GitLab consultation/Workflows. At the moment it's much more basic than I'd like, I think partly because it's hard to make preemptive recommendations about optimal workflows for people coming from Gerrit without more experimentation.

Reply to "Try before you buy?"

Workflow report: force-push

Lucas Werkmeister (WMDE) (talkcontribs)

I tried to replicate a Gerrit-like workflow, where you force-push new versions of a single-commit change, in mediawiki/core!5 on the test instance. Since the test instance might go away, I’ll copy/summarize my experience here. (Also, I didn’t yet know how comment threading in GitLab works, so you can watch me figure that out in real time over there ^^)

I like that there is a “compare with previous version” link, like a PS(n-1)..PS(n) diff in Gerrit, which is something that GitHub doesn’t have, as far as I’m aware. To me, this means that a force-push-based workflow is at least not completely unfeasible.

That said:

  • I don’t see any way in the UI to compare a larger range of pushes, say PS2..PS5 (supposing that PS2 was the last patch set you looked at). In fact, so far I haven’t even found a way to do this by hand-editing the URL.
  • The commit message is not considered part of the commit (unlike the COMMIT_MSG “file” in Gerrit), so the second force-push, where I removed a Change-Id from the commit message, shows up as “no changes”.

Overall, my impression is that force-pushing is doable, but likely not the best way to go, or not how GitLab “wants” you to do it. I’d like to try out other workflows as well, e. g. a “squash everything on merge” one.
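
For concreteness, the Gerrit-like loop tried here looks roughly like this from the command line (branch name illustrative):

# Amend the single commit, as you would for a new Gerrit patchset...
$ git commit --amend
# ...then rewrite the MR's source branch; GitLab records this as a new
# "version" of the merge request, comparable with the previous one.
$ git push --force origin work/my-change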

Adamw (talkcontribs)

This is a helpful investigation, good idea!

My understanding is that rewriting git history is not recommended anywhere outside of the Gerrit workflow. Of course, rewriting is fine locally but as soon as the changes are pushed to a public repo and potentially downloaded by others, rewriting causes problems. I agree that if we were to switch to Gitlab, we should also change our workflows to the natively supported ones rather than try to make our old workflow fit the new system.

Skizzerz (talkcontribs)

Pushing each revision as a new commit, while squashing everything on merge, is a replication of the Gerrit workflow:

  • Gerrit patch sets = GitLab commits
  • Merging a Gerrit patch = Squash merging a GitLab pull request

In both cases, you have the ability to review across "patch sets." No force-pushing is needed, and in my opinion it's a good thing to get rid of it, as rewriting history is generally a Bad Idea™ in git, and the requirement to force-push is one of the primary reasons I hate the Gerrit workflow.
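
A minimal sketch of that loop, assuming the project has squash-on-merge enabled (branch name illustrative):

# Address review comments as a new commit instead of amending...
$ git commit -a -m "Address review: rename helper"
# ...and push normally; no force-push is needed.
$ git push origin work/my-change
# On merge, the squash option collapses the review commits into a single
# commit on the target branch, much like a merged Gerrit change.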

Nikerabbit (talkcontribs)

I have never force-pushed with Gerrit. What are you talking about?

Adamw (talkcontribs)

Force-push is the Gitlab equivalent of pushing amended patches for review in Gerrit, but yes, to be most accurate we can call the Gerrit workflow "rewriting history".

Nikerabbit (talkcontribs)

Even the merge request workflow is rewriting the history if you use the squash or rebase option.

TCipriani (WMF) (talkcontribs)

The comment that the commit message is not a part of the merge-request is a good note.

I don’t see any way in the UI to compare a larger range of pushes, say PS2..PS5 (supposing that PS2 was the last patch set you looked at). In fact, so far I haven’t even found a way to do this by hand-editing the URL.

FWIW, I see that the changes tab has version dropdowns: https://gitlab-test.wmcloud.org/mediawiki/core/-/merge_requests/5/diffs?diff_id=19&start_sha=8c643bedd7f9e191c8041d3eee682075b4803040

Another example is https://gitlab-test.wmcloud.org/release-engineering/tools-release/-/merge_requests/2/diffs?diff_id=11 where I added some code review to the mix.

I liked that the comparison between mainline and each revision retained comments on specific code lines even after a force-push.

Nikerabbit (talkcontribs)

For me as a developer, this workflow question is the most important one.

I've been playing around with the GitLab test instance today. I really like it... up until the point where I tried to get actual development done.

It is not practical to properly chunk out big changes for review with GitLab without history rewriting (i.e. force-push). My requirements are:

  • chaining dependent changes together
  • independent iteration of those changes based on code review
  • approving and preferably merging changes independently

Gerrit deals with the problem by having append-only patchset versions and a nice way to link them together (no force-push needed).

GitLab seems to primarily support doing it at merge time. The UI has very limited support for it (only squash-and-rebase or merge commit); the full power of git rebase -i is needed. If we only allow history rewriting at merge time, we have multiple issues: first, the person doing the merge now has the burden of rewriting the history, which in Gerrit is done by the author (except for trivial rebases); second, they must be able to push directly to the target branch (which would likely be disabled for the master branch); and third, the worst thing in my opinion, the result would not be reviewed. All of this goes away if we allow history rewriting and force-push updates to the merge request (with sufficient UI support so that reviewers can do their job).

Things get even worse if we also want to merge the changes independently. This means they have to be split into multiple merge requests. I found no support for rebasing merge request chains. In my testing you either have to work backwards and merge the latest merge request into the earlier one first (frankly, nobody does that), or do a manual force-push for each merge request. Gerrit provides a handy button that does this for most of the simple cases. When you merge the first merge request from a chain in GitLab, the UI gives up. You can change the merge target manually (away from the now [hopefully] non-existing target branch) in the UI, but then, if you squashed or rebased the first merge request, it forces you to do a manual rebase and force-push to update the subsequent merge requests.
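
For concreteness, the manual rebase-and-force-push step after the first merge request in a chain is squash-merged might look like this (branch names illustrative):

# Chain: branch-a -> branch-b; branch-a was just squash-merged to master.
$ git fetch origin
# Replay only branch-b's own commits onto the new master...
$ git rebase --onto origin/master branch-a branch-b
# ...then update the second merge request (and retarget it to master in the UI).
$ git push --force origin branch-b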

GitLab seems like a good solution for work that produces reasonably sized merge requests. However, the suitability of GitLab for me depends on its support for non-trivial history rewriting, which is required to develop and review work that consists of chained changes, and the only way to do that in GitLab is to force-push to a merge request. So the suitability of GitLab essentially hinges on its support for a force-push type of workflow.

Lack of good support for such a workflow would, in my opinion, lead to:

  • messy histories with irrelevant and untestable commits, and/or
  • slower development and code review burnout due to difficult-to-review mega-changes, and/or
  • slower development due to having to wait for author or code-reviewer to complete previous steps before being able to proceed.
Nikerabbit (talkcontribs)

As an update to my previous comment, the force-push workflow does look promising to me. Discussions and comments do not disappear as if they did not exist – they are just marked that they are based on an outdated version of the merge request. I kind of even like this more than Gerrit, where, if you need to iterate, it's unclear whether to continue the discussion in the old patchset, or start a new one in a new patchset. In GitLab you would logically continue the existing discussion until it is marked as resolved. There is even an option to require resolving all discussions before merging. Also the navigation through issues is better than in Gerrit, imho.

As pointed out above, there is a UI to compare different versions of merge requests. I didn't use it much, so it is not clear to me whether it is as intuitive and powerful as Gerrit's version. For example, I am not sure yet whether it shows which differences are due to a rebase. I need to explore this more.

I currently think that I could adapt my workflow to GitLab, but I would really miss one feature from Gerrit: the rebase button with an option to choose a target. Without it, working on chained merge requests is cumbersome.

Tim Starling (talkcontribs)

I prefer chronological work branches. I think "commit --amend" is an awkward workflow because it leaves you without a local record of your change history. Storing a history of changes seems like a core feature of a VCS, but Gerrit seeks to emulate this with server-side comparison across patchsets. So I'd rather see forced pushes used only for rebases and niche applications rather than importing the complete Gerrit workflow into GitLab.

It's true that instead of having two levels of prospective change management (patchsets and topic branches) we will have one: potentially enormous merge requests of feature branches. I don't think that's infeasible. Simplifying the model in this way is what developers were asking for in the developer satisfaction survey.

AKosiaris (WMF) (talkcontribs)

For what it's worth, I've mostly given up on chronological work branches. I just commit early and often and then very liberally and aggressively edit git history (mostly with interactive git rebase and heavy use of squash/fixup/amend) to produce a series of commits that makes sense (to me at least) as logical units. Then I push for review.
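
A minimal sketch of that cleanup pass, assuming the mainline branch is called main and using GitLab's merge_request.create push option (branch name illustrative):

# Commit early and often while working...
$ git commit -a -m "wip: rough cut"
$ git commit -a -m "wip: more"
# ...then rewrite the series into logical units before review
# (mark commits as squash/fixup, reorder, and reword in the editor).
$ git rebase -i origin/main
$ git push -o merge_request.create origin HEAD:work/my-feature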


One thing I would need force-push for is if I realized I had some typo somewhere after creating the MR, or realized I'd made some bad decision when grouping the commits. Not the end of the world if I couldn't do it, but it would be nice to be able to fix these nonetheless.


As far as topic branches go, I think that Gerrit topic branches can be in some cases (not sure how many) replaced by gitlab labels (at least I managed to do so in my limited test and for the workflow I was testing).

SSastry (WMF) (talkcontribs)

Pre-Gerrit, I was strongly averse to the use of --amend and non-chronological commit history, but gerrit effectively changed my habit on that (once you start using --amend to fix problems based on review, you start using rebase more heavily if you have a chain of gerrit patches, and reordering commits now feels normal and natural).

So, in the gerrit workflow, it makes sense, but I imagine once we transition to gitlab, we are more likely to move back to chronological commits (assuming a chain of commits in a pull-request are squashed pre-merge). But, anyway, I imagine we'll have to reconfigure our workflow habits in a gitlab world.

Reply to "Workflow report: force-push"

Packaging all changes under one commit

Sohom data (talkcontribs)

One thing that I've really liked about Gerrit is the git commit --amend && git review -R workflow, where even though I've made multiple changes based on comments, I can look back at a particular commit and identify exactly what was changed and which files were affected, without even having to get out of my current context (command line, Gitiles, GitHub). The PR workflow, on the other hand, allows (and in fact encourages) users to make multiple commits for one PR, where one change can get spread over multiple commits, some of them affecting other files as the code review progresses (and/or as the user realizes mistakes they have made). This can make it difficult to isolate particular changes, since I don't think we will have much context as to which PR a particular commit relates to outside the Gitlab instance.

Adamw (talkcontribs)

Please see the "squash merge" comments elsewhere on this page; I think it offers a compelling substitute. At the same time, it makes it easier to follow the evolution of a larger patch by looking at the original feature branch.

Nikerabbit (talkcontribs)

Aren't feature branches usually deleted after they are merged?

BBearnes (WMF) (talkcontribs)

Feature branches are typically deleted after a merge (there's a checkbox for this on the merge request in GitLab), but their history is still viewable on the merge request itself. See here for an example.

Nikerabbit (talkcontribs)

That's only visible in GitLab itself, right? If so, that's a less than ideal solution for the case of doing squash merge on big changes. For example, you cannot git bisect them.

CAlbon (WMF) (talkcontribs)

Squash on merge is wonderful. You lose a little detail but you definitely get a similar behavior. Without squash on merge I would have thrown my computer out the window many times looking for a particular feature change.

Reply to "Packaging all changes under one commit"

Lowering the barrier to contribution is part of the mission

CAlbon (WMF) (talkcontribs)

I believe reducing barriers to contribution (which Gerrit raises, despite its great parts) is worth any potential productivity impact. Building an industry-best-practice replacement for ORES would be a lot faster if we used closed-source industry tooling like AWS or SaaS model-management systems. However, we don't, because how we build things is itself a reflection of our mission: open-source tools, because we believe in open knowledge.

Similarly, reducing the barriers to volunteer contribution is also a reflection of our mission. It might reduce productivity, but I believe that is just the price we pay to make Wikipedia as open (to contributors, to lurkers, to students, etc) as possible.

EMcnaughton (WMF) (talkcontribs)

I think it's probably variable whether switching from gerrit to gitlab would have a productivity cost. I see gerrit as reducing my productivity whereas your comment implies others see it as increasing theirs. (I currently use github, gitlab and gerrit on a daily basis)

ESanders (WMF) (talkcontribs)

Indeed, we have to balance the cost of anything we do here against the effect it has on staff productivity. We wouldn't, for example, tell every member of staff to spend 50% of their time on volunteer outreach.

Mutante (talkcontribs)

Switching from gerrit to (any other system) will definitely have a high productivity cost (at first). Any change of an existing UI will slow down development overall while users have to spend time learning a new system. The question is just how long that slow-down lasts and at which point the advantages of a new system start to outweigh that impact.

ESanders (WMF) (talkcontribs)

The cost is not just learning a new UI (many will already be comfortable with the GitHub/GitLab UIs); it's a different workflow. See Topic:Vpbonspnaaafjdwq.

Sumanah (talkcontribs)

Indeed - I second the general point that easing the barrier to entry for new code contributors is important and is part of the mission.


(I came across this consultation via the fediverse and wave hello and best wishes to my past colleagues! (I used to work at Wikimedia Foundation, and was one of the people who helped with the Subversion-to-Git transition.))

Abbasidaniyal (talkcontribs)

I was an Outreachy intern in round 20, and I would like to share my experience with Gerrit.

Initially the project (gdrive-to-commons) for which I was selected was hosted on Github. My first task was to migrate it to Gerrit. I had no prior experience with Gerrit and the whole UI and workflow seemed to be very strange initially.

However, when I went through the Gerrit documentation and followed the tutorial, it took me just a couple of hours to understand the workflow and get accustomed to the UI.

After using Gerrit continuously for 3 months, I have developed a fondness for the platform. I really like the code review procedure; in many ways, I even find it superior to GitHub. For example, unless commits are squashed before merging, it is a mammoth task to trace back the changes made on a GitHub repository. The patch-set-based review system seems really intuitive. Every patch set has comments which help in understanding what change was made and why. I also believe that since there is no concept of forking the repository, it saves a lot of storage, and that makes a lot of sense to me.

Personally, I would like WMF to continue with Gerrit. I don't think Gerrit creates a barrier to volunteer contributions at all.

Reply to "Lowering the barrier to contribution is part of the mission"

Zuul

Kizule (talkcontribs)

Gerrit is much less complicated for me, but Gerrit will be replaced with GitLab even though many people (I believe) don't think this should be done. Now, will we still use Zuul/Jenkins for CI tests, given that GitLab has this feature built in?

Jdforrester (WMF) (talkcontribs)

A big part of the motivation in moving to GitLab is to replace the current Zuul/Jenkins infrastructure with GitLab CI, yes.

GitLab CI is a really great system, and is one of the things I'm really positive about.

SBassett (WMF) (talkcontribs)

@Jdforrester (WMF) - I'm not certain that assumption is true. Not that GitLab CI is really nice - it is - but that we would be using it within an initial deployment of GitLab.

Jdforrester (WMF) (talkcontribs)

One of the three main grounds listed is

easier setup and self-service of Continuous Integration configuration

… so I assumed that was still the case, but fair point. It indeed later says:

For the avoidance of doubt, continuous integration (CI) and task/project tracking are out of scope of this evaluation.

Greg (WMF) (talkcontribs)

To reduce the cost (time) of migration, we do not plan to write CI-glue-ware between GitLab and Zuul/Jenkins and instead use GitLab's built in CI for any repositories moving to GitLab for code review.

Kizule (talkcontribs)

Thanks for the responses! I've been playing with GitLab on the test instance, and I've honestly changed my mind: I agree that this should be done, and it's great that most things will be in one place.

Reply to "Zuul"

The Heffalump in the room

Amire80 (talkcontribs)

Are Gerrit's interface usability deficits the real reason for the dissatisfaction with Gerrit?

There is something about Gerrit that I haven't seen discussed explicitly here, and I'd like to mention: it's the alienation between Gerrit and the wikis.

For most wiki editors, their home is wiki pages. Although it's managed by the same foundation and has "wikimedia" in its URL, Gerrit is a different website. It has a completely different interface from the wikis, even before we discuss whether this interface is good or bad. It requires a separate username, and to send even one line of code, you need to set up SSH keys, install, configure, and run lots of command line tools, clone the repo, and then, if all of that wasn't difficult enough, you have to go through the worst part: code review, which is often infinite.

And that's why many editors on Wikimedia wikis don't even want to hear about Gerrit: it's not a wiki. At Wikimania 2011, there was a talk about The Site Architecture You Can Edit, and as far as I can recall, Gerrit is the thing that was presented as the solution. Did it actually allow a site architecture that anyone can edit? Not really. It probably made it more accessible, but not really for everyone. On a wiki, editors can publish a template, a module, or a gadget, and it gets deployed immediately. Sure, they can break stuff along the way, and they can get reverted, but it's incomparably faster and more familiar than going through Gerrit.

By itself, migration to GitLab doesn't solve any of that.

And sure, GitLab is more popular, and it looks much more similar to GitHub, which is the most popular thing, so it will probably be more welcoming to experienced developers. It's also quite possible that GitLab is easier to integrate with CI tools. The problem is that these are things that interest people who are already experienced with the Git and deployment work, but by themselves they won't make it much easier for members of the editing community to update the code or the server configuration, and the gap will remain, with content editors and developers of on-wiki templates, modules, and gadgets on one side, and developers of core, extensions, and site configuration on the other side. There are some people who participate on both sides, but not enough.

Moving all PHP and JavaScript code to being stored on wiki pages is probably not the solution, and I'm not proposing it. Integrating version control with wiki pages, as we had in the SVN era, is probably not so important either. But changing the deployment policies and allowing the deployment of at least some code much more quickly and with less strict code review, or with no code review at all, can go a long way to making Wikimedia projects editors feel that it's worth the effort to learn these tools and commit code. Learning Git and everything else and then making a code commit that will get forever stuck in code review is not very attractive.

This change will probably have to be more social and administrative than technical. Can such a change happen along with the move to GitLab?

AKlapper (WMF) (talkcontribs)

Hi @Amire80, thanks for joining the conversation.

I guess every tool and workflow has its pros and cons, and it's unfortunately a bit unclear to me what you might propose.

Personally speaking, trying to lower the learning curve and obstacles is something I'm interested in. We already have quite a few areas (Toolforge, Pywikibot scripts, modules, templates, gadgets, user scripts) in which folks can immediately see results, with the costs being no code review and no quality assurance.

Regarding the current situation, https://meta.wikimedia.org/wiki/Small_wiki_toolkits intends to provide all info needed if you are technically interested and want to help your communities plus broaden your skills. If you spot something missing, please join the discussion there.

Editing wiki pages is for sure "incomparably faster", however being fast also needs to be in balance with the goal of providing a stable working website. As you already wrote, some editors can edit a gadget or site JS and the immediate deployment "can break stuff along the way". Numerous times, default on-wiki code has loaded external stuff, thereby violating user privacy. There are a number of tickets in Phabricator about broken gadgets: folks copied code (instead of loading it) from some other wiki many years ago, the code rotted and broke at some point, and folks don't know how to fix it. When I last checked in June 2019 in phab:P8687, out of the 509 public wikis which had gadgets and/or a page MediaWiki:Common.js, 242 wikis had zero people editing these files in the last 12 months. (Random examples: fa.wikiquote.org had 56 gadgets and zero editors on them. ur.wikipedia.org had 235 gadgets and four editors on them.) What I want to say here is probably: making it easier for folks to deploy more and more code, while maintenance may lapse at some point, doesn't solve a problem in the long run, to the contrary. A lower barrier for deploying code might solve some problems but also creates new ones.

https://integration.wikimedia.org/ci/job/audit-resources/ would also list dozens of broken MediaWiki:Common.js files across our wikis if that job wasn't currently broken. (For background, see phab:T71519.)

Tickets like phab:T71445 were created to discuss a code-review process for MediaWiki JS/CSS pages on Wikimedia sites. In my very personal understanding such things need to happen first before thinking in potential "on-wiki vs Git repos" terms (if I understood your post correctly, I'm not sure).

As you mentioned "allowing the deployment of at least some code much more quickly and with less strict code review" I'm wondering if you have some thoughts or changes in mind about wikitech:Backport windows.

Also, could you provide a link to a specific example of a code change which you think could have taken place with "no code review at all"?

Amire80 (talkcontribs)

Let's start with the gadgets that aren't edited by anybody: They aren't edited by anybody for one of the following two reasons:

  1. Because they are stable and don't need any edits.
  2. Because on that wiki there is nobody who has the skills to edit them.

If it's number two, this means that they were copied from another wiki. Let's follow your example, and think of MediaWiki:Gadget-HotCat.js in fa.wikiquote: Somebody at some point thought that fa.wikiquote needs that gadget, and copied it. Why did HotCat have to be copied, while RevisionSlider didn't have to be copied, and works everywhere? Because HotCat is a gadget and RevisionSlider is an extension. No one "edits" RevisionSlider on any wiki, neither fa.wikiquote nor de.wikipedia, because it's an extension. There's nothing fundamentally different between RevisionSlider and HotCat—both provide some extra functionality to users. They are different only in how their code is stored and deployed.

If we had a global repository of gadgets, HotCat could be stored globally, and automatically deployed to fa.wikiquote. No one would have to edit the gadget on fa.wikiquote.

But you know what's the craziest part? In practice, we already have a global repository of gadgets. Gadgets can be stored on another wiki, e.g. Commons, and loaded using mw.loader.load() and a URL, as it is already done with HotCat and a few other gadgets. So, HotCat is a huge gadget, with thousands of lines of code and tens of thousands of users on the English Wikipedia alone, and many thousands more in other wikis. Technically, any Commons admin, of which there are more than 200, can edit it without any more code review. I heard that in the particular case of HotCat there's a convention that every edit is supposed to be checked by someone, but this is a social convention and not a technical constraint. And for some gadgets there is no such convention, and they can just be edited. And there are even some user scripts that aren't stored in the MediaWiki space, so anyone can edit them, and not just admins.

Why is HotCat a gadget and not an extension? No particular reason. Just history. I asked some people involved in its development, and they all said that it can be converted to an extension, and no one bothered to do it. And probably no one really feels that it's necessary, because the current way to develop it, as a wiki page, is good enough. Besides, converting it to an extension has some disadvantages: code review will get harder and changes that do pass code review will be deployed after a week and not immediately.

You also asked about Backport windows (formerly known as SWAT). Backport windows are not fun. You have to be at a certain (online) place at a certain time, connect to IRC (IRC!!!!!! In 2020!!!!!!), wait for your turn, and then wait a few minutes more for scap and all that stuff to run (at least that's how it was a few months ago, last time I did it; maybe things changed since then). It's far, far more difficult than editing a wiki page and pushing "Publish".

Does bad code sneak into gadgets sometimes? Yes, it does. If it was intentional and malicious, the user who did it is usually desysopped or blocked. But bad things can also sneak into code on Gerrit. Does it happen much more often with gadgets than with Gerrit? I honestly don't know, I didn't count. But consider this: reverting bad code in gadgets is actually much easier and faster than reverting code that is deployed through Gerrit—yet again, you just edit a wiki page and push "Publish". No code review, no scheduling a backport window, no waiting for scap. In any case, what's really important is that the possibility of breaking things shouldn't be the top reason to stop every conversation that compares the easiness of deploying code immediately to the difficulty of going through long code review and waiting for several days to deploy.

There is a lot of code that could be deployed immediately after merging. Dozens of extensions could be like that. Just looking at Special:Version: CodeEditor, WikiEditor, Score, SyntaxHighlight, WikiHiero, TimedMediaHandler, WikimediaMessages, and many others. Some of them are more complex than HotCat, but not by orders of magnitude. I can even imagine some parts of more complex extensions, such as VisualEditor, ContentTranslation, or Echo, that could be deployed more quickly.

So, my particular proposals:

  • Increase the deployment train frequency from once a week to once a day or even more often.
  • Change some repositories' configuration so that their code would be deployed immediately after merging, without waiting for train or backport window.
  • Give merge rights to many more people who are trusted editors on Wikimedia projects, at least on some repositories.
  • Allow self-merging in some repos (both technically and socially).

Most of these things will require some development work, but for all of them, someone first needs to decide that they are desirable. I mean, someone decided about eight years ago that code will be deployed every week; is that decision still good in 2020? Who can decide that it should be more frequent?

And most of these things can probably be done with Gerrit, without waiting for the migration to GitLab. As I mentioned already, I totally realize that these thoughts are not really about the migration from Gerrit to GitLab, but about something else. They are about rules and power structures. However, I am bringing them up here because I want to draw everyone's attention to the notion that by itself the migration to GitLab will at most resolve some technical issues that the CI people have, and it won't resolve the bigger issues that the Wikimedia editors' community has with the way we manage and deploy source code.

Reply to "The Heffalump in the room"

Integration: Merge requests and patchsets

TCipriani (WMF) (talkcontribs)

One aspect of a migration to GitLab that has been touched on in other discussions is that of integration.

Integration is the process of combining a new piece of code into a mainline codebase. Our mainline codebase under Gerrit and, presumably, under GitLab will be a mainline branch: a shared sequence of commits that records changes.

The mechanism by which code is integrated under Gerrit is a patchset. A patchset is a single commit that represents the difference between itself and a target branch; typically the target branch is the mainline branch. The mechanics of combining patchsets with mainline vary by repo, but a patchset may either be merged (creating a merge commit) or fast-forwarded (no merge commit), meaning the patchset is the only commit added to the target branch. In addition to a git commit SHA1, each patchset has a "Change-Id", which is an ID that is searchable in Gerrit and points to a given patchset. Patchsets may be chained: when one commit is the parent of another commit, pushing those commits to Gerrit creates a chain of patchsets. The child patchsets may not be merged independently without the parent patchsets having merged. The mechanism to update a patchset is to push a new commit (or chain of commits) with the same Change-Ids as the patchsets you wish to update to the special refs/for/[target branch] reference.

The mechanism by which code is integrated under GitLab is the merge request. Each merge request consists of a source branch and a destination branch. The source branch contains one or more commits not contained in the destination branch, along with a description of the intent of the merge request. The mechanics of combining merge requests are a combination of the repository settings and the merge-request settings. A merge request may either be squashed (creating a single commit) or each commit may be left separate. Additionally, a merge request may be combined by merging (creating a merge commit) or fast-forwarded: adding the commits in the merge request to the tip of the mainline branch without creating a merge commit. The mechanism to update a merge request is to push a new commit to the source branch or to --force push to the source branch. Generally, force-pushing a source branch is not recommended, as review discussion may become confusing.
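
Side by side, the two update mechanisms described above look roughly like this (remote and branch names illustrative):

# Gerrit: push to the magic ref; the Change-Id footer routes each commit
# to its existing change as a new patchset.
$ git push origin HEAD:refs/for/master

# GitLab: push new commits to the MR's source branch, or force-push it
# after rewriting history.
$ git push origin work/my-feature
$ git push --force origin work/my-feature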

The branching pattern most frequently used with merge-requests is feature branching; that is, putting all work for a particular feature into a branch and integrating that branch with the mainline branch when the feature is complete.

The branching pattern most frequently used with patchsets is what Martin Fowler has termed continuous integration with reviewed commits. That is, there is no expectation that a patchset implements a full feature before integrating it with the mainline branch, only that it is healthy and has had its commits reviewed.

The branching pattern is not necessarily tightly coupled with the tooling, for example, a merge-request could be created with a single commit that itself does not implement an entire feature: this is a merge-request that is not using feature branching. Each tool does, however, encourage using their respective branching mechanisms.

There are various aspects to explore here:

  1. Workflow/branching pattern changes
  2. Review changes
  3. Integration frequency changes
  4. Necessary tooling changes
TCipriani (WMF) (talkcontribs)

Per working group discussion I've added a few definitions from this topic to the main page.

GLederrey (WMF) (talkcontribs)

For context, the previous discussion in Topic:Vpbonspnaaafjdwq is probably relevant here. It describes some of the current use cases about strings of patches and potential ways of having similar workflows in Gitlab (but it looks like currently there isn't an obvious way to implement a similar workflow).

Adamw (talkcontribs)

The way I've seen this work in Gitlab and Github is that CI tooling will automatically check the head of the branch. You can choose to test the head with or without a rebase. If three patches are submitted at once, only the final patch is tested. If a new patch is submitted as a test is running, that test is canceled and the new patch is tested.

Testing can also be configured to gate the branch merge to master.

The norm is to squash all patches on a branch anyway, but TCipriani's question highlights that we might need to *enforce* squashing, otherwise we could end up with non-passing commits and potentially someone might roll back to an inconsistent point. But maybe this is already handled by a simple merge commit, which makes it clear that the intermediate patches are *not* on master.

BBearnes (WMF) (talkcontribs)

The norm is to squash all patches on a branch anyway, but TCipriani's question highlights that we might need to *enforce* squashing, otherwise we could end up with non-passing commits and potentially someone might roll back to an inconsistent point. But maybe this is already handled by a simple merge commit, which makes it clear that the intermediate patches are *not* on master.

This emphasizes a really important point here: Should every commit to the master / main branch represent a known-good, deployable state (as far as we're capable of achieving that)? We do our best to achieve that currently, at least on repos like core, which does seem like it militates in favor of squashing commits by default.

EBernhardson (WMF) (talkcontribs)

To me squashing depends on what those patches were. If the patch chain is the following, it should probably be squashed:

Primary feature implementation -> typo in Foo.php -> correct comments in Bar

If the patch chain is the following, it needs to not be squashed because this is a useful separation point for future bisecting:

Refactor to allow implementation -> Implement feature

Jdforrester (WMF) (talkcontribs)

Unfortunately, in my experience, in the real world in systems where force-rewrite of open PRs isn't available (most FLOSS GitHub and GitLab repos), people end up mushing multiple feature commits and fixups into the same chain.

A 'simple' example, with committer shown in square brackets ahead of each commit:

[A] First feature > [B] typo fix > [B] addition of test suite > [C] Second, dependent feature > [A] failing expansion and modification of test suite based on feedback from the second feature > [C] fix of first feature

Squashing this stack is pretty bad, throwing away the separation of the two main features, and the authorship blame. Not squashing this stack is also pretty bad, littering the history with nonsense commits, making git bisect vastly harder, and creating toxic, test-failing branch points.

Theoretically you can dump the MR, git rebase -i the stack to make history "sensible" with just two commits, and then re-push it as a pair of MRs (one with the first feature commit, the other with the first and second), the second of which screams "DO NOT MERGE UNTIL YOU MERGE MR X FIRST!" manually, but this loses the history of the discussion on what's best to do, still loses the kudos/blame of some of the contributors, and is an epic extra piece of work.

Of course, GitLab could be extended (either by us or upstream) to add features to manage this (turning the 'squash' button into a complex form to let the merge select arbitrary squash/fixup/rebase actions on a per-commit basis), but that's a huge undertaking, taking GitLab quite far away from the simple branch model it's built around so upstream may well not be keen, and said code has to be written and supported by someone.


This workflow is one that I personally do up to a few times a day, pretty much every day. It's the core of MW's development model. I know that a few areas of our codebase don't use this model and don't have the complexity of multi-feature inter-relation development, but they're the simple exceptions, and it feels like we're focussing on toy development rather than the main stream of our work in all the analysis. It's not an "oh well" to lose it, it's going to be pretty seriously disruptive.

EBernhardson (WMF) (talkcontribs)

I haven't run into the issue of force-rewrite on open PRs being disabled, but indeed that would make my current experiments with finding a reasonable workflow completely useless. If the only option in a PR is to continually add more code that will be squashed into a single patch, I worry the development history and general experience of performing code review is going to suffer for anything of complexity.

Adamw (talkcontribs)

If the patch chain is the following, it needs to not be squashed because this is a useful separation point for future bisecting: Refactor to allow implementation -> Implement feature

Good point, in this case with a squash workflow the feature would have to be split into two branches.

EBernhardson (WMF) (talkcontribs)

How does that work though? As far as I can tell gitlab has no affordance to split a PR into two branches. If branch A is the refactor, and branch B is the new feature, then as far as gitlab is concerned a PR on B is a PR for A+B and it can be merged without consideration of the A PR.

TCipriani (WMF) (talkcontribs)

There is a feature in the premium version, merge request dependencies, which is what is needed here.

I'm not entirely satisfied with any other mechanisms (aside from merge request dependencies) for splitting merge-requests and having them depend on one another. The "smartest" thing I could think to do is to have dependent merge requests target other merge-request branches. For example, I split !4 into !4 and !5. In !5 I targeted the master branch and in !4 I targeted the work/thcipriani/beautiful-soup-dependency branch (the branch from !5). After merging !4 the merge showed up in !5 rather than in master where it could cause issues. I suppose that's desirable in terms of behavior, but there are a few problems with this:

  1. History becomes messy. Maybe this could have been avoided had I used some other options in merging.
  2. It's non-obvious that it's not merged to master
  3. I wasn't prevented from merging the dependent patchset, it merely mitigated any risk of merging it

With the general advice on getting speedy code review being to split your patchsets it'd be nice to have this be a more supported path. It's noteworthy that there are many open issues about creating a merge-request splitting tool.

Adamw (talkcontribs)

We're just talking about the gitlab UI, I think? From the commandline, let's say you have patches (1, 2, 3, 4) that make up a branch "A", and you want to split (1, 2) into its own merge request. To do that, check out patch 2 then "git branch <new name>" or "git checkout -b", and push that.

Agreed that stacking merge requests can get tricky--but you can usually get the desired effect by carefully choosing the merge target for your PR. If I have branches A and B stacked on each other, then A will be merged to master but B will be "merged" to A. This prevents the UI from showing all of the branch A commits as if they were part of B.
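
A sketch of that splitting step, with placeholder names:

# Branch A holds patches 1-4; carve patches 1-2 out into their own MR.
$ git checkout -b A-part1 <sha-of-patch-2>   # new branch ending at patch 2
$ git push origin A-part1                    # MR: A-part1 -> master
$ git push origin A                          # MR: A -> A-part1 (set the target in the UI)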

AKosiaris (WMF) (talkcontribs)

Let me add a workflow that SRE uses in gerrit and that I believe is pertinent to the integration topic.


An SRE pushes a topic branch in the puppet repo. Every single one of the commits in the topic branch needs to be merged and deployed individually, after having been reviewed (hopefully) and +2ed by CI. Rebasing might be needed, but that's also expected in the current workflow. The reason is that every single one of those commits has state-changing consequences for at least part of the server fleet, and the SRE in question is expected to merge it, "deploy" it, and perhaps even trigger multiple puppet runs (alternatively, they can wait the full 30 minutes it currently takes for puppet changes to be reliably distributed to the entire fleet).


The most recent example I can think of is https://gerrit.wikimedia.org/r/q/topic:%22623773%22+(status:open%20OR%20status:merged).


How will SRE have to adapt that workflow for gitlab? Create a separate MR per change? Using a single MR clearly doesn't cut it (right?), but on the other hand having to go through the process of manually creating 4 or 5 MRs for something that is automatic in Gerrit isn't great either.

TCipriani (WMF) (talkcontribs)

I made a concrete example of this on our gitlab-test instance.

Of Note

  • I used merge request labels in the place of topics
  • This is a series of patchsets, but they have no semantic relationship to one-another
  • My interaction with this repo was purely through the git client and no other programs

From my side the steps were:

  1. Create my work locally as a series of commits
  2. Use push options to make a merge-request for each patchset

This looked like:

$ echo '5' > COUNTDOWN
$ git commit -a -m 'Start countdown (1/5)'
$ echo '4' > COUNTDOWN
$ git commit -a -m 'Decrement countdown (2/5)'
...
$ git push \
  -o merge_request.create \
  -o merge_request.target=production \
  -o merge_request.remove_source_branch \
  -o merge_request.title="COUNTDOWN (1/5)" \
  -o merge_request.label='T1234' \
  gitlab-test \
  HEAD~4:work/thcipriani/T1234

Enumerating objects: 4, done.
Counting objects: 100% (4/4), done.
Delta compression using up to 4 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 327 bytes | 327.00 KiB/s, done.
Total 3 (delta 1), reused 0 (delta 0), pack-reused 0
remote:
remote: ========================================================================
remote:
remote:     A test instance of GitLab for the Wikimedia technical community.
remote:   Data may disappear. Consider everything here potentially public, and
remote:                     do not push sensitive material.
remote:
remote: ========================================================================
remote:
remote: View merge request for work/thcipriani/T1234:
remote:   https://gitlab-test.wmcloud.org/thcipriani/operations-puppet/-/merge_requests/1
remote:
To gitlab-test.wmcloud.org:thcipriani/operations-puppet.git
 * [new branch]            HEAD~4 -> work/thcipriani/T1234

As is already mentioned, this could be wrapped in a friendlier interface.

AKosiaris (WMF) (talkcontribs)

I've iterated a bit on your take, mostly a for loop to go through all the changes. What I got is at https://gitlab-test.wmcloud.org/akosiarisgroup/simplegroupstuff/-/merge_requests?label_name%5B%5D=T42


$ for i in 5 4 3 2 1 ; do \
    git push -o merge_request.create \
      -o merge_request.target=main \
      -o merge_request.remove_source_branch \
      -o merge_request.title="We are leaving together ($i/5)" \
      -o merge_request.description="And still we stand tall ($i/5)" \
      -o merge_request.label="T42" \
      origin HEAD~$((i - 1)):refs/heads/Europe$((i - 1)) ; done


Couple of notes:

  • The gitlab label for this specific use case seems to supplant the gerrit topic branch adequately
  • CI is run on every MR, which is what we want
  • While some merge_request git push options are static, some others like title and description aren't. A wrapper tool will need to extract them from the git commit messages, I guess. That adds a bit to the complexity of the tool that needs to be written, but it's arguably doable and probably maintainable. It will however be akin to git-review (does anyone cringe already?), albeit only for those who require this workflow
  • The big issue I see is the fact that we don't get dependent MRs in this case. Any of the later MRs can be merged at any point in time by mistake, causing the expected mayhem. And it seems like this is a paid feature and not a Community edition feature per Wikimedia Release Engineering Team/GitLab/Features, which isn't a great sign. The notes in that table say "Important for our workflows, but can be re-implemented as a CI job." Not sure what that means. We'll create CI that checks something, but what exactly? A "Depends-on"? That's opt-in (as in, the user must actively write it), so it will probably not safeguard us much.
Nikerabbit (talkcontribs)

I would imagine that doing a wrapper that emulates `git review` behavior in a way that creates a separate merge request for each commit wouldn't be too hard. The real issue is lack of nice UI in GitLab to automatically rebase merge requests on top of the target branch.

Adamw (talkcontribs)

Gitlab will show you when a merge request is going to conflict with master, and a successful merge includes rebase. Is there a reason we need to explicitly rebase the merge request before merging, or maybe that's just a habit carried over from our Gerrit workflow?

Nikerabbit (talkcontribs)

If the merge requests depend on the previous one, at least the merge base needs to be updated.

If there are conflicts during a rebase, I would like the test pipeline to run again on the rebased merge request before it is merged to the master.

Reply to "Integration: Merge requests and patchsets"

Gerrit workflow

Mainframe98 (talkcontribs)

I prefer the Gerrit workflow, where I can clone a repository, create changes, commit them, and then push the change for review. I don't even have to create a branch! It is a really effective workflow for users who create patches against multiple repositories without being very involved in the development of those repositories. A GitHub workflow (and from what I understand a GitLab workflow too) requires the user to fork the repository, create a branch, and then a merge/pull request in the UI. If they (as I am prone to) forgot to fork first, they'll have to fork, adjust the origin of their local repository, and then go through the other steps. This seems like more work than Gerrit currently requires.

It especially impacts users who do a lot of maintenance work on extensions and skins. I'd rather not have to maintain dozens upon dozens of forks just because I'm trying to deprecate a feature (for which all production usages have to be removed first, across the dozens of repositories used by WMF's MediaWiki).

Mainframe98 (talkcontribs)

If existing developers (perhaps trusted-contributors) are granted such permissions (with master protected), then that would make things easier, especially if, as mentioned, creating a branch from the CLI is an option. It seems a bit wasteful to have multiple branches littering the repository, though, as it makes the distinction between feature/merge-request branches and canonical branches (e.g. release branches, collaboration feature branches) less clear than it is now.

Paladox (talkcontribs)

I concur with @Mainframe98 here that I prefer Gerrit's workflow. I've used gitlab in the past to contribute patches to other projects and I found the UI difficult at first and still kind of find it difficult.

I must admit I do like that gitlab includes a user interface to actually browse the code, but I dislike its workflow. I like Gerrit's workflow because you just create one commit, rather than a merge commit with a ton of different commits because you decided to edit each file using the web UI. Gerrit creates one commit while you use the web UI, so you're not creating lots of commits.

I also like the fact that gerrit shows all recent commits regardless of project on its main page (this doesn't look possible in gitlab?). This allows anyone to go to status:open and either provide input or merge, without needing to know which project to look at.

Users get their own dashboard with Gerrit (I don't think this is possible with gitlab).

Gerrit also provides the command you need to use to amend your commit (if you want to do it locally) or to test it out. It's also easy to create a change (e.g. git push origin HEAD:refs/for/<branch>).

You also don't have to fork in order to contribute, granted that when you use the inline editor it forks automatically for you. But the disadvantage is you have a duplicate repo; the advantage is the user can do whatever they want with the repo. So, for example, for MediaWiki, forking it more than once would use a lot of storage (please do tell me if I got this wrong).

Gerrit makes the life of the reviewer and owner easier by creating a simple user interface. Being able to comment, being able to contribute easily through the interface and also being able to find comments that you found you needed to address but missed it.

Wikimedia has had a great relationship with gerrit upstream. I'm sure if we engage we could get the features/UI changes done that our users request. Gerrit is very extensible via plugins, and when it's not, upstream are quite accommodating about adding the extension/plugin points if you give a valid use case. My work with the Gerrit Frontend Team has been quite good; they are willing to accept any requests as long as they have a use case (though they've rejected some stuff).

AKlapper (WMF) (talkcontribs)

@Paladox Could you elaborate on the actual problem with "creating lots of commits"? I'm asking because you can squash them into a single commit when merging.

Forking: As I wrote in my previous comment in this thread there are workflows which do not require forking.

Showing all recent commits regardless of projects: GitLab offers that across projects too, within a group. See for example https://gitlab.gnome.org/groups/GNOME/-/merge_requests?state=opened.

Dashboard: Which specific functionality do you use and expect from your "own dashboard"? It's hard to discuss if we don't know which problem a dashboards solves for you.

(Furthermore, I personally disagree that Gerrit "creates a simple user interface". Random example: Try to create a code change within the web interface, without reading the docs. Hint: It is possible.)

QEDK (talkcontribs)

A commit is meant to be atomic in nature, and most code reviewers prefer multiple commits with each commit dealing with one thing only - in fact, the Gerrit way of having patchsets with one commit doing a lot of things is in the minority, and only works because of the patchset-based workflow. Furthermore, Andre's point in the brackets is what's most important: (almost?) every newcomer to Gerrit has found it opaque in nature and contrary to the pull-request-based workflows that most new contributors are used to.

AKlapper (WMF) (talkcontribs)

Re Dashboards:

Apart from my still open question which specific functionality you use and expect from your "own dashboard", in my understanding it is only possible to filter across repos via a label (like /dashboard/merge_requests?state=opened&label_name[]=LABELNAME) or within a group, but not on a custom subset of repos. The term "dashboard" in GitLab terminology seems to be about metrics.

Reply to "Gerrit workflow"

Workflow for translation updates

Nikerabbit (talkcontribs)

Current translation updates in Gerrit have one benefit and one drawback. The benefit is that they are merged automatically. The drawback is that they skip most of the tests and thus cause merge blockers.

Currently Gerrit requires explicit +2 to have a change merged (if it passes tests). In GitLab this is rather the other way around: you can require tests to pass first, then you can merge (if you have sufficient access). To prevent merging unwanted code to the master branch, I assume we will be enabling branch protections to disallow direct push to the branch. This means that we would not be able to push translation updates directly anymore.

I think GitLab allows setting exceptions at the per-repo level, but I do not think that is sustainable: it is extra work to configure each repository, and we could end up with different repos doing it in different ways, making configuration on the translatewiki.net side more complicated.

Is there a way to set such exceptions globally, or perhaps per group level?

The other option would be that translation updates are sent as merge requests, and merged immediately if tests pass. This avoids the drawback of the current system of skipping tests. I also think it's possible to give the translation updater account global permissions to do that across repos, and I saw GitLab even provides a merge_request.merge_when_pipeline_succeeds option to easily do that. I think this would be the best workflow for translation updates.
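
A sketch of what such a push from the translation updater could look like (remote and branch name illustrative; merge_request.merge_when_pipeline_succeeds is the push option mentioned above):

$ git push \
  -o merge_request.create \
  -o merge_request.target=master \
  -o merge_request.merge_when_pipeline_succeeds \
  origin HEAD:l10n/update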

Jdforrester (WMF) (talkcontribs)

I agree that running localisation updates through the regular merge process would be a good idea; we don't with the current architecture, to avoid too much extra load on the CI servers, at the cost of the occasional test-violation pain, as you say. Switching to just "normal" C+2/gate is something we could do now, and is the easiest thing for the new GitLab world, I imagine.

Reply to "Workflow for translation updates"