Git/Conversion

This page discusses efforts to migrate from our current Subversion repository to Git. MediaWiki core and the extensions used on Wikimedia sites have now switched to Git. Moving the remaining affected projects, and improving our new development infrastructure with tools that integrate Gerrit and Git into our workflow, will continue to take engineering time until we make our Subversion repository completely read-only in the summer of 2013.

Rationale
Our current Subversion-based version control system has served us well, but we need a version control system better suited to our development effort. Our community is highly distributed, with many parallel efforts, and needs to integrate many different feature efforts. After long consideration, we've decided to move from Subversion to Git.

Some advantages of Git:


 * "I love git just because it allows me to commit locally (and offline)." - Guillaume Paumier
 * "[Y]ou can create commits locally and push them to the server later (great for working without wifi), you can tell it 'save my work so I can go do something else now' in one command, and it'll allow us to review changes before they go into "trunk" (master).... without human intervention in merging things into trunk. Gerrit automates this process." - Roan Kattouw

Affected development projects
MediaWiki core (/trunk/phase3/) and MediaWiki extensions that WMF deploys moved to Git in March 2012. Afterwards, any other extensions, tools, or projects that wish to move can do so. These might include operations, fundraising, pywikipediabot, etc.

We will leave some codebases in Subversion and not bother migrating them, because those extensions or tools have been abandoned. Some developers will choose to move their projects to GitHub or some other Git hosting site. We will also leave svn.wikimedia.org up for at least several years; for the subdirectories holding projects that have moved to Git, the repository will be read-only.

The Git conversion team will publicize any changeover date with at least 2 weeks' notice. As of right now (1 Feb 2011) there are no specific cutover dates set.

Chad would like to gradually migrate all projects currently on Wikimedia's Subversion repository so that he can make all of svn.wikimedia.org read-only by the middle of 2013. Projects that want to move can use Git/New repositories. They could move to WMF's git repo, or to another host; Chad can help them decide and migrate.


 * MediaWiki extensions (not used by WMF)
 * cf. Git/Conversion/Extensions queue
 * Starting in March or April 2012, Chad will move alphabetically through all extensions (that are not deployed on Wikimedia Foundation sites) and offer each of them choices as to when and whether to shift.
 * Pywikipediabot
 * The pywikipediabot community has not yet decided on whether to move, but is strongly leaning towards staying with SVN for now. Sumana Harihareswara, Wikimedia Foundation Volunteer Development Coordinator 00:23, 8 February 2012 (UTC)
 * Wikimedia Foundation fundraising
 * Fundraising extensions (including DonationInterface, FundraiserLandingPage, ContributionReporting, et al) can move along with core
 * Fundraising stuff in the Wikimedia repository can move on a timeline TBD
 * Fundraising migration is under discussion.
 * Wikimedia Foundation operations
 * Ops is already aware of this, since they've started their own move to git. They are migrating piecemeal as they're ready.
 * Ariel Glenn's dumps infrastructure
 * Just two paths, /trunk/backups and /branches/ariel/. Should be pretty trivial, history's not complicated.
 * Can convert: as soon as we're ready, just give Ariel a day's notice or so.
 * Moved to operations/dumps.git on 15-Feb-2012. svn made r/o.
 * Wikimedia Foundation data mining and analytics, including Community Department
 * Toolserver internationalisation
 * In active use but maintainers don't have time right now to deal with migration. Another time.
 * Daniel Kinzler's WikiWord project
 * Per IRC: No rush, will move casually after main migration. Not under active development right now.
 * mwdumper
 * Not being actively developed right now. Can move this whenever.
 * Has been moved to mediawiki/tools/mwdumper.git on 15-Feb-2012. svn made r/o.
 * WM planet configuration
 * Was moved into operations/puppet.git (in files/planet) by Dzhan. This whole system needs redoing anyway (see bug 27208, GSoC 2012 project idea), but it'll do for now.
 * Should probably make svn r/o for this.
 * Wikimedia Mobile
 * Currently being done on GitHub; moving will be easy, we just have to talk to the mobile team about adjusting their workflow.
 * Continuous integration, for example TestSwarm
 * Not yet migrated to Git
 * All the testswarm/jenkins stuff is ongoing in git. Nothing from SVN is being used anymore (still maybe need to make paths r/o?)

Split up and convert repositories
A naïve conversion of the entire repository (with branches) weighs in at around 7.8GB (November 2011). It makes no sense to create a single Git repository for everything; it should be split up.

In Subversion, everything gets squashed into one giant repository. In Git, repositories are split at the boundaries that code does not cross.

Splitting
We have a test repository up, but in February 2012 will redo the split to create a permanent git repo.


 * MediaWiki will go in mediawiki/core.git
 * Extensions will go in mediawiki/extensions/foo.git
 * There will be an extension "meta repository" at mediawiki/extensions.git which will contain all extensions as submodules.
 * Other things across SVN need to find new homes in Git
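
The split described above can be sketched as a path-to-repository mapping. This is only an illustration, not the conversion team's actual tooling: the function name `target_repo` is hypothetical, `/trunk/phase3` is the core path named on this page, and the `/trunk/extensions/<Name>/` prefix is an assumption about the SVN layout.

```python
from typing import Optional

# /trunk/phase3 is the core path named on this page.
CORE_PREFIX = "/trunk/phase3"
# ASSUMPTION: extensions live under /trunk/extensions/<Name>/ in SVN.
EXTENSIONS_PREFIX = "/trunk/extensions/"


def target_repo(svn_path: str) -> Optional[str]:
    """Return the Git repository an SVN path would land in.

    Returns None for paths that still need to find a new home in Git.
    """
    # Match the core directory itself or anything below it, but not
    # sibling directories that merely share the prefix (e.g. /trunk/phase3x).
    if svn_path == CORE_PREFIX or svn_path.startswith(CORE_PREFIX + "/"):
        return "mediawiki/core.git"
    if svn_path.startswith(EXTENSIONS_PREFIX):
        # The first path component after the prefix is the extension name.
        name = svn_path[len(EXTENSIONS_PREFIX):].split("/", 1)[0]
        if name:
            return f"mediawiki/extensions/{name}.git"
    return None
```

Each extension gets its own repository under this scheme, which is what lets the meta repository at mediawiki/extensions.git reference them individually as submodules.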

/Splitting tests

Converting

 * ✅ Every commit needs to be rewritten to give name/email pairs to SVN users. We are using username@users.mediawiki.org for a unified e-mail address scheme for all old commits.
 * Only for those without a known email address, or for all?
 * What about username@svn.wikimedia.org instead?
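
As a rough sketch of the name/email rewrite, the snippet below generates lines in the `login = Name <email>` form that `git svn` accepts as an authors file. Whether the conversion actually used such a file is an assumption; the function name `authors_file_lines`, the `real_names` lookup, and the sample usernames are hypothetical.

```python
from typing import Dict, Iterable, List, Optional

# Address scheme taken from this page: username@users.mediawiki.org
EMAIL_DOMAIN = "users.mediawiki.org"


def authors_file_lines(
    usernames: Iterable[str],
    real_names: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Build one authors-file line per distinct SVN username.

    real_names is a hypothetical lookup of known display names; users
    without an entry fall back to their SVN username.
    """
    real_names = real_names or {}
    lines = []
    for user in sorted(set(usernames)):
        display = real_names.get(user, user)
        lines.append(f"{user} = {display} <{user}@{EMAIL_DOMAIN}>")
    return lines


if __name__ == "__main__":
    # Sample usernames are made up for illustration.
    for line in authors_file_lines(["bob", "alice", "bob"], {"alice": "Alice Example"}):
        print(line)
```

Using the unified address for every old commit keeps the mapping mechanical; switching to a different domain, as suggested above, would only mean changing `EMAIL_DOMAIN`.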

Unscheduled items
Other cool things to do (not blockers):
 * ❌ Convert the Bugzilla code to recognize the new SHA-1 commits. Come up with a shorthand to autolink from BZ to gerrit changeset
 * Would Mark H. mind looking into this, since Bugzilla is his baby?
 * Filed as bug 35144. Workaround: For now, when referring to a Git diff, please paste the changeset "Change ID", or the changeset number from the Gerrit changeset URL, into the BZ comment. Both are globally unique.
 * ❌ Create database of SVN revision ids -> Git SHA-1's for useful lookups.
 * Info is included in Git commits, just need to make a DB mapping of them. Good weekend project after the conversion is complete if someone is feeling bored.
 * Coren may work on this.
 * ❌ Rusty and Roan's effort to turn every Bugzilla patch into a git pull request.
 * ❌ Change MediaWiki CR to read-only: no new comments or status changes. In the very long run: how do we write redirects for viewvc -> gitweb? That can wait until 2013 or so.
 * ❌ Enforce commit message format http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html
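
The SVN revision to SHA-1 lookup from the list above could be built roughly as follows. This is a sketch under a stated assumption: that converted commits carry a `git-svn-id:` trailer of the form `git-svn-id: <url>@<revision> <uuid>`, as `git svn` writes by default. The page only says the information is included in the Git commits, so the exact format may differ. The function name `build_revision_map` is hypothetical, and the input is a plain list of (sha, message) pairs so the sketch stays independent of how the log is read.

```python
import re
from typing import Dict, Iterable, Tuple

# ASSUMPTION: commits carry a trailer such as
#   git-svn-id: https://svn.example.org/repo/trunk@1234 <uuid>
GIT_SVN_ID = re.compile(r"^\s*git-svn-id:\s+\S+@(\d+)\s+\S+\s*$", re.MULTILINE)


def build_revision_map(commits: Iterable[Tuple[str, str]]) -> Dict[int, str]:
    """Map SVN revision numbers to Git SHA-1s.

    commits yields (sha, full commit message) pairs. Commits without a
    recognizable trailer are skipped. If a revision appears more than
    once, the first SHA seen is kept.
    """
    mapping: Dict[int, str] = {}
    for sha, message in commits:
        match = GIT_SVN_ID.search(message)
        if match:
            mapping.setdefault(int(match.group(1)), sha)
    return mapping
```

Because the repository is split, one SVN revision can touch several Git repositories; a real lookup table would probably need to key on the repository as well as the revision.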

Done:
 * ✅ Change the configs of automatic bots -- when I commit, where will it spit out?
 * Isn't this done? gerrit-wm now posts to #mediawiki by default.

Wontfix:
 * ❌ Bump gerrit ids to some big number (e.g. 200000) greater than the largest svn revision (so that they could be treated as 'logical' revisions, without conflicting with the svn ones).
 * Not easy -- we've already started using gerrit in a production setting and we've got over 2k changesets.
 * We can live with those 2k changesets having the same numbers as MW revisions, as they are in different repos. For the actual change, it seems as easy as running ALTER TABLE tbl_name AUTO_INCREMENT = N.
 * WONTFIX

Open issues after migration

 * Niklas is unable to do code review as he used to
 * (bug 35455) Gerrit breaks Unicode in commit messages
 * (bug 35534) How to do tagging of revisions (especially for collaborative projects; "topic" can't be edited after commit)
 * Email notifications are useless/broken
 * (bug 35531) Mailing list with all changes
 * (bug 35533) Link to the unified diff (for all files) and other necessary stuff (very hidden on gitweb)
 * (bug 35532) Unified diff directly in the notification
 * No codereview list either, probably harder and less useful? all actions are sent to IRC
 * Review tool should load fast, display all diffs inline (?)
 * "Gitweb pages load slowly, and for some reason they seem to disable the bfcache in Firefox"
 * (bug 35612) Loading of a change for review in Gerrit is often very slow
 * Path filtering also for unmerged commits
 * This is not possible. Needs to be filed upstream if it's not already.
 * Unable to push for review if there are local uncommitted changes (like debugging hacks)
 * This is by design so you don't lose changes. If you've got local changes you don't want to commit but want to hang on to, use git stash.
 * If I understand correctly, the "local uncommitted changes" are changes that were committed to the developer's local repo and submitted using `git review`, and were not yet reviewed and merged. See Talk:Git/Workflow, Committing followups: please no --amend and consecutive commits in Gerrit. --Amir E. Aharoni (talk) 08:36, 27 March 2012 (UTC)
 * No. These are changes that are not going to be committed upstream.

Ideal state
This is what we'd love to see:

History

 * See history of MediaWiki version control

Working on the conversion

 * User:^demon
 * Antoine Musso
 * Roan Kattouw

Documents

 * User requirements:
 * Specifications: Gerrit bugs that matter
 * Software design document:
 * Test plan:
 * Documentation plan:
 * User interface design docs: Gerrit bugs that matter
 * Schedule: see Timeline above
 * Task management: Task list from Bugzilla (bug 22596)
 * Release management plan:
 * Communications plan:
 * Status updates

Communications

 * draft plan
 * announcement of test repository
 * "git boot camp" from October 2011 NOLA hackathon GitBootcamp
 * https://blog.wikimedia.org/2012/02/15/wikimedia-engineering-moving-from-subversion-to-git/
 * Git, Gerrit, and You! or, Gerrit training available starting Monday 27 February
 * Postponing Git migration until March 21
 * When/how we'll add, remove people from Gerrit project owner groups
 * Git migration: documentation and short-term considerations
 * MediaWiki core deployments starting in April, and how that might work