Git/Conversion

This page discusses efforts to convert our current Subversion repository to Git. The current (very preliminary) plan as of September 2011 is to do this by the end of 2011.

Rationale
Our current Subversion-based version control system has served us well, but we need one better suited to our development effort. Our community is highly distributed, with many parallel efforts, and needs to integrate many separate feature efforts. After long consideration, we've decided to move from Subversion to Git.

October 2011

 * Preliminary test conversions early in the month
 * [to finalize] "git boot camp" @ WMF tech days and/or NOLA hackathon
 * Git workflow architecture review
 * CI tests get run when a developer chooses to push to the stage between their branch and the mainline branch
 * This stage is roughly equivalent to merge requests on GitHub and Gitorious: it shows test results to reviewers for a more or less completed chunk of code (which may be one or several commits, and can be updated and resubmitted)
 * Agree on implementation strategies regarding remaining development process questions, e.g. how to handle multi-repo commits

November 2011

 * Finish code review on trunk
 * Cut 1.19 release branch
 * Git migration begins \o/
 * Finish up specific Git management scripts
 * to support WMF workflow
 * i18n updates
 * new developers
 * SVN import; after the release/deployment, switch SVN to read-only
 * Deployment scripts
 * Make Gerrit behave like we want it to
 * 1.19 deploy (from SVN)
 * Move towards git-based development and release process

December 2011

 * First release from git mainline development branch
 * Move towards continuous integration via git, goalpost: weekly deployment
 * Jenkins (Testswarm/PHPUnit tests) on git branches

Plan of attack
To do a conversion of the repository we need:


 * ✅ Get a dump of the repository. There's now a Pushmi mirror running on
 * ❌ Split up and convert the MediaWiki repository.
 * ✅ I tried git-svn, which has all sorts of CPU (conversion from a dump takes 3 weeks), memory (I keep having to kill and restart it) and reliability problems.
 * Trying snerp-vortex, which is much more promising. Working with the author to solve some bugs related to the MediaWiki dump; a few are still outstanding. This is the current blocker for further progress.
 * ❌ Get a copy of the old CVS state to reimport its history properly. According to an IRC conversation between Avar and Brion on 2010-10-26, the current SVN repository suffers from cvs2svn bugs.
 * ✅ SourceForge CVS repository enabled by brion (2010-11-24): http://wikipedia.cvs.sourceforge.net/viewvc/wikipedia/
 * ❌ Get the repository with rsync: rsync -av USER@wikipedia.cvs.sourceforge.net::cvsroot/wikipedia/*
 * ❌ cvs to git conversion
 * ❌ svn to git conversion
 * ❌ Write some documentation about git usage for our developers. A list of useful links might be a good start.
 * ❌ Have developers start using git-svn to learn Git.
 * ❌ Convert MediaWiki's infrastructure to Git
 * ❌ Special:CodeReview needs to work with it. This should be relatively easy: it just shells out to SVN, so we need to find the equivalent Git commands.
 * There might be some more complications actually; I think the current code assumes integer IDs and ID-based ordering. We'd need to change it to accept the longer hex commit ID hashes, and to understand the commit history tree structure. It's not rocket science, but I'd very strongly recommend giving it some more smarts there, as linear squashes of git trees can get real confusing around merges. --brion 20:30, 25 October 2010 (UTC)
 * ❌ Commits via IRC: Should be easy with CIA and *insert hundreds of IRC bots here*
 * ❌ Commits via E-Mail: ditto
 * The two above should be no problem if running our own primary git repo. I would recommend also having automatic mirrors syncing to a live backup mirror on github or gitorious. (Note that gitorious.org does not allow adding your own post-commit hooks etc.) --brion 20:30, 25 October 2010 (UTC)
 * ❌ Convert the Bugzilla code to recognize the new SHA-1 commits.
 * ❌ Create a database mapping SVN revision IDs to Git SHA-1s. This is needed for redirecting CodeReview links, and anything else that uses rXXXX, to the new commit IDs.
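
The revision-to-commit database in the last item could be seeded from the converted repository itself. What follows is a minimal sketch, assuming the converted commits carry a `git-svn-id:` line in their messages the way git-svn writes them (whether snerp-vortex output carries an equivalent marker is not established here), and assuming the log text comes from something like `git log --format='%H%n%B%x00'`. The function names are hypothetical.

```python
import re

# git-svn appends a line of this shape to each converted commit message:
#   git-svn-id: <repository URL>@<revision> <repository UUID>
GIT_SVN_ID = re.compile(r"^git-svn-id: \S+@(\d+) \S+$", re.MULTILINE)


def build_revision_map(log_text):
    """Map SVN revision numbers to lists of Git commit hashes.

    log_text is assumed to be NUL-separated records, each consisting of
    the commit hash on the first line followed by the commit message.
    """
    revisions = {}
    for record in log_text.split("\0"):
        record = record.strip()
        if not record:
            continue
        sha, _, message = record.partition("\n")
        match = GIT_SVN_ID.search(message)
        if match:
            # One SVN revision may touch several split repositories,
            # so each revision maps to a list of hashes.
            revisions.setdefault(int(match.group(1)), []).append(sha)
    return revisions


def resolve(reference, revisions):
    """Turn an 'r12345' style reference into the matching commit hashes."""
    match = re.fullmatch(r"r(\d+)", reference)
    if not match:
        return []
    return revisions.get(int(match.group(1)), [])
```

Keeping a list per revision is a deliberate choice: once the repository is split, a single SVN commit that touched core and an extension would correspond to one Git commit in each.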

Split up and convert
A naïve conversion of the entire repository (with branches) weighs in at around 650 MB (early 2010). It makes no sense to have a single Git MediaWiki repository; it should be split up.

In Subversion, everything gets squashed into one giant repository. In Git, repositories are split at the boundaries that code does not cross.

Splitting

 * ❌ Everything in trunk/* gets its own repository
 * ❌ Further, everything in extensions/* gets its own repository
 * To convert a massive Git repo of all extensions, the following script can be used: https://gist.github.com/865120
 * ❌ Maybe other bits, like tools/*, get split up too

All of these bits get the branches/* and tags/* history relevant to them integrated into their repository.
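
The splitting rule above can be written as a function from an SVN path to a target repository. This is a rough sketch, assuming paths are given relative to the repository root and that branches and tags repeat the trunk layout under `branches/<name>/` and `tags/<name>/` (the real layout is likely less regular than this). `target_repository` is a hypothetical name.

```python
def target_repository(svn_path):
    """Return (repository, ref) for an SVN path, or None if it maps nowhere.

    Rules sketched from the list above:
      trunk/<project>/...          -> repository <project>
      trunk/extensions/<name>/...  -> repository extensions/<name>
    Branches and tags are ASSUMED to repeat that layout below
    branches/<ref>/ and tags/<ref>/.
    """
    parts = [p for p in svn_path.split("/") if p]
    if not parts:
        return None

    if parts[0] == "trunk":
        ref, rest = "trunk", parts[1:]
    elif parts[0] in ("branches", "tags") and len(parts) >= 2:
        ref, rest = parts[0] + "/" + parts[1], parts[2:]
    else:
        return None

    if not rest:
        return None
    if rest[0] == "extensions":
        # Each extension gets its own repository; a bare extensions/ has none.
        if len(rest) < 2:
            return None
        return ("extensions/" + rest[1], ref)
    return (rest[0], ref)
```

For example, `target_repository("trunk/extensions/Cite/Cite.php")` yields `("extensions/Cite", "trunk")`. A plain file sitting directly under trunk/ would be treated as a project name here, so a real conversion would need an explicit list of projects.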

Git submodule repositories can be created to track various aggregates people are interested in. There could be a repository with:


 * MediaWiki core + Wikimedia extensions
 * All extensions (like checking out extensions/ now)

There would need to be on-commit hooks to update these submodules. This has some disadvantages: for example, if a function gets deprecated in core, one would need to commit in core and in all the extension repositories. This is easy to script, but a bit harder than with SVN today.

On the other hand, keeping it all in one repository would mean a much larger repository: anyone wanting to hack on core would need the full history of all extensions.
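
An aggregate repository of the kind described in this section is mostly a `.gitmodules` file plus a pinned commit per submodule. The sketch below renders that file from a path-to-URL mapping; the URLs and extension names used with it are placeholders, not real hosting locations, and `render_gitmodules` is a hypothetical helper.

```python
def render_gitmodules(submodules):
    """Render .gitmodules text from a {path: url} mapping.

    Entries are emitted in sorted path order so regenerating the file
    produces stable output.
    """
    lines = []
    for path in sorted(submodules):
        lines.append(f'[submodule "{path}"]')
        lines.append(f"\tpath = {path}")
        lines.append(f"\turl = {submodules[path]}")
    if not lines:
        return ""
    return "\n".join(lines) + "\n"
```

This covers only the static file. An on-commit hook would still have to advance the commit recorded for each submodule in the aggregate repository.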

Converting

 * ✅ Every commit needs to be rewritten to give name/email pairs to SVN users. There's a tool for this already that works with git-filter-branch.
 * ❌ Have people populate the USERINFO/ directory. The information in it is incomplete, and so is the conversion as a result.
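
One way to supply the name/email pairs is an authors file in the format git-svn accepts, with one `login = Name <email>` line per SVN user. The sketch below is not the existing git-filter-branch tool mentioned above. It assumes each USERINFO record is a small `key: value` text with `name` and `email` fields; that file format, the fallback domain and the function names are all assumptions.

```python
def parse_userinfo(text):
    """Parse 'key: value' lines into a dict; other lines are skipped."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            info[key.strip().lower()] = value.strip()
    return info


def authors_line(login, info, fallback_domain="users.invalid"):
    """Format one authors-file line: login = Full Name <email>.

    Missing fields fall back to values derived from the login, so
    incomplete records still convert, just less informatively.
    """
    name = info.get("name", login)
    email = info.get("email", f"{login}@{fallback_domain}")
    return f"{login} = {name} <{email}>"
```

The fallback shows why an incomplete USERINFO/ directory leaves the conversion incomplete: a committer without a record ends up with a placeholder identity rather than a real name and address.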

History

 * See history of MediaWiki version control

Working on the conversion

 * User:Ævar Arnfjörð Bjarmason
 * User:^demon

Would like to see it happen

 * Aryeh Gregor
 * Ashar Voultoiz (already uses Git locally)
 * Daniel Friesen

Documents

 * User requirements:
 * Specifications:
 * Software design document:
 * Test plan:
 * Documentation plan:
 * User interface design docs:
 * Schedule:
 * Task management:
 * Release management plan:
 * Communications plan:
 * Status updates