Git/Conversion/issues

A lot of folks want to migrate MediaWiki to git distributed source control, but there are a few issues still to sort out. This page is meant to lay out a few of those and make sure we know what needs to be done; please consider this a discussion aid, not a formal to-do list.

Extensions
The main question is whether to:
 * stick core and all extensions directly in the same repository
   * bad: big repo means always big checkout -- unlike SVN, can't check out just one dir
   * bad: can complicate management & dev on small subprojects (but no worse than today's SVN)
   * good: easy to set up
   * good: easy to pull updates
 * stick core in one repo and all extensions in another repo
   * almost same as first option
 * stick core in one repo and extensions in their own repos
   * good: looks nice
   * good: keeps history clean in each ext
   * bad: need to create & manage more repos (but there's infrastructure like gitorious for making farming easy)
   * bad: management is more complicated
   * bad: pulling updates from upstream to a local checkout may be more complicated
 * use git submodules
   * bad: slightly complicated
   * submodules track commit ids, so the master repo will need a new commit for each change made to each of the submodules Dantman 20:09, 22 March 2011 (UTC)
   * can sometimes cause you to lose something you were in the middle of working on in a submodule Dantman 20:09, 22 March 2011 (UTC)
     * How does it cause you to lose data? --Ævar Arnfjörð Bjarmason 07:50, 23 March 2011 (UTC)
       * The repos that submodules check out are not attached to any branch. If you don't explicitly check out a branch, make some code changes and commit them, and later run 'git submodule update' on the parent repo, your changes will be silently overwritten and disappear. Additionally, because of the attachment to specific commits, you can get cases where the update resets your code to an older point if you haven't pushed your changes and waited for the submodule repo to be updated. Dantman 10:48, 23 March 2011 (UTC)
         * The commits are not lost; they are still in your object database and appear in your reference log (which you can view by typing 'git reflog'). Any commits made to a detached HEAD can be incorporated back into the branch by merging or rebasing. I agree that the behavior is clumsy, scary and unintuitive, and it is one of my least favorite aspects of git, but it is not destructive. blipvert 16:49:31, 19 July 2011 (UTC)
   * Theoretically, all the features we actually want from submodules could be provided by some scripts checked into a common repo; that way we avoid the downsides and can add more advantages Dantman 20:09, 22 March 2011 (UTC)
   * bad: most people don't have much experience with them, so they're a bit of a wildcard
   * bad: submodules don't help people commit to large numbers of repos, only check out specific commits of them Dantman 10:48, 23 March 2011 (UTC)
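
To make the "scripts checked into a common repo" idea a bit more concrete, here is a rough sketch of what such a script might do. Everything specific in it is an assumption made up for illustration: the manifest format, the placeholder repository URLs and the function name plan_commands. It only plans the git commands (a clone for extensions that are missing, a fast-forward-only pull for ones already checked out) and does not run them. Cloning onto a named branch means the working copy is not left on a detached HEAD, which is the situation the submodule discussion above worries about.

```python
import os
import tempfile

# Hypothetical manifest: extension directory name -> (repository URL, branch).
# The URLs are placeholders, not real hosting locations.
MANIFEST = {
    "Foo": ("https://git.example.org/mediawiki/extensions/Foo.git", "master"),
    "Bar": ("https://git.example.org/mediawiki/extensions/Bar.git", "master"),
}


def plan_commands(extensions_dir, manifest):
    """Return the git commands (as argument lists) needed to bring every
    extension in the manifest up to date. Nothing is executed here."""
    commands = []
    for name in sorted(manifest):
        url, branch = manifest[name]
        target = os.path.join(extensions_dir, name)
        if os.path.isdir(os.path.join(target, ".git")):
            # Already cloned: only fast-forward the checked-out branch,
            # so git refuses rather than rewriting diverged local work.
            commands.append(["git", "-C", target, "pull", "--ff-only"])
        else:
            # Not present yet: clone onto a named branch (no detached HEAD).
            commands.append(["git", "clone", "--branch", branch, url, target])
    return commands


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Pretend "Foo" has been cloned already; "Bar" is still missing.
        os.makedirs(os.path.join(root, "Foo", ".git"))
        for cmd in plan_commands(root, MANIFEST):
            print(" ".join(cmd))
```

Unlike submodules, a manifest like this does not pin extensions to specific commits; whether that is an advantage or a drawback is part of the open question.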

Localization
First, a brief overview of the localization process today:


 1) code is checked into SVN which adds or changes localized messages
 2) TranslateWiki.net's automated updates import the new messages into the wiki for editing
 3) ... translators edit new or old messages ...
 4) Siebrand runs TranslateWiki.net's scripts to export localized messages back to files in the source tree
   * and may have to manually resolve conflicts
   * and may have to manually edit calling source code to correct formatting errors, misuse of options, bad description comments
   * and then commits it to upstream SVN

This process isn't inherently unfriendly to git -- it works pretty much the same way for l10n updates for StatusNet -- however the issue of how extensions are split across repositories (see Extensions above) may complicate this.

StatusNet's git repository today stores all plugins in the same repo as core, which behaves about the same as MediaWiki's SVN situation.

If instead we put each extension in a separate repository, then the process will need many more checkouts and more commits to run a full update on everything. Not impossible, but would at least require changes to TranslateWiki's processes to make it manageable.
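
As a rough illustration of the extra bookkeeping, the sketch below groups a batch of exported message files by the repository that would own them, assuming the "core in one repo, each extension in its own repo" layout. The file names and the function name group_by_repo are invented for the example and are not part of TranslateWiki's actual scripts.

```python
from collections import defaultdict


def group_by_repo(exported_files):
    """Group exported message files by the repository that would own them.

    Assumes a hypothetical layout where core is one repository and each
    directory directly under extensions/ is its own repository.
    """
    groups = defaultdict(list)
    for path in exported_files:
        parts = path.split("/")
        if parts[0] == "extensions" and len(parts) > 2:
            repo = "/".join(parts[:2])      # e.g. extensions/Foo
            relative = "/".join(parts[2:])  # path inside that repository
        else:
            repo = "core"
            relative = path
        groups[repo].append(relative)
    return dict(groups)


if __name__ == "__main__":
    exported = [
        "languages/messages/MessagesNl.php",
        "extensions/Foo/Foo.i18n.php",
        "extensions/Bar/Bar.i18n.php",
        "extensions/Foo/Foo.alias.php",
    ]
    for repo, files in sorted(group_by_repo(exported).items()):
        # Each group would need its own commit (and push) in its own repo.
        print(repo, files)
```

With a single repository the whole export is one commit; with per-extension repositories the number of commits grows with the number of groups returned here.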

Production checkouts
It's common among power users (developers and folks who like running their own sites) to run MediaWiki straight out of a SVN checkout. This has some nice benefits:
 * really easy to pull updates
 * if editing code in place, easy to commit fixes

The main 'funky' issue with SVN today is with extensions. To avoid checking out *every extension ever* you can do partial checkouts, so your tree looks like:


 * root == phase3
 * extensions/ == phase3/extensions (stub dir)
 * extensions/Foo == extensions/Foo (separate dir in same repo)
 * extensions/Bar == extensions/Bar (separate dir in same repo)

A 'svn up' from the root updates everything, but you don't need to keep around extensions you don't use.

This is harder to duplicate in a git situation with 'single repo' layout: you'd just end up with all the extensions in your dir and have to live with it.

If each extension is a separate repo, then you can check them all out in their own subdirs -- much like the SVN layout earlier -- however pulling from upstream may not happen automatically.
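
Something close to 'svn up' from the root could probably be recovered with a small wrapper. The sketch below is one possible shape, not an agreed tool: the function names find_git_checkouts and pull_all are hypothetical, and it assumes core is a git checkout at the root with each extension checked out as its own repo under extensions/. By default it only returns the commands it would run.

```python
import os
import subprocess
import tempfile


def find_git_checkouts(extensions_dir):
    """Return subdirectories of extensions_dir that contain a .git entry."""
    if not os.path.isdir(extensions_dir):
        return []
    found = []
    for entry in sorted(os.listdir(extensions_dir)):
        path = os.path.join(extensions_dir, entry)
        if os.path.exists(os.path.join(path, ".git")):
            found.append(path)
    return found


def pull_all(root, dry_run=True):
    """Plan (and optionally run) a pull for core and each extension checkout."""
    repos = [root] + find_git_checkouts(os.path.join(root, "extensions"))
    commands = [["git", "-C", repo, "pull", "--ff-only"] for repo in repos]
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first repository that fails to update.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Fake layout: one extension checkout and one plain directory.
        os.makedirs(os.path.join(root, "extensions", "Foo", ".git"))
        os.makedirs(os.path.join(root, "extensions", "NotARepo"))
        for cmd in pull_all(root):
            print(" ".join(cmd))
```

Only extensions that are actually checked out get pulled, so you still don't need to keep around extensions you don't use.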

LocalisationUpdate extension
The LocalisationUpdate extension in particular is meant to slurp in updated localization files that have been pulled (from SVN or from git) and use their data as updates to what shipped with the current live code.

I *think* this should work the same with git as with SVN, with the caveat that pulling updates may be slightly harder to automate when there are a lot of separate repos. But it's still automatable, and the actual extension should work exactly the same.
 * Being the guy that de facto does the day-to-day operations of LocalisationUpdate (i.e. fixes it when it breaks), I agree that git support in LU will be easy. LU now uses two separate checkouts (one for trunk, one for extensions) to pull from on our setup, but there's no reason that couldn't scale up to a few hundred. In vanilla setups LU pulls updates over HTTP, but as long as there's an SVN-like interface that allows pulling the raw content of the latest trunk version of a messages file from a stable URL (like http://svn.wikimedia.org/svnroot/mediawiki/trunk/phase3/languages/messages/MessagesNl.php in our current SVN setup) it'll be fine. I know Github has this, for one. --Catrope 20:38, 22 March 2011 (UTC)
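
The "stable URL" requirement can be sketched as a small URL builder. This is not LocalisationUpdate's actual code: the function names are made up, the git base URL is a placeholder, and the file naming rule (first letter upper-cased, hyphens turned into underscores) is my understanding of MediaWiki's convention and should be treated as an assumption. The SVN base is the one quoted in the comment above.

```python
# Base quoted in the discussion above (current SVN setup).
SVN_BASE = "http://svn.wikimedia.org/svnroot/mediawiki/trunk/phase3/languages/messages"

# Hypothetical raw-file base on some git web front end (placeholder URL).
GIT_RAW_BASE = "https://git.example.org/mediawiki/core/raw/master/languages/messages"


def messages_file_name(lang_code):
    """Build the messages file name for a language code.

    Assumed convention: upper-case the first letter and replace hyphens
    with underscores, e.g. "nl" -> "MessagesNl.php".
    """
    code = lang_code.replace("-", "_")
    return "Messages" + code[:1].upper() + code[1:] + ".php"


def raw_messages_url(base_url, lang_code):
    """Join a stable base URL with the messages file name."""
    return base_url.rstrip("/") + "/" + messages_file_name(lang_code)


if __name__ == "__main__":
    print(raw_messages_url(SVN_BASE, "nl"))
    print(raw_messages_url(GIT_RAW_BASE, "nl"))
```

As long as the git hosting exposes some base URL of this kind, switching would only mean changing the base, not the way file names are derived.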

Other issues
Any other outstanding issues?

Lessons Learned from Postgres's SVN-to-git Migration
From http://lwn.net/Articles/409635/:


 * Assuming that there are any projects out there who have not yet switched to their distributed version control system of choice, here's a few things to learn from our migration:


 * Start with a Git mirror.
 * Designate a specific "Git migration team". Make sure they have lots of free time.
 * Your first attempt to migrate will probably fail, so you need to be prepared for more than one.
 * Changing your infrastructure, workflow, and build tool dependencies is harder than the repository conversion.
 * Make friends with the conversion tool authors.
 * Write lots of docs about the new tools and workflow.
 * The more history you have on your current system, the more work conversion is going to be.
 * Things which are broken in your current history are not going to fix themselves when you migrate.
 * When testing the conversion, make sure to look at more than HEAD and branch-tips.

Lessons from Drupal
Melissa Anderson of Drupal writes:


 * I'm not sure there's a better writeup than the PostgreSQL document at http://lwn.net/Articles/409635/ We benefited greatly from their experience and took it to heart.


 * It's worth noting that the Drupal community is still struggling with patch/merge commit/attribution issues and hasn't yet settled on process. You can see some of that discussion here: http://groups.drupal.org/node/148184


 * The bulk of the Git migration history can be found at http://groups.drupal.org/drupal-org-git-team.