Git/Conversion/issues

A lot of folks want to migrate MediaWiki to git, a distributed source control system, but there are still a few issues to sort out. This page lays out some of them so we know what needs to be done; please consider it a discussion aid, not a formal todo list.

Extensions
We're going to stick core in one repo and extensions in their own repos. The trade-offs:
 * good: looks nice
 * good: keeps history clean in each ext
 * bad: need to create & manage more repos (but infrastructure like Gitorious makes farming repos easy)
 * bad: management is more complicated
 * bad: pulling updates from upstream to a local checkout may be more complicated

Localization
First, a brief overview of the localization process today:


 1) code that adds or changes localized messages is checked into SVN
 2) TranslateWiki.net's automated updates import the new messages into the wiki for editing
 3) translators edit new or old messages
 4) Siebrand runs TranslateWiki.net's scripts to export localized messages back to files in the source tree
    * and may have to manually resolve conflicts
    * and may have to manually edit calling source code to correct formatting errors, misuse of options, bad description comments
    * and then commits it to upstream SVN

This process isn't inherently unfriendly to git (it works pretty much the same way for l10n updates for StatusNet). However, the question of how extensions are laid out (see Extensions above) may complicate this.

StatusNet's git repository today stores all plugins in the same repo as core, which behaves about the same as MediaWiki's SVN situation.

If instead we put each extension in a separate repository, then the process will need many more checkouts and more commits to run a full update on everything. That's not impossible, but it would at least require changes to TranslateWiki's processes to make it manageable.
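To make the "more checkouts and more commits" point concrete, here is a minimal sketch of how an export might be split up when every extension has its own repo. It is not TranslateWiki's actual tooling: the function names, the example file names, and the assumption that each extension lives at extensions/Name are all made up for illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def repo_for(path):
    """Guess which repository a changed file belongs to.

    Assumption: anything under extensions/<Name>/ belongs to that
    extension's own repo; everything else belongs to core.
    """
    parts = PurePosixPath(path).parts
    if len(parts) >= 3 and parts[0] == "extensions":
        return "extensions/" + parts[1]
    return "core"


def group_by_repo(changed_files):
    """Group exported message files so each repo gets its own commit."""
    groups = defaultdict(list)
    for path in changed_files:
        groups[repo_for(path)].append(path)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical export touching core and two extensions.
    changed = [
        "languages/messages/MessagesNl.php",
        "extensions/Foo/Foo.i18n.php",
        "extensions/Bar/Bar.i18n.php",
    ]
    for repo, files in group_by_repo(changed).items():
        print(repo, "->", len(files), "file(s), one commit")
```

With a single repo this whole export is one commit; with per-extension repos it becomes one commit (and one push) per group, which is where the extra process work comes from.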

Production checkouts
It's common among power users (developers and folks who like running their own sites) to run MediaWiki straight out of an SVN checkout. This has some nice benefits:
 * really easy to pull updates
 * if editing code in place, easy to commit fixes

The main 'funky' issue with SVN today is with extensions. To avoid checking out *every extension ever* you can do partial checkouts, so your tree looks like:


 * root == phase3
 * extensions/ == phase3/extensions (stub dir)
 * extensions/Foo == extensions/Foo (separate dir in same repo)
 * extensions/Bar == extensions/Bar (separate dir in same repo)

A 'svn up' from the root updates everything, but you don't need to keep around extensions you don't use.

This is harder to duplicate in a git situation with 'single repo' layout: you'd just end up with all the extensions in your dir and have to live with it.

If each extension is a separate repo, then you can check them all out in their own subdirs -- much like the SVN layout earlier -- however pulling from upstream may not happen automatically.
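One way to get back something like the "svn up from the root" behaviour would be a small wrapper script. The following is only a sketch under a few assumptions: core is cloned at the root, each extension is its own clone directly under extensions/, and a fast-forward-only pull is what you want. The helper names are invented for this example.

```python
import subprocess
from pathlib import Path


def find_git_checkouts(root):
    """Return the root plus each extensions/* directory that is a git checkout."""
    root = Path(root)
    candidates = [root]
    ext_dir = root / "extensions"
    if ext_dir.is_dir():
        candidates.extend(sorted(p for p in ext_dir.iterdir() if p.is_dir()))
    # A normal clone has a .git directory; a submodule has a .git file.
    return [p for p in candidates if (p / ".git").exists()]


def pull_all(root, run=subprocess.run):
    """Run 'git pull --ff-only' in every checkout; return the paths that failed."""
    failed = []
    for checkout in find_git_checkouts(root):
        result = run(["git", "pull", "--ff-only"], cwd=checkout)
        if result.returncode != 0:
            failed.append(checkout)
    return failed


if __name__ == "__main__":
    for path in pull_all("."):
        print("pull failed in", path)
```

Extensions you haven't cloned are simply never visited, so you still don't need to keep around extensions you don't use.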

LocalisationUpdate extension
The LocalisationUpdate extension in particular is meant to slurp in updated localization files that have been pulled (from SVN or from git) and use their data as updates to what shipped with the current live code.

I *think* this should work the same with git as with SVN, with the caveat that pulling updates may be slightly harder to automate when there are a lot of separate repos. But it's still automatable, and the actual extension should work exactly the same.
 * Being the guy that de facto does the day-to-day operations of LocalisationUpdate (i.e. fixes it when it breaks), I agree that git support in LU will be easy. LU now uses two separate checkouts (one for trunk, one for extensions) to pull from on our setup, but there's no reason that couldn't scale up to a few hundred. In vanilla setups LU pulls updates over HTTP, but as long as there's an SVN-like interface that allows pulling the raw content of the latest trunk version of a messages file from a stable URL (like http://svn.wikimedia.org/svnroot/mediawiki/trunk/phase3/languages/messages/MessagesNl.php in our current SVN setup) it'll be fine. I know Github has this, for one. --Catrope 20:38, 22 March 2011 (UTC)
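As a rough illustration of the "stable URL" idea in the comment above, here is a sketch of fetching the raw latest version of a messages file over HTTP. This is not LocalisationUpdate's code. The base URL is the SVN one quoted above; the file naming rule (first letter upper-cased, '-' turned into '_') is an assumption based on that MessagesNl.php example, and a git host would need its own raw-file base URL.

```python
from urllib.request import urlopen

# Base URL from the SVN example quoted above. A git host would need an
# equivalent "raw file" base URL, which is an assumption here.
SVN_TRUNK = "http://svn.wikimedia.org/svnroot/mediawiki/trunk/phase3"


def messages_url(lang_code, base=SVN_TRUNK):
    """Build the URL of the messages file for a language code.

    Assumed naming rule: first letter upper-cased, '-' replaced by '_',
    so 'nl' maps to MessagesNl.php.
    """
    name = lang_code[:1].upper() + lang_code[1:].replace("-", "_")
    return f"{base}/languages/messages/Messages{name}.php"


def fetch_messages(lang_code, base=SVN_TRUNK, opener=urlopen):
    """Download the raw file contents as bytes."""
    with opener(messages_url(lang_code, base)) as response:
        return response.read()
```

Switching from SVN to git would then only mean passing a different base, as long as the host exposes the latest version of each file at a predictable URL.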

?
Any other outstanding issues?

Lessons Learned from Postgres's SVN-to-git Migration
from http://lwn.net/Articles/409635/


 * Assuming that there are any projects out there who have not yet switched to their distributed version control system of choice, here's a few things to learn from our migration:


 * Start with a Git mirror.
 * Designate a specific "Git migration team". Make sure they have lots of free time.
 * Your first attempt to migrate will probably fail, so you need to be prepared for more than one.
 * Changing your infrastructure, workflow, and build tool dependencies is harder than the repository conversion.
 * Make friends with the conversion tool authors.
 * Write lots of docs about the new tools and workflow.
 * The more history you have on your current system, the more work conversion is going to be.
 * Things which are broken in your current history are not going to fix themselves when you migrate.
 * When testing the conversion, make sure to look at more than HEAD and branch-tips.

Lessons from Drupal
Melissa Anderson of Drupal writes:


 * I'm not sure there's a better writeup than the PostgreSQL document at http://lwn.net/Articles/409635/ We benefited greatly from their experience and took it to heart.


 * It's worth noting that the Drupal community is still struggling with patch/merge commit/attribution issues and hasn't yet settled on process. You can see some of that discussion here: http://groups.drupal.org/node/148184


 * The bulk of the Git migration history can be found at http://groups.drupal.org/drupal-org-git-team.