Making Subversion faster

The Subversion protocol is extremely inefficient, especially in terms of the number of network round-trips that need to be completed in order to perform operations such as diff, merge and annotate. A fairly ordinary merge operation can take several minutes if you have a high latency link, as you would if you lived in, say, Australia.
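Here is a rough way to see why latency, rather than bandwidth, dominates. When each request has to wait for the previous reply, the total wait is about the number of round trips multiplied by the round-trip time. The figures below are illustrative assumptions, not measurements of any particular operation:

$$
T \approx n \cdot t_{\mathrm{RTT}} = 1000 \times 0.3\,\mathrm{s} = 300\,\mathrm{s} = 5\,\mathrm{min}
$$

At a round-trip time of 1 ms, the same 1000 round trips would take about one second. That is why working against a local copy of the repository helps so much.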

You can use ViewVC, which works for diffs and annotations, but it doesn't really help with merges.

Using svnsync
If you can afford 1.5 GB of disk space and you're going to be doing these operations regularly, you can make a local mirror of the repository. Here's how I set up my mirror of MediaWiki.

First, create a local repository:

sudo mkdir -p /var/svn/mediawiki
sudo chown tstarling /var/svn/mediawiki
svnadmin create /var/svn/mediawiki

Create hooks for start-commit and pre-revprop-change that do nothing, to make svnsync stop whinging about permissions. Windows users should create empty start-commit.cmd and pre-revprop-change.cmd files in the hooks directory.

cd /var/svn/mediawiki/hooks/
printf '#!/bin/bash\nexit 0\n' > start-commit
chmod 755 start-commit
cp start-commit pre-revprop-change
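If you mirror more than one repository, you can wrap the hook setup above in a small shell function. This is only a sketch. The name make_noop_hooks is hypothetical, and it writes /bin/sh scripts instead of /bin/bash ones because the hooks do nothing shell-specific.

```shell
# Hypothetical helper: write do-nothing start-commit and pre-revprop-change
# hooks into the given hooks directory so that svnsync stops complaining.
make_noop_hooks() {
    local hooks_dir hook
    hooks_dir=$1
    if [ ! -d "$hooks_dir" ]; then
        echo "make_noop_hooks: no such directory: $hooks_dir" >&2
        return 1
    fi
    for hook in start-commit pre-revprop-change; do
        # Each hook ignores its arguments and exits 0, i.e. allows everything
        printf '#!/bin/sh\nexit 0\n' > "$hooks_dir/$hook" || return 1
        chmod 755 "$hooks_dir/$hook" || return 1
    done
}
```

For example, "make_noop_hooks /var/svn/mediawiki/hooks" should leave the same two executable hooks as the commands above.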

Configure svnsync:

svnsync init file:///var/svn/mediawiki svn+ssh://svn.wikimedia.org/svnroot/mediawiki

Do the initial sync. This takes a while.

svnsync sync file:///var/svn/mediawiki

Then whenever you want to update your mirror, run that command again. It starts from where it left off.
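Because svnsync resumes from the last revision it copied, a dropped connection on a slow link should only cost the revision in progress, so it is reasonable to simply rerun it. Below is a minimal sketch of a retry wrapper. The names sync_mirror, its default arguments and the SVNSYNC override variable are all hypothetical and not part of svnsync. An interrupted run may also leave a stale lock behind, which this sketch does not try to handle.

```shell
# Hypothetical wrapper: rerun "svnsync sync" until it succeeds or we give up.
# Arguments: destination URL, maximum attempts, seconds between attempts.
# Set SVNSYNC to use a different svnsync binary (or a stub when testing).
sync_mirror() {
    local dest max_tries delay try
    dest=${1:-file:///var/svn/mediawiki}
    max_tries=${2:-5}
    delay=${3:-60}
    try=1
    while [ "$try" -le "$max_tries" ]; do
        # svnsync picks up from the last revision it copied
        if "${SVNSYNC:-svnsync}" sync "$dest"; then
            return 0
        fi
        echo "sync_mirror: attempt $try of $max_tries failed" >&2
        if [ "$try" -lt "$max_tries" ]; then
            sleep "$delay"
        fi
        try=$((try + 1))
    done
    return 1
}
```

For example, "sync_mirror file:///var/svn/mediawiki 10 30" would try up to ten times, thirty seconds apart.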

The general idea is never to check out a working copy of the local mirror. If you did, you might accidentally commit to it, and then svnsync's whinging about hooks would have been justified. Instead, I just diff and merge with URLs.

But URLs are slow to type, so I use the following shell function:

function mi {
    local dir trailing
    if [ -z "$1" ]; then
        dir="."
    else
        dir="$1"
    fi
    dir=$(readlink -f "$dir")
    # Strip the working-copy prefix; if nothing was stripped, we're outside it
    trailing=${dir#/home/tstarling/src/mediawiki/}
    if [ "$dir" == "$trailing" ]; then
        echo "No mirror available for $dir" >&2
        return 1
    fi
    echo "file:///var/svn/mediawiki/$trailing"
    return 0
}
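The paths in mi are hard-coded for one particular machine. The sketch below is a variant that reads them from two variables instead. MI_WC_ROOT and MI_MIRROR_URL are hypothetical names, and the defaults just follow the layout above. The variant also maps the working-copy root itself to the top of the mirror. The version above rejects the root, because the prefix it strips ends in a slash.

```shell
# Hypothetical variant of mi with configurable paths.
# MI_WC_ROOT: top of the working copy; MI_MIRROR_URL: URL of the matching mirror.
MI_WC_ROOT=${MI_WC_ROOT:-$HOME/src/mediawiki}
MI_MIRROR_URL=${MI_MIRROR_URL:-file:///var/svn/mediawiki}

mi() {
    local dir root trailing
    dir=$(readlink -f "${1:-.}") || return 1
    root=$(readlink -f "$MI_WC_ROOT") || return 1
    if [ "$dir" = "$root" ]; then
        # The working-copy root maps to the top of the mirror
        echo "$MI_MIRROR_URL"
        return 0
    fi
    # Strip "<root>/"; if nothing was stripped, dir is outside the working copy
    trailing=${dir#"$root"/}
    if [ "$dir" = "$trailing" ]; then
        echo "No mirror available for $dir" >&2
        return 1
    fi
    echo "$MI_MIRROR_URL/$trailing"
}
```

Like the original, this relies on GNU-style "readlink -f" to resolve relative paths and symlinks before comparing prefixes.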

So now instead of

cd includes
svn annotate Skin.php

You can type:

cd includes
svn annotate `mi Skin.php`

which is about a thousand times faster. For the current directory, omit the filename:

cd ..
svn diff -c 42767 `mi`
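"svn diff -c N" shows the change made in revision N, which "svn help diff" describes as being like "-r N-1:N". The hypothetical helper below only spells out that arithmetic. That can be handy when a script or tool accepts only "-r" ranges.

```shell
# Hypothetical helper: print the "-r" range equivalent to "svn diff -c N".
change_to_range() {
    local rev
    case $1 in
        ''|*[!0-9]*)
            echo "change_to_range: not a revision number: $1" >&2
            return 1 ;;
    esac
    # Strip leading zeros so arithmetic never treats the number as octal
    rev=${1#"${1%%[!0]*}"}
    if [ -z "$rev" ]; then
        echo "change_to_range: revision must be at least 1" >&2
        return 1
    fi
    echo "$((rev - 1)):$rev"
}
```

So "svn diff -r $(change_to_range 42767) $(mi)" should produce the same diff as the "-c 42767" command above.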

Using git-svn
Still another option (particularly on Linux and Mac, although git on Windows is reportedly usable these days) is git, a distributed version control system. You may want to read an introduction to git-svn and git-svn's documentation.

You can check out MediaWiki's SVN repository using Git by doing:

mkdir mw-git
cd mw-git
git svn init \
  --trunk trunk \
  --branches branches \
  --tags tags \
  http://svn.wikimedia.org/svnroot/mediawiki

If you have commit access, the last command should instead be:

git svn init \
  --trunk trunk \
  --branches branches \
  --tags tags \
  svn+ssh://svn.wikimedia.org/svnroot/mediawiki

Then, to fetch all the revisions, run:

git svn fetch

This will give you the full history of trunk, branches and tags to work on locally. The slight downside is that it takes about three days to complete.

To get the convenience of having everything without the unreasonable download time, you can piggyback off someone else's git-svn repository. Simetrical uses git-svn, so you can ask him for a copy. Copy it somewhere and you should be able to update it with "git svn rebase". Alternatively, you could check out just phase3 instead of everything. This is less convenient in the long term, but it might complete in mere hours if you're lucky.

The main reason to use git-svn is that you like git more than Subversion. You'll wind up with a git repository that you can use like any other. "git svn rebase" will fetch new commits for your current branch and rebase your changes on top of them. "git svn fetch" will seemingly fetch much more, including ridiculously long checkouts of new branches that you'll probably never look at. "git svn dcommit" will commit your changes. (Only if you have commit access, sorry. git is awesome, but not awesome enough to let you commit without commit access.)

Everything is absurdly fast, as usual for git, except for pulling in updates from SVN. That's absurdly slow, as in "go get some coffee while you wait if you haven't rebased in the last few days". And if someone makes a new branch, it takes approximately 1.47 eternities to fetch with git svn fetch. I don't know why this is so slow; I asked in #git, but they blamed it on SVN. Oh well. But you can use git.

Using SVK

 * Note that SVK has been end-of-life'd by its maintainer, Best Practical.

Another alternative is to use SVK, an advanced distributed version control system that works on top of Subversion. Instructions for SVK 2.2.0 (the latest version at the time of writing):

svk mirror svn+ssh://svn.wikimedia.org/svnroot/mediawiki //mirror/mediawiki
svk sync //mirror/mediawiki
svk checkout //mirror/mediawiki/trunk/phase3

All history operations are now local. If you want to go offline:

svk branch --offline

While "offline", you use  and   to sync with the master repository.

It would be possible to speed up mirroring tremendously once MediaWiki provides an svn dump of the repository. Then mirroring would be:

svk mirror --bootstrap=mediawiki-repo.svndump //mirror/mediawiki ...