Making Subversion faster

From MediaWiki.org

The Subversion protocol is extremely inefficient, especially in the number of network round-trips needed for operations such as diff, merge and annotate. A fairly ordinary merge can take several minutes over a high-latency link, as you would have if you lived in, say, Australia.

ViewVC is one option: it works for diffs and annotates, but it doesn't really work for merges.

Using svnsync

If you can afford 1.5 GB of disk space and you're going to be doing these operations regularly, you can make a local mirror of the repository. Here's how I set up my mirror of MediaWiki.

First, create a local repository:

sudo mkdir -p /var/svn/mediawiki
sudo chown tstarling /var/svn/mediawiki
svnadmin create /var/svn/mediawiki

Create hooks for start-commit and pre-revprop-change that do nothing, to make svnsync stop whinging about permissions. Windows users should create empty start-commit.cmd and pre-revprop-change.cmd files in the hooks directory.

cd /var/svn/mediawiki/hooks/
echo '#!/bin/bash
exit 0' > start-commit
chmod 755 start-commit
cp start-commit pre-revprop-change
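
Before running svnsync, it can be worth confirming that both hooks exist and are executable. The following is a minimal sketch, not part of the original setup: the function name check_sync_hooks is made up for this example, and it only inspects file permissions, not what the hooks actually do.

```shell
# Hypothetical helper: report whether the two hooks created above are
# present and executable in the given repository directory.
check_sync_hooks() {
	local repo="${1:-/var/svn/mediawiki}"
	local hook
	local status=0
	for hook in start-commit pre-revprop-change; do
		if [ -x "$repo/hooks/$hook" ]; then
			echo "ok: $hook"
		else
			echo "missing or not executable: $hook" >&2
			status=1
		fi
	done
	return $status
}
```

Running check_sync_hooks with no argument checks /var/svn/mediawiki; a non-zero exit status means at least one hook still needs fixing.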

Configure svnsync:

svnsync init file:///var/svn/mediawiki svn+ssh://svn.wikimedia.org/svnroot/mediawiki

Do the initial sync. This takes a while.

svnsync sync file:///var/svn/mediawiki

Then whenever you want to update your mirror, run that command again. It starts from where it left off.
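
Since updating is just a repeat of the same command, it is easy to wrap. The sketch below is one possible wrapper rather than anything the setup requires: the function name update_mirror is invented here, and it assumes the repository path used above. It uses svnlook youngest to show the mirror's newest revision before and after the sync.

```shell
# Hypothetical wrapper: sync the mirror and show how far it advanced.
update_mirror() {
	local repo="${1:-/var/svn/mediawiki}"
	local before after
	# Newest revision currently in the local mirror.
	before=$(svnlook youngest "$repo") || return 1
	# Same command as the initial sync; it resumes where it left off.
	svnsync sync "file://$repo" || return 1
	after=$(svnlook youngest "$repo") || return 1
	echo "mirror updated: r$before -> r$after"
}
```

If the two revision numbers match, there was nothing new to fetch.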

The general idea is to never check out a copy of the local mirror. If you check it out, you might accidentally change it and then svnsync's whinging about hooks would have been justified. Instead, I just diff and merge with URLs.

But URLs are slow to type, so I use the following shell function:

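function mi() {
	# Print the local mirror URL for a path inside the working copy.
	local dir trailing
	if [ -z "$1" ]; then
		dir="."
	else
		dir="$1"
	fi
	# Resolve to an absolute path, then strip the working copy prefix.
	dir=$(readlink -f "$dir")
	trailing=${dir#/home/tstarling/src/mediawiki/}
	# If nothing was stripped, the path is outside the working copy.
	if [ "$dir" = "$trailing" ]; then
		echo "No mirror available for $dir" >&2
		return 1
	fi
	echo "file:///var/svn/mediawiki/$trailing"
	return 0
}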

So now instead of

cd includes
svn annotate Skin.php

you can type:

cd includes
svn annotate `mi Skin.php`

which is about a thousand times faster. For the current directory, omit the filename:

cd ..
svn diff -c 42767 `mi`
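
The mi function above hard-codes one user's working copy path. As a sketch of how it might be adapted for other setups, the variant below reads both locations from variables. The names mi_url, MI_WC_ROOT and MI_MIRROR_URL are hypothetical, chosen for this example. Unlike mi, it also maps the working copy root itself to the mirror's root URL.

```shell
# Variant of mi with the two locations taken from variables instead of
# being hard-coded. MI_WC_ROOT and MI_MIRROR_URL are hypothetical names.
mi_url() {
	local root dir trailing
	local url="${MI_MIRROR_URL:-file:///var/svn/mediawiki}"
	# Resolve both paths so that symlinks do not defeat the prefix match.
	root=$(readlink -f "${MI_WC_ROOT:-$HOME/src/mediawiki}") || return 1
	dir=$(readlink -f "${1:-.}") || return 1
	if [ "$dir" = "$root" ]; then
		# The working copy root maps to the mirror root.
		echo "$url"
		return 0
	fi
	trailing=${dir#"$root"/}
	# If nothing was stripped, the path is outside the working copy.
	if [ "$dir" = "$trailing" ]; then
		echo "No mirror available for $dir" >&2
		return 1
	fi
	echo "$url/$trailing"
}
```

It is used the same way as mi, for example svn annotate "$(mi_url Skin.php)", once the two variables are set in your shell profile.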

Using SVK

Another option is SVK, an advanced distributed version control system that works on top of Subversion. Instructions for SVK 2.2.0 (the latest version at the time of writing):

svk mirror svn+ssh://svn.wikimedia.org/svnroot/mediawiki //mirror/mediawiki
svk sync //mirror/mediawiki
svk checkout //mirror/mediawiki/trunk/phase3

All history operations are now local. If you want to go offline:

svk branch --offline

While "offline", you use svk push and svk pull to sync with the master repository.

Mirroring could be sped up tremendously if MediaWiki provided an svn dump of the repository. Mirroring would then be:

svk mirror --bootstrap=mediawiki-repo.svndump //mirror/mediawiki ...