Making Subversion faster

From mediawiki.org

The Subversion protocol is extremely inefficient, especially in terms of the number of network round-trips that need to be completed in order to perform operations such as diff, merge and annotate. A fairly ordinary merge operation can take several minutes if you have a high latency link, as you would if you lived in, say, Australia.

It's possible to use viewvc instead, which works for diffs and annotates, but it doesn't really work for merges.

Using svnsync

If you can afford 1.5 GB of disk space and you're going to be doing these operations regularly, you can make a local mirror of the repository. Here's how I set up my mirror of MediaWiki.

First, create a local repository:

sudo mkdir -p /var/svn/mediawiki
sudo chown tstarling /var/svn/mediawiki
svnadmin create /var/svn/mediawiki

Create hooks for start-commit and pre-revprop-change that do nothing, to make svnsync stop whinging about permissions. Windows users should create empty start-commit.cmd and pre-revprop-change.cmd files in the hooks directory.

cd /var/svn/mediawiki/hooks/
echo '#!/bin/bash
exit 0' > start-commit
chmod 755 start-commit
cp start-commit pre-revprop-change
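
If you keep more than one mirror, the same steps can be wrapped in a function. This is only a sketch: install_noop_hooks is a hypothetical helper name, not part of Subversion, and it assumes the repository has already been created with svnadmin create.

```shell
# Sketch: install do-nothing hooks in an existing repository.
# install_noop_hooks is a hypothetical helper, not a Subversion command.
install_noop_hooks() {
	local hooks="$1/hooks" name
	if [ ! -d "$hooks" ]; then
		echo "No hooks directory in $1" >&2
		return 1
	fi
	for name in start-commit pre-revprop-change; do
		# A hook that exits 0 allows the operation it guards.
		printf '#!/bin/sh\nexit 0\n' > "$hooks/$name"
		chmod 755 "$hooks/$name"
	done
}
```

For the mirror on this page, that would be called as install_noop_hooks /var/svn/mediawiki.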

Configure svnsync:

svnsync init file:///var/svn/mediawiki svn+ssh://svn.wikimedia.org/svnroot/mediawiki

Do the initial sync. This takes a while.

svnsync sync file:///var/svn/mediawiki

Then whenever you want to update your mirror, run that command again. It starts from where it left off.
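
svnsync keeps its bookkeeping in revision properties on revision 0 of the mirror, which is how it knows where it left off. Here is a minimal sketch of an update wrapper that also reports the last revision copied. update_mirror and MIRROR_URL are hypothetical names, and the default URL is simply the one used above.

```shell
# Sketch: update the mirror, then report how far it has synced.
# MIRROR_URL is an assumed variable; override it for a different mirror.
MIRROR_URL=${MIRROR_URL:-file:///var/svn/mediawiki}

update_mirror() {
	local rev
	# svnsync resumes from the last revision it copied.
	svnsync sync "$MIRROR_URL" || return 1
	# Progress is stored in this revision property on revision 0.
	rev=$(svn propget --revprop -r 0 svn:sync-last-merged-rev "$MIRROR_URL") || return 1
	echo "Mirror is at r$rev"
}
```

Running update_mirror from a cron job is one way to keep the mirror reasonably fresh without thinking about it.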

The general idea is to never check out a copy of the local mirror. If you check it out, you might accidentally change it and then svnsync's whinging about hooks would have been justified. Instead, I just diff and merge with URLs.

But URLs are slow to type, so I use the following shell function:

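function mi() {
	# Print the local-mirror URL for a path inside the working copy.
	# With no argument, use the current directory.
	local dir trailing
	dir=$(readlink -f "${1:-.}")
	# Strip the working-copy prefix. If nothing was stripped, the path is outside it.
	trailing=${dir#/home/tstarling/src/mediawiki/}
	if [ "$dir" == "$trailing" ]; then
		echo "No mirror available for $dir" >&2
		return 1
	fi
	echo "file:///var/svn/mediawiki/$trailing"
	return 0
}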

So now instead of

cd includes
svn annotate Skin.php

you can type:

cd includes
svn annotate `mi Skin.php`

which is about a thousand times faster. For the current directory, omit the filename:

cd ..
svn diff -c 42767 `mi`
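
The mi function above hard-codes my paths. A variant that reads them from variables might look like the sketch below. mirror_url, MI_WC_ROOT and MI_MIRROR_URL are hypothetical names, the defaults are the paths used on this page, and readlink -f is assumed to be the GNU version. Unlike mi, it also maps the working-copy root itself to the mirror root.

```shell
# Sketch: like mi, but with configurable paths.
# MI_WC_ROOT and MI_MIRROR_URL are assumed variable names.
mirror_url() {
	local root=${MI_WC_ROOT:-/home/tstarling/src/mediawiki}
	local mirror=${MI_MIRROR_URL:-file:///var/svn/mediawiki}
	local path
	# Resolve symlinks on both sides so the prefix comparison is reliable.
	path=$(readlink -f "${1:-.}") || return 1
	root=$(readlink -f "$root") || return 1
	case "$path" in
		"$root")
			echo "$mirror" ;;
		"$root"/*)
			echo "$mirror/${path#"$root"/}" ;;
		*)
			echo "No mirror available for $path" >&2
			return 1 ;;
	esac
}
```

It is used the same way as mi, for example svn annotate "$(mirror_url Skin.php)".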

Using git-svn

See Git

Using SVK

Note that SVK has been end-of-life'd by its maintainer, Best Practical.

Another alternative is to use SVK, an advanced distributed version control system that works on top of Subversion. Instructions for SVK 2.2.0 (the latest version when this was written):

svk mirror svn+ssh://svn.wikimedia.org/svnroot/mediawiki //mirror/mediawiki
svk sync //mirror/mediawiki
svk checkout //mirror/mediawiki/trunk/phase3

All history operations are now local. If you want to go offline:

svk branch --offline

While "offline", you use svk push and svk pull to sync with the master repository.

Mirroring could be sped up tremendously if MediaWiki provided an svn dump of the repository. Mirroring would then be:

svk mirror --bootstrap=mediawiki-repo.svndump //mirror/mediawiki ...