Git/Conversion/rewriting

This article gathers notes that Antoine "hashar" Musso wrote during week 11 of 2012 while experimenting with various git repository optimizations. All sha1s referenced below are based on the mediawiki/core.git repository as it was during that week. To find those commits later on, refer to the commit date where one is provided.

Empty commits
We probably want to prune empty commits. They are svn commits that only changed svn metadata such as svn:eol-style or svn:keywords. Those changes are not replicated in git, so the commits end up being null commits. To filter them out, use git filter-branch, which has an option to do just that: --prune-empty.

git filter-branch --prune-empty

TODO: need to check which branches it applies to. I guess the current branch only, so we probably want to filter everything with --all?

git filter-branch --prune-empty -- --all
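To see what --prune-empty would remove before running it for real, here is a minimal sketch on a throwaway repository. The repository contents, the demo identity and the count_empty helper are made up for illustration. It assumes that an "empty" commit is a non-merge commit whose tree is identical to its first parent's tree, and it passes --all after a `--` separator so that it is handed over as a revision option.

```shell
#!/bin/sh
# Sketch: count empty commits in a scratch repository, prune them, count again.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org
export FILTER_BRANCH_SQUELCH_WARNING=1   # recent git versions warn and pause otherwise

repo=$(mktemp -d)
cd "$repo"
git init -q .
echo one > file.txt && git add file.txt && git commit -q -m 'real change'
git commit -q --allow-empty -m 'svn:eol-style only'   # stands in for a metadata-only svn commit
echo two >> file.txt && git commit -q -a -m 'another real change'

# Hypothetical helper: a non-root, non-merge commit is counted as empty
# when its tree is identical to the tree of its parent.
count_empty() {
    git rev-list --no-merges HEAD | while read -r c; do
        parent=$(git rev-parse -q --verify "$c^" 2>/dev/null) || continue
        if [ "$(git rev-parse "$c^{tree}")" = "$(git rev-parse "$parent^{tree}")" ]; then
            echo "$c"
        fi
    done | wc -l | tr -d ' '
}

before=$(count_empty)
git filter-branch --prune-empty -- --all >/dev/null 2>&1
after=$(count_empty)
echo "empty commits before: $before, after: $after"
```

The helper walks HEAD rather than --all on purpose: filter-branch keeps the old history under refs/original/, so the pruned commits would still be reachable through --all.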

ExtraParserTests.php 8MB string
We had a bug in the parser that segfaulted PHP whenever a string was more than some huge length.

Find the commit by Platonides on Fri May 28 14:16:46 2010 +0000

At the time its sha1 was 3b150034a08872e858684c604d37ca92b58b920a
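Since sha1s change after every rewrite, the commit may have to be looked up again by path rather than by hash. The following is a rough sketch on a scratch repository; the file contents and sizes are invented, and only the path name is borrowed from these notes. It assumes the commit of interest is the one that added the file.

```shell
#!/bin/sh
# Sketch: find the commit that added a path and show how the blob size evolved.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

path=maintenance/ExtraParserTests.txt
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo base > README && git add README && git commit -q -m 'initial'
mkdir maintenance
head -c 100000 /dev/zero | tr '\0' 'x' > "$path"      # stand-in for the 8MB string
git add maintenance && git commit -q -m 'add huge test file'
echo small > "$path" && git commit -q -a -m 'shrink it again'

# --diff-filter=A keeps only the commits that added the path.
added_in=$(git log --diff-filter=A --format='%H' -- "$path")
git log -1 --date=iso --format='%h %an %ad %s' "$added_in"

# Size in bytes of the file in every commit that touched it, newest first.
git log --format='%H' -- "$path" | while read -r c; do
    printf '%s %s\n' "$(git rev-parse --short "$c")" "$(git cat-file -s "$c:$path")"
done
```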

Start an interactive rebase from the parent of that commit:

$ git rebase 3b150034a08872e858684c604d37ca92b58b920a^ --interactive 

In the text editor you will see a line with:

pick 3b15003 (bug 8689) Use strict php comparison, so that inserting a long numeric line doesn't produce a fatal error when php tries to convert it to a number.

Replace the 'pick' at the beginning of that line with 'edit'. Leave the rest untouched. Save and quit.

edit 3b15003 (bug 8689) Use strict php comparison, so that inserting a long numeric line doesn't produce a fatal error when php tries to convert it to a number.

The rebase starts, then stops at that commit:

Stopped at 3b15003... (bug 8689) ...

You are now at the commit that introduced the huge file. Edit maintenance/ExtraParserTests.txt to get rid of the long strings. In vim I searched with /^!! then moved down and deleted the line with shift+D. We need to edit both the !!input and !!result sections.

Check that the file has a sane size (wc -c):

$ wc -c maintenance/ExtraParserTests.txt
1367 maintenance/ExtraParserTests.txt

Tell the rebase that we want to keep this change: git add maintenance/ExtraParserTests.txt

Amend the commit message. WARNING: make sure the edited commit message states that the file was rewritten!

git rebase --continue

Let the rebasing job continue.
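The edit step above can also be scripted, which helps when the same rewrite has to be replayed on a fresh conversion. This is only a sketch on a scratch repository with invented file names and messages. It assumes GNU sed (for -i) and relies on GIT_SEQUENCE_EDITOR to turn the first 'pick' of the todo list into 'edit'. It uses git commit --amend -m to set the new message without opening an editor, which differs slightly from the manual flow described in these notes.

```shell
#!/bin/sh
# Sketch: non-interactive "edit" of one commit during an interactive rebase.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
echo base > README && git add README && git commit -q -m 'initial'
head -c 100000 /dev/zero | tr '\0' 'x' > big.txt       # stand-in for the huge file
git add big.txt && git commit -q -m 'add huge file'
target=$(git rev-parse HEAD)
echo more >> README && git commit -q -a -m 'later work'

# The first todo line is the target commit: switch it from "pick" to "edit".
GIT_SEQUENCE_EDITOR="sed -i -e '1s/^pick/edit/'" \
    git rebase -q --interactive "$target^"

# The rebase is now stopped at the target commit: shrink the file and amend.
echo 'short replacement' > big.txt
git add big.txt
git commit -q --amend -m 'add huge file (rewritten: long string removed)'

# Replay the remaining commits on top of the amended one.
git rebase --continue

git log --format='%h %s'
```

The old blob stays in the object database (reflog, and any other ref still pointing at the old history) until it is expired and garbage collected, so the repository does not shrink immediately.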

Empty commits
Sometimes you will get an error about a cherry-pick being empty. These are usually commits that only altered svn metadata. We can still safely commit them using:

git commit --allow-empty

Then continue: git rebase --continue

Fix rebase conflict
Later in the history the huge commit was reverted, so the rebase will choke on that revert commit:

warning: Cannot merge binary files: maintenance/ExtraParserTests.txt (HEAD vs. d85f3ca... Bug 23715: replaced 8-megabyte txt file that crashed many text editors with one str_repeat call)

That is because we have modified it previously:

$ git status
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#	new file:   maintenance/tests/ExtraParserTest.php
# Unmerged paths:
#   (use "git reset HEAD <file>..." to unstage)
#   (use "git add/rm <file>..." as appropriate to mark resolution)
#	both modified:      maintenance/ExtraParserTests.txt

Edit the file and manually remove the last test, just like r67091 did. For reference, see https://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/maintenance/ExtraParserTests.txt?view=markup&pathrev=67091. Note that you can probably download that new version of the file instead of editing it by hand:

curl 'https://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/maintenance/ExtraParserTests.txt?revision=67091&view=co&pathrev=67091' \
  > maintenance/ExtraParserTests.txt
git add ..

$ git rebase --continue
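If the version recorded by the conflicting commit is the one we want, an alternative to downloading it is to take it straight from that commit. The sketch below reproduces the situation on a scratch repository with invented names and contents. It assumes GNU sed, and it relies on the fact that during a rebase --theirs designates the commit being replayed. This is a swapped-in technique, not what was done during the experiment described here.

```shell
#!/bin/sh
# Sketch: resolve a rebase conflict by keeping the replayed commit's version.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
echo base > README && git add README && git commit -q -m 'initial'
head -c 100000 /dev/zero | tr '\0' 'x' > tests.txt
git add tests.txt && git commit -q -m 'add huge file'
target=$(git rev-parse HEAD)
echo 'str_repeat replacement' > tests.txt && git commit -q -a -m 'replace huge file'

# Edit the commit that added the huge file, as in the manual procedure.
GIT_SEQUENCE_EDITOR="sed -i -e '1s/^pick/edit/'" \
    git rebase -q --interactive "$target^"
echo 'hand-trimmed' > tests.txt && git add tests.txt
git commit -q --amend -m 'add huge file (rewritten)'

# The next commit touches the same file, so this stops with a conflict.
git rebase --continue >/dev/null 2>&1
git status --short

# Keep the content of the commit being replayed, then carry on.
git checkout --theirs -- tests.txt
git add tests.txt
GIT_EDITOR=true git rebase --continue   # "true" accepts the original message as is
```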

Whenever it chokes on an empty svn eol-style commit, just run:

$ git commit --allow-empty && git rebase --continue

This could be avoided by first running a filtering job with --prune-empty, since those commits would no longer exist.

Empty commit messages
While doing the rebase described above, we met commits with an empty commit message. They emit a warning and block the rebase:

Aborting commit due to empty commit message. Could not apply 82939e2...

That commit is r70319 by Markus Glaser on 2010-08-02 07:44. A comment in code review has the correct commit message:

make the framework work with phpunit 3.0.6. credits to priyanka dhanda

So just run 'git rebase --continue' and type that commit message in the text editor.

And continue ...

$ git commit --allow-empty && git rebase --continue

Then we hit a file rename: just add the new file name and continue. Finally the file was deleted: git rm it and continue :)

r81483 has an empty commit message too. Should be:

--- Few minor fixups to r81440: ---
 * Removed leftover var_export
 * Couple of FALSE/TRUE/NULL to lowercase
 * canCreateAccounts returns a bool, don't treat it like a status.

6626ccc r83859 Reedy 2011-03-13 has empty message:

--- (bug 26629) add Special:MIMESearch to api ---

328ee7b r85922 2011-04-12 21:27:24 --- Implement tbody, thead & tfoot. Fixes Bug 4740 Original patch by bluehairedlawyer@gmail.com, rewritten at some places by me. ParserTests are changed to accommodate the new elements, but no parserTest `logic` is changed ---

8a40922 r87940 2011-05-12 23:40:53 --- ---

cea68c7 r88345 2011-05-17 20:25:48 --- Localisation updates for core and extension messages from translatewiki.net (2011-05-17 20:24:00 UTC) ---

1716dc7a r90007 2011-06-13 20:17:39 --- Release notes for r89995 ---

1f58d86 r91107 2011-06-29  20:25:45 --- Localisation updates for core and extension messages from translatewiki.net (2011-06-29)  ---
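Rather than discovering these commits one by one as the rebase stops, they could be listed up front. A minimal sketch on a scratch repository follows; the demo commits are invented. It assumes a message counts as empty when it holds nothing but whitespace. If the conversion appended a git-svn-id line to every message, the test would have to ignore that line, which is not handled here.

```shell
#!/bin/sh
# Sketch: list commits whose message is entirely blank.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

repo=$(mktemp -d)
cd "$repo"
git init -q .
echo a > f && git add f && git commit -q -m 'first'
echo b >> f && git commit -q -a --allow-empty-message -m ''   # the commit to find
echo c >> f && git commit -q -a -m 'third'

# %B is the raw commit message; strip all whitespace and test what is left.
empty_msgs=$(git rev-list HEAD | while read -r c; do
    if [ -z "$(git log -1 --format=%B "$c" | tr -d '[:space:]')" ]; then
        git log -1 --date=short --format='%h %ad %an' "$c"
    fi
done)
echo "$empty_msgs"
```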

Appendices
TODO:

Fix the committer name, email and date to be the same as the author's: http://stackoverflow.com/a/5520650/639804

git filter-branch --commit-filter 'export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME"; export GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"; export GIT_COMMITTER_DATE="$GIT_AUTHOR_DATE"; git commit-tree "$@"' -- basecommit..HEAD

Check with:  git log --format=fuller
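The check can be automated instead of reading the fuller log by eye. Here is a rough sketch on a scratch repository; the identities, dates and the mismatches helper are invented for the demo. It runs the same commit filter over HEAD, because basecommit in the command above is a placeholder, and counts the commits whose committer fields differ from the author fields before and after.

```shell
#!/bin/sh
# Sketch: count commits where committer name/email/date differ from the author's.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # recent git versions warn and pause otherwise
export GIT_AUTHOR_NAME=alice GIT_AUTHOR_EMAIL=alice@example.org
export GIT_COMMITTER_NAME=importer GIT_COMMITTER_EMAIL=importer@example.org

repo=$(mktemp -d)
cd "$repo"
git init -q .
echo a > f && git add f
GIT_AUTHOR_DATE='2010-05-28 14:16:46 +0000' git commit -q -m 'first'
echo b >> f
GIT_AUTHOR_DATE='2010-08-02 07:44:00 +0000' git commit -q -a -m 'second'

# Hypothetical helper: print the number of commits with a mismatch.
mismatches() {
    git log --format='%h|%an|%ae|%at|%cn|%ce|%ct' HEAD |
        awk -F'|' '$2 != $5 || $3 != $6 || $4 != $7 { print $1 }' |
        wc -l | tr -d ' '
}

before=$(mismatches)
git filter-branch --commit-filter '
    export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME"
    export GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
    export GIT_COMMITTER_DATE="$GIT_AUTHOR_DATE"
    git commit-tree "$@"' -- HEAD >/dev/null 2>&1
after=$(mismatches)
echo "mismatching commits before: $before, after: $after"
```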

Files in ./extensions/ /extensions