Note: The below was in draft for quite some time. We actually moved to git so I didn’t follow this to its ultimate conclusion. I effectively aborted this after the conversion to hg consumed the 220GB of available disk space before it managed to convert the entire svn repo. I didn’t bother increasing the disk space since lugging around a 220+GB repo wasn’t in any way practical.
In any case some of the following may prove useful so I’m publishing it as is and as far as I got.
We are currently looking at migrating our subversion repository to mercurial including all the history and for some reason this seemed harder than it was in the end. Maybe this post will help someone out, so here you go:
Our repository has close to 25,000 revisions and a checkout is approx. 1GB in size. Most conversion guides recommend creating a local copy of your repository with svnsync first, so this is what I did (on a Windows machine):
First I installed TortoiseHg from the mercurial website (the all in one 64bit installer) and TortoiseSvn with the command line tools.
Creating a local subversion clone:
cd C:\
mkdir repos
cd repos
svnadmin create software-mirror
echo 'exit 0' > software-mirror/hooks/pre-revprop-change.bat
svnsync init file:///c:/repos/software-mirror svn://myserver/software
svnsync sync file:///c:/repos/software-mirror
This took about 4h. Interestingly I also did this on a Linux machine running Ubuntu 12.10 and it only took half the time (same VM hardware specs, same network, same VM server).
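For reference, a rough Linux equivalent of the mirror setup might look like the sketch below. The function names and the mirror path in the usage note are made up for illustration; the one real difference from the Windows version is that the hook has no .bat extension and needs a shebang line plus the executable bit.

```shell
# Sketch only: function names and paths are illustrative assumptions.

# Create the hook that allows svnsync to set revision properties on the mirror.
# On Linux the hook needs a shebang line and the executable bit.
write_revprop_hook() {
    mirror=$1
    hook="$mirror/hooks/pre-revprop-change"
    printf '#!/bin/sh\nexit 0\n' > "$hook"
    chmod +x "$hook"
}

# Mirror a remote repository into a new local one.
# Requires svnadmin and svnsync; $1 must be an absolute path.
mirror_repo() {
    mirror=$1
    source_url=$2
    svnadmin create "$mirror"
    write_revprop_hook "$mirror"
    svnsync init "file://$mirror" "$source_url"
    svnsync sync "file://$mirror"
}
```

It would be called along the lines of mirror_repo "$HOME/repos/software-mirror" svn://myserver/software.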
Now we can go on to convert the repository. When searching the Internet the first way of doing it I came across was the convert extension. So, open TortoiseHg, enable the convert extension and run:
cd C:\repos
hg convert software-mirror
After 48h of running it had managed to convert 2,000 revisions, the process was using 2.8GB of RAM (with peaks at 3.5GB) and it had created close to 2 million(!) files. WTF? So convert: FAIL.
I did some more research on the web during that time and came across multiple posts saying that converting large repositories with convert might not be so good as it a) might do the wrong thing (i.e. wrong commits on the wrong branches) and b) might fail anyway with an out of memory exception (although on a 64bit system it seems like it might just go into swap hell at some point). The alternative suggested was hgsubversion.
hgsubversion needs to be installed separately as it is not bundled but that proved fairly painless even on Windows:
cd C:\
mkdir hgext
cd hgext
hg clone http://bitbucket.org/durin42/hgsubversion hgsubversion
And add the extension to your mercurial.ini. For Windows 7+ (probably even Vista+) this should be located under
[extensions]
hgsubversion = C:\hgext\hgsubversion\hgsubversion
Now we should be able to clone a subversion repository as a mercurial repository. I combined two suggestions I had come across: clone only the first revision and then use pull to load the remainder of the revisions:
cd c:\repos
hg clone -r1 --config hgsubversion.defaulthost=mycompany.com file:///c:/repos/software-mirror software-hg
cd software-hg
hg pull
Our subversion usernames can be easily mapped to email addresses as firstname.lastname@mycompany.com, hence the defaulthost setting. The pull made good and fast progress (it took only 5min to pull the first 2,000 revisions compared to the 48h for the convert extension). Unfortunately after 7,500 revisions the pull failed with “trying to open a deleted file”. Huh? The revision in question was a tag which was no different from all the other tags (our build machine automatically tags all builds). Now I don’t really care about that specific tag but unfortunately there is no way to instruct hg pull to skip this revision. So hgsubversion: FAIL.
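A side note on the defaulthost setting: in the commands above it is only passed to the clone via --config, not to the later pull. I have not verified whether hgsubversion remembers it, so one cautious option is to write it into the repository's own hgrc, where every later command picks it up. The helper name below is made up; the section and key are simply the ones from the --config argument.

```shell
# Sketch: persist the hgsubversion defaulthost setting in a repository's hgrc.
# set_defaulthost is a hypothetical helper, not part of Mercurial or hgsubversion.
set_defaulthost() {
    repo=$1   # path to the Mercurial working copy (the directory containing .hg)
    host=$2   # domain appended to Subversion usernames, e.g. mycompany.com
    printf '\n[hgsubversion]\ndefaulthost = %s\n' "$host" >> "$repo/.hg/hgrc"
}
```

For the repository above this would be set_defaulthost software-hg mycompany.com, run once after the initial clone.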
Now, what other options do we have? I guess I could try to skip the revision in question when doing the svnsync in the first place but I decided to try something else: There is this fast-import format which seems to be emerging as a repository independent exchange format. So why not do it this way?
Unfortunately there does not seem to be a good tool around which creates fast-import dumps from a subversion repository. Here is what I looked at:
- svnadmin and svnrdump do not produce dump files in the correct format.
- There is a tool in the bazaar tool chain which supposedly can do this. Every piece of documentation claims that there is a frontend for subversion you should be able to use like this: bzr fast-export-from-svn, but I could not get it to work. All I ever got was “there is no such command” (while bzr help fast-export would show something meaningful, the documentation states that this is for generating fast-import streams from a bazaar repository). All the documentation says that the frontends are in the “exporters” subdirectory of the plugin but there is no such subdirectory (try bzr branch lp:bzr-fastimport fastimport yourself and check). So in short: I could not get this to work.
- There is a tool for migrating from subversion to git. I switched to a Linux machine at this point as most instructions are for that and I could not get most of the tools working under Windows. Unfortunately it died with a segfault on importing the second revision.
- There is another tool which supposedly can do the job, but I haven’t tried that yet.
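In case the fast-import format is unfamiliar: it is a plain text stream of commands (blob, commit, and so on) in which every payload is announced by a "data" line carrying its exact byte count. The sketch below prints a minimal stream with one file and one commit. The author name, email address, timestamp, branch and file name are all placeholders made up for illustration.

```shell
# Sketch: emit a minimal fast-import stream (one blob, one commit).
# All names, the email address and the timestamp are placeholders.
emit_stream() {
    content='hello'             # file body; a trailing newline is added below
    message='Initial import'    # commit message

    # "data <n>" announces exactly n bytes of payload ("hello" plus newline).
    printf 'blob\nmark :1\ndata %d\n%s\n' "$(( ${#content} + 1 ))" "$content"

    printf 'commit refs/heads/master\nmark :2\n'
    printf 'committer Jane Doe <jane.doe@mycompany.com> 1357000000 +0000\n'
    printf 'data %d\n%s\n' "${#message}" "$message"

    # Attach the blob marked :1 to the commit as a regular file.
    printf 'M 100644 :1 hello.txt\n\n'
}
```

Piping the output of emit_stream into git fast-import inside a freshly initialised git repository should produce a single commit containing hello.txt, which is a quick way to get a feel for what an exporter has to generate.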
After not really getting anywhere I followed a hunch: during the initial svnsync the Windows 8 VM I used to test all of this went to sleep several times due to the default power settings of Windows 8. So I killed the software-mirror and ran svnsync again, this time on a Linux VM (as for some reason svn tools performance seems to be much better under Linux), and made sure it ran uninterrupted. Then I used hgsubversion again and it got past the revision it had choked on earlier. Hmm, weird.
At some point I realized that Ubuntu 12.10 ships with Mercurial 2.2 while on Windows I used 2.5 with the latest hgsubversion clone. After upgrading to Mercurial 2.5 and checking out the latest hgsubversion from bitbucket I ran into the same “trying to open a deleted file” problem at the same revision again. Coincidentally a little while later someone posted a bug report of exactly this problem.
Anyway, I continued the pull with Mercurial 2.2 and everything seemed fine until it got to approx. 18,000 revisions (which took close to 20h and Mercurial ballooned to 5.5GB of memory usage) where it failed with an AssertionError in subvertpy:
** unknown exception encountered, please report by visiting
** http://mercurial.selenic.com/wiki/BugTracker
** Python 2.7.3 (default, Sep 26 2012, 21:51:14) [GCC 4.7.2]
** Mercurial Distributed SCM (version 2.2.2)
** Extensions loaded: fastimport, hgsubversion
Traceback (most recent call last):
  File "/usr/bin/hg", line 38, in <module>
    mercurial.dispatch.run()
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 27, in run
    sys.exit((dispatch(request(sys.argv[1:])) or 0) & 255)
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 64, in dispatch
    return _runcatch(req)
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 87, in _runcatch
    return _dispatch(req)
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 696, in _dispatch
    cmdpats, cmdoptions)
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 472, in runcommand
    ret = _runcommand(ui, options, cmd, d)
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 786, in _runcommand
    return checkargs()
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 757, in checkargs
    return cmdfunc()
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 693, in <lambda>
    d = lambda: util.checksignature(func)(ui, *args, **cmdoptions)
  File "/usr/lib/python2.7/dist-packages/mercurial/util.py", line 463, in check
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/mercurial/extensions.py", line 139, in wrap
    util.checksignature(origfn), *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/mercurial/util.py", line 463, in check
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/hgext/hgsubversion/wrappers.py", line 538, in generic
    return orig(ui, repo, *args, **opts)
  File "/usr/lib/python2.7/dist-packages/mercurial/util.py", line 463, in check
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/mercurial/commands.py", line 4458, in pull
    modheads = repo.pull(other, heads=revs, force=opts.get('force'))
  File "/usr/lib/python2.7/dist-packages/hgext/hgsubversion/svnrepo.py", line 76, in wrapper
    return fn(self, *args, **opts)
  File "/usr/lib/python2.7/dist-packages/hgext/hgsubversion/svnrepo.py", line 99, in pull
    return wrappers.pull(self, remote, heads, force)
  File "/usr/lib/python2.7/dist-packages/hgext/hgsubversion/wrappers.py", line 358, in pull
    firstrun)
  File "/usr/lib/python2.7/dist-packages/hgext/hgsubversion/replay.py", line 67, in convert_rev
    svn.get_replay(r.revnum, editor, meta.revmap.oldest)
  File "/usr/lib/python2.7/dist-packages/hgext/hgsubversion/svnwrap/subvertpy_wrapper.py", line 422, in get_replay
    self.remote.replay(revision, oldestrev, AbstractEditor(editor))
  File "/usr/lib/python2.7/dist-packages/hgext/hgsubversion/editor.py", line 357, in txdelt_window
    handler(window)
  File "/usr/lib/python2.7/dist-packages/subvertpy/delta.py", line 84, in apply_window
    target_stream.write(apply_txdelta_window(sbuf, window))
  File "/usr/lib/python2.7/dist-packages/subvertpy/delta.py", line 57, in apply_txdelta_window
    raise AssertionError("%d != %d" % (len(tview), tview_len))
AssertionError: 473 != 474
Oh well, maybe a bug in an older version. As I was past the dreaded “trying to open a deleted file” revision I upgraded to Mercurial 2.5 and ran again – same problem. However this time there was a helpful message appended:
Your SVN repository may not be supplying correct replay deltas. It is strongly
advised that you repull the entire SVN repository using hg pull --stupid.
Alternatively, re-pull just this revision using --stupid and verify that the
changeset is correct.
OK, let’s try
hg pull -r 17890 --stupid
And it broke:
ValueError: 20-byte hash required
After some research into the issue I came across this bug report on bitbucket which essentially says: “-r doesn’t work like that with svn repositories, try url#revision instead”.
hg pull file://`pwd`/software-mirror#17890 --stupid
ran into the same problem. Alright, let’s do it without a specific revision.
hg pull --stupid
This seemed to work. So I aborted it once it got past the bad revision and continued with a normal pull (without --stupid), and that got it going again.
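The manual dance of "normal pull, fall back to --stupid for a while, go back to a normal pull" could in principle be scripted. The following is only a sketch under a few assumptions: the function name, the round limit and the time budget are made up, timeout is the GNU coreutils tool, and interrupting a --stupid pull part way is assumed to be as harmless as it was when I aborted it by hand.

```shell
# Sketch: alternate between normal and --stupid pulls until a normal pull succeeds.
# pull_with_fallback, the round limit and the time budget are hypothetical.
HG=${HG:-hg}

pull_with_fallback() {
    max_rounds=${1:-5}       # give up after this many failed normal pulls
    stupid_budget=${2:-30m}  # how long each --stupid pull may run
    round=0
    while [ "$round" -lt "$max_rounds" ]; do
        if "$HG" pull; then
            return 0         # the normal pull ran through to the end
        fi
        # Run the slow --stupid pull for a limited time, then interrupt it
        # (like pressing Ctrl-C) and go back to the normal pull.
        timeout -s INT "$stupid_budget" "$HG" pull --stupid
        round=$((round + 1))
    done
    return 1
}
```

Run from inside the converted repository, for example as pull_with_fallback 5 30m. The budget has to be long enough for the --stupid pull to get past the problematic revision, otherwise the next normal pull just fails at the same place again.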