Migrating from subversion to mercurial

Note: The below was in draft for quite some time. We actually moved to git so I didn’t follow this to it’s ultimate conclusion. I effectively aborted this after the conversion to hg consumed the 220GB of available disk space before it manage to completely convert the entire svn repo. I didn’t bother increasing the disk space since lugging around a 220+GB repo wasn’t in any way practical.

In any case some of the following may proof useful so I’m publishing it as is and as far as I got.


We are currently looking at migrating our subversion repository to mercurial including all the history and for some reason this seemed harder than it was in the end. Maybe this post will help someone out, so here you go:

Our repository has close to 25,000 revisions and checked out is approx. 1GB in size. Most ways of converting it recommend to create a local copy of your repository with svnsync first so this is what I did (on a windows machine):

First I installed TortoiseHg from the mercurial website (the all in one 64bit installer) and TortoiseSvn with the command line tools.

Creating a local subversion clone:

cd C:\
mkdir repos
cd repos
svnadmin create software-mirror
echo 'exit 0' > software-mirror/hooks/pre-revprop-change.bat
svnsync init file:///c:/repos/software-mirror svn://myserver/software
svnsync sync file:///c:/repos/software-mirror

This took about 4h. Interestingly I also did this on a Linux machine running Ubuntu 12.10 and it only took half the time (same VM hardware specs, same network, same VM server).

Now we can go on to convert the repository. When searching the Internet the first way of doing it I came across was the convert extension. So, open TortoiseHg, enable the convert extension and run:

cd C:\repos
hg convert software-mirror

Now after 48h of running it managed to convert 2,000 revisions, the process was using 2.8GB of RAM (with peaks at 3.5GB) and has created close to 2 million(!) files – WTF? So convert: FAIL.

I did some more research on the web during that time and came across multiple posts saying that converting large repositories with convert might not be so good as it a) might do the wrong thing (i.e. wrong commits on the wrong branches) and b) might fail anyway with an out of memory exception (although on a 64bit system it seems like it might just go into swap hell at some point). The alternative suggested was hgsubversion.

hgsubversion needs to be installed separately as it is not bundled but that proved fairly painless even on Windows:

cd C:\
mkdir hgext
cd hgext
hg clone http://bitbucket.org/durin42/hgsubversion hgsubversion

And add the extension to your mercurial.ini. For Windows 7+ (probably even Vista+) this should be located under C:\Users\youruser\:

[extensions]
hgsubversion = C:\hgext\hgsubversion\hgsubversion

Now we should be able to clone a subversion repository as mercurial repository. I combined the suggestion from with the suggestion from and cloned the first revision and then use pull to load the remainder of the revisions:

cd c:\repos
hg clone -r1 --config hgsubversion.defaulthost=mycompany.com file:///c:/repos/software-mirror software-hg
cd software-hg
hg pull

Our subversion usernames can be easily mapped to email addresses as username@mycompany.com hence the defaulthost setting. The pull made good and fast progress (took only 5min to pull the first 2,000 revision compared to the 48h for the convert extension). Unfortunately after 7,500 revision the pull failed with “trying to open a deleted file”. Huh? The revision in question was a tag which was no different from all the other tags (our build machine automatically tags all builds). Now I don’t really care about that specific tag but unfortunately there is no way to instruct hg pull to skip this revision. So hgsubversion: FAIL.

Now, what other options do we have? I guess I could try to skip the revision in question when doing the svnsync in the first place but I decided to try something else: There is this fast-import format which seems to be emerging as a repository independent exchange format. So why not do it this way?

Unfortunately there does not seem to be a good tool around which create fast-import dumps from a subversion repository. Here is what I looked at:

  1. svnadmin and svnrdump do not produce dumps file in the correct format.
  2. There is a tool in the bazaar tool chain which supposedly can do this: . Every piece of documentation claims that there is a frontend for subversion you should able to use like this bzr fast-export-from-svn but I could not get it to work. All I ever got was “there is no such command” (while bzr help fast-export would show something meaningful from the documentation it states that this is to generate fast-import streams from a bazaar repository). All the documentation says that the frontends are in the “exporters” subdirectory of the plugin but there is no such subdirectory (try bzr branch lp:bzr-fastimport fastimport yourself and check). So in short: I could not get this to work.
  3. There is a tool for migrating from subversion to git: . I switched to a Linux machine at this point as most instructions are for that and I could not get most of the tools working under windows. Unfortunately it died with a segfault on importing the second revision.
  4. Another tool which supposedly can do the job: but I haven’t tried that yet.

After not really getting anywhere I followed a hunch: On the intial svnsync the Windows 8 VM I used to test all of this went to sleep several times due to the default power settings of Windows 8. So I killed the software-mirror and ran svnsync again – this time on a Linux VM (as for some reason svn tools performance seems to be much better under Linux) and made sure it run uninterrupted. Then I used hgsubversion again and it got passed the revision it spewed up earlier – hmm, weird.

At some point I realized that Ubuntu 12.10 ships with Mercurial 2.2 while on Windows I used 2.5 with the latest hgsubversion clone. After upgrading to Mercurial 2.5 and checking out the latest hgsubversion from bitbucket I ran into the same “trying to open a deleted file” problem at the same revision again. Coincidentally a little while later someone posted a bug report of exactly this problem.

Anyway, I continued the pull with Mercurial 2.2 and everything seemed fine until it got to approx. 18,000 revisions (which took close to 20h and Mercurial ballooned to 5.5GB of memory usage) where it failed with an AssertionError in subvertpy:

** unknown exception encountered, please report by visiting
**  http://mercurial.selenic.com/wiki/BugTracker
** Python 2.7.3 (default, Sep 26 2012, 21:51:14) [GCC 4.7.2]
** Mercurial Distributed SCM (version 2.2.2)
** Extensions loaded: fastimport, hgsubversion
Traceback (most recent call last):
  File "/usr/bin/hg", line 38, in <module>
    mercurial.dispatch.run()
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 27, in run
    sys.exit((dispatch(request(sys.argv[1:])) or 0) & 255)
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 64, in dispatch
    return _runcatch(req)
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 87, in _runcatch
    return _dispatch(req)
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 696, in _dispatch
    cmdpats, cmdoptions)
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 472, in runcommand
    ret = _runcommand(ui, options, cmd, d)
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 786, in _runcommand
    return checkargs()
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 757, in checkargs
    return cmdfunc()
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 693, in <lambda>
    d = lambda: util.checksignature(func)(ui, *args, **cmdoptions)
  File "/usr/lib/python2.7/dist-packages/mercurial/util.py", line 463, in check
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/mercurial/extensions.py", line 139, in wrap
    util.checksignature(origfn), *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/mercurial/util.py", line 463, in check
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/hgext/hgsubversion/wrappers.py", line 538, in generic
    return orig(ui, repo, *args, **opts)
  File "/usr/lib/python2.7/dist-packages/mercurial/util.py", line 463, in check
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/mercurial/commands.py", line 4458, in pull
    modheads = repo.pull(other, heads=revs, force=opts.get('force'))
  File "/usr/lib/python2.7/dist-packages/hgext/hgsubversion/svnrepo.py", line 76, in wrapper
    return fn(self, *args, **opts)
  File "/usr/lib/python2.7/dist-packages/hgext/hgsubversion/svnrepo.py", line 99, in pull
    return wrappers.pull(self, remote, heads, force)
  File "/usr/lib/python2.7/dist-packages/hgext/hgsubversion/wrappers.py", line 358, in pull
    firstrun)
  File "/usr/lib/python2.7/dist-packages/hgext/hgsubversion/replay.py", line 67, in convert_rev
    svn.get_replay(r.revnum, editor, meta.revmap.oldest)
  File "/usr/lib/python2.7/dist-packages/hgext/hgsubversion/svnwrap/subvertpy_wrapper.py", line 422, in get_replay
    self.remote.replay(revision, oldestrev, AbstractEditor(editor))
  File "/usr/lib/python2.7/dist-packages/hgext/hgsubversion/editor.py", line 357, in txdelt_window
    handler(window)
  File "/usr/lib/python2.7/dist-packages/subvertpy/delta.py", line 84, in apply_window
    target_stream.write(apply_txdelta_window(sbuf, window))
  File "/usr/lib/python2.7/dist-packages/subvertpy/delta.py", line 57, in apply_txdelta_window
    raise AssertionError("%d != %d" % (len(tview), tview_len))
AssertionError: 473 != 474

Oh well, maybe a bug in an older version. As I was past the dreaded “trying to open a deleted file” revision I upgraded to Mercurial 2.5 and ran again – same problem. However this time there was a helpful message appended:

Your SVN repository may not be supplying correct replay deltas. It is strongly
advised that you repull the entire SVN repository using hg pull –stupid.
Alternatively, re-pull just this revision using –stupid and verify that the
changeset is correct.

Ok, lets try

hg pull -r 17890 --stupid

And it broke:

ValueError: 20-byte hash required

After some research into the issue I came across this bug report on bitbucket which essentially says: “-r doesn’t work like that with svn repositories, try url#revision instead”.

Unfortunately

hg pull file://`pwd`/software-mirror#17890 --stupid

ran into the same problem – alright, lets do it without a specific revision.

hg pull --stupid

This seems to work. So I aborted it once it got passed the bad revision and continued a normal pull (without stupid) and that got it going again.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s