Getting Pydio to access a Ceph S3 backend

I’ve been experimenting with Ceph lately and wanted to hook up a web-based front end. A quick Google search yielded ownCloud and Pydio. Since ownCloud advertises S3 backend availability only for the Pro version, I decided to give Pydio a go.

Unfortunately this was a bit fraught with difficulties, so I just wanted to document the various errors here in case someone else runs into them.
Note that these are just some quick steps to get up and running; you should review file permissions and access rights before installing this on a production server!

The following steps were all executed on an Ubuntu 16.04.1 installation, which ships with PHP 7.0 and Apache 2.4.18.

Installing Pydio

Since the community edition only has packages for trusty as far as I could find, I downloaded the tar archive (version 6.4.2 was current at this point) and installed it as a new site in Apache:

wget https://download.pydio.com/pub/core/archives/pydio-core-6.4.2.tar.gz
tar -xzf pydio-core-6.4.2.tar.gz
sudo mkdir /var/www/pydio
sudo mv pydio-core-6.4.2/* /var/www/pydio
sudo chown -R root:root /var/www/pydio/
sudo chown -R www-data:www-data /var/www/pydio/data

Creating an Apache config

sudo vim /etc/apache2/sites-available/pydio.conf

Put this as content:

Alias /pydio "/var/www/pydio/"
<Directory "/var/www/pydio">
  Options +FollowSymLinks
  AllowOverride All

  SetEnv HOME /var/www/pydio
  SetEnv HTTP_HOME /var/www/pydio
</Directory>

Make the site available and restart Apache:

cd /etc/apache2/sites-enabled
sudo ln -s ../sites-available/pydio.conf pydio.conf
sudo service apache2 restart

At this stage you should be able to access your Pydio install in a browser via http://serverip/pydio
Pydio has an install wizard which will guide you through setting up an admin user and the database backend (for testing you could just go with SQLite; otherwise you will have to set up a PostgreSQL or MySQL database and an associated pydio user)
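If you choose MySQL, the database and user the wizard asks for can be created along these lines – a sketch where the database name, user, and password are placeholders of my choosing, not anything Pydio mandates:

```shell
# Create a database and matching user for Pydio (run on the DB server).
# 'pydio' and 'secret' are example names only - pick your own.
mysql -u root -p <<'EOF'
CREATE DATABASE pydio CHARACTER SET utf8;
CREATE USER 'pydio'@'localhost' IDENTIFIED BY 'secret';
GRANT ALL PRIVILEGES ON pydio.* TO 'pydio'@'localhost';
FLUSH PRIVILEGES;
EOF
```

Then point the wizard at that database with the user you just created.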

Hooking up to the Ceph S3 backend

Pydio organizes files into workspaces, and a plugin for an S3-backed workspace ships out of the box.
So the next step is to log into Pydio as the admin user and make sure the access.s3 plugin is activated. You will probably see an error complaining about the AWS SDK not being installed, so that needs to happen first:

cd /var/www/pydio/plugins/access.s3
sudo -u www-data wget http://docs.aws.amazon.com/aws-sdk-php/v2/download/aws.phar

Since radosgw (the S3 interface to Ceph) only supports v2 signatures (at this time 10.2.2 Jewel was current), you cannot use the v3 SDK.
Now the plugin should show status OK. Double-click it and make sure it uses SDK Version 2.
The next step is to create a new workspace and select the S3 backend as the storage driver.

  • For Key and Secret Key, use the ones created for your radosgw user (how to create radosgw users for S3 can be looked up on the internet)
  • For Region, use US Standard (not sure if it really matters)
  • Container is the bucket you want all the files for this workspace to be stored in. Pydio won’t create the bucket for you, so you’ll have to create it with another S3-capable client
  • Set Signature Version to Version 2 and API Version to 2006-03-01
  • Custom Storage is where you can point to your local radosgw instance; Storage URL is the setting you need for that. You should put in the full URL including protocol, e.g. http://radosgw-server-ip:7480/ (assuming you’re running radosgw on the default port, which is 7480 with the Jewel release)
  • I’ve disabled the Virtual Host Syntax as well, since I’m not sure yet how to make this work.
  • Everything else I’ve left on default settings.
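Since Pydio won’t create the Container bucket for you, one way to create it is with s3cmd pointed at the gateway – a sketch assuming s3cmd is installed, with keys, host, and bucket name as placeholders:

```shell
# Create the workspace bucket on radosgw using s3cmd.
# Keys, host, and bucket name below are placeholders.
s3cmd --access_key=YOURKEY --secret_key=YOURSECRET \
      --host=radosgw-server-ip:7480 \
      --host-bucket=radosgw-server-ip:7480 \
      --no-ssl --signature-v2 \
      mb s3://pydio-workspace

# Verify it shows up in the bucket listing
s3cmd --access_key=YOURKEY --secret_key=YOURSECRET \
      --host=radosgw-server-ip:7480 \
      --host-bucket=radosgw-server-ip:7480 \
      --no-ssl --signature-v2 \
      ls
```

The --signature-v2 flag matters here for the same reason as in the Pydio settings: radosgw at this release only speaks v2 signatures.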

Now the fun begins. Here is the first error message I encountered when trying to access the new workspace:

Argument 1 passed to Aws\S3\S3Md5Listener::__construct() must implement interface Aws\Common\Signature\SignatureInterface, string given

Some quick googling seemed to suggest a client written for SDK v3 was trying to use SDK v2, so I started trialling all the combinations of plugin settings and SDKs, but I mostly got HTTP 500 errors which left no trace in any of the logfiles I could find.
Another error I encountered during my experiments was:

Missing required client configuration options:   version: (string)
A "version" configuration value is required. Specifying a version constraint
ensures that your code will not be affected by a breaking change made to the
service. For example, when using Amazon S3, you can lock your API version to
"2006-03-01".
Your build of the SDK has the following version(s) of "s3": * "2006-03-01"
You may provide "latest" to the "version" configuration value to utilize the
most recent available API version that your client's API provider can find.
Note: Using 'latest' in a production application is not recommended.
A list of available API versions can be found on each client's API documentation
page: http://docs.aws.amazon.com/aws-sdk-php/v3/api/index.html.
If you are unable to load a specific API version, then you may need to update
your copy of the SDK

I downgraded to PHP 5.6 to rule out any weird 7.0 incompatibilities, which got me a little bit further, so I thought that was the problem. Ultimately, though, it boiled down to the way the backend configures the S3 client. In /var/www/pydio/plugins/access.s3/class.s3AccessWrapper.php, changing

if (!empty($signatureVersion)) {
    $options['signature'] = $signatureVersion;
}

to

if (!empty($signatureVersion)) {
    $options['signature_version'] = $signatureVersion;
}

kicked everything into life. Not sure if that’s due to a recent change in the v2 SDK (current at this point was 2.8.31) or something else. Looking through the Pydio forums it seems like they tested access to a Ceph S3 backend successfully – so who knows.
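As an aside: when chasing those opaque HTTP 500s, it helps to first confirm that radosgw itself is reachable from the Pydio host. An unauthenticated request should come back with an XML response (an empty bucket listing for the anonymous user), which at least rules out basic connectivity problems – host and port are placeholders again:

```shell
# Anonymous GET against the gateway; radosgw should answer with an
# XML ListAllMyBucketsResult document even without credentials.
curl -s http://radosgw-server-ip:7480/
```

If this hangs or refuses the connection, the problem is between the hosts rather than in Pydio or the SDK.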

Next is trying to make it connect to a self-signed SSL gateway.


Migrating reviewboard from MySQL to PostgreSQL

This is for Ubuntu 12.04, it may vary slightly for other distributions.

  1. Install postgres and libpq-dev (required for django backend)
    sudo apt-get install postgresql libpq-dev
  2. Install psycopg2
    sudo easy_install psycopg2
  3. Create the reviewboard database in postgres and a user with access to it.
    sudo su postgres -c psql
    postgres# CREATE ROLE myuser WITH LOGIN SUPERUSER;
    postgres# CREATE DATABASE reviewboard WITH OWNER myuser;
    postgres# ALTER ROLE myuser WITH PASSWORD 'secret';
    postgres# \q
  4. Stop apache and any other service which might modify the original database
    sudo service apache2 stop
    sudo service mysql stop

    Note that stopping the mysql daemon might be a little bit drastic, as it will affect all databases running on that server. In my case reviewboard was the only database, so I did it as a precaution.

  5. Dump the original reviewboard database (from MySQL)
    sudo rb-site manage /var/www/yourcodereviewsite dumpdb > reviewboard.dump

    Note that this can take several hours depending on the size.

  6. Edit your local reviewboard config to use Postgres instead of MySQL
    vim /var/www/yourcodereviewsite/conf/settings_local.py

    → change the django backend from mysql to postgresql_psycopg2

  7. Create the reviewboard table structures in the Postgres db
    sudo rb-site manage /var/www/yourcodereviewsite syncdb
  8. Clean default data inserted by the rb-site command (will interfere with loaddb otherwise)
    sudo su postgres -c psql
    postgres# TRUNCATE django_content_type CASCADE;
    postgres# TRUNCATE scmtools_tool CASCADE;
    postgres# \q
  9. Load the MySQL database dump
    sudo rb-site manage /var/www/yourcodereviewsite loaddb reviewboard.dump
  10. Clean up some database metadata as per https://groups.google.com/forum/#!topic/reviewboard/Ehv0JwthROg:
    psql -t reviewboard -c "SELECT E'select setval(\'' || c.relname || E'\', (select max(id)+1 from ' || replace(c.relname, '_id_seq', '') || '), false);' FROM pg_class c WHERE c.relkind = 'S';" | psql reviewboard
  11. Restart apache
    sudo service apache2 start
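For reference, the backend switch in step 6 boils down to changing the DATABASES block in settings_local.py. A sketch of the end result, with host and credentials as placeholders (keep whatever your site already uses):

```python
# Fragment of /var/www/yourcodereviewsite/conf/settings_local.py after
# switching from MySQL to PostgreSQL. NAME/USER/PASSWORD/HOST are
# example values matching the psql steps above, not requirements.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'reviewboard',
        'USER': 'myuser',
        'PASSWORD': 'secret',
        'HOST': 'localhost',
    },
}
```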

Migrating from subversion to mercurial

Note: The below was in draft for quite some time. We actually moved to git, so I didn’t follow this to its ultimate conclusion. I effectively aborted this after the conversion to hg consumed the 220GB of available disk space before it managed to completely convert the entire svn repo. I didn’t bother increasing the disk space, since lugging around a 220+GB repo wasn’t in any way practical.

In any case some of the following may prove useful, so I’m publishing it as-is and as far as I got.


We are currently looking at migrating our subversion repository to mercurial, including all the history, and for some reason this seemed harder than it turned out to be in the end. Maybe this post will help someone out, so here you go:

Our repository has close to 25,000 revisions and a checkout is approx. 1GB in size. Most guides for converting recommend creating a local copy of your repository with svnsync first, so this is what I did (on a Windows machine):

First I installed TortoiseHg from the mercurial website (the all-in-one 64bit installer) and TortoiseSVN with the command line tools.

Creating a local subversion clone:

cd C:\
mkdir repos
cd repos
svnadmin create software-mirror
echo exit 0 > software-mirror/hooks/pre-revprop-change.bat
svnsync init file:///c:/repos/software-mirror svn://myserver/software
svnsync sync file:///c:/repos/software-mirror

This took about 4h. Interestingly I also did this on a Linux machine running Ubuntu 12.10 and it only took half the time (same VM hardware specs, same network, same VM server).

Now we can go on to convert the repository. When searching the Internet the first way of doing it I came across was the convert extension. So, open TortoiseHg, enable the convert extension and run:

cd C:\repos
hg convert software-mirror

Now, after 48h of running, it had managed to convert 2,000 revisions; the process was using 2.8GB of RAM (with peaks at 3.5GB) and had created close to 2 million(!) files – WTF? So convert: FAIL.

I did some more research on the web during that time and came across multiple posts saying that converting large repositories with convert might not be such a good idea, as it a) might do the wrong thing (i.e. put commits on the wrong branches) and b) might fail anyway with an out-of-memory exception (although on a 64bit system it seems it might just descend into swap hell at some point). The alternative suggested was hgsubversion.

hgsubversion needs to be installed separately as it is not bundled but that proved fairly painless even on Windows:

cd C:\
mkdir hgext
cd hgext
hg clone http://bitbucket.org/durin42/hgsubversion hgsubversion

And add the extension to your mercurial.ini. For Windows 7+ (probably even Vista+) this should be located under C:\Users\youruser\:

[extensions]
hgsubversion = C:\hgext\hgsubversion\hgsubversion

Now we should be able to clone a subversion repository as a mercurial repository. Combining suggestions from a couple of posts I found, I cloned the first revision and then used pull to load the remainder of the revisions:

cd c:\repos
hg clone -r1 --config hgsubversion.defaulthost=mycompany.com file:///c:/repos/software-mirror software-hg
cd software-hg
hg pull

Our subversion usernames can be easily mapped to email addresses as username@mycompany.com, hence the defaulthost setting. The pull made good and fast progress (it took only 5min to pull the first 2,000 revisions, compared to the 48h for the convert extension). Unfortunately, after 7,500 revisions the pull failed with “trying to open a deleted file”. Huh? The revision in question was a tag which was no different from all the other tags (our build machine automatically tags all builds). Now I don’t really care about that specific tag, but unfortunately there is no way to instruct hg pull to skip this revision. So hgsubversion: FAIL.

Now, what other options do we have? I guess I could try to skip the revision in question when doing the svnsync in the first place but I decided to try something else: There is this fast-import format which seems to be emerging as a repository independent exchange format. So why not do it this way?

Unfortunately there does not seem to be a good tool around which creates fast-import dumps from a subversion repository. Here is what I looked at:

  1. svnadmin and svnrdump do not produce dump files in the correct format.
  2. There is a tool in the bazaar tool chain which supposedly can do this. Every piece of documentation claims that there is a frontend for subversion which you should be able to use like this: bzr fast-export-from-svn. But I could not get it to work. All I ever got was “there is no such command” (and while bzr help fast-export shows something meaningful, the documentation states that this generates fast-import streams from a bazaar repository). All the documentation says that the frontends are in the “exporters” subdirectory of the plugin, but there is no such subdirectory (try bzr branch lp:bzr-fastimport fastimport yourself and check). So in short: I could not get this to work.
  3. There is a tool for migrating from subversion to git. I switched to a Linux machine at this point, as most instructions are for Linux and I could not get most of the tools working under Windows. Unfortunately it died with a segfault on importing the second revision.
  4. There is supposedly another tool which can do the job, but I haven’t tried that yet.

After not really getting anywhere I followed a hunch: during the initial svnsync, the Windows 8 VM I used to test all of this went to sleep several times due to the default power settings of Windows 8. So I killed the software-mirror, ran svnsync again – this time on a Linux VM (as for some reason svn tool performance seems to be much better under Linux) – and made sure it ran uninterrupted. Then I used hgsubversion again and it got past the revision it choked on earlier – hmm, weird.

At some point I realized that Ubuntu 12.10 ships with Mercurial 2.2 while on Windows I used 2.5 with the latest hgsubversion clone. After upgrading to Mercurial 2.5 and checking out the latest hgsubversion from bitbucket I ran into the same “trying to open a deleted file” problem at the same revision again. Coincidentally a little while later someone posted a bug report of exactly this problem.

Anyway, I continued the pull with Mercurial 2.2 and everything seemed fine until it got to approx. 18,000 revisions (which took close to 20h, with Mercurial ballooning to 5.5GB of memory usage), where it failed with an AssertionError in subvertpy:

** unknown exception encountered, please report by visiting
**  http://mercurial.selenic.com/wiki/BugTracker
** Python 2.7.3 (default, Sep 26 2012, 21:51:14) [GCC 4.7.2]
** Mercurial Distributed SCM (version 2.2.2)
** Extensions loaded: fastimport, hgsubversion
Traceback (most recent call last):
  File "/usr/bin/hg", line 38, in <module>
    mercurial.dispatch.run()
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 27, in run
    sys.exit((dispatch(request(sys.argv[1:])) or 0) & 255)
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 64, in dispatch
    return _runcatch(req)
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 87, in _runcatch
    return _dispatch(req)
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 696, in _dispatch
    cmdpats, cmdoptions)
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 472, in runcommand
    ret = _runcommand(ui, options, cmd, d)
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 786, in _runcommand
    return checkargs()
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 757, in checkargs
    return cmdfunc()
  File "/usr/lib/python2.7/dist-packages/mercurial/dispatch.py", line 693, in <lambda>
    d = lambda: util.checksignature(func)(ui, *args, **cmdoptions)
  File "/usr/lib/python2.7/dist-packages/mercurial/util.py", line 463, in check
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/mercurial/extensions.py", line 139, in wrap
    util.checksignature(origfn), *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/mercurial/util.py", line 463, in check
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/hgext/hgsubversion/wrappers.py", line 538, in generic
    return orig(ui, repo, *args, **opts)
  File "/usr/lib/python2.7/dist-packages/mercurial/util.py", line 463, in check
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/mercurial/commands.py", line 4458, in pull
    modheads = repo.pull(other, heads=revs, force=opts.get('force'))
  File "/usr/lib/python2.7/dist-packages/hgext/hgsubversion/svnrepo.py", line 76, in wrapper
    return fn(self, *args, **opts)
  File "/usr/lib/python2.7/dist-packages/hgext/hgsubversion/svnrepo.py", line 99, in pull
    return wrappers.pull(self, remote, heads, force)
  File "/usr/lib/python2.7/dist-packages/hgext/hgsubversion/wrappers.py", line 358, in pull
    firstrun)
  File "/usr/lib/python2.7/dist-packages/hgext/hgsubversion/replay.py", line 67, in convert_rev
    svn.get_replay(r.revnum, editor, meta.revmap.oldest)
  File "/usr/lib/python2.7/dist-packages/hgext/hgsubversion/svnwrap/subvertpy_wrapper.py", line 422, in get_replay
    self.remote.replay(revision, oldestrev, AbstractEditor(editor))
  File "/usr/lib/python2.7/dist-packages/hgext/hgsubversion/editor.py", line 357, in txdelt_window
    handler(window)
  File "/usr/lib/python2.7/dist-packages/subvertpy/delta.py", line 84, in apply_window
    target_stream.write(apply_txdelta_window(sbuf, window))
  File "/usr/lib/python2.7/dist-packages/subvertpy/delta.py", line 57, in apply_txdelta_window
    raise AssertionError("%d != %d" % (len(tview), tview_len))
AssertionError: 473 != 474

Oh well, maybe a bug in an older version. As I was past the dreaded “trying to open a deleted file” revision, I upgraded to Mercurial 2.5 and ran again – same problem. However, this time there was a helpful message appended:

Your SVN repository may not be supplying correct replay deltas. It is strongly
advised that you repull the entire SVN repository using hg pull --stupid.
Alternatively, re-pull just this revision using --stupid and verify that the
changeset is correct.

OK, let’s try

hg pull -r 17890 --stupid

And it broke:

ValueError: 20-byte hash required

After some research into the issue I came across this bug report on bitbucket which essentially says: “-r doesn’t work like that with svn repositories, try url#revision instead”.

Unfortunately

hg pull file://`pwd`/software-mirror#17890 --stupid

ran into the same problem – alright, let’s do it without a specific revision.

hg pull --stupid

This seemed to work. So I aborted it once it got past the bad revision and continued a normal pull (without --stupid), and that got it going again.