Getting Pydio to access a Ceph S3 backend

I’ve been experimenting with Ceph lately and wanted to hook up a web-based front end. A quick Google search yielded ownCloud and Pydio. Since ownCloud advertises S3 backend availability only for the Pro version, I decided to give Pydio a go.

Unfortunately this was a bit fraught with difficulties, so I wanted to document the various errors here in case someone else runs into them.
Note that these are just quick steps to get up and running; you should review file permissions and access rights before installing this on a production server!

The following steps were all executed on an Ubuntu 16.04.1 system, which ships with PHP 7.0 and Apache 2.4.18.

Installing Pydio

Since the community edition only has Trusty packages as far as I could find, I downloaded the tar archive (6.4.2 was current at this point) and installed a new site into Apache:

wget https://download.pydio.com/pub/core/archives/pydio-core-6.4.2.tar.gz
tar -xzf pydio-core-6.4.2.tar.gz
sudo mkdir /var/www/pydio
sudo mv pydio-core-6.4.2/* /var/www/pydio
sudo chown -R root:root /var/www/pydio/
sudo chown -R www-data:www-data /var/www/pydio/data

Creating an Apache config

sudo vim /etc/apache2/sites-available/pydio.conf

Put in the following content:

Alias /pydio "/var/www/pydio/"
<Directory "/var/www/pydio">
  Options +FollowSymLinks
  AllowOverride All
  Require all granted

  SetEnv HOME /var/www/pydio
  SetEnv HTTP_HOME /var/www/pydio
</Directory>

Enable the site and restart Apache:

cd /etc/apache2/sites-enabled
sudo ln -s ../sites-available/pydio.conf pydio.conf
sudo service apache2 restart

At this stage you should be able to access your Pydio install in a browser via http://serverip/pydio
Pydio has an install wizard which will guide you through setting up an admin user and the database backend (for testing you can just go with SQLite; otherwise you will have to set up a PostgreSQL or MySQL database and an associated pydio user).

Hooking up to the Ceph S3 backend

Pydio organizes files into workspaces, and a plugin for an S3-backed workspace ships out of the box.
So the next step is to log into Pydio as the admin user and make sure the access.s3 plugin is activated. You will probably see an error complaining that the AWS SDK is not installed, so that needs to happen first:

cd /var/www/pydio/plugins/access.s3
sudo -u www-data wget http://docs.aws.amazon.com/aws-sdk-php/v2/download/aws.phar

Since radosgw (the S3 interface into Ceph) only supports v2 signatures (10.2.2 Jewel was current at this time), you cannot use the v3 SDK.
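For context on what "v2 signature" actually means: it is simply a base64-encoded HMAC-SHA1 over a canonical string-to-sign (v4 switched to a chained HMAC-SHA256 scheme, which is what radosgw didn't support here). A minimal sketch of the v2 primitive in Python; the key and string-to-sign below are made-up illustration values, not real credentials:

```python
import base64
import hashlib
import hmac


def sign_v2(secret_key: str, string_to_sign: str) -> str:
    """AWS signature v2: base64(HMAC-SHA1(secret_key, string_to_sign))."""
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


# Hypothetical request, for illustration only: method, (empty) content-MD5
# and content-type, date, and canonical resource, joined by newlines.
string_to_sign = "GET\n\n\nTue, 27 Mar 2007 19:36:42 +0000\n/mybucket/puppy.jpg"
print(sign_v2("my-secret-key", string_to_sign))
```

The resulting signature goes into an `Authorization: AWS <access-key>:<signature>` header, which is the scheme the v2 SDK uses against radosgw.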
Now the plugin should show status OK. Double-click it and make sure it uses SDK version 2.
The next step is to create a new workspace and select the S3 backend as the storage driver:

  • For Key and Secret Key, use the ones created for your radosgw user (how to create radosgw users for S3 can be looked up on the internet)
  • For Region, use US Standard (I’m not sure it really matters here)
  • Container is the bucket in which all files for this workspace will be stored. Pydio won’t create the bucket for you, so you’ll have to create it with another S3-capable client
  • Set Signature Version to Version 2 and API Version to 2006-03-01
  • Custom Storage is where you point Pydio at your local radosgw instance; Storage URL is the setting you need for that. Put in the full URL including the protocol, e.g. http://radosgw-server-ip:7480/ (assuming you’re running radosgw on the default port, which is 7480 with the Jewel release)
  • I’ve disabled the Virtual Host Syntax as well, since I’m not sure yet how to make that work
  • Everything else I’ve left at the default settings
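Since Pydio won’t create the bucket itself, one option for doing so is s3cmd. A sketch of the relevant ~/.s3cfg entries for a radosgw endpoint; the host, port, and credentials are placeholders for your own setup, and signature_v2 keeps s3cmd on v2 signing to match radosgw:

```ini
[default]
access_key = YOUR-ACCESS-KEY
secret_key = YOUR-SECRET-KEY
host_base = radosgw-server-ip:7480
host_bucket = radosgw-server-ip:7480/%(bucket)
use_https = False
signature_v2 = True
```

With that in place, `s3cmd mb s3://your-pydio-bucket` should create the bucket you then enter as the Container setting.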

Now the fun begins. Here is the first error message I encountered when trying to access the new workspace:

Argument 1 passed to Aws\S3\S3Md5Listener::__construct() must implement interface Aws\Common\Signature\SignatureInterface, string given

Some quick googling suggested that a client written for SDK v3 was trying to use SDK v2, so I started trialling all the combinations of plugin settings and SDKs, but I mostly just got HTTP 500 errors which left no trace in any of the logfiles I could find.
Another error I encountered during my experiments was:

Missing required client configuration options:   version: (string)
A "version" configuration value is required. Specifying a version constraint
ensures that your code will not be affected by a breaking change made to the
service. For example, when using Amazon S3, you can lock your API version to
"2006-03-01".
Your build of the SDK has the following version(s) of "s3": * "2006-03-01"
You may provide "latest" to the "version" configuration value to utilize the
most recent available API version that your client's API provider can find.
Note: Using 'latest' in a production application is not recommended.
A list of available API versions can be found on each client's API documentation
page: http://docs.aws.amazon.com/aws-sdk-php/v3/api/index.html.
If you are unable to load a specific API version, then you may need to update
your copy of the SDK

I downgraded to PHP 5.6 to rule out any weird 7.0 incompatibilities, which got me a little bit further, so for a while I thought that was the problem. Ultimately, though, it boiled down to how the backend configures the S3 client. In /var/www/pydio/plugins/access.s3/class.s3AccessWrapper.php, changing

if (!empty($signatureVersion)) {
    $options['signature'] = $signatureVersion;
}

to

if (!empty($signatureVersion)) {
    $options['signature_version'] = $signatureVersion;
}

kicked everything into life. I’m not sure if that’s due to a recent change in the v2 SDK (2.8.31 was current at this point) or something else. Looking through the Pydio forums, it seems they tested access to a Ceph S3 backend successfully, so who knows.

Next up is trying to make it connect to a gateway with a self-signed SSL certificate.
