Pimcore shared FS - Can we replace it with S3

Hey Pimcore People

I have been using Pimcore on and off with a few clients for a little while, and I’ve got a bit of a bugbear with it that has stopped me promoting it more with some of the clients I work with - and that’s the Pimcore requirement for a shared FS.

Mounting an NFS volume (or similar) is exceptionally 1999. It’s difficult to support operationally, and when it goes wrong, it goes wrong in exciting ways - and with more modern solutions like S3 from Amazon, or any other type of object storage, there are just so many other ways to solve this problem.

I am actually quite happy to write this as a customisation to Pimcore, and submit a PR to give people the option to use a service like S3 (or perhaps a generic driver to integrate with many storage providers?), to solve the problem of needing to sync things like versions across web servers (or in my case, Docker containers).
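To make the “generic driver” idea a bit more concrete, here is the sort of interface I have in mind. This is purely a sketch - `StorageDriverInterface` is a name I’ve just made up, not anything that exists in Pimcore today:

```php
<?php
// Hypothetical driver interface - a sketch of what a pluggable storage
// backend for Pimcore could look like. None of these names exist in
// Pimcore today; they are only here to make the idea concrete.
interface StorageDriverInterface
{
    /** Write $contents to $path, overwriting any existing file. */
    public function write(string $path, string $contents);

    /** Return the contents stored at $path. */
    public function read(string $path): string;

    /** Check whether anything is stored at $path. */
    public function exists(string $path): bool;

    /** Remove whatever is stored at $path. */
    public function delete(string $path);
}
```

A local-FS implementation would keep today’s behaviour, and an S3 (or other object storage) implementation could be switched on per environment.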

The thing that’s holding me back is understanding what in Pimcore needs to write to the FS…

Things I know will need to write to the FS:

  • Logs - Fine. Lots of PHP applications still write log files, because becoming a 12-factor PHP app is still a bit of a nightmare
  • Assets - I believe there is a Pimcore plugin to store assets in S3, but I’ve not looked into it; if not, that’s something to write
  • File Versions - Every time a version is saved, a file gets written to the FS containing the serialised object (shudders)

… and then what else … ?

I don’t mind things like cache files being written to the FS if need be (view caches and such), so I am not proposing to change that - but what else does Pimcore write to the local FS that I would need to update in order to feature-switch on storing files outside the local machine?

Is everything just in Pimcore\Model\Version?
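For versions specifically, the shape I imagine is roughly this - again a sketch only, using the hypothetical StorageDriverInterface above; the real Pimcore\Model\Version code will look different:

```php
<?php
// Sketch only: VersionStorage and StorageDriverInterface are hypothetical
// names of mine, not Pimcore APIs. The idea is that version persistence
// goes through a swappable driver instead of writing straight to the
// local FS on every save.
class VersionStorage
{
    /** @var StorageDriverInterface */
    private $driver;

    public function __construct(StorageDriverInterface $driver)
    {
        $this->driver = $driver;
    }

    /** Persist the serialised object for a given version id. */
    public function save($versionId, $object)
    {
        $this->driver->write('versions/' . $versionId, serialize($object));
    }

    /** Load and unserialise the object stored for a given version id. */
    public function load($versionId)
    {
        return unserialize($this->driver->read('versions/' . $versionId));
    }
}
```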

My thinking is that now that Pimcore is getting away from Zend Framework 1 (hurrah!), we can bring it genuinely up to date in terms of the methodologies it uses - because right now it’s not scalable without effort, and these problems have been solved before.

Thanks

Tom

Hi,
did you already have a look at
https://www.pimcore.org/docs/latest/Installation_and_Upgrade/System_Setup_and_Hosting/Amazon_AWS_Setup/index.html
and https://www.pimcore.org/docs/latest/Installation_and_Upgrade/System_Setup_and_Hosting/Amazon_AWS_Setup/Amazon_AWS_S3_Setup.html?

We haven’t yet had the chance to migrate it to Pimcore 5 and test it, but we are happy about any input in that direction.

BR
Christian

I had not seen that!

I will test this on the current version of Pimcore we use, and on 5.x when I get the chance, and I will feed back any changes/problems.

This is suitable, but not exactly what I was thinking. Using S3 as the constant source, without any intermediary cache, will be expensive in terms of S3 calls - and S3 downloads will also be fully blocking…
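For the cost and blocking concerns, one option I’d want to try is a simple read-through local cache in front of S3, so repeated reads never leave the box. A rough sketch with the AWS SDK for PHP - the bucket, cache directory and helper function are made up for illustration:

```php
<?php
// Sketch of a read-through cache: check a local cache directory first and
// only fall back to S3 on a miss, so repeated reads don't hit S3 at all.
// The bucket name, cache path and this helper are made up for illustration.
require 'vendor/autoload.php';

use Aws\S3\S3Client;

function readWithLocalCache(S3Client $s3, string $bucket, string $key, string $cacheDir): string
{
    $cacheFile = $cacheDir . '/' . sha1($key);

    // Cache hit: serve straight from the local FS, no S3 call at all.
    if (is_file($cacheFile)) {
        return file_get_contents($cacheFile);
    }

    // Cache miss: fetch from S3 once and store the result locally.
    $result   = $s3->getObject(['Bucket' => $bucket, 'Key' => $key]);
    $contents = (string) $result['Body'];
    file_put_contents($cacheFile, $contents);

    return $contents;
}
```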

I will have a play, and if I can come up with something imaginative, I will submit a PR

Thank you very much. Looking forward to hearing from you.

cheers…

Dear flock3, did you manage to investigate this? Any updates?

FYI, as you pointed out, S3 calls could be a little bit expensive, but there is another solution. If you decide (as we did in-house) to use the Ceph file system, you will be able to isolate assets. As our main goal is to 100% automate Pimcore staging/deployment/dockerisation etc., the biggest problem is “assets”, because they live in the project directory structure.

Ceph has an implementation of the S3 API. From their docs: “Ceph supports a RESTful API that is compatible with the basic data access model of the Amazon S3 API.” We are testing it now, and if our tests are a success I will describe here how we did it.
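The nice part is that an ordinary S3 client can talk to Ceph’s RADOS Gateway just by overriding the endpoint - something along these lines (the endpoint URL, credentials and bucket below are placeholders, not our real setup):

```php
<?php
// Sketch: because Ceph's RADOS Gateway speaks the S3 API, a standard S3
// client can be pointed at it by overriding the endpoint. All values below
// are placeholders.
require 'vendor/autoload.php';

use Aws\S3\S3Client;

$client = new S3Client([
    'version'  => 'latest',
    'region'   => 'us-east-1',                      // required by the SDK, largely ignored by Ceph
    'endpoint' => 'http://ceph-rgw.internal:7480',  // placeholder RADOS Gateway address
    'use_path_style_endpoint' => true,              // Ceph setups usually serve buckets path-style
    'credentials' => [
        'key'    => 'CEPH_ACCESS_KEY',              // placeholder
        'secret' => 'CEPH_SECRET_KEY',              // placeholder
    ],
]);

// Works the same way as against Amazon S3.
$client->putObject([
    'Bucket' => 'pimcore-assets',                   // placeholder bucket
    'Key'    => 'assets/example.jpg',
    'Body'   => fopen('/tmp/example.jpg', 'rb'),
]);
```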

best regards,
Mike

Hey Mike

That sounds cool. Interestingly, assets are the least of our troubles at the moment - I don’t mind having an environment reset that flattens and rebuilds the test/staging environments on every deployment.

I’ve not yet had the chance to investigate this, but when I do I will pop my results in here :slight_smile:

If somebody decides to implement S3 or Ceph, I recommend using a filesystem abstraction layer, something like this:

https://github.com/thephpleague/flysystem or https://github.com/KnpLabs/Gaufrette

So this would only need to be implemented once, and you could configure different “filesystem” providers.
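For example, with Flysystem (v1 API - class names differ in newer versions) the calling code only ever sees the Filesystem object, so the local disk and S3 become interchangeable. A rough sketch; the bucket, prefix and paths are placeholders:

```php
<?php
// Sketch with Flysystem (v1 API): the calling code only sees the Filesystem
// object, so local disk and S3 are swappable behind the same interface.
// Bucket, prefix and paths are placeholders.
require 'vendor/autoload.php';

use Aws\S3\S3Client;
use League\Flysystem\Filesystem;
use League\Flysystem\Adapter\Local;
use League\Flysystem\AwsS3v3\AwsS3Adapter;

// Local adapter - roughly what writing to the machine FS looks like today.
$local = new Filesystem(new Local('/var/www/pimcore/var'));

// S3 adapter - same Filesystem API, different backend.
$s3Client = new S3Client([
    'version' => 'latest',
    'region'  => 'eu-west-1',
    // credentials picked up from the environment or instance profile
]);
$s3 = new Filesystem(new AwsS3Adapter($s3Client, 'my-pimcore-bucket', 'versions'));

// Either backend can serve the same code path.
foreach ([$local, $s3] as $filesystem) {
    $filesystem->put('versions/123', serialize(['example' => 'payload']));
    echo $filesystem->read('versions/123'), PHP_EOL;
}
```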

regards