SWITCHdrive Experience with running Owncloud on top of Openstack/Ceph


1 SWITCHdrive: Experience with running Owncloud on top of Openstack/Ceph
SWITCH is the Swiss NREN (National Research and Education Network). This presentation is about one of SWITCH’s services: SWITCHdrive, our sync & share service, which runs on an Openstack cluster. Christian Schnidrig, Zürich, Jan 19, 2016

2 I’m a cloud engineer in the SWITCH Peta Solutions team
We offer two services to Swiss academia: SWITCHdrive (sync & share based on Owncloud) and SWITCHengines (IaaS based on Openstack). We started two years ago. Owncloud and Openstack were both new to us, so we decided to start off with a free trial for both of these services. Since our team offered these two services, we thought it a good idea to run Owncloud on Openstack. That was a good decision: it really runs quite well, and we had a large engines customer right from the start. That taught us a lot, helped us detect problems early, and gave us a great incentive to fix problems fast.

3 Openstack (Juno), Ceph (Hammer)
118 servers, 2008 cores, 17 TiB RAM, 2.25 PiB storage. Our little tiny cloud…

4 Owncloud 7.0 (8.1 in Feb)
Users: 14’000. Quota: 25 GiB / user. Number of files: 32M. Storage: 33 TiB total: 22 TiB live data, 3 TiB trash, 2.5 TiB versions, 1 TiB thumbnails, 4.5 TiB cache.

5 Starting Architecture (PoC)
Looking back, our architecture looked like this… We had no experience, so we started off with the vanilla architecture.

6 Classical 3-tier architecture
A standard Owncloud deployment using what was available on our Openstack cluster. We separated sync and web clients (input from TU Berlin). We knew this architecture had problems (NFS, DB on Ceph), but we had no other storage available and we wanted to know how far we could get with it. Very little operations work was needed. It worked surprisingly well up to ~4000 users & 12 TiB.

7 Why it worked so well…
Why did it work with < 5000 users? You won’t find anything wrong if you are not looking, or, as in our case, not looking at the right things. Our customers were happy. Owncloud does not offer any benchmarks, so you can’t know how well you are doing. Our monitoring was all green. Then the upgrade to 7.0 did not go smoothly: the software from OC had severe bugs, and in addition it was much heavier on the DB, and our server was not strong enough to handle the load. We needed a better understanding of the system, the load, and the possible bottlenecks, so we would know roughly at what percentage of capacity we were running.

8 Collectd, Graphite & Grafana
Collectd, Graphite & Grafana We started to collect all performance measurements we could possibly get Displayed all measurements then while peeking and pocking the system watched the graphs and tried to understand what is going on and why. No longer have old graph showing the problem areas we had a year ago. But here a few counters of our todays system showing key performance measurements.

9 Understand the database
Again, disk utilization -> SSD needed. CPU usage -> larger VM needed. DB size -> more RAM needed. Number of connections -> limited with PostgreSQL. Missing indices: seq scan vs idx scan.
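A minimal sketch of how one can spot candidates for missing indices: PostgreSQL’s pg_stat_user_tables keeps per-table seq_scan and idx_scan counters, so tables that are mostly sequentially scanned stand out. The connection parameters are placeholders, not our real setup.

```python
import psycopg2  # PostgreSQL driver, assumed installed

# Tables with many more sequential scans than index scans are
# candidates for new indices.
QUERY = """
SELECT relname, seq_scan, idx_scan, n_live_tup
FROM pg_stat_user_tables
WHERE seq_scan > COALESCE(idx_scan, 0)
ORDER BY seq_scan DESC
LIMIT 20;
"""

conn = psycopg2.connect(host="db.example.org", dbname="owncloud", user="stats")
with conn, conn.cursor() as cur:
    cur.execute(QUERY)
    for relname, seq_scan, idx_scan, n_live_tup in cur.fetchall():
        print("%s: %d seq scans vs %d index scans (%d live rows)"
              % (relname, seq_scan, idx_scan or 0, n_live_tup))
conn.close()
```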

10 Elasticsearch, Logstash & Kibana (ELK)
Elasticsearch, Logstash & Kibana (ELK) Another great tool is ELK For centralized log analysis Find errors early and quickly

11 Or some simple data mining on logs
With Kibana or Grafana querying Elasticsearch log entries.
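A minimal sketch of such a query against the Elasticsearch REST API, counting log entries per level over the last hour. The host, index pattern, and field names are assumptions about a typical Logstash setup, not our actual mapping.

```python
import requests  # plain HTTP against the Elasticsearch REST API

ES_URL = "http://elk.example.org:9200"  # hypothetical ELK host
INDEX = "logstash-*"                    # default Logstash index pattern

# Count log entries per level over the last hour; "level" and
# "@timestamp" are assumed field names in the Logstash mapping.
body = {
    "size": 0,
    "query": {"range": {"@timestamp": {"gte": "now-1h"}}},
    "aggs": {"by_level": {"terms": {"field": "level"}}},
}

resp = requests.post("%s/%s/_search" % (ES_URL, INDEX), json=body, timeout=10)
resp.raise_for_status()
for bucket in resp.json()["aggregations"]["by_level"]["buckets"]:
    print("%s: %d entries" % (bucket["key"], bucket["doc_count"]))
```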

12 One can have pretty graphs like this
No practical value, but you can pat yourself on the back and say: oh, we’ve got customers all over the world, and that with a trial service!

13 What changed
Drop requests -> Queue requests. One large volume -> Many small volumes. DB on Ceph -> DB on local SSD.
The things we changed that had the greatest impact on performance:
Queue requests: we limited the number of sessions per app server to a number we knew it could handle, then made sure that the superfluous requests didn’t get dropped (a sketch of the idea follows below).
Smaller Ceph volumes -> better utilization -> better performance -> they can be spread over several NFS servers.
DB server: we had to beef up the DB server a lot -> SSD, RAM, CPUs (scale up). Not that great, since it is not very scalable and not cloud friendly.
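The slides don’t say exactly how the queueing was implemented; here is a minimal in-process sketch of the idea, where a semaphore caps concurrency and excess requests wait for a free slot instead of being rejected. The capacity figure is made up.

```python
import threading

MAX_CONCURRENT = 50  # assumed per-app-server capacity, not our real figure
_slots = threading.BoundedSemaphore(MAX_CONCURRENT)

def handle_request(process_request):
    """Run one request; if all slots are busy, block (queue) instead of dropping."""
    with _slots:  # acquire blocks until another request releases a slot
        return process_request()
```

In practice this kind of throttling usually lives in the load balancer or web server in front of the app servers rather than in application code; the point is the same either way: excess requests wait in line instead of failing.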

14 Current Architecture
The system runs very well. We got rid of all bottlenecks. However, the architecture does not scale (-> DB server) and has SPOFs (-> DB server, NFS servers).

15 Desired Architecture
We want our service to scale better.

16 What is still needed?
Object store instead of NFS: the current Swift implementation in Owncloud would not scale to the size we currently have -> OC needs to refactor this PoC implementation. Limits: all objects are stored in one single container (Rackspace recommends < 1M files per container); large files > 5 GiB are not supported (multipart upload is not implemented); consistency problems are almost certain to happen with the current implementation; and there is no migration path from a POSIX file system to the object store. (A sketch of one workaround follows below.)
DB: scale-out with PostgreSQL proves to be very difficult -> migration to Galera/Maxscale soon. A NoSQL DB would be appreciated.
Scale out with “federated cloud sharing”: it works in principle, however we still need some fixes in the sync client before we can migrate to such a solution.
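One common way around the single-container limit is to shard objects across many containers by hashing the object name. A minimal sketch of that idea; the container count and naming scheme are made up for illustration and are not Owncloud’s implementation.

```python
import hashlib

# Deterministically map each object name to one of NUM_CONTAINERS
# containers, keeping every container well under the ~1M-object mark.
NUM_CONTAINERS = 256  # assumed shard count, tune to expected object volume

def container_for(object_name):
    """Pick a Swift container for an object by hashing its name."""
    digest = hashlib.md5(object_name.encode("utf-8")).hexdigest()
    shard = int(digest[:4], 16) % NUM_CONTAINERS
    return "drive-files-%03d" % shard

print(container_for("files/alice/report.pdf"))  # e.g. drive-files-042
```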

17 ? christian.schnidrig@switch.ch
oc@lists.geant.org is a great list with lots of smart people following it. This is where you can get help!

