Ceph: the de facto storage backend for OpenStack. OpenStack Summit 2013, Hong Kong
Whoami
💥 Sébastien Han
💥 French Cloud Engineer working for eNovance
💥 Daily job focused on Ceph and OpenStack
💥 Blogger
Personal blog: http://www.sebastien-han.fr/blog/
Company blog: http://techs.enovance.com/
Worldwide offices coverage: we design, build and run clouds, anytime, anywhere.
Ceph What is it?
The project
Unified distributed storage system
Started in 2006 as a PhD project by Sage Weil
Open source under the LGPL license
Written in C++
Builds the future of storage on commodity hardware
Why insist on commodity hardware?
No software nor hardware lock-in (it is open source, so no vendor lock-in)
You don't need big boxes anymore
You can mix diverse hardware (old, cheap, recent)
Which means the cluster moves along with your needs and your budget
And obviously it makes it easy to test
Key features
Self-managing and self-healing
Self-balancing
Painless scaling
Data placement with CRUSH
It provides numerous features:
Self-healing: if something breaks, the cluster reacts and triggers a recovery process
Self-balancing: as soon as you add a new disk or a new node, the cluster moves and re-balances data
Self-managing: periodic tasks such as scrubbing check object consistency, and if something is wrong Ceph repairs the object
Painless scaling: it is fairly easy to add a new disk or node, especially with all the tools out there to deploy Ceph (Puppet, Chef, ceph-deploy)
Intelligent data placement: you can logically reflect your physical infrastructure and build placement rules; objects are automatically placed, balanced and migrated in a dynamic cluster
CRUSH: Controlled Replication Under Scalable Hashing
Pseudo-random placement algorithm: fast calculation, no lookup; repeatable and deterministic
Statistically uniform distribution
Rule-based configuration: infrastructure-topology aware, adjustable replication
The way CRUSH is configured is somewhat unique. Instead of defining pools for different data types, workgroups, subnets, or applications, CRUSH is configured with the physical topology of your storage network. You tell it how many buildings, rooms, shelves, racks, and nodes you have, and you tell it how you want data placed. For example, you could tell CRUSH that it's okay to have two replicas in the same building, but not on the same power circuit. You also tell it how many copies to keep.
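As a rough intuition for what pseudo-random, deterministic, lookup-free placement means, here is a small Python sketch based on rendezvous hashing. This is not the real CRUSH algorithm (no hierarchy, no weights, no failure domains), just an illustration of how every client can compute the same placement from nothing but the object name and the list of OSDs:

    import hashlib

    def toy_place(obj_name, osds, replicas=3):
        # Toy stand-in for CRUSH: rank OSDs by a hash of (object, osd) and keep
        # the top 'replicas'. Deterministic and lookup-free, like CRUSH, but with
        # none of CRUSH's topology awareness, weights or failure-domain rules.
        ranked = sorted(osds, key=lambda osd: hashlib.sha1(
            ("%s/%s" % (obj_name, osd)).encode()).hexdigest())
        return ranked[:replicas]

    # Every client computes the same placement for the same object name.
    print(toy_place("rbd_data.1234", ["osd.0", "osd.1", "osd.2", "osd.3", "osd.4"]))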
Overview
RADOS is a distributed object store, and it's the foundation for Ceph. On top of RADOS, we have built three systems that give you several ways to access your data:
RGW: native RESTful API, S3 and Swift compatible, multi-tenancy and quotas, multi-site capabilities, disaster recovery
RBD: thinly provisioned, full and incremental snapshots, copy-on-write cloning, native Linux kernel driver support, supported by KVM and Xen
CephFS: POSIX-compliant semantics, subdirectory snapshots
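RADOS and RBD can also be driven directly from the librados and librbd Python bindings. A minimal sketch (the pool, object and image names below are only examples):

    import rados
    import rbd

    # Connect using the local ceph.conf and keyring.
    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()

    # Plain RADOS: write and read back an object.
    ioctx = cluster.open_ioctx("rbd")
    ioctx.write_full("hello-object", b"hello from librados")
    print(ioctx.read("hello-object"))

    # RBD on top of RADOS: create a 1 GB thinly provisioned image.
    rbd.RBD().create(ioctx, "demo-image", 1024 ** 3)

    ioctx.close()
    cluster.shutdown()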
Building a Ceph cluster General considerations
How to start?
➜ Use case
  IO profile: bandwidth? IOPS? mixed?
  Guaranteed IOs: how many IOPS or how much bandwidth per client do I want to deliver?
  Usage: do I use Ceph standalone or combined with a software solution?
➜ Amount of data (usable, not raw)
  Replica count
  Failure ratio: how much data am I willing to rebalance if a node fails? (see the sizing sketch below)
  Do I have a data growth plan?
➜ Budget :-)
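A back-of-the-envelope sizing sketch in Python, with made-up example figures, showing how replica count and node count drive usable capacity and the rebalance cost of a failure:

    # Rough capacity planning for a replicated Ceph cluster.
    # All figures below are example assumptions, not recommendations.
    raw_tb = 360.0       # total raw capacity, e.g. 10 nodes x 12 x 3 TB disks
    replicas = 3         # replica count
    nodes = 10           # number of OSD nodes
    target_fill = 0.70   # keep headroom so a node failure cannot fill the cluster

    usable_tb = raw_tb / replicas * target_fill
    rebalance_tb = raw_tb / nodes   # raw data to re-replicate if one node fails

    print("Usable capacity: %.1f TB" % usable_tb)
    print("Data to rebalance after a single node failure: %.1f TB" % rebalance_tb)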
Things that you must not do
➜ Don't put a RAID underneath your OSDs
  Ceph already manages the replication
  A degraded RAID hurts performance
  It reduces the usable space of the cluster
➜ Don't build high-density nodes with a tiny cluster
  Failure considerations and the amount of data to re-balance
  Potential full cluster
➜ Don't run Ceph on your hypervisors (unless you're broke)
State of the integration
Including Havana's best additions
Why is Ceph so good?
It unifies OpenStack components
Ceph tightly interacts with the OpenStack components
Havana's additions
Complete refactor of the Cinder driver:
  librados and librbd usage
  Flatten volumes created from snapshots (see the sketch below)
  Clone depth
Cinder backup with a Ceph backend:
  Backing up within the same Ceph pool (not recommended)
  Backing up between different Ceph pools
  Backing up between different Ceph clusters
  Support for RBD stripes
  Differential backups
Nova:
  libvirt_images_type = rbd
  Directly boot all the VMs in Ceph
Volume QoS
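To illustrate what copy-on-write cloning, clone depth and flattening mean at the librbd level, here is a simplified Python sketch using the librbd bindings. The pool and image names are invented; the actual Cinder driver wraps equivalent calls with its own naming and error handling:

    import rados
    import rbd

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    ioctx = cluster.open_ioctx("volumes")     # example pool name

    librbd = rbd.RBD()
    features = rbd.RBD_FEATURE_LAYERING       # layering is required for cloning

    # Parent volume with a protected snapshot.
    librbd.create(ioctx, "vol-parent", 10 * 1024 ** 3,
                  old_format=False, features=features)
    parent = rbd.Image(ioctx, "vol-parent")
    parent.create_snap("snap-1")
    parent.protect_snap("snap-1")
    parent.close()

    # Copy-on-write clone: instant, no data is copied up front.
    librbd.clone(ioctx, "vol-parent", "snap-1", ioctx, "vol-child",
                 features=features)

    # Flattening copies the data into the child, detaching it from the parent
    # and keeping the clone depth under control.
    child = rbd.Image(ioctx, "vol-child")
    child.flatten()
    child.close()

    ioctx.close()
    cluster.shutdown()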
Today’s Havana integration
Is Havana the perfect stack? …
Well, almost…
What's missing?
Direct URL download for Nova
  Already in the pipeline, probably for 2013.2.1
Nova snapshots integration as Ceph snapshots
  https://github.com/jdurgin/nova/commits/havana-ephemeral-rbd
Icehouse and beyond Future
Tomorrow’s integration
Icehouse roadmap
Implement "bricks" for RBD
Re-implement the snapshotting function to use RBD snapshots
RBD on Nova bare metal
Volume migration support
RBD stripes support
« J » potential roadmap
Manila support
Ceph, what’s coming up? Roadmap
Firefly
Tiering: cache pool overlay
Erasure coding
Ceph OSD on ZFS
Full support of OpenStack Icehouse
Many thanks! Questions? Contact: sebastien@enovance.com Twitter: @sebastien_han IRC: leseb