Sheepdog: Yet Another All-In-One Storage For OpenStack
OpenStack Hong Kong Summit, Liu Yuan, 2013.11.8
Who I Am
– Active contributor to various open source projects such as Sheepdog, QEMU, the Linux kernel, Xen, OpenStack, etc.
– Primary top contributor to the Sheepdog project; co-maintainer with Kazutaka Morita from NTT Japan since January 2011
– Technical lead of the storage projects based on Sheepdog for internal use at www.taobao.com
– Contacts
  Email: namei.unix@gmail.com
  Micro Blog: @淘泰来
Agenda
– Introduction: Sheepdog Overview
– Exploration: Sheepdog Internals
– OpenStack: Sheepdog Goal
– Roadmap: Features From The Future
– Industry: How Industry Uses Sheepdog
Introduction: Sheepdog Overview
What Is Sheepdog
A distributed object storage system in user space.
– Manages disks and nodes
  Aggregates capacity and performance (IOPS + throughput)
  Hides hardware failures
  Dynamically grows or shrinks the cluster
– Manages data
  Provides redundancy mechanisms (replication and erasure coding) for high availability
  Secures data with auto-healing and auto-rebalancing mechanisms
– Provides services
  Virtual volumes for QEMU VMs and iSCSI TGT (fully supported by upstream)
  RESTful containers (OpenStack Swift and Amazon S3 compatible, in progress)
  Storage for OpenStack Cinder, Glance and Nova (available for Havana)
Sheepdog Architecture
– Each node runs a single sheep daemon that acts as both gateway and store; dog is the admin tool
– A global consistent hash ring spans all nodes, peer to peer, with no meta servers; a node event triggers a global rebalance
– Each node keeps a private hash ring over its local disks (e.g. 1 TB, 2 TB, 4 TB); a disk event triggers a local rebalance
– Zookeeper provides membership management and the message queue
– Disks are hot-pluggable and automatically unplugged on EIO
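To make the ring concrete, here is a minimal consistent-hashing sketch in C: objects hash onto a ring of virtual nodes and are owned by the first node clockwise. The FNV-1a hash and the 64-virtual-nodes-per-node figure are illustrative assumptions, not Sheepdog's actual parameters.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define VNODES_PER_NODE 64

struct vnode {
	uint64_t hash;  /* position on the ring */
	int node_id;    /* physical node owning this virtual node */
};

/* FNV-1a, standing in for Sheepdog's real hash function. */
static uint64_t fnv1a(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint64_t h = 0xcbf29ce484222325ULL;

	while (len--) {
		h ^= *p++;
		h *= 0x100000001b3ULL;
	}
	return h;
}

static int cmp_vnode(const void *a, const void *b)
{
	const struct vnode *va = a, *vb = b;
	return va->hash < vb->hash ? -1 : va->hash > vb->hash;
}

/* First virtual node clockwise from the object's hash position. */
static int oid_to_node(uint64_t oid, const struct vnode *ring, int nr)
{
	uint64_t h = fnv1a(&oid, sizeof(oid));

	for (int i = 0; i < nr; i++)
		if (ring[i].hash >= h)
			return ring[i].node_id;
	return ring[0].node_id; /* wrap around the ring */
}

int main(void)
{
	struct vnode ring[3 * VNODES_PER_NODE];
	int nr = 0;

	/* Place 64 virtual nodes per physical node on the ring. */
	for (int node = 0; node < 3; node++)
		for (int v = 0; v < VNODES_PER_NODE; v++) {
			uint64_t key[2] = { (uint64_t)node, (uint64_t)v };
			ring[nr].hash = fnv1a(key, sizeof(key));
			ring[nr++].node_id = node;
		}
	qsort(ring, nr, sizeof(*ring), cmp_vnode);

	for (uint64_t oid = 1; oid <= 4; oid++)
		printf("oid %llu -> node %d\n", (unsigned long long)oid,
		       oid_to_node(oid, ring, nr));
	return 0;
}
```

Because only the ring placement decides ownership, adding or removing a node (or disk, on the private ring) moves a bounded slice of objects instead of reshuffling everything, which is what makes the meta-server-free rebalance cheap.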
Why Sheepdog
Minimal assumptions about the underlying kernel and file system
– Runs on any file system that supports extended attributes (xattr)
– Only requires kernel version >= 2.6.32
Full of features
– Snapshot, clone, incremental backup, cluster-wide snapshot, discard, etc.
– User-defined replication/erasure-coding scheme per VDI (Virtual Disk Image)
– Automatic node/disk management
Easy to set up clusters with thousands of nodes
– A single daemon manages an unlimited number of disks in one node, as efficiently as RAID0
– As many as 6k+ disks in a single cluster
Small
– Fast, with a very small memory footprint (less than 50 MB even when busy)
– Easy to hack and maintain: 35K lines of C code as of now
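The xattr requirement above boils down to being able to attach per-object metadata directly to object files, with no database underneath. A minimal sketch using the Linux setxattr/getxattr calls; the attribute name user.obj.copies and the file path are illustrative, not Sheepdog's actual layout.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/xattr.h>

int main(void)
{
	const char *path = "example_obj"; /* stand-in for an object file */
	uint32_t nr_copies = 3;
	uint32_t read_back = 0;

	/* Create the object file, as a store would on first write. */
	int fd = open(path, O_CREAT | O_WRONLY, 0600);
	if (fd < 0) { perror("open"); return 1; }
	close(fd);

	/* Attach per-object metadata directly to the file via xattr. */
	if (setxattr(path, "user.obj.copies", &nr_copies,
		     sizeof(nr_copies), 0) < 0) {
		perror("setxattr");
		return 1;
	}
	if (getxattr(path, "user.obj.copies", &read_back,
		     sizeof(read_back)) < 0) {
		perror("getxattr");
		return 1;
	}
	printf("copies = %u\n", read_back);
	return 0;
}
```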
Exploration: Sheepdog Internals
Sheepdog In A Nutshell
A VM's /dev/vda is served through a shared persistent cache (e.g. a 256 GB SSD) by the gateway, which routes requests over the network to the stores; data is replicated or erasure coded across weighted nodes and disks (1 TB, 2 TB), with each store backed by a journal (e.g. 4 GB).
Sheepdog Volume
Copy-on-write snapshot
– Disk-only snapshot, disk & memory snapshot
– Live snapshot, offline snapshot
– Rollback (tree structure), clone
– Incremental backup
– Instant operation: only creates a 4 MB inode object
Push most logic into the client -> simple and fast code! (see the sketch below)
– Only 4 store opcodes: read/write/create/remove; snapshots are driven by the QEMU block driver or dog
– Request serialization is handled by the client, not by Sheepdog
– The inode object is treated the same as a data object
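A simplified sketch of the copy-on-write bookkeeping described above: the inode object records, for every 4 MB data object, which VDI owns it, so a write only needs to compare owner IDs to decide whether to copy first. The struct and field names here are illustrative stand-ins; Sheepdog's real sd_inode carries more state.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define SD_DATA_OBJ_SIZE (4ULL << 20)   /* 4 MB data objects */
#define MAX_DATA_OBJS    (256 * 1024)   /* enough for a 1 TB volume */

/* Illustrative layout, not the real on-disk struct. */
struct sd_inode_sketch {
	uint32_t vdi_id;                     /* this volume */
	uint32_t parent_vdi_id;              /* snapshot parent, 0 if none */
	uint32_t data_vdi_id[MAX_DATA_OBJS]; /* owner of each data object */
};

/* Does a write at this offset need copy-on-write? */
static bool need_cow(const struct sd_inode_sketch *i, uint64_t offset)
{
	uint64_t idx = offset / SD_DATA_OBJ_SIZE;

	/* The block belongs to an ancestor snapshot: copy before write. */
	return i->data_vdi_id[idx] != 0 && i->data_vdi_id[idx] != i->vdi_id;
}

int main(void)
{
	static struct sd_inode_sketch clone = {
		.vdi_id = 2, .parent_vdi_id = 1,
	};

	clone.data_vdi_id[0] = 1; /* block 0 still owned by the snapshot */
	clone.data_vdi_id[1] = 2; /* block 1 already copied */
	printf("offset 0:    cow=%d\n", need_cow(&clone, 0));            /* 1 */
	printf("offset 4 MB: cow=%d\n", need_cow(&clone, 4ULL << 20));   /* 0 */
	return 0;
}
```

Since the client (QEMU block driver or dog) performs this lookup itself, the store side never needs snapshot-specific opcodes, which is exactly the "push logic into the client" point above.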
Gateway: The Request Engine
– Requests enter a queue and are handled concurrently by the request manager, which routes them across the node ring (with a socket cache for connections)
– On error (timeout, EIO) the gateway retries the request
– A write returns success only when all copies succeed
– On local EIO a node is degraded to gateway-only mode
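A sketch of the fan-out rule above, assuming a hypothetical send_to_sheep() transport call (stubbed here so the program runs): retry each copy on timeout/EIO and report success only when every copy lands. The real gateway issues the copies concurrently; this sequential version just shows the control flow.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_RETRIES 3

/* Hypothetical transport call; stub pretends every send succeeds.
 * A real implementation would return 0, -ETIMEDOUT or -EIO. */
static int send_to_sheep(int node_id, const void *req)
{
	(void)node_id; (void)req;
	return 0;
}

static bool gateway_forward_write(const int *nodes, int nr_copies,
				  const void *req)
{
	for (int i = 0; i < nr_copies; i++) {
		int ret = -EIO;

		for (int try = 0; try < MAX_RETRIES && ret != 0; try++) {
			ret = send_to_sheep(nodes[i], req);
			/* on timeout/EIO: refresh the ring and retry */
		}
		if (ret != 0)
			return false; /* one copy failed: whole write fails */
	}
	return true; /* return success only when all copies succeeded */
}

int main(void)
{
	int nodes[3] = { 0, 1, 2 };

	printf("write %s\n",
	       gateway_forward_write(nodes, 3, "req") ? "ok" : "failed");
	return 0;
}
```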
Store: The Data Engine
– The disk manager maintains a disk ring over the local disks (1 TB, 2 TB) plus a journal; a failing disk is automatically unplugged on EIO
– On a disk failure the store: 1. fakes a network error so the gateway retries, 2. updates the disk ring, 3. starts a local data rebalance
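The journal in the diagram suggests the classic write-ahead pattern: persist the update to the journal disk before touching the object file, so a crash mid-write can be replayed. A minimal sketch under that assumption; the descriptor format here is invented for illustration, not Sheepdog's on-disk journal layout.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

struct journal_desc {
	uint64_t oid;    /* object being written */
	uint64_t offset;
	uint32_t size;
};

/* Log to the journal disk first, then apply to the object file. */
static int journaled_write(int jfd, int dfd, const struct journal_desc *jd,
			   const void *buf)
{
	if (write(jfd, jd, sizeof(*jd)) != (ssize_t)sizeof(*jd))
		return -1;
	if (write(jfd, buf, jd->size) != (ssize_t)jd->size)
		return -1;
	if (fdatasync(jfd) < 0) /* the journal entry is now durable */
		return -1;

	/* A crash past this point can be replayed from the journal. */
	if (pwrite(dfd, buf, jd->size, jd->offset) != (ssize_t)jd->size)
		return -1; /* EIO here: unplug the disk, let the GW retry */
	return 0;
}

int main(void)
{
	int jfd = open("journal.bin", O_CREAT | O_WRONLY | O_TRUNC, 0600);
	int dfd = open("object.bin", O_CREAT | O_RDWR | O_TRUNC, 0600);
	struct journal_desc jd = { .oid = 42, .offset = 0, .size = 5 };

	if (jfd < 0 || dfd < 0) { perror("open"); return 1; }
	printf("write %s\n",
	       journaled_write(jfd, dfd, &jd, "hello") ? "failed" : "ok");
	close(jfd); close(dfd);
	return 0;
}
```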
Redundancy Scheme
Two schemes are available: full replication, which stores complete copies of each object on multiple sheep, and erasure coding, which stores data strips plus parity strips across sheep.
Erasure Coding Over Full Replication
Advantages
– Far less storage overhead
– Breaking the rumors: better R/W performance, supports random R/W, and can even run VM images!
Disadvantages
– Generates more traffic for recovery: (X + Y)/(Y + 1) times the data, for X data strips and Y parity strips
– That is 2x the data for a 4:2 scheme, versus reading a single copy with 3 full replicas
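The numbers above can be checked directly. A small calculation using the slide's figures, X = 4 data strips and Y = 2 parity strips, against 3 full replicas:

```c
#include <stdio.h>

int main(void)
{
	double x = 4, y = 2; /* 4:2 erasure-coding scheme */

	/* Storage overhead: bytes stored per byte of user data. */
	printf("erasure 4:2 overhead:  %.2fx\n", (x + y) / x); /* 1.50x */
	printf("3 replicas overhead:   3.00x\n");

	/* Recovery traffic: data read to rebuild one lost strip/copy. */
	printf("erasure 4:2 recovery:  %.2fx\n", (x + y) / (y + 1)); /* 2.00x */
	printf("replication recovery:  1.00x\n");
	return 0;
}
```

So the 4:2 scheme halves the storage cost relative to 3 replicas but doubles the data read during recovery, which is the trade-off the slide is pointing at.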
Recovery: Redundancy Repair & Data Rebalance
– The recovery manager works off the node ring: when a copy is lost, the object is read or rebuilt from another copy over the network
– For rebalance, objects migrate to other nodes; the ring version is updated and recovery work is scheduled
– In-flight requests for affected objects wait on a sleep queue (next slide)
Recovery (Continued)
Eager recovery by default
– Users can stop it and temporarily do manual recovery instead
Node/disk event handling
– Disk events and node events share the same algorithm, so mixed node and disk events are handled nicely
– A subsequent event supersedes the previous one, so group joins/leaves of disks and nodes are handled gracefully
Recovery is transparent to the client (see the sketch below)
– Requests for objects being recovered are put on a sleep queue and woken up later
– Requests are served directly if the object is already right there in the store
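A sketch of the sleep-queue rule above, with stubbed-out helpers standing in for the recovery manager and the store; the list handling and names are illustrative, not Sheepdog's actual code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct request {
	uint64_t oid;
	struct request *next;
};

static struct request *sleep_queue; /* requests parked during recovery */

/* Stub: the real daemon asks the recovery manager's object list. */
static bool oid_in_recovery(uint64_t oid)
{
	return oid % 2 == 0; /* pretend even oids are being recovered */
}

static void handle_request(struct request *req)
{
	if (oid_in_recovery(req->oid)) {
		/* Park it; recovery wakes the queue as objects finish. */
		req->next = sleep_queue;
		sleep_queue = req;
		printf("oid %llu parked\n", (unsigned long long)req->oid);
		return;
	}
	/* Object is right there in the store: serve it directly. */
	printf("oid %llu served\n", (unsigned long long)req->oid);
}

int main(void)
{
	struct request a = { .oid = 2 }, b = { .oid = 3 };

	handle_request(&a); /* parked */
	handle_request(&b); /* served */
	return 0;
}
```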
Farm: Cluster-Wide Snapshot
– A slicer cuts the data into 128 KB chunks, a hash dedupper stores each unique chunk once, and a metadata generator records the snapshot on dedicated backup storage over the network
– Supports incremental backup, with a dedup ratio of up to 50%; compression doesn't help further
– Think of Sheepdog on Sheepdog? Yeah!
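A toy version of the slicer + hash dedupper pipeline above: fixed 128 KB slices addressed by content hash, so an identical slice is stored only once. FNV-1a and the linear lookup table are illustrative simplifications of the real content hash and index.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SLICE_SIZE (128 * 1024) /* 128 KB slices, as on the slide */
#define MAX_SLICES 1024

/* FNV-1a stands in for the real content hash. */
static uint64_t fnv1a(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint64_t h = 0xcbf29ce484222325ULL;

	while (len--) {
		h ^= *p++;
		h *= 0x100000001b3ULL;
	}
	return h;
}

static uint64_t seen[MAX_SLICES]; /* toy hash store */
static int nr_seen;

/* Returns 1 if the slice is new and must be stored, 0 if deduped. */
static int dedup_slice(const void *slice, size_t len)
{
	uint64_t h = fnv1a(slice, len);

	for (int i = 0; i < nr_seen; i++)
		if (seen[i] == h)
			return 0; /* identical slice already stored */
	seen[nr_seen++] = h;
	return 1;
}

int main(void)
{
	static char buf[3 * SLICE_SIZE]; /* slices 0 and 2 are identical */
	memset(buf + SLICE_SIZE, 0xab, SLICE_SIZE);

	for (size_t off = 0; off < sizeof(buf); off += SLICE_SIZE)
		printf("slice at %zu: %s\n", off,
		       dedup_slice(buf + off, SLICE_SIZE) ? "stored" : "deduped");
	return 0;
}
```

Incremental backup falls out of the same mechanism: a later snapshot only stores the slices whose hashes were not seen before.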
OpenStack: Sheepdog Goal
OpenStack Storage Components
– Cinder (block storage): supported since day 1
– Glance (image storage): support merged in the Havana version
– Nova (ephemeral storage): not yet started
– Swift (object storage): Swift-API-compatible support in progress
Final goal: unified storage, with sheep serving Cinder, Glance, Nova and Swift from one cluster
– Copy-on-write anywhere? Data dedup?
Roadmap: Features From The Future
Look Into The Future
– RESTful container: planned to be OpenStack Swift API compatible first, coming soon
– Hyper volume: 256 PB volumes, coming soon
– Geo-replication
– Sheepdog on Sheepdog: storage for cluster-wide snapshots
– Slow-disk & broken-disk detector: deal with processes hanging dead in D state because of a broken disk in massive deployments (a probe sketch follows below)
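One plausible shape for the planned detector, sketched under assumptions that are mine, not the roadmap's: a 500 ms latency budget and a 4 KB probe read. Note that a disk hung in D state would block this probe itself, so a real detector would wrap it in a watchdog thread with a timeout.

```c
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define SLOW_NS (500 * 1000 * 1000LL) /* 500 ms budget (assumed) */

/* Returns -1 broken, 1 slow, 0 healthy. */
static int probe_disk(const char *path)
{
	char buf[4096];
	struct timespec a, b;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1; /* broken: cannot even open */
	clock_gettime(CLOCK_MONOTONIC, &a);
	ssize_t n = pread(fd, buf, sizeof(buf), 0);
	clock_gettime(CLOCK_MONOTONIC, &b);
	close(fd);
	if (n < 0)
		return -1; /* broken: read error (EIO) */
	long long ns = (b.tv_sec - a.tv_sec) * 1000000000LL +
		       (b.tv_nsec - a.tv_nsec);
	return ns > SLOW_NS; /* 1 = slow, 0 = healthy */
}

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : "/dev/sda";
	static const char *state[] = { "healthy", "slow" };
	int ret = probe_disk(path);

	printf("%s: %s\n", path, ret < 0 ? "broken" : state[ret]);
	return 0;
}
```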
Industry: How Industry Uses Sheepdog
Sheepdog In Taobao & NTT
– VMs running inside a Sheepdog cluster for test & dev at Taobao
– An ongoing project with 10k+ ARM nodes serving cold data over HTTP at Taobao
– A Sheepdog cluster running as iSCSI TGT backend storage, exposing a pool of LUN devices, at NTT
Other Users In Production
Any more users I don't know of?
Go Sheepdog! Q & A
Try me out: git clone git://github.com/sheepdog/sheepdog.git
Homepage: http://sheepdog.github.io/sheepdog/