BUILDING A CLOUD FILESYSTEM
Jeff Darcy / Mark Wagner, Principal Software Engineers, Red Hat
4 May 2011
What's It For?
● “Filesystem as a Service”
● Managed by one provider, used by many tenants
● Design goals: familiarity, scalability, flexibility, privacy
What About Existing Filesystems?
● GlusterFS, PVFS2, Ceph, ...
● Not all the same (distributed vs. cluster)
● Even the best don't cover all the bases
Privacy Part 1: Separate Namespace
tenantX# ls /mnt/shared_fs/tenantY
a.txt b.txt my_secret_file.txt
● Tenant X's files should be completely invisible to any other tenant
● Ditto for space usage
● Solvable with subvolume mounts and directory permissions, but watch out for symlinks etc. (see the sketch below)
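A minimal sketch of the subvolume-mount approach, assuming a shared export under /shared; the tenant account names are hypothetical, and CloudFS's actual mechanism may differ:

server# mkdir -p /shared/tenantX /shared/tenantY
server# chown tenantX_admin /shared/tenantX           # hypothetical per-tenant owner accounts
server# chmod 0700 /shared/tenantX /shared/tenantY    # each tenant's subtree is opaque to the others
tenantX# mount server1:/shared/tenantX /mnt/shared_fs # tenant X mounts (and sees) only its own subtree

Note that directory permissions alone do not stop a tenant from creating a symlink that points outside its own subtree, which is exactly the symlink caveat above.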
Privacy Part 2: Separate ID Space
● Tenant X's “joe” has the same UID as tenant Y's “fred”
● Two different tenants' users should never be conflated...
● ...but the server only has one UID space
● So we must map between per-server and per-tenant ID spaces (toy illustration below)
server# ls /shared/tenantX/joe/foo
-rw-r--r-- 1 joe joe 9262 Jan 20 12:00 foo
server# ls /shared/tenantY/fred/bar
-rw-r--r-- 1 joe joe 6481 Mar 09 13:47 bar
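A toy illustration of the mapping idea; the file path, format, and numbers here are all hypothetical, not CloudFS's actual representation:

server# cat /etc/cloudfs/uidmap   # hypothetical file
# tenant    tenant-UID   server-UID
tenantX     500          10500    # tenant X's "joe"
tenantY     500          20500    # tenant Y's "fred"

With a map like this, both tenants can keep UID 500 locally while the server stores distinct owners; the ID-mapping translator rewrites UIDs in both directions on every request.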
“Register your users with our ID service”
● “Add another step? I create thousands of users every day!”
● “I already run my own ID service, to sync across the company.”
● “Amazon doesn't require that! It was nice knowing you.”
Privacy Part 3: At-Rest Encryption
● Where did that disk come from? Whose data is on it?
● Moral: encrypt the data, and store the key separately (sketch below)
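A minimal at-rest-encryption sketch using dm-crypt/LUKS, with the key held on a separate key server; the device names and paths are hypothetical, and this shows one common approach rather than CloudFS's own mechanism:

keyserver# dd if=/dev/urandom of=/keys/brick1.key bs=32 count=1
server# cryptsetup luksFormat /dev/sdb1 /net/keyserver/keys/brick1.key
server# cryptsetup luksOpen /dev/sdb1 brick1 --key-file /net/keyserver/keys/brick1.key
server# mkfs.ext4 /dev/mapper/brick1 && mount /dev/mapper/brick1 /bricks/brick1

If the drive ever leaves the data center, it carries only ciphertext; the key never lives on the same box as the data.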
Privacy Part 4: Wire Encryption + Authentication
● Know who you're talking to
● Make sure nobody else can listen (or spoof)
Picture credit: http://www.owasp.org/index.php/Man-in-the-middle_attack
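As a quick illustration of the “know who you're talking to” half, a client can probe a TLS endpoint and verify its certificate against the provider's CA; the host, port, and file name here are hypothetical:

client# openssl s_client -connect server1:24007 -CAfile provider-ca.crt
...
Verify return code: 0 (ok)

Anything other than “0 (ok)” means the peer could not prove it holds a provider-signed certificate.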
CloudFS
● Builds on established technology
● Adds specific functionality for cloud deployment
(Diagram: GlusterFS supplies the familiarity, scalability, and flexibility; CloudFS adds privacy on top, using SSL.)
GlusterFS Core Concept: Translators
● So named because they translate upper-level I/O requests into lower-level I/O requests using the same interface
● Stackable in any order
● Can be deployed on either client or server
● Lowest-level “bricks” are just directories on servers
● GlusterFS is an engine to route filesystem requests through translators to bricks (see the volfile sketch below)
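A minimal server-side volfile sketch in the classic GlusterFS format, showing how translators stack; the names and options are illustrative, not a complete working configuration:

volume brick
  type storage/posix             # lowest level: just a directory on the server
  option directory /export/brick1
end-volume

volume locks
  type features/locks            # a feature translator stacked on the brick
  subvolumes brick
end-volume

volume server
  type protocol/server           # exports the whole stack to clients
  option transport-type tcp
  subvolumes locks
end-volume

Each “volume” block is one translator instance; “subvolumes” wires it to the layer below, so a request entering at the top flows down through the stack to the brick.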
Translator Patterns
● Caching: “read XXXX” is answered from cache; Brick 1 and Brick 2 do nothing
● Splitting: “read XXYY” becomes “read XX” on Brick 1 and “read YY” on Brick 2
● Replicating: “write XXYY” is sent to both Brick 1 and Brick 2
● Routing: “write XXYY” goes only to Brick 2; Brick 1 does nothing
Translator Types
● Protocols: client, (native) server, NFS server
● Core: distribute (DHT), replicate (AFR), stripe
● Features: locks, access control, quota
● Performance: prefetching, caching, write-behind
● Debugging: trace, latency measurement
● CloudFS: ID mapping, authentication/encryption, future “Dynamo” and async replication
Typical Translator Structure
(Diagram, client side: FUSE mount, then cache, then distribute over two replicate pairs, then protocol clients A-D; these connect to Server A /export, Server B /foo, Server C /bar, and Server D /x/y/z.)
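The same structure expressed as a client-side volfile sketch; the host names, export names, and two-way replication layout follow the diagram, but the details are illustrative:

volume client-a
  type protocol/client
  option remote-host serverA
  option remote-subvolume brick   # server-side volume to attach to
end-volume
# ...client-b, client-c, and client-d are defined the same way...

volume replicate-1
  type cluster/replicate
  subvolumes client-a client-b
end-volume

volume replicate-2
  type cluster/replicate
  subvolumes client-c client-d
end-volume

volume distribute
  type cluster/distribute
  subvolumes replicate-1 replicate-2
end-volume

volume cache
  type performance/io-cache       # topmost translator under the FUSE mount
  subvolumes distribute
end-volume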
Let's Discuss Performance
Test Hardware
● Westmere EP server-class machines, two-socket, HT on
● 12 boxes total
● 48 GB fast memory
● 15K drives
● 10 Gbit networking, 9K jumbo frames enabled
● 4 servers with fully populated internal SAS drives (7)
● 8 boxes used as clients / VM hosts
Hardware
(Diagram: servers and clients connected through a 10 Gbit network switch.)
First Performance Question We Get Asked
● How does it stack up against NFS?
● One of the first tests we ran, before we tuned
● Tests conducted on the same box / storage
GlusterFS vs. NFS (Writes)
GlusterFS vs. NFS (Reads)
(Chart annotation: I/O bound.)
Second Question
● How does it scale?
● Tests run on combinations of servers, clients, and VMs
● Representative sample shown here
● Scale up across servers, hosts, and threads
Read Scalability - Bare Metal
Write Scalability - Bare Metal
Tuning Fun
● Now that we have the basics, let's play
● Initial tests were on RAID 0
● Let's try JBOD
Tuning Tips - Storage Layout
Virtualized Performance
● All this bare-metal data is interesting, but this is a cloud filesystem; let's see some virtualization numbers
● KVM guests running RHEL 6.1
Virtualized Performance - RHEL 6.1 KVM Guests
Virtualized Performance
● The guest was CPU-bound in the previous slides
● Increase the guest from 2 to 4 vCPUs (example below)
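For reference, a hedged example of resizing a KVM guest with libvirt; the domain name is hypothetical, and the --config change takes effect on the next boot:

host# virsh setvcpus rhel61-guest 4 --config
host# virsh shutdown rhel61-guest && virsh start rhel61-guest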
Tuning Tips – Sizing the Guest
CloudFS Implementation
CloudFS Namespace Isolation
● Clients mount subdirectories on each brick
● Subdirectories are combined into per-tenant volumes
tenantC# mount server1:brick /mnt/xxx
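Concretely, a brick's per-tenant layout might look like this; the paths and volume name are hypothetical illustrations of the idea rather than CloudFS's exact naming:

server1# ls /bricks/brick1
tenantA  tenantB  tenantC
tenantC# mount -t glusterfs server1:/cloudfs-tenantC /mnt/xxx   # hypothetical per-tenant volume

Tenant C's volume is assembled only from the tenantC subdirectories across the bricks, so other tenants' files never even appear in its namespace.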
CloudFS ID Isolation
tenantC# stat -c '%A %u %n' blah
-rw-r--r-- 92 blah
tenantC# stat -c '%A %u %n' /shared/blah
-rw-r--r-- 92 /shared/blah
provider# stat -c '%A %u %n' /bricks/C/blah
-rw-r--r-- 1138 /bricks/C/blah
● Same file throughout: the tenant sees its own UID (92) everywhere, while the provider's brick stores the mapped server UID (1138)
CloudFS Authentication
● OpenSSL with provider-signed certificates
● Identity used by other CloudFS functions
● One time: the tenant sends a client certificate request (ID=x) to the provider, who returns a provider-signed certificate
● Every time: a client owned by the tenant opens an SSL connection using that certificate, which provides both authentication and encryption
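A minimal sketch of that one-time exchange using the openssl command line; the file names and subject are hypothetical:

tenant# openssl req -new -newkey rsa:2048 -nodes -keyout client.key -out client.csr -subj '/CN=tenantX'
provider# openssl x509 -req -in client.csr -CA provider.crt -CAkey provider.key -CAcreateserial -out client.crt -days 365

The tenant keeps client.key private; every client it owns then presents client.crt when connecting, and servers trust any certificate that chains to the provider's CA.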
CloudFS Encryption
● Purely client-side; the server does not even hold keys in escrow
● Provides privacy and indemnity
● Problem: partial-block writes. Every input byte of a cipher block affects every output byte, so writing part of a block means the remainder must be fetched from the server, re-encrypted together with the new bytes, and written back. For example, overwriting just bytes 4-7 of a 16-byte AES block still requires reading all 16 bytes.
Gluster to CloudFS
● So far we have been talking about Gluster performance
● Now let's look at the overhead of the CloudFS-specific components
CloudFS Encryption Overhead
CloudFS Multi-Tenancy Overhead
For More Information
● CloudFS blog: http://cloudfs.org
● Mailing lists:
● https://fedorahosted.org/mailman/listinfo/cloudfs-general
● https://fedorahosted.org/mailman/listinfo/cloudfs-devel
● Code: http://git.fedorahosted.org/git/?p=CloudFS.git
● More to come (wikis, bug tracker, etc.)
Backup: CloudFS “Dynamo” Translator (future)
● Greater scalability
● Faster replication
● Faster replica repair
● Faster rebalancing
● Variable # of replicas
(Diagram: “Dynamo”-style consistent-hashing ring with servers S1-S3 and keys A-D.)
Backup: CloudFS Async Replication (future)
● Multiple masters
● Partition tolerant: writes accepted everywhere
● Eventually consistent: version vectors etc.
● Preserves client-side encryption security
● Unrelated to Gluster geosync
(Diagram: sites A, B, and C, each with its own servers, replicating among themselves.)