
1

2 Jeff Darcy / Mark Wagner Principal Software Engineers, Red Hat 4 May, 2011 BUILDING A CLOUD FILESYSTEM

3 What's It For? ● “Filesystem as a Service” ● Managed by one provider, used by many tenants ● Goals: Familiarity, Scalability, Flexibility, Privacy

4 What About Existing Filesystems? ● GlusterFS, PVFS2, Ceph,... ● Not all the same (distributed vs. cluster) ● Even the best don't cover all the bases

5 Privacy Part 1: Separate Namespace
tenantX# ls /mnt/shared_fs/tenantY
a.txt b.txt my_secret_file.txt
● Tenant X's files should be completely invisible to any other tenant (the listing above is exactly what must not happen) ● Ditto for space usage ● Solvable with subvolume mounts and directory permissions, but watch out for symlinks etc. (see the symlink-check sketch below)
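The symlink caveat is the subtle part: a tenant can plant a symlink that resolves into another tenant's subtree. A minimal sketch of the kind of check that implies, in Python with invented paths (not CloudFS code):

import os

TENANT_ROOT = "/shared/tenantX"   # hypothetical per-tenant subvolume root

def safe_open(relative_path):
    # Resolve symlinks first, then refuse anything that escapes the tenant root.
    resolved = os.path.realpath(os.path.join(TENANT_ROOT, relative_path))
    if not resolved.startswith(TENANT_ROOT + os.sep):
        raise PermissionError(f"{relative_path} escapes {TENANT_ROOT}")
    return open(resolved, "rb")

# A link such as tenantX/evil -> /shared/tenantY/my_secret_file.txt resolves
# outside TENANT_ROOT and is rejected before the open ever happens.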

6 Privacy Part 2: Separate ID Space ● Tenant X's “joe” has the same UID as tenant Y's “fred” ● Two tenants should not have to share a UID space ● ...but the server only has one UID space ● So we must map between per-server and per-tenant spaces (toy mapping sketch below)
server# ls /shared/tenantX/joe/foo
-rw-r--r-- 1 joe joe 9262 Jan 20 12:00 foo
server# ls /shared/tenantY/fred/bar
-rw-r--r-- 1 joe joe 6481 Mar 09 13:47 bar
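To make the mapping idea concrete, here is a toy Python table with invented numbers; the real translator keeps this state inside the filesystem, but the shape of the translation is the same:

# Toy per-tenant <-> per-server UID mapping (invented numbers, not CloudFS internals).
SERVER_UID = {                 # (tenant, tenant-side UID) -> server-side UID
    ("tenantX", 500): 10500,   # tenant X's "joe"
    ("tenantY", 500): 20500,   # tenant Y's "fred" -- same UID 500, different user
}
TENANT_UID = {v: k for k, v in SERVER_UID.items()}

def to_server(tenant, uid):
    # Map a UID seen on a tenant's client into the single server-wide UID space.
    return SERVER_UID[(tenant, uid)]

def to_tenant(tenant, server_uid):
    # Map a server-side UID back to what this tenant should see in stat() output.
    owner_tenant, uid = TENANT_UID[server_uid]
    return uid if owner_tenant == tenant else None   # hide foreign owners

assert to_server("tenantX", 500) != to_server("tenantY", 500)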

7 “Register your users with our ID service” ● Tenant reactions: “Add another step? I create thousands of users every day!” ● “I already run my own ID service, to sync across the company.” ● “Amazon doesn't require that!” ● “It was nice knowing you.”

8 Privacy Part 3: At-Rest Encryption ● Where did it come from? Whose data is on it? ● Moral: encrypt, and store the key separately (sketch below)
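As a tiny illustration of that moral, using the Python cryptography package's Fernet recipe as a stand-in (CloudFS does its encryption inside the filesystem client, not like this):

from cryptography.fernet import Fernet

data_key = Fernet.generate_key()                  # stays with the tenant / key service
ciphertext = Fernet(data_key).encrypt(b"my secret data")
# Only `ciphertext` ever lands on the provider's disks; a lost or resold drive
# reveals nothing without the separately stored data_key.
assert Fernet(data_key).decrypt(ciphertext) == b"my secret data"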

9 Privacy Part 4: Wire Encryption + Authentication ● Know who you're talking to ● Make sure nobody else can listen (or spoof) Picture credit: http://www.owasp.org/index.php/Man-in-the-middle_attack

10 CloudFS ● Builds on established technology ● Adds specific functionality for cloud deployment ● Diagram recaps the goals (Familiarity, Scalability, Flexibility, Privacy) against the pieces that deliver them: GlusterFS, SSL, CloudFS

11 GlusterFS Core Concept: Translators ● So named because they translate upper-level I/O requests into lower-level I/O requests using the same interface ● stackable in any order ● can be deployed on either client or server ● Lowest-level “bricks” are just directories on servers ● GlusterFS is an engine to route filesystem requests through translators to bricks

12 Translator Patterns (toy stack sketch below) ● Caching (read XXXX): Brick 1 (do nothing), Brick 2 (do nothing) ● Splitting (read XXYY): Brick 1 (read XX), Brick 2 (read YY) ● Replicating (write XXYY): Brick 1 (write XXYY), Brick 2 (write XXYY) ● Routing (write XXYY): Brick 1 (do nothing), Brick 2 (write XXYY)
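The patterns above are easier to see as code. The sketch below mimics the idea in Python rather than the real GlusterFS C translator API: every layer exposes the same read/write interface, and replication versus routing is just a different forwarding policy.

# Toy translator stack: same interface at every layer, bricks at the bottom.
class Brick:
    def __init__(self, name): self.name, self.data = name, {}
    def write(self, path, data): self.data[path] = data
    def read(self, path): return self.data.get(path, b"")

class Replicate:
    # Replicating pattern: every write goes to all children.
    def __init__(self, *children): self.children = children
    def write(self, path, data):
        for c in self.children: c.write(path, data)
    def read(self, path): return self.children[0].read(path)

class Route:
    # Routing pattern (distribute): each path is hashed to exactly one child.
    def __init__(self, *children): self.children = children
    def _pick(self, path): return self.children[hash(path) % len(self.children)]
    def write(self, path, data): self._pick(path).write(path, data)
    def read(self, path): return self._pick(path).read(path)

# Route on top of two replicated pairs, four bricks underneath.
volume = Route(Replicate(Brick("b1"), Brick("b2")),
               Replicate(Brick("b3"), Brick("b4")))
volume.write("/foo", b"XXYY")
assert volume.read("/foo") == b"XXYY"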

13 Translator Types ● Protocols: client, (native) server, NFS server ● Core: distribute (DHT), replicate (AFR), stripe ● Features: locks, access control, quota ● Performance: prefetching, caching, write-behind ● Debugging: trace, latency measurement ● CloudFS: ID mapping, authentication/encryption, future “Dynamo” and async replication

14 Typical Translator Structure ● Client side: Mount (FUSE) → Cache → Distribute → Replicate (two sets) → protocol clients A, B, C, D ● Bricks: Server A /export, Server B /foo, Server C /bar, Server D /x/y/z

15 Let's Discuss Performance

16 Test Hardware ● Westmere EP server-class machines ● Two-socket, Hyper-Threading on ● 12 boxes total ● 48 GB memory ● 15K RPM drives ● 10 Gbit with 9K jumbo frames enabled ● 4 servers with fully populated internal SAS drives (7) ● 8 boxes used as clients / VM hosts

17 Hardware: Servers ↔ 10 Gbit network switch ↔ Clients

18 First Performance Question We Get Asked ● How does it stack up to NFS? ● One of the first tests we ran, before we tuned ● Tests conducted on the same box / storage

19 GlusterFS vs. NFS (Writes)

20 GlusterFS vs. NFS (Reads) (chart annotation: I/O bound)

21 Second Question ● How does it scale? ● Tests run on combinations of servers, clients, and VMs ● Representative sample shown here ● Scale up across servers, hosts, and threads

22 Read Scalability – Bare Metal

23 Write Scalability – Bare Metal

24 Tuning Fun ● Now that we have the basics, let's play ● Initial tests on RAID0 ● Let's try JBOD

25 Tuning Tips - Storage Layout

26

27 Virtualized Performance ● All this bare-metal data is interesting, but this is a cloud filesystem, so let's see some virtualization numbers ● KVM guests running RHEL 6.1

28 Virtualized Performance - RHEL6.1 KVM Guests

29

30 Virtualized Performance ● Guest was CPU-bound in the previous slides ● Increase guest from 2 to 4 vCPUs

31 Tuning Tips – Sizing the Guest

32

33 CloudFS Implementation

34 CloudFS Namespace Isolation ● Clients mount subdirectories on each brick ● Subdirectories are combined into per-tenant volumes
tenantC# mount server1:brick /mnt/xxx

35 CloudFS ID Isolation
tenantC# stat -c '%A %u %n' blah
-rw-r--r-- 92 blah
tenantC# stat -c '%A %u %n' /shared/blah
-rw-r--r-- 92 /shared/blah
provider# stat -c '%A %u %n' /bricks/C/blah
-rw-r--r-- 1138 /bricks/C/blah

36 CloudFS Authentication ● OpenSSL with provider-signed certificates ● Identity used by other CloudFS functions ● One time: the tenant sends a client certificate request (ID=x) to the provider and receives a provider-signed certificate back ● Every time: a client owned by the tenant opens an SSL connection using that certificate, which provides both authentication and encryption (client-side sketch below)
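For the "every time" step, the client side of such an SSL connection could look roughly like this with Python's standard ssl module; the certificate paths, host name, and port are placeholders, not CloudFS defaults:

import socket, ssl

# Placeholder paths: the provider's CA plus the certificate it signed for this tenant.
ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH,
                                 cafile="/etc/cloudfs/provider-ca.pem")
ctx.load_cert_chain(certfile="/etc/cloudfs/tenantX.pem",
                    keyfile="/etc/cloudfs/tenantX.key")

with socket.create_connection(("server1.example.com", 24007)) as raw:
    with ctx.wrap_socket(raw, server_hostname="server1.example.com") as conn:
        # Both ends now know who they are talking to, and the channel is encrypted.
        print(conn.getpeercert()["subject"])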

37 CloudFS Encryption ● Purely client-side, not even key escrow on the server ● Provides privacy and indemnity ● Problem: partial-block writes: when only part of a cipher block is being written, the remainder must be fetched from the server, because all input bytes of a cipher block affect all output bytes (sketch below)
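To picture the partial-write problem: the client must fetch the stored cipher block, decrypt it locally, splice in the new bytes, and re-encrypt the whole block, because the server never sees plaintext or keys. A toy single-block sketch (AES-CBC over one block with a fixed IV, purely illustrative, not the CloudFS cipher setup):

import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def encrypt_block(key, iv, plaintext):
    enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    return enc.update(plaintext) + enc.finalize()

def decrypt_block(key, iv, ciphertext):
    dec = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
    return dec.update(ciphertext) + dec.finalize()

def partial_block_write(key, iv, stored_ciphertext, offset, new_bytes):
    # Read-modify-write entirely on the client: decrypt, splice, re-encrypt.
    plain = bytearray(decrypt_block(key, iv, stored_ciphertext))
    plain[offset:offset + len(new_bytes)] = new_bytes
    return encrypt_block(key, iv, bytes(plain))

key, iv = os.urandom(32), os.urandom(16)
old = encrypt_block(key, iv, b"AAAABBBBCCCCDDDD")    # block already on the server
new = partial_block_write(key, iv, old, 4, b"XXXX")  # overwrite bytes 4..7 only
assert decrypt_block(key, iv, new) == b"AAAAXXXXCCCCDDDD"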

38 Gluster to CloudFS ● So far we have been talking about Gluster performance ● Now let's look at the overhead of the CloudFS-specific components

39 CloudFS Encryption Overhead

40 CloudFS Multi-Tenancy Overhead

41 For More Information ● CloudFS blog: http://cloudfs.org ● Mailing lists: ● https://fedorahosted.org/mailman/listinfo/cloudfs-general ● https://fedorahosted.org/mailman/listinfo/cloudfs-devel ● Code: http://git.fedorahosted.org/git/?p=CloudFS.git ● More to come (wikis, bug tracker, etc.)

42

43 Backup: CloudFS “Dynamo” Translator (future) ● Greater scalability ● Faster replication ● Faster replica repair ● Faster rebalancing ● Variable # of replicas ● Diagram: “Dynamo”-style consistent hashing ring with servers S1, S2, S3 and keys A, B, C, D (toy sketch below)
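The "Dynamo" idea can be sketched in a few lines of Python: servers hash onto a ring, a file's replicas are the next N servers clockwise from the file's hash, and the replica count can vary per file. This is a toy model, not the planned translator:

import bisect, hashlib

def h(s): return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    # Toy consistent-hash ring with a per-key replica count.
    def __init__(self, servers):
        self.points = sorted((h(s), s) for s in servers)
    def replicas(self, key, n=2):
        keys = [p for p, _ in self.points]
        i = bisect.bisect(keys, h(key))
        # Walk clockwise from the key's position, wrapping around the ring.
        return [self.points[(i + k) % len(self.points)][1] for k in range(n)]

ring = Ring(["S1", "S2", "S3"])
print(ring.replicas("A", n=2))   # two replicas for file A
print(ring.replicas("B", n=3))   # a different file can use a different replica count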

44 Backup: CloudFS Async Replication (future) ● Multiple masters ● Partition tolerant ● Writes accepted everywhere ● Eventually consistent ● Version vectors etc. (toy sketch below) ● Preserves client-side encryption security ● Unrelated to Gluster geosync ● Diagram: Site A (S1, S2, S3), Site B (S4, S5), Site C (S6, S7)
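"Version vectors etc." carries most of the weight in that design. A toy Python sketch of how they let every site accept writes and still detect conflicts afterwards (illustrative only):

# Toy version vectors: one counter per site, bumped on local writes.
def bump(vv, site):
    vv = dict(vv); vv[site] = vv.get(site, 0) + 1; return vv

def dominates(a, b):
    # a dominates b if a has seen every write b has seen.
    return all(a.get(s, 0) >= n for s, n in b.items())

site_a = bump({}, "A")            # write accepted at site A -> {"A": 1}
site_b = bump(site_a, "B")        # replicated to B, then written there -> {"A": 1, "B": 1}
assert dominates(site_b, site_a)  # B's copy supersedes A's: safe to overwrite

concurrent = bump(site_a, "C")    # meanwhile C also wrote on top of A's version
assert not dominates(site_b, concurrent) and not dominates(concurrent, site_b)
# Neither dominates: a genuine conflict that eventual consistency must reconcile.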

