Jeff Darcy / Mark Wagner Principal Software Engineers, Red Hat 4 May, 2011 BUILDING A CLOUD FILESYSTEM

What's It For? ● “Filesystem as a Service” ● Managed by one provider, used by many tenants ● Design goals (diagram): Familiarity, Scalability, Flexibility, Privacy

What About Existing Filesystems? ● GlusterFS, PVFS2, Ceph,... ● Not all the same (distributed vs. cluster) ● Even the best don't cover all the bases

Privacy Part 1: Separate Namespace ● Tenant X's files should be completely invisible to any other tenant ● Ditto for space usage ● Solvable with subvolume mounts and directory permissions, but watch out for symlinks etc. ● What must never succeed (a permissions sketch follows below): tenantX# ls /mnt/shared_fs/tenantY → a.txt b.txt my_secret_file.txt
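A minimal sketch of the subvolume-mount-plus-permissions approach mentioned above; the paths, the tenantX_admin owner, and the bind mount are illustrative assumptions, not CloudFS commands:

# One top-level directory per tenant on the shared filesystem (names are made up).
server# mkdir -p /export/tenantX /export/tenantY
# Each directory is owned by that tenant's server-side identity and closed to
# everyone else, so a listing from another tenant fails outright.
server# chown tenantX_admin:tenantX_admin /export/tenantX
server# chmod 0700 /export/tenantX
# Each tenant is then given only its own subdirectory as a subvolume, e.g.:
server# mount --bind /export/tenantX /srv/exports/tenantX
# Caveat from the slide: a symlink such as /export/tenantX/evil -> ../tenantY
# would still have to be blocked or resolved on the server side.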

Privacy Part 2: Separate ID Space ● Tenant X's “joe” has the same UID as tenant Y's “fred” ● Two tenants should never end up sharing UIDs... ● ...but the server has only one UID space ● so CloudFS must map between per-server and per-tenant ID spaces (a sketch of one such mapping follows) ● As seen on the server, both files appear to belong to “joe”: server# ls -l /shared/tenantX/joe/foo → -rw-r--r-- 1 joe joe 9262 Jan 20 12:00 foo ● server# ls -l /shared/tenantY/fred/bar → -rw-r--r-- 1 joe joe 6481 Mar 09 13:47 bar
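One simple way to picture such a mapping, assuming (purely for illustration) that each tenant is assigned a disjoint, fixed-size UID range on the server; CloudFS's actual mapping translator need not work this way:

# Hypothetical scheme: tenant N owns server UIDs N*100000 .. N*100000+99999.
TENANT_INDEX=2      # say, tenant X
TENANT_UID=1000     # "joe" as tenant X knows him
SERVER_UID=$(( TENANT_INDEX * 100000 + TENANT_UID ))
echo "tenant-side UID $TENANT_UID -> server-side UID $SERVER_UID"    # 201000
# Reverse direction, when ownership is reported back to the tenant:
echo "server-side UID $SERVER_UID -> tenant-side UID $(( SERVER_UID % 100000 ))"

Under such a scheme, tenant Y's “fred” can also be UID 1000 inside his own tenant yet land on a different server-side UID, which is exactly the separation the slide asks for.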

“Register your users with our ID service” ● Tenant reactions: “Add another step? I create thousands of users every day!” ● “I already run my own ID service, to sync across the company.” ● “Amazon doesn't require that!” ● “It was nice knowing you.”

Privacy Part 3: At-Rest Encryption ● Where did it come from? Whose data is on it? ● Moral: encrypt the data, and store the key separately

Privacy Part 4: Wire Encryption + Authentication ● Know who you're talking to ● Make sure nobody else can listen (or spoof)

CloudFS ● Builds on established technology ● Adds specific functionality for cloud deployment ● (diagram) Goals: Familiarity, Scalability, Flexibility, Privacy; building blocks: GlusterFS, SSL, CloudFS

GlusterFS Core Concept: Translators ● So named because they translate upper-level I/O requests into lower-level I/O requests using the same interface ● stackable in any order ● can be deployed on either client or server ● Lowest-level “bricks” are just directories on servers ● GlusterFS is an engine to route filesystem requests through translators to bricks

Translator Patterns (diagram) ● Caching: a read of XXXX is answered from the cache, so Brick 1 and Brick 2 do nothing ● Splitting: a read of XXYY becomes a read of XX from Brick 1 and a read of YY from Brick 2 ● Replicating: a write of XXYY is sent in full to both Brick 1 and Brick 2 ● Routing: a write of XXYY goes only to Brick 2, while Brick 1 does nothing

Translator Types ● Protocols: client, (native) server, NFS server ● Core: distribute (DHT), replicate (AFR), stripe ● Features: locks, access control, quota ● Performance: prefetching, caching, write-behind ● Debugging: trace, latency measurement ● CloudFS: ID mapping, authentication/encryption, future “Dynamo” and async replication

Typical Translator Structure (diagram, client side) ● Mount (FUSE) → Cache → Distribute → two Replicate translators, each over a pair of protocol/client translators (Client A + Client B, Client C + Client D) ● Each client connects to a brick directory on a server: Server A /export, Server B /foo, Server C /bar, Server D /x/y/z ● (a sample volume file expressing this stack follows)
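To make the pictured stack concrete, here is a hand-written client-side volume file sketch in classic GlusterFS volfile syntax; the hostnames and the choice of io-cache as the “Cache” layer are illustrative assumptions, and the deck's real configuration may differ:

volume client-a
  type protocol/client
  option remote-host serverA
  option remote-subvolume /export
end-volume

volume client-b
  type protocol/client
  option remote-host serverB
  option remote-subvolume /foo
end-volume

volume client-c
  type protocol/client
  option remote-host serverC
  option remote-subvolume /bar
end-volume

volume client-d
  type protocol/client
  option remote-host serverD
  option remote-subvolume /x/y/z
end-volume

volume replicate-0
  type cluster/replicate
  subvolumes client-a client-b
end-volume

volume replicate-1
  type cluster/replicate
  subvolumes client-c client-d
end-volume

volume distribute
  type cluster/distribute
  subvolumes replicate-0 replicate-1
end-volume

volume cache
  type performance/io-cache
  subvolumes distribute
end-volume

The FUSE mount layer from the diagram is added by the mount helper when such a file is handed to the client, e.g. glusterfs -f cloudfs.vol /mnt/cloudfs.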

Let's Discuss Performance

Test Hardware ● Westmere-EP server-class machines ● Two sockets, hyper-threading on ● 12 boxes total ● 48 GB of fast memory ● 15K RPM drives ● 10 GbE, 9K jumbo frames enabled ● 4 servers fully populated with internal SAS drives (7) ● 8 boxes used as clients / VM hosts

Hardware (diagram): Servers and Clients connected through a 10 Gbit network switch

First Performance Question We Get Asked ● How does it stack up to NFS? ● One of the first tests we ran, before any tuning ● Tests conducted on the same box / storage ● (an illustrative benchmark command follows)
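The deck does not say which benchmark produced the charts below; as a purely illustrative stand-in, a large-file sequential comparison of a GlusterFS native mount against an NFS mount might be driven with fio like this (mount points and parameters are assumptions):

# Sequential writes against a GlusterFS native mount (path is illustrative).
fio --name=seqwrite --directory=/mnt/glusterfs --rw=write \
    --bs=64k --size=8g --numjobs=4 --group_reporting
# Same workload against an NFS mount of the same storage for comparison.
fio --name=seqwrite --directory=/mnt/nfs --rw=write \
    --bs=64k --size=8g --numjobs=4 --group_reporting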

GlusterFS vs. NFS (Writes)

GlusterFS vs. NFS (Reads) (chart annotation: I/O bound)

Second Question ● How does it scale? ● Tests run on combinations of servers, clients, and VMs ● Representative sample shown here ● Scale up across servers, hosts, and threads

Read Scalability – Bare metal

Write Scalability – Bare metal

Tuning Fun ● Now that we have the basics, let's play ● Initial tests were on RAID 0 ● Let's try JBOD

Tuning Tips - Storage Layout

Virtualized Performance ● All this bare-metal data is interesting, but this is a cloud filesystem, so let's see some virtualization numbers ● KVM guests running RHEL 6.1

Virtualized Performance – RHEL 6.1 KVM Guests

Virtualized Performance ● Guest was CPU-bound in the previous slides ● Bump the guest from 2 to 4 vCPUs (one way to do this is sketched below)
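The deck does not show how the guests were resized; on a RHEL 6 KVM host managed by libvirt, one common way (the domain name rhel61-guest is illustrative) would be:

# Edit the libvirt domain XML and change <vcpu>2</vcpu> to <vcpu>4</vcpu> ...
virsh edit rhel61-guest
# ... then restart the guest so the new vCPU count takes effect.
virsh shutdown rhel61-guest
virsh start rhel61-guest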

Tuning Tips – Sizing the Guest

CloudFS Implementation

CloudFS Namespace Isolation ● Clients mount subdirectories on each brick ● Subdirectories are combined into per-tenant volumes (a layout sketch follows) ● tenantC# mount server1:brick /mnt/xxx
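A sketch of what the per-tenant layout might look like inside each server's brick; the /bricks paths and tenant names echo the next slide, but the commands are ordinary shell, not CloudFS tooling:

# On every server, each tenant gets its own subdirectory inside the brick.
server1# mkdir -p /bricks/A /bricks/B /bricks/C
server2# mkdir -p /bricks/A /bricks/B /bricks/C
# Tenant C's volume is then assembled only from the /bricks/C subdirectories,
# so its files (e.g. /bricks/C/blah on the next slide) never share a
# namespace with tenant A's or tenant B's.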

CloudFS ID Isolation ● tenantC# stat -c '%A %u %n' blah → -rw-r--r-- 92 blah ● tenantC# stat -c '%A %u %n' /shared/blah → -rw-r--r-- 92 /shared/blah ● provider# stat -c '%A %u %n' /bricks/C/blah → -rw-r--r /bricks/C/blah

CloudFS Authentication ● OpenSSL with provider-signed certificates ● Identity is used by other CloudFS functions ● Flow (diagram): one time, the tenant sends a client certificate request carrying ID=x to the provider and receives a provider-signed certificate back; from then on, every client owned by that tenant connects over SSL using that certificate, which provides both authentication and encryption ● (an openssl sketch of the one-time step follows)
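A minimal sketch of the one-time certificate exchange using stock openssl commands; the file names, the CN=tenantX identity, and the provider-side CA layout are illustrative assumptions rather than documented CloudFS tooling:

# Tenant side: generate a key pair and a certificate signing request carrying its ID.
openssl req -new -newkey rsa:2048 -nodes -keyout tenantX.key \
        -subj "/CN=tenantX" -out tenantX.csr
# Provider side: sign the request with the provider's CA key.
openssl x509 -req -in tenantX.csr -CA provider-ca.crt -CAkey provider-ca.key \
        -CAcreateserial -days 365 -out tenantX.crt
# Every subsequent client connection then presents tenantX.crt over SSL.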

CloudFS Encryption ● Purely client-side, not even key escrow on the server ● Provides privacy and indemnity ● Problem: partial-block writes; when only part of a cipher block is being written, the remainder must be fetched from the server, because all input bytes of a block affect all output bytes (illustrated below)
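A rough illustration of the read-modify-write that a partial cipher-block write forces, sketched with plain dd over a 16-byte AES-sized block; the file names and block index are made up, and the real work happens inside the client-side translator, not in the shell:

BLOCK=16          # cipher block size in bytes (AES)
IDX=5             # which block of the file is being partially overwritten
# 1. Fetch the whole existing block, because only a few new bytes are in hand.
dd if=encrypted.img of=block.bin bs=$BLOCK skip=$IDX count=1
# 2. (Decrypt block.bin, splice in the new plaintext bytes, re-encrypt it.)
# 3. Write the full re-encrypted block back in place.
dd if=block.bin of=encrypted.img bs=$BLOCK seek=$IDX count=1 conv=notrunc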

Gluster to CloudFS ● So far we have been talking about Gluster performance ● Now let's look at the overhead of the CloudFS-specific components

CloudFS Encryption Overhead

CloudFS Multi-Tenancy Overhead

For More Information ● CloudFS blog: ● Mailing lists: ● Code: ● More to come (wikis, bug tracker, etc.)

Backup: CloudFS “Dynamo” Translator (future) ● Greater scalability ● Faster replication ● Faster replica repair ● Faster rebalancing ● Variable # of replicas ● (diagram: “Dynamo” consistent-hashing ring with servers S1, S2, S3 and keys A, B, C, D)

Backup: CloudFS Async Replication (future) ● Multiple masters ● Partition tolerant: writes accepted everywhere ● Eventually consistent: version vectors etc. ● Preserves client-side encryption security ● Unrelated to Gluster geo-sync ● (diagram: Site A with servers S1, S2, S3; Site B with S4, S5; Site C with S6, S7)