A Redundant Global Storage Architecture

Presentation transcript:

A Redundant Global Storage Architecture
Jim Leek, David Schultz, Ben Schwarz
{jrleek,dschultz,bschwarz}@CS

Storage as a Service
Many companies already provide access to massive data sets as a service (e.g. Google)
Idea: provide access to raw storage as a service
Advantages:
  providers already know how to manage storage clusters
  more reliable than personal storage
  available anywhere
Disadvantages:
  security?

Storage as a Service
Clients allocate large "chunks" of storage (64 MB for now)
Cost is based on bandwidth and total space
  Bandwidth is most important: disk bandwidth is growing at ~40%/yr while capacity grows at ~60%/yr
  Space costs money too:
    keeping data always available consumes power
    backups
    more to recover when a disk is lost
Why backups? Backups to WORM storage are a good way to protect against erasure of data in the event of a security compromise.
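A minimal sketch of the kind of cost model this slide implies, with charges for both bandwidth and space. The prices, function names, and billing formula are illustrative assumptions, not figures from the talk.

```python
# Hypothetical cost model for chunk storage: charge for bytes transferred
# (bandwidth) and for bytes stored (space). All prices are made up.
CHUNK_SIZE_MB = 64

def monthly_cost(chunks_allocated: int, mb_transferred: float,
                 price_per_gb_transfer: float = 0.05,
                 price_per_gb_month: float = 0.02) -> float:
    """Return an illustrative monthly bill in dollars."""
    space_gb = chunks_allocated * CHUNK_SIZE_MB / 1024
    transfer_gb = mb_transferred / 1024
    return transfer_gb * price_per_gb_transfer + space_gb * price_per_gb_month

# Example: 100 chunks (6.4 GB) held for a month, plus 50 GB of traffic.
print(monthly_cost(chunks_allocated=100, mb_transferred=50 * 1024))
```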

Network Architecture
Storage network is a collection of chunk servers
Space allocation and deallocation is centralized at the master
Stateless design; simple recovery

Network/Translation Layer
[Diagram: clients talk to a master server (lookups) and to chunk servers (storage)]
Master server: central authority, small in number, fault tolerant, low bandwidth, distributes load, keeps a block location cache
Chunk servers: responsible for replicating, fault tolerant, handle higher-bandwidth requests, buffer requests, forward the bytestream
Operations: allocate, free, locateBlock, read, write, openIOGroup, closeIOGroup, whoIsMaster
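A sketch of the client-facing interface suggested by the diagram. The operation names come from the slide; the method signatures, types, and split between master and chunk-server stubs are assumptions for illustration.

```python
# Client-side stubs for the operations named on the slide; bodies elided.
from dataclasses import dataclass
from typing import List

@dataclass
class BlockLocation:
    chunk_server: str   # address of the chunk server holding the block
    block_id: int

class MasterClient:
    """Low-bandwidth control operations handled by the (replicated) master."""
    def who_is_master(self) -> str: ...
    def allocate(self, num_chunks: int) -> List[int]: ...      # returns chunk ids
    def free(self, chunk_ids: List[int]) -> None: ...
    def locate_block(self, block_id: int) -> BlockLocation: ...

class ChunkServerClient:
    """High-bandwidth data operations sent directly to chunk servers."""
    def open_io_group(self) -> int: ...                         # returns a group id
    def write(self, group_id: int, block_id: int, data: bytes) -> None: ...
    def read(self, block_id: int) -> bytes: ...
    def close_io_group(self, group_id: int) -> None: ...        # commits the group
```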

Filesystem Design
Encrypted on the client side
Always consistent
Structured as a Merkle tree
Copy on write
  Inexpensive snapshots reduce the impact of user error
Block allocation/cleaning similar to LFS
Local WAL to provide fast synchronous writes when required
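A minimal sketch of the client-side view this slide describes: blocks are encrypted before they leave the client, and each tree node hashes its children so the root digest authenticates the whole file. Fernet and the node layout are illustrative choices, not the system's actual formats.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List
from cryptography.fernet import Fernet

@dataclass
class MerkleNode:
    digest: bytes
    children: List["MerkleNode"] = field(default_factory=list)

def encrypt_block(key: bytes, plaintext: bytes) -> bytes:
    # The servers only ever store and ship ciphertext.
    return Fernet(key).encrypt(plaintext)

def leaf(ciphertext: bytes) -> MerkleNode:
    return MerkleNode(hashlib.sha256(ciphertext).digest())

def parent(children: List[MerkleNode]) -> MerkleNode:
    h = hashlib.sha256()
    for c in children:
        h.update(c.digest)
    return MerkleNode(h.digest(), children)

# A tiny file: four encrypted blocks hashed into a two-level tree. The root
# digest is what the client later checks on reads.
key = Fernet.generate_key()
blocks = [encrypt_block(key, b"block %d" % i) for i in range(4)]
root = parent([leaf(b) for b in blocks])
```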

Updating a Block

[Diagram-only slide, repeated across several animation steps; no transcript text]
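Since these slides are diagrams only, here is a rough sketch, under assumptions, of what a copy-on-write Merkle update looks like: the modified block is written as a new encrypted block and fresh digests are computed along the path from that leaf up to a new root, while nodes off the path are shared with the old version (which is what keeps snapshots cheap). The flat list of sibling digests is an illustrative simplification of the real tree walk.

```python
import hashlib

def digest(*parts: bytes) -> bytes:
    d = hashlib.sha256()
    for p in parts:
        d.update(p)
    return d.digest()

def cow_update(sibling_digests: list, new_ciphertext: bytes) -> bytes:
    """Return the new root digest after replacing one leaf.

    sibling_digests are the digests hashed together with the updated node at
    each level on the way up (child ordering is simplified here).
    """
    node = digest(new_ciphertext)      # new leaf for the rewritten block
    for sib in sibling_digests:
        node = digest(node, sib)       # new internal node; the old one is kept for snapshots
    return node                        # new root; the previous root still names the old version
```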

I/O Model
Servers provide no transaction support
  Servers only see encrypted data anyway
  xFS designers showed how to do isolation entirely on the client side
Server respects WAW (write-after-write) dependencies
  Clients attach each write to an I/O group
  Groups are committed in order
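A sketch of how a chunk server could enforce the ordering described above: writes are buffered per I/O group and groups are applied strictly in the order they were opened. The class, names, and in-memory store are assumptions for illustration, not the system's actual implementation.

```python
from collections import OrderedDict

class ChunkServer:
    def __init__(self):
        self.groups = OrderedDict()   # group_id -> buffered (block_id, data) writes
        self.closed = set()
        self.store = {}               # block_id -> data (stands in for the disk)
        self.next_group = 0

    def open_io_group(self) -> int:
        gid = self.next_group
        self.next_group += 1
        self.groups[gid] = []
        return gid

    def write(self, gid: int, block_id: int, data: bytes) -> None:
        self.groups[gid].append((block_id, data))   # buffered, not yet visible

    def close_io_group(self, gid: int) -> None:
        self.closed.add(gid)
        # Commit closed groups in open order; a closed group waits for earlier ones,
        # so write-after-write dependencies between groups are respected.
        while self.groups and next(iter(self.groups)) in self.closed:
            first, writes = self.groups.popitem(last=False)
            self.closed.remove(first)
            for block_id, data in writes:
                self.store[block_id] = data
```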

Security Threat Model
[Diagram: Alice and Bob talking to chunk servers (CS), with attack points marked]
Attacks considered:
  man in the middle between chunk servers
  deducing data location
  timing attacks
  statistical attacks (e.g. file size)
  compromised chunk servers?
Improvement over existing architectures! Even chunk servers are not in the know.

Security II
Filesystem: encrypt data prior to storing, decrypt on retrieval
  Not a fundamental limitation of OceanStore
How do we handle data that should be read by a set of people (e.g. group privileges)?
  One public key; distribute multiple private keys
  Use GPG!
  DATA = Priv_i(Pub(DATA)) = Priv_k(Pub(DATA))
  Multiple private keys for member identification, or use a single private key for anonymity
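A hedged sketch of the hybrid pattern this slide gestures at: data is encrypted under a per-chunk symmetric key, and that key is wrapped with the group's public key so holders of the group private key can read. This shows a simplified variant in which the single group private key is shared among members; the slide's scheme of distinct private keys for member identification would require a different primitive. Library choices and key handling are assumptions.

```python
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes
from cryptography.fernet import Fernet

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# One group keypair; the public half encrypts, the private half is distributed
# to group members (simplification of the slide's multi-private-key idea).
group_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
group_public = group_private.public_key()

# Encrypt a chunk under a fresh data key, then wrap the data key for the group.
data_key = Fernet.generate_key()
ciphertext = Fernet(data_key).encrypt(b"secret chunk contents")
wrapped_key = group_public.encrypt(data_key, oaep)

# Any member holding the group private key can unwrap the data key and read.
unwrapped = group_private.decrypt(wrapped_key, oaep)
assert Fernet(unwrapped).decrypt(ciphertext) == b"secret chunk contents"
```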

Security Guarantees
MACs guarantee that the server can't change the data
  Hypothetical attack: server selectively ignores writes; inconsistent versions trick the client
Merkle tree structure provides fork consistency
  Informally, if the server prevents clients A and B from seeing each other's updates, the clients will either detect the problem or live in parallel universes from then on
Freshness?
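A rough sketch of the check a client could run on read: verify a MAC over the block (so the server cannot alter it undetected) and compare the block's digest against the hash it expects from the Merkle tree it already trusts. Key handling and the tree lookup are assumptions, not the system's actual protocol.

```python
import hashlib
import hmac

def verify_block(mac_key: bytes, ciphertext: bytes, stored_mac: bytes,
                 expected_digest: bytes) -> bool:
    # Server-side tampering with the block breaks the MAC check.
    ok_mac = hmac.compare_digest(
        hmac.new(mac_key, ciphertext, hashlib.sha256).digest(), stored_mac)
    # A stale or forked version fails to match the trusted tree digest.
    ok_tree = hmac.compare_digest(
        hashlib.sha256(ciphertext).digest(), expected_digest)
    return ok_mac and ok_tree   # reject the read if either check fails
```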

Conclusion
Improvement in security and privacy over existing architectures
Simple stateless server design makes recovery easier
  AFS and xFS have problems with recovery
Virtually unconstrained write ordering should provide good performance

Questions

Related Work
OceanStore: distributed object location
Google File System: master/chunkserver architecture
LFS: large sequential writes
WAFL: WAL for synchronous semantics; tree of blocks for consistency