Cloud Scale Storage Systems Sean Ogden October 30, 2013

Evolution
- P2P routing/DHTs (Chord, CAN, Pastry, etc.)
- P2P storage (Pond, Antiquity)
  - Storing Greg's baby pictures on machines of untrusted strangers that are connected with wifi
- Cloud storage
  - Store Greg's baby pictures on a trusted data center network at Google

Cloud storage – Why?
- Centralized control, one administrative domain
- Can buy seemingly infinite resources
- Network links are high bandwidth
- Availability is important
- Many connected commodity machines with disks are cheap to build
  - Reliability comes from software

The Google File System
Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung

GFS Assumptions and Goals
Given
- Large files, large sequential writes
- Many concurrent appending applications
- Infrequent updates
- Trusted network
Provide
- Fast, well-defined append operations
- High-throughput I/O
- Fault tolerance

GFS Components
- Centralized master
- Chunk servers
- Clients

GFS Architecture

GFS Chunk Server

GFS Chunk server
- Holds chunks of data, 64 MB by default
- Holds checksums of the chunks
- Responds to queries from the master
- Receives data directly from clients
- Can be a delegated authority (primary) for a chunk
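
To make those bullets concrete, here is a minimal, illustrative Python sketch of the state a chunk server keeps: whole chunks plus checksums that are verified before serving a read. The class, constants, and the per-64 KB checksum granularity are taken from the paper's description or invented for illustration; this is not GFS's actual code.

```python
import hashlib

CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB chunks, as on the slide
BLOCK_SIZE = 64 * 1024          # checksum granularity (the paper checksums 64 KB blocks)

class ChunkServer:
    """Illustrative chunk server state: chunk data plus per-block checksums."""

    def __init__(self):
        self.chunks = {}      # chunk_handle -> bytearray of chunk data
        self.checksums = {}   # chunk_handle -> list of block checksums

    def store(self, handle, data):
        assert len(data) <= CHUNK_SIZE
        self.chunks[handle] = bytearray(data)
        self.checksums[handle] = [
            hashlib.sha1(data[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(data), BLOCK_SIZE)
        ]

    def read(self, handle, offset, length):
        data = self.chunks[handle]
        # Verify the checksum of every block the read touches before returning data.
        first, last = offset // BLOCK_SIZE, (offset + length - 1) // BLOCK_SIZE
        for b in range(first, last + 1):
            block = bytes(data[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE])
            if hashlib.sha1(block).hexdigest() != self.checksums[handle][b]:
                raise IOError("checksum mismatch; report corruption to the master")
        return bytes(data[offset:offset + length])
```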

GFS Master

- Holds file system metadata
  - Which chunk server holds which chunk
  - The chunk location table is not persistent (it is rebuilt from chunk server reports)
- Directs clients
- Centralized
  - Ease of implementation
  - Can do load balancing
  - Not in the data path
- Replicated for fault tolerance
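
A toy sketch of that master state; the names are illustrative assumptions, not the paper's code. The point it tries to capture is that chunk locations are soft state repopulated by chunk server reports, while the file-to-chunk mapping would be made durable through an operation log in the real system.

```python
class Master:
    """Illustrative master state: namespace plus a non-persistent chunk-location table."""

    def __init__(self):
        self.file_chunks = {}       # path -> ordered list of chunk handles (durable via op log in GFS)
        self.chunk_locations = {}   # chunk handle -> set of chunk server addresses (soft state)

    def report_chunks(self, server_addr, handles):
        # Called from chunk server heartbeats/reports; this is how the location table is (re)built.
        for h in handles:
            self.chunk_locations.setdefault(h, set()).add(server_addr)

    def lookup(self, path, chunk_index):
        # Clients ask: which chunk handle is this part of the file, and who holds replicas?
        handle = self.file_chunks[path][chunk_index]
        return handle, sorted(self.chunk_locations.get(handle, ()))
```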

GFS Client

- Queries the master for metadata
- Reads/writes data directly to chunk servers
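
Putting the two previous sketches together, a hypothetical client read could look like this: one small metadata lookup at the master, then bulk data straight from a replica. `master` and `chunkservers` refer to the sketch objects above; this is illustrative only.

```python
CHUNK_SIZE = 64 * 1024 * 1024

def gfs_read(master, chunkservers, path, offset, length):
    """Illustrative client read path: metadata from the master, data from a chunk server."""
    # 1. Translate the byte offset into a chunk index and ask the master who holds it.
    handle, locations = master.lookup(path, offset // CHUNK_SIZE)
    # 2. Read the bytes directly from one replica; the master never sees the data.
    #    (Single-chunk read only; a real client splits ranges at chunk boundaries.)
    replica = chunkservers[locations[0]]
    return replica.read(handle, offset % CHUNK_SIZE, length)
```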

Write control and data flow

Read control and data flow

Supported operations
- Open
- Close
- Create
- Read
- Write
- Delete
- Atomic record append
- Snapshot
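
For reference, the same operation list written out as a hypothetical client interface. The method names and signatures are assumptions for illustration, not the paper's actual API.

```python
from abc import ABC, abstractmethod

class GFSClient(ABC):
    """The operation surface from the slide, sketched as an interface."""

    @abstractmethod
    def create(self, path): ...
    @abstractmethod
    def open(self, path): ...
    @abstractmethod
    def close(self, handle): ...
    @abstractmethod
    def read(self, handle, offset, length): ...
    @abstractmethod
    def write(self, handle, offset, data): ...
    @abstractmethod
    def delete(self, path): ...
    @abstractmethod
    def record_append(self, handle, data):
        """Returns the offset GFS chose; may be applied more than once on retry."""
    @abstractmethod
    def snapshot(self, path): ...
```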

Consistency
- Relaxed consistency model
- File namespace mutations are atomic
- File regions may be consistent and/or defined
  - Consistent: all clients see the same data, regardless of which replica they read
  - Defined: consistent, and the entire mutation is visible to clients
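
A toy way to state the two definitions, assuming we can inspect the bytes every replica would return for one file region. The `region_state` helper and the byte strings are invented for illustration.

```python
def region_state(replica_bytes, mutation):
    """Toy check of the slide's two definitions for one file region."""
    consistent = len(set(replica_bytes)) == 1               # all clients see the same data
    defined = consistent and mutation in replica_bytes[0]   # ...and the whole mutation is visible
    return consistent, defined

# A serial write of b"BEEF" that succeeded everywhere: consistent and defined.
print(region_state([b"BEEF", b"BEEF", b"BEEF"], b"BEEF"))        # (True, True)
# Concurrent writes can leave identical but mingled bytes: consistent, not defined.
print(region_state([b"BEAD", b"BEAD", b"BEAD"], b"BEEF"))        # (True, False)
# A failed mutation can leave replicas disagreeing: inconsistent.
print(region_state([b"BEEF", b"BE\x00\x00", b"BEEF"], b"BEEF"))  # (False, False)
```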

Consistency
                       Write                        Record Append
Serial success         defined                      defined interspersed with inconsistent
Concurrent successes   consistent but not defined   defined interspersed with inconsistent
Failure                inconsistent                 inconsistent

“Atomic” record appends
- Most frequently used operation
- “At least once” guarantee
- A failed append operation can leave replicas containing the result of a partially complete mutation
- Example: a chunk ends with “DEAD” and we call append(f, “BEEF”)
    Replica 1: DEADBEEF
    Replica 2: DEADBEBEEF   (a failed partial append wrote “BE”; the retried append then wrote the full “BEEF”)
    Replica 3: DEADBEEF
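
Because appends are at-least-once, the paper suggests that applications embed checksums and unique record identifiers so readers can skip inconsistent regions and drop duplicate retries. Below is a hedged sketch of such a self-describing record format; the exact framing (length + CRC + UUID) is an assumption for illustration, not the paper's format.

```python
import uuid, zlib

def make_record(payload: bytes) -> bytes:
    """Writers self-describe records so readers can cope with at-least-once appends."""
    rid = uuid.uuid4().bytes                        # unique id for de-duplication
    body = rid + payload
    return len(body).to_bytes(4, "big") + zlib.crc32(body).to_bytes(4, "big") + body

def scan_records(chunk: bytes):
    """Skip corrupt/partial regions left by failed appends and drop duplicate retries."""
    seen, pos = set(), 0
    while pos + 8 <= len(chunk):
        length = int.from_bytes(chunk[pos:pos + 4], "big")
        crc = int.from_bytes(chunk[pos + 4:pos + 8], "big")
        body = chunk[pos + 8:pos + 8 + length]
        if length >= 16 and len(body) == length and zlib.crc32(body) == crc:
            rid, payload = body[:16], body[16:]
            if rid not in seen:                     # duplicate from a retried append
                seen.add(rid)
                yield payload
            pos += 8 + length
        else:
            pos += 1                                # resync past an inconsistent region

# A retried append re-sends the same record bytes, so the duplicate is dropped on read.
r2 = make_record(b"rec2")
chunk = make_record(b"rec1") + b"\x00" * 11 + r2 + r2
print(list(scan_records(chunk)))                    # [b'rec1', b'rec2']
```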

Performance

Performance notes
- It goes up and to the right
- Write throughput is limited by the network due to replication
- The master saw 200 ops/second

GFS Takeaways
- There can be benefits to a centralized master
  - If it is not in the write path
- Treat failure as the norm
- Ditching old standards can lead to drastically different designs that better fit a specific goal

Discussion
- Does GFS work for anyone outside of Google? Are industry papers useful to the rest of us?
- What are the pros/cons of a single master in this system? Will there ever be a case where the single master could be a problem?
- Could we take components of this and improve on them in some way for different workloads?

Windows Azure Storage Brad Calder, Ju Wang, Aaron Ogus, Niranjan Nilakantan, Arild Skjolsvold, Sam McKelvie, Yikang Xu, Shashwat Srivastav, Jiesheng Wu, Huseyin Simitci, Jaidev Haridas, Chakravarthy Uddaraju, Hemal Khatri, Andrew Edwards, Vaman Bedekar, Shane Mainali, Rafay Abbasi, Arpit Agarwal, Mian Fahim ul Haq, Muhammad Ikram ul Haq, Deepali Bhardwaj, Sowmya Dayanand, Anitha Adusumilli, Marvin McNett, Sriram Sankaran, Kavitha Manivannan, Leonidas Riga

Azure Storage Goals and Assumptions
Given
- Multi-tenant storage service
- Publicly accessible (untrusted clients)
- A myriad of different usage patterns, not just large files
Provide
- Strong consistency
- Atomic transactions (within partitions)
- Synchronous local replication + asynchronous geo-replication
- Some useful high-level abstractions for storage

Azure vs. GFS
                      GFS                       Azure
Minimum block size    64 MB                     ~4 MB
Unit of replication   Block                     Extent
Mutable blocks?       Yes                       No
Consistency           Not consistent            Strong
Replication           3 copies of full blocks   Erasure coding
Usage                 Private within Google     Public

Azure Architecture
- Stream Layer
- Partition Layer
- Front End Layer

Azure Storage Architecture

Azure Storage Stream Layer
- Provides a file system abstraction
- Streams ≈ files
  - Made up of pointers to extents
- Extents are made up of lists of blocks
- Blocks are the smallest unit of I/O
  - Much smaller than in GFS (~4 MB vs. 64 MB)
- Does synchronous intra-stamp replication
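
A minimal sketch of that hierarchy: a stream is an ordered list of pointers to extents, an extent is a list of blocks, and only the unsealed tail extent accepts appends. The class names, dataclasses, and the 4 MB bound are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List

BLOCK_LIMIT = 4 * 1024 * 1024    # blocks up to ~4 MB, per the slide

@dataclass
class Extent:
    blocks: List[bytes] = field(default_factory=list)
    sealed: bool = False          # sealed extents are immutable

@dataclass
class Stream:
    """A stream is just an ordered list of pointers to extents; only the last is writable."""
    extents: List[Extent] = field(default_factory=list)

    def append_block(self, data: bytes):
        assert len(data) <= BLOCK_LIMIT, "a block is the smallest (and bounded) unit of IO"
        if not self.extents or self.extents[-1].sealed:
            self.extents.append(Extent())        # open a fresh extent at the tail
        self.extents[-1].blocks.append(data)

    def seal_last_extent(self):
        self.extents[-1].sealed = True           # after sealing, appends go to a new extent
```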

Anatomy of a Stream

Stream Layer Architecture

Stream Layer Optimizations
- Spindle anti-starvation
  - Custom disk scheduling predicts latency
- Durability and journaling
  - All writes must be durable on 3 replicas
  - Use an SSD and journal appends on every EN (extent node)
  - Appends do not conflict with reads
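
A rough, simplified sketch of the journaling idea on an extent node: each append goes to both an SSD journal and the data disk, and here the append is acknowledged once the journal copy is durable so it does not queue behind a spindle busy serving reads. The class, file paths, and the exact acknowledgement rule are assumptions that simplify the paper's scheme.

```python
import concurrent.futures, os

class ExtentNodeDisk:
    """Sketch of journaled appends: ack from the SSD journal, data disk catches up."""

    def __init__(self, journal_path, data_path):
        self.journal = open(journal_path, "ab", buffering=0)
        self.data = open(data_path, "ab", buffering=0)
        self.pool = concurrent.futures.ThreadPoolExecutor(max_workers=2)

    def _write(self, f, payload):
        f.write(payload)
        os.fsync(f.fileno())

    def append(self, payload: bytes):
        j = self.pool.submit(self._write, self.journal, payload)
        d = self.pool.submit(self._write, self.data, payload)
        j.result()      # acknowledge once the journal copy is durable (simplification)...
        return d        # ...while the data-disk write completes in the background
```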

Partition Layer Responsibilities
- Manages the higher-level abstractions
  - Blob
  - Table
  - Queue
- Asynchronous inter-stamp replication

Partition Layer Architecture
- Partition servers serve requests for RangePartitions
  - Only one partition server can serve a given RangePartition at any point in time
- The Partition Manager keeps track of the partitioning of Object Tables into RangePartitions
- A Paxos lock service is used for leader election for the Partition Manager
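
An illustrative sketch of the mapping the Partition Manager maintains and front ends consult: sorted RangePartition boundaries, each assigned to exactly one partition server. The class, keys, and server names are made up for the example.

```python
import bisect

class PartitionMap:
    """Illustrative Partition Manager state: RangePartitions and their assigned servers."""

    def __init__(self):
        self.range_starts = []    # sorted lower bounds of each RangePartition
        self.assignments = []     # parallel list: partition server owning that range

    def assign(self, range_start, server):
        i = bisect.bisect_left(self.range_starts, range_start)
        self.range_starts.insert(i, range_start)
        self.assignments.insert(i, server)

    def server_for(self, object_key):
        # Front ends use a map like this to route a request to the one serving partition server.
        i = bisect.bisect_right(self.range_starts, object_key) - 1
        if i < 0:
            raise KeyError("no RangePartition covers this key")
        return self.assignments[i]

pm = PartitionMap()
pm.assign("", "PS1")         # one server owns everything below "m"
pm.assign("m", "PS2")        # a split moved the upper half of the key space
print(pm.server_for("accountA/container1/blob7"))   # -> PS1
print(pm.server_for("zebra"))                        # -> PS2
```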

Partition Layer Architecture

Azure Storage Takeaways
- Benefits from a good layered design
  - Queues, blobs, and tables all share the underlying stream layer
- Append-only
  - Simplifies the design of distributed storage
  - Comes at the cost of garbage collection
- Multi-tenancy challenges

Azure Storage Discussion
- Did they really “beat” the CAP theorem?
- What do you think about their consistency guarantee?
  - Would it be useful to have inter-namespace consistency guarantees?

Comparison