Published by Lorraine Caldwell. Modified over 9 years ago.
1
2014. 3. 24
Presenter: Seikwon Kim @ KAIST
The Google File System 【 Ghemawat, Gobioff, Leung 】
2
Contents
- Introduction
- I. Design Overview
- II. System Interactions
- III. Master Operation
- IV. Fault Tolerance
- V. Conclusion
3
2014 Internet System Technology 3/27
【 Introduction 】

What is GFS?
- A distributed file system
- Goals: performance, scalability, reliability, availability

Why GFS?
- To meet Google’s data processing needs
- Its design assumptions differ from those of earlier file systems:
  - Component failures are the norm
  - Files are huge
  - Files are mostly appended to rather than overwritten
  - The client applications and the file system are co-designed
4
I. Design Overview
1.1 Assumptions
1.2 Interface
1.3 Architecture
1.4 Chunk
1.5 Metadata
1.6 Consistency Model
5
1.1 Assumptions

Basic Assumptions
- The system is built from cheap commodity components
- Optimized for large files; small files are supported but not optimized for
- Workloads:
  - Large streaming reads
  - Small random reads
  - Large sequential appends
- High sustained bandwidth matters more than low latency
6
1.2 Interface

Features
- Familiar operations: create, delete, open, close, read, write
- snapshot: copies a file or directory
- record append: lets multiple clients append data to the same file concurrently
- The API is POSIX-like but does not implement POSIX
- Files are organized hierarchically in directories
7
1.3 Architecture

Architecture Overview
- A single master, multiple chunk servers, multiple clients
- Files are divided into 64 MB chunks
- Each chunk is replicated on multiple chunk servers (3 replicas by default)
- The master communicates with chunk servers through HeartBeat messages
- The master maintains all file system metadata
- File data is not cached, but clients do cache metadata
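Because chunks have a fixed size, a client can work out which chunk holds a given byte before it contacts the master. The following is a minimal sketch of that arithmetic; the function names are hypothetical and are not part of any GFS API.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size stated on the slide


def chunk_index(byte_offset: int) -> int:
    """Return the index of the chunk that holds byte_offset."""
    if byte_offset < 0:
        raise ValueError("offset must be non-negative")
    return byte_offset // CHUNK_SIZE


def chunks_for_range(offset: int, length: int) -> range:
    """Return the chunk indexes touched by reading `length` bytes at `offset`."""
    if length <= 0:
        return range(0)
    return range(chunk_index(offset), chunk_index(offset + length - 1) + 1)
```

A read that straddles a chunk boundary touches two chunks, so the client would need locations for both.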
8
1.4 Chunk
- The unit of data stored in GFS
- The large chunk size (64 MB) is a key design parameter
- Each chunk replica is stored as a plain Linux file on a chunk server

Pros of Large Chunk Size
- Minimizes interaction between client and master
- Reduces network overhead
- Reduces the size of the metadata stored on the master

Cons of Large Chunk Size
- A chunk server can become a hot spot when many clients access the same chunk
9
1.5 Metadata

Metadata Types
- File and chunk namespaces: persistent
- Mapping from files to chunks: persistent
- Locations of chunk replicas: not persistent
All metadata is kept in the master's memory.

Why Stored in Memory?
- Master operations are fast
- The master can efficiently scan its entire state periodically in the background, for:
  - Chunk garbage collection
  - Re-replication
  - Chunk migration
- Adding extra memory is cheap
10
1.5 Metadata Cont.

Chunk Locations
- Chunk locations are not stored persistently
- The master polls the chunk servers for them at startup
- They are kept up to date through HeartBeat messages

Operation Logs
- A persistent, historical record of critical metadata changes
- Defines the logical timeline that orders concurrent operations
- After a failure, the master recovers by replaying the operation log
- Checkpoints keep the operation log small
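Recovery from a checkpoint plus the operation log can be sketched as follows. This is an illustration only: the record format (`create`, `add_chunk`, `delete` tuples) and the class and function names are assumptions made for the example, not the real log format.

```python
from dataclasses import dataclass, field


@dataclass
class MasterState:
    """Toy in-memory metadata: file name -> list of chunk handles."""
    files: dict = field(default_factory=dict)

    def apply(self, record: tuple) -> None:
        # Each record is (operation, file name, optional arguments...).
        op, name, *args = record
        if op == "create":
            self.files[name] = []
        elif op == "add_chunk":
            self.files[name].append(args[0])
        elif op == "delete":
            self.files.pop(name, None)
        else:
            raise ValueError(f"unknown op: {op}")


def recover(checkpoint: dict, log_tail: list) -> MasterState:
    """Rebuild state from the latest checkpoint plus the log records after it."""
    state = MasterState(files={k: list(v) for k, v in checkpoint.items()})
    for record in log_tail:
        state.apply(record)
    return state
```

Only the records written after the checkpoint need to be replayed, which is why checkpointing keeps recovery short.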
11
1.6 Consistency Model

Relaxed Consistency Model
- File namespace mutations are guaranteed to be atomic

Levels of Consistency on Data
- Inconsistent: different clients may see different data
- Consistent: all clients see the same data, regardless of which replica they read
- Defined: consistent, and clients see what the mutation wrote in its entirety

What the Application Has to Do
- Append rather than overwrite
- Write self-validating records
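One way an application might make its records self-validating is to frame each record with a length and a checksum, so a reader can skip padding and damaged regions. The sketch below assumes a simple header layout (4-byte length, 4-byte CRC32); that layout and the function names are illustrative choices, not the format used by any Google application.

```python
import struct
import zlib

HEADER = struct.Struct(">II")  # assumed layout: payload length, CRC32 of payload


def encode_record(payload: bytes) -> bytes:
    """Prefix the payload with its length and checksum."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def read_valid_records(data: bytes) -> list:
    """Scan a byte region and return the payloads whose checksum matches."""
    records, pos = [], 0
    while pos + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, pos)
        payload = data[pos + HEADER.size : pos + HEADER.size + length]
        if length > 0 and len(payload) == length and zlib.crc32(payload) == crc:
            records.append(payload)
            pos += HEADER.size + length
        else:
            pos += 1  # padding or garbage here; move on and try to resync
    return records
```

A byte-by-byte resync like this can in principle accept a false match, so a real format would likely add a magic marker; duplicates from retried appends would need a separate mechanism such as unique record IDs.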
12
II. System Interactions
2.1 Write Control
2.2 Data Flow
2.3 Atomic Record Append
2.4 Snapshot
13
2.1 Write Control

Figure: Write Process

- Lease: granted by the master to one chunk replica, which becomes the primary
- Mutation: an operation that changes the contents or metadata of a chunk
14
2.2 Data Flow

Network Construction in GFS
- Data is pushed linearly along a chain of chunk servers (line topology), so each machine's full outbound bandwidth is used
- Each machine forwards the data to the closest machine that has not yet received it
- Distance is estimated from IP addresses
- Pipelining minimizes latency and maximizes throughput

Elapsed time for transferring
Elapsed time = B/T + RL
- B: bytes to transfer
- T: network throughput
- R: number of replicas
- L: latency between two machines
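To get a feel for the formula Elapsed time = B/T + RL, here is a small worked calculation. The function name and the input values (1 MB, a 100 Mbps link, 3 replicas, 1 ms latency) are assumptions chosen for illustration.

```python
def ideal_transfer_time(num_bytes: float, throughput_bytes_per_s: float,
                        replicas: int, latency_s: float) -> float:
    """Elapsed time = B/T + R*L for a pipelined push along a chain of replicas."""
    return num_bytes / throughput_bytes_per_s + replicas * latency_s


# Assumed example: 1 MB over a 100 Mbps link (12.5e6 bytes/s), 3 replicas, 1 ms latency.
elapsed = ideal_transfer_time(1e6, 12.5e6, replicas=3, latency_s=0.001)
print(f"{elapsed * 1000:.0f} ms")  # about 83 ms
```

With these numbers the B/T term (80 ms) dominates, so adding replicas costs only a few milliseconds thanks to pipelining.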
15
2.3 Atomic Record Appends

In Traditional Writes
- The client specifies the offset at which the data is to be written
- Concurrent writes to the same region can leave fragments from multiple clients

In Record Append
- The client specifies only the data
- The control flow is much like a write in GFS

Record Append Process
- GFS appends the data to the file at least once atomically
- If the record would exceed the chunk's maximum size, the chunk is padded and the append is retried on the next chunk
- The client retries the append when an error occurs
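The padding rule can be sketched as a small decision function. This is a simplified model under stated assumptions: the function name and return values are hypothetical, and any additional limits on record size are left out.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB


def place_record(used_bytes: int, record_len: int) -> tuple:
    """Decide what happens to a record appended to the current chunk.

    Returns ("append", offset) if the record fits, or ("pad", padding_bytes)
    if the chunk must be padded and the append retried on the next chunk.
    """
    if record_len > CHUNK_SIZE:
        raise ValueError("record larger than a chunk")
    if used_bytes + record_len <= CHUNK_SIZE:
        return ("append", used_bytes)
    return ("pad", CHUNK_SIZE - used_bytes)
```

Note that GFS, not the client, picks the offset; the client only learns it from the reply.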
16
2.4 Snapshot
- Makes a near-instant copy of a file or directory

Snapshot Process
- The master receives a snapshot request
- It revokes outstanding leases on the affected chunks
- It logs the operation
- It duplicates the metadata for the file or directory; the new snapshot points to the same chunks
- When a write to a shared chunk arrives, the chunk is first duplicated locally on its chunk server
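The snapshot steps amount to copy-on-write. A minimal sketch of the bookkeeping, assuming a reference count per chunk; the `ToyMaster` class and its method names are hypothetical, and the actual data copy on the chunk server is not modeled.

```python
class ToyMaster:
    """Hypothetical in-memory model: files map to chunk handles with ref counts."""

    def __init__(self):
        self.files = {}      # path -> list of chunk handles
        self.refcount = {}   # chunk handle -> number of files referencing it
        self._next = 0

    def _new_handle(self) -> str:
        self._next += 1
        return f"chunk-{self._next}"

    def create(self, path: str, num_chunks: int) -> None:
        handles = [self._new_handle() for _ in range(num_chunks)]
        self.files[path] = handles
        for h in handles:
            self.refcount[h] = 1

    def snapshot(self, src: str, dst: str) -> None:
        # Only metadata is duplicated; both files now share the same chunks.
        self.files[dst] = list(self.files[src])
        for h in self.files[dst]:
            self.refcount[h] += 1

    def prepare_write(self, path: str, index: int) -> str:
        # Copy-on-write: a shared chunk gets a new handle before the first write.
        handle = self.files[path][index]
        if self.refcount[handle] > 1:
            self.refcount[handle] -= 1
            handle = self._new_handle()
            self.refcount[handle] = 1
            self.files[path][index] = handle
        return handle
```

The snapshot itself is cheap because no chunk data moves until someone writes to a shared chunk.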
17
III. Master Operation
3.1 Namespace Management
3.2 Replica Placement
3.3 Creation, Re-replication, Rebalancing
3.4 Garbage Collection
3.5 Stale Replica Detection
18
3.1 Namespace Management
- Allows multiple master operations to run at the same time
- Each master operation must acquire locks before it runs
- The namespace is stored with prefix compression

Locking Example
- Snapshot of /home/user:
  - read-lock on /home
  - write-lock on /home/user
- File creation for /home/user/foo:
  - read-lock on /home
  - read-lock on /home/user
  - write-lock on /home/user/foo
- The locks on /home/user conflict, so the two operations are serialized
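The locking example can be reproduced with a small sketch that derives the lock set for a path and checks two operations for conflicts. The helper names are hypothetical, and real lock acquisition (ordering, blocking) is not modeled.

```python
def locks_for(path: str, leaf_mode: str = "write") -> dict:
    """Read locks on every ancestor directory, plus a lock on the full path."""
    parts = path.strip("/").split("/")
    locks = {"/" + "/".join(parts[:i]): "read" for i in range(1, len(parts))}
    locks["/" + "/".join(parts)] = leaf_mode
    return locks


def conflicts(a: dict, b: dict) -> set:
    """Paths locked by both operations where at least one lock is a write lock."""
    return {p for p in a.keys() & b.keys() if "write" in (a[p], b[p])}


snapshot_locks = locks_for("/home/user")      # snapshot of /home/user
create_locks = locks_for("/home/user/foo")    # creation of /home/user/foo
print(conflicts(snapshot_locks, create_locks))  # {'/home/user'}
```

Two read locks on the same path never conflict, which is what lets several files be created in the same directory concurrently.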
19
3.2 Replica Placement

Purpose of Replica Placement
- Maximize data reliability and availability
- Maximize network bandwidth utilization

Spread Chunks across Racks
- A chunk stays available even when an entire rack goes offline, for example because of a power circuit problem
20
3.3 Creation, Re-replication, Rebalancing

Movements for Chunk Replicas
- Chunk creation
- Re-replication
- Load balancing (rebalancing)

Creation
- Place new chunks on chunk servers with below-average disk space utilization
- Spread replicas across racks

Re-replication
- Re-replicate a chunk when its number of available replicas falls below the specified level

Rebalancing
- The master periodically examines the replica distribution and moves replicas for better load balancing
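The two creation rules (below-average disk utilization, replicas spread across racks) can be combined in a simple selection routine. This is a sketch under assumptions: the tuple layout, the function name, and the greedy strategy are illustrative, and other factors a real master might weigh are ignored.

```python
def choose_servers(servers: list, replicas: int = 3) -> list:
    """Pick chunk servers for a new chunk.

    `servers` is a list of (name, rack, disk_utilization) tuples.
    Prefers below-average disk utilization and at most one replica per rack.
    May return fewer than `replicas` names if too few servers qualify.
    """
    average = sum(util for _, _, util in servers) / len(servers)
    candidates = sorted(
        (s for s in servers if s[2] <= average), key=lambda s: s[2]
    )
    chosen, used_racks = [], set()
    for name, rack, _ in candidates:
        if rack not in used_racks:
            chosen.append(name)
            used_racks.add(rack)
        if len(chosen) == replicas:
            break
    return chosen
```

For example, with two lightly loaded servers in the same rack, only one of them is picked and the next replica goes to a different rack.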
21
3.4 Garbage Collection

Garbage Collection in GFS
- Garbage collection + delete process

Garbage Collection Process

Delete Process
① A user deletes a file
② The master renames the file to a hidden name
③ During the master's regular scan, the hidden file is removed

Regular Garbage Collection
① The master receives a regular HeartBeat message listing a chunk server's chunks
② The master compares the list with its own metadata
③ The orphaned chunks are removed
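Both processes can be modeled in a few lines: deletion only hides a file, a later scan drops it, and orphaned chunks are whatever a chunk server reports that no file references. The `ToyNamespace` class is hypothetical, and the grace period before a hidden file is removed is an assumption of this sketch.

```python
class ToyNamespace:
    """Hypothetical master-side view: visible files plus hidden (deleted) files."""

    def __init__(self, grace_seconds: float):
        self.files = {}    # path -> set of chunk handles
        self.hidden = {}   # path -> (time of deletion, set of chunk handles)
        self.grace_seconds = grace_seconds

    def delete(self, path: str, now: float) -> None:
        # Deletion only hides the file; its chunks stay referenced for now.
        self.hidden[path] = (now, self.files.pop(path))

    def scan(self, now: float) -> None:
        # Regular scan: drop hidden files whose grace period has passed.
        for path, (deleted_at, _) in list(self.hidden.items()):
            if now - deleted_at >= self.grace_seconds:
                del self.hidden[path]

    def orphaned(self, reported: set) -> set:
        # Chunks a chunk server reports that no file references any more.
        live = set()
        for chunks in self.files.values():
            live |= chunks
        for _, chunks in self.hidden.values():
            live |= chunks
        return reported - live
```

Until the scan removes the hidden file, its chunks are still live, so an accidental delete could be undone by renaming the file back.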
22
3.5 Stale Replica Detection

Stale Data
- A replica becomes stale when its chunk server is down and misses mutations to the chunk

Chunk Version Number
- The master maintains a version number for each chunk
- The version number distinguishes up-to-date replicas from stale ones

Stale replicas are removed in regular garbage collection.
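Version-based staleness detection can be sketched as below. The assumption here, which the slide does not spell out, is that the master bumps the version when it grants a new lease and informs only the replicas that are currently reachable; the class and method names are hypothetical.

```python
class ChunkVersions:
    """Toy model of one chunk's version number on the master and its replicas."""

    def __init__(self, servers: list):
        self.master_version = 1
        self.replica_version = {s: 1 for s in servers}

    def grant_lease(self, live_servers: list) -> None:
        # Assumed rule: bump the version and inform only reachable replicas.
        self.master_version += 1
        for s in live_servers:
            self.replica_version[s] = self.master_version

    def stale(self) -> list:
        # A replica is stale if its version is behind the master's.
        return sorted(
            s for s, v in self.replica_version.items() if v < self.master_version
        )
```

A server that was down during `grant_lease` keeps its old version, so it is flagged as stale when it comes back and reports its chunks.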
23
IV. Fault Tolerance
4.1 High Availability
4.2 Data Integrity
24
4.1 High Availability

Fast Recovery
- Start-up takes only seconds

Chunk Replication
- The master clones replicas as needed

Master Replication
- The master state is replicated synchronously
- Shadow masters provide read-only access
- For simplicity, only one master process is in charge; restart is fast
25
4.2 Data Integrity

Checksumming to Detect Data Corruption
- Checksums are kept in memory as well as on disk
- When a read detects an error, the error is reported to the master
- The master re-replicates the chunk
- The requestor reads from another replica
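Checksum verification on the read path can be sketched with per-block CRC32 values. The 64 KB block size and the use of `zlib.crc32` are assumptions for this example; the slide states only that checksums are used.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # assumed checksum granularity


def block_checksums(data: bytes) -> list:
    """Compute one CRC32 per fixed-size block of the chunk data."""
    return [
        zlib.crc32(data[i : i + BLOCK_SIZE])
        for i in range(0, len(data), BLOCK_SIZE)
    ]


def corrupt_blocks(data: bytes, expected: list) -> list:
    """Return the indexes of blocks whose checksum no longer matches."""
    actual = block_checksums(data)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
```

If `corrupt_blocks` returns anything for the range being read, the chunk server would refuse the read and report the error, and the requestor would go to another replica.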
26
V. Conclusion
27
5 Conclusion
- Supports large-scale data processing workloads on commodity hardware
- Provides fault tolerance:
  - By constant monitoring
  - By replicating crucial data
  - By fast and automatic recovery
- Delivers high throughput to concurrent clients