Lecture 6 – Google File System (GFS) CSE 490h – Introduction to Distributed Computing, Winter 2008 Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.

Previous Classes Why Distribute? MapReduce  Why  Implementation Details of RPC Other Large Systems: Web Services But… how is all this data stored?

Distributed File Systems Trade Offs in Distributed File Systems  Performance  Scalability  Reliability  Availability How do you decide? Two Core Approaches  Super Computer?  Or many cheap computers?

Motivation Google went the cheap commodity route…  Lots of data on cheap machines! Yikes! Why not use an existing file system?  Unique problems  GFS is designed for Google apps and workloads  Google apps are designed for GFS

Assumptions Failures are the norm  Detect errors, recover, tolerate faults, etc  Software errors (we’re only human) Huge files  Multi-GB files are common  But there aren’t THAT many files

Assumptions (cont.) Mutations are typically appending new data  Random writes are rare  Once written, files are only read, and typically sequentially  Optimize for this! Large consecutive reads, small random reads Want high sustained bandwidth – low latency is not that important Google is designing apps AND file system

GFS Interface Supports usual commands  Create, delete, open, close, read, write Snapshot  Copies a file or a directory tree Record Append  Allows multiple concurrent appends to same file

GFS Architecture Single master Multiple chunkservers …Can anyone see a potential weakness in this design?

Architecture Files divided into fixed-size chunks (64 MB)  Each chunk gets a chunk handle from the master  Stored as Linux files One Master  Maintains all file system metadata  Talks to each chunkserver periodically Multiple Chunkservers  Store chunks on local disks  No caching of chunks. (Why not?) Multiple Clients  Clients talk to master for metadata operations  Read / write data from chunkservers
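
Because chunks have a fixed size, a client can work out which chunk holds a given byte offset before it asks the master for that chunk's handle and replica locations. A minimal sketch of that arithmetic, assuming only the 64 MB chunk size from the slide; the function names are hypothetical and not part of any GFS API:

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB fixed chunk size, as stated on the slide


def chunk_index(byte_offset: int) -> int:
    """Index of the chunk that holds the given byte offset within a file."""
    return byte_offset // CHUNK_SIZE


def chunks_for_range(offset: int, length: int) -> range:
    """All chunk indices touched by a read of `length` bytes starting at `offset`."""
    if length <= 0:
        return range(0)
    first = chunk_index(offset)
    last = chunk_index(offset + length - 1)
    return range(first, last + 1)
```

A read that straddles a chunk boundary touches two chunks, so the client would need metadata for both.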

Single master From distributed systems we know this is a:  Single point of failure  Scalability bottleneck GFS solutions:  Shadow masters  Minimize master involvement: never move data through it (use it only for metadata); cache metadata at clients; use a large chunk size (64 MB); master delegates authority to primary replicas in data mutations (chunk leases) Simple, and good enough!

Master’s responsibilities (1/2) Metadata storage Namespace management/locking Periodic communication with chunkservers  give instructions, collect state, track cluster health Garbage Collection

Master’s responsibilities (2/2) Chunk creation  Place new replicas on chunkservers with below-average disk-space utilization  Limit number of recent creations on each chunkserver  Spread replicas across racks Re-replicate when the number of replicas falls below the user-specified goal  How do you assign priorities? Periodic rebalancing  Better disk space usage  Load balancing
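
The three placement rules for chunk creation can be combined into one selection routine. The following is a sketch under stated assumptions, not the master's actual algorithm: the `Chunkserver` fields, the `max_recent_creations` threshold, and the tie-breaking order are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunkserver:
    name: str
    rack: str
    disk_utilization: float  # fraction of disk in use, 0.0 to 1.0
    recent_creations: int    # chunks created on this server recently


def place_replicas(servers, num_replicas=3, max_recent_creations=5):
    """Pick servers for new replicas (illustrative policy only)."""
    # Rule 2: skip servers that already received many recent creations.
    eligible = [s for s in servers if s.recent_creations < max_recent_creations]
    if not eligible:
        return []
    average = sum(s.disk_utilization for s in servers) / len(servers)
    # Rule 1: below-average utilization first, then the emptiest disks.
    ranked = sorted(
        eligible, key=lambda s: (s.disk_utilization > average, s.disk_utilization)
    )
    chosen, used_racks = [], set()
    # Rule 3: first pass takes at most one server per rack.
    for s in ranked:
        if len(chosen) == num_replicas:
            break
        if s.rack not in used_racks:
            chosen.append(s)
            used_racks.add(s.rack)
    # Second pass fills remaining slots if there are fewer racks than replicas.
    for s in ranked:
        if len(chosen) == num_replicas:
            break
        if s not in chosen:
            chosen.append(s)
    return chosen
```

Rack diversity is given priority over disk utilization here; that ordering is a design choice of this sketch, made so that a single rack failure cannot take out every replica.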

Metadata (1/2) Global metadata is stored on the master  File and chunk namespaces  Mapping from files to chunks  Locations of each chunk’s replicas All in memory (64 bytes / chunk)  Fast  Easily accessible  Any problems?
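
The "64 bytes / chunk" figure makes it easy to estimate how much master memory the chunk metadata needs. A back-of-the-envelope sketch, assuming exactly 64 bytes per 64 MB chunk and ignoring the file namespace itself; the function name is hypothetical:

```python
CHUNK_SIZE = 64 * 1024**2   # 64 MB per chunk
METADATA_PER_CHUNK = 64     # bytes of master memory per chunk, from the slide


def master_memory_bytes(total_data_bytes: int) -> int:
    """Estimated chunk-metadata footprint on the master for the given data volume."""
    num_chunks = -(-total_data_bytes // CHUNK_SIZE)  # ceiling division
    return num_chunks * METADATA_PER_CHUNK
```

For 1 PiB of data this gives 2^24 chunks and 1 GiB of metadata, which hints at one answer to "Any problems?": the capacity of the whole system is bounded by the master's memory.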

Metadata (2/2) Master has an operation log for persistent logging of critical metadata updates  persistent on local disk  replicated  checkpoints for faster recovery
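
The interplay of operation log and checkpoints can be sketched as follows. This is a toy in-memory model, assuming a list stands in for the replicated on-disk log and a JSON string for the checkpoint; the class, method, and operation names are hypothetical.

```python
import json


class MasterMetadata:
    def __init__(self):
        self.files = {}         # path -> list of chunk handles
        self.log = []           # stand-in for the persistent, replicated log
        self.checkpoint = None  # (JSON snapshot, number of log records it covers)

    def _apply(self, record):
        op, path, arg = record
        if op == "create":
            self.files[path] = []
        elif op == "add_chunk":
            self.files[path].append(arg)
        elif op == "delete":
            self.files.pop(path, None)

    def mutate(self, op, path, arg=None):
        record = (op, path, arg)
        self.log.append(record)  # log the update first, then apply it
        self._apply(record)

    def take_checkpoint(self):
        self.checkpoint = (json.dumps(self.files), len(self.log))

    def recover(self):
        """Rebuild state from the latest checkpoint plus the log tail."""
        recovered = MasterMetadata()
        recovered.log = list(self.log)
        recovered.checkpoint = self.checkpoint
        start = 0
        if self.checkpoint is not None:
            snapshot, start = self.checkpoint
            recovered.files = json.loads(snapshot)
        for record in self.log[start:]:
            recovered._apply(record)
        return recovered
```

Recovery only replays the records written after the checkpoint, which is why checkpoints shorten restart time.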

Garbage Collection First… how would you do it?

Garbage Collection How to…  Master logs deletion immediately, and renames to hidden file  Lazily garbage collects hidden files via re-scans Also identifies orphaned chunks Why?  Simple and reliable when failures are common  Merges storage reclamation into background activities  Delay is safety net against accidental deletion
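
The rename-then-scan scheme can be modelled in a few lines. This is a simplified sketch, not the master's implementation: the hidden-name format, the `grace_period` parameter, and the integer timestamps are assumptions made for illustration.

```python
class Namespace:
    def __init__(self, grace_period):
        self.grace_period = grace_period
        self.files = {}   # visible path -> list of chunk handles
        self.hidden = {}  # hidden name -> (deletion time, chunk handles)

    def delete(self, path, now):
        """Deletion is only a rename to a hidden name; no storage is freed yet."""
        chunks = self.files.pop(path)
        self.hidden[f".deleted{path}@{now}"] = (now, chunks)

    def undelete(self, hidden_name, path):
        """The safety net: a hidden file can be renamed back before it is reclaimed."""
        _, chunks = self.hidden.pop(hidden_name)
        self.files[path] = chunks

    def scan(self, now):
        """Background scan: drop expired hidden files and return their chunks."""
        expired = [
            name for name, (deleted_at, _) in self.hidden.items()
            if now - deleted_at >= self.grace_period
        ]
        reclaimed = []
        for name in expired:
            reclaimed.extend(self.hidden.pop(name)[1])
        return reclaimed

    def orphaned(self, reported_handles):
        """Chunks a chunkserver reports that no file (visible or hidden) references."""
        live = {h for chunks in self.files.values() for h in chunks}
        live |= {h for _, chunks in self.hidden.values() for h in chunks}
        return set(reported_handles) - live
```

Once a hidden file has been reclaimed by `scan`, its chunks show up as orphans the next time a chunkserver reports them, so the same mechanism cleans up both deleted files and stray replicas.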

Mutations Mutation = write or append  must be done for all replicas Goal: minimize master involvement Lease mechanism:  master picks one replica as primary; gives it a “lease” for mutations  primary defines a serial order of mutations  all replicas follow this order Data flow decoupled from control flow
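
The lease idea can be illustrated with a primary that assigns consecutive serial numbers and forwards each mutation to the secondaries in that order. A minimal sketch, assuming in-process objects instead of network calls; the class names and the numeric `lease_expiry` are hypothetical, and lease renewal and failure handling are omitted.

```python
class Replica:
    def __init__(self):
        self.data = []
        self.applied = 0  # serial number this replica expects next

    def apply(self, serial, mutation):
        # Replicas refuse out-of-order mutations, so all end up with the same order.
        if serial != self.applied:
            raise ValueError(f"expected serial {self.applied}, got {serial}")
        self.data.append(mutation)
        self.applied += 1


class Primary(Replica):
    def __init__(self, secondaries, lease_expiry):
        super().__init__()
        self.secondaries = secondaries
        self.lease_expiry = lease_expiry  # time after which the lease is invalid

    def mutate(self, mutation, now):
        if now >= self.lease_expiry:
            raise RuntimeError("lease expired; client must ask the master again")
        serial = self.applied          # the primary alone picks the order
        self.apply(serial, mutation)
        for secondary in self.secondaries:
            secondary.apply(serial, mutation)
        return serial
```

The master is not involved in `mutate` at all; it only granted the lease, which is how this mechanism keeps the master off the critical path.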

Atomic record append Client specifies data GFS appends it to the file atomically at least once  GFS picks the offset  works for concurrent appends Used heavily by Google apps  e.g., for files that serve as multiple-producer/single-consumer queues
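
"At least once" means a retried append can leave the same record in the file more than once. The toy model below shows how that happens when an append is applied but its acknowledgement is lost. It is a deliberate simplification: offsets are record indices rather than byte offsets, there is a single copy instead of replicas, and all names are hypothetical.

```python
class AppendOnlyFile:
    def __init__(self):
        self.records = []

    def record_append(self, data, fail_ack=False):
        offset = len(self.records)  # the file system picks the offset, not the client
        self.records.append(data)
        if fail_ack:
            # The data was written, but the client never learns about it.
            raise OSError("append applied but acknowledgement lost")
        return offset


def append_at_least_once(f, data, failures=0):
    """Retry until an append is acknowledged; `failures` simulates lost acks."""
    attempt = 0
    while True:
        try:
            return f.record_append(data, fail_ack=attempt < failures)
        except OSError:
            attempt += 1
```

With one simulated failure the record ends up in the file twice, and the client is told the offset of the second copy.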

Relaxed consistency model (1/2) “Consistent” = all replicas have the same value “Defined” = consistent, and replicas reflect the mutation in its entirety Some properties:  concurrent successful writes leave the region consistent, but possibly undefined  failed writes leave the region inconsistent Some work has moved into the applications:  e.g., self-validating, self-identifying records
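
One way an application might implement self-validating, self-identifying records is to prefix each record with a checksum and a unique ID, so a reader can drop damaged regions and duplicates. The record layout below (CRC32, 8-byte ID, 4-byte length) is an assumption chosen for this sketch, not a format used by Google.

```python
import struct
import zlib

HEADER_SIZE = 16  # 4-byte CRC32 + 8-byte record ID + 4-byte payload length


def encode_record(record_id: int, payload: bytes) -> bytes:
    body = struct.pack(">QI", record_id, len(payload)) + payload
    return struct.pack(">I", zlib.crc32(body)) + body


def decode_valid_unique(blobs):
    """Return (id, payload) pairs, skipping corrupt records and duplicate IDs."""
    seen, out = set(), []
    for blob in blobs:
        if len(blob) < HEADER_SIZE:
            continue  # too short to be a record, e.g. padding
        (crc,) = struct.unpack(">I", blob[:4])
        body = blob[4:]
        if zlib.crc32(body) != crc:
            continue  # self-validating: checksum mismatch means a bad region
        record_id, length = struct.unpack(">QI", body[:12])
        payload = body[12:]
        if len(payload) != length or record_id in seen:
            continue  # self-identifying: the ID exposes duplicates
        seen.add(record_id)
        out.append((record_id, payload))
    return out
```

The checksum handles inconsistent or padded regions, while the ID handles records that were appended more than once.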

Relaxed consistency model (2/2) Simple, efficient  Google apps can live with it  what about other apps? Namespace updates atomic and serializable

Fault Tolerance High availability  fast recovery: master and chunkservers restartable in a few seconds  chunk replication: default 3 replicas  shadow masters Data integrity  checksum every 64 KB block in each chunk
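
Per-block checksums let a chunkserver pinpoint which part of a chunk is damaged instead of discarding the whole chunk. A minimal sketch, assuming CRC32 from Python's `zlib` as the checksum function (the slide does not name one) and hypothetical function names:

```python
import zlib

BLOCK_SIZE = 64 * 1024  # 64 KB checksum blocks, as on the slide


def block_checksums(chunk: bytes):
    """One checksum per 64 KB block of the chunk."""
    return [
        zlib.crc32(chunk[i:i + BLOCK_SIZE])
        for i in range(0, len(chunk), BLOCK_SIZE)
    ]


def corrupted_blocks(chunk: bytes, expected):
    """Indices of blocks whose current checksum differs from the stored one."""
    actual = block_checksums(chunk)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
```

On a mismatch, a reader could be pointed at another replica while the damaged one is replaced from a good copy.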

Deployment in Google 50+ GFS clusters Each with thousands of storage nodes Managing petabytes of data GFS is under BigTable, etc.

Conclusion GFS demonstrates how to support large-scale processing workloads on commodity hardware  design to tolerate frequent component failures  optimize for huge files that are mostly appended and read  feel free to relax and extend FS interface as required  go for simple solutions (e.g., single master) GFS has met Google’s storage needs… it must be good!

Discussion Is GFS useful as a general-purpose commercial product?
