CSE 451: Operating Systems Autumn Module 22 Distributed File Systems

Slides:



Advertisements
Similar presentations
G O O G L E F I L E S Y S T E M 陳 仕融 黃 振凱 林 佑恩 Z 1.
Advertisements

U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Emery Berger University of Massachusetts Amherst Operating Systems CMPSCI 377 Lecture.
1DT057 DISTRIBUTED INFORMATION SYSTEM DISTRIBUTED FILE SYSTEM 1.
Lecture 6 – Google File System (GFS) CSE 490h – Introduction to Distributed Computing, Winter 2008 Except as otherwise noted, the content of this presentation.
File System Implementation
The Google File System. Why? Google has lots of data –Cannot fit in traditional file system –Spans hundreds (thousands) of servers connected to (tens.
Other File Systems: LFS and NFS. 2 Log-Structured File Systems The trend: CPUs are faster, RAM & caches are bigger –So, a lot of reads do not require.
The Google File System.
NFS. The Sun Network File System (NFS) An implementation and a specification of a software system for accessing remote files across LANs. The implementation.
University of Pennsylvania 11/21/00CSE 3801 Distributed File Systems CSE 380 Lecture Note 14 Insup Lee.
CSE 451: Operating Systems Spring 2012 Module 23 Distributed File Systems Ed Lazowska Allen Center 570.
Case Study - GFS.
Distributed File Systems Concepts & Overview. Goals and Criteria Goal: present to a user a coherent, efficient, and manageable system for long-term data.
1 The Google File System Reporter: You-Wei Zhang.
CSC 456 Operating Systems Seminar Presentation (11/13/2012) Leon Weingard, Liang Xin The Google File System.
Distributed File Systems 1 CS502 Spring 2006 Distributed Files Systems CS-502 Operating Systems Spring 2006.
Distributed Systems. Interprocess Communication (IPC) Processes are either independent or cooperating – Threads provide a gray area – Cooperating processes.
Distributed File Systems
Introduction to DFS. Distributed File Systems A file system whose clients, servers and storage devices are dispersed among the machines of a distributed.
MapReduce and GFS. Introduction r To understand Google’s file system let us look at the sort of processing that needs to be done r We will look at MapReduce.
Eduardo Gutarra Velez. Outline Distributed Filesystems Motivation Google Filesystem Architecture The Metadata Consistency Model File Mutation.
Presented By: Samreen Tahir Coda is a network file system and a descendent of the Andrew File System 2. It was designed to be: Highly Highly secure Available.
ITEC 502 컴퓨터 시스템 및 실습 Chapter 11-2: File System Implementation Mi-Jung Choi DPNM Lab. Dept. of CSE, POSTECH.
GFS. Google r Servers are a mix of commodity machines and machines specifically designed for Google m Not necessarily the fastest m Purchases are based.
Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition File System Implementation.
Distributed File Systems Architecture – 11.1 Processes – 11.2 Communication – 11.3 Naming – 11.4.
Eduardo Gutarra Velez. Outline Distributed Filesystems Motivation Google Filesystem Architecture Chunkservers Master Consistency Model File Mutation Garbage.
Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition, Lecture 24: GFS.
Chapter 12: File System Implementation
Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung
CREATED BY: JEAN LOIZIN CLASS: CS 345 DATE: 12/05/2016
Chapter 11: File System Implementation
Google File System.
File System Implementation
GFS.
Google Filesystem Some slides taken from Alan Sussman.
Google File System CSE 454 From paper by Ghemawat, Gobioff & Leung.
Chapter 12: File System Implementation
The Google File System Sanjay Ghemawat, Howard Gobioff and Shun-Tak Leung Google Presented by Jiamin Huang EECS 582 – W16.
NFS and AFS Adapted from slides by Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift.
IT344 – Operating Systems Winter 2011, Dale Rowe.
The Google File System (GFS)
Today: Coda, xFS Case Study: Coda File System
CSE 451: Operating Systems Winter Module 22 Distributed File Systems
CSE 451: Operating Systems Winter 2007 Module 20 Remote Procedure Call (RPC) Ed Lazowska Allen Center
The Google File System (GFS)
CSE 451: Operating Systems Autumn Module 22 Distributed File Systems
Distributed File Systems
DISTRIBUTED FILE SYSTEMS
Distributed File Systems
The Google File System (GFS)
The Google File System (GFS)
CSE 451: Operating Systems Spring Module 21 Distributed File Systems
Distributed File Systems
Distributed File Systems
CSE 451: Operating Systems Autumn 2009 Module 21 Remote Procedure Call (RPC) Ed Lazowska Allen Center
CSE 451: Operating Systems Winter Module 22 Distributed File Systems
CSE 451: Operating Systems Distributed File Systems
The Google File System (GFS)
Chapter 15: File System Internals
Today: Distributed File Systems
DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S
THE GOOGLE FILE SYSTEM.
by Mikael Bjerga & Arne Lange
Distributed File Systems
CSE 451: Operating Systems Autumn 2010 Module 21 Remote Procedure Call (RPC) Ed Lazowska Allen Center
Distributed File Systems
The Google File System (GFS)
Presentation transcript:

CSE 451: Operating Systems Autumn 2010 Module 22 Distributed File Systems Ed Lazowska lazowska@cs.washington.edu Allen Center 570 1

Distributed File Systems A distributed file systems supports network-wide sharing of files and devices A DFS typically presents clients with a “traditional” file system view there is a single file system namespace that all clients see a client can observe the side-effects of other clients’ file system activities: sharing is possible in many (but not all) ways, an ideal distributed file system provides clients with the illusion of a shared, local file system But …with a distributed implementation read blocks / files from remote machines across a network, instead of from a local disk 5/7/2019 5/7/2019 © 2010 Gribble, Lazowska, Levy, Zahorjan 2

© 2010 Gribble, Lazowska, Levy, Zahorjan DFS issues What is the basic abstraction a remote file system? open, close, read, write, … a remote disk? read block, write block Naming how are files named? are those names location transparent? is the file location visible to the user? are those names location independent? do the names change if the file moves? do the names change if the user moves? 5/7/2019 © 2010 Gribble, Lazowska, Levy, Zahorjan

© 2010 Gribble, Lazowska, Levy, Zahorjan Caching caching exists for performance reasons where are file blocks cached? on the file server? on the client machine? both? Sharing and coherency what are the semantics of sharing? what happens when a cached block/file is modified? how does a node know when its cached blocks are stale? if we cache on the client side, we’re presumably caching on multiple client machines if a file is being shared 5/7/2019 © 2010 Gribble, Lazowska, Levy, Zahorjan

© 2010 Gribble, Lazowska, Levy, Zahorjan Replication replication can exist for performance and/or availability can there be multiple copies of a file in the network? if multiple copies, how are updates handled? what if there’s a network partition? Can clients work on separate copies? If so, how does reconciliation take place? Performance what is the performance of remote operations? what is the additional cost of file sharing? how does the system scale as the number of clients grows? what are the performance bottlenecks: network, CPU, disks, protocols, data copying? 5/7/2019 © 2010 Gribble, Lazowska, Levy, Zahorjan

Reminder: Single-system Unix file sharing User 1 User 2 User 3 channel table channel table channel table open file table file offset file offset memory-resident i-node table disk file buffer cache 5/7/2019 © 2010 Gribble, Lazowska, Levy, Zahorjan

Example: Sun’s Network File System (NFS) The Sun Network File System (NFS) has become a common standard for distributed UNIX file access NFS runs over LANs (even over WANs – slowly) Basic idea allow a remote directory to be “mounted” (spliced) onto a local directory Gives access to that remote directory and all its descendants as if they were part of the local hierarchy Pretty similar (except for implementation and performance) to a “local mount” or “link” on UNIX I might link /cse/www/education/courses/451/10au/ as /u4/lazowska/451 to allow easy access to my web data from my home directory cd ln –s /cse/www/education/courses/451/10au 451 5/7/2019 © 2010 Gribble, Lazowska, Levy, Zahorjan

© 2010 Gribble, Lazowska, Levy, Zahorjan NFS particulars ginger.cs exports the directory ginger.cs:/u4/lazowska norton.cs mounts this on /faculty/edl programs on norton.cs can access the remote directory ginger.cs:/u4/lazowska using the local path /faculty/edl if, on ginger.cs, I had a file /u4/lazowska/myfile.txt programs on norton.cs could access it as /faculty/edl/myfile.txt note that different clients might mount the same exported directory, but on different local paths e.g., forkbomb.cs might mount it on /facultyfiles/edlazowska then, the file ginger.cs:/u4/lazowska/myfile.txt could be accessed with three different names on ginger.cs: /u4/lazowska/myfile.txt on norton.cs: /faculty/edl/myfile.txt on forkbomb.cs: /facultyfiles/edlazowska/myfile.txt 5/7/2019 5/7/2019 © 2010 Gribble, Lazowska, Levy, Zahorjan 8

© 2010 Gribble, Lazowska, Levy, Zahorjan NFS implementation NFS defines a set of RPC operations for remote file access: searching a directory reading directory entries manipulating links and directories reading/writing files Every node may be a client, a server, or both E.g., a given machine might export some directories and import others 5/7/2019 © 2010 Gribble, Lazowska, Levy, Zahorjan

NFS defines new layers in the Unix file system The virtual file system (VFS) provides a standard interface, using v-nodes as file handles. A v-node describes either a local or remote file. System Call Interface Virtual File System UFS NFS RPCs to other (server) nodes (local files) (remote files) RPC requests from remote clients, and server responses buffer cache / i-node table 5/7/2019 © 2010 Gribble, Lazowska, Levy, Zahorjan

© 2010 Gribble, Lazowska, Levy, Zahorjan NFS caching / sharing On a file open, the client asks the server whether the client’s cached blocks are up to date. (good!) but, once a file is open, different clients can perform concurrent reads and writes to it and get inconsistent data (bad!) Modified data is flushed back to the server every 30 seconds the good news is this bounds the amount of inconsistency to a window of 30 seconds, and that this is simple to implement and understand the bad news is that the inconsistency can be severe e.g., data can be lost, different clients can see inconsistent states of the files at the same time 5/7/2019 5/7/2019 © 2010 Gribble, Lazowska, Levy, Zahorjan 11

Example: CMU’s Andrew File System (AFS) Developed at CMU to support all of its student computing Consists of workstation clients and dedicated file server machines (differs from NFS) Workstations have local disks, used to cache files being used locally (originally whole files, subsequently 64K file chunks) (differs from NFS) Andrew has a single name space – your files have the same names everywhere in the world (differs from NFS) Andrew is good for distant operation because of its local disk caching: after a slow startup, most accesses are to local disk 5/7/2019 © 2010 Gribble, Lazowska, Levy, Zahorjan

© 2010 Gribble, Lazowska, Levy, Zahorjan AFS caching/sharing Need for scaling required reduction of client-server message traffic Once a file is cached, all operations are performed locally On close, if the file has been modified, it is replaced on the server The client assumes that its cache is up to date, unless it receives a callback message from the server saying otherwise on file open, if the client has received a callback on the file, it must fetch a new copy; otherwise it uses its locally-cached copy (differs from NFS) 5/7/2019 © 2010 Gribble, Lazowska, Levy, Zahorjan

Example: Berkeley Sprite File System Unix file system developed for diskless workstations with large memories (differs from NFS, AFS) Considers memory as a huge cache of disk blocks memory is shared between file system and VM Files are permanently stored on servers servers have a large memory that acts as a cache as well Several workstations can cache blocks for read-only files If a file is being written by more than 1 machine, client caching is turned off – all requests go to the server (differs from NFS, AFS) 5/7/2019 © 2010 Gribble, Lazowska, Levy, Zahorjan

Example: Google’s File System (GFS) NFS, etc. GFS Independence Small Scale Variety of workloads Cooperation Large scale Very specific, well-understood workloads 5/7/2019 5/7/2019 © 2010 Gribble, Lazowska, Levy, Zahorjan 15

© 2010 Gribble, Lazowska, Levy, Zahorjan GFS: Environment Why did Google build its own file system? Google has unique FS requirements huge volume of data huge read/write bandwidth reliability over tens of thousands of nodes with frequent failures mostly operating on large data blocks needs efficient distributed operations Google has somewhat of an unfair advantage…it has control over, and customizes, its: applications libraries operating system networks even its computers! 5/7/2019 5/7/2019 © 2010 Gribble, Lazowska, Levy, Zahorjan 16

© 2010 Gribble, Lazowska, Levy, Zahorjan GFS: Files Files are huge by traditional standards (GB, TB, PB) Most files are mutated by appending new data rather than overwriting existing data Once written, the files are only read, and often only sequentially. Appending becomes the focus of performance optimization and atomicity guarantees NOTE: A major use of GFS is for storing event logs – what did you search for, which link did you follow, etc. Then these logs are mined for patterns. Hence huge, append-only, read sequentially. 5/7/2019 5/7/2019 © 2010 Gribble, Lazowska, Levy, Zahorjan 17

© 2010 Gribble, Lazowska, Levy, Zahorjan GFS: Architecture A GFS cluster consists of a replicated master and multiple chunk servers and is accessed by multiple clients Each computer in the GFS cluster is typically a commodity Linux machine running a user-level server process Files are divided into fixed-size chunks identified by an immutable and globally unique 64-bit chunk handle For reliability, each chunk is replicated on multiple chunk servers The master maintains all file system metadata (like, on which chunk servers specific chunks are stored) The master periodically communicates with each chunk server in HeartBeat messages to determine its state Clients communicate with the master (to access metadata (e.g., to find the location of specific chunks)) and directly with chunk servers (to actually access the data) Neither clients nor chunk servers cache file data, eliminating cache coherence issues Clients do cache metadata, however If the master croaks, Paxos is used to select a new master from among the replicas 5/7/2019 © 2010 Gribble, Lazowska, Levy, Zahorjan 18

© 2010 Gribble, Lazowska, Levy, Zahorjan GFS: Architecture GFS Master Client Masters Replicas GFS Master Client Client C0 C1 C1 C0 C5 … C5 C2 C5 C3 C2 Chunkserver 1 Chunkserver 2 Chunkserver N Masters manage metadata (naming, chunk location, etc.) Data transfers happen directly between clients/chunkservers Files are broken into chunks (typically 64 MB) each chunk replicated on 3 chunkservers Clients do not cache data! 5/7/2019 5/7/2019 © 2010 Gribble, Lazowska, Levy, Zahorjan 19

© 2010 Gribble, Lazowska, Levy, Zahorjan GFS: Reading Single master vastly simplifies design Clients never read and write file data through the master. Instead, a client asks the master which chunk servers it should contact Using the fixed chunk size, the client translates the file name and byte offset specified by the application into a chunk index within the file It sends the master a request containing the file name and chunk index. The master replies with the corresponding chunk handle and locations of the replicas. The client caches this information using the file name and chunk index as the key The client then sends a request to one of the replicas, most likely the closest one. The request specifies the chunk handle and a byte range within that chunk 5/7/2019 © 2010 Gribble, Lazowska, Levy, Zahorjan 20

© 2010 Gribble, Lazowska, Levy, Zahorjan GFS: Writing Client asks master for identity of primary and secondary replicas (chunk servers) Client pushes data to memory at all replicas via a replica-to-replica “chain” Client sends write request to primary Primary orders concurrent requests, and triggers disk writes at all replicas Primary reports success or failure to client The write is transactional 5/7/2019 © 2010 Gribble, Lazowska, Levy, Zahorjan

Summary of Distributed File Systems There are a number of issues to deal with: what is the basic abstraction? naming caching sharing and coherency replication performance workload No right answer! Different systems make different tradeoffs… 5/7/2019 5/7/2019 © 2010 Gribble, Lazowska, Levy, Zahorjan 22

© 2010 Gribble, Lazowska, Levy, Zahorjan Performance is always an issue always a tradeoff between performance and the semantics of file operations (e.g., for shared files). Caching of file blocks is crucial in any file system maintaining coherency is a crucial design issue. Newer systems are dealing with issues such as disconnected operation for mobile computers, and huge workloads (e.g., Google) 5/7/2019 © 2010 Gribble, Lazowska, Levy, Zahorjan

© 2010 Gribble, Lazowska, Levy, Zahorjan Think about … NFS, AFS, Sprite, GFS How do they differ? What are the key properties of each? What about the intended environment for each drove these differences? 5/7/2019 © 2010 Gribble, Lazowska, Levy, Zahorjan