Distributed File Systems

Slides:



Advertisements
Similar presentations
CS-550: Distributed File Systems [SiS]1 Resource Management in Distributed Systems: Distributed File Systems.
Advertisements

U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Emery Berger University of Massachusetts Amherst Operating Systems CMPSCI 377 Lecture.
Other File Systems: AFS, Napster. 2 Recap NFS: –Server exposes one or more directories Client accesses them by mounting the directories –Stateless server.
NFS. The Sun Network File System (NFS) An implementation and a specification of a software system for accessing remote files across LANs. The implementation.
University of Pennsylvania 11/21/00CSE 3801 Distributed File Systems CSE 380 Lecture Note 14 Insup Lee.
Google File System.
Distributed File SystemsCS-4513, D-Term Distributed File Systems CS-4513 Distributed Computing Systems (Slides include materials from Operating System.
NETWORK FILE SYSTEM (NFS) By Ameeta.Jakate. NFS NFS was introduced in 1985 as a means of providing transparent access to remote file systems. NFS Architecture.
Case Study - GFS.
File Systems (2). Readings r Silbershatz et al: 11.8.
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google∗
Distributed File Systems Concepts & Overview. Goals and Criteria Goal: present to a user a coherent, efficient, and manageable system for long-term data.
CSE 486/586, Spring 2012 CSE 486/586 Distributed Systems Distributed File Systems Steve Ko Computer Sciences and Engineering University at Buffalo.
1 The Google File System Reporter: You-Wei Zhang.
CSC 456 Operating Systems Seminar Presentation (11/13/2012) Leon Weingard, Liang Xin The Google File System.
Distributed File Systems 1 CS502 Spring 2006 Distributed Files Systems CS-502 Operating Systems Spring 2006.
Networked File System CS Introduction to Operating Systems.
Distributed Systems. Interprocess Communication (IPC) Processes are either independent or cooperating – Threads provide a gray area – Cooperating processes.
Distributed File Systems
Distributed File Systems Overview  A file system is an abstract data type – an abstraction of a storage device.  A distributed file system is available.
What is a Distributed File System?? Allows transparent access to remote files over a network. Examples: Network File System (NFS) by Sun Microsystems.
Introduction to DFS. Distributed File Systems A file system whose clients, servers and storage devices are dispersed among the machines of a distributed.
NFS : Network File System SMU CSE8343 Prof. Khalil September 27, 2003 Group 1 Group members: Payal Patel, Malka Samata, Wael Faheem, Hazem Morsy, Poramate.
Distributed File SystemsCS-502 Fall Distributed File Systems CS-502 Operating Systems (Slides include materials from Operating System Concepts, 7.
Eduardo Gutarra Velez. Outline Distributed Filesystems Motivation Google Filesystem Architecture The Metadata Consistency Model File Mutation.
Computer Science Lecture 19, page 1 CS677: Distributed OS Last Class: Fault tolerance Reliable communication –One-one communication –One-many communication.
Distributed File SystemsCS-502 (EMC) Fall Distributed File Systems (and related topics) CS-502, Operating Systems Fall 2009 (EMC) (Slides include.
GLOBAL EDGE SOFTWERE LTD1 R EMOTE F ILE S HARING - Ardhanareesh Aradhyamath.
EE324 INTRO TO DISTRIBUTED SYSTEMS. Distributed File System  What is a file system?
Distributed File Systems Group A5 Amit Sharma Dhaval Sanghvi Ali Abbas.
Distributed File Systems Questions answered in this lecture: Why are distributed file systems useful? What is difficult about distributed file systems?
Distributed File System. Outline Basic Concepts Current project Hadoop Distributed File System Future work Reference.
Chapter Five Distributed file systems. 2 Contents Distributed file system design Distributed file system implementation Trends in distributed file systems.
Distributed Systems: Distributed File Systems Ghada Ahmed, PhD. Assistant Prof., Computer Science Dept. Web:
Computer Science Lecture 19, page 1 CS677: Distributed OS Last Class: Fault tolerance Reliable communication –One-one communication –One-many communication.
DISTRIBUTED FILE SYSTEM- ENHANCEMENT AND FURTHER DEVELOPMENT BY:- PALLAWI(10BIT0033)
Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung
Chapter 17: Distributed-File Systems
Distributed File Systems
Google File System.
Andrew File System (AFS)
File System Implementation
Nache: Design and Implementation of a Caching Proxy for NFSv4
Nache: Design and Implementation of a Caching Proxy for NFSv4
NFS and AFS Adapted from slides by Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift.
Chapter 17: Distributed-File Systems
Outline Distributed File Systems – continued 11/27/2018 COP5611.
Today: Coda, xFS Case Study: Coda File System
CSE 451: Operating Systems Winter Module 22 Distributed File Systems
Distributed File Systems
DISTRIBUTED FILE SYSTEMS
Distributed File Systems
Outline Announcements Lab2 Distributed File Systems 1/17/2019 COP5611.
CSE 451: Operating Systems Spring Module 21 Distributed File Systems
DESIGN AND IMPLEMENTATION OF THE SUN NETWORK FILESYSTEM
Distributed File Systems
CSE 451: Operating Systems Winter Module 22 Distributed File Systems
Chapter 15: File System Internals
Today: Distributed File Systems
DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S
Outline Review of Quiz #1 Distributed File Systems 4/20/2019 COP5611.
THE GOOGLE FILE SYSTEM.
Distributed File Systems
Database System Architectures
Distributed File Systems
Chapter 17: Distributed-File Systems
Distributed File Systems
Network File System (NFS)
M05 DISTRIBUTED FILE SYSTEM
Presentation transcript:

Distributed File Systems CS60002: Distributed Systems Antonio Bruto da Costa Ph.D. Student, Formal Methods Lab, Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

What is a Distributed File System DFS: distributed implementation of a file system Files, file servers, and users are all dispersed around the network Main Goal: DFS should look like a centralized file system to a user Ability to open and update any file on any machine on the network Ability to share files, the same as shared local files DESIGN GOALS Network Transparency High Availability Scalability Concurrency INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Design Goals Network transparency Same set of file operations for both local and remote files Uniform naming scheme for local and remote files Similar access time for local and remote files High availability Files should not become unavailable for “small” failures Scalability No. of users, file servers, files handled etc. Concurrency Handle concurrent access by different clients in the network INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

DFS Architecture Client Server Architecture Network File System (NFS) Cluster Based Architecture Google File System INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Design Issues : Naming Schemes Files named by combination of host name and local name Not location transparent Single global namespace spanning all files Location transparent All systems see the same directory structure A server crash affects all machines Example: Andrew File System (AFS) INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

DNS Names Mount remote directories to local directories Location transparent Directory structure seen by different machines possibly different Mount only when needed, what is needed Mounted remote directories can be accessed transparently Example: Network File System (NFS) INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Mounting Remote Directories (An Example: NFS) INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Pathname Translation Consider /remote/vu/mbox in A /remote/vu looked up in A the usual way Low level entry for vu contains information that it is actually /users/steen in server Request sent to server to look up /users/steen/mbox What if the name crosses multiple machine boundaries? INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Caching to Improve Performance Reduce network traffic by retaining recently accessed disk blocks in local cache Repeated accesses to the same information can be handled locally from the cached copy If data accessed not in cache, copy of data brought from the server to the local cache Copies of parts of file may be scattered in different caches. Cache-consistency problem – keeping the cached copies consistent with the master file in the presence of write operations INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Design Issue: Caching Cache location Granularity of cached data Update policy Cache consistency INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Cache Location In client memory Faster access Good when local usage is transient Enables diskless workstations On client disk Good when local usage dominates Can cache larger files Helps protect clients from server crashes INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Granularity of Cached Data Whole file Easier to maintain consistency High transfer time, specially if only a small part of the file is accessed One or more blocks Low transfer time, transfer as and when needed More complex consistency issues In general, increasing granularity means higher chance of finding next accessed data, but also higher transfer time INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Cache Update Policies When is cached data at the client updated in the master file? Write-through – write data through to disk when cache is updated Reliable, but poor performance Delayed-write – cache and then write to the server later Write operations complete quickly; some data may be overwritten in cache, saving needless network I/O Poor reliability Unwritten data may be lost when client machine crashes Inconsistent data Variation – write dirty blocks back at regular intervals INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Cache Consistency Is locally cached copy of the data consistent with the master copy? Client-initiated approach Client initiates a validity check with server Server verifies local data with the master copy Can use timestamps, version no…. Server-initiated approach Server records (parts of) files cached in each client When server detects a potential inconsistency, it invalidates the client caches INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Sharing Semantics in DFS Unix Semantics Read after write Every change becomes instantly visible to all processes Session Semantics Same as unix semantics for local clients of a file For remote clients, changes to a file are initially visible to the process that modified the file. Only when the file is closed, the changes are visible to other processes. Immutable files INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Stateless vs Stateful Service Stateless service No open and close calls to access files Each request identifies the file and position in file No client state information in server by making each request self-contained Advantages Better failure recovery Disadvantage Longer request messages, longer processing time INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Identifier used for subsequent accesses until session ends Stateful Service Client opens a file Server fetches information about file from disk, stores in server memory Returns to client a connection identifier unique to client and open file Identifier used for subsequent accesses until session ends Server must reclaim space used by no longer active clients Advantages Increased performance; fewer disk accesses Disadvantages Poor failure recovery INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Network File System (NFS) CS60002: Distributed Systems Antonio Bruto da Costa Ph.D. Student, Formal Methods Lab, Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Main Idea Server exports parts of a filesystem to clients (Ex. specified in /etc/exports file in Linux) Clients “mount” server exported namespace at specific mount points in its namespace tree (specified in /etc/fstab in Linux; can be mounted separately by mount command also) Same server filesystem can be mounted at different mount points at different clients, so no single global name space Mounts can be cascaded But client must mount from different servers explicitly itself INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Nested Mounting (NFS v3) INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Basic NFS Architecture INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Other details Location transparent naming (host id needed only during mount) Pathname translation Look up each component in the pathname recursively Lookup call made to server when a mount point is crossed Cascading mounts can involve multiple servers during translation Set of operations provided for operations on files Lookup, read, write, mkdir, rename, rmdir, readdir, getattr, setattr… No open or close calls (stateless server) Most data-modifying calls are synchronous Most calls work on an opaque (to client) file handle returned to the client by the server Uniquely identifies the file in the server RPC (Remote Procedure Call) used to communicate between server and client INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Caching in NFSv3 Strictly not part of NFS protocol, but used in practice for performance File attributes cached along with file block. Cached file block used subject to consistency checks on file attributes 8 Kb blocks for v2, negotiable for v3 No locking, no synchronization as part of NFS Separate centralized locking mechanism using lockd No clear semantics because of caching nature INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

NFS v4 Improvements Improved access and good performance on the Internet Transit across firewalls, perform well in high latency low bandwidth scenarios, scale to very large numbers of clients per server Strong security with negotiation built into the protocol. Supports the RPCSEC_GSS protocol, provides a mechanism for clients and servers to negotiate security parameters, requires clients and servers to support a minimal set of security schemes Good cross-platform interoperability INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

NFS v4: COMPOUND Operation Allows building more complex requests by combining one or more traditional NFS procedures Example: read data from a file in one request by combining LOOKUP, OPEN, and READ operations in a single COMPOUND RPC. Reduces the number of RPCs needed for logical file system operations The operations combined within a COMPOUND request are evaluated in order by the server. Once an operation returns a failing result, the evaluation ends and the results of all evaluated operations are returned to the client Still uses filehandle as the id of the file. Passes a filehandle from one operation to another within the sequence of operations INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

NFS v4: Open and Close The NFS version 4 protocol introduces OPEN and CLOSE operations Stateful model The OPEN operation provides a single point where file lookup, creation, and share semantics can be combined The CLOSE operation also provides for the release of state accumulated by OPEN. INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

NFS v4: Locking Locking No separate lock manager as in v3 Lease based locking model as part of NFS State associated with file locks maintained at the server INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

NFS v4: Caching The client checks validity of cached file data when the file is opened Checks with server if the file has been changed Based on this, the client determines if the cached file data should kept or released When the file is closed, any modified data is written to the server Serializing access to file data can be done through locking INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

NFS v4: Open Delegation Server can delegate some responsibilities to the client for a file At OPEN, the server may provide the client either a read or write delegation for the file Assures certain semantics to the client If the client is granted an open delegation, the client machine can locally handle open and close operations from other clients in the same machine. Allows the client to locally service operations such as OPEN, CLOSE, LOCK, LOCKU, READ, WRITE without immediate interaction with the server INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

NFS v4: Recalling Delegation Delegations can be recalled by the server If another client requests access to the file such that the access conflicts with the granted delegation, the server can recall the delegation from initial client Problems with delegation and stateful servers Failure handling INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

NFS v4: Replication locations attribute at server can store the location of a file system If a file system is migrated in between, the client will receive an error when operating on the file system The client can then probe the server query for the new file system location. The attribute can also hold multiple server locations to allow replication of file systems Client can use its own policies to access one of the replicas INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

NFS v4: Security Use of RPCSEC_GSS Security framework to support multiple ways of setting up secure channels Supports message integrity and confidentiality Should be configured with Kerberos Support for authentication It must also support a method LIPKEY LIPKEY is a public key system Clients are authenticated using passwords Servers are authenticated using a public key INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR