Chapter 20 Distributed File Systems Copyright © 2008.

Operating Systems, by Dhananjay Dhamdhere Copyright © Operating Systems, by Dhananjay Dhamdhere

Introduction
Design Issues in Distributed File Systems
Transparency
Semantics of File Sharing
Fault Tolerance
DFS Performance
Case Studies

Design Issues in Distributed File Systems

Overview of DFS Operation
Remote file processing model
– The file server agent and the client agent are analogous to the stub processes of an RPC
– For efficiency, the client agent and the cache manager are typically rolled into a single unit
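To make the merged client agent and cache manager concrete, here is a minimal Python sketch. The class names `FileServerAgent` and `ClientAgent`, the 4-byte chunk size, and the use of a direct method call in place of an RPC are illustrative assumptions, not details from the slides.

```python
class FileServerAgent:
    """Server-side agent (hypothetical): performs reads on the server's files.
    In a real DFS this would be reached through an RPC stub."""
    def __init__(self, files):
        self.files = files          # file name -> bytes
        self.requests = 0           # number of requests that reached the server

    def read(self, name, offset, count):
        self.requests += 1
        return self.files[name][offset:offset + count]


class ClientAgent:
    """Client agent and cache manager rolled into one unit (hypothetical)."""
    def __init__(self, server, chunk_size=4):
        self.server = server
        self.chunk_size = chunk_size
        self.cache = {}             # (file name, chunk index) -> bytes

    def read(self, name, offset, count):
        data = bytearray()
        pos = offset
        while pos < offset + count:
            idx = pos // self.chunk_size
            key = (name, idx)
            if key not in self.cache:   # miss: fetch the whole chunk from the server
                self.cache[key] = self.server.read(name, idx * self.chunk_size,
                                                   self.chunk_size)
            chunk = self.cache[key]
            start = pos - idx * self.chunk_size
            piece = chunk[start:start + (offset + count - pos)]
            if not piece:               # reached end of file
                break
            data += piece
            pos += len(piece)
        return bytes(data)
```

Repeated reads that fall in already-cached chunks never reach the server, which is the efficiency gained by letting one unit both forward requests and manage the cache.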

Transparency
In a conventional file system, a user identifies a file through a path name
– The user is aware that the file belongs in a specific directory, but is not aware of its location in the system
– The location info field of the file's directory entry indicates the file's location on disk
Location transparency can be provided in a DFS through a similar mechanism
– Location info: (node id, location)
Location independence requires the information in the location info field to vary dynamically, e.g. when a file is moved to another node
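A minimal Python sketch of the (node id, location) idea, assuming a single in-memory directory; `Directory`, `LocationInfo` and `migrate` are hypothetical names. Resolving a file never requires the user to name a node, and `migrate` changes the location info while the path name stays the same.

```python
from dataclasses import dataclass


@dataclass
class LocationInfo:
    node_id: str     # node that currently stores the file
    location: int    # e.g. starting disk block on that node (illustrative)


class Directory:
    def __init__(self):
        self.entries = {}           # path name -> LocationInfo

    def create(self, path, node_id, location):
        self.entries[path] = LocationInfo(node_id, location)

    def resolve(self, path):
        """Location transparency: the caller supplies only the path name."""
        info = self.entries[path]
        return info.node_id, info.location

    def migrate(self, path, new_node, new_location):
        """Location independence: the file moves, its path name does not change."""
        self.entries[path] = LocationInfo(new_node, new_location)
```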

Semantics of File Sharing
Semantics determine the manner in which the effects of file manipulations performed by concurrent users of a file become visible to one another

Semantics of File Sharing (continued)
A session consists of those clients of a file that are located in the same node of a system
Problem with session semantics: poor portability
Session semantics are easy to implement in a DFS that employs file caching
– Changes made to a file are not visible to clients in other nodes
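The reason caching makes session semantics easy can be seen in a small Python sketch, assuming one cached copy per node and no write-back to the server; the `Server` and `Node` classes are hypothetical. A write updates only the local copy, so clients on the same node see it and clients on other nodes do not.

```python
class Server:
    def __init__(self):
        self.files = {}             # file name -> contents held by the server


class Node:
    """One node; all clients on it share this cached copy (one session per node)."""
    def __init__(self, server):
        self.server = server
        self.cache = {}             # file name -> cached contents

    def open(self, name):
        if name not in self.cache:  # first open on this node copies the file
            self.cache[name] = self.server.files.get(name, "")

    def write(self, name, data):
        self.open(name)
        self.cache[name] = data     # only the local copy changes (write-back not modelled)

    def read(self, name):
        self.open(name)
        return self.cache[name]
```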

Fault Tolerance
File system reliability has several facets:
– A file must be robust, recoverable, and available
Robustness is achieved using techniques for reliable storage of data
Robustness and recoverability depend on how files are stored and backed up, respectively
Availability depends on how files are opened and accessed
The only defense against client node crashes is the use of transaction semantics in the file server

Fault Tolerance (continued)

Availability
A file is available if a copy of it can be opened and accessed by a client
– The ability to open a file depends on path name resolution
– Access requires functional client and server nodes
An anomalous situation may arise when path names span many nodes
– If a node in the path crashes, a file operation fails even if the node that contains the file has not crashed
– Solution: cached directories
File replication is transparent to clients
– Updating techniques: two-phase commit (2PC), use of primary copies
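The primary-copy updating technique named above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: `Replica` and `PrimaryCopyFile` are hypothetical, propagation is synchronous, and a copy that is down simply misses updates (bringing it up to date on recovery is not shown).

```python
class Replica:
    """One copy of a replicated file, modelled as a key -> value store."""
    def __init__(self):
        self.data = {}
        self.up = True               # False while the node holding this copy is down


class PrimaryCopyFile:
    def __init__(self, replicas):
        self.primary = replicas[0]   # every update is applied here first
        self.secondaries = replicas[1:]

    def write(self, key, value):
        if not self.primary.up:
            raise OSError("primary copy unavailable; update rejected")
        self.primary.data[key] = value
        for r in self.secondaries:   # propagate the update to the other copies
            if r.up:
                r.data[key] = value  # a copy that is down misses the update

    def read(self, key):
        for r in [self.primary] + self.secondaries:
            if r.up:                 # any available copy can serve a read
                return r.data.get(key)
        raise OSError("no copy of the file is available")
```

Reads stay available as long as one copy is up, while funnelling writes through the primary keeps the copies in a single update order.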

Client and Server Node Failures
A file server can maintain file control blocks (FCBs) and the open files table (OFT) in memory
– Stateful design
– Good performance
– Problems in the event of client and server crashes
Solution: the client and the file server share a virtual circuit
– The virtual circuit "owns" the file processing actions and resources such as file server metadata
– These actions and resources become orphans after a crash; the actions are rolled back and the metadata destroyed
– A client–server protocol implementing transaction semantics may be used to ensure this
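A minimal Python sketch of how orphaned actions might be rolled back, assuming each action on a virtual circuit records an undo entry; `VirtualCircuit`, `commit` and `crash` are hypothetical names. In practice the undo log would have to survive a server crash (for example, by being kept on disk); that is not modelled here.

```python
class VirtualCircuit:
    """Owns one client's file processing actions and server metadata (hypothetical)."""
    def __init__(self, server_files):
        self.files = server_files   # shared server storage: name -> contents
        self.undo_log = []          # (name, previous contents or None), in action order
        self.metadata = {}          # e.g. open-file entries held for this client

    def open(self, name):
        self.metadata[name] = {"offset": 0}

    def write(self, name, data):
        self.undo_log.append((name, self.files.get(name)))
        self.files[name] = data

    def commit(self):
        self.undo_log.clear()       # the actions become permanent

    def crash(self):
        # Roll back orphaned actions in reverse order, then destroy the metadata.
        for name, old in reversed(self.undo_log):
            if old is None:
                self.files.pop(name, None)   # the file did not exist before
            else:
                self.files[name] = old
        self.undo_log.clear()
        self.metadata.clear()
```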

Stateless File Servers
The file server does not maintain state information about file processing activity
The client must:
– Keep state information about file processing activity
– Provide all relevant information in each file system call, e.g. read ("alpha", , );
Many actions traditionally performed only at file open time are repeated at every file operation
If the file server crashes, time-outs and retransmissions occur in the client
Cannot employ file caching
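The contrast between the two designs can be sketched in Python. The function and class names and the (name, offset, count) parameter list are illustrative assumptions, not the slide's elided `read` arguments. The stateful server keeps an open files table, so a read needs only a handle; the stateless server keeps nothing, so the client tracks the offset and sends it every time.

```python
FILES = {"alpha": b"0123456789"}


class StatefulServer:
    def __init__(self):
        self.oft = {}               # handle -> [file name, current offset]
        self.next_handle = 0

    def open(self, name):
        self.next_handle += 1
        self.oft[self.next_handle] = [name, 0]
        return self.next_handle

    def read(self, handle, count):
        entry = self.oft[handle]    # this table is lost if the server crashes
        data = FILES[entry[0]][entry[1]:entry[1] + count]
        entry[1] += len(data)
        return data


def stateless_read(name, offset, count):
    """Self-contained request: it can simply be retransmitted after a server crash."""
    return FILES[name][offset:offset + count]


class StatelessClient:
    def __init__(self, name):
        self.name, self.offset = name, 0   # the client keeps the state

    def read(self, count):
        data = stateless_read(self.name, self.offset, count)
        self.offset += len(data)
        return data
```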

DFS Performance
A DFS design is scalable if DFS performance does not degrade as the size of the distributed system increases

Efficient File Access
The inherent efficiency of file access depends on how the operation of a file server is structured
Two server structures that provide efficient file access:
– Multithreaded file server
– Hint-based file server
    State information is used as a hint
    Server operation is stateless if the hint is not available
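A hint-based server can be sketched in Python as follows, assuming the hint is the disk block that holds a given file offset. `HintBasedServer`, its block maps and the 4-byte block size are hypothetical. When the hint is present it saves a lookup; when it is missing (e.g. after a crash clears memory), the server behaves statelessly and recomputes the answer from the request alone.

```python
BLOCK_SIZE = 4  # illustrative block size in bytes


class HintBasedServer:
    def __init__(self, block_maps):
        self.block_maps = block_maps     # file name -> list of disk block numbers
        self.hints = {}                  # (file name, offset) -> disk block number
        self.lookups = 0                 # block-map lookups performed (the slow path)

    def locate(self, name, offset):
        hint = self.hints.get((name, offset))
        if hint is not None:
            return hint                  # fast path: use the hint
        self.lookups += 1                # slow path: stateless operation
        block = self.block_maps[name][offset // BLOCK_SIZE]
        self.hints[(name, offset)] = block
        return block

    def crash(self):
        self.hints.clear()               # hints are lost; correctness is not affected
```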

File Caching
The file cache and the copy of the file on disk in the server node form a memory hierarchy
– The operation of the file cache and its benefits are analogous to those of a CPU cache
Chunks of file data are loaded from the file server into the file cache
Studies of file size distributions indicate a small average file size
– Whole-file caching is feasible
The file server may use a separate attributes cache

File Caching (continued)
Key issues:
– Location of the file cache: memory or disk
– File updating policy: write-through or delayed write
– Cache validation policy: client-initiated or server-initiated
– Chunk size: large or small? Fixed or variable?
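The file updating policy choice can be illustrated with a small Python sketch; `CachedFile` and its `flush` method are hypothetical, and the server is modelled as a plain dictionary. With write-through, every write is sent to the server immediately. With delayed write, modified chunks are marked dirty and sent when `flush()` is called (for instance on close), so repeated writes to one chunk cost a single transfer.

```python
class CachedFile:
    def __init__(self, server, policy):
        if policy not in ("write-through", "delayed"):
            raise ValueError("policy must be 'write-through' or 'delayed'")
        self.server = server         # dict: chunk index -> data (the server's copy)
        self.policy = policy
        self.cache = {}
        self.dirty = set()
        self.transfers = 0           # messages sent to the server

    def write(self, chunk, data):
        self.cache[chunk] = data
        if self.policy == "write-through":
            self._send(chunk)        # update the server copy at once
        else:
            self.dirty.add(chunk)    # defer the update

    def flush(self):
        for chunk in sorted(self.dirty):
            self._send(chunk)
        self.dirty.clear()

    def _send(self, chunk):
        self.server[chunk] = self.cache[chunk]
        self.transfers += 1
```

The trade-off is that delayed write saves traffic but leaves a window in which the server's copy is stale and a client crash can lose updates.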

Scalability
DFS scalability is achieved through techniques that localize most of the data traffic generated by file processing activities within clusters
– Clusters typically represent subnets such as high-speed LANs
– An increase in the number of clusters does not degrade performance, because it does not add much network traffic

Case Studies
Sun Network File System
Andrew and Coda File Systems
GPFS
Windows

Sun Network File System
VFS implements the mount protocol and creates a system-wide unique vnode for each file
The NFS layer interacts with the remote node containing the file through the NFS protocol

Sun Network File System (continued)
Several techniques are used to improve performance:
– A directory names cache is used in each client node
– A file attributes cache caches inode information
    Cached attributes are discarded after 3 seconds for files and after 30 seconds for directories
– The file blocks cache is the conventional file cache
    The server uses large (8 Kbyte) data blocks
    Cache validation is performed through timestamps associated with each file and each cache block
The file server is stateless
NFS provides neither Unix semantics nor session semantics
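A Python sketch of a client-side attributes cache using the slide's timeouts (3 seconds for files, 30 seconds for directories). `AttributeCache` is a hypothetical class, not NFS client code: the server call is an injected `fetch` function and the clock is passed in so expiry can be tested without sleeping.

```python
FILE_TIMEOUT = 3.0   # seconds, for files
DIR_TIMEOUT = 30.0   # seconds, for directories


class AttributeCache:
    def __init__(self, fetch, clock):
        self.fetch = fetch           # function(path) -> (attrs, is_dir); stands in for a server call
        self.clock = clock           # function() -> current time in seconds
        self.entries = {}            # path -> (attrs, is_dir, time cached)

    def get(self, path):
        now = self.clock()
        entry = self.entries.get(path)
        if entry is not None:
            attrs, is_dir, cached_at = entry
            timeout = DIR_TIMEOUT if is_dir else FILE_TIMEOUT
            if now - cached_at < timeout:
                return attrs         # still fresh: no server traffic
        attrs, is_dir = self.fetch(path)   # expired or absent: ask the server
        self.entries[path] = (attrs, is_dir, now)
        return attrs
```

Within the timeout a client may act on stale attributes, which is one reason NFS provides neither Unix nor session semantics.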

Andrew and Coda File Systems
Targeted at gigantic distributed systems
All clients have an identical shared name space
– It is location transparent in nature
– It is implemented by dedicated servers (Vice)
Clusters localize file processing activities
– Traffic within a cluster is reduced by caching the entire file on the local disk
A volume typically contains the files of a single user
64 KB chunks (size adapted on a per-client basis)
A user process called Venus performs open/close

Andrew and Coda File Systems (continued)
Server-initiated cache validation using callbacks
Path name resolution is performed on a component-by-component basis
– Venus maintains a mapping cache
File servers are multithreaded
Client–server communication uses RPCs
Two features to achieve high availability:
– Replication and disconnected operation
    Read one, write all policy
    Supports hoarding of files
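Server-initiated validation with callbacks can be sketched in Python as below. The classes `ViceServer` and `VenusClient` are hypothetical, named after the slide's Vice and Venus, and messages are reduced to method calls. When a client fetches a file, the server records a callback; when another client stores a new version, the server breaks the callbacks, and those clients discard their cached copies.

```python
class ViceServer:
    def __init__(self):
        self.files = {}
        self.callbacks = {}          # file name -> set of clients holding a copy

    def fetch(self, client, name):
        self.callbacks.setdefault(name, set()).add(client)   # record a callback
        return self.files[name]

    def store(self, writer, name, data):
        self.files[name] = data
        for client in self.callbacks.get(name, set()) - {writer}:
            client.break_callback(name)                       # invalidate other copies
        self.callbacks[name] = {writer}


class VenusClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}              # whole-file cache: name -> contents

    def open(self, name):
        if name not in self.cache:   # no valid copy: fetch the entire file
            self.cache[name] = self.server.fetch(self, name)
        return self.cache[name]

    def close_after_write(self, name, data):
        self.cache[name] = data
        self.server.store(self, name, data)

    def break_callback(self, name):
        self.cache.pop(name, None)   # cached copy is no longer valid
```

Because the server notifies clients, an open on a file with a valid callback needs no message to the server at all.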

GPFS
General Parallel File System: a high-performance shared-disk file system
– For large computing clusters operating under Linux
Uses data striping across all disks in the cluster
– A large-size block (strip) is used to minimize seek overhead during a file read/write
    A smaller subblock is used for small files
– Locking is used to maintain consistency of file data
    Lock granularity is as coarse as possible, but as fine as necessary
    A centralized lock manager and a few distributed lock managers
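A Python sketch of round-robin striping and subblock allocation. The 256 KB block size and the 1/32 subblock ratio are illustrative values chosen for this sketch, and `place` and `space_for` are hypothetical helpers. Consecutive blocks of a file go to consecutive disks, and a file smaller than one block is allocated in subblocks instead of a whole block.

```python
BLOCK_SIZE = 256 * 1024            # one strip per disk (illustrative)
SUBBLOCK_SIZE = BLOCK_SIZE // 32   # allocation unit for small files (illustrative)


def place(offset, num_disks, first_disk=0):
    """Map a byte offset in a file to (disk, block index on that disk, offset in block)."""
    block = offset // BLOCK_SIZE
    disk = (first_disk + block) % num_disks   # round-robin across all disks
    return disk, block // num_disks, offset % BLOCK_SIZE


def space_for(file_size):
    """Bytes allocated: small files are rounded up to subblocks, others to blocks."""
    if file_size < BLOCK_SIZE:
        return -(-file_size // SUBBLOCK_SIZE) * SUBBLOCK_SIZE   # ceiling division
    return -(-file_size // BLOCK_SIZE) * BLOCK_SIZE
```

A large read touches many disks in parallel, while the subblock keeps a 1000-byte file from consuming a full 256 KB block.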

GPFS (continued)
Lock tokens are used to reduce the latency and overhead of locking
Race conditions may arise over the metadata of a file
– Solution: one of the nodes is designated as the metanode for the file; it performs updates of the file's metadata
A central allocation manager partitions the free space map and gives one partition to each node
Each node writes a separate journal for recovery
If the network is partitioned, only nodes in the majority partition can perform file processing at any time
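How lock tokens cut locking overhead can be sketched in Python, assuming only exclusive tokens; `LockManager` and `ClusterNode` are hypothetical names. A node that obtains a token keeps it, so later locks on the same object need no message to the lock manager; the manager revokes the token only when another node asks for it.

```python
class LockManager:
    def __init__(self):
        self.holder = {}             # object -> node currently holding its token
        self.messages = 0            # token requests and revocations exchanged

    def acquire_token(self, node, obj):
        self.messages += 1
        current = self.holder.get(obj)
        if current is not None and current is not node:
            current.revoke(obj)      # ask the holder to give the token back
            self.messages += 1
        self.holder[obj] = node


class ClusterNode:
    def __init__(self, manager):
        self.manager = manager
        self.tokens = set()          # tokens this node currently holds

    def lock(self, obj):
        if obj not in self.tokens:   # contact the manager only when the token is absent
            self.manager.acquire_token(self, obj)
            self.tokens.add(obj)

    def revoke(self, obj):
        self.tokens.discard(obj)
```

When one node repeatedly works on the same data, which is common, almost all lock requests are satisfied locally.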

Windows
Windows Server 2003 provides two features for data replication and data distribution:
– Remote differential compression (RDC)
– DFS namespaces
Replication is organized using the notion of a replication group
A DFS namespace is created by a system administrator
Other key concepts: referrals and hot standbys
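The idea behind remote differential compression can be illustrated with a simplified Python sketch: both sides compute signatures of file chunks, and only chunks whose signatures differ are transferred. This is not the RDC algorithm itself. Real RDC picks chunk boundaries from the file's content so that an insertion does not shift every later chunk, whereas the sketch uses fixed 4-byte chunks and SHA-256 signatures; `signatures`, `delta` and `apply_delta` are hypothetical helpers.

```python
import hashlib

CHUNK = 4  # fixed chunk size (a simplification; see above)


def signatures(data):
    return [hashlib.sha256(data[i:i + CHUNK]).digest()
            for i in range(0, len(data), CHUNK)]


def delta(old, new):
    """Chunks of `new` that the holder of `old` lacks, as (index, bytes) pairs."""
    old_sigs = signatures(old)
    changes = []
    for idx, sig in enumerate(signatures(new)):
        if idx >= len(old_sigs) or old_sigs[idx] != sig:
            changes.append((idx, new[idx * CHUNK:(idx + 1) * CHUNK]))
    return changes


def apply_delta(old, new_len, changes):
    """Rebuild the new file from the old copy plus the transferred chunks."""
    chunks = [old[i:i + CHUNK] for i in range(0, len(old), CHUNK)]
    for idx, data in changes:
        if idx < len(chunks):
            chunks[idx] = data
        else:
            chunks.append(data)      # new chunks arrive in index order
    return b"".join(chunks)[:new_len]
```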

Summary
Transparency concerns the association between the path name of a file and the location of the file
File sharing semantics may differ between DFSs:
– Unix semantics
– Session semantics
– Transaction semantics (atomic transactions)
A stateless server design provides high availability
– The notion of a hint is used to improve performance
A DFS uses file caching to improve performance
– Cache coherence techniques are needed