Chapter 20 Distributed File Systems Copyright © 2008
Copyright © Operating Systems, by Dhananjay Dhamdhere

Introduction

Design Issues in Distributed File Systems
Transparency
Semantics of File Sharing
Fault Tolerance
DFS Performance
Case Studies

Design Issues in Distributed File Systems

Overview of DFS Operation

Remote file processing model
File server agent and client agent are analogous to RPC’s stub processes
For efficiency, the client agent and the cache manager are typically rolled into a single unit

Transparency

In a conventional file system, a user identifies a file through a path name
  – User is aware that file belongs in a specific directory, but is not aware of its location in the system
Location info field of the file’s directory entry indicates the file’s location on disk
Location transparency can be provided in a DFS through a similar mechanism
  – Location info: (node id, location)
Location independence requires information in location info field to vary dynamically
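
The difference between location transparency and location independence can be sketched in a few lines. This is only an illustration: the `LocationInfo` class, the `directory` dictionary, and the `migrate` helper are hypothetical names, not part of any real DFS. The path name stays fixed while the (node id, location) pair behind it changes.

```python
from dataclasses import dataclass


@dataclass
class LocationInfo:
    node_id: str   # node that currently stores the file
    location: int  # address of the file on that node's disk


# Hypothetical directory: path name -> location info field.
directory = {"/home/alice/alpha": LocationInfo("node1", 4096)}


def resolve(path):
    """Return the current (node id, location) pair for a path name."""
    info = directory[path]
    return info.node_id, info.location


def migrate(path, new_node, new_location):
    """Move a file: only the location info changes, never the path name."""
    directory[path] = LocationInfo(new_node, new_location)


before = resolve("/home/alice/alpha")
migrate("/home/alice/alpha", "node2", 8192)
after = resolve("/home/alice/alpha")
```

The user keeps using the same path name before and after the move, which is what a dynamically varying location info field makes possible.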

Semantics of File Sharing

Semantics determine the manner in which the effects of file manipulations performed by concurrent users of a file are visible to one another

Semantics of File Sharing (continued)

A session consists of some clients of a file that are located in the same node of a system
Problem with session semantics: poor portability
Session semantics are easy to implement in a DFS employing file caching
  – File changes are not visible to clients in other nodes
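
A minimal sketch of why file caching yields session semantics almost for free. The `FileServer` and `NodeCache` classes are hypothetical, and writing the cached copy back to the server at close time is an assumption of this sketch rather than something the slide states.

```python
class FileServer:
    def __init__(self):
        self.files = {"alpha": "v1"}


class NodeCache:
    """Per-node file cache; all clients in the node form one session."""

    def __init__(self, server):
        self.server = server
        self.cached = {}

    def open(self, name):
        # Copy the file into the node's cache.
        self.cached[name] = self.server.files[name]

    def read(self, name):
        return self.cached[name]

    def write(self, name, data):
        # The change stays in this node's cache.
        self.cached[name] = data

    def close(self, name):
        # Assumption: the cached copy is written back when the file is closed.
        self.server.files[name] = self.cached.pop(name)


server = FileServer()
node_a, node_b = NodeCache(server), NodeCache(server)
node_a.open("alpha")
node_b.open("alpha")
node_a.write("alpha", "v2")
seen_in_a = node_a.read("alpha")
seen_in_b = node_b.read("alpha")

node_a.close("alpha")
node_c = NodeCache(server)
node_c.open("alpha")
seen_in_c = node_c.read("alpha")
```

Clients in node A see the update immediately, node B keeps reading its own cached copy, and a node that opens the file after A closes it sees the new contents.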

Fault Tolerance

File system reliability has several facets:
  – A file must be robust, recoverable, available
Robustness is achieved using techniques for reliable storage of data
Robustness and recoverability depend on how files are stored and backed up, respectively
Availability depends on how files are opened and accessed
Only defense against client node crashes is use of transaction semantics in file server

Fault Tolerance (continued)

Availability

File is available if a copy can be opened and accessed by client
  – Ability to open file depends on path name resolution
  – Access requires functional client and server nodes
An anomalous situation may arise when path names span many nodes
  – If a node in path crashes, file operation will fail even if the node that contains the file has not crashed
      – Solution: cached directories
File replication is transparent to clients
  – Updating techniques: two-phase commit (2PC), use of primary copies

Client and Server Node Failures

File server can maintain file control blocks (FCBs) and the open files table (OFT) in memory
  – Stateful design
  – Good performance
  – Problems in event of client and server crashes
Solution: client and file server share a virtual circuit
  – Virtual circuit “owns” the file processing actions and resources like file server metadata
  – Actions and resources become orphans after crash
      – Actions are rolled back and metadata destroyed
  – Client–server protocol implementing transaction semantics may be used to ensure this

Stateless File Servers

File server does not maintain state information about file processing activity
Client must:
  – Keep state information about file processing activity
  – Provide all relevant information in a file system call
      read (“alpha”, , );
Many actions traditionally performed only at file open time are repeated at every file operation
If file server crashes, time-outs and retransmissions occur in client
Cannot employ file caching
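
The stateless design can be illustrated with a small sketch. The `offset` and `count` parameters are assumptions chosen for this example (the slide does not list the call's arguments), and `StatelessServer` and `Client` are hypothetical classes. The point is that the client carries all state, so a server restart is invisible to it apart from a delay.

```python
class StatelessServer:
    def __init__(self, files):
        self.files = files  # file name -> bytes; no open-file table is kept

    def read(self, name, offset, count):
        # Every request repeats the lookup that a stateful server does once at open time.
        data = self.files[name]
        return data[offset:offset + count]


class Client:
    def __init__(self, server, name):
        self.server = server
        self.name = name
        self.offset = 0  # file position is kept in the client, not the server

    def read(self, count):
        chunk = self.server.read(self.name, self.offset, count)
        self.offset += len(chunk)
        return chunk


files = {"alpha": b"hello world"}
client = Client(StatelessServer(files), "alpha")
first = client.read(5)

# Simulate a server crash and restart: the new server instance remembers nothing.
client.server = StatelessServer(files)
second = client.read(6)
```

Because each call is self-contained, the client simply retries against the restarted server and continues from its own offset.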

DFS Performance

DFS design is scalable if DFS performance doesn’t degrade with increase in size of distributed system

Efficient File Access

Inherent efficiency of file access depends on how the operation of a file server is structured
Two server structures that provide efficient file access:
  – Multithreaded file server
  – Hint-based file server
      – State information is used as a hint
      – Server operation is stateless if hint is not available
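
A hint-based server might be sketched as follows. `HintServer`, its `hints` table, and the `full_lookups` counter are hypothetical and exist only to make the behaviour observable: a hint speeds up a request when it is present, and the server falls back to the stateless path when it is not.

```python
class HintServer:
    def __init__(self, files):
        self.files = files      # file name -> bytes; stands in for the on-disk lookup
        self.hints = {}         # file name -> already located data; may vanish at any time
        self.full_lookups = 0   # counts how often the slow, stateless path is taken

    def _lookup(self, name):
        self.full_lookups += 1
        return self.files[name]

    def read(self, name, offset, count):
        data = self.hints.get(name)
        if data is None:
            # No hint available: behave like a stateless server.
            data = self._lookup(name)
            self.hints[name] = data
        return data[offset:offset + count]

    def crash(self):
        # Losing the hints affects performance only, never correctness.
        self.hints.clear()


server = HintServer({"alpha": b"hello world"})
r1 = server.read("alpha", 0, 5)
r2 = server.read("alpha", 6, 5)
lookups_before_crash = server.full_lookups
server.crash()
r3 = server.read("alpha", 0, 5)
lookups_after_crash = server.full_lookups
```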

File Caching

File cache and copy of file on disk in server node form a memory hierarchy
  – Operation of the file cache and its benefits are analogous to those of a CPU cache
Chunks of file data are loaded from the file server into the file cache
Studies of file size distributions indicate small average file size
  – Whole-file caching is feasible
File server may use a separate attributes cache

File Caching (continued)

Key issues:
  – Location of the file cache: memory or disk
  – File updating policy: write-through or delayed write
  – Cache validation policy: client- or server-initiated
  – Chunk size: large or small? Fixed or variable?
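
The two file updating policies can be contrasted with a toy client cache. `ClientFileCache` is a hypothetical class and a plain dictionary stands in for the file server, so this is a sketch of the policies rather than of any real DFS client.

```python
class ClientFileCache:
    """Toy client-side cache for one file; `server` is a dict standing in for the file server."""

    def __init__(self, server, name, write_through):
        self.server = server
        self.name = name
        self.write_through = write_through
        self.data = server[name]
        self.dirty = False

    def write(self, data):
        self.data = data
        if self.write_through:
            # Write-through: the server copy is updated on every write.
            self.server[self.name] = data
        else:
            # Delayed write: only mark the cached copy as modified.
            self.dirty = True

    def flush(self):
        # Delayed write: push the modified copy to the server later.
        if self.dirty:
            self.server[self.name] = self.data
            self.dirty = False


server = {"alpha": "v1", "beta": "v1"}
wt = ClientFileCache(server, "alpha", write_through=True)
dw = ClientFileCache(server, "beta", write_through=False)
wt.write("v2")
dw.write("v2")
after_write = dict(server)
dw.flush()
after_flush = dict(server)
```

Write-through keeps the server copy current at the cost of one server update per write; delayed write saves traffic but leaves the server copy stale until the flush.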

Scalability

DFS scalability achieved through techniques that localize most data traffic generated by file processing activities within clusters
  – Clusters typically represent subnets like high-speed LANs
  – An increase in the number of clusters does not lead to degradation of performance
      – It does not add much network traffic

Case Studies

Sun Network File System
Andrew and Coda File Systems
GPFS
Windows

Sun Network File System

VFS implements mount protocol and creates a system-wide unique vnode for each file
NFS layer interacts with remote node containing file through NFS protocol

Sun Network File System (continued)

Several techniques to improve performance
  – A directory names cache is used in each client node
  – A file attributes cache caches inode information
      – Cached attributes are discarded after 3 seconds for files and after 30 seconds for directories
  – File blocks cache is the conventional file cache
      – Server uses large (8 Kbytes) data blocks
      – Cache validation performed through timestamps associated with each file and each cache block
File server is stateless
Neither Unix semantics nor session semantics
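
The attribute cache's discard rule can be sketched as a small time-to-live cache. Only the 3-second and 30-second lifetimes come from the slide; `AttrCache`, its `fetch` callback, and the injectable `clock` are hypothetical and are used here so that the behaviour can be demonstrated without waiting.

```python
FILE_TTL = 3.0   # seconds before cached file attributes are discarded
DIR_TTL = 30.0   # seconds before cached directory attributes are discarded


class AttrCache:
    def __init__(self, fetch, clock):
        self.fetch = fetch    # function that asks the server for attributes
        self.clock = clock    # function returning the current time in seconds
        self.entries = {}     # path -> (time cached, attributes)

    def get(self, path, is_dir=False):
        ttl = DIR_TTL if is_dir else FILE_TTL
        now = self.clock()
        entry = self.entries.get(path)
        if entry is not None and now - entry[0] < ttl:
            return entry[1]   # still fresh: no server traffic
        attrs = self.fetch(path)
        self.entries[path] = (now, attrs)
        return attrs


now = [0.0]
server_calls = []


def fetch(path):
    server_calls.append(path)
    return {"version": len(server_calls)}


cache = AttrCache(fetch, lambda: now[0])
cache.get("/f")                  # miss: fetched at t = 0
now[0] = 2.0
cache.get("/f")                  # hit: 2 s old
now[0] = 3.5
cache.get("/f")                  # expired: fetched again
cache.get("/d", is_dir=True)     # miss: fetched at t = 3.5
now[0] = 20.0
cache.get("/d", is_dir=True)     # hit: 16.5 s old
```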

Andrew and Coda File Systems

Targeted at gigantic distributed systems
All clients have an identical shared name space
  – Is location transparent in nature
  – Implemented by dedicated servers (Vice)
Clusters localize file processing activities
  – Traffic within cluster reduced by caching entire file on local disk
A volume typically contains files of a single user
64 KB chunks (size adapted on a per-client basis)
User process called Venus performs open/close

Andrew and Coda File Systems (continued)

Server-initiated cache validation using callbacks
Path name resolution performed on a component-by-component basis
  – Venus maintains a mapping cache
File servers are multithreaded
Client–server communication uses RPCs
Two features to achieve high availability:
  – Replication and disconnected operation
      – Read one, write all policy
      – Supports hoarding of files
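
Server-initiated validation with callbacks can be sketched roughly as follows. The class and method names (`ViceServer`, `VenusClient`, `break_callback`) are hypothetical and the sketch ignores failures, replication, and disconnected operation; it shows only that the server, not the client, triggers invalidation.

```python
class ViceServer:
    def __init__(self, files):
        self.files = files
        self.callbacks = {}  # file name -> set of clients holding a cached copy

    def fetch(self, client, name):
        # Remember that this client caches the file, so it can be notified later.
        self.callbacks.setdefault(name, set()).add(client)
        return self.files[name]

    def store(self, writer, name, data):
        self.files[name] = data
        # Notify every other caching client that its copy is now stale.
        for client in self.callbacks.get(name, set()) - {writer}:
            client.break_callback(name)
        self.callbacks[name] = {writer}


class VenusClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def open(self, name):
        if name not in self.cache:
            self.cache[name] = self.server.fetch(self, name)
        return self.cache[name]

    def close(self, name, data):
        self.cache[name] = data
        self.server.store(self, name, data)

    def break_callback(self, name):
        # Drop the stale copy; the next open fetches a fresh one.
        self.cache.pop(name, None)


server = ViceServer({"alpha": "v1"})
a, b = VenusClient(server), VenusClient(server)
a.open("alpha")
b.open("alpha")
a.close("alpha", "v2")
b_has_copy = "alpha" in b.cache
b_reopened = b.open("alpha")
```

A client whose copy is still covered by a callback can open the file without contacting the server at all, which is where the traffic saving comes from.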

GPFS

General parallel file system: high-performance shared-disk file system
  – For large computing clusters operating under Linux
Uses data striping across all disks in cluster
  – A large-size block (strip) used to minimize seek overhead during a file read/write
      – A smaller subblock is used for small files
  – Locking used to maintain consistency of file data
      – Lock granularity is as coarse as possible, but as fine as necessary
      – Centralized lock manager and a few distributed lock managers
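
Data striping can be illustrated with a simple round-robin placement of blocks on disks. The round-robin rule and the `stripe` and `unstripe` helpers are assumptions of this sketch; the slide says only that data is striped across all disks.

```python
def stripe(data, block_size, num_disks):
    """Split data into blocks and place block i on disk i mod num_disks."""
    disks = [[] for _ in range(num_disks)]
    for start in range(0, len(data), block_size):
        block_no = start // block_size
        disks[block_no % num_disks].append(data[start:start + block_size])
    return disks


def unstripe(disks):
    """Reassemble the data by reading one block from each disk in turn."""
    blocks = []
    depth = max(len(disk) for disk in disks)
    for row in range(depth):
        for disk in disks:
            if row < len(disk):
                blocks.append(disk[row])
    return b"".join(blocks)


disks = stripe(b"abcdefghij", block_size=2, num_disks=3)
restored = unstripe(disks)
```

Consecutive blocks land on different disks, so a large sequential read or write can keep all disks busy at once.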

GPFS (continued)

Notion of lock tokens to reduce latency and overhead of locking
Race conditions may arise over metadata of a file
  – Solution: one of the nodes is designated as the metanode for the file; it performs file updates
Central allocation manager partitions free space map and gives one partition to each node
Each node writes a separate journal for recovery
If network is partitioned, only nodes in the majority partition can perform file processing at any time
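
The majority rule for a partitioned network reduces to a strict-majority test. The helper name `may_process_files` is hypothetical, and treating an even split as "no partition may proceed" is an assumption that follows from requiring a strict majority.

```python
def may_process_files(partition, all_nodes):
    """A partition may process files only if it holds a strict majority of all nodes."""
    return len(partition) > len(all_nodes) / 2


cluster = {"n1", "n2", "n3", "n4", "n5"}
large_side = may_process_files({"n1", "n2", "n3"}, cluster)
small_side = may_process_files({"n4", "n5"}, cluster)

four_nodes = {"n1", "n2", "n3", "n4"}
even_split = may_process_files({"n1", "n2"}, four_nodes)
```

Since at most one partition can contain more than half of the nodes, two partitions can never update the file system concurrently.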

Windows

Windows Server 2003 provides two features for data replication and data distribution:
  – Remote differential compression (RDC)
  – DFS namespaces
Replication organized using notion of a replication group
DFS namespace is created by a system administrator
Other key concepts: referrals and hot standbys

Summary

Transparency concerns association between path name of a file and location of the file
File sharing semantics may differ between DFSs:
  – Unix semantics
  – Session semantics
  – Transaction semantics (atomic transactions)
Stateless server design provides high availability
  – Notion of a hint used to improve performance
DFS uses file caching to improve performance
  – Cache coherence techniques are needed