
1 CS 843 - Distributed Computing Systems Chapter 8: Distributed File Systems Chin-Chih Chang, chang@cs.twsu.edu From Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, Edition 3, © Addison-Wesley 2001

2 Distributed File System
File Service Architecture; Sun NFS; Andrew File System; Advances and Summary 2

3 Introduction
Purpose: share information in the form of files throughout an intranet. Goal: provide access to files stored at a server, similar to access to files on a local disk. 3

4 Figure 8.1 Storage systems and their properties
Key: 1 = strict one-copy consistency; √ = approximate consistency; X = no automatic consistency.
Storage system: Sharing / Persistence / Distributed cache-replicas / Consistency maintenance / Example
Main memory: no / no / no / 1 / RAM
File system: no / yes / no / 1 / UNIX file system
Distributed file system: yes / yes / yes / √ / Sun NFS
Web: yes / yes / yes / X / Web server
Distributed shared memory: yes / no / yes / √ / Ivy (Ch. 16)
Remote objects (RMI/ORB): yes / no / no / 1 / CORBA
Persistent object store: yes / yes / no / 1 / CORBA Persistent Object Service
Persistent distributed object store: yes / yes / yes / √ / PerDiS, Khazana
4

5 Introduction (cont…)
Characteristics of file systems: files contain data and attributes; a file system stores and manages large numbers of files and controls access to them; in non-distributed systems it is built from a stack of modules. 5

6 Figure 8.2 File system modules
6

7 Figure 8.3 File attribute record structure
File length, creation timestamp, read timestamp, write timestamp, attribute timestamp, reference count, owner, file type, access control list 7

8 Figure 8.4 UNIX file system operations
filedes = open(name, mode): Opens an existing file with the given name.
filedes = creat(name, mode): Creates a new file with the given name. Both operations deliver a file descriptor referencing the open file. The mode is read, write or both.
status = close(filedes): Closes the open file filedes.
count = read(filedes, buffer, n): Transfers n bytes from the file referenced by filedes to buffer.
count = write(filedes, buffer, n): Transfers n bytes to the file referenced by filedes from buffer. Both operations deliver the number of bytes actually transferred and advance the read-write pointer.
pos = lseek(filedes, offset, whence): Moves the read-write pointer to offset (relative or absolute, depending on whence).
status = unlink(name): Removes the file name from the directory structure. If the file has no other names, it is deleted.
status = link(name1, name2): Adds a new name (name2) for a file (name1).
status = stat(name, buffer): Gets the file attributes for file name into buffer. 8
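As a quick illustration of these operations (a minimal sketch, not from the original slides; the file name /tmp/log.txt is hypothetical), the fragment below appends a line to a file by seeking to the end with lseek(), then reads its attribute record with stat():

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void) {
        /* Hypothetical file name, for illustration only. */
        int fd = open("/tmp/log.txt", O_WRONLY | O_CREAT, 0644);
        if (fd < 0) return 1;
        lseek(fd, 0, SEEK_END);            /* move the read-write pointer to end of file */
        const char *msg = "appended line\n";
        write(fd, msg, strlen(msg));       /* returns number of bytes actually written */
        close(fd);

        struct stat st;                    /* attribute record, cf. Figure 8.3 */
        if (stat("/tmp/log.txt", &st) == 0)
            printf("length=%lld owner=%d\n", (long long)st.st_size, (int)st.st_uid);
        return 0;
    }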

9 Introduction (cont…)
Distributed file system requirements: transparency, concurrent file updates, file replication, hardware and OS heterogeneity, fault tolerance, consistency, security, efficiency. 9

10 Introduction (cont…)
Transparency: access transparency, location transparency, mobility transparency, performance transparency, scaling transparency. 10

11 Introduction (cont…)
Case studies: the abstract file service architecture model; SUN NFS (Network File System) - key interface published, access transparency, support for hardware and OS heterogeneity; the Andrew File System (from CMU) - supports information sharing on a large scale by minimizing client-server communication. 11

12 File Service Architecture
Flat file service: operations on the contents of files; unique file identifiers (UFIDs). Directory service: maps text names to UFIDs; is itself a client of the flat file service. Client module: integrates and extends the file and directory services. 12

13 Figure 8.5 File service architecture
Components: client computer (application program, client module); server computer (flat file service, directory service). 13

14 Figure 8.6 Flat file service interface
Read(FileId, i, n) -> Data, throws BadPosition: If 1 ≤ i ≤ Length(File), reads a sequence of up to n items from a file starting at item i and returns it in Data.
Write(FileId, i, Data): If 1 ≤ i ≤ Length(File)+1, writes a sequence of Data to a file, starting at item i, extending the file if necessary.
Create() -> FileId: Creates a new file of length 0 and delivers a UFID for it.
Delete(FileId): Removes the file from the file store.
GetAttributes(FileId) -> Attr: Returns the file attributes for the file.
SetAttributes(FileId, Attr): Sets the file attributes (only those attributes that are not shaded in Figure 8.3). 14
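To make the interface concrete, here is one way it might be declared as C stubs. This is a sketch only: the type layouts and names are assumptions for illustration, not taken from the book or from any real RPC package.

    /* Illustrative C declarations for the flat file service of Figure 8.6. */
    typedef struct { unsigned char bytes[16]; } UFID;   /* unique file identifier */
    typedef struct {
        long length;
        long creation_ts, read_ts, write_ts, attr_ts;   /* timestamps, cf. Figure 8.3 */
        int  ref_count, owner, type;                    /* plus an access control list */
    } Attr;

    int  FlatRead (UFID f, long i, long n, char *data);       /* may throw BadPosition */
    int  FlatWrite(UFID f, long i, const char *data, long n);
    UFID FlatCreate(void);                                     /* new file of length 0 */
    int  FlatDelete(UFID f);
    int  FlatGetAttributes(UFID f, Attr *out);
    int  FlatSetAttributes(UFID f, const Attr *in);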

15 File Service Architecture (cont…)
Compared with the UNIX file system interface: functionally equivalent, but with no open and close operations. Why: the operations are idempotent (allowing at-least-once RPC semantics) and the server is stateless. Access control is performed at the server. 15

16 Figure 8.7 Directory service interface
Lookup(Dir, Name) -> FileId, throws NotFound: Locates the text name in the directory and returns the relevant UFID. If Name is not in the directory, throws an exception.
AddName(Dir, Name, File), throws NameDuplicate: If Name is not in the directory, adds (Name, File) to the directory and updates the file's attribute record. If Name is already in the directory, throws an exception.
UnName(Dir, Name): If Name is in the directory, the entry containing Name is removed from the directory. If Name is not in the directory, throws an exception.
GetNames(Dir, Pattern) -> NameSeq: Returns all the text names in the directory that match the regular expression Pattern. 16

17 File Service Architecture (cont…)
Hierarchic file system: tree structure, implemented by the client module; the UNIX file naming scheme is not strictly hierarchical (because of links). File grouping: groups of files can be moved between servers; a group's unique ID = IP address + date. 17

18 SUN NFS architecture
Figure 8.8 (labels): client computer and server computer; application programs make system calls into the UNIX kernel; a virtual file system on each machine routes operations on local files to the local UNIX (or other) file system, and operations on remote files to the NFS client module, which communicates with the NFS server module using the NFS protocol. 18

19 SUN NFS (cont…)
The NFS protocol is OS independent. The NFS server and client modules reside in the kernel and communicate using RPC. NFS provides access transparency, achieved through a virtual file system. File handle = file-system ID + i-node number of file + i-node generation number. 19
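A sketch of the file handle's composition in C. Field widths and names here are illustrative assumptions; real NFS implementations treat the handle as an opaque array of bytes that clients never inspect.

    /* Illustrative layout of an NFS file handle. */
    struct nfs_fhandle {
        unsigned int fs_id;       /* file system identifier */
        unsigned int inode_num;   /* i-node number of the file */
        unsigned int inode_gen;   /* i-node generation number: detects reuse
                                     of an i-node after the file is deleted */
    };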

20 SUN NFS (cont…)
The NFS client module supplies the interface used by application programs. Access control and authentication are checked by the server; the NFS server is stateless. 20

21 Figure 8.9 NFS server operations (simplified) – 1
lookup(dirfh, name) -> fh, attr: Returns file handle and attributes for the file name in the directory dirfh.
create(dirfh, name, attr) -> newfh, attr: Creates a new file name in directory dirfh with attributes attr and returns the new file handle and attributes.
remove(dirfh, name) -> status: Removes file name from directory dirfh.
getattr(fh) -> attr: Returns file attributes of file fh. (Similar to the UNIX stat system call.)
setattr(fh, attr) -> attr: Sets the attributes (mode, user id, group id, size, access time and modify time) of a file. Setting the size to 0 truncates the file.
read(fh, offset, count) -> attr, data: Returns up to count bytes of data from a file starting at offset. Also returns the latest attributes of the file.
write(fh, offset, count, data) -> attr: Writes count bytes of data to a file starting at offset. Returns the attributes of the file after the write has taken place.
rename(dirfh, name, todirfh, toname) -> status: Changes the name of file name in directory dirfh to toname in directory todirfh.
link(newdirfh, newname, dirfh, name) -> status: Creates an entry newname in the directory newdirfh which refers to the file name in the directory dirfh.
Continues on next slide ... 21

22 Figure 8.9 NFS server operations (simplified) – 2
symlink(newdirfh, newname, string) -> status: Creates an entry newname in the directory newdirfh of type symbolic link with the value string. The server does not interpret the string but makes a symbolic link file to hold it.
readlink(fh) -> string: Returns the string that is associated with the symbolic link file identified by fh.
mkdir(dirfh, name, attr) -> newfh, attr: Creates a new directory name with attributes attr and returns the new file handle and attributes.
rmdir(dirfh, name) -> status: Removes the empty directory name from the parent directory dirfh. Fails if the directory is not empty.
readdir(dirfh, cookie, count) -> entries: Returns up to count bytes of directory entries from the directory dirfh. Each entry contains a file name, a file handle, and an opaque pointer to the next directory entry, called a cookie. The cookie is used in subsequent readdir calls to start reading from the following entry. If the value of cookie is 0, reads from the first entry in the directory.
statfs(fh) -> fsstats: Returns file system information (such as block size, number of free blocks and so on) for the file system containing a file fh. 22

23 SUN NFS (cont…)
Mount service and pathname translation. In UNIX, a file pathname is translated into an i-node reference by the kernel; in NFS, translation is performed by the client, one lookup per path component. The automounter mounts file systems on demand. 23

24 Figure 8.10 Local and remote file systems accessible on an NFS client
Note: The file system mounted at /usr/students in the client is actually the sub-tree located at /export/people in Server 1; the file system mounted at /usr/staff in the client is actually the sub-tree located at /nfs/users in Server 2. 24

25 SUN NFS (cont…)
Caching. Server caching: caches recently read disk blocks; uses write-through or write-commit. Client caching: caches the results of read, write and other operations; uses a timestamp-based method to validate cached blocks. Other optimizations are discussed below. 25

26 SUN NFS (cont…)
Performance. Sandberg (1987): good performance for file access, except for frequent use of the getattr call and poor performance of the write operation. Sun, using LADDIS: NFS offers an effective solution to distributed storage needs in intranets of most sizes and types of use. 26

27 SUN NFS Summary
Follows the abstract model of Section 8.2. Good location and access transparency. Supports heterogeneous hardware and OSs. The server implementation is stateless. Performance is enhanced by caching. No support for migration of files or file systems. 27

28 Andrew File System
Provides transparent access to remote shared files. Compatible with NFS. Performs well with large numbers of active users. 28

29 Andrew File System (cont…)
Two unusual design characteristics: whole-file serving (the entire contents of files are transmitted to client computers) and whole-file caching (locally cached copies usually remain valid for long periods; files in regular use are normally retained in the cache). 29

30 Andrew File System (cont…)
The design is based on some observations: files are small (most are under 10 kbytes); read operations are much more common than writes; sequential access is common, random access is rare; most files are read and written by only one user, and when a file is shared, usually only one user modifies it; files are referenced in bursts. 30

31 Figure 8.11 Distribution of processes in the Andrew File System
31

32 Figure 8.12 File name space seen by clients of AFS
32

33 Figure 8.13 System call interception in AFS
33

34 Figure 8.14 Implementation of file system calls in AFS
34

35 Andrew File System (cont…)
Cache consistency: a callback promise is in one of two states, valid or cancelled. Update semantics: a best approximation to one-copy file semantics; there is no mechanism for the control of concurrent updates. 35
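The callback-promise states are simple enough to model directly. The following C sketch uses assumed names for the Venus cache entry; they are not taken from the AFS sources.

    /* Illustrative model of a Venus cache entry's callback promise. */
    enum callback_state { CB_VALID, CB_CANCELLED };

    struct venus_cache_entry {
        unsigned long fid;              /* file identifier */
        enum callback_state callback;   /* set to CB_VALID by Fetch/Create;
                                           set to CB_CANCELLED by BreakCallback */
    };

    /* On open(): if callback == CB_VALID the cached copy may be used without
       contacting the server; if CB_CANCELLED, Venus must fetch a fresh copy,
       which re-establishes the promise. */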

36 Figure 8.15 The main components of the Vice service interface
Fetch(fid) -> attr, data: Returns the attributes (status) and, optionally, the contents of the file identified by fid and records a callback promise on it.
Store(fid, attr, data): Updates the attributes and (optionally) the contents of a specified file.
Create() -> fid: Creates a new file and records a callback promise on it.
Remove(fid): Deletes the specified file.
SetLock(fid, mode): Sets a lock on the specified file or directory. The mode of the lock may be shared or exclusive. Locks that are not removed expire after 30 minutes.
ReleaseLock(fid): Unlocks the specified file or directory.
RemoveCallback(fid): Informs the server that a Venus process has flushed a file from its cache.
BreakCallback(fid): Made by a Vice server to a Venus process; cancels the callback promise on the relevant file. 36

37 Andrew File System (cont…)
UNIX kernel modification; location database; bulk transfers; partial file caching; performance; wide-area support. 37

38 Recent Advances
NFS enhancements, aimed at achieving one-copy update semantics: Spritely NFS, Not Quite NFS (NQNFS), WebNFS, NFS version 4. 38

39 Recent Advances (cont…)
AFS enhancements: DCE/DFS. Improvements in storage organization: RAID and LFS. New design approaches: xFS, Frangipani. 39

40 Summary
Key design issues: caching, consistency, failure tolerance, high throughput, scalability. NFS: stateless protocol, high performance. AFS: good scalability through whole-file serving and caching. 40

41 Distributed Systems Course Distributed File Systems
Chapter 2 Revision: Failure model Chapter 8: 8.1 Introduction 8.2 File service architecture 8.3 Sun Network File System (NFS) [8.4 Andrew File System (personal study)] 8.5 Recent advances 8.6 Summary George took about 3 hours to present this material.

42 Learning objectives Understand the requirements that affect the design of distributed services. NFS: understand how a relatively simple, widely-used service is designed. Obtain a knowledge of file systems, both local and networked. Caching as an essential design technique. Remote interfaces are not the same as APIs. Security requires special consideration. Recent advances: appreciate the ongoing research that often leads to major advances. 42 *

43 Chapter 2 Revision: Failure model
Figure 2.11:
Fail-stop (affects process): Process halts and remains halted. Other processes may detect this state.
Crash (affects process): Process halts and remains halted. Other processes may not be able to detect this state.
Omission (affects channel): A message inserted in an outgoing message buffer never arrives at the other end's incoming message buffer.
Send-omission (affects process): A process completes a send, but the message is not put in its outgoing message buffer.
Receive-omission (affects process): A message is put in a process's incoming message buffer, but that process does not receive it.
Arbitrary / Byzantine (affects process or channel): Process/channel exhibits arbitrary behaviour: it may send/transmit arbitrary messages at arbitrary times, commit omissions; a process may stop or take an incorrect step.
Note that the independence of failure modes is a special characteristic of distributed systems. Programs running on a single computer usually all fail together (with some exceptions), so we must design distributed systems to tolerate independent failures (fail-stop, crash or omission). For example, a file server should tolerate the repeated execution of requests to deal with omission errors, and it should offer a service that can recover from crashes and continue serving without requiring special action by clients. 43 *

44 Storage systems and their properties
In the first generation of distributed systems, file systems (e.g. NFS) were the only networked storage systems. With the advent of distributed object systems (CORBA, Java) and the web, the picture has become more complex. The table of storage systems and their properties appears on the next slide. 44 *

45 Storage systems and their properties
Figure 8.1. Types of consistency between copies: 1 - strict one-copy consistency; √ - approximate consistency; X - no automatic consistency.
Storage system: Sharing / Persistence / Distributed cache-replicas / Consistency maintenance / Example
Main memory: no / no / no / 1 / RAM
File system: no / yes / no / 1 / UNIX file system
Distributed file system: yes / yes / yes / √ / Sun NFS
Web: yes / yes / yes / X / Web server
Distributed shared memory: yes / no / yes / √ / Ivy (Ch. 16)
Remote objects (RMI/ORB): yes / no / no / 1 / CORBA
Persistent object store: yes / yes / no / 1 / CORBA Persistent Object Service
Persistent distributed object store: yes / yes / yes / √ / PerDiS, Khazana
45 *

46 What is a file system? 1
Persistent stored data sets. Hierarchic name space visible to all processes. API with the following characteristics: access and update operations on persistently stored data sets; sequential access model (with additional random-access facilities). Sharing of data between users, with access control. Concurrent access: certainly for read-only access; what about updates? Other features: mountable file stores; more? ... 46 *

47 What is a file system? 2 * Figure 8.4 UNIX file system operations
filedes = open(name, mode): Opens an existing file with the given name.
filedes = creat(name, mode): Creates a new file with the given name. Both operations deliver a file descriptor referencing the open file. The mode is read, write or both.
status = close(filedes): Closes the open file filedes.
count = read(filedes, buffer, n): Transfers n bytes from the file referenced by filedes to buffer.
count = write(filedes, buffer, n): Transfers n bytes to the file referenced by filedes from buffer. Both operations deliver the number of bytes actually transferred and advance the read-write pointer.
pos = lseek(filedes, offset, whence): Moves the read-write pointer to offset (relative or absolute, depending on whence).
status = unlink(name): Removes the file name from the directory structure. If the file has no other names, it is deleted.
status = link(name1, name2): Adds a new name (name2) for a file (name1).
status = stat(name, buffer): Gets the file attributes for file name into buffer.
Class Exercise A: Write a simple C program to copy a file using the UNIX file system operations shown in Figure 8.4. copyfile(char * oldfile, char * newfile) { <you write this part, using open(), creat(), read(), write()> } Note: remember that read() returns 0 when you attempt to read beyond the end of the file. 47 *

48 What is a file system? 3 Figure 8.2 File system modules 48 *

49 What is a file system? 4 * Figure 8.3 File attribute record structure
Updated by the system: file length, creation timestamp, read timestamp, write timestamp, attribute timestamp, reference count. Updated by the owner: owner, file type, access control list (e.g. for UNIX: rw-rw-r--). 49 *

50 File service requirements
Transparency, concurrency, replication, heterogeneity, fault tolerance, consistency, security, efficiency.
Transparencies - Access: same operations. Location: same name space after relocation of files or processes. Mobility: automatic relocation of files is possible. Performance: satisfactory performance across a specified range of system loads. Scaling: service can be expanded to meet additional loads.
Concurrency properties - Isolation; file-level or record-level locking; other forms of concurrency control to minimise contention.
Replication properties - File service maintains multiple identical copies of files: load-sharing between servers makes the service more scalable; local access has better response (lower latency); fault tolerance. Full replication is difficult to implement; caching (of all or part of a file) gives most of the benefits (except fault tolerance).
Heterogeneity properties - Service can be accessed by clients running on (almost) any OS or hardware platform; the design must be compatible with the file systems of different OSes; service interfaces must be open - precise specifications of APIs are published.
Consistency - Unix offers one-copy update semantics for operations on local files, and caching is completely transparent; it is difficult to achieve the same for distributed file systems while maintaining good performance and scalability.
Fault tolerance - Service must continue to operate even when clients make errors or crash: at-most-once semantics, or at-least-once semantics, which requires idempotent operations. Service must resume after a server machine crashes. If the service is replicated, it can continue to operate even during a server crash.
Security - Must maintain access control and privacy as for local files, based on the identity of the user making the request; identities of remote users must be authenticated; privacy requires secure communication. Service interfaces are open to all processes not excluded by a firewall, and are therefore vulnerable to impersonation and other attacks.
Efficiency - The goal for distributed file systems is usually performance comparable to a local file system. 50 *

51 Model file service architecture
Figure 8.5. Directory service operations: Lookup, AddName, UnName, GetNames. Flat file service operations: Read, Write, Create, Delete, GetAttributes, SetAttributes. Client computer: application program and client module; server computer: flat file service and directory service.
If the Unix file primitives are mapped directly onto file server operations, it is impossible to support many of the requirements outlined on the previous slide. Instead, the file service is implemented in part by a module that runs in the client computer. The client module implements nearly all of the state-dependent functionality (open files, position of read-write pointer, etc.), supporting fault tolerance, and it maintains a cache of recently-used file data, giving efficiency, limited replication and scalability. It also translates the file operations performed by applications into the API of the file server, giving transparency and heterogeneity. It supplies authentication information and performs encryption when necessary, but access control checks must be made by the server. 51 *

52 Server operations for the model file service
Figures 8.6 and 8.7.
Flat file service: Read(FileId, i, n) -> Data; Write(FileId, i, Data); Create() -> FileId; Delete(FileId); GetAttributes(FileId) -> Attr; SetAttributes(FileId, Attr). In Read and Write, i is the position of the first byte.
Directory service: Lookup(Dir, Name) -> FileId; AddName(Dir, Name, File); UnName(Dir, Name); GetNames(Dir, Pattern) -> NameSeq.
Note that open files and the read-write pointer are eliminated from this interface. All flat file service (and directory service) operations are repeatable - idempotent - except Create. Neither of the server modules maintains any state on behalf of the client (unless encryption is used).
Class Exercise B: Show how each file operation of the program that you wrote in Class Exercise A would be executed using the operations of the Model File Service in Figures 8.6 and 8.7.
Pathname lookup: pathnames such as '/usr/bin/tar' are resolved by iterative calls to lookup(), one call for each component of the path, starting with the ID of the root directory '/', which is known in every client (see the sketch below).
FileId: a unique identifier for files anywhere in the network, similar to the remote object references described earlier. 52 *
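A C sketch of this iterative resolution, assuming a Lookup() RPC stub and a well-known root UFID; both declarations are hypothetical, chosen only to mirror Figure 8.7.

    #include <string.h>

    typedef long FileId;
    extern FileId Lookup(FileId dir, const char *name);  /* assumed RPC stub */
    extern const FileId ROOT;                            /* UFID of '/', known to every client */

    FileId resolve(const char *path) {           /* e.g. "/usr/bin/tar" */
        char buf[256], *component, *save;
        strncpy(buf, path, sizeof buf - 1);
        buf[sizeof buf - 1] = '\0';
        FileId cur = ROOT;
        for (component = strtok_r(buf, "/", &save); component != NULL;
             component = strtok_r(NULL, "/", &save))
            cur = Lookup(cur, component);        /* one remote call per path component */
        return cur;
    }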

53 File Group A collection of files that can be located on any server or moved between servers while maintaining the same names. Similar to a UNIX filesystem. Helps with distributing the load of file serving between several servers. File groups have identifiers which are unique throughout the system (and hence, for an open system, they must be globally unique); they are used to refer to file groups and files. To construct a globally unique ID we use some unique attribute of the machine on which it is created, e.g. its IP address, even though the file group may move subsequently. File Group ID: IP address (32 bits) + date (16 bits). 53 *
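A possible C packing of this 48-bit identifier; the field names are assumptions for the sketch.

    #include <stdint.h>

    /* Illustrative packing of the file group ID from the slide:
       32-bit IP address + 16-bit date. */
    struct file_group_id {
        uint32_t ip_addr;   /* IPv4 address of the creating machine */
        uint16_t date;      /* creation date, keeping the ID unique even
                               if the group later moves to another server */
    };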

54 Case Study: Sun NFS
An industry standard for file sharing on local networks since the 1980s. An open standard with clear and simple interfaces. Closely follows the abstract file service model defined above. Supports many of the design requirements already mentioned: transparency, heterogeneity, efficiency, fault tolerance. Limited achievement of: concurrency, replication, consistency, security. 54 *

55 NFS architecture
Figure 8.8 (labels): client computer runs application programs, which make system calls into the UNIX kernel; a virtual file system module on each computer routes operations on local files to the UNIX (or other) file system, and operations on remote files to the NFS client module, which communicates with the NFS server module on the server computer using the NFS protocol (remote operations). 55 *

56 NFS architecture: does the implementation have to be in the system kernel?
No: there are examples of NFS clients and servers that run at application level as libraries or processes (e.g. early Windows and MacOS implementations, current PocketPC, etc.). But for a Unix implementation there are advantages: binary code compatibility - no need to recompile applications; standard system calls that access remote files can be routed through the NFS client module by the kernel; a shared cache of recently-used blocks at the client; a kernel-level server can access i-nodes and file blocks directly (though a privileged (root) application program could do almost the same); security of the encryption key used for authentication. Although the NFS client and server modules reside in the kernel in standard Unix implementations, it is perfectly possible to construct application-level versions (examples: Windows 3.0, Win 95?, early MacOS, PocketPC). 56 *

57 NFS server operations (simplified)
Figure 8.9, set against the model file service.
NFS operations: read(fh, offset, count) -> attr, data; write(fh, offset, count, data) -> attr; create(dirfh, name, attr) -> newfh, attr; remove(dirfh, name) -> status; getattr(fh) -> attr; setattr(fh, attr) -> attr; lookup(dirfh, name) -> fh, attr; rename(dirfh, name, todirfh, toname); link(newdirfh, newname, dirfh, name); readdir(dirfh, cookie, count) -> entries; symlink(newdirfh, newname, string) -> status; readlink(fh) -> string; mkdir(dirfh, name, attr) -> newfh, attr; rmdir(dirfh, name) -> status; statfs(fh) -> fsstats.
Model flat file service: Read(FileId, i, n) -> Data; Write(FileId, i, Data); Create() -> FileId; Delete(FileId); GetAttributes(FileId) -> Attr; SetAttributes(FileId, Attr).
Model directory service: Lookup(Dir, Name) -> FileId; AddName(Dir, Name, File); UnName(Dir, Name); GetNames(Dir, Pattern) -> NameSeq.
fh = file handle: filesystem identifier + i-node number + i-node generation. read and write are idempotent. getattr is used to implement fstat(); setattr is used to update the attributes. NFS create adds the file to a directory, whereas the model file service uses two operations (Create and AddName).
Optional Class Exercise C: Show how each file operation of the program that you wrote in Class Exercise A would be executed using the NFS server operations in Figure 8.9. 57 *

58 NFS access control and authentication
Stateless server, so the user's identity and access rights must be checked by the server on each request (in the local file system they are checked only on open()). Every client request is accompanied by the userID and groupID; these are not shown in Figure 8.9 because they are inserted by the RPC system. The server is exposed to imposter attacks unless the userID and groupID are protected by encryption. Kerberos has been integrated with NFS to provide a stronger and more comprehensive security solution; Kerberos is described in Chapter 7, and Kerberized NFS is described below. 58 *

59 Mount service
Mount operation: mount(remotehost, remotedirectory, localdirectory). The server maintains a table of clients that have mounted filesystems at that server. Each client maintains a table of mounted file systems, each entry holding: <IP address, port number, file handle>. Hard versus soft mounts. 59 *
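A sketch of one client-side mount table entry in C, based on the <IP address, port number, file handle> triple above; the field names and sizes are assumptions.

    #include <stdint.h>

    /* Illustrative client-side mount table entry. */
    struct mount_entry {
        char          local_dir[256];   /* local mount point, e.g. /usr/students */
        uint32_t      server_ip;        /* IP address of the NFS server */
        uint16_t      server_port;      /* port of the server's NFS service */
        unsigned char root_fh[32];      /* file handle of the remote directory's root */
    };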

60 Local and remote file systems accessible on an NFS client
Figure 8.10 Note: The file system mounted at /usr/students in the client is actually the sub-tree located at /export/people in Server 1; the file system mounted at /usr/staff in the client is actually the sub-tree located at /nfs/users in Server 2. 60 *

61 Automounter
The NFS client catches attempts to access 'empty' mount points and routes them to the automounter. The automounter has a table of mount points with multiple candidate servers for each; it sends a probe message to each candidate server and then uses the mount service to mount the filesystem at the first server to respond. This keeps the mount table small and provides a simple form of replication for read-only filesystems. E.g. if there are several servers with identical copies of /usr/lib, each server will have a chance of being mounted at some clients. 61 *

62 Kerberized NFS The Kerberos protocol is too costly to apply on each file access request, so Kerberos is used in the mount service to authenticate the user's identity. The user's UserID and GroupID are stored at the server with the client's IP address. For each file request: the UserID and GroupID sent must match those stored at the server, and the IP addresses must also match. This approach has some problems: it can't accommodate multiple users sharing the same client computer, and all remote filestores must be mounted each time a user logs in. 62 *

63 NFS optimization - server caching
Similar to UNIX file caching for local files: pages (blocks) from disk are held in a main-memory buffer cache until the space is required for newer pages; read-ahead and delayed-write optimizations; for local files, writes are deferred to the next sync event (30-second intervals). This works well in the local context, where files are always accessed through the local cache, but in the remote case it doesn't offer the necessary synchronization guarantees to clients. NFS v3 servers offer two strategies for updating the disk: write-through - altered pages are written to disk as soon as they are received at the server, so when a write() RPC returns, the NFS client knows that the page is on the disk; delayed commit - pages are held only in the cache until a commit() call is received for the relevant file. This is the default mode used by NFS v3 clients; a commit() is issued by the client whenever a file is closed. 63 *
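The two strategies can be sketched as follows; cache_page() and flush_to_disk() are hypothetical helpers standing in for the server's buffer cache and disk layers, not real NFS code.

    #include <stddef.h>

    enum write_mode { WRITE_THROUGH, DELAYED_COMMIT };

    /* Hypothetical helpers for this sketch. */
    extern void cache_page(const void *page, size_t len);
    extern void flush_to_disk(const void *page, size_t len);

    void handle_write(enum write_mode mode, const void *page, size_t len) {
        cache_page(page, len);        /* page enters the main-memory buffer cache */
        if (mode == WRITE_THROUGH)
            flush_to_disk(page, len); /* the write() reply returns only once data is on disk */
        /* under DELAYED_COMMIT the page is flushed when the client issues
           a commit() RPC for the file, typically on close() */
    }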

64 NFS optimization - client caching
Server caching does nothing to reduce RPC traffic between client and server; further optimization is essential to reduce server load in large networks. The NFS client module caches the results of read, write, getattr, lookup and readdir operations. Synchronization of file contents (one-copy semantics) is not guaranteed when two or more clients share the same file. A timestamp-based validity check reduces inconsistency but doesn't eliminate it. Validity condition for cache entries at the client: (T - Tc < t) ∨ (Tm_client = Tm_server), where t is the freshness guarantee (configurable per file, typically 3 seconds for files and 30 seconds for directories), Tc is the time when the cache entry was last validated, Tm is the time when the block was last updated at the server, and T is the current time. It remains difficult to write distributed applications that share files with NFS. 64 *
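The validity condition translates directly into C. This is a minimal sketch with assumed types; note that learning Tm_server when the first test fails is what costs a getattr RPC.

    #include <stdbool.h>
    #include <time.h>

    /* Illustrative cache-entry validity test for
       (T - Tc < t) OR (Tm_client == Tm_server). */
    struct nfs_cache_entry {
        time_t Tc;         /* when this entry was last validated */
        time_t Tm_client;  /* client's record of last modification at the server */
        time_t t;          /* freshness interval: ~3 s for files, ~30 s for directories */
    };

    bool entry_is_valid(const struct nfs_cache_entry *e, time_t Tm_server) {
        time_t T = time(NULL);
        if (T - e->Tc < e->t)
            return true;                   /* recently validated: assume fresh */
        return e->Tm_client == Tm_server;  /* otherwise compare modification times */
    }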

65 Other NFS optimizations
Sun RPC runs over UDP by default (can use TCP if required). Uses the UNIX BSD Fast File System with 8-kbyte blocks. read() and write() calls can be of any size (negotiated between client and server). The guaranteed freshness interval t is set adaptively for individual files to reduce the getattr() calls needed to update Tm. File attribute information (including Tm) is piggybacked in replies to all file requests. 65 *

66 NFS performance Early measurements (1987) established that:
write() operations are responsible for only 5% of server calls in typical UNIX environments, hence write-through at the server is acceptable; lookup() accounts for 50% of operations, due to the step-by-step pathname resolution necessitated by the naming and mounting semantics. More recent measurements (1993) show high performance: 1 x 450 MHz Pentium III: > 5000 server ops/sec, < 4 millisec average latency; 24 x 450 MHz IBM RS64: > 29,000 server ops/sec, < 4 millisec average latency (more recent measurements are available on the web). Provides a good solution for many environments, including large networks of UNIX and PC clients and multiple web server installations sharing a single file store. 66 *

67 NFS summary 1 An excellent example of a simple, robust, high-performance distributed service. Achievement of transparencies (See section 1.4.7): Access: Excellent; the API is the UNIX system call interface for both local and remote files. Location: Not guaranteed but normally achieved; naming of filesystems is controlled by client mount operations, but transparency can be ensured by an appropriate system configuration. Concurrency: Limited but adequate for most purposes; when read-write files are shared concurrently between clients, consistency is not perfect. Replication: Limited to read-only file systems; for writable files, the SUN Network Information Service (NIS) runs over NFS and is used to replicate essential system files, see Chapter 14. 67 cont'd *

68 NFS summary 2 Achievement of transparencies (continued): *
Failure: Limited but effective; service is suspended if a server fails. Recovery from failures is aided by the simple stateless design. Mobility: Hardly achieved; relocation of files is not possible, relocation of filesystems is possible, but requires updates to client configurations. Performance: Good; multiprocessor servers achieve very high performance, but for a single filesystem it's not possible to go beyond the throughput of a multiprocessor server. Scaling: Good; filesystems (file groups) may be subdivided and allocated to separate servers. Ultimately, the performance limit is determined by the load on the server holding the most heavily-used filesystem (file group). 68 *

69 Figure 8.11 Distribution of processes in the Andrew File System
69 *

70 Figure 8.12 File name space seen by clients of AFS
70 *

71 Figure 8.13 System call interception in AFS
71 *

72 Figure 8.14 Implementation of file system calls in AFS
72 *

73 Figure 8.15 The main components of the Vice service interface
Fetch(fid) -> attr, data: Returns the attributes (status) and, optionally, the contents of the file identified by fid and records a callback promise on it.
Store(fid, attr, data): Updates the attributes and (optionally) the contents of a specified file.
Create() -> fid: Creates a new file and records a callback promise on it.
Remove(fid): Deletes the specified file.
SetLock(fid, mode): Sets a lock on the specified file or directory. The mode of the lock may be shared or exclusive. Locks that are not removed expire after 30 minutes.
ReleaseLock(fid): Unlocks the specified file or directory.
RemoveCallback(fid): Informs the server that a Venus process has flushed a file from its cache.
BreakCallback(fid): Made by a Vice server to a Venus process; cancels the callback promise on the relevant file. 73 *

74 Recent advances in file services
NFS enhancements: WebNFS - the NFS server implements a web-like service on a well-known port; requests use a 'public file handle' and a pathname-capable variant of lookup(); enables applications to access NFS servers directly, e.g. to read a portion of a large file. One-copy update semantics (Spritely NFS, NQNFS) - include an open() operation and maintain tables of open files at servers, which are used to prevent multiple writers and to generate callbacks to clients notifying them of updates; performance was improved by a reduction in getattr() traffic. Improvements in disk storage organisation: RAID (redundant arrays of inexpensive disks) - improves performance and reliability by striping data redundantly across several disk drives. Log-structured file storage - updated pages are stored contiguously in memory and committed to disk in large contiguous blocks (~1 Mbyte); file maps are modified whenever an update occurs; garbage collection recovers disk space. 74 *

75 New design approaches 1 Distribute file data across several servers
Exploits high-speed networks (ATM, Gigabit Ethernet). Layered approach: the lowest level is like a 'distributed virtual disk'. Achieves scalability even for a single heavily-used file. 'Serverless' architecture: exploits processing and disk resources in all available network nodes; service is distributed at the level of individual files. Examples: xFS (section 8.5): an experimental implementation demonstrated a substantial performance gain over NFS and AFS; Frangipani (section 8.5): performance similar to local UNIX file access; Tiger Video File System (see Chapter 15); peer-to-peer systems: Napster, OceanStore (UCB), Farsite (MSR), Publius (AT&T research) - see the web for documentation on these very recent systems. 75 *

76 New design approaches 2
Replicated read-write files: high availability; disconnected working, where re-integration after disconnection is a major problem if conflicting updates have occurred. Examples: the Bayou system and the Coda system. 76 *

77 Summary Sun NFS is an excellent example of a distributed service designed to meet many important design requirements Effective client caching can produce file service performance equal to or better than local file systems Consistency versus update semantics versus fault tolerance remains an issue Most client and server failures can be masked Superior scalability can be achieved with whole-file serving (Andrew FS) or the distributed virtual disk approach Future requirements: support for mobile users, disconnected operation, automatic re-integration (Cf. Coda file system, Chapter 14) support for data streaming and quality of service (Cf. Tiger file system, Chapter 15) 77 *

78 Exercise A solution
Write a simple C program to copy a file using the UNIX file system operations shown in Figure 8.4.

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>

    #define BUFSIZE 1024
    #define READ 0
    #define FILEMODE 0644

    void copyfile(char *oldfile, char *newfile)
    {
        char buf[BUFSIZE];
        int n = 1, fdold, fdnew;
        if ((fdold = open(oldfile, READ)) >= 0) {
            fdnew = creat(newfile, FILEMODE);
            while (n > 0) {
                n = read(fdold, buf, BUFSIZE);   /* returns 0 at end of file */
                if (write(fdnew, buf, n) < 0)
                    break;
            }
            close(fdold);
            close(fdnew);
        } else
            printf("Copyfile: couldn't open file: %s\n", oldfile);
    }

    int main(int argc, char **argv)
    {
        copyfile(argv[1], argv[2]);
        return 0;
    }
78 *

79 Exercise B solution
Show how each file operation of the program that you wrote in Class Exercise A would be executed using the operations of the Model File Service in Figures 8.6 and 8.7. Server operations for: copyfile("/usr/include/glob.h", "/foo").
fdold = open("/usr/include/glob.h", READ). Client module actions: FileId = Lookup(Root, "usr") - remote invocation; FileId = Lookup(FileId, "include") - remote invocation; FileId = Lookup(FileId, "glob.h") - remote invocation; the client module makes an entry in an open-files table with file = FileId, mode = READ and RWpointer = 0, and returns the table row number as the value for fdold.
fdnew = creat("/foo", FILEMODE). Client module actions: FileId = Create() - remote invocation; AddName(Root, "foo", FileId) - remote invocation; SetAttributes(FileId, attributes) - remote invocation; the client module makes an entry in its open-files table with file = FileId, mode = WRITE and RWpointer = 0, and returns the table row number as the value for fdnew.
n = read(fdold, buf, BUFSIZE). Client module actions: Read(openfiles[fdold].file, openfiles[fdold].RWpointer, BUFSIZE) - remote invocation; increment the RWpointer in the open-files table by the number of bytes actually read and assign the resulting array of data to buf. 79 *

