
Distributed Processing Systems (Distributed File System). Sang-Kyu Oh, Graduate School of Information and Communications, Sogang University


2 Distributed Processing Systems (Distributed File System). Sang-Kyu Oh, Graduate School of Information and Communications, Sogang University. Email: sgoh@macrmimpact.com

3 Definitions
File: an abstraction of permanent storage; a sequence of similar-sized data items (typically 8-bit bytes).
Directory: a file of a special type that provides a mapping from text names to internal file identifiers.
File system: responsible for the organization, storage, retrieval, naming, sharing and protection of files.
File storage: implemented on magnetic disks and other non-volatile storage media.

4 Definitions (cont.)
[Diagram: the file system as an operating system component alongside process and memory management; a file is a digitized unit of data on storage, grouped into directories or folders.]

5 Definitions (cont.)
Unique File Identifiers (UFIDs): the file server creates a UFID for each file; the directory service records the UFID together with the file's name.
Access control, a combination of two approaches: the capability approach (a file can be accessed by anyone holding a valid capability) and the identity-based approach (list users and the operations they are entitled to perform).
Mutable and immutable files: mutable means there is only one stored version of a file (Sun NFS, CFS, LOCUS); immutable means the file cannot be modified once it has been created.

6 Definitions (cont.)
Repeatable (idempotent) operations: multiple executions have the same effect as a single execution.
Stateless file servers: no information is stored about previous operations.
Atomicity: if an operation terminates successfully, the new state is consistent and semantically correct; if the operation fails, the file state remains unchanged.
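
The first two properties fit together naturally. Below is a minimal C sketch, not from the slides, of how an idempotent read against a stateless server might look; all names and the UFID-to-path lookup are assumptions made for illustration.

    #include <stdio.h>

    /* An idempotent read request carries an absolute file position, so a
     * duplicate produced by RPC retransmission returns the same bytes
     * instead of advancing hidden server-side state. */
    struct read_request {
        unsigned long ufid;   /* unique file identifier (hypothetical width) */
        unsigned long offset; /* absolute position: makes the call idempotent */
        unsigned long count;  /* number of bytes requested */
    };

    /* A stateless server handles each request from scratch: it keeps no
     * open-file table, so re-executing the same request is harmless. */
    static long handle_read(const struct read_request *req, char *buf)
    {
        FILE *f = fopen("file.dat", "rb");  /* UFID-to-path lookup elided */
        if (f == NULL)
            return -1;
        if (fseek(f, (long)req->offset, SEEK_SET) != 0) {
            fclose(f);
            return -1;
        }
        long n = (long)fread(buf, 1, req->count, f);
        fclose(f);
        return n;
    }

Because the reply depends only on the request's arguments, a client can simply retransmit after a timeout, which is exactly the failure model the stateless NFS protocol described later relies on.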

7 File System Taxonomy
LEVEL 1: one user performs computation via a single process, as on the IBM PC and Apple Mac. File system design issues include the naming structure, the application programming interface, the mapping to physical storage media, and integrity against failures.
LEVEL 2: a single user computing with multiple processes (e.g. OS/2). The file system must also address concurrency control.
LEVEL 3: multiple users share data and resources. The file system must specify and enforce security.
LEVEL 4: distributed file systems; multiple users, physically dispersed over a network of autonomous computers, share one common file system. The file system additionally abstracts the network interfaces and communication resources and performs the protocol processing.

8 File System Modules
Directory service: a directory module (relates file names to file IDs) and an access control module (checks permissions for the requested operation).
File service: a file addressing module (uses the file location map to relate file IDs to files) and a file access module (uses the file index to find file pages for reads or writes).
Block service: a block module (accesses and allocates disk blocks) and a device module (disk I/O and buffering).

9 Distributed file service requirements
Access transparency: client programs should be unaware of the distribution of files.
Location transparency: client programs should see a uniform file name space.
Concurrency transparency: changes to a file by one client should not interfere with the operation of other clients simultaneously accessing or changing the same file.
Failure transparency: correct operation of servers after the failure of a client, and correct operation of client programs in the face of lost messages.

10 Distributed file service requirements (cont.)
Performance transparency: client programs should continue to perform satisfactorily while the load on the service varies within a specified range.
Hardware and operating system heterogeneity: the service interface should be defined so that client and server software can be implemented for different operating systems and computers.
Scalability: the service can be extended by incremental growth to deal with a wide range of loads and network sizes.
Replication transparency: a file may be represented by several copies of its contents at different locations.

11 Distributed file service requirements (cont.)
Migration transparency: neither client programs nor system administration tables in client nodes need to be changed when files are moved.
Support for fine-grained distribution of data.
Tolerance to network partitioning and detached operation.

12 File Service Components
Flat file service: concerned with implementing operations on the contents of files; Unique File Identifiers (UFIDs) are used to refer to files in all requests for flat file service operations.
Directory service: provides a mapping between text names for files and their UFIDs; provides the functions needed to generate and update directories and to obtain UFIDs from directories; is itself a client of the flat file service.

13 File Service Components (cont.)
Client module: an extension of the user package that runs in each client computer, integrating and extending the operations of the flat file service and the directory service under a single API. It holds information about the network locations of the flat file server and directory server processes, and plays an important role in achieving satisfactory performance through the implementation of a cache.

14 File Service Components (cont.)
[Diagram: user programs call the client module through the application programming interface; the client module invokes the directory service and the flat file service across the network via the file service RPC interface.]

15 Design Issues
Flat file service: offer a simple, general-purpose set of operations.
Fault tolerance: the service continues to operate in the face of client and server failures. The RPC interfaces can be designed in terms of idempotent operations, ensuring that duplicated requests do not result in invalid updates to files, and the servers can be stateless.

16 Design Issues (cont.)
Directory service: the separation of the directory service from the file service enables a variety of directory services to be designed and offered for use with a single file service.
Client module: hides low-level constructs, such as the UFIDs used in the RPC interfaces of the flat file service and the directory service, from user-level programs.

17 Attribute record structure
Maintained by the flat file service: file length, creation timestamp, read timestamp, write timestamp, attribute timestamp, reference count.
Maintained by the directory service: owner, file type, access control list.
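
As a concrete picture of the split above, here is a small C sketch of an attribute record; the field types and names are illustrative assumptions, not a defined wire format.

    #include <stdint.h>
    #include <time.h>

    struct file_attributes {
        /* maintained by the flat file service */
        uint64_t length;
        time_t   creation_ts, read_ts, write_ts, attribute_ts;
        uint32_t reference_count;
        /* maintained by the directory service */
        uint32_t owner;
        uint32_t file_type;
        uint32_t acl_ref;  /* stand-in for a reference to an access control list */
    };

The flat file service updates the timestamps and length as a side effect of Read and Write, while the directory service owns the naming- and protection-related fields.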

18 Mechanisms for Building DFS
Mounting: allows different file name spaces to be combined into a single hierarchical name space; a mount table in the kernel maps mount points to storage devices.
Caching (file caching): exploits temporal locality of reference; data can be cached in main memory or on the local disk of clients, and can also be cached at servers to reduce access latency.

19 Mechanisms for Building DFS (cont.)
[Diagram: a single name-space hierarchy (nodes a through k) assembled from subtrees held by Server X, Server Y and Server Z, joined at mount points.]

20 Mechanisms for Building DFS (cont.)
Cache consistency:
Server-initiated approach: servers inform cache managers whenever client-cached data becomes stale.
Client-initiated approach: client cache managers validate data with the server before returning it to clients.
No file caching: used during concurrent-write sharing.
Sequential write sharing: a client opens a file that has already been modified and closed by another client; timestamps are used to handle this problem.

21 Mechanisms for Building DFS (cont.)
Replication: how to keep replicas up to date, and how to detect inconsistencies.
Scalability: suitability for handling system expansion.
Semantics: a read operation returns the value stored by the latest write operation.

22 Mechanisms for Building DFS (cont.)
Location transparency: files are named and accessed independently of their locations and of where they are accessed from.
Security: authentication and access control.

23 Flat file service operations
Read(File, i, n) → Data, REPORTS(BadPosition): if 1 ≤ i ≤ Length(File), reads a sequence of up to n items from File starting at item i and returns it in Data; if i > Length(File), returns the empty sequence and reports an error.
Write(File, i, Data), REPORTS(BadPosition): if 1 ≤ i ≤ Length(File) + 1, writes the sequence Data to File starting at item i, extending the file if necessary; if i > Length(File) + 1, performs no operation and reports an error.
Create() → File: creates a new file of length 0 and delivers a UFID for it.

24 Flat file service operations (cont.)
Truncate(File, l): if l ≤ Length(File), shortens the file to length l; otherwise does nothing.
Delete(File): removes the file from the file store.
GetAttributes(File) → Attr: returns the file attributes of the file.
SetAttributes(File, Attr): sets the file attributes (only those attributes not maintained automatically by the flat file service; see the attribute record structure on slide 17).

25 Flat file service interface definition

    DEFINITION MODULE Files;
    EXPORT QUALIFIED Read, Write, Length, Truncate, Create, Delete,
                     ErrorType, Sequence, Seqptr, MAX, UFID, ErrorReport;
    CONST MAX = 2048;
    TYPE Sequence = RECORD
                      l : CARDINAL;
                      s : ARRAY [1..MAX] OF CHAR;
                    END;
    VAR ErrorReport : ErrorType;

26 Flat file service interface definition (cont.)

    PROCEDURE Read(File : UFID; i, n : CARDINAL) : Seqptr;
    PROCEDURE Write(File : UFID; i : CARDINAL; Data : Seqptr);
    PROCEDURE Length(File : UFID) : CARDINAL;
        (* implemented in terms of GetAttributes *)
    PROCEDURE Truncate(File : UFID; l : CARDINAL);
    PROCEDURE Create() : UFID;
    PROCEDURE Delete(File : UFID);
    END Files.

27 CopyFile using flat file operations

    MODULE CopyFile;
    FROM InOut IMPORT WriteString, WriteLn;
    FROM Files IMPORT Read, Write, Length, Truncate, UFID,
                      ErrorType, MAX, ErrorReport;

    PROCEDURE CopyFile(File1, File2 : UFID);
    VAR i, l : CARDINAL;
    BEGIN
      l := Length(File1);
      Truncate(File2, l);

28 CopyFile using flat file operations (cont.)

      FOR i := 1 TO l BY MAX DO
        Write(File2, i, Read(File1, i, MAX));
      END;
      IF ErrorReport # NONE THEN
        WriteString("CopyFile failed"); WriteLn;
      END;
    END CopyFile;
    END CopyFile.

29 Directory service operations (1)
Lookup(Dir, Name, AccessMode, UserID) → File, REPORTS(NotFound, NoAccess): locates the text name in the directory and returns the relevant UFID; reports an error if it cannot be found or if the client making the request is not authorized to access the file in the manner specified by AccessMode.
AddName(Dir, Name, File, UserID), REPORTS(NameDuplicate): if Name is not in the directory, adds the (Name, File) pair to the directory and updates the attribute record accordingly; if Name is already in the directory, reports an error.

30 Directory service operations (2)
UnName(Dir, Name), REPORTS(NotFound): if Name is in the directory, the entry containing Name is removed from the directory; if Name is not in the directory, reports an error.
ReName(Dir, OldName, NewName), REPORTS(NotFound): if OldName is in the directory, the entry containing OldName is given the name NewName; if OldName is not in the directory, reports an error.
GetName(Dir, Pattern) → NameSeq: returns all of the text names in the directory that match the regular expression given by Pattern.

31 Implementation techniques (1)
File group: a collection of files mounted on a server computer. File groups support the allocation of files to file servers in larger logical units and enable the service to be implemented with files stored on several servers. In a file system that supports file groups, the representation of UFIDs includes a file group identifier component.
File group identifier layout: Internet address (32 bits) + date (16 bits).

32 Implementation techniques (2)
Space leaks: a disk space leak occurs whenever the program responsible for creating a file terminates without having entered the UFID of the file into any directory and without deleting the file. The client module should therefore offer a composite operation, sketched in code below:
CreateFile(Name, Dir): takes the text name to be assigned to the new file and Dir, the UFID of a directory into which the file is to be entered. It creates a new file and adds Name and the UFID of the new file to Dir.
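
A minimal C sketch of that composite operation, with stub functions standing in for the flat file service and directory service RPCs; the names and signatures are assumptions, not the slides' interface.

    #include <stdint.h>
    #include <stdio.h>

    typedef uint64_t ufid_t;

    /* Stubs for the underlying services. */
    static ufid_t flat_create(void) { static ufid_t next = 1; return next++; }
    static void flat_delete(ufid_t f) { (void)f; }
    static int dir_add_name(ufid_t dir, const char *name, ufid_t file)
    {
        printf("AddName(%llu, %s, %llu)\n", (unsigned long long)dir,
               name, (unsigned long long)file);
        return 0;   /* 0 = success; nonzero would mean NameDuplicate */
    }

    /* Composite CreateFile: either the new file ends up named in Dir,
     * or it is deleted again, so no unnamed file can leak disk space. */
    static int create_file(const char *name, ufid_t dir, ufid_t *out)
    {
        ufid_t f = flat_create();
        if (dir_add_name(dir, name, f) != 0) {
            flat_delete(f);   /* undo the create: avoids the leak */
            return -1;
        }
        *out = f;
        return 0;
    }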

33 Implementation techniques (3)
Capabilities and access control: a capability is a 'digital key', a large integer selected in a manner that makes it difficult to counterfeit. The directory service requires clients to state their identity, which is checked against an access control list.
Construction of UFIDs: the flat file service must generate UFIDs in a manner that not only ensures uniqueness but also makes them difficult to counterfeit.
UFID layout: file group ID (48 bits) + file number (32 bits) + random number (32 bits).

34 Implementation techniques (4)
Access modes: access control to files is based on the fact that a UFID constitutes a 'key', or capability, to access a file. This can be achieved by extending the UFID to include a permission field.
Extended UFID layout: file group ID (48 bits) + file number (32 bits) + random number (32 bits) + permission field (5 bits: Read, Write/Truncate, Delete, GetAttributes, SetAttributes).

35 Implementation techniques (5)
Encryption of the permission field: to resist attempts to penetrate the security of the file service, the permission field and the random number are encrypted together to produce a single 37-bit number. An unencrypted permission field is also included so that client and server programs can determine by inspection which permissions a UFID carries.
Resulting UFID layout: file group ID (48 bits) + file number (32 bits) + encrypted permission bits and random number (37 bits) + unencrypted permission bits (5 bits).
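
A C sketch of this layout and the client-side permission check; the field widths follow the slide, but the packing, names and the encryption step (elided here) are illustrative assumptions.

    #include <stdint.h>

    #define PERM_READ     (1u << 0)
    #define PERM_WRITE    (1u << 1)   /* also covers Truncate */
    #define PERM_DELETE   (1u << 2)
    #define PERM_GETATTR  (1u << 3)
    #define PERM_SETATTR  (1u << 4)

    struct ufid {
        uint64_t group;    /* file group identifier (48 significant bits) */
        uint32_t file_no;  /* file number within the group */
        uint64_t sealed;   /* permission bits + random number, encrypted
                              into 37 bits under the server's secret key */
        uint8_t  perms;    /* unencrypted permission bits, for inspection */
    };

    /* Client-side check before issuing a request. The server would
     * decrypt 'sealed' and verify it agrees with 'perms', so forging
     * extra permission bits in the clear field gains nothing. */
    static int ufid_allows(const struct ufid *u, unsigned mode)
    {
        return (u->perms & mode) == mode;
    }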

36 Implementation techniques (6)
File representation: [Diagram: a block index holds the attribute record and pointers to the file's data pages (Page 1 through Page 4), with unused entries left for growth.]

37 [Diagram: the on-disk UNIX i-node structure; entries in the i-node list reach data blocks through direct and indirect pointers.]

38 Implementation techniques (7)
File location: the flat file service must translate UFIDs into file server locations and file addresses. Implementation: first, identify the server that holds the required file group (done by the client module); second, locate the required file's block index (done by the server that holds the file).
Group location: a group location database, giving the current locations of all accessible file groups in the form of (file group identifier, server) pairs, is replicated in each participating server.

39 Implementation techniques (8)
File addressing: when a server receives a flat file service request, it uses the file group identifier and the file number to locate the required file's block index. B-trees are an effective method for structuring such a set of data for searching.
Server cache: avoids repeated access to disk storage for the same block; write-through cache operation should be used.
Client cache: the client cache also uses write-through.

40 CASE STUDY: The Sun Network File System

41 Sun Microsystems' Network File System
The first file service that was designed as a product (1985). To encourage its adoption as a standard, the definitions of its key interfaces were placed in the public domain [Sun 1989]. NFS provides a working solution to many requirements for distributed file access, but it does not address some issues (replication transparency, concurrency transparency, scalability) whose importance is likely to grow as the size and range of applications for distributed systems increase.

42 *Design Goals of NFS (1)
The overall goal: a high level of support for hardware and operating system heterogeneity.
Access transparency: provides an API to local processes that is identical to the local operating system's interface.
Location transparency: each client establishes a file name space by adding remote file systems to its local name space.
Failure transparency: the stateless and idempotent nature of the NFS file access protocol ensures that the failure modes observed by clients when accessing remote files are similar to those for local file access.

43 *Design Goals of NFS (2)
Performance transparency: both the client and the server employ caching to achieve satisfactory performance.
Migration transparency: file systems may be moved between servers, but the remote mount tables in each client must then be updated separately, so migration transparency is not fully achieved by NFS.

44 *Requirements not addressed by NFS
Replication transparency: NFS does not support file replication. The Sun Network Information Service (NIS), a separate service available for use with NFS, supports the replication of simple databases.
Concurrency transparency: NFS does not aim to improve upon the UNIX approach to the control of concurrent updates to files.
Scalability: NFS was originally designed to allow each server to support approximately 5-10 clients.

45 #Sun NFS features
NFS is a distributed file system that provides transparent, remote access to file systems on UNIX and other systems.
NFS uses an External Data Representation (XDR) and is implemented on top of an RPC package.
NFS uses UDP and IP as its network protocols.
Client machines mount file systems located on servers so they can be accessed as if they were local.

46 Remote mounting on an NFS client
[Diagram: directory trees on Server 1 (/export/people with entries big, jon, bob), the client (/usr with students and staff, plus vmunix and other local files), and Server 2 (/nfs/users with entries jim, ann, jane, joe), joined by two remote mounts.]
The file system mounted at /usr/students in the client is actually the subtree located at /export/people on Server 1; the file system mounted at /usr/staff in the client is actually the subtree located at /nfs/users on Server 2.
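
To make the name-space gluing concrete, here is a small C sketch of the kind of mount-table lookup a client kernel performs; the structures and the longest-prefix-match strategy are illustrative assumptions, not NFS source code.

    #include <stdio.h>
    #include <string.h>

    struct mount_entry {
        const char *mount_point;  /* prefix in the local name space */
        const char *server;       /* where the mounted subtree lives */
        const char *remote_path;  /* path of the subtree on that server */
    };

    static const struct mount_entry mount_table[] = {
        { "/usr/students", "server1", "/export/people" },
        { "/usr/staff",    "server2", "/nfs/users" },
    };

    /* Longest-prefix match: a path under a mount point is forwarded to
     * the corresponding server; anything else stays local. */
    static const struct mount_entry *resolve(const char *path)
    {
        const struct mount_entry *best = NULL;
        size_t best_len = 0;
        for (size_t i = 0; i < sizeof mount_table / sizeof mount_table[0]; i++) {
            size_t n = strlen(mount_table[i].mount_point);
            if (strncmp(path, mount_table[i].mount_point, n) == 0 && n > best_len) {
                best = &mount_table[i];
                best_len = n;
            }
        }
        return best;
    }

    int main(void)
    {
        const struct mount_entry *m = resolve("/usr/students/jon");
        if (m != NULL)
            printf("forward to %s:%s\n", m->server, m->remote_path);
        return 0;
    }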

47 #VFS (Virtual File System)
Allows different file system types to be mounted on a single machine, separating file system operations from their implementation. The VFS dynamically selects the appropriate file system based on which file or directory is being accessed. The VFS interface to any underlying file system is the virtual node (vnode) interface: vnodes are data structures that uniquely identify files, similar to i-nodes in UNIX.

48 #VFS mounting

49 #Client-side vnode interface
File system operations: mount(varies): system call to mount a file system; mount_root(): mount a file system as root.
VFS operations: unmount(vfs): unmount a file system; sync(vfs): flush delayed-write blocks.
Vnode operations: open(vp, flags): mark a file open; rdwr(vp, uio, rwflag, flags): read or write a file; mkdir(dvp, name): create a directory.
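
The point of the vnode layer is dynamic dispatch: callers go through a table of function pointers, and each file system type supplies its own table. A minimal C sketch of that idea, using a deliberately simplified subset of the operations above (names and signatures are assumptions):

    struct vnode;   /* one per open file */

    struct vnode_ops {
        int (*open)(struct vnode *vp, int flags);
        int (*rdwr)(struct vnode *vp, void *uio, int rwflag, int flags);
        int (*mkdir)(struct vnode *dvp, const char *name);
    };

    struct vnode {
        const struct vnode_ops *ops;  /* local UFS ops or NFS client ops */
        void *fs_data;                /* e.g. an i-node or an NFS file handle */
    };

    /* Callers stay file-system independent: the same call works whether
     * the vnode is backed by the local UNIX file system or by NFS. */
    static int vn_open(struct vnode *vp, int flags)
    {
        return vp->ops->open(vp, flags);
    }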

50 #Sun NFS stateless protocol
Stateless protocol: ensures robustness when clients, servers or the network fail. If a client fails, the server does not need to take any action; if a server fails, the client retransmits its request until it receives a response.
Disadvantages of stateless protocols: a server may receive multiple copies of the same request, and the server must save any modified data to stable storage before completing a client request.

51 NFS software architecture
[Diagram: on the client computer, user-level client processes issue system calls into the UNIX kernel, where the virtual file system routes local requests to the UNIX file system and remote requests to the NFS client module; the NFS client speaks the NFS protocol across the network to the NFS server module in the server's kernel, which reaches that machine's UNIX file system through its own virtual file system layer.]

52 NFS server operations (RPC interface)
lookup(dirfh, name) → fh, attr: returns the file handle and attributes for the file name in the directory dirfh.
create(dirfh, name, attr) → newfh, attr: creates a new file name in directory dirfh with attributes attr and returns the new file handle and attributes.
remove(dirfh, name) → status: removes file name from directory dirfh.
getattr(fh) → attr: returns the file attributes of file fh (similar to the UNIX stat system call).

53 NFS server operations (2)
setattr(fh, attr) → attr: sets the attributes of a file (mode, user id, group id, size, access time and modify time); setting the size to 0 truncates the file.
read(fh, offset, count) → attr, data: returns up to count bytes of data from a file starting at offset; also returns the latest attributes of the file.
write(fh, offset, count, data) → attr: writes count bytes of data to a file starting at offset; returns the attributes of the file after the write has taken place.

54 NFS server operations (3)
rename(dirfh, name, todirfh, toname) → status: changes the name of file name in directory dirfh to toname in directory todirfh.
link(newdirfh, newname, dirfh, name) → status: creates an entry newname in the directory newdirfh which refers to the file name in the directory dirfh.
symlink(newdirfh, newname, string) → status: creates an entry newname in the directory newdirfh of type symbolic link with the value string. The server does not interpret the string, but makes a symbolic link file to hold it.

55 NFS server operations (4)
readlink(fh) → string: returns the string that is associated with the symbolic link file identified by fh.
mkdir(dirfh, name, attr) → newfh, attr: creates a new directory name with attributes attr and returns the new file handle and attributes.
rmdir(dirfh, name) → status: removes the empty directory name from the parent directory dirfh; fails if the directory is not empty.

56 NFS server operations (5)
readdir(dirfh, cookie, count) → entries: returns up to count bytes of directory entries from the directory dirfh. Each entry contains a file name, a file id, and an opaque pointer to the next directory entry, called a cookie. The cookie is used in subsequent readdir calls to start reading from the following entry; a readdir with a cookie value of 0 reads from the first entry in the directory.
statfs(fh) → fsstats: returns file system information (such as block size, number of free blocks, and so on) for the file system containing file fh.

57 *Implementation (1)
The NFS client and server modules communicate using remote procedure calls (Sun's RPC system was developed for use in NFS). Because the file and directory operations are integrated in a single service, the space leak problem cannot arise.
Virtual file system: a VFS module has been added to the UNIX kernel. Its role is to distinguish between local and remote files and to translate between the UNIX-independent file identifiers used by NFS and the internal file identifiers used by UNIX and other file systems.

58 *Implementation (2)
Virtual file system (cont.): a file handle consists of a file system identifier, the i-node number of the file, and an i-node generation number.
File system identifier: a unique number allocated to each file system when it is created.
I-node generation number: incremented each time the i-node number is reused (needed because in the UNIX file system i-node numbers are reused after a file is removed).
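
A C sketch of those three fields; the widths are assumptions for illustration, not the wire format of any particular NFS version.

    #include <stdint.h>

    struct nfs_fh {
        uint32_t fsid;        /* file system identifier, fixed at creation */
        uint32_t inode_no;    /* i-node number of the file */
        uint32_t generation;  /* bumped each time the i-node number is reused */
    };

    /* Two handles denote the same file only if all three fields match; a
     * stale generation number lets the server reject handles that refer
     * to a file that has since been deleted and its i-node recycled. */
    static int fh_same(const struct nfs_fh *a, const struct nfs_fh *b)
    {
        return a->fsid == b->fsid &&
               a->inode_no == b->inode_no &&
               a->generation == b->generation;
    }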

59 *Implementation (3)
Virtual file system (cont.): the virtual file system layer keeps one VFS structure for each mounted file system and one v-node per open file. The v-node contains an indicator showing whether a file is local or remote.

60 *Implementation (4)
Client integration: the NFS client emulates the semantics of the standard UNIX file system primitives and is integrated with the UNIX kernel, so user programs can access files via UNIX system calls without recompilation or reloading. A single client module serves all user-level processes, with a shared cache of recently used blocks, and the encryption key used to protect user IDs passed to the server can be retained in the kernel. The client cooperates with the virtual file system in each client machine.

61 *Implementation (5)
Server integration: the NFS server is integrated with the UNIX kernel mainly for performance reasons; a user-level NFS server achieved only approximately 80% of the performance of the kernel version.
Access control and authentication: since the NFS server is stateless, it does not keep files open on behalf of its clients, so the server must check the user's identity against the file's access permission attributes afresh on each request. DES encryption of the user's authentication information is used in the RPC protocol (NFS 4.0).

62 *Implementation (6)
Path name translation: path name parsing and translation are controlled by the client. Each part of a name that refers to a remote-mounted directory is translated to a file handle using a separate lookup request to the remote server.
Mount service: the mounting of remote file systems is supported by a separate mount service process that runs at user level on each NFS server computer. Clients use a modified version of the UNIX mount command, specifying the remote host name, the remote path name and the local name.

63 *Implementation (7)
Mount service (cont.):
Hard mounting: when a user-level process accesses a file in the file system, the process is suspended until the request can be completed; in the case of server failure, user-level processes remain suspended until the server restarts.
Soft mounting: in the case of server failure, the NFS client module returns a failure indication to user-level processes after a small number of retries.

64 *Implementation (8)
Mount service (cont.): mount requests are performed as part of the system initialization process in the client (by editing the UNIX startup script /etc/rc); an individual user can change the configuration using mount.
Automounter: dynamically mounts a file system whenever an 'empty' mount point is referenced by a client; runs as a user-level UNIX process in each client; maintains a table of mount points (path names).

65 *Implementation (9)
Automounter (cont.): the automounter behaves like a local NFS server at the client machine. Read-only replication can be achieved by listing several servers containing identical file systems against a name in the automounter table; this is useful for heavily used file systems that change infrequently.

66 #Implementation (10): the new autofs automounter

67 *Implementation (11)
Server caching:
Conventional UNIX system: a read-ahead protocol anticipates read accesses and fetches the pages following those that have most recently been read; a delayed-write protocol writes a page's new contents to disk only when the buffer page is required for another page.
NFS server: a write-through protocol writes each modification to disk immediately, because a failure of the server might otherwise result in the undetected loss of data by clients.

68 *Implementation (12)
Client caching: the NFS client module caches the results of read, write, getattr, lookup and readdir operations in order to reduce the number of requests transmitted to servers. A timestamp-based method is used to validate cached blocks, as sketched below: the validation check is performed whenever a file is opened and whenever the server is contacted to fetch a new block from a file. When a cached page is modified, it is marked as dirty and scheduled to be flushed to the server asynchronously. Since NFS clients cannot determine whether a file is shared or not, this validation procedure must be used for all file accesses.
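
A C sketch of the timestamp test just described, in the spirit of the textbook treatment; the names and the freshness interval t are illustrative assumptions.

    #include <time.h>

    struct cache_entry {
        time_t tc;  /* when this entry was last validated with the server */
        time_t tm;  /* file modification time seen when the block was cached */
    };

    /* now: current time; server_tm: the file's modification time just
     * obtained from the server (a getattr round trip); t: freshness
     * interval, a few seconds in classic NFS. */
    static int cache_entry_valid(const struct cache_entry *e, time_t now,
                                 time_t server_tm, time_t t)
    {
        if (now - e->tc < t)
            return 1;               /* validated recently enough: trust it */
        return e->tm == server_tm;  /* else compare modification timestamps */
    }

If the second test fails, the block is refetched; if it succeeds, tc is advanced, so the server is consulted at most once per freshness interval per file.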

69 *Implementation (13)
Client caching (cont.): asynchronous reads and writes are achieved by including one or more bio-daemon processes at each client (bio: block input-output; daemon: a user-level process that performs a system task). Bio-daemon processes enhance performance and reduce the chance of inconsistency between caches at different clients.

70 #Implementation (14): using local disk caching with NFS

71 *Implementation (15)
Other optimizations: the Sun file system, based on the UNIX BSD 4.2 Fast File System, uses 8 Kbyte disk blocks, giving fewer file system calls for sequential file access; the UDP packet size was extended to 9 Kbytes so that an entire block can be transferred as the argument of a single packet.
Performance: the relatively poor write performance has been addressed by the use of battery-backed non-volatile RAM in the server's disk controller.

72 #NFS Operation
NFS service is provided by a number of daemons. nfsd is the NFS server daemon that handles client file system requests; a number of nfsds may run concurrently.
NFS is a request/reply protocol: the client issues a request to access a remote file; the kernel interprets the request and forwards it through the appropriate VFS routines to the client agent; the client agent prepares an RPC, assigns it a transaction ID, encodes it using XDR, and transmits the request to the server.

73 #NFS Operation (1)
NFS server-side operation: for each incoming request, IP and UDP protocol processing takes place; the request arguments and RPC header are decoded according to XDR; one of the nfsds is selected to execute the request; after completion, a reply is prepared and sent to the client; the NFS daemon then returns to its idle state.

74 #NFS Operation (2)
Other important daemons:
portmap converts RPC program numbers into protocol port numbers; it keeps a list of available RPC servers, their ports, and the program numbers they are serving.
mountd handles file system mount requests and determines which file systems are available to which machines and users.
biod are asynchronous block I/O daemons that run on the client and perform read-ahead and write-behind from the client buffer cache.

75 #NFS Operation (3)
Communication protocol daemons (running the TCP/UDP and IP protocol functions):
inetd listens for connections on Internet addresses for certain services and invokes the service-specific server daemon when a connection arrives.
routed manages the network routing tables.

76 #NFS Performance
Two buffer caches are used on the client side, to reduce the number of remote requests that go to the servers: one for data and a second for file attributes.
Block read-ahead is supported at both server and client sides: an I/O request is issued for the first block, and a second I/O request for the remaining data blocks is issued while the first is being processed.
Write-behind flushes critical information to stable storage; NFS uses synchronous writes to save modified data to stable storage.

77 CASE STUDY: The Andrew File System

78 *AFS: Andrew File System (1)
Andrew: a distributed computing environment developed at Carnegie Mellon University for use as a campus computing and information system.
Andrew File System: a file service designed to provide an information sharing mechanism to its users. The main goals were to build a scalable and secure distributed file system. It is based on the client/server model, and its initial goal was to support at least 7000 workstations on a campus-wide network. AFS was extended in the CODA project to develop a highly available distributed file system.

79 *AFS: Andrew File System (2)
AFS is implemented on a network of workstations and servers running BSD 4.3 UNIX or the Mach operating system, and is compatible with NFS. AFS is designed to perform well with large numbers of active users (scalability).

80 *Design characteristics for scalability
Whole-file serving: the entire contents of files are transmitted to client computers by AFS servers.
Whole-file caching: once a copy of a file has been transferred to a client computer, it is stored in a cache on the local disk. The cache is permanent, surviving reboots of the client computer.

81 *The operation scenario of AFS
A user process in a client computer issues an open system call for a file in the shared file space. The server holding the file is located and is sent a request for a copy of the file. The copy is stored in the local UNIX file system in the client computer and is then opened. Subsequent read, write and other operations on the file by processes in the client computer are applied to the local copy. When a process in the client issues a close system call, if the local copy has been updated, its contents are sent back to the server; the server updates the file contents and the timestamps on the file.

82 *Assumptions behind the design strategy
Locally cached copies are likely to remain valid for long periods (shared files are infrequently updated, and most access is by a single user).
The local cache can be allocated a substantial proportion of the disk space on each workstation.
Files are small; most are less than 10 kilobytes in size.
Read operations on files are much more common than writes, sequential access is common, and files are referenced in bursts.
Most files are read and written by only one user.
The provision of storage facilities for databases was explicitly excluded.

83 Distribution of processes in AFS
[Diagram: each workstation runs user programs and a Venus process above its UNIX kernel; each server runs a Vice process above its UNIX kernel, all connected by the network.]

84 *AFS Implementation (1)
Two software components:
Vice: the server software that runs as a user-level UNIX process in each server (the information-sharing backbone, consisting of a collection of dedicated file servers).
Venus: a user-level process that runs in each client computer (finds files in Vice, caches them locally, and performs shared file access).
Files seen by user processes:
Local files: handled as normal UNIX files.
Shared files: stored on servers, with copies cached on the local disks of workstations.

85 System call interception in AFS
[Diagram: on a workstation, UNIX file system calls issued by user programs are intercepted in the UNIX kernel; operations on non-local files are passed to Venus, which keeps cached copies on the local disk.]

86 *AFS Implementation (2)
The UNIX kernel in each workstation and server is a modified version of BSD 4.3 UNIX, altered to intercept open, close and some other file system calls. One of the file partitions on the local disk of each workstation is used as a cache, holding cached copies of files from the shared space.

87 File name space seen by clients of AFS
[Diagram: a local root containing /tmp, /bin, vmunix and other local files, with the shared space mounted under /cmu; local directories such as /bin are symbolic links into the shared space.]

88 *AFS file service features
Files are grouped into volumes for ease of location and movement. A flat file service is implemented by the Vice servers, and the hierarchic directory structure is implemented by the set of Venus processes. Each file and directory in the shared file space is identified by a unique 96-bit file identifier (fid). The Venus processes translate the pathnames issued by clients into fids; fids are used only for internal communication between AFS modules (Venus and Vice processes).

89 *AFS file identifier
Fid layout: volume number (32 bits, identifying the volume containing the file) + file handle (32 bits, identifying the file within the volume) + uniquifier (32 bits, ensuring that file identifiers are not reused).

90 *Cache coherence
Callback-based mechanism:
Callback: a remote procedure call from a server to a Venus process.
Callback promise (with two states, valid or cancelled): a token issued by the Vice server that is the custodian of the file, guaranteeing that it will notify the Venus process when any other client modifies the file.
The goal of the callback-based cache coherence mechanism is to achieve the best approximation to one-copy file semantics that is practicable without serious performance degradation.

91 Implementation of file system calls in AFS
open(FileName, mode):
UNIX kernel: if FileName refers to a file in the shared file space, pass the request to Venus.
Venus: check the list of files in the local cache; if the file is not present, or there is no valid callback promise for it, send a request for the file to the Vice server that is the custodian of the volume containing the file; place the copy of the file in the local file system, enter its local name in the local cache list, and return the local name to UNIX.
Vice: transfer a copy of the file and a callback promise to the workstation; log the callback promise.
UNIX kernel: open the local file and return the file descriptor to the application.
read(FileDescriptor, Buffer, length): perform a normal UNIX read operation on the local copy.
write(FileDescriptor, Buffer, length): perform a normal UNIX write operation on the local copy.
close(FileDescriptor):
UNIX kernel: close the local copy and notify Venus that the file has been closed.
Venus: if the local copy has been changed, send a copy to the Vice server that is the custodian of the file.
Vice: replace the file contents and send a callback to all other clients holding a callback promise on the file.
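
A compact C sketch of Venus's decision on open, following the steps in the table above; the data structures and function names are illustrative assumptions, not the AFS sources.

    enum promise { PROMISE_NONE, PROMISE_VALID, PROMISE_CANCELLED };

    struct cached_file {
        unsigned char fid[12];  /* the 96-bit AFS file identifier */
        enum promise promise;   /* state of the callback promise */
        char local_name[64];    /* where the whole-file copy lives on disk */
    };

    /* Stub for the fetch RPC to the custodian Vice server: transfers the
     * whole file and records the fresh callback promise it returns. */
    static void fetch_from_vice(struct cached_file *cf)
    {
        cf->promise = PROMISE_VALID;
    }

    /* Returns the local name handed back to the kernel's open(); a hit
     * with a valid promise needs no network traffic at all. */
    static const char *venus_open(struct cached_file *cf)
    {
        if (cf->promise != PROMISE_VALID)
            fetch_from_vice(cf);   /* cache miss, or promise cancelled */
        return cf->local_name;
    }

The close path mirrors this: if the local copy is dirty, Venus stores it back to the custodian, which then breaks the callback promises held by every other client.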

92 The Vice service interface (1)
Fetch(fid) → attr, data: returns the attributes (status) and, optionally, the contents of the file identified by fid, and records a callback promise on it.
Store(fid, attr, data): updates the attributes and (optionally) the contents of the specified file.
Create() → fid: creates a new file and records a callback promise on it.
Remove(fid): deletes the specified file.

93 The Vice service interface (2)
RemoveCallBack(fid): informs the server that a Venus process has flushed a file from its cache.
BreakCallBack(fid): made by a Vice server to a Venus process; cancels the callback promise on the relevant file.
SetLock(fid, mode): sets a lock on the specified file or directory; the mode of the lock may be shared or exclusive. Locks that are not removed expire after 30 minutes.
ReleaseLock(fid): unlocks the specified file or directory.

94 *Update Semantics (1)
Update semantics (a client C, a file F, a server S):
after a successful open: latest(F, S)
after a failed open: failure(S)
after a successful close: updated(F, S)
after a failed close: failure(S)
latest(F, S): a guarantee that the current value of F at C is the same as the value at S.
failure(S): the open or close operation has not been performed at S.
updated(F, S): C's value of F has been successfully propagated to S.

95 *Update Semantics (2)
The currency guarantee for open:
after a successful open: latest(F, S, 0) or (lostCallback(S, T) and inCache(F) and latest(F, S, T))
latest(F, S, T): the copy of F seen by the client is no more than T seconds out of date.
lostCallback(S, T): a callback message from S to C has been lost at some time during the last T seconds.
inCache(F): the file F was in the cache at C before the open operation was attempted.

96 *Update Semantics (3)
If clients on different workstations open, write and close the same file concurrently, all but the update resulting from the last close will be silently lost; clients must implement concurrency control independently. When two client processes on the same workstation open a file, they share the same cached copy, and updates are performed in the normal UNIX fashion, block by block.

97 #AFS Scalability (1)
The scalability of the system is achieved by reducing static binding to a minimum and by maximizing the number of active clients that can be supported by one server. The AFS cache manager intercepts requests for remotely stored files and either obtains the requested data from the cache or requests the appropriate chunk from the appropriate file server. All machines using AFS refer to any file using a common name; in AFS 3.0, one can use a pathname such as /afs/athena.mit.edu/user/a/xyz. In AFS 4.0, both DIGITAL's DNS and X.500 are used to navigate through the top-most directories of the name space.

98 #AFS Scalability (2)
The key strategies for achieving scalability:
Whole-file serving: the entire contents of files are transmitted to client computers by AFS servers.
Whole-file caching: once a copy of a file has been transferred to a client computer, it is stored in a cache on the local disk. The cache contains several hundred of the files most recently used on that computer, and local copies are used to satisfy clients' open requests in preference to remote copies whenever possible.

99 #AFS Security (1)
Security in AFS depends on the integrity of a small number of Vice servers: no user software ever runs on Vice servers, and Andrew assumes that the hardware and software on workstations may be modified in arbitrary ways.
Protection domain: composed of users and groups. A user can authenticate itself to the system, be held responsible for its actions, and be charged for resource consumption. A group is a set of other groups and users, associated with a user called its owner.

100 #AFS Security (2)
In AFS-2, cache coherence was achieved with callbacks: the server promises to notify workstations caching a file before allowing it to be modified. Callbacks made it feasible for clients to cache directories and to translate path names locally. AFS-2 used a single process to service all clients; non-preemptive lightweight processes supported concurrency and a convenient programming abstraction at clients and servers. Volumes, collections of files, are used in disk storage allocation, and read-only replication of volumes increases availability for frequently read but rarely updated files, such as system programs.

101 *Other aspects (1)
UNIX kernel modifications: the UNIX kernel in AFS hosts is altered so that Vice can perform file operations in terms of file handles instead of the conventional UNIX file descriptors.
Location database: each server contains a copy of a fully replicated location database giving a mapping of volume names to servers.
Threads: the implementations of Vice and Venus use a non-preemptive threads package to enable requests to be processed concurrently.

102 *Other aspects (2)
Read-only replicas: volumes containing files that are frequently read but rarely modified can be replicated as read-only volumes at several servers.
Bulk transfers: the use of a large packet size (64 kilobyte chunks) is an important aid to performance, minimizing the effect of network latency.
Performance: whole-file caching leads to dramatically reduced loads on the servers (a server load of 40% was measured, against a load of 100% for NFS running the same benchmark).

103 #Main benefits of AFS
Data sharing is simplified: since the file system is location transparent, a workstation can access any file in AFS by name alone.
User mobility is supported: a user can access any shared file from any workstation in the system.
System administration is easier.
Better security is possible: the Vice servers are secure and run trusted system software.
Client autonomy is improved: workstations can be moved or turned off without affecting users at other workstations.

104 #AFS-2 versus Sun NFS performance

105 CASE STUDY: The Coda File System

106 The limitations of AFS
The limited form of replication (restricted to read-only volumes).
Fault tolerance of the service.
The mobile use of portable computers.

107 Coda File System
Developed in a research project undertaken by Satyanarayanan and his co-workers at Carnegie Mellon University; a descendant of AFS (the Coda design requirements were derived from experience with AFS). Coda was developed as a solution to the drawbacks of AFS and to meet the need for disconnected operation of portable workstations. A principle of the design of Coda: the copies of files residing on servers are more reliable than those residing in the caches of workstations.

108 #Coda File System features (1)
Disconnected operation for mobile clients: reintegration of data from disconnected clients; bandwidth adaptation.
Failure resilience: read/write replication across servers; resolution of server/server conflicts; handling of network failures that partition the servers; handling of client disconnection.

109 #Coda File System features (2)
Performance and scalability: client-side persistent caching of files, directories and attributes for high performance; write-back caching.
Security: Kerberos-like authentication; access control lists (ACLs).
Well-defined semantics of sharing.
Freely available source code.

110 *Coda File System components
VSG (Volume Storage Group): the set of servers holding replicas of a file volume.
AVSG (Available Volume Storage Group): the subset of the VSG that a client can currently access; the membership of the AVSG varies as servers become accessible or are made inaccessible by network or server failures.

111 #Client / Venus / Vice

112 *Optimistic replication strategy (1)
Allows modification of files to proceed while the network is partitioned or during disconnected operation. It relies on attaching to each version of a file a Coda version vector (CVV) and a timestamp.
CVV: a vector of integers with one element for each server in the relevant VSG; each element of the CVV is an estimate (a count) of the number of modifications performed on that server's version of the file.
The purpose of CVVs is to provide sufficient information about the update history of each file version to enable inconsistencies to be detected and corrected automatically.

113 *Optimistic replication strategy (2)
When a modified file is closed: each site in the current AVSG is sent an update message by the Venus process at the client, containing the current CVV and the new contents for the file; the Vice process at each site checks the CVV; the Venus process then computes a new CVV with increased modification counts and distributes it to the members of the AVSG. The message is sent only to the members of the AVSG.
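
CVVs are compared element-wise to decide whether one version subsumes another. A C sketch of that comparison, a standard version-vector test; the fixed VSG size and the names are illustrative assumptions.

    #include <stdbool.h>

    #define VSG_SIZE 3

    enum cvv_order { CVV_EQUAL, CVV_DOMINATES, CVV_DOMINATED, CVV_CONFLICT };

    /* One CVV dominates another if every element is >=; if neither
     * dominates, the replicas were updated in different partitions and
     * are in conflict, requiring automatic or manual repair. */
    static enum cvv_order cvv_compare(const int a[VSG_SIZE],
                                      const int b[VSG_SIZE])
    {
        bool a_ge = true, b_ge = true;
        for (int i = 0; i < VSG_SIZE; i++) {
            if (a[i] < b[i]) a_ge = false;
            if (b[i] < a[i]) b_ge = false;
        }
        if (a_ge && b_ge) return CVV_EQUAL;
        if (a_ge) return CVV_DOMINATES;  /* a has seen every update b has */
        if (b_ge) return CVV_DOMINATED;
        return CVV_CONFLICT;
    }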

114 *Optimistic replication strategy (3)
Advantages deriving from replication: the files in a replicated volume remain accessible to any client that can access at least one of the replicas, and the performance of the system can be improved by sharing some of the load.
Coda enhances availability both by replicating files across servers and by the ability of clients to operate entirely out of their caches.

115 *Update semantics (1)
Currency guarantees for open/close (a client C, a file F, the AVSG s):
after a successful open: (s ≠ ∅ and (latest(F, s, 0) or (latest(F, s, T) and lostCallback(s, T) and inCache(F)))) or (s = ∅ and inCache(F))
after a failed open: (s ≠ ∅ and conflict(F, s)) or (s = ∅ and ¬inCache(F))
after a successful close: (s ≠ ∅ and updated(F, s)) or (s = ∅)

116 *Update semantics (2)
Currency guarantees for open/close (cont.):
after a failed close: s ≠ ∅ and conflict(F, s)
latest(F, s, T): the current value of F at C was the latest across all the servers in s at some instant in the last T seconds.
lostCallback(s, T): a callback was sent by some member of s in the last T seconds and was not received at C.
conflict(F, s): the values of F at some servers in s are currently in conflict.

117 *Accessing replicas
The strategy used on open and close to access the replicas of a file is a variant of the read-one, write-all approach.
Open: if a copy of the file is not present in the local cache, the client identifies a preferred server from the AVSG, requests a copy of the file from it, and on receiving it checks with all the other members of the AVSG to verify that the copy is the latest available version.
Close: when a file is closed at a client after modification, its contents and attributes are transmitted in parallel to all the members of the AVSG using a multicast remote procedure calling protocol.

118 *Cache coherence (1)
Events that Venus must detect:
Enlargement of an AVSG (a previously inaccessible server becomes accessible).
Shrinking of an AVSG (a server becomes inaccessible).
A lost callback event.
To achieve this, Venus sends a probe message every T seconds to all the servers in the VSGs of the files it holds in its cache.

119 *Cache coherence (2)
The problem of missed updates: a server may miss an update because it is not in the AVSG of a different client that performs the update. To detect this, Venus is sent a volume version vector (volume CVV, a summary of the CVVs) in response to each probe message, and detects any mismatch between the volume CVVs.

120 *Disconnected operation
During brief disconnections, the least-recently-used cache replacement policy normally adopted by Venus is usually enough to avoid cache misses on the disconnected volumes. Coda also allows users to specify a prioritized list of files and directories that Venus should strive to retain in the cache. When disconnected operation ends, a process of reintegration begins. Conflicts may be detected during reintegration; in that case the cached copy is stored in a temporary location (a covolume) on the server, and the user who initiated the reintegration is informed.

121 #Failure resilience methods

122 Performance
The performance of Coda was compared with AFS under benchmark loads designed to simulate user populations ranging from 5 to 50 typical AFS users.
With no replication: no significant difference.
With three-fold replication: the time for Coda to perform a benchmark load equivalent to 5 users exceeds that of AFS without replication by only 5%.
With three-fold replication and a load equivalent to 50 users: the time to complete the benchmark is increased by 70%, whereas that for AFS without replication is increased by only 16%.

