Distributed file system

Presentation transcript:

Distributed file system based on Distributed Systems: Concepts and Design, Edition 5 Ali Fanian Isfahan University of Technology www.Fanian.iut.ac.ir

Distributed file system. Outline: Introduction, File service architecture, Sun Network File System, Recent advances.

Introduction part1

Definition: a distributed file system enables programs to store and access remote files exactly as they do local ones, allowing users to access files from any computer on a network. The requirements for sharing within local networks and intranets lead to a need for a different type of service. Support: persistent storage of data and programs of all types on behalf of clients.

The concentration of persistent storage at a few servers has two results. It reduces the need for local disk storage. More importantly, it enables economies to be made in the management and archiving of the persistent data owned by an organization. Other services, such as the name service, the user authentication service and the print service, can be more easily implemented when they can call upon the file service to meet their needs for persistent storage.

Characteristics of file systems
Data: a file consists of a sequence of data items (typically 8-bit bytes), accessible by operations to read and write any portion of the sequence.
Attributes: held as a single record containing information such as the length of the file, timestamps, file type, owner's identity and access control lists.

Characteristics of file systems
Figure 12.3 File attribute record structure
Updated by system: file length, creation timestamp, read timestamp, write timestamp, attribute timestamp, reference count.
Updated by owner: owner, file type, access control list (e.g. for UNIX: rw-rw-r--).

Distributed file system requirements: transparency, concurrency, replication, heterogeneity, fault tolerance, consistency, security, efficiency.
Transparency: the design must balance the flexibility gained from transparency against complexity and performance.
Access transparency: client programs are unaware of the distribution of files. Programs written to operate on local files are able to access remote files without modification.
Location transparency: client programs should see a uniform file name space. Files may be relocated without changing their pathnames.
Mobility transparency: automatic relocation of files is possible (neither client programs nor system admin tables in client nodes need to be changed when files are moved).
Performance transparency: satisfactory performance across a specified range of system loads.
Scaling transparency: the service can be expanded to meet additional loads or growth.
Concurrency: changes to a file by one client should not interfere with the operation of other clients simultaneously accessing or changing the same file. Most current file services provide file-level or record-level locking.
Replication: the file service can maintain copies of a file at different locations. This enables multiple servers to share the load of providing a service to clients accessing the same set of files, enhancing the scalability of the service. It also improves fault tolerance by enabling clients to locate another server that holds a copy of the file. Caching (of all or part of a file) locally is a limited form of replication.
Heterogeneity: the service can be accessed by clients running on (almost) any OS or hardware platform. Service interfaces must be open and precisely specified.
Fault tolerance: the service must continue to operate even when clients make errors or crash. Servers can be stateless, so that they can be restarted and the service restored after a failure without any need to recover previous state. If the service is replicated, it can continue to operate even during a server crash.
Consistency: UNIX offers one-copy update semantics for operations on local files. It is difficult to achieve the same for distributed file systems when files are replicated or cached at different sites, because modifications take time to propagate.
Security: the service must maintain access control as for local files, based on the identity of the user making the request. Identities of remote users must be authenticated. Request and reply messages may be protected with digital signatures and (optionally) encryption. Service interfaces are open to all processes not excluded by a firewall.
Efficiency: the goal for distributed file systems is usually performance comparable to a local file system.
The techniques used for the implementation of file services are an important part of the design of distributed systems.

File service architecture part2

An architecture that offers a clear separation of concerns in providing access to files is obtained by structuring the file service as three components: a flat file service, a directory service and a client module. The flat file service and the directory service each export an interface for use by client programs, and their RPC interfaces together provide a set of operations for access to files. The client module performs operations for clients on directories and on files.

Figure 12.5 File service architecture. Client computer: application programs and the client module. Server computer: the directory service (Lookup, AddName, UnName, GetNames) and the flat file service (Read, Write, Create, Delete, GetAttributes, SetAttributes).

Responsibilities of various modules
Flat file service: concerned with the implementation of operations on the contents of files. Unique File Identifiers (UFIDs) are used to refer to files in all requests for flat file service operations. UFIDs are long sequences of bits chosen so that each file has a UFID that is unique among all of the files in a distributed system.
Directory service: provides a mapping between text names for files and their UFIDs. Clients may obtain the UFID of a file by quoting its text name to the directory service.

Responsibilities of various modules (continued)
Directory service: also supports the functions needed to add new files to directories.
Client module: runs on each computer and presents the flat file and directory services to application programs as a single API. It holds information about the network locations of the flat file and directory server processes. It can achieve better performance by implementing a cache of recently used file blocks at the client.

Server operations/interfaces for the model file service
Flat file service:
Read(FileId, i, n) -> Data (i is the position of the first byte)
Write(FileId, i, Data) (i is the position of the first byte)
Create() -> FileId
Delete(FileId)
GetAttributes(FileId) -> Attr
SetAttributes(FileId, Attr)
Directory service:
Lookup(Dir, Name) -> FileId
AddName(Dir, Name, FileId)
UnName(Dir, Name)
GetNames(Dir, Pattern) -> NameSeq
FileId: must contain a valid UFID, and the user must have sufficient access rights for the operation.
How the flat file service differs from UNIX:
Repeatable operations: with the exception of Create, clients may repeat calls to which they receive no reply. Repeated execution of Create produces a different new file for each call.
Stateless servers: the interface is suitable for implementation by stateless servers, since it has no open() or close() operations.
Pathname lookup: pathnames such as '/usr/bin/tar' are resolved by iterative calls to Lookup(), one call for each component of the path, starting with the ID of the root directory '/', which is known in every client.
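The interfaces above can be sketched in a few lines of Python. This is only an in-memory illustration: the class names, the counter used to generate UFIDs, the make_dir helper and the choice of 0 as the position of the first byte are assumptions made for the example, not part of the model. It shows why Write is repeatable and how a pathname is resolved by one Lookup call per component.

```python
import itertools


class FlatFileService:
    """In-memory stand-in for the model flat file service (illustrative only)."""

    def __init__(self):
        self._files = {}                    # UFID -> bytearray holding file contents
        self._next_id = itertools.count(1)  # hypothetical UFID generator

    def create(self):
        # Not repeatable: every call produces a different new file.
        ufid = next(self._next_id)
        self._files[ufid] = bytearray()
        return ufid

    def read(self, ufid, i, n):
        # Return up to n bytes starting at position i (0-based in this sketch).
        return bytes(self._files[ufid][i:i + n])

    def write(self, ufid, i, data):
        # Repeatable: the position is explicit, so a retry gives the same result.
        buf = self._files[ufid]
        if len(buf) < i:
            buf.extend(b"\0" * (i - len(buf)))  # pad any gap with zero bytes
        buf[i:i + len(data)] = data


class DirectoryService:
    """Maps text names to UFIDs, one table per directory."""

    def __init__(self):
        self._dirs = {}  # directory UFID -> {name: UFID}

    def make_dir(self, ufid):
        # Hypothetical helper; the model interface has no such operation.
        self._dirs[ufid] = {}

    def add_name(self, dir_id, name, ufid):
        self._dirs[dir_id][name] = ufid

    def lookup(self, dir_id, name):
        return self._dirs[dir_id][name]


def resolve(directory, root_id, pathname):
    """Resolve a pathname with one lookup() call per path component."""
    current = root_id
    for component in pathname.strip("/").split("/"):
        if component:
            current = directory.lookup(current, component)
    return current
```

Because neither service keeps per-client state such as an open-file table or a read-write pointer, a client can simply resend a Read or Write whose reply was lost.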

DFS: Case Studies
NFS (Network File System): developed by Sun Microsystems (in 1985). NFS was the first file service that was designed as a product. Its design is operating system independent.
AFS (Andrew File System): developed by Carnegie Mellon University as part of the Andrew distributed computing environment (in 1986). Its intention is to support information sharing on a large scale by minimizing client-server communication. A public domain implementation is available on Linux (Linux AFS).

Sun Network File System (NFS) part3

Sun NFS
The NFS client and server modules communicate using remote procedure calls. NFS closely follows the abstract file service model defined above. We shall describe the UNIX implementation of the NFS protocol (version 3).
Supports many of the design requirements already mentioned: transparency, heterogeneity, efficiency, fault tolerance.
Limited achievement of: concurrency, replication, consistency, security.

Figure 12.8 NFS architecture. On the client computer, application programs make UNIX system calls to the UNIX kernel, where the virtual file system passes operations on local files to the UNIX file system (or another file system) and operations on remote files to the NFS client. The NFS client talks to the NFS server on the server computer using the NFS protocol (remote operations). On the server, the NFS server reaches the UNIX file system through the virtual file system. The NFS server module resides in the kernel on each computer that acts as an NFS server.

Virtual file system
The integration is achieved by a VFS module, which has been added to the UNIX kernel to distinguish between local and remote files. It passes each request to the appropriate local system module (the UNIX file system, the NFS client module or the service module for another file system). It also translates between the file identifiers used by NFS and the internal file identifiers normally used in UNIX and other file systems.

File handle
File handles are the file identifiers used in NFS. A file handle is opaque to clients and contains whatever information the server needs to distinguish an individual file. It has three fields:
Filesystem identifier: a unique number that is allocated to each file system when it is created.
i-node number: a number that serves to identify and locate the file within the file system in which the file is stored. i-node numbers are reused after a file is removed.
i-node generation number: incremented each time the i-node number is reused.
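The role of the generation number is easiest to see in a small sketch. The Python below is a toy model, not NFS code: ToyServer, StaleFileHandle and the method names are hypothetical. It assumes the server keeps a generation counter per i-node number and rejects any handle whose generation does not match the current one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHandle:
    filesystem_id: int     # allocated when the file system is created
    inode_number: int      # locates the file within that file system
    inode_generation: int  # incremented each time the i-node number is reused


class StaleFileHandle(Exception):
    """Raised when a handle refers to a file that no longer exists."""


class ToyServer:
    def __init__(self, filesystem_id):
        self.filesystem_id = filesystem_id
        self._generation = {}  # i-node number -> current generation
        self._in_use = set()   # i-node numbers that currently hold a file

    def allocate(self, inode_number):
        # Each (re)use of an i-node number gets a new generation number.
        generation = self._generation.get(inode_number, 0) + 1
        self._generation[inode_number] = generation
        self._in_use.add(inode_number)
        return FileHandle(self.filesystem_id, inode_number, generation)

    def check(self, fh):
        # A handle is only accepted if every field still matches.
        if (fh.filesystem_id != self.filesystem_id
                or fh.inode_number not in self._in_use
                or fh.inode_generation != self._generation.get(fh.inode_number)):
            raise StaleFileHandle(fh)

    def remove(self, fh):
        self.check(fh)
        self._in_use.discard(fh.inode_number)
```

Without the generation field, a client holding a handle to a removed file could end up reading or writing an unrelated file that later received the same i-node number.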

NFS server operations:
read(fh, offset, count) -> attr, data
write(fh, offset, count, data) -> attr
create(dirfh, name, attr) -> newfh, attr
remove(dirfh, name) -> status
getattr(fh) -> attr
setattr(fh, attr) -> attr
lookup(dirfh, name) -> fh, attr
rename(dirfh, name, todirfh, toname)
mkdir(dirfh, name, attr) -> newfh, attr
rmdir(dirfh, name) -> status
statfs(fh) -> fsstats
For comparison, the model flat file service:
Read(FileId, i, n) -> Data
Write(FileId, i, Data)
Create() -> FileId
Delete(FileId)
GetAttributes(FileId) -> Attr
SetAttributes(FileId, Attr)
And the model directory service:
Lookup(Dir, Name) -> FileId
AddName(Dir, Name, FileId)
UnName(Dir, Name)
GetNames(Dir, Pattern) -> NameSeq

NFS access control and authentication
The server is stateless, so the user's identity and authentication information (user ID and group ID) must be checked by the server on each request. In the local file system they are checked only on open(). A client can modify the RPC calls to include the user ID of any user (impersonating that user), unless the user ID and group ID are protected by encryption. Kerberos has been integrated with NFS to provide a stronger security solution.

Mount service
The mounting of subtrees of remote filesystems by clients is supported by a separate mount service that runs at each NFS server. Clients request mounting with the operation: mount(remotehost, remotedirectory, localdirectory). Each client maintains a table of mounted file systems in the NFS client and VFS layer, holding <IP address, port number, file handle> for each.
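A client-side mount table of this kind might look like the following Python sketch. The names RemoteMount and MountTable are hypothetical, the entry is assumed to be already resolved to the <IP address, port number, file handle> triple, and longest-prefix matching on the local directory is an assumption of this example rather than something stated in the slides.

```python
from typing import NamedTuple


class RemoteMount(NamedTuple):
    ip_address: str
    port: int
    file_handle: bytes  # handle of the remote directory that was mounted


class MountTable:
    def __init__(self):
        self._mounts = {}  # local directory -> RemoteMount

    def mount(self, local_directory, entry):
        # Normalise "/usr/students/" to "/usr/students"; keep "/" as it is.
        self._mounts[local_directory.rstrip("/") or "/"] = entry

    def find(self, pathname):
        """Return (entry, path below the mount point), or None for a local file."""
        best = None
        for mount_point in self._mounts:
            prefix = mount_point.rstrip("/") + "/"
            if pathname == mount_point or pathname.startswith(prefix):
                if best is None or len(mount_point) > len(best):
                    best = mount_point  # prefer the most specific mount point
        if best is None:
            return None
        return self._mounts[best], pathname[len(best):].lstrip("/")


table = MountTable()
# Example values only: a documentation IP address and a made-up file handle.
table.mount("/usr/students", RemoteMount("192.0.2.1", 2049, b"fh-students"))
```

With this table, a path under /usr/students is sent to the remote server with the remaining components still to be looked up, while any other path stays with the local file system.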

Figure 12.10 Local and remote filesystems accessible on an NFS client. Programs running at the client can access files at Server 1 and Server 2 by using pathnames such as /usr/students/jon and /usr/staff/ann.

Hard versus soft mounts
Remote filesystems may be hard-mounted or soft-mounted in a client computer.
Hard mount: when a user-level process accesses a file in a hard-mounted filesystem, the process is suspended until the request can be completed. If the remote host is unavailable for any reason, the NFS client module continues to retry the request until it is satisfied.
Soft mount: the NFS client module returns a failure indication to user-level processes after a small number of retries. Programs will then detect the failure and take appropriate recovery or reporting actions.
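The difference between the two mount types comes down to the retry loop in the client. The Python below is a simplified sketch: ServerUnavailable, call_with_mount_semantics and the max_retries default are hypothetical names and values, and the waiting between retries that a real client would do is left out.

```python
class ServerUnavailable(Exception):
    """Stands in for a request that timed out with no reply from the server."""


def call_with_mount_semantics(request, hard, max_retries=3):
    """Issue request() with hard-mount or soft-mount retry behaviour."""
    failures = 0
    while True:
        try:
            return request()
        except ServerUnavailable:
            failures += 1
            # Soft mount: give up after a small number of retries and let the
            # failure reach the user-level process.
            if not hard and failures > max_retries:
                raise
            # Hard mount: keep retrying until the request is satisfied.


def make_flaky(failures_before_success):
    """Build a fake request that fails a fixed number of times, then succeeds."""
    state = {"left": failures_before_success, "calls": 0}

    def request():
        state["calls"] += 1
        if state["left"] > 0:
            state["left"] -= 1
            raise ServerUnavailable()
        return "data"

    return request, state
```

A hard mount keeps existing programs correct at the cost of hanging while the server is down; a soft mount stays responsive, but only helps programs that actually check for the failure.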

Automounter
The automounter mounts a remote directory dynamically whenever an 'empty' mount point is referenced by a client. It has a table of mount points with one or more servers listed for each. It sends a probe message to each candidate server and then uses the mount service to mount the filesystem at the first server to respond. This provides a simple form of replication for read-only filesystems. E.g. if there are several servers with identical copies of /usr/lib, then each server will have a chance of being mounted at some clients.
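The selection step can be sketched as follows in Python. This is a simulation under stated assumptions: the probe function is hypothetical and returns a response time in seconds (or None when a server does not answer), whereas a real automounter sends network probes and takes whichever reply arrives first.

```python
def choose_server(candidates, probe):
    """Return the candidate that answers the probe first, or None if none do."""
    replies = [(probe(server), server) for server in candidates]
    replies = [(delay, server) for delay, server in replies if delay is not None]
    if not replies:
        return None
    return min(replies, key=lambda reply: reply[0])[1]


class Automounter:
    def __init__(self, table, probe):
        self.table = table  # mount point -> list of candidate servers
        self.probe = probe
        self.mounted = {}   # mount point -> server chosen at first reference

    def reference(self, mount_point):
        # Mount lazily: only when an 'empty' mount point is first referenced.
        if mount_point not in self.mounted:
            server = choose_server(self.table[mount_point], self.probe)
            if server is None:
                raise LookupError("no server responded for " + mount_point)
            self.mounted[mount_point] = server
        return self.mounted[mount_point]


# Simulated response times; server3 is down and never replies.
delays = {"server1": 0.030, "server2": 0.005, "server3": None}
automounter = Automounter(
    {"/usr/lib": ["server1", "server2", "server3"]}, delays.get
)
```

Since different clients see different response times, identical read-only copies end up mounted from different servers, which spreads the load.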

Securing NFS with Kerberos
The Kerberos protocol is too costly to apply on each file access request. Instead, Kerberos is used in the mount service to authenticate the user's identity. The user's UserID and GroupID are stored at the server with the client's IP address.
For each file request: the UserID and GroupID sent must match those stored at the server, and the IP addresses must also match.
This approach has some problems: it cannot accommodate multiple users sharing the same client computer, and all remote filestores must be mounted each time a user logs in.

NFS optimization - server caching
Pages (blocks) from disk are held in a main memory buffer cache until the space is required for newer pages. Read-ahead and delayed-write optimizations are applied. To guard against loss of data in a system crash, the UNIX sync operation flushes altered pages to disk every 30 seconds.

Delayed write works well in a local context, but in the remote case extra measures are needed to ensure that clients can be confident that the results of write operations are persistent, even when server crashes occur. NFS v3 servers offer two strategies for updating the disk:
Write-through: altered pages are written to disk as soon as they are received at the server. When a reply is sent, the NFS client knows that the page is on the disk.
Delayed commit: pages are held only in the cache until a commit() call is received for the relevant file. A commit() is issued by the client whenever a file that was open for writing is closed.
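The two strategies can be contrasted with a toy server in Python. The class ToyNfsServer and its crash() method are hypothetical, and the 'disk' and 'cache' are plain dictionaries standing in for persistent storage and volatile memory.

```python
class ToyNfsServer:
    def __init__(self, write_through):
        self.write_through = write_through
        self.cache = {}  # (file, page) -> data held in volatile memory
        self.disk = {}   # (file, page) -> data on persistent storage

    def write(self, file, page, data):
        self.cache[(file, page)] = data
        if self.write_through:
            # Write-through: the page is on disk before the reply is sent.
            self.disk[(file, page)] = data

    def commit(self, file):
        # Delayed commit: flush every cached page of this file to disk.
        for (cached_file, page), data in self.cache.items():
            if cached_file == file:
                self.disk[(cached_file, page)] = data

    def crash(self):
        # A crash loses everything that was only in memory.
        self.cache.clear()
```

With delayed commit the client must treat written data as possibly lost until commit() has returned, which is why the commit is tied to closing the file.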

NFS optimization - client caching
Server caching does nothing to reduce RPC traffic between client and server. The NFS client module therefore caches the results of read, write, getattr, lookup and readdir operations. Synchronization of file contents (one-copy semantics) is not guaranteed when two or more clients are sharing the same file. Instead, clients are responsible for polling the server to check the currency of the cached data that they hold.

A timestamp-based method is used to validate cached blocks before use. A cache entry is valid at time $T$ if this statement is true:
$(T - T_c < t) \lor (Tm_{client} = Tm_{server})$
where
$t$: freshness interval
$T_c$: time when the cache entry was last validated
$Tm$: time when the block was last updated at the server ($Tm_{client}$ is the value recorded in the client's cache, $Tm_{server}$ the value held at the server)
$T$: current time
$t$ is configurable (per file) but is typically set to 3 seconds for files and 30 seconds for directories. There is one value of $Tm_{server}$ for all the data blocks in a file and another for the file attributes. If the first part of the condition is false, the current value of $Tm_{server}$ is obtained (by a getattr call to the server).
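The validity condition translates almost directly into code. In the Python sketch below, CacheEntry, is_valid and get_tm_server are hypothetical names, with get_tm_server standing in for a getattr call to the server. Resetting the validation time after a successful comparison is an assumption that follows from the definition of the last-validated time.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    tc: float         # time when the cache entry was last validated
    tm_client: float  # last-modification time recorded with the cached copy


def is_valid(entry, now, freshness, get_tm_server):
    """Evaluate (T - Tc < t) or (Tm_client == Tm_server) for one cache entry."""
    if now - entry.tc < freshness:
        return True  # still fresh: no call to the server is needed
    tm_server = get_tm_server()  # costs one getattr round trip
    if tm_server == entry.tm_client:
        entry.tc = now  # the entry has just been revalidated
        return True
    return False  # the file changed at the server: refetch the data
```

The first test is what saves traffic: within the freshness interval the client never contacts the server, at the price of possibly using data that is up to that many seconds out of date.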

Several measures are used to reduce the traffic of getattr calls to the server:
Whenever a new value of Tmserver is received at a client, it is applied to all cache entries derived from the relevant file.
The current attribute values are sent with the results of every operation on a file, and if the value of Tmserver has changed the client uses it to update the cache entries relating to the file.
Setting the freshness interval t adaptively reduces the traffic considerably for most files.

NFS performance
Early measurements (1987) established that:
write() operations are responsible for only 5% of server calls in typical UNIX environments, hence write-through at the server is acceptable.
lookup() accounts for 50% of operations, due to the step-by-step pathname resolution necessitated by the naming and mounting semantics.
Single-CPU implementations based on PC hardware achieve throughputs in excess of 12,000 server ops/sec. Large multi-processor configurations with many disks have achieved throughputs of up to 300,000 server ops/sec.

NFS summary
An excellent example of a simple, high-performance distributed service. Achievement of transparencies:
Access: Excellent; the API is the UNIX system call interface for both local and remote files. No modifications to existing programs are required to enable them to operate correctly with remote files.
Location: Not guaranteed but normally achieved; naming of filesystems is controlled by client mount operations, so a filesystem may have different pathnames on different clients, but transparency can be ensured by an appropriate system configuration.

Mobility: Hardly achieved; filesystems may be moved between servers, but the remote mount tables in each client must then be updated separately to enable the clients to access the filesystems in their new locations.
Replication: Limited to read-only file systems; for writable files, the Sun Network Information Service (NIS) runs separately over NFS and is used to replicate essential system files on several servers.
Scaling: Good; NFS servers can be built to handle very large real-world loads in an efficient manner. The performance of a single server can be increased by the addition of processors and disks. When the limits of that process are reached, additional servers must be installed and the filesystems reallocated between them, which in turn calls for replication support.

Concurrency: Limited; when read-write files are shared concurrently between clients, consistency is not perfect.
Fault tolerance: Limited but effective; service is suspended if a server fails, but once it has been restarted user-level client processes proceed from the point at which the service was interrupted, unaware of the failure (except in the case of soft-mounted filesystems).
Security: The integration of Kerberos with NFS was a major step forward. Recent developments include the option to use a secure RPC implementation to authenticate the data transmitted with read and write operations.
Efficiency: Good; the measured performance of several implementations shows that NFS protocols can be implemented for use in situations that generate very heavy loads.

Summary
Distributed file systems provide the illusion of a local file system and hide complexity from end users. Sun NFS is an excellent example of a distributed service designed to meet many important design requirements. Effective client caching can produce file service performance equal to or better than that of local file systems.
Future requirements: support for mobile users, full replication, and support for data streaming and video file servers.