1 IM NTU Distributed Information Systems 2004 Distributed File Systems -- 1 Distributed File Systems Yih-Kuen Tsay Dept. of Information Management National Taiwan University

2 Purposes of a Distributed File System
- Sharing of storage and information across a network
- Convenience (and efficiency) of a conventional file system
- Persistent storage that most other services (e.g., Web servers) need

3 Properties of Storage Systems
Other properties include availability, timing guarantees, etc.
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

4 Files
- Files are an abstraction of permanent storage.
- A file is typically defined as a sequence of similar-sized data items along with a set of attributes.
- A directory is a file that provides a mapping from text names to internal file identifiers.

5 File Attributes
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

6 File Systems
- Responsible for the (a) organization, (b) storage, (c) retrieval, (d) naming, (e) sharing, and (f) protection of files.
- Provide a set of programming operations that characterize the file abstraction, particularly operations to read and write subsequences of data items beginning at any point of a file.

7 File System Modules
A basic distributed file system implements all of the above plus modules for client-server communication and distributed naming and location of files.
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

8 UNIX File Operations
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

9 Distributed File System Requirements
- Transparency: access, location, mobility, performance, and scaling transparency
- Concurrency (and Consistency)
- Replication/Caching (and Consistency)
- Hardware/operating system heterogeneity
- Fault-Tolerance
- Security (Access Control, Authentication)
- Efficiency

10 A File Service Architecture
Note: The modules communicate with one another by remote procedure calls.
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

11 File Service Components
- Flat file service: implementing operations on the contents of files, which are referred to by unique file identifiers (UFIDs)
- Directory service: mapping text names of files (including directories) to their UFIDs
- Client module: integrating and extending the previous two services under a single application programming interface
* Why is this structure more open and configurable?

12 Flat File Service Operations
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

13 Difference from UNIX
- Immediate access to files using UFIDs (without open or close)
- Read or write starts at the position indicated by a parameter
- All operations, except create, are repeatable (idempotent)
- Allows a stateless implementation
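The points above can be made concrete with a minimal in-memory sketch. This is an illustration under stated assumptions, not the interface from the textbook figure: the class and method names (`FlatFileService`, `DirectoryService`, `add_name`, `lookup`) are hypothetical, UFIDs are plain integers, and the directory is kept in a dictionary rather than in a file of its own. What it shows is that `read` and `write` take the position as a parameter, so the server keeps no per-client state and repeating a request is harmless, whereas `create` is not repeatable.

```python
import itertools


class FlatFileService:
    """Toy flat file service: files are addressed only by UFID (an int here)."""

    def __init__(self):
        self._files = {}                       # UFID -> bytearray of contents
        self._next_ufid = itertools.count(1)   # hypothetical UFID generator

    def create(self):
        # Not repeatable: every call yields a new file with a new UFID.
        ufid = next(self._next_ufid)
        self._files[ufid] = bytearray()
        return ufid

    def read(self, ufid, pos, n):
        # The position is a parameter, so no read-write pointer is kept.
        return bytes(self._files[ufid][pos:pos + n])

    def write(self, ufid, pos, data):
        # Writing the same bytes at the same position twice gives the same file.
        buf = self._files[ufid]
        if len(buf) < pos:
            buf.extend(b"\0" * (pos - len(buf)))   # pad any gap with zero bytes
        buf[pos:pos + len(data)] = data


class DirectoryService:
    """Toy directory service: maps (directory UFID, text name) to a UFID."""

    def __init__(self):
        self._entries = {}

    def add_name(self, dir_ufid, name, ufid):
        self._entries[(dir_ufid, name)] = ufid

    def lookup(self, dir_ufid, name):
        return self._entries[(dir_ufid, name)]
```

A client module would combine the two: look a name up in the directory service, then pass the resulting UFID to the flat file service on every request.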

14 Access Control
- Conventional access rights checks (at open calls) not feasible
- Two ‘stateless’ approaches:
  - Capability (by manipulating the UFID)
  - User identity sent with every request (adopted in NFS and AFS)
- Main problem: forged requests; some authentication mechanism is needed

15 Capabilities and UFIDs
- A capability is a binary value that acts as an access key; it can be encoded in the UFID.
- Basic construction of a UFID: file group id + file number + random number
- Additional field: permissions
- Additional field: encryption of the permission field
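One way to picture a UFID that doubles as a capability is sketched below. It is an assumption-laden illustration: the slide speaks of encrypting the permission field, whereas this sketch swaps in a keyed hash (HMAC-SHA256 from the Python standard library) over all fields, which serves the same purpose of making the permissions unforgeable. The names `UFID`, `issue`, `allows`, and `SERVER_KEY` are hypothetical, and permissions are modelled as a string such as "r" or "rw".

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass

# Hypothetical secret known only to the file server.
SERVER_KEY = secrets.token_bytes(32)


@dataclass(frozen=True)
class UFID:
    group_id: int        # file group id
    file_number: int     # file number within the group
    random_number: int   # makes UFIDs hard to guess
    permissions: str     # e.g. "r" or "rw"
    check: bytes         # keyed hash standing in for the encrypted field


def _seal(group_id, file_number, random_number, permissions):
    msg = f"{group_id}:{file_number}:{random_number}:{permissions}".encode()
    return hmac.new(SERVER_KEY, msg, hashlib.sha256).digest()


def issue(group_id, file_number, permissions):
    """Server side: build a UFID carrying the given permissions."""
    rnd = secrets.randbits(32)
    check = _seal(group_id, file_number, rnd, permissions)
    return UFID(group_id, file_number, rnd, permissions, check)


def allows(ufid, operation):
    """Server side: accept the request only if the UFID is untampered."""
    expected = _seal(ufid.group_id, ufid.file_number,
                     ufid.random_number, ufid.permissions)
    return hmac.compare_digest(expected, ufid.check) and operation in ufid.permissions
```

A client that edits the permission field of a read-only UFID ends up with a check value that no longer matches, so the server rejects the request without keeping any per-client state.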

16 Directory Service Operations
Note: Each directory is stored as an ordinary file with a UFID.
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

17 The Network File System (NFS)
- Introduced by Sun Microsystems in 1985, now an Internet standard
- Runs on top of RPC (RFC 1831)
- Implemented on most operating systems
- Version described here: UNIX implementation of NFS Version 3 (RFC 1813, June 1995)
- Most recent version: NFS Version 4 (RFC 3010, December 2000)

18 NFS Architecture
Note: Each computer can act as both a client and a server.
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

19 The Virtual File System Module
- Access transparency
- File handles (file identifiers):
  - ‘filesystem identifier’ + ‘i-node number’ + ‘i-node generation number’
- One VFS structure for each mounted filesystem
  - relates a remote filesystem (identified by its file handle obtained at mount time) to a local directory on which it is mounted
- One v-node per open file
  - indicates whether a file is local (i-node) or remote (file handle)
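The role of the generation number in a file handle can be sketched as follows. This is a toy model rather than the NFS wire format: `FileHandle`, `ToyServer`, `allocate`, and `is_current` are hypothetical names. The assumption it illustrates is that i-node numbers are reused after a file is removed, so the server bumps the generation number on each reuse and can then recognise a handle that refers to the earlier file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHandle:
    filesystem_id: int   # 'filesystem identifier'
    inode_number: int    # 'i-node number'
    generation: int      # 'i-node generation number'


class ToyServer:
    def __init__(self, filesystem_id):
        self.filesystem_id = filesystem_id
        self._generation = {}   # i-node number -> current generation number

    def allocate(self, inode_number):
        # Each (re)use of an i-node gets a fresh generation number.
        gen = self._generation.get(inode_number, 0) + 1
        self._generation[inode_number] = gen
        return FileHandle(self.filesystem_id, inode_number, gen)

    def is_current(self, handle):
        # A handle from before the i-node was reused no longer matches.
        return (handle.filesystem_id == self.filesystem_id
                and self._generation.get(handle.inode_number) == handle.generation)
```

Because the handle itself carries everything needed for the check, the server does not have to remember which clients hold which handles.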

20 The NFS Client Module in UNIX
- Integrated with the kernel
- Emulates the UNIX file system primitives
- A single client module serves all user-level processes
- The encryption key for authentication is stored in the kernel
- Caches file blocks

21 NFS Server Operations
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

22 NFS Server Operations (cont’d)
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

23 Remote File Accesses
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

24 File System Information in UNIX
saturn:~ 35 % df -k
Filesystem                      kbytes capacity   Mounted on
/dev/dsk/c0t3d0s0               14390391%         /
/dev/dsk/c0t3d0s6               26794399%         /usr
/dev/dsk/c0t3d0s3               153833%           /tmp
galaxy:/usr/local.real          403044053%        /usr/local
lucky:/var/mail.real            56464886%         /var/mail
cosmos:/home.real/student/xxx   394176060%        /home/xxx
galaxy:/home.real/faculty/yyy   296451251%        /home/yyy
* Note: The output of ‘df -k’ has been edited.

25 Caching
- Server caching
  - read-ahead
  - write-through
  - delayed-write with the commit operation
- Client caching
  - cache validation (freshness interval and validation timestamp, modification timestamp and getattr, …)
  - bio-daemon (for read-ahead and delayed-write caching at the client side)
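The client-side validation rule can be sketched in a few lines. Following the textbook's formulation, a cached entry is treated as valid at time T if T - Tc is less than the freshness interval t, or if the modification timestamp recorded at the client equals the one at the server. The names below (`CacheEntry`, `is_valid`, and the `getattr_mtime` callable standing in for a getattr request) are hypothetical, and times are plain floats.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    data: bytes
    tc: float          # Tc: time the entry was last validated
    tm_client: float   # Tm as recorded at the client


def is_valid(entry, now, freshness_interval, getattr_mtime):
    """Return True if the cached block may be used at time `now`."""
    if now - entry.tc < freshness_interval:
        return True                  # still fresh: no server contact needed
    tm_server = getattr_mtime()      # one getattr round trip to the server
    if tm_server == entry.tm_client:
        entry.tc = now               # revalidated: restart the freshness interval
        return True
    return False                     # stale: the caller must refetch the data
```

The first test is evaluated locally, which is why a longer freshness interval reduces getattr traffic at the cost of a wider window in which stale data may be read.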

26 Achievements of NFS
- Access and location transparency
- Mobility transparency (partially)
- Read-only file replication: the automounter
- Fault-tolerance: stateless servers, the automounter
- Efficiency: caching of disk blocks (main problem: frequent use of getattr)
- Nonachievements: scalability, concurrency and consistency, security (Kerberos), ...

27 The Andrew File System (AFS)
- Developed at CMU
- Current versions: AFS-2, AFS-3
- Compatible with NFS
- Main achievement over (older) NFS: better scalability by minimizing client-server communication
- Key characteristics: whole-file serving and caching (partial file caching allowed in AFS-3)

28 Observations on UNIX File Usage
- Files are mostly small
- Read operations are more common
- Sequential accesses are more common
- Most files are written by one user
- Files are referenced in bursts

29 AFS Architecture
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

30 AFS File Name Space
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

31 System Call Interception in AFS
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

32 AFS System Calls Implementation
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

33 Cache Consistency
- A callback promise is provided when Vice supplies a copy of a file to a Venus process
- The callback promise stored with the cached copy is in either valid or cancelled state
- When Venus handles an open, it checks the cache.
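The callback mechanism can be illustrated with a small single-process model. It is a sketch under simplifying assumptions, not the Vice service interface: the classes `Vice` and `Venus` and their methods are hypothetical, whole files are byte strings held in memory, and callbacks are delivered by direct method calls rather than RPC. It shows Venus reusing a cached copy while its promise is valid and refetching once Vice has cancelled the promise because another client stored a new version.

```python
class Vice:
    """Toy server: hands out callback promises and breaks them on update."""

    def __init__(self):
        self.files = {}      # name -> contents
        self.promises = {}   # name -> set of Venus clients holding a promise

    def fetch(self, client, name):
        self.promises.setdefault(name, set()).add(client)
        return self.files[name]

    def store(self, client, name, data):
        self.files[name] = data
        # Cancel the promise of every other client caching this file.
        for other in self.promises.get(name, set()) - {client}:
            other.break_callback(name)
        self.promises[name] = {client}


class Venus:
    """Toy client cache manager."""

    def __init__(self, vice):
        self.vice = vice
        self.cache = {}    # name -> [contents, promise_is_valid]
        self.fetches = 0   # counts round trips to Vice, for illustration

    def open(self, name):
        entry = self.cache.get(name)
        if entry is None or not entry[1]:
            # No copy, or promise cancelled: fetch a fresh copy and promise.
            self.fetches += 1
            entry = [self.vice.fetch(self, name), True]
            self.cache[name] = entry
        return entry[0]

    def close(self, name, data):
        # On close, the updated file is sent back to Vice.
        self.cache[name] = [data, True]
        self.vice.store(self, name, data)

    def break_callback(self, name):
        if name in self.cache:
            self.cache[name][1] = False
```

Repeated opens of an unchanged file cost no communication in this model, which is the scalability argument made for AFS over (older) NFS.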

34 The Vice Service Interface
Source: G. Coulouris et al., Distributed Systems: Concepts and Design, Third Edition.

35 Enhancements to NFS and AFS
- Spritely NFS
  - add open and close, use callbacks
- NQNFS (Not Quite NFS)
  - use callbacks and leases
- WebNFS
  - allow browsers and other applications to interact with an NFS server directly
- NFS Version 4 (RFC 3010, December 2000)
  - incorporating all of the above and more
- DCE/DFS (based on AFS)
  - use callbacks and write tokens (with a lifetime)

36 New Features of NFS Version 4
- Adoption of the RPCSEC_GSS (RFC 2203) security protocol
- Multiple operations in one request
- Better migration and replication abilities
  - A client may query the location(s) of a file system.
- Introduction of open and close operations
- Lease-based file locking
- Callback-based delegation of files
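The general idea behind lease-based locking can be sketched as below. This is a deliberately simplified model and not the NFS Version 4 protocol: the class `LeaseTable`, its `acquire` method, and the per-file granularity are assumptions made for illustration, and time is passed in explicitly so the behaviour is deterministic. The point is that a lock is only held for a bounded period; a holder that crashes or becomes unreachable stops renewing, and the server can grant the lock to someone else once the lease runs out.

```python
class LeaseTable:
    """Toy lock manager in which every lock expires unless renewed."""

    def __init__(self, duration):
        self.duration = duration
        self._leases = {}   # file name -> (holder, expiry time)

    def acquire(self, name, holder, now):
        """Grant or renew a lease; refuse if another holder's lease is live."""
        current = self._leases.get(name)
        if current is not None and current[0] != holder and now < current[1]:
            return False
        self._leases[name] = (holder, now + self.duration)
        return True
```

For example, with a duration of 10, a lease granted to client A at time 0 and renewed at time 5 blocks client B until time 15.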

37 New Design Approaches
- Background
  - high-performance storage technology (e.g., RAID)
  - log-structured file systems (e.g., Sprite, BSD LFS)
  - high-performance switched networks (e.g., ATM, high-speed Ethernet)
- Goals: high scalability and fault-tolerance
- Main ideas: distribute file data among many nodes, separate responsibilities, …
- Constraints: high level of trust

38 More Recent File System Designs
- xFS
  - Serverless: all data, metadata, and control can be located anywhere in the system; any machine can take over the responsibilities of a failed one
- Frangipani
  - Two-layer structure
    - the Petal distributed virtual disk system
    - the Frangipani server module
- Both designs utilize RAID-style striping, log-structured file storage, etc.
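RAID-style striping with parity, mentioned above, can be sketched in a few lines. This is a generic illustration and not the xFS or Petal implementation: the function names `make_stripe`, `recover`, and `xor_blocks` are hypothetical, and the sketch assumes one parity fragment per stripe computed by bytewise XOR. A segment is split into equal fragments, one per storage node, and any single lost fragment can be rebuilt from the others.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized byte strings."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def make_stripe(segment, data_nodes):
    """Split a segment into `data_nodes` fragments plus one parity fragment."""
    size = -(-len(segment) // data_nodes)   # ceiling division
    fragments = [segment[i * size:(i + 1) * size].ljust(size, b"\0")
                 for i in range(data_nodes)]
    return fragments + [xor_blocks(fragments)]


def recover(stripe, lost_index):
    """Rebuild the fragment at `lost_index` from all surviving fragments."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)
```

Usage: `stripe = make_stripe(b"log segment data", 3)` yields four fragments to place on four nodes, and `recover(stripe, 1)` reproduces the second fragment without reading it.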

39 Log-based Striping in xFS
Source: T.E. Anderson et al., Serverless Network File Systems, ACM TOCS 1996

40 An xFS Configuration
Source: T.E. Anderson et al., Serverless Network File Systems, ACM TOCS 1996

41 A Frangipani Configuration
Source: C.A. Thekkath et al., Frangipani, A Scalable Distributed File System, ACM SOSP 1997

42 Storage Systems
Source: G.A. Gibson and R. van Meter, Network Attached Storage Architecture, CACM, November 2000.

43 NAS and SAN
Note: the difference is disappearing.
Source: G.A. Gibson and R. van Meter, Network Attached Storage Architecture, CACM, November 2000.

44 Bandwidth for Disk Access
Source: E. Riedel, Storage Systems, Queue, June 2003.

45 Increasing the Bandwidth
Source: E. Riedel, Storage Systems, Queue, June 2003.

46 Virtualization in SAN
Source: E. Riedel, Storage Systems, Queue, June 2003.

47 Requirements for Storage Systems
- Basic requirements: resource consolidation, rapid deployment, central management, convenient backup, high availability, data sharing
- Geographic separation
- Security against an increasing risk of unauthorized access
- Performance scalable with capacity (accesses per second or megabytes per second)