1 DISTRIBUTED SYSTEMS: Principles and Paradigms, Second Edition
ANDREW S. TANENBAUM, MAARTEN VAN STEEN
Chapter 11: DISTRIBUTED FILE SYSTEMS
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved

2 DISTRIBUTED FILE SYSTEMS
Objectives:
- Architecture
- Processes
- Communication
- Synchronization
- Consistency and Replication
- Fault Tolerance

3 DISTRIBUTED FILE SYSTEMS
Distributed file systems allow multiple processes to share data over long periods of time in a secure and reliable way.

4 DISTRIBUTED FILE SYSTEMS
ARCHITECTURE:
- Client-Server Architectures
- Cluster-Based Distributed File Systems
- Symmetric Architectures (P2P-based File Systems)

5 DISTRIBUTED FILE SYSTEMS
ARCHITECTURE:
- Client-Server Architectures, e.g. Network File System (NFS) from Sun Microsystems
File System Model:
- To access a file, a client must first look up its name in a naming service and obtain the associated file handle
- Each file has a number of attributes whose values can be looked up and changed
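The file system model above (resolve a name to a file handle, then read or change attributes through that handle) can be sketched in a few lines of Python. This is a toy in-memory model, not the NFS protocol: the class `ToyFileServer` and its methods `create`, `lookup`, `get_attributes`, and `set_attributes` are hypothetical names chosen for illustration.

```python
import itertools


class ToyFileServer:
    """Hypothetical in-memory server: names map to handles, handles to attributes."""

    def __init__(self):
        self._names = {}                   # path -> file handle
        self._attrs = {}                   # file handle -> attribute dict
        self._next = itertools.count(1)    # source of fresh handle numbers

    def create(self, path, **attrs):
        """Register a file under a path and return its new handle."""
        handle = next(self._next)
        self._names[path] = handle
        self._attrs[handle] = dict(attrs)
        return handle

    def lookup(self, path):
        """Name resolution step: return the file handle bound to a path."""
        return self._names[path]

    def get_attributes(self, handle):
        """Return a copy of the attributes of the file behind the handle."""
        return dict(self._attrs[handle])

    def set_attributes(self, handle, **changes):
        """Change one or more attribute values of the file behind the handle."""
        self._attrs[handle].update(changes)


server = ToyFileServer()
server.create("/docs/notes.txt", size=0, mode=0o644)
handle = server.lookup("/docs/notes.txt")   # the client works with the handle from here on
server.set_attributes(handle, size=42)
print(server.get_attributes(handle))
```

After the lookup, the client never sends the path again; every later operation refers to the file by its handle.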

6 Client-Server Architectures
Figure (a) The remote access model. (b) The upload/download model.

7 Client-Server Architectures
Figure The basic NFS architecture for UNIX systems.

8 File System Model
Figure An incomplete list of file system operations supported by NFS.

9 File System Model
Figure An incomplete list of file system operations supported by NFS.

10 DISTRIBUTED FILE SYSTEMS
Cluster-Based Distributed File Systems
- When dealing with very large data collections, a simple client-server approach is not going to work
- To speed up file accesses, apply file-striping techniques: a single file is distributed across multiple servers, so that different parts can be fetched in parallel

11 Cluster-Based Distributed File Systems
Figure The difference between (a) distributing whole files across several servers and (b) striping files for parallel access.
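File striping, as in part (b) of the figure, can be illustrated with a small sketch. It assumes round-robin placement of fixed-size stripe units, which is one common layout and not necessarily the one the slides have in mind; the function names `stripe`, `locate`, and `reassemble` are hypothetical.

```python
def stripe(data, n_servers, stripe_size):
    """Cut data into fixed-size units and deal them round-robin to the servers."""
    servers = [[] for _ in range(n_servers)]
    for start in range(0, len(data), stripe_size):
        unit = start // stripe_size
        servers[unit % n_servers].append(data[start:start + stripe_size])
    return servers


def locate(offset, n_servers, stripe_size):
    """Map a byte offset to (server index, unit index on that server, offset in unit)."""
    unit = offset // stripe_size
    return unit % n_servers, unit // n_servers, offset % stripe_size


def reassemble(servers):
    """Interleave the units again in the order in which they were dealt out."""
    parts = []
    for k in range(max(len(units) for units in servers)):
        for units in servers:
            if k < len(units):
                parts.append(units[k])
    return b"".join(parts)


data = b"abcdefghij"
servers = stripe(data, n_servers=3, stripe_size=2)
server, unit, within = locate(7, n_servers=3, stripe_size=2)
print(servers)
print(servers[server][unit][within] == data[7])
print(reassemble(servers) == data)
```

Because consecutive units live on different servers, a client that needs a large range can request the units from all servers at the same time instead of waiting on a single one.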

12 Cluster-Based Distributed File Systems
Example: Google has developed its own Google File System (GFS)
The Google solution: divide files into large 64 MB chunks, and distribute/replicate the chunks across many servers:
- The master maintains only a (file name, chunk server) table in main memory, which keeps I/O minimal
- Files are replicated using a primary-backup scheme
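The chunk lookup described above can be sketched as follows. Only the 64 MB chunk size and the in-memory (file name, chunk server) table come from the slide; the class `ToyMaster`, its methods, the per-chunk table key, and the convention that the first listed server is the primary are assumptions made for this illustration, not the GFS interface.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size stated on the slide


class ToyMaster:
    """Hypothetical master that keeps its whole lookup table in main memory."""

    def __init__(self):
        # (file name, chunk index) -> list of chunk servers.
        # Assumption: the first server in the list acts as the primary replica.
        self.table = {}

    def register(self, name, chunk_index, servers):
        """Record which chunk servers hold a given chunk of a file."""
        self.table[(name, chunk_index)] = list(servers)

    def locate(self, name, offset):
        """Translate a byte offset into (chunk index, offset in chunk, servers)."""
        index = offset // CHUNK_SIZE
        return index, offset % CHUNK_SIZE, self.table[(name, index)]


master = ToyMaster()
master.register("crawl.log", 0, ["cs1", "cs2"])
master.register("crawl.log", 1, ["cs3", "cs1"])
print(master.locate("crawl.log", CHUNK_SIZE + 5))
```

In this sketch the master only answers the location query; the client would then fetch the data from one of the returned chunk servers directly.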

13 Cluster-Based Distributed File Systems
Figure The organization of a Google cluster of servers.

14 Symmetric Architectures
P2P-based File Systems
Example: Ivy, a distributed file system that is built using a Chord DHT-based system
Basic idea: store data blocks in the underlying P2P system:
- Every data block with content D is stored under the key h(D), on the node responsible for that key; this allows for an integrity check
- Public-key blocks are signed with the associated private key and looked up with the public key
- A local log of file operations keeps track of {blockID, h(D)} pairs
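The content-hash idea above can be sketched with a toy ring. The sketch assumes SHA-1 as the hash function h and a simplified successor rule in the style of Chord (the first node identifier at or after the key, wrapping around); the class `ToyRing` and its methods are hypothetical and leave out routing, replication, and the signed public-key blocks.

```python
import hashlib
from bisect import bisect_left


def h(data):
    """Hash function h: SHA-1 digest interpreted as an integer (an assumption)."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


class ToyRing:
    """Hypothetical DHT: each key is stored on the first node ID at or after it."""

    def __init__(self, node_names):
        self.ids = sorted(h(name.encode()) for name in node_names)
        self.store = {node_id: {} for node_id in self.ids}

    def successor(self, key):
        """Return the ID of the node responsible for the key (wraps around)."""
        i = bisect_left(self.ids, key)
        return self.ids[i % len(self.ids)]

    def put(self, data):
        """Store a block under the hash of its own content and return that key."""
        key = h(data)
        self.store[self.successor(key)][key] = data
        return key

    def get(self, key):
        """Fetch a block and verify that its content still hashes to the key."""
        data = self.store[self.successor(key)][key]
        if h(data) != key:
            raise ValueError("block failed integrity check")
        return data


ring = ToyRing(["node-a", "node-b", "node-c"])
key = ring.put(b"file contents, block 0")
print(ring.get(key) == b"file contents, block 0")
```

Because the key is derived from the content, a reader can detect a corrupted or substituted block without trusting the node that returned it.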

15 Symmetric Architectures
Figure The organization of the Ivy distributed file system.

16 DISTRIBUTED FILE SYSTEMS
Processes: the most interesting question concerning file system processes is whether or not they should be stateless.
Example: the latest version of NFS (version 4) is stateful.

17 DISTRIBUTED FILE SYSTEMS
COMMUNICATION:
Communication in distributed file systems is based on remote procedure calls (RPCs). The main reason is to make the system independent of the underlying operating systems, networks, and transport protocols.
RPC in NFS: every NFS operation can be implemented as a single remote procedure call to a file server.

18 Remote Procedure Calls in NFS
Figure (a) Reading data from a file in NFS version 3. (b) Reading data using a compound procedure in version 4.
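The difference between the two parts of the figure is the number of request/reply exchanges. The sketch below counts them for a toy server: in the version 3 style each operation is its own call, while in the version 4 style several operations travel in one compound call and processing stops at the first operation that fails. `ToyServer`, its methods, and the use of the file name as a handle are hypothetical simplifications, not the NFS wire protocol.

```python
class ToyServer:
    """Hypothetical file server that counts request/reply exchanges."""

    def __init__(self, files):
        self.files = files       # file name -> bytes
        self.round_trips = 0     # number of RPCs received so far

    def _lookup(self, name):
        if name not in self.files:
            raise FileNotFoundError(name)
        return name              # the name doubles as a toy file handle

    def _read(self, handle, offset, count):
        return self.files[handle][offset:offset + count]

    def call(self, op, *args):
        """Version 3 style: one RPC per operation."""
        self.round_trips += 1
        return getattr(self, "_" + op)(*args)

    def compound(self, ops):
        """Version 4 style: several operations in one RPC, stop at first failure."""
        self.round_trips += 1
        results, current = [], None   # 'current' plays the role of the current file handle
        for op, args in ops:
            try:
                if op == "lookup":
                    current = self._lookup(*args)
                    results.append(current)
                elif op == "read":
                    results.append(self._read(current, *args))
                else:
                    raise ValueError("unknown operation: " + op)
            except (OSError, KeyError, ValueError) as exc:
                results.append(exc)   # report the failure and skip the remaining operations
                break
        return results


v3 = ToyServer({"a.txt": b"hello world"})
handle = v3.call("lookup", "a.txt")
data = v3.call("read", handle, 0, 5)
print(data, v3.round_trips)

v4 = ToyServer({"a.txt": b"hello world"})
results = v4.compound([("lookup", ("a.txt",)), ("read", (0, 5))])
print(results[-1], v4.round_trips)
```

The saving matters most on wide-area links, where each extra round trip adds the full network latency to the operation.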

19 DISTRIBUTED FILE SYSTEMS
Synchronization: File sharing semantics
- When dealing with distributed file systems, we need to take into account the ordering of concurrent read/write operations and the expected semantics (i.e., consistency)
- When two or more users share the same file at the same time, the semantics of reading and writing must be defined precisely to avoid problems

20 DISTRIBUTED FILE SYSTEMS
Synchronization: File sharing semantics
- UNIX semantics: a read operation returns the effect of the last write operation; this can only be implemented for remote access models in which there is only a single copy of the file
- Transaction semantics: the file system supports transactions on a single file; the issue is how to allow concurrent access to a physically distributed file
- Session semantics: the effects of read and write operations are seen only by the client that has opened (a local copy of) the file; the issue is what happens when the file is closed (only one client may actually win)
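The remark that "only one client may actually win" under session semantics can be made concrete with a small sketch. It assumes the simplest policy, in which closing a file uploads the whole local copy and overwrites whatever the server holds; `SessionServer` and `Session` are hypothetical names, and real systems may resolve the conflict differently.

```python
class SessionServer:
    """Hypothetical server holding the authoritative copy of each file."""

    def __init__(self):
        self.files = {}   # file name -> bytes

    def open(self, name):
        """Hand the client a private copy of the current file contents."""
        return Session(self, name, self.files.get(name, b""))


class Session:
    """One open file on one client: all reads and writes hit the local copy."""

    def __init__(self, server, name, data):
        self.server = server
        self.name = name
        self.local = bytearray(data)

    def write(self, data):
        self.local += data            # visible only to this client until close

    def read(self):
        return bytes(self.local)

    def close(self):
        # Assumed policy: the whole local copy replaces the server copy.
        self.server.files[self.name] = bytes(self.local)


server = SessionServer()
server.files["report"] = b"base"
a = server.open("report")
b = server.open("report")
a.write(b"-A")
print(b.read())                  # client B still sees the contents from open time
a.close()
b.write(b"-B")
b.close()
print(server.files["report"])    # the last close overwrites A's update
```

Under UNIX semantics the second client would have seen the first client's write immediately, and neither update would have been lost.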

21 Semantics of File Sharing
Figure (a) On a single processor, when a read follows a write, the value returned by the read is the value just written.

22 Semantics of File Sharing
Figure (b) In a distributed system with caching, obsolete values may be returned.

23 Semantics of File Sharing
Figure Four ways of dealing with shared files in a distributed system.

24 DISTRIBUTED FILE SYSTEMS
CONSISTENCY AND REPLICATION:
- In modern distributed file systems, client-side caching is the preferred technique for attaining performance; server-side replication is done for fault tolerance
- Clients are allowed to keep (large parts of) a file, and will be notified when control is withdrawn; servers are now generally stateful

25 DISTRIBUTED FILE SYSTEMS
CONSISTENCY AND REPLICATION, e.g. caching in NFS:
- Clients cache file data, attributes, file handles, and directories
- Different strategies exist to handle consistency of the cached data, cached attributes, etc.

26 Figure 11-21. Client-side caching in NFS.

27 Client-Side Caching
Figure Using the NFSv4 callback mechanism to recall file delegation.
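The recall of a delegation through a callback can be sketched as a small interaction between a server and two clients. This is a toy model under stated assumptions: the server delegates a file to at most one client, a second open triggers a callback to the holder, and the holder flushes its cached changes when it returns the delegation. The names `DelegatingServer`, `Client`, `recall`, and `return_delegation` are hypothetical and do not correspond to NFSv4 operation names.

```python
class DelegatingServer:
    """Hypothetical server for a single file that can delegate it to one client."""

    def __init__(self, data=b""):
        self.data = data
        self.holder = None   # client currently holding the delegation, if any

    def open(self, client):
        if self.holder is client:
            return "delegated"
        if self.holder is not None:
            self.holder.recall(self)   # callback: ask the holder to give the file back
            return "shared"            # this open is served without a new delegation
        self.holder = client
        client.cache = bytearray(self.data)   # the client now works on a local copy
        return "delegated"

    def return_delegation(self, client, data):
        """Accept the holder's cached contents and withdraw its delegation."""
        if self.holder is not client:
            raise RuntimeError("client does not hold the delegation")
        self.data = bytes(data)
        self.holder = None


class Client:
    def __init__(self, name):
        self.name = name
        self.cache = None   # local copy, present only while holding a delegation

    def write_local(self, data):
        self.cache += data  # no server contact needed while the delegation is held

    def recall(self, server):
        """Callback invoked by the server: flush changes and give up the delegation."""
        server.return_delegation(self, self.cache)
        self.cache = None


server = DelegatingServer(b"v1")
a, b = Client("A"), Client("B")
print(server.open(a))   # A receives the delegation
a.write_local(b"+a")
print(server.open(b))   # B's open makes the server recall the delegation from A
print(server.data)      # A's cached change has been flushed to the server
```

The callback requires the server to remember which client holds the delegation, which is one reason a server with this feature has to be stateful.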

28 DISTRIBUTED FILE SYSTEMS
FAULT TOLERANCE:
- Replication is deployed to create fault-tolerant server groups
- Handling Byzantine failures (arbitrary failures)
- Quorum certificate: evidence that a sufficiently large number of processes (2k+1) have stored the same request, and that it is thus safe to proceed
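The quorum size of 2k+1 can be checked with a little arithmetic. The sketch below assumes the standard setting for Byzantine fault tolerance, a group of 3k+1 servers of which at most k are faulty; the slide itself only states the quorum size, so the group size is an assumption here. The function names are hypothetical, and a real certificate would consist of signed messages, not plain strings.

```python
from collections import Counter


def group_size(k):
    """Assumed group size for tolerating k Byzantine servers."""
    return 3 * k + 1


def quorum_size(k):
    """Quorum size from the slide."""
    return 2 * k + 1


def min_overlap(k):
    """Smallest possible intersection of two quorums in a group of 3k+1."""
    return 2 * quorum_size(k) - group_size(k)


def certified_request(stored, k):
    """Return the request that 2k+1 servers report having stored, or None.

    'stored' maps a server ID to the request that server says it stored.
    """
    if not stored:
        return None
    request, votes = Counter(stored.values()).most_common(1)[0]
    return request if votes >= quorum_size(k) else None


k = 1
print(group_size(k), quorum_size(k), min_overlap(k))
print(certified_request({"s1": "req-7", "s2": "req-7", "s3": "req-7", "s4": "req-9"}, k))
print(certified_request({"s1": "req-7", "s2": "req-7", "s3": "req-8", "s4": "req-9"}, k))
```

Any two quorums overlap in at least k+1 servers, so at least one server in the overlap is non-faulty, which is why two conflicting requests cannot both obtain a certificate.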

29 Handling Byzantine Failures
Figure The different phases in Byzantine fault tolerance.

30 Summary
- Distributed file systems form an important paradigm for building distributed systems
- They are generally organized according to the client-server model, with client-side caching and server replication to meet scalability requirements
- Caching and replication are also needed to achieve high availability
- More recently, symmetric architectures such as those in peer-to-peer file-sharing systems have emerged; an important issue there is whether whole files or data blocks are distributed

31 Summary
- All operations can be expressed as RPCs to a file server instead of having to use primitive message-passing operations
- What makes distributed file systems different from nondistributed file systems is the semantics of sharing files
- Semantics: UNIX, session, immutable files, and transactions
- To achieve acceptable performance, distributed file systems generally allow clients to cache an entire file

