1
Distributed Systems CS 15-440
Distributed File Systems
Lecture 24, November 30, 2016
Mohammad Hammoud
2
Today…
Last Session: Fault Tolerance, Part III
Today’s Session: Distributed File Systems
Announcements:
- P4 is due today by midnight
- Final exam is on Thursday, Dec 8th, from 1:30 PM to 4:30 PM in Room 1190 (all topics are included; it will be open book, open notes)
3
Intended Learning Outcomes: Distributed File Systems
- Considered: a reasonably critical and comprehensive perspective
- Thoughtful: a fluent, flexible, and efficient perspective
- Masterful: a powerful and illuminating perspective
ILO7: Explain distributed file systems as a paradigm for general-purpose distributed systems, and analyze their various aspects and architectures
- ILO7.1: Define distributed file systems (DFSs) and explain various architectures
- ILO7.2: Analyze various aspects of DFSs, including processes, communication, naming, synchronization, consistency and replication, and fault tolerance
4
Distributed File Systems
Why File Systems?
- To organize data (as files)
- To provide a means for applications to store, access, and modify data
Why “Distributed” File Systems (DFSs)?
- To share data across a cluster of machines
- To store large-scale datasets
- To provide transparency and ease of management
5
NAS versus SAN
- Another term for DFS is network-attached storage (NAS), which refers to attaching storage to network servers that provide file systems
- A similar-sounding term that refers to a very different approach is storage area network (SAN)
- A SAN makes storage devices (not file systems) available over a network
- A SAN typically:
  - Provides extremely high data throughput and is usually implemented with Fibre Channel storage
  - Is costly: each node connected to the SAN must have a Fibre Channel host bus adapter (HBA) to connect to the Fibre Channel network
[Figure: client computers, a file server (providing NAS), and a database server, connected through a LAN and a SAN]
6
DFS Aspects
Aspect: Description
Architecture: How are DFSs generally organized?
Processes: Who are the cooperating processes? Are processes stateful or stateless?
Communication: What is the typical communication paradigm followed by DFSs? How do processes in DFSs communicate?
Naming: How is naming often handled in DFSs?
Synchronization: What are the file sharing semantics adopted by DFSs?
Consistency and Replication: What are the various features of client-side caching as well as server-side replication?
Fault Tolerance: How is fault tolerance handled in DFSs?
7
Architectures
- Client-Server Distributed File Systems
- Cluster-Based Distributed File Systems
- Symmetric Distributed File Systems
9
Network File System
- Many distributed file systems are organized along the lines of client-server architectures
- Sun Microsystems’ Network File System (NFS) is one of the most widely deployed DFSs for Unix-based systems
- NFS comes with a protocol that describes precisely how a client can access a file stored on a (remote) NFS file server
- NFS allows a heterogeneous collection of processes, possibly running on different OSs and machines, to share a common file system
10
Remote Access Model
- The model underlying NFS and similar systems is referred to as the remote access model
- In this model, clients are:
  - Offered transparent access to a file system that is managed by a remote server
  - Normally unaware of the actual location(s) of files
  - Offered an interface to a file system similar to the interface offered by a conventional local file system
[Figure: the client sends requests to access a remote file; the server sends replies; the file remains at the server]
11
Upload/Download Model
- A contrasting model, referred to as the upload/download model, allows a client to access a file locally after having downloaded it from the server
- An example: the Internet’s FTP service
[Figure: the file is moved to the client’s side; all accesses are done on the client’s side; when the client is done, the file is returned to the server]
12
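The Basic NFS Architecture
[Figure: a client request in NFS. On the client, the request passes from the system call layer to the Virtual File System (VFS) layer, which hands it either to the local file system interface or to the NFS client; the NFS client sends it over the network through the RPC client stub. On the server, the RPC server stub passes the request to the NFS server, which goes through the VFS layer to the local file system interface.]
How is naming handled?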
13
Structured Naming in NFS
[Figure: the server exports the subdirectory steen, which contains the file mbox; Client A and Client B each mount the steen subdirectory into their own local name space]
- Exported directory mounted by Client A
- Exported directory mounted by Client B
- The same file is now shared!
- The file is named /usr/bin/mbox at Client A
- The file is named /usr/bin/mbox at Client B
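The mounting idea on this slide can be illustrated with a small sketch. It is not how an NFS client is implemented; the `MountTable` class, the mount points `/remote/vu` and `/work/me`, and the server name `server` are all hypothetical and chosen only for illustration. It shows how two clients can reach the same server-side file through different local names.

```python
from typing import Dict, Optional, Tuple


class MountTable:
    """Hypothetical per-client table mapping local mount points to exports."""

    def __init__(self) -> None:
        # mount point -> (server name, exported directory on that server)
        self._mounts: Dict[str, Tuple[str, str]] = {}

    def mount(self, mount_point: str, server: str, exported_dir: str) -> None:
        self._mounts[mount_point.rstrip("/")] = (server, exported_dir.rstrip("/"))

    def resolve(self, path: str) -> Optional[Tuple[str, str]]:
        """Return (server, remote path) for a local path, or None if local."""
        best = None
        for mount_point in self._mounts:
            if path == mount_point or path.startswith(mount_point + "/"):
                # Prefer the longest matching mount point.
                if best is None or len(mount_point) > len(best):
                    best = mount_point
        if best is None:
            return None
        server, exported_dir = self._mounts[best]
        return server, exported_dir + path[len(best):]


# Two clients mount the same exported directory at different places.
client_a = MountTable()
client_a.mount("/remote/vu", "server", "/users/steen")
client_b = MountTable()
client_b.mount("/work/me", "server", "/users/steen")

name_at_a = client_a.resolve("/remote/vu/mbox")
name_at_b = client_b.resolve("/work/me/mbox")
```

Both lookups resolve to the same file on the server, which is why the file ends up shared even though each client names it within its own name space.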
15
Data-Intensive Applications
- Today there is a deluge of large data-intensive applications
- Most data-intensive applications fall into one of two styles of computing:
  - Internet services (or cloud computing)
  - High-performance computing (HPC)
- Cloud computing and HPC applications:
  - Typically run on hundreds or thousands of compute nodes
  - Process very large volumes of data (or Big Data)
[Figure: visualization of entropy in the Terascale Supernova Initiative application. Image from Kwan-Liu Ma’s visualization team at UC Davis]
16
Cluster-Based Distributed File Systems
Cluster-based file systems:
- Are key for providing scalable data-intensive application performance
- Typically partition and distribute large-scale datasets using file striping techniques
- Could be viewed as cloud-computing- or HPC-oriented
Examples:
- Cloud-computing-oriented: Google File System (GFS)
- HPC-oriented: Parallel Virtual File System (PVFS)
17
File Striping Techniques
- Server clusters are often used for distributed applications, and their associated file systems are adjusted to satisfy their requirements
- One well-known technique is file striping, by which a single file is distributed across multiple servers
- Hence, it becomes possible to fetch different parts concurrently
[Figure: accessing file parts in parallel; the parts a to e of the files are spread across several servers]
18
Round-Robin Distribution (1)
- How to stripe a file over multiple machines?
- Round-robin is typically a reasonable default solution
- Stripe size = how many striping units each server will receive
[Figure: a logical file divided into striping units 0 to 15 and distributed round-robin over four servers]
Server 1: 0, 4, 8, 12
Server 2: 1, 5, 9, 13
Server 3: 2, 6, 10, 14
Server 4: 3, 7, 11, 15
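The round-robin placement described on this slide can be sketched in a few lines. The function name `round_robin_layout` is hypothetical, and the sketch assumes striping units are numbered from 0 and servers from 1.

```python
from typing import Dict, List


def round_robin_layout(num_units: int, num_servers: int) -> Dict[int, List[int]]:
    """Assign striping unit i to server (i mod num_servers) + 1."""
    layout: Dict[int, List[int]] = {s: [] for s in range(1, num_servers + 1)}
    for unit in range(num_units):
        layout[unit % num_servers + 1].append(unit)
    return layout


# 16 striping units over 4 servers.
layout = round_robin_layout(16, 4)
```

With 16 units and 4 servers, every server receives exactly four units, so the load is spread evenly by construction.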
19
Round-Robin Distribution (2)
- Clients write and read various regions of the file
- Client I: 512K write, offset 0
- Client II: 512K write, offset 512
[Figure: the striping units touched by each client’s write, shown against the round-robin layout over Servers 1 to 4]
20
2D Round-Robin Distribution (1)
- What happens when we have many servers (say 1000s)?
- 2D distribution can help
[Figure: a logical file divided into striping units 0 to 15 and distributed over four servers arranged in groups; group size = 2]
Server 1: 0, 2, 4, 6
Server 2: 1, 3, 5, 7
Server 3: 8, 10, 12, 14
Server 4: 9, 11, 13, 15
21
2D Round-Robin Distribution (2)
- 2D distribution can limit the number of servers per client
- Client I: 512K write, offset 0
- Client II: 512K write, offset 512
[Figure: the striping units touched by each client’s write, shown against the 2D layout over Servers 1 to 4; group size = 2]
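A rough sketch of why the 2D layout limits the number of servers a client touches. Several things here are assumptions rather than facts from the slides: a 64K striping unit, 8 units placed in a group before moving to the next group, offsets expressed in KB (so Client II starts at 512K), units numbered from 0, and the helper names themselves.

```python
from typing import Callable, Set


def server_1d(unit: int, num_servers: int) -> int:
    """Plain round-robin: unit i goes to server (i mod num_servers) + 1."""
    return unit % num_servers + 1


def server_2d(unit: int, num_servers: int, group_size: int,
              units_per_group: int) -> int:
    """2D round-robin: pick a group first, then round-robin inside it."""
    num_groups = num_servers // group_size
    group = (unit // units_per_group) % num_groups
    within = (unit % units_per_group) % group_size
    return group * group_size + within + 1


def servers_touched(offset_kb: int, length_kb: int, unit_kb: int,
                    place: Callable[[int], int]) -> Set[int]:
    """Set of servers holding any striping unit covered by the request."""
    first = offset_kb // unit_kb
    last = (offset_kb + length_kb - 1) // unit_kb
    return {place(u) for u in range(first, last + 1)}


def one_d(unit: int) -> int:
    return server_1d(unit, 4)


def two_d(unit: int) -> int:
    return server_2d(unit, 4, 2, 8)


client1_1d = servers_touched(0, 512, 64, one_d)
client2_1d = servers_touched(512, 512, 64, one_d)
client1_2d = servers_touched(0, 512, 64, two_d)
client2_2d = servers_touched(512, 512, 64, two_d)
```

Under these assumptions, each 512K write spans all four servers with plain round-robin, but only the two servers of one group with the 2D layout.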
22
GFS Data Distribution Policy
- The Google File System (GFS) is a scalable DFS for data-intensive applications
- GFS partitions large files into multiple pieces called chunks or blocks, and stores them on different data servers
- This design is referred to as a block-based design
- Each GFS chunk has a unique 64-bit identifier and is stored as a file through a local file system at a data server
- GFS distributes chunks across cluster data servers using a random distribution policy
23
GFS Distribution and Replication Policies
[Figure: a file written by Server 0 is split into blocks Blk 0 to Blk 6 at 64 MB offsets (0M, 64M, 128M, 192M, 256M, 320M, 384M); copies of the blocks are spread across Servers 0 to 3]
- Each block is “replicated” 3 times by default!
- The random placement can lead to load imbalance
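The effect of random placement on load can be explored with a small simulation. This is only a sketch: the 64 MB chunk size and the 3 replicas are taken from the slides, but choosing replica servers uniformly at random is a simplifying assumption, and all function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, as in the figure
REPLICAS = 3                   # default replication factor from the slide


def chunk_index(byte_offset: int) -> int:
    """Index of the chunk that contains the given byte offset."""
    return byte_offset // CHUNK_SIZE


def place_chunks(num_chunks: int, servers: List[int],
                 rng: random.Random) -> Dict[int, List[int]]:
    """Pick REPLICAS distinct servers uniformly at random for each chunk."""
    return {c: rng.sample(servers, REPLICAS) for c in range(num_chunks)}


def load_per_server(placement: Dict[int, List[int]]) -> Counter:
    """Count how many chunk replicas each server ends up storing."""
    return Counter(s for replicas in placement.values() for s in replicas)


# 7 chunks (Blk 0 to Blk 6) over 4 servers, with a fixed seed for repeatability.
placement = place_chunks(7, [0, 1, 2, 3], random.Random(0))
load = load_per_server(placement)
```

Printing `load` for a few different seeds shows that the per-server counts are generally not equal, which is the imbalance the figure points at.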
25
GFS Architecture
- The storage and compute capabilities of a cluster are usually organized in two ways:
  - Co-locate storage and compute in the same node
  - Separate storage nodes from compute nodes
- The GFS master is contacted for metadata information
- The GFS master maintains a name space, along with a mapping from file name to chunks (structured naming through the master)
- The master keeps track of where a chunk is located
- The chunk servers keep an account of what they have stored
- Chunks are replicated on the server side to handle failures (no RAID, no SAN); there is no data caching, but metadata is cached
- Replicas are updated serially
[Figure: the GFS client sends (file name, chunk index) to the master and receives a contact address; it then sends (chunk id, range) to a chunk server and receives chunk data; each chunk server stores its chunks on a Linux file system]
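The read path in the figure can be mimicked with an in-memory sketch. It is a toy model, not the GFS API: the classes `Master` and `ChunkServer`, the function `client_read`, the 4-byte chunk size, the file name, and the server addresses are all assumptions made for illustration, and reads are assumed to stay within the file.

```python
from typing import Dict, List, Tuple

CHUNK_SIZE = 4  # bytes; deliberately tiny for the demo


class ChunkServer:
    """Stores chunk contents keyed by chunk id."""

    def __init__(self) -> None:
        self.chunks: Dict[int, bytes] = {}

    def read(self, chunk_id: int, start: int, end: int) -> bytes:
        return self.chunks[chunk_id][start:end]


class Master:
    """Holds metadata only: file name -> chunk ids, chunk id -> locations."""

    def __init__(self) -> None:
        self.file_to_chunks: Dict[str, List[int]] = {}
        self.chunk_locations: Dict[int, List[str]] = {}

    def lookup(self, file_name: str, chunk_idx: int) -> Tuple[int, List[str]]:
        chunk_id = self.file_to_chunks[file_name][chunk_idx]
        return chunk_id, self.chunk_locations[chunk_id]


def client_read(master: Master, servers: Dict[str, ChunkServer],
                file_name: str, offset: int, length: int) -> bytes:
    """Ask the master where each chunk lives, then fetch data from a replica."""
    data = b""
    while length > 0:
        chunk_idx, start = divmod(offset, CHUNK_SIZE)
        end = min(CHUNK_SIZE, start + length)
        chunk_id, addresses = master.lookup(file_name, chunk_idx)
        piece = servers[addresses[0]].read(chunk_id, start, end)
        if not piece:
            break  # short chunk: nothing more to read
        data += piece
        offset += len(piece)
        length -= len(piece)
    return data


master = Master()
master.file_to_chunks["/logs/a"] = [101, 102]
master.chunk_locations = {101: ["cs1", "cs2"], 102: ["cs2", "cs3"]}
servers = {"cs1": ChunkServer(), "cs2": ChunkServer(), "cs3": ChunkServer()}
servers["cs1"].chunks[101] = b"abcd"
servers["cs2"].chunks[101] = b"abcd"
servers["cs2"].chunks[102] = b"efgh"
servers["cs3"].chunks[102] = b"efgh"

result = client_read(master, servers, "/logs/a", 2, 4)
```

Note that file data never flows through the master in this sketch; the master only answers metadata questions, which matches the division of labor described on the slide.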
26
PVFS Data Distribution Policy
- Parallel Virtual File System (PVFS) is a scalable DFS for (scientific) data-intensive applications
- PVFS divides large files into multiple pieces called stripe units (by default 64 KB) and stores them on different data servers
- This design is referred to as an object-based design
- Unlike the block-based design of GFS, PVFS stores an object (or a handle) as a file that includes all the stripe units at a data server
- PVFS distributes stripe units across cluster data servers using a round-robin policy
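A minimal sketch of how a byte offset in the logical file could map to a server and to a position inside that server's object file, assuming simple round-robin striping with the default 64 KB stripe unit. The function `locate` and the zero-based server numbering are assumptions for illustration, not PVFS code.

```python
from typing import Tuple

STRIPE_UNIT = 64 * 1024  # default stripe unit size from the slide


def locate(byte_offset: int, num_servers: int,
           stripe_unit: int = STRIPE_UNIT) -> Tuple[int, int]:
    """Return (server index, offset inside that server's object file)."""
    unit, within = divmod(byte_offset, stripe_unit)
    server = unit % num_servers        # round-robin over servers
    local_unit = unit // num_servers   # units this server already holds
    return server, local_unit * stripe_unit + within


first = locate(0, 4)
second_unit = locate(64 * 1024, 4)
wrapped = locate(4 * 64 * 1024 + 10, 4)
```

Because each server keeps all of its stripe units in one object file, the fifth stripe unit of the logical file lands right after the first one inside Server 0's object.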
27
PVFS Distribution and Replication Policies
[Figure: blocks Blk 0 to Blk 6 written by Server 0, with copies distributed across Servers 0 to 3]
- Blocks are also replicated for “performance” and “fault-tolerance” reasons!
- The resulting layout is load balanced
29
PVFS Architecture
- The storage and compute capabilities of a cluster are organized in two ways:
  - Co-locate storage and compute in the same node
  - Separate storage nodes from compute nodes
[Figure: compute nodes and I/O nodes connected through a network; the PVFS metadata manager provides the naming service]
31
Ivy
- Fully symmetric organizations that are based on peer-to-peer technology also exist
- Most current proposals use a DHT-based system for distributing data, combined with a key-based lookup mechanism
- As an example, Ivy is a distributed file system that is built using the Chord DHT-based system
- Data storage in Ivy is realized through a block-oriented distributed storage layer called DHash
32
Ivy Architecture
Ivy consists of 3 separate layers:
- File system layer (Ivy)
- Block-oriented storage (DHash)
- DHT layer (Chord)
Naming is provided through Chord!
[Figure: three nodes, each running the Ivy, DHash, and Chord layers; one of them is the node where a file system is rooted]
33
Ivy
- Ivy implements NFS-like semantics
- To increase “availability” and improve “performance”, Ivy:
  - Replicates every block B to the k immediate successors of the server responsible for storing B
  - Caches looked-up blocks along the route that the lookup request followed
- Ivy uses two kinds of data blocks:
  - Content-hash blocks: the block’s key is computed as the secure hash of the block’s content
  - Public-key blocks: the block’s lookup key is a public key, and its content has been signed with the associated private key
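Content-hash keys and successor-based replication can be sketched as follows. This is an illustration under assumptions, not Ivy or DHash code: SHA-1 is used as the secure hash, node identifiers are plain integers on a ring, and the function names are hypothetical.

```python
import hashlib
from bisect import bisect_left
from typing import List


def content_hash_key(content: bytes) -> int:
    """Key of a content-hash block: the secure hash of its content."""
    return int.from_bytes(hashlib.sha1(content).digest(), "big")


def verify_block(key: int, content: bytes) -> bool:
    """A fetched block is valid only if its content hashes to the key."""
    return content_hash_key(content) == key


def replica_nodes(key: int, node_ids: List[int], k: int) -> List[int]:
    """Responsible node (first id >= key, wrapping) plus its k successors."""
    ring = sorted(node_ids)
    start = bisect_left(ring, key) % len(ring)
    count = min(k + 1, len(ring))
    return [ring[(start + i) % len(ring)] for i in range(count)]


block = b"directory entry for mbox"
key = content_hash_key(block)

holders = replica_nodes(25, [40, 10, 30, 20], k=2)
wrapped_holders = replica_nodes(45, [40, 10, 30, 20], k=2)
```

One consequence of content-hash keys is that a client can check a block it received from an untrusted peer simply by rehashing it, as `verify_block` does.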
35
Stateless Processes
- Cooperating processes in DFSs are usually the storage servers and file manager(s)
- The most important aspect concerning DFS processes is whether they should be stateless or stateful
- Stateless processes:
  - Do not require that servers maintain any client state
  - After a server crashes, there is no need to enter a recovery phase to bring the server to a previous state
  - Locking a file cannot be easily done
  - Example: NFSv3
36
Stateful Processes
Stateful processes:
- Require that a server maintain some client state
- Allow clients to make effective use of caches, which requires an efficient cache consistency protocol
- Provide servers with the ability to support callbacks (i.e., the ability to do an RPC to a client) in order to keep track of their clients
- Example: NFSv4
Cache consistency protocols often work best in collaboration with a server that maintains some information on files as used by its clients. For example, a server may associate a lease with each file it hands out to a client, promising to give the client exclusive read and write access until the lease expires or is refreshed.
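The lease idea can be sketched with a tiny stateful server. This is only an illustration: the `LeaseServer` class, the explicit `now` argument (used instead of a real clock to keep the example deterministic), and the 10-second lease length are assumptions, not part of NFSv4.

```python
from typing import Dict, Tuple


class LeaseServer:
    """Hypothetical server that tracks one exclusive lease per file."""

    def __init__(self, lease_seconds: float) -> None:
        self.lease_seconds = lease_seconds
        # path -> (client holding the lease, expiry time)
        self._leases: Dict[str, Tuple[str, float]] = {}

    def acquire(self, client: str, path: str, now: float) -> bool:
        """Grant or refresh a lease; refuse if another client holds a live one."""
        holder = self._leases.get(path)
        if holder is not None:
            owner, expires_at = holder
            if owner != client and now < expires_at:
                return False
        self._leases[path] = (client, now + self.lease_seconds)
        return True


server = LeaseServer(lease_seconds=10)
a_first = server.acquire("A", "/f", now=0)       # A gets the lease until t=10
b_blocked = server.acquire("B", "/f", now=5)     # still held by A
a_refresh = server.acquire("A", "/f", now=8)     # A refreshes until t=18
b_still_blocked = server.acquire("B", "/f", now=12)
b_after_expiry = server.acquire("B", "/f", now=18)
```

The per-file lease table is exactly the kind of client state a stateless server would not keep, and it would have to be recovered or timed out after a server crash.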
38
Communication in DFSs
- Communication in DFSs is mainly based on remote procedure calls (RPCs)
- The main reason for choosing RPC is to make the system independent of underlying OSs, networks, and transport protocols
- GFS uses RPC and may break a read into multiple RPCs to increase parallelism
- PVFS currently uses TCP for all its internal communication
- In NFS, all communication between a client and server uses Open Network Computing RPC (ONC RPC)
39
RPCs in NFS
- Before NFSv4, the client was made responsible for making the server’s life as easy as possible by keeping requests simple
- The drawback becomes apparent when considering the use of NFS in a wide-area system
- In that case, the extra latency of a second RPC leads to performance degradation
- To circumvent this problem, NFSv4 supports compound procedures
[Figure, first timeline: the client sends a Lookup RPC (the server looks up the name) and then a separate Read RPC (the server reads the file data). Second timeline: the client sends a single compound request containing Lookup, Open, and Read; the server looks up the name, opens the file, and reads the file data]
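The saving from compound procedures can be illustrated by counting round trips against a toy server. This is a sketch, not the NFS protocol: `FakeNfsServer`, its operation names, and the file name `mbox` are assumptions, and stopping a compound request at the first failing operation is modeled in a simplified way.

```python
from typing import Any, Dict, List, Optional, Tuple


class FakeNfsServer:
    """Toy server that counts how many requests (round trips) it receives."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self.files = files
        self.round_trips = 0
        self._current: Optional[str] = None  # file selected by the last lookup

    def _dispatch(self, op: str, args: Tuple[Any, ...]) -> Any:
        if op == "lookup":
            (name,) = args
            if name not in self.files:
                raise FileNotFoundError(name)
            self._current = name
            return "ok"
        if op == "open":
            return "ok"
        if op == "read":
            if self._current is None:
                raise ValueError("no file selected")
            return self.files[self._current]
        raise ValueError("unknown operation: " + op)

    def call(self, op: str, *args: Any) -> Any:
        """One operation per request: one round trip each."""
        self.round_trips += 1
        return self._dispatch(op, args)

    def compound(self, ops: List[Tuple[str, Tuple[Any, ...]]]) -> List[Any]:
        """Several operations in one request; stop at the first failure."""
        self.round_trips += 1
        results: List[Any] = []
        for op, args in ops:
            try:
                results.append(self._dispatch(op, args))
            except (FileNotFoundError, ValueError) as exc:
                results.append(exc)
                break
        return results


separate = FakeNfsServer({"mbox": b"hello"})
separate.call("lookup", "mbox")
separate_data = separate.call("read")

combined = FakeNfsServer({"mbox": b"hello"})
combined_results = combined.compound(
    [("lookup", ("mbox",)), ("open", ()), ("read", ())]
)
failed = combined.compound([("lookup", ("missing",)), ("read", ())])
```

Here the separate calls cost two round trips while the compound request costs one, which is where the wide-area latency saving comes from.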
41
Unix Semantics In Single Processor Systems
Synchronization for file systems would not be an issue if files were not shared
When two or more users share the same file at the same time, it is necessary to define the semantics of reading and writing
In single processor systems, a read operation after a write will return the value just written
Such a model is referred to as Unix Semantics
[Figure: Single machine. The original file contains "ab"; Process A writes "c", so the file becomes "abc"; Process B then reads and gets "abc"]
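The read-after-write behavior described above can be observed on a local file system. The following is a minimal sketch, assuming a POSIX-like system; it uses two file descriptors inside one process to stand in for Process A and Process B, which is a simplification of the slide's two-process picture.

```python
import os
import tempfile

# Create the "original file" containing "ab".
fd, path = tempfile.mkstemp()
os.write(fd, b"ab")

# Two independent descriptors stand in for Process A (writer) and Process B (reader).
writer = os.open(path, os.O_WRONLY | os.O_APPEND)
reader = os.open(path, os.O_RDONLY)

os.write(writer, b"c")      # Process A writes "c"
data = os.read(reader, 16)  # Process B reads after the write

for descriptor in (fd, writer, reader):
    os.close(descriptor)
os.remove(path)

print(data)
```

The low-level `os.write` and `os.read` calls are used on purpose: they bypass Python's user-space buffering, so the reader sees whatever the file system holds at the time of the call.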
42
Unix Semantics In DFSs
In a DFS, Unix semantics can be achieved easily if there is only one file server and clients do not cache files
Hence, all reads and writes go directly to the file server, which processes them strictly sequentially
This approach provides Unix semantics; however, performance might degrade as all file requests must go to a single server
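As a rough illustration of the single-server approach, the sketch below models a file server that handles every request under one lock, so requests are processed strictly one at a time. The class name `SerialFileServer` and its append-style `write` are assumptions made for this example, not part of any real DFS.

```python
import threading

class SerialFileServer:
    """Hypothetical file server: one lock serializes every read and write."""

    def __init__(self):
        self._files = {}
        self._lock = threading.Lock()

    def write(self, name, data):
        # Appends data to the named file; only one request runs at a time.
        with self._lock:
            self._files[name] = self._files.get(name, "") + data

    def read(self, name):
        # Always returns the result of the most recently completed write.
        with self._lock:
            return self._files.get(name, "")

server = SerialFileServer()
server.write("f", "ab")
server.write("f", "c")
result = server.read("f")
print(result)
```

The single lock is also where the performance concern comes from: every client request waits on the same server and the same critical section.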
43
Caching and Unix Semantics
The performance of a DFS with a single file server and Unix semantics can be improved through caching
However, if a client locally modifies a cached file and shortly afterwards another client reads the file from the server, the second client will get an obsolete file
[Figure: Client Machine #1, File Server, Client Machine #2. 1. Process A on Client Machine #1 reads "ab" into its cache. 2. Process A writes "c", so its cached copy becomes "abc". 3. Process B on Client Machine #2 reads from the server and gets "ab"]
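The stale read in this scenario can be reproduced with a small in-memory model. This is only a sketch: `SimpleServer`, `CachingClient`, and `append_local` are hypothetical names, and the client deliberately never writes its cache back.

```python
class SimpleServer:
    """Hypothetical server holding the authoritative copy of each file."""

    def __init__(self):
        self.files = {}

class CachingClient:
    """Hypothetical client that caches files and modifies only its cache."""

    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read(self, name):
        # Fetch from the server only on a cache miss.
        if name not in self.cache:
            self.cache[name] = self.server.files[name]
        return self.cache[name]

    def append_local(self, name, data):
        # The change stays in the local cache; the server is not told.
        self.cache[name] = self.read(name) + data

server = SimpleServer()
server.files["f"] = "ab"
client_1 = CachingClient(server)
client_2 = CachingClient(server)

client_1.read("f")               # 1. Process A reads "ab"
client_1.append_local("f", "c")  # 2. Process A writes "c" to its cached copy
stale = client_2.read("f")       # 3. Process B reads from the server
print(client_1.read("f"), stale)
```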
44
Session Semantics (1)
One way to avoid reading an obsolete file is to propagate all changes to cached files back to the server immediately
Implementing such an approach is cumbersome
An alternative solution is to relax the semantics of file sharing
Session Semantics: Changes to an open file are initially visible only to the process that modified the file. Only when the file is closed are the changes made visible to other processes.
45
Session Semantics (2)
Using session semantics raises the question of what happens if two or more clients are simultaneously caching and modifying the same file
One solution is to say that as each file is closed in turn, its value is sent back to the server
The final result depends on whose close request is most recently processed by the server
A less pleasant solution, but easier to implement, is to say that the final result is one of the candidates and leave the choice of the candidate unspecified
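The first solution (the last processed close wins) might be sketched as follows. `SessionServer` and `Session` are hypothetical names; each session works on a private copy taken at open time and sends the whole file back on close.

```python
class SessionServer:
    """Hypothetical server storing the last version written back on close."""

    def __init__(self):
        self.files = {}

class Session:
    """Hypothetical open-file session with a private working copy."""

    def __init__(self, server, name):
        self.server = server
        self.name = name
        self.buffer = server.files.get(name, "")  # private copy taken at open

    def write(self, data):
        self.buffer += data  # visible only to this session

    def close(self):
        self.server.files[self.name] = self.buffer  # published on close

server = SessionServer()
server.files["f"] = "ab"

session_1 = Session(server, "f")
session_2 = Session(server, "f")
session_1.write("c")
session_2.write("d")

before_close = server.files["f"]  # neither change is visible yet
session_1.close()
session_2.close()                 # processed last, so its version wins
final = server.files["f"]
print(before_close, final)
```

Note that the update made by the first session is lost entirely: the server keeps whichever whole-file copy arrives last rather than merging the two.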
46
Immutable Semantics
A different approach to the semantics of file sharing in DFSs is to make all files immutable
With immutable semantics there is no way to open a file for writing
What is possible is to create an entirely new file
Hence, the problem of how to deal with two processes, one writing and the other reading, just disappears
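A minimal sketch of immutable semantics is shown below, assuming a hypothetical `ImmutableStore` that offers only `create` and `read`. The versioned file names (`f.v1`, `f.v2`) are an illustrative convention chosen for this example, not something the slides prescribe.

```python
class ImmutableStore:
    """Hypothetical store: files can be created and read, never modified."""

    def __init__(self):
        self._files = {}

    def create(self, name, data):
        # An existing file can never be overwritten.
        if name in self._files:
            raise FileExistsError(name)
        self._files[name] = data

    def read(self, name):
        return self._files[name]

store = ImmutableStore()
store.create("f.v1", "ab")

# "Updating" means creating an entirely new file from the old contents.
store.create("f.v2", store.read("f.v1") + "c")

print(store.read("f.v1"), store.read("f.v2"))
```

A reader holding `f.v1` is never disturbed by the writer, because the writer's output lives in a different file.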
47
Atomic Transactions
A different approach to the semantics of file sharing in DFSs is to use atomic transactions where all changes occur atomically
A key property is that all calls contained in a transaction will be carried out in order
1. A process first executes some type of BEGIN_TRANSACTION primitive to signal that what follows must be executed indivisibly
2. Then come system calls to read and write one or more files
3. When done, an END_TRANSACTION primitive is executed
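The three steps above can be sketched with a toy in-memory store. `TransactionalStore` is a hypothetical class: its `begin_transaction` and `end_transaction` methods mirror the BEGIN_TRANSACTION and END_TRANSACTION primitives, while `abort` is an assumption added for the example. The sketch supports one transaction at a time and does not address concurrency control between clients.

```python
class TransactionalStore:
    """Hypothetical store where writes become visible only at commit."""

    def __init__(self):
        self.files = {}
        self._pending = None

    def begin_transaction(self):
        # Work on a private copy of the committed state.
        self._pending = dict(self.files)

    def read(self, name):
        source = self.files if self._pending is None else self._pending
        return source.get(name, "")

    def write(self, name, data):
        if self._pending is None:
            raise RuntimeError("no open transaction")
        self._pending[name] = data

    def end_transaction(self):
        # Publish all changes at once by swapping in the private copy.
        self.files = self._pending
        self._pending = None

    def abort(self):
        # Discard every change made since begin_transaction.
        self._pending = None

store = TransactionalStore()
store.files = {"a": "100", "b": "0"}

store.begin_transaction()
store.write("a", "70")
store.write("b", "30")
visible_during = dict(store.files)  # committed state is still unchanged
store.end_transaction()
visible_after = dict(store.files)
print(visible_during, visible_after)
```

Other processes reading `store.files` see either both writes or neither, which is the all-or-nothing behavior the slide describes.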
48
Semantics of File Sharing: Summary
There are four ways of dealing with shared files in a DFS:
Method: Comment
UNIX Semantics: Every operation on a file is instantly visible to all processes
Session Semantics: No changes are visible to other processes until the file is closed
Immutable Files: No updates are possible; simplifies sharing and replication
Transactions: All changes occur atomically
49
Now… Final Exam Overview