11.6 Distributed File Systems: Consistency and Replication
Xiaolong Wu
Instructor: Dr. Yanqing Zhang
Advanced Operating System

Outline
Client-Side Caching
Server-Side Replication
Replication in P2P Systems
Latest relevant knowledge (scalability)
Future work

Caching in a DFS
Caching in any DFS reduces access delays caused by disk access times or network latency. Caches can be located in the main memory of either the server or the client, and/or on the client's disk. Client-side caching (memory or disk) offers the most performance benefit, but it also leads to potential inconsistencies. However, because file sharing is relatively rare in practice, client-side caching remains a popular way to improve performance in a DFS.

Caching in NFS
NFSv3 did not define a caching protocol; individual implementations made their own decisions. “Stale” data could exist for periods ranging from a few seconds to half a minute. NFSv4 made some improvements, but many details are still implementation dependent. The general structure of the NFS cache model follows.

What Do Clients Cache?
Clients cache file data blocks, file handles (for future reference), and directories. There are two approaches to caching in NFS: caching with server control, and caching with open delegation.

Caching Data with Server Control
The simplest approach to caching allows the server to retain control over the file. Procedure: (1) the client opens the file; (2) data blocks are transferred to the client by read operations; (3) the client reads and writes data in the cache; (4) when the file is closed, changes are flushed back to the server. If a new client on the same machine opens the file after it has been closed, the client cache manager usually must validate the locally cached data with the server and, if the data is stale, replace it.
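The open/flush/validate procedure can be sketched as a toy model. This is only an illustration under stated assumptions: the `Server` and `ClientCache` classes and the per-file version counter are hypothetical, and real NFS clients typically validate by comparing file attributes rather than an explicit version number.

```python
class Server:
    """Toy file server: holds file contents and a version counter per file."""

    def __init__(self):
        self.data = {}
        self.version = {}

    def read(self, name):
        return self.data.get(name, b""), self.version.get(name, 0)

    def write(self, name, content):
        self.data[name] = content
        self.version[name] = self.version.get(name, 0) + 1
        return self.version[name]

    def current_version(self, name):
        return self.version.get(name, 0)


class ClientCache:
    """Hypothetical client cache: validates on open, flushes on close."""

    def __init__(self, server):
        self.server = server
        self.cached = {}    # name -> (content, version)
        self.dirty = set()  # names with unflushed local writes

    def open(self, name):
        entry = self.cached.get(name)
        # Fetch if missing, or replace if the server has a newer version.
        if entry is None or entry[1] != self.server.current_version(name):
            self.cached[name] = self.server.read(name)

    def read(self, name):
        return self.cached[name][0]

    def write(self, name, content):
        _, version = self.cached[name]
        self.cached[name] = (content, version)
        self.dirty.add(name)

    def close(self, name):
        # Flush local changes back to the server when the file is closed.
        if name in self.dirty:
            content, _ = self.cached[name]
            new_version = self.server.write(name, content)
            self.cached[name] = (content, new_version)
            self.dirty.discard(name)


def demo():
    server = Server()
    server.write("f", b"v1")
    a, b = ClientCache(server), ClientCache(server)
    a.open("f")
    b.open("f")
    b.close("f")
    a.write("f", b"v2")
    a.close("f")           # changes reach the server only now
    stale = b.read("f")    # b's cached copy is unchanged until it reopens
    b.open("f")            # validation on open replaces the stale copy
    return stale, b.read("f")
```

In this sketch `demo()` shows the window in which the second client still reads the old contents, which is the kind of inconsistency the slides attribute to client-side caching.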

Caching With Open Delegation
Open delegation allows a client machine to handle some local open and close operations from other clients on the same machine. Normally the server decides whether a client can open a file; delegation can improve performance by limiting contact with the server. The client machine gets a copy of the entire file, not just certain blocks.

Open Delegation: Examples
Suppose a client machine has opened a file for writing and has been delegated the right to control the file locally. If another local client tries to lock the file, the local machine can decide whether or not to grant the lock. If a remote client tries to lock the file (at the server), the server will deny access. If a client has opened the file for reading only, local clients that want write privileges must still contact the server.

Delegation and Callbacks
The server may need to “undelegate” (recall) the file, for example when another client needs to obtain access. This is done with a callback, which is essentially an RPC from server to client. Callbacks require the server to maintain state (knowledge) about its clients, which is one reason NFSv4 is stateful.

Caching Attributes
Clients can cache attributes (file size, number of links, last modification date, etc.) as well as data. Cached attributes are kept consistent by the client, if at all: there is no guarantee that the same file cached at two sites will have the same attributes at both. Attribute modifications should be written through to the server (a write-through cache coherence policy), although there is no requirement to do so.

Leases
Lease: cached data is automatically invalidated after a certain period of time. Leases apply to file attributes, file handles (the mapping of name to file handle), directories, and sometimes data. When a lease expires, the client must renew the data from the server. Leases help with consistency and protect against errors.
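A lease can be illustrated with a small cache whose entries expire after a fixed period. This is a minimal sketch, not NFS code: `LeasedCache`, the injected `clock` function, and the 30-second lease are all assumptions chosen for the example.

```python
class LeasedCache:
    """Hypothetical cache whose entries expire after a fixed lease (seconds)."""

    def __init__(self, fetch, lease_seconds, clock):
        self.fetch = fetch                  # function that asks the server
        self.lease_seconds = lease_seconds
        self.clock = clock                  # injected so the demo is deterministic
        self.entries = {}                   # key -> (value, expiry time)

    def get(self, key):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]                 # lease still valid: use cached value
        value = self.fetch(key)             # lease expired or entry missing
        self.entries[key] = (value, now + self.lease_seconds)
        return value


def demo():
    server_attrs = {"f": 100}               # e.g. a file size held by the server
    fetches = []
    now = [0.0]

    def fetch(key):
        fetches.append(key)
        return server_attrs[key]

    cache = LeasedCache(fetch, lease_seconds=30.0, clock=lambda: now[0])
    first = cache.get("f")                  # initial fetch from the server
    server_attrs["f"] = 200                 # attribute changes at the server
    now[0] = 10.0
    second = cache.get("f")                 # lease valid: stale value returned
    now[0] = 31.0
    third = cache.get("f")                  # lease expired: value renewed
    return first, second, third, len(fetches)
```

The lease bounds how long a stale value can be served: in the demo the outdated attribute is visible at most until the 30-second lease runs out.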

Coda: A Prototype Distributed File System
Developed at CMU by M. Satyanarayanan. Started in 1987 as an improvement on the Andrew File System (a classic research file system). Andrew strongly influenced NFSv4 and some versions of Linux. The most recent version of Coda (6.9.4) was released 1/05/2009 (http://www.coda.cs.cmu.edu/news.html).

Client-side Caching in Coda
Client-side caching is critical because of Coda's objectives: caching achieves scalability and provides more fault tolerance for the client in case it is disconnected from the server. When a client opens a file, the entire file is downloaded. This is true for both reads and writes.

Coda Callbacks
Callback promise: the server's commitment to notify the client when a file changes. Callback break: notice from the server that the client's copy of the file is stale; it is called a “break” because it terminates the agreement. There will be no further callbacks unless the client renews the promise.

Callbacks and Consistency
A Coda callback is an agreement between the server and a client. The server agrees to notify the client when a file has been modified by another client, closed, and written back to the server. At that point the client may purge the file from its cache, but it may also continue reading the outdated copy. This is a blend of session and transaction semantics.
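The promise/break cycle can be sketched as follows. This is a toy model rather than Coda's actual interface: the class and method names are hypothetical, and letting the writer keep its own promise after a store is an assumption of the sketch.

```python
class CodaServer:
    """Toy server that records callback promises per file."""

    def __init__(self):
        self.files = {}
        self.promises = {}  # name -> set of clients holding a promise

    def fetch(self, client, name):
        # Handing out a copy establishes (or renews) a callback promise.
        self.promises.setdefault(name, set()).add(client)
        return self.files.get(name, b"")

    def store(self, writer, name, content):
        self.files[name] = content
        # Callback break: tell every other holder that its copy is stale.
        for client in self.promises.get(name, set()) - {writer}:
            client.callback_break(name)
        # Assumption: only the writer still holds a valid promise afterwards.
        self.promises[name] = {writer}


class CodaClient:
    """Toy client that caches whole files."""

    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.valid = set()  # files with an outstanding callback promise

    def open(self, name):
        if name not in self.valid:
            self.cache[name] = self.server.fetch(self, name)
            self.valid.add(name)
        return self.cache[name]

    def close_after_write(self, name, content):
        self.cache[name] = content
        self.server.store(self, name, content)
        self.valid.add(name)

    def callback_break(self, name):
        # The promise ends; the old copy stays in the cache until reopened.
        self.valid.discard(name)


def demo():
    server = CodaServer()
    server.files["f"] = b"old"
    a, b = CodaClient(server), CodaClient(server)
    a.open("f")
    b.open("f")
    b.close_after_write("f", b"new")  # triggers a callback break at a
    still_cached = a.cache["f"]       # a may keep reading its session copy
    reopened = a.open("f")            # a new open fetches the fresh file
    return still_cached, reopened
```

The demo mirrors the slide: after the break, the first client can still read its outdated copy, and only a new open brings in the updated file.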

11.6.2: Server-Side Replication
Caching is replication at the client side: it is initiated implicitly by a client request, cached data is temporary, the unit of caching is (usually) a file or less, and its purpose is improved performance. Server replication is mainly for fault tolerance and availability, may actually degrade performance (overhead), and its replicated data is permanent.

Caching & Replication in Coda
The unit of replication is a volume (a group of related files). Each volume is stored on several servers, its Volume Storage Group (VSG). The Accessible Volume Storage Group (AVSG) is the set of those servers a client can actually reach. To keep volume replicas consistent, Coda uses a variant of Read-One, Write-All (ROWA): a client contacts one server to get permission to read or write, and contacts all servers in its AVSG when closing an updated file.
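A minimal sketch of read-one, write-all over the reachable servers is shown here. The `Replica` class and the two helper functions are hypothetical, and the model ignores permissions and failures during a write.

```python
class Replica:
    """Toy replica server holding one volume as a name -> content map."""

    def __init__(self):
        self.files = {}


def rowa_read(avsg, name):
    # Read one: any single reachable replica is enough.
    return avsg[0].files[name]


def rowa_write(avsg, name, content):
    # Write all: every replica the client can reach receives the update.
    for replica in avsg:
        replica.files[name] = content


def demo():
    s1, s2, s3 = Replica(), Replica(), Replica()
    rowa_write([s1, s2, s3], "f", "v1")   # AVSG equals the full VSG
    rowa_write([s1, s2], "f", "v2")       # S3 is unreachable for this client
    return rowa_read([s1, s2], "f"), rowa_read([s3], "f")
```

When the AVSG is smaller than the VSG, the write reaches only part of the group, so replicas can diverge.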

Problem
Consider a volume that is replicated across three servers S1, S2, and S3. Client A's AVSG covers S1 and S2, while client B has access only to server S3. Both A and B will be allowed to open a file f for writing, update their respective copies, and transfer those copies back to the members of their AVSGs. As a result, different versions of f will be stored in the VSG.

Writing in Disconnected Systems
Each file has a Coda version vector (CVV), analogous to a vector timestamp, with one component per server. The vector starts at [1, 1, 1]. After a file is updated, each server that receives the update increments the corresponding components of its vector. As long as all servers receive all updates, all version vectors remain equal.

Detecting Inconsistencies
In the previous example, both A and B are allowed to open the file for writing. When A closes the file, it updates S1 and S2 but not S3; B updates S3 but not S1 and S2. The version vector at S1 and S2 will be [2, 2, 1], and the version vector at S3 will be [1, 1, 2]. Neither vector dominates the other, so the inconsistency is easy to detect, but how to resolve it is application dependent.
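The example can be reproduced with a short sketch of version-vector comparison. The function names are hypothetical, and the update rule (each server in the AVSG increments the components of all AVSG members) is an assumption chosen so that the vectors match the values given in the example.

```python
def apply_update(vectors, avsg):
    """Record one update that reached exactly the servers listed in avsg.

    vectors[s] is the version vector stored at server s (0-based index).
    """
    for s in avsg:
        for t in avsg:
            vectors[s][t] += 1


def compare(v, w):
    """Return 'equal', 'dominates', 'dominated', or 'conflict'."""
    ge = all(a >= b for a, b in zip(v, w))
    le = all(a <= b for a, b in zip(v, w))
    if ge and le:
        return "equal"
    if ge:
        return "dominates"
    if le:
        return "dominated"
    return "conflict"   # each vector is ahead in some component


def demo():
    vectors = [[1, 1, 1] for _ in range(3)]   # S1, S2, S3 start identical
    apply_update(vectors, avsg=[0, 1])        # client A reaches S1 and S2
    apply_update(vectors, avsg=[2])           # client B reaches only S3
    return (vectors,
            compare(vectors[0], vectors[1]),
            compare(vectors[0], vectors[2]))
```

A "dominates" or "dominated" result means one replica is simply out of date and can be brought forward; only "conflict" needs application-specific resolution.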

Replication in P2P Systems
In P2P systems replication is more important because members are less reliable: they may leave the system or remove files. Load balancing is important since there are no designated servers. File usage in P2P systems is also different: most files are read only and updates consist of adding new files, so consistency is less of an issue.

Unstructured P2P Systems
In an unstructured system each node knows n neighbors, and look-up amounts to search (in structured systems, look-up is directed by some algorithm). Replication speeds up the search. The question is how to allocate files to nodes, given that it may not be possible to force a node to store files. Options include: uniformly distributing n copies across the network; allocating more replicas to popular files; and making users who download files responsible for sharing them with others (as in BitTorrent).
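The first two allocation policies can be compared with a small sketch. The function names, the request counts, and the rule of giving every file at least one replica are assumptions for illustration only.

```python
def uniform_allocation(popularity, budget):
    """Give every file the same number of replicas."""
    share = budget // len(popularity)
    return {name: share for name in popularity}


def proportional_allocation(popularity, budget):
    """Give each file replicas in proportion to its request count.

    Every file keeps at least one replica, so the total can slightly
    exceed the budget when some files are very unpopular.
    """
    total = sum(popularity.values())
    return {name: max(1, budget * count // total)
            for name, count in popularity.items()}


# Hypothetical request counts for three files and a budget of 30 replicas.
requests = {"a": 70, "b": 20, "c": 10}
uniform = uniform_allocation(requests, 30)            # 10 replicas each
proportional = proportional_allocation(requests, 30)  # a: 21, b: 6, c: 3
```

With proportional allocation a search for the popular file is likely to hit a replica sooner, at the cost of fewer copies of rarely requested files.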

Structured P2P Systems
Replication is used primarily for load balancing. Possible approaches: store a replica at each node on the search path (this concentrates replicas near the prime copy, but may unbalance some nodes); or store replicas at the nodes that request a file, and store pointers to those replicas at nodes along the way.
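The first approach, replicating along the search path, can be sketched on a simplified ring. This is an assumption-laden toy: the `Ring` class is hypothetical, and look-ups walk from successor to successor, whereas real structured overlays route in far fewer hops.

```python
import bisect


class Ring:
    """Toy identifier ring; a key is owned by the first node with id >= key."""

    def __init__(self, node_ids):
        self.ids = sorted(node_ids)
        self.store = {n: {} for n in self.ids}

    def owner(self, key):
        i = bisect.bisect_left(self.ids, key)
        return self.ids[i % len(self.ids)]   # wrap around past the last node

    def successor(self, node):
        i = self.ids.index(node)
        return self.ids[(i + 1) % len(self.ids)]

    def put(self, key, value):
        self.store[self.owner(key)][key] = value   # prime copy

    def lookup(self, start, key, replicate_on_path=True):
        """Return (value, hops), optionally leaving replicas along the path."""
        path = [start]
        node = start
        while key not in self.store[node]:
            if node == self.owner(key):
                raise KeyError(key)          # reached the owner: key is absent
            node = self.successor(node)
            path.append(node)
        value = self.store[node][key]
        if replicate_on_path:
            for n in path:
                self.store[n][key] = value
        return value, len(path) - 1


def demo():
    ring = Ring([1, 4, 9, 12])
    ring.put(10, "data")            # prime copy lands on node 12
    first = ring.lookup(1, 10)      # walks 1 -> 4 -> 9 -> 12
    second = ring.lookup(4, 10)     # node 4 now holds a replica
    return first, second
```

The second look-up is answered immediately, which shows both the benefit and the drawback noted on the slide: replicas pile up on the nodes leading to the prime copy.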

Latest relevant knowledge: Improve scalability by metadata separation
Traditional client/server filesystems (NFS, AFS) have suffered from scalability problems due to their inherent centralization. Modern filesystems replace dumb disks with intelligent object storage devices (OSDs), which include a CPU, NIC, and cache. Clients typically interact with a metadata server (MDS) to perform metadata operations (open, rename), while communicating directly with OSDs to perform file I/O (reads and writes). A typical example is Ceph [2], which separates metadata from file I/O and employs a distributed metadata cluster to improve the scalability of metadata access.
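The split between metadata operations and file I/O can be sketched as follows. This is a generic toy model, not Ceph's API: the classes, the round-robin chunk placement, and the 4-byte chunk size are all assumptions made for the example.

```python
class OSD:
    """Toy object storage device: stores objects by id."""

    def __init__(self):
        self.objects = {}

    def write(self, oid, data):
        self.objects[oid] = data

    def read(self, oid):
        return self.objects[oid]


class MetadataServer:
    """Toy MDS: knows names and layouts, never touches file data."""

    def __init__(self, num_osds):
        self.num_osds = num_osds
        self.layouts = {}   # path -> list of (osd index, object id)

    def create(self, path, num_chunks):
        # Assumed placement policy: spread chunks round-robin over the OSDs.
        self.layouts[path] = [(i % self.num_osds, f"{path}#{i}")
                              for i in range(num_chunks)]
        return self.layouts[path]

    def open(self, path):
        return self.layouts[path]

    def rename(self, old, new):
        self.layouts[new] = self.layouts.pop(old)   # metadata-only operation


class Client:
    def __init__(self, mds, osds):
        self.mds = mds
        self.osds = osds

    def write_file(self, path, data, chunk_size=4):
        chunks = [data[i:i + chunk_size]
                  for i in range(0, len(data), chunk_size)]
        layout = self.mds.create(path, len(chunks))   # one metadata request
        for (osd, oid), chunk in zip(layout, chunks):
            self.osds[osd].write(oid, chunk)          # data goes straight to OSDs

    def read_file(self, path):
        layout = self.mds.open(path)                  # one metadata request
        return b"".join(self.osds[osd].read(oid) for osd, oid in layout)


def demo():
    osds = [OSD(), OSD()]
    client = Client(MetadataServer(num_osds=2), osds)
    client.write_file("/a", b"hello world")
    client.mds.rename("/a", "/b")                     # no OSD is contacted
    return client.read_file("/b"), [len(o.objects) for o in osds]
```

Because the MDS handles only small metadata requests, bulk reads and writes scale with the number of OSDs rather than with a single server.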

Latest relevant knowledge: Improve scalability by metadata separation
Existing filesystems store file metadata on a single server or on a shared disk to ensure consistency. CalvinFS [3] leverages a high-throughput distributed database system for metadata management in order to build a replicated, scalable filesystem. File metadata can be partitioned and replicated across a cluster of independent servers, and operations are transformed into distributed transactions.

Future work
Dynamically adjust the level of replication for individual objects based on workload. Support application QoS based on bandwidth and latency guarantees, to help balance replication and recovery operations against the regular workload.

References
[1] Coda. Coda release. http://www.coda.cs.cmu.edu/news.html
[2] S. Weil, S. Brandt, E. Miller, D. Long, and C. Maltzahn. “Ceph: A scalable, high-performance distributed file system.” In Proc. 7th Symp. OSDI (2006), pp. 307–320. USENIX Association.
[3] A. Thomson and D. J. Abadi. “CalvinFS: Consistent WAN replication and scalable metadata management for distributed file systems.” In Proceedings of the 13th USENIX Conference on File and Storage Technologies (FAST), 2015.
[4] L. Xiao, K. Ren, Q. Zheng, and G. A. Gibson. “ShardFS vs. IndexFS: Replication vs. caching strategies for distributed metadata management in cloud storage systems.” In SoCC, ACM, 2015, pp. 236–249.

Thank you!
