CS 5204 (FALL 2005)1 Leases: An Efficient Fault Tolerant Mechanism for Distributed File Cache Consistency Gray and Cheriton By Farid Merchant Date: 9/21/05.

Slides:



Advertisements
Similar presentations
COMP 655: Distributed/Operating Systems Summer 2011 Dr. Chunbo Chu Week 7: Consistency 4/13/20151Distributed Systems - COMP 655.
Advertisements

Replication. Topics r Why Replication? r System Model r Consistency Models r One approach to consistency management and dealing with failures.
Consistency and Replication Chapter 7 Part II Replica Management & Consistency Protocols.
Cache Coherent Distributed Shared Memory. Motivations Small processor count –SMP machines –Single shared memory with multiple processors interconnected.
Computer Science Lecture 14, page 1 CS677: Distributed OS Consistency and Replication Today: –Introduction –Consistency models Data-centric consistency.
1 Cheriton School of Computer Science 2 Department of Computer Science RemusDB: Transparent High Availability for Database Systems Umar Farooq Minhas 1,
CS-550: Distributed File Systems [SiS]1 Resource Management in Distributed Systems: Distributed File Systems.
Using DSVM to Implement a Distributed File System Ramon Lawrence Dept. of Computer Science
1 Lecture 12: Hardware/Software Trade-Offs Topics: COMA, Software Virtual Memory.
Distributed Systems Fall 2010 Replication Fall 20105DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
Scaling Distributed Machine Learning with the BASED ON THE PAPER AND PRESENTATION: SCALING DISTRIBUTED MACHINE LEARNING WITH THE PARAMETER SERVER – GOOGLE,
Database Replication techniques: a Three Parameter Classification Authors : Database Replication techniques: a Three Parameter Classification Authors :
Computer Science Lecture 14, page 1 CS677: Distributed OS Consistency and Replication Today: –Introduction –Consistency models Data-centric consistency.
CS 582 / CMPE 481 Distributed Systems Fault Tolerance.
2/23/2009CS50901 Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial Fred B. Schneider Presenter: Aly Farahat.
EEC-681/781 Distributed Computing Systems Lecture 3 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
Distributed Systems Fall 2009 Replication Fall 20095DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved Client-Centric.
September 24, 2007The 3 rd CSAIL Student Workshop Byzantine Fault Tolerant Cooperative Caching Raluca Ada Popa, James Cowling, Barbara Liskov Summer UROP.
©Silberschatz, Korth and Sudarshan18.1Database System Concepts Centralized Systems Run on a single computer system and do not interact with other computer.
PRASHANTHI NARAYAN NETTEM.
CSE 490dp Resource Control Robert Grimm. Problems How to access resources? –Basic usage tracking How to measure resource consumption? –Accounting How.
Distributed File Systems Concepts & Overview. Goals and Criteria Goal: present to a user a coherent, efficient, and manageable system for long-term data.
1 The Google File System Reporter: You-Wei Zhang.
1 6.4 Distribution Protocols Different ways of propagating/distributing updates to replicas, independent of the consistency model. First design issue.
70-291: MCSE Guide to Managing a Microsoft Windows Server 2003 Network Chapter 7: Domain Name System.
Distributed Systems. Interprocess Communication (IPC) Processes are either independent or cooperating – Threads provide a gray area – Cooperating processes.
Distributed File Systems
Copyright © George Coulouris, Jean Dollimore, Tim Kindberg This material is made available for private study and for direct.
Distributed File Systems Overview  A file system is an abstract data type – an abstraction of a storage device.  A distributed file system is available.
MapReduce and GFS. Introduction r To understand Google’s file system let us look at the sort of processing that needs to be done r We will look at MapReduce.
1 ACTIVE FAULT TOLERANT SYSTEM for OPEN DISTRIBUTED COMPUTING (Autonomic and Trusted Computing 2006) Giray Kömürcü.
1 Lecture 12: Hardware/Software Trade-Offs Topics: COMA, Software Virtual Memory.
 Distributed file systems having transaction facility need to support distributed transaction service.  A distributed transaction service is an extension.
Replication (1). Topics r Why Replication? r System Model r Consistency Models – How do we reason about the consistency of the “global state”? m Data-centric.
Copyright © George Coulouris, Jean Dollimore, Tim Kindberg This material is made available for private study and for direct.
Computer Science Lecture 14, page 1 CS677: Distributed OS Last Class: Concurrency Control Concurrency control –Two phase locks –Time stamps Intro to Replication.
JINI Coordination-Based System By Anthony Friel * David Kiernan * Jasper Wood.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved DISTRIBUTED SYSTEMS.
GLOBAL EDGE SOFTWERE LTD1 R EMOTE F ILE S HARING - Ardhanareesh Aradhyamath.
Distributed Systems CS Consistency and Replication – Part I Lecture 10, September 30, 2013 Mohammad Hammoud.
Replication (1). Topics r Why Replication? r System Model r Consistency Models r One approach to consistency management and dealing with failures.
What if? or Combining different ideas J. F. Pâris.
Distributed File Systems 11.2Process SaiRaj Bharath Yalamanchili.
Chapter 7: Consistency & Replication IV - REPLICATION MANAGEMENT By Jyothsna Natarajan Instructor: Prof. Yanqing Zhang Course: Advanced Operating Systems.
Hwajung Lee.  Improves reliability  Improves availability ( What good is a reliable system if it is not available?)  Replication must be transparent.
System Software Lab. A Scalable Web Cache Consistency Architecture Kim Sangyup SSLAB. EE. KAIST SIGCOMM ’ 99 Haobo Yu, Lee Breslau.
Antidio Viguria Ann Krueger A Nonblocking Quorum Consensus Protocol for Replicated Data Divyakant Agrawal and Arthur J. Bernstein Paper Presentation: Dependable.
GPFS: A Shared-Disk File System for Large Computing Clusters Frank Schmuck & Roger Haskin IBM Almaden Research Center.
Fault Tolerance (2). Topics r Reliable Group Communication.
Mutual Exclusion Algorithms. Topics r Defining mutual exclusion r A centralized approach r A distributed approach r An approach assuming an organization.
Chapter Five Distributed file systems. 2 Contents Distributed file system design Distributed file system implementation Trends in distributed file systems.
CS6320 – Performance L. Grewe.
The University of Adelaide, School of Computer Science
Consistency in Distributed Systems
Chapter 7: Consistency & Replication IV - REPLICATION MANAGEMENT -Sumanth Kandagatla Instructor: Prof. Yanqing Zhang Advanced Operating Systems (CSC 8320)
CMSC 611: Advanced Computer Architecture
Distributed Systems CS
Outline Announcements Fault Tolerance.
Outline Midterm results summary Distributed file systems – continued
Replication Improves reliability Improves availability
EECS 498 Introduction to Distributed Systems Fall 2017
Cary G. Gray David R. Cheriton Stanford University
Chapter 2: Operating-System Structures
Replica Placement Model: We consider objects (and don’t worry whether they contain just data or code, or both) Distinguish different processes: A process.
Today: Distributed File Systems
The University of Adelaide, School of Computer Science
Database System Architectures
Chapter 2: Operating-System Structures
The University of Adelaide, School of Computer Science
Presentation transcript:

CS 5204 (FALL 2005)1 Leases: An Efficient Fault Tolerant Mechanism for Distributed File Cache Consistency Gray and Cheriton By Farid Merchant Date: 9/21/05

CS 5204 (FALL 2005) 2 Problems with Distributed System Caching Caching is a mechanism to improve the speed and performance of the system by replicating copies of frequently accessed data. But with the added cost of : Complexity of ensuring consistency  With large number of clients. One client can modify data without consent of other client, leading other clients to provide stale data to its user. Complication of communication and host failures  Message lost due to failures

CS 5204 (FALL 2005) 3 Goals of the presentation Introduction to leases How leases are implemented for cache consistency An Analytical model for lease term evaluation Optimization in lease management Related work on distributed cache consistency Conclusion

CS 5204 (FALL 2005) 4 Leasing In General Term A lease is a contract that gives its holder specified rights over property for a limited period of time In Caching Term A lease is a promise by the server that it will push updates to the client as well as grants control over writes to the covered datum for a specified time

CS 5204 (FALL 2005) 5 Example: Operation of file cache using leases for read File Server Cache Client Cache File Server Client Cache File Server Lease of 10 sec grantedAfter 5 secAfter 10 sec

CS 5204 (FALL 2005) 6 Example: Operation of file cache using leases for write File Server Cache Client Cache Client

CS 5204 (FALL 2005) 7 Implementation of Leases A lease is granted to the data when fetched from the server for the first time Subsequent fetch from cache during lease Server needs approval of the leaseholder After lease expiration, it needs to be renewed to access data as well as data should be updated if modified

CS 5204 (FALL 2005) 8 Simple Analytical model Effective term at Cache t c = max(0,t s – (m prop + 2m proc ) - ε) Time to gain approval t a = 2m prop + (S + 2)m proc Symbols  Descriptions S  number of caches in which the file is shared m(prop)  propagation delay for the message m(proc)  time to process a message (send or receive) ε  allowance for uncertainty in clocks t s  lease term (at server) Server CnCn C5C5 C4C4 C3C3 C2C2 C1C1

CS 5204 (FALL 2005) 9 Simple Analytical Model (Cont..) Load at the server Average delay due to each read or write Server CnCn C5C5 C4C4 C3C3 C2C2 C1C1 S  number of caches in which the file is shared N  number of clients (caches) R  Rate of reads for each client W  Rate of writes for each client

CS 5204 (FALL 2005) 10 Simple Analytical Model (Cont..)  If where then 2NR  Load due to zero lease term  Larger values of α and R implies better performance for short term lease

CS 5204 (FALL 2005) 11 Simple Analytical Model (Cont..)

CS 5204 (FALL 2005) 12 Lease Management Request lease extension before the covered file is accessed  Improves responses time by eliminating the added delay for reads but with the cost of increased level of contention due to false sharing By using smaller number of leases to cover installed files such as one per major directory, and multicasting an extension to all clients periodically eliminates …  Need for clients to request extensions of these lease.  A lease from the multicast extension when a file covered by the lease is to be modified by the server

CS 5204 (FALL 2005) 13 Leases Management (Cont..) Need for the server to contact a large number of clients when an installed file is updated Need for the server to keep track of the leaseholder for installed files Added delay at the client cache for reads of installed files because, in the absence of writes to installed files, these lease do not expire Server can set the lease term based on the file access characteristics for the requested files as well as the propagation delay of the client

CS 5204 (FALL 2005) 14 Fault Tolerance A system continue to provide service even in the presence of faults Leases ensure consistency in spite of message loss and client or server failures, provided that the hosts and network do not suffer certain Byzantine failures including clock failure Leases depends on well behaved clock that have a known bound drift and any related failure is detected either by synchronization protocol or by including explicit timestamps in lease related messages

CS 5204 (FALL 2005) 15 Related Work on Distributed Systems cache consistency Most of the pervious work for consistency have used zero lease term and infinite lease term zero lease term: no guarantee of updating data after data is leased to the client infinite lease term: Always guarantees of updating Overall Consistency work with caching shared memory systems has ignored the problem of communication and cache failures to date.

CS 5204 (FALL 2005) 16 Related Work problems of Distributed Systems cache consistency (Cont..) Jini : A coordinate based distributed system  Management of reference objects is done by reference object itself to form a reference list.  When a process writes a tuple to a JavaSpace, a lease is returned specifying the length of time (caller’s choice) the tuple will be stored before been destroyed  Lease is extensively used to handle situation of referring processes crash  When lists becomes empty, the object can safely destroy itself

CS 5204 (FALL 2005) 17 Conclusion (Advantage) Short term leases have a number of significant advantages over longer leases, including  lower write delays resulting from client crashes  lower recovery delay from server crashes  reduced false sharing. Leasing is well suited for large scale distributed system with faster processors and higher delay networks Lease overhead of handling large number of clients can be reduced by distinguishing different classes of files based on access characteristics Lease provide strict consistency in spite of non Byzantine failures, including partitions.

CS 5204 (FALL 2005) 18 Conclusion (Limitation) Used simplified model of file sharing and focused on relatively low degree of sharing Analysis of performance is approximate as it ignores important factors such as queuing delay Limited experience of leases on actual system operation

CS 5204 (FALL 2005) 19 Questions ??