CS 582 / CMPE 481 Distributed Systems Replication.



Class Overview
  Group Model
  Replication Model
  Request Ordering
  Case study
    – Lazy replication

Group Model
  Definition
    – a set of one or more objects, joined through a common interface, acting as a single unit for purposes of naming and function
  Group types
    – Replicate: replicated members for high availability and/or reliability
        – primary/stand-by
        – modular redundant
    – Partition: members executing a common job in a divided manner
        – e.g. highly parallel array processing
    – Aggregate: non-replicated members sharing the provision of the service defined by the group's interface
        – e.g. group conferencing

Why Replication
  Purpose
    – increase availability, dependability and/or performance without requiring clients to know about the replicas
  Replication transparency
    – hiding the replication of state in a system
        active vs. primary/stand-by replicas
        generic functions: active and passive replication mechanisms

Architectural Model
  [Figure: clients (C) exchange requests and replies with front ends (FE); the front ends communicate with the replica managers (RM) that make up the service.]

Replication Model
  Replication model spectrum
    – Consistency
        totally synchronous model
          – complete synchronization among replicas
        asynchronous model
          – asynchronous update of replicas; that is, temporary inconsistency among replicas is allowed
        most replication models fall somewhere between these two models
    – Purpose
        Availability
          – performance improvement
        Reliability
          – survive failure of replica(s)

Request Ordering
  Why is ordering a concern?
    – concurrent execution of update requests at replicas may result in inconsistency among replicated data
    – serial equivalence of update requests is required
        the expense of ordering should also be considered
  Ordering requirements
    – total ordering
        requests are processed in the same order at all replicas
    – causal ordering
        only causally related requests are ordered, and they are processed in the same order at all replicas
    – sync ordering
        every request is consistently ordered either before or after a designated (sync) request at all replicas
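The inconsistency risk named above can be shown with a minimal sketch. The two update operations and the starting value are hypothetical, chosen only because they do not commute; the point is that two replicas applying the same updates in different orders end up in different states.

```python
# Hypothetical, non-commuting update operations on a replicated value.
def deposit(balance):
    return balance + 10


def add_interest(balance):
    return balance * 2


def apply_in_order(initial, updates):
    """Apply the update functions to the initial state, one after another."""
    state = initial
    for update in updates:
        state = update(state)
    return state


# Both replicas receive the same two concurrent updates, in different orders.
replica_a = apply_in_order(100, [deposit, add_interest])  # (100 + 10) * 2
replica_b = apply_in_order(100, [add_interest, deposit])  # (100 * 2) + 10

print(replica_a, replica_b)  # 220 210: the replicas have diverged
```

Total ordering removes the divergence by forcing both replicas to use the same sequence; the cost is the extra coordination needed to agree on it.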

Request Ordering (cont)
  Request handling at replicas
    – every request is held back until its ordering constraints can be met
    – a request is defined to be stable at a replica once no request from a client bearing a lower unique identifier can subsequently be delivered to the replica; that is, all prior requests have been processed
  Request ordering implementation
    – group communication
        ISIS
    – exchanging gossip messages among replicas
        lazy replication
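The hold-back rule can be sketched as follows. This is an illustration under a simplifying assumption that is not in the slides: request ids are consecutive integers starting at 1, so a request is stable exactly when its id is the next one expected. The class and method names are hypothetical.

```python
import heapq


class HoldBackQueue:
    """Holds requests back until every lower-numbered request is delivered.

    Assumption: ids are consecutive integers starting at 1.
    """

    def __init__(self):
        self._heap = []      # min-heap of (request_id, payload)
        self._next_id = 1    # lowest id that has not been delivered yet
        self.delivered = []  # stands in for the processing queue

    def receive(self, request_id, payload):
        """Hold the request back, then deliver whatever has become stable."""
        heapq.heappush(self._heap, (request_id, payload))
        self._deliver_stable()

    def _deliver_stable(self):
        # The head is stable when its id is the next expected one: no
        # request with a lower id can still arrive. Heads with an id below
        # the expected one are duplicates and are dropped.
        while self._heap and self._heap[0][0] <= self._next_id:
            request_id, payload = heapq.heappop(self._heap)
            if request_id == self._next_id:
                self.delivered.append(payload)
                self._next_id += 1

    def pending(self):
        """Number of requests still held back."""
        return len(self._heap)


queue = HoldBackQueue()
queue.receive(2, "b")  # held back: request 1 has not arrived
queue.receive(3, "c")  # held back
queue.receive(1, "a")  # releases 1, 2 and 3 in order
print(queue.delivered)  # ['a', 'b', 'c']
```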

Request Ordering (cont)
  Properties for request ordering
    – Safety
        no message is delivered out of order from the hold-back queue to the processing queue
        once a message has been placed in the processing queue, no request ordered before it should subsequently be placed there
    – Liveness
        no message should wait indefinitely in the hold-back queue

Request Ordering (cont)
  Total ordering implementation
    – requires a mechanism to uniquely sequence each request, which enables sequential ordering among messages
    – unique id generation
        sequencer approach
          – the request id is generated by a designated process, the sequencer
          – every request is sent to the sequencer, which assigns a unique, monotonically increasing id and forwards the request to the replicas
          – the sequencer may become a performance bottleneck and a single point of failure
        data update protocol approach
          – the token holder sends a request with a temporary id to all replicas
          – each replica (site i) replies with a new id of max(temp id, id) + 1 + i/N; the token holder selects the largest of the ids proposed by the replicas and uses it as the agreed id
          – the token holder notifies all replicas of the final id; each replica readjusts the message's position in its hold-back queue
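The propose-and-agree exchange of the second approach can be sketched as below. Several things here are assumptions rather than content of the slides: the proposal is taken to be max(temp id, largest id seen locally) + 1 + i/N as in ISIS-style total ordering, sites are numbered 0 to N-1, message passing is replaced by direct method calls, and all class and function names are hypothetical. `Fraction` is used so the i/N term is exact.

```python
from fractions import Fraction


class Replica:
    def __init__(self, site, n_sites):
        self.site = site                # site index i (assumed 0..N-1)
        self.n_sites = n_sites          # N
        self.largest_id = Fraction(0)   # largest id proposed or agreed here
        self.hold_back = {}             # message -> (id, is_final)

    def propose(self, message, temp_id):
        """Reply to the token holder with a proposed id for the message."""
        base = max(Fraction(temp_id), self.largest_id)
        proposal = base + 1 + Fraction(self.site, self.n_sites)
        self.largest_id = proposal
        self.hold_back[message] = (proposal, False)
        return proposal

    def finalize(self, message, agreed_id):
        """Record the agreed id; this repositions the message in the queue."""
        self.largest_id = max(self.largest_id, agreed_id)
        self.hold_back[message] = (agreed_id, True)

    def deliverable_in_order(self):
        """Messages at the head of the hold-back queue whose id is final."""
        ready = []
        ordered = sorted(self.hold_back.items(), key=lambda kv: kv[1][0])
        for message, (_, is_final) in ordered:
            if not is_final:
                break  # a message still awaiting its final id blocks the rest
            ready.append(message)
        return ready


def multicast(message, temp_id, replicas):
    """Token holder: collect proposals, pick the largest, announce it."""
    proposals = [r.propose(message, temp_id) for r in replicas]
    agreed = max(proposals)
    for r in replicas:
        r.finalize(message, agreed)
    return agreed


replicas = [Replica(i, 3) for i in range(3)]
first = multicast("m1", 0, replicas)
second = multicast("m2", 0, replicas)
print(first, second)  # 5/3 10/3
print([r.deliverable_in_order() for r in replicas])
```

Because every replica adopts the same agreed id for each message, sorting the hold-back queue by id yields the same delivery order everywhere.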

Request Ordering (cont)
  Causal ordering implementation
    – requires a mechanism to enable causally related requests to be ordered
    – vector timestamp approach
        each replica p_i initializes VT_i (its vector time) to zeros
        when p_i generates a new event, it increments VT_i[i] by 1 and attaches the value vt = VT_i to outgoing messages
        when p_j handles a request with timestamp vt, it updates its vector clock such that VT_j = merge(VT_j, vt)
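A minimal sketch of the vector timestamp rules follows. The slides do not define merge; it is assumed here to be the usual element-wise maximum. The `happened_before` helper is an addition for illustration, and all names are hypothetical.

```python
def merge(a, b):
    """Element-wise maximum of two vector timestamps (assumed meaning of merge)."""
    return [max(x, y) for x, y in zip(a, b)]


def happened_before(a, b):
    """True if timestamp a causally precedes timestamp b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


class VectorClock:
    def __init__(self, index, n_replicas):
        self.index = index            # i, this replica's position in the vector
        self.vt = [0] * n_replicas    # VT_i, initialized to zeros

    def new_event(self):
        """Increment VT_i[i] and return a copy to attach to outgoing messages."""
        self.vt[self.index] += 1
        return list(self.vt)

    def handle(self, vt):
        """Merge the timestamp of a handled request into the local clock."""
        self.vt = merge(self.vt, vt)


p0, p1, p2 = (VectorClock(i, 3) for i in range(3))
vt_a = p0.new_event()   # [1, 0, 0]
p1.handle(vt_a)         # p1 has now seen p0's event
vt_b = p1.new_event()   # [1, 1, 0], causally after vt_a
vt_c = p2.new_event()   # [0, 0, 1], concurrent with both

print(happened_before(vt_a, vt_b))  # True
print(happened_before(vt_a, vt_c), happened_before(vt_c, vt_a))  # False False
```

Two requests whose timestamps are not ordered by `happened_before` in either direction are concurrent, so causal ordering places no constraint on their relative order.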