Active replication for fault tolerance


13.3.2. Active replication for fault tolerance

What sort of system do we need to perform totally ordered reliable multicast?

- The RMs are state machines, all playing the same role and organised as a group. All start in the same state and perform the same operations in the same order, so that their states remain identical.
- If an RM crashes, it has no effect on the performance of the service, because the others continue as normal.
- Active replication can tolerate byzantine failures, because the FE can collect and compare the replies it receives.
- An FE multicasts each request to the group of RMs; the RMs process each request identically and reply.
- This requires totally ordered reliable multicast, so that all RMs perform the same operations in the same order.

(Figure 14.5: clients' front ends multicast requests to the group of replica managers, which all process each request and reply.)
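To make the state-machine idea concrete, here is a minimal sketch in Python (the names are illustrative; the chapter describes the model, not an implementation). `on_deliver` stands in for the upcall from a totally ordered reliable multicast layer: because every RM applies the same deterministic operations in the same delivery order, their states stay identical.

```python
from dataclasses import dataclass

@dataclass
class Request:
    id: str
    op: str
    account: str
    amount: int = 0

class ReplicaManager:
    """Sketch of an actively replicated RM as a deterministic state machine."""

    def __init__(self):
        self.state = {}      # replicated data: account -> balance
        self.replies = {}    # request id -> reply, for duplicate suppression

    def on_deliver(self, request):
        # Called by the multicast layer, in the same total order at every RM.
        if request.id in self.replies:      # duplicate: re-send the old reply
            return self.replies[request.id]
        reply = self.apply(request)         # deterministic, so states stay equal
        self.replies[request.id] = reply
        return reply                        # returned to the FE

    def apply(self, request):
        if request.op == "setBalance":
            self.state[request.account] = request.amount
            return "ok"
        return self.state.get(request.account, 0)   # "getBalance"

# Two RMs fed the same delivery order end in the same state.
rm1, rm2 = ReplicaManager(), ReplicaManager()
for rm in (rm1, rm2):
    rm.on_deliver(Request("r1", "setBalance", "x", 1))
    rm.on_deliver(Request("r2", "getBalance", "x"))
assert rm1.state == rm2.state == {"x": 1}
```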

Active replication: five phases in performing a client request

1. Request: the FE attaches a unique id and uses totally ordered reliable multicast to send the request to the RMs. The FE can, at worst, crash; it does not issue requests in parallel.
2. Coordination: the multicast delivers the request to all the RMs in the same (total) order.
3. Execution: every RM executes the request. They are state machines and receive requests in the same order, so the effects are identical. The id is put in the response.
4. Agreement: no agreement is required, because all RMs execute the same operations in the same order, due to the properties of the totally ordered multicast.
5. Response: the FEs collect responses from the RMs. An FE may use one or more of the responses; if it is only trying to tolerate crash failures, it gives the client the first response.
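The FE's side of these phases can be sketched as follows (a minimal sketch; the `multicast` object and its `send`/`first_response` operations are assumed stand-ins for a totally ordered reliable multicast layer, not a real library API):

```python
import uuid

class ActiveFrontEnd:
    """FE side of the five phases, tolerating crash failures only."""

    def __init__(self, multicast):
        self.multicast = multicast   # hypothetical total-order multicast layer

    def invoke(self, op, **args):
        request_id = uuid.uuid4().hex                     # phase 1: Request
        self.multicast.send({"id": request_id, "op": op, **args})
        # Phases 2-4 (coordination, execution, agreement) happen at the RMs;
        # the total order means no explicit agreement messages are needed.
        # Phase 5: for crash failures only, the first reply is enough.
        return self.multicast.first_response(request_id)
```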

Active replication: discussion

- As the RMs are state machines, we have sequential consistency: due to the reliable totally ordered multicast, the RMs collectively do the same as a single copy would do.
- It works in a synchronous system. In an asynchronous system reliable totally ordered multicast is impossible, but failure detectors can be used to work around this problem.
- This replication scheme is not linearizable, because the total order is not necessarily the same as the real-time order.
- To deal with byzantine failures: for up to f byzantine failures, use 2f+1 RMs; the FE collects f+1 identical responses (see the voting sketch below).
- To improve performance, FEs send read-only requests to just one RM.
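A minimal sketch of the response-voting step: with 2f+1 RMs and at most f byzantine ones, any reply vouched for by f+1 RMs must include at least one correct RM, so the FE can safely accept it. Names here are illustrative.

```python
from collections import Counter

def accept_reply(responses, f):
    """Return a reply backed by at least f+1 RMs, or None if no value has
    yet reached that threshold. With at most f byzantine RMs, f+1 matching
    replies guarantee at least one came from a correct RM."""
    if not responses:
        return None
    value, votes = Counter(responses).most_common(1)[0]
    return value if votes >= f + 1 else None

# Example: f = 1, so 2f+1 = 3 RMs; one faulty RM lies about the balance.
assert accept_reply(["100", "100", "999"], f=1) == "100"
```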

Summary for Sections 14.1-14.3

- Replicating objects helps services to provide good performance, high availability and fault tolerance.
- System model: each logical object is implemented by a set of physical replicas.
- Linearizability and sequential consistency can be used as correctness criteria; sequential consistency is less strict and more practical to use.
- Fault tolerance can be provided by:
  - passive replication: using a primary RM and backups. To achieve linearizability when the primary crashes, view-synchronous communication is used, which is expensive; less strict variants can be useful.
  - active replication: all RMs process all requests identically. It needs totally ordered and reliable multicast, which can be achieved in a synchronous system.

Safety: linearizability [Herlihy & Wing, 1990]

- For each operation invocation there must be one single time instant during its duration at which the operation appears to take effect.
- The observed effects should be consistent with a sequential execution of the operations in that order.

(Diagram: three overlapping operations O1, O2 and O3 on a timeline, with O1: enq(2), O2: deq() → 3, O3: enq(3).)

Linearizability (p566): the strictest criterion for a replication system

- Linearizability is not intended to be used with transactional replication systems.
- The real-time requirement means clients should receive up-to-date information, but it may not be practical due to the difficulties of synchronizing clocks; a weaker criterion is sequential consistency.
- The correctness criteria for replicated objects are defined by referring to a virtual interleaving, which would be correct.

Consider a replicated service with two clients that perform read and update operations. A client waits for one operation to complete before doing another. Client operations o10, o11, o12 and o20, o21, o22 at a single server are interleaved in some order, e.g. o20, o21, o10, o22, o11, o12 (client 1 performs o10, etc.).

A replicated object service is linearizable if for any execution there is some interleaving of clients' operations such that:
- the interleaved sequence of operations meets the specification of a (single) correct copy of the objects;
- the order of operations in the interleaving is consistent with the real time at which they occurred.

For any set of client operations there is a virtual interleaving (which would be correct for a set of single objects). Each client sees a view of the objects that is consistent with it; that is, the results of the clients' operations make sense within the interleaving.
- The bank example on the following slide did not make sense: if the second update is observed, the first update should be observed too.
- The virtual interleaving does not necessarily occur at any one RM.
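The definition suggests a brute-force way to check small executions: enumerate interleavings, keep those consistent with real time, and test each against a single-copy specification. A minimal sketch (the tuple format and the register specification are illustrative choices; the cost is factorial, so this only works for tiny histories):

```python
from itertools import permutations

# Each operation is (start, end, op, arg, result). Real-time constraint:
# if one operation ends before another starts, it must come first.

def linearizable(history):
    def respects_real_time(order):
        for i in range(len(order)):
            for j in range(i + 1, len(order)):
                if order[j][1] < order[i][0]:   # j ended before i started
                    return False
        return True

    def meets_register_spec(order):
        value = 0                               # a single correct copy
        for (_, _, op, arg, result) in order:
            if op == "write":
                value = arg
            elif op == "read" and result != value:
                return False
        return True

    return any(respects_real_time(p) and meets_register_spec(p)
               for p in permutations(history))

# A read overlapping a write may return either old or new value.
history = [(0, 4, "write", 1, None), (1, 2, "read", None, 1)]
assert linearizable(history)
```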

Sequential consistency (p567)

A replicated shared object service is sequentially consistent if for any execution there is some interleaving of clients' operations such that:
- the interleaved sequence of operations meets the specification of a (single) correct copy of the objects;
- the order of operations in the interleaving is consistent with the program order in which each client executed them.

The following is sequentially consistent but not linearizable:

Client 1:            Client 2:
setBalanceB(x,1)
                     getBalanceA(y) → 0
                     getBalanceA(x) → 0
setBalanceA(y,2)

- This is possible under a naive replication strategy, even if neither A nor B fails: the update at B has not yet been propagated to A when client 2 reads.
- It is not linearizable, because client 2's getBalance comes after client 1's setBalance in real time.
- But the interleaving getBalanceA(y) → 0; getBalanceA(x) → 0; setBalanceB(x,1); setBalanceA(y,2) satisfies both criteria for sequential consistency.
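The naive strategy can be simulated in a few lines (all names illustrative): client 1 writes at replica B, the update propagates lazily, and client 2 reads stale values at replica A in the meantime, reproducing the execution above.

```python
# Two replicas with lazy (naive) update propagation.
replica_A = {"x": 0, "y": 0}
replica_B = {"x": 0, "y": 0}
pending = []                         # updates at B, not yet applied at A

def setBalance_B(acct, amount):
    replica_B[acct] = amount
    pending.append((acct, amount))   # will reach replica A only later

def setBalance_A(acct, amount):
    replica_A[acct] = amount

def getBalance_A(acct):
    return replica_A[acct]

setBalance_B("x", 1)                 # client 1
assert getBalance_A("y") == 0        # client 2: stale but program-order OK
assert getBalance_A("x") == 0        # update at B not yet propagated to A
setBalance_A("y", 2)                 # client 1
```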

The passive (primary-backup) model for fault tolerance

- There is at any time a single primary RM and one or more secondary RMs (the secondaries are called 'backups' or 'slaves').
- FEs communicate with the primary, which executes the operations and sends copies of the updated data to the backups.
- If the primary fails, one of the backups is promoted to act as the primary.
- The FE has to find the primary, e.g. after it crashes and another takes over.

(Figure 14.4: clients' front ends communicate with the primary RM, which propagates updates to the backup RMs.)
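The "FE has to find the primary" point can be made concrete with a small sketch: the FE resolves the primary through a name service and re-resolves when the primary has crashed and a backup has taken over. `name_service`, `call` and `PrimaryUnavailable` are hypothetical stand-ins, not a real API.

```python
class PrimaryUnavailable(Exception):
    """Raised when the current primary cannot be reached (hypothetical)."""

class PassiveFrontEnd:
    """FE sketch for primary-backup replication."""

    def __init__(self, name_service):
        self.name_service = name_service
        self.primary = name_service.lookup("service-primary")

    def invoke(self, request):
        while True:
            try:
                return self.primary.call(request)
            except PrimaryUnavailable:
                # The old primary crashed; the promoted backup registered
                # itself with the name service, so re-resolve and retry.
                self.primary = self.name_service.lookup("service-primary")
```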

Passive (primary-backup) replication: five phases

The five phases in performing a client request are as follows:

1. Request: an FE issues the request, containing a unique identifier, to the primary RM.
2. Coordination: the primary performs each request atomically, in the order in which it receives it relative to other requests. It checks the unique id; if it has already performed the request, it re-sends the response.
3. Execution: the primary executes the request and stores the response.
4. Agreement: if the request is an update, the primary sends the updated state, the response and the unique identifier to all the backups. The backups send an acknowledgement.
5. Response: the primary responds to the FE, which hands the response back to the client.
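A minimal sketch of the primary's side of these phases (hypothetical names; the backups are assumed to expose an `update` operation that blocks until acknowledged, which covers the agreement phase):

```python
class PrimaryRM:
    """Sketch of the primary handling one request through phases 2-5."""

    def __init__(self, backups):
        self.state = {}
        self.responses = {}          # request id -> stored response
        self.backups = backups

    def handle(self, request_id, op, *args):
        # Phase 2, coordination: a duplicate request gets the old response.
        if request_id in self.responses:
            return self.responses[request_id]
        # Phase 3, execution: perform the operation and store the response.
        response, is_update = self.execute(op, *args)
        self.responses[request_id] = response
        # Phase 4, agreement: push state to every backup and await acks.
        if is_update:
            for backup in self.backups:
                backup.update(self.state, response, request_id)
        # Phase 5, response: hand the result back to the FE.
        return response

    def execute(self, op, *args):
        if op == "setBalance":
            account, amount = args
            self.state[account] = amount
            return "ok", True
        account, = args                        # "getBalance"
        return self.state.get(account, 0), False
```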

Passive (primary-backup) replication: discussion

- This system implements linearizability, since the primary sequences all the operations on the shared objects.
- If the primary fails, the system remains linearizable if a single backup takes over exactly where the primary left off, i.e.:
  - the primary is replaced by a unique backup;
  - the surviving RMs agree on which operations had been performed at takeover.
- View-synchronous group communication can achieve this:
  - when the surviving backups receive a view without the primary, they use an agreed function to calculate which of them is the new primary (a sketch follows this list); the new primary registers with the name service;
  - view synchrony also allows the processes to agree on which operations were performed before the primary failed.
- E.g. when an FE does not get a response, it retransmits the request to the new primary. The new primary continues from phase 2 (coordination) and uses the unique identifier to discover whether the request has already been performed.
- If two different backups take over for two different clients, the system could behave incorrectly. The requirements are met if view-synchronous group communication is used, but that is expensive.
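The "agreed function" can be any deterministic function of the delivered view; choosing the member with the smallest identifier is one conventional option (an assumption here, not mandated by the text). Because view synchrony delivers the same view to every survivor, all backups compute the same answer without exchanging further messages.

```python
def choose_new_primary(view_members):
    """Deterministically pick the new primary from a delivered view.
    Every surviving backup receives the same view, so every backup
    computes the same result; min-by-id is one conventional choice."""
    return min(view_members)

# All survivors of the view {rm2, rm3, rm5} agree: rm2 becomes primary.
assert choose_new_primary({"rm5", "rm2", "rm3"}) == "rm2"
```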

View-synchronous group communication

Systems with dynamic groups extend this model by providing explicit join and leave operations to adapt the group membership over time. Moreover, such systems can exclude faulty servers automatically from the membership. Still, reaching agreement on the group membership in the presence of failures is not trivial. Two approaches have been considered:

1. Run a consensus protocol among all the previous group members to agree on the future group membership. This is the canonical approach; it tolerates further failures during the membership change, but involves the potentially expensive consensus primitive.
2. Integrate consensus with the membership protocol and run it only among the (hopefully) correct members. Since this consensus algorithm need not tolerate failures, it can be simpler; but because further failures may still occur, it provides different guarantees.

The second approach is the one taken by view-synchronous group communication systems and related group membership algorithms.

Discussion of passive replication

- To survive f process crashes, f+1 RMs are required.
- It cannot deal with byzantine failures, because the client can't get replies from the backup RMs to compare against.
- Designing passive replication that is linearizable is costly: view-synchronous communication has relatively large overheads (several rounds of messages per multicast), and after a failure of the primary there is latency due to the delivery of the new group view.
- A variant in which clients can read from the backups reduces the work for the primary; it gives sequential consistency but not linearizability.
- Sun NIS uses passive replication with weaker guarantees:
  - weaker than sequential consistency, but adequate for the type of data stored;
  - it achieves high availability and good performance;
  - the master receives updates and propagates them to the slaves using one-to-one communication; clients can use either the master or a slave;
  - updates are not made via RMs; they are made on the files at the master.