Replication Improves reliability Improves availability ( What good is a reliable system if it is not available?) Replication must be transparent and create.

Slides:



Advertisements
Similar presentations
CS 542: Topics in Distributed Systems Diganta Goswami.
Advertisements

COMP 655: Distributed/Operating Systems Summer 2011 Dr. Chunbo Chu Week 7: Consistency 4/13/20151Distributed Systems - COMP 655.
Linearizability Linearizability is a correctness criterion for concurrent object (Herlihy & Wing ACM TOPLAS 1990). It provides the illusion that each operation.
Failure Detection The ping-ack failure detector in a synchronous system satisfies – A: completeness – B: accuracy – C: neither – D: both.
Consistency and Replication Chapter Topics Reasons for Replication Models of Consistency –Data-centric consistency models: strict, linearizable,
DISTRIBUTED SYSTEMS II REPLICATION CNT. II Prof Philippas Tsigas Distributed Computing and Systems Research Group.
Consistency and Replication. Replication of data Why? To enhance reliability To improve performance in a large scale system Replicas must be consistent.
Failure Detectors. Can we do anything in asynchronous systems? Reliable broadcast –Process j sends a message m to all processes in the system –Requirement:
Computer Science Lecture 18, page 1 CS677: Distributed OS Last Class: Fault Tolerance Basic concepts and failure models Failure masking using redundancy.
Distributed Systems Fall 2010 Replication Fall 20105DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
CS 582 / CMPE 481 Distributed Systems Fault Tolerance.
CMPT 431 Dr. Alexandra Fedorova Lecture XII: Replication.
Strong Consistency and Agreement COS 461: Computer Networks Spring 2011 Mike Freedman 1 Jenkins,
Replication Management using the State-Machine Approach Fred B. Schneider Summary and Discussion : Hee Jung Kim and Ying Zhang October 27, 2005.
2/23/2009CS50901 Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial Fred B. Schneider Presenter: Aly Farahat.
CSS490 Replication & Fault Tolerance
Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 3: Fault-Tolerant.
Distributed Systems Fall 2009 Replication Fall 20095DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
Replication Improves reliability Improves availability ( What good is a reliable system if it is not available?) Replication must be transparent and create.
CS 425 / ECE 428 Distributed Systems Fall 2014 Indranil Gupta (Indy) Lecture 18: Replication Control All slides © IG.
State Machines CS 614 Thursday, Feb 21, 2002 Bill McCloskey.
Replicated Data Management DISHELT FRANCISCO TORRES PAZ.
Fault Tolerance via the State Machine Replication Approach Favian Contreras.
Distributed Algorithms – 2g1513 Lecture 9 – by Ali Ghodsi Fault-Tolerance in Distributed Systems.
Copyright © George Coulouris, Jean Dollimore, Tim Kindberg This material is made available for private study and for direct.
CSE 486/586, Spring 2013 CSE 486/586 Distributed Systems Replication with View Synchronous Group Communication Steve Ko Computer Sciences and Engineering.
DISTRIBUTED SYSTEMS II REPLICATION CNT. Prof Philippas Tsigas Distributed Computing and Systems Research Group.
Practical Byzantine Fault Tolerance
Consistency and Replication. Replication of data Why? To enhance reliability To improve performance in a large scale system Replicas must be consistent.
Byzantine fault-tolerance COMP 413 Fall Overview Models –Synchronous vs. asynchronous systems –Byzantine failure model Secure storage with self-certifying.
DISTRIBUTED SYSTEMS II REPLICATION Prof Philippas Tsigas Distributed Computing and Systems Research Group.
IM NTU Distributed Information Systems 2004 Replication Management -- 1 Replication Management Yih-Kuen Tsay Dept. of Information Management National Taiwan.
DISTRIBUTED SYSTEMS II AGREEMENT - COMMIT (2-3 PHASE COMMIT) Prof Philippas Tsigas Distributed Computing and Systems Research Group.
CSE 486/586, Spring 2012 CSE 486/586 Distributed Systems Replication Steve Ko Computer Sciences and Engineering University at Buffalo.
Replication (1). Topics r Why Replication? r System Model r Consistency Models – How do we reason about the consistency of the “global state”? m Data-centric.
DISTRIBUTED SYSTEMS II REPLICATION Prof Philippas Tsigas Distributed Computing and Systems Research Group.
Linearizability Linearizability is a correctness criterion for concurrent object (Herlihy & Wing ACM TOPLAS 1990). It provides the illusion that each operation.
Copyright © George Coulouris, Jean Dollimore, Tim Kindberg This material is made available for private study and for direct.
Distributed systems Consensus Prof R. Guerraoui Distributed Programming Laboratory.
Fault Tolerant Services
CSE 60641: Operating Systems Implementing Fault-Tolerant Services Using the State Machine Approach: a tutorial Fred B. Schneider, ACM Computing Surveys.
Fault Tolerance and Replication
Linearizability Linearizability is a correctness criterion for concurrent object (Herlihy & Wing ACM TOPLAS 1990). It provides the illusion that each operation.
© Chinese University, CSE Dept. Distributed Systems / Distributed Systems Topic 8: Fault Tolerance and Replication Dr. Michael R. Lyu Computer Science.
Replication and Group Communication. Management of Replicated Data FE Requests and replies C Replica C Service Clients Front ends managers RM FE RM Instructor’s.
Hwajung Lee.  Improves reliability  Improves availability ( What good is a reliable system if it is not available?)  Replication must be transparent.
Linearizability Linearizability is a correctness criterion for concurrent object (Herlihy & Wing ACM TOPLAS 1990). It provides the illusion that each operation.
Consistency and Replication Chapter 6 Presenter: Yang Jie RTMM Lab Kyung Hee University.
CSE 486/586, Spring 2012 CSE 486/586 Distributed Systems Replication Steve Ko Computer Sciences and Engineering University at Buffalo.
Lecture 13: Replication Haibin Zhu, PhD. Assistant Professor Department of Computer Science Nipissing University © 2002.
Fail-Stop Processors UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department CS 739 Distributed Systems Andrea C. Arpaci-Dusseau One paper: Byzantine.
Fundamentals of Fault-Tolerant Distributed Computing In Asynchronous Environments Paper by Felix C. Gartner Graeme Coakley COEN 317 November 23, 2003.
Distributed Computing Systems Replication Dr. Sunny Jeong. Mr. Colin Zhang With Thanks to Prof. G. Coulouris,
Replication Chapter Katherine Dawicki. Motivations Performance enhancement Increased availability Fault Tolerance.
Reliable multicast Tolerates process crashes. The additional requirements are: Only correct processes will receive multicasts from all correct processes.

Consistency and Replication
Replication and Consistency
Replication and Consistency
Distributed systems II Replication Cnt.
Linearizability Linearizability is a correctness criterion for concurrent object (Herlihy & Wing ACM TOPLAS 1990). It provides the illusion that each operation.
Replication Improves reliability Improves availability
Active replication for fault tolerance
Fault-tolerance techniques RSM, Paxos
Replication Improves reliability Improves availability
Lecture 21: Replication Control
Fault-Tolerant State Machine Replication
DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S
Lecture 21: Replication Control
Replication and Consistency
Presentation transcript:

Replication Improves reliability Improves availability ( What good is a reliable system if it is not available?) Replication must be transparent and create the illusion of a single copy.

Updating replicated data F AliceBob F’F’ ’ BobAlice Update and consistency are primary issues. sharedSeparate replicas

Passive replication At most one replica can be the primary server Each client maintains a variable L (leader) that specifies the replica to which it will send requests. Requests are queued at the primary server. Backup servers ignore client requests. primary backup clients L=3

Primary-backup protocol Receive. Receive the request from the client and update the state if appropriate. Broadcast. Broadcast an update of the state to all other replicas. Reply. Send a response to the client. reqreply update client primary backup

Primary-backup protocol If the client fails to get a response due to the crash of the primary, then the request is retransmitted until a backup is promoted to the primary, Failover time is the duration when there is no primary server. req reply update client primary backup ? heartbeat election New primary elected

Active replication Each server receives client requests, and broadcasts them to the other servers. They collectively implement a fault- tolerant state machine.

Fault-tolerant state machine This formalism is based on a survey by Fred Schneider. The clients must receive correct response even if up to m servers fail (either fail-stop or byzantine). For fail-stop, ≥ (m+1) replicas are needed. If a client queries the replicas, the first one that responds gives a correct value. For byzantine failure ≥ (2m+1) replicas are needed. m bad responses can be voted out by the (m+1) good responses. Fault intolerant Fault tolerant

Replica coordination Agreement. Every correct replica receives all the requests. Order. Every correct replica receives the requests in the same order. Agreement part is solved by atomic multicast. Order part is solved by total order multicast. The order part solves the consensus problem where servers will agree about the next update. It requires a synchronous model client server

Agreement With fail-stop processors, the agreement part is solved by reliable atomic multicast. To deal with byzantine failures, an interactive consistency protocol needs to be implemented. Thus, with an oral message protocol, > 3m processors will be required. client server

Order Let timestamps determine the message order. client server A request is stable at a server, when the it does not expect to receive any other client request with a lower timestamp. Assume three clients are trying to update a data, the channels are FIFO, and their timestamps are 20, 30, 42. Each server will update its copy with the value that has the timestamp 20.

Order Let timestamps determine the message order. client server But some clients may not send an update. How long should the server wait? Require clients to send null messages (as heartbeat signals) with some timestamp ts. A message (null, 35) means that the client will not send any update till ts=35. These can be part of periodic hearbeat messages. null

What is replica consistency? clients replica Consistency models define a contract between the data manager and the clients regarding the responses to read and write operations.

Replica Consistency Data Centric Client communicates with the same replica Client centric Client communicates with different replica at different times. This may be the case with mobile clients.

Data-centric Consistency Models 1. Strict consistency 2. Linearizability 3. Sequential consistency 4.Causal consistency 5.Eventual consistency (as in DNS) 6.Weak consistency There are many other models

Strict consistency Strict consistency corresponds to true replication transparency. If one of the processes executes x:= 5 at real time t and this is the latest write operation, then at a real time t’ > t, every process trying to read x will receive the value 5. Too strict! Why? W(x:=5) R(x=5) t t’ p1 p2

Sequential consistency Some interleaving of the local temporal order of events at the different replicas is a consistent trace. W(x:=100) W(x:=99] R(x=100) R(x=99)

Sequential consistency Is sequential consistency satisfied here? W(x:=10) W(x:=8] W(x=20) R(x=20) R(x:=10) R(x=10)

Causal consistency All writes that are causally related must be seen by every process in the same order. W(x:=10) W(x:=20) R(x=10)R(x=20) R(x=10)