CPSC 668 Set 9: Fault Tolerant Consensus. CPSC 668 Distributed Algorithms and Systems, Spring 2008, Prof. Jennifer Welch.

Presentation transcript:


Slide 2: Processor Failures in Message Passing
Crash: at some point the processor stops taking steps
  – at the processor's final step, it might succeed in sending only a subset of the messages it is supposed to send
Byzantine: processor changes state arbitrarily and sends messages with arbitrary content

Slide 3: Consensus Problem
Every processor has an input.
Termination: Eventually every nonfaulty processor must decide on a value.
Agreement: All decisions by nonfaulty processors must be the same.
Validity: If all inputs are the same, then the decision of a nonfaulty processor must equal the common input.

Slide 4: Examples of Consensus
Binary inputs:
  – input vector 1,1,1,1,1: decision must be 1
  – input vector 0,0,0,0,0: decision must be 0
  – input vector 1,0,0,1,0: decision can be either 0 or 1
Multi-valued inputs:
  – input vector 1,2,3,2,1: decision can be 1 or 2 or 3
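The agreement and validity conditions can be checked mechanically for a given outcome. The sketch below is only an illustration: the function name `check_consensus` and the representation of decisions as a dictionary from processor index to decided value are assumptions, not part of the slides. Termination is not checked, since it is a property of whole executions rather than of a single outcome.

```python
from typing import Dict, List


def check_consensus(inputs: List[int], decisions: Dict[int, int]) -> List[str]:
    """Return the names of the violated conditions (empty list if none).

    inputs[i] is the input of processor i (hypothetical representation).
    decisions maps each nonfaulty processor index to the value it decided.
    """
    violated = []
    # Agreement: all decisions by nonfaulty processors must be the same.
    if len(set(decisions.values())) > 1:
        violated.append("agreement")
    # Validity only constrains the outcome when all inputs are equal.
    if len(set(inputs)) == 1 and any(d != inputs[0] for d in decisions.values()):
        violated.append("validity")
    return violated


# All inputs are 1, so deciding 0 violates validity.
print(check_consensus([1, 1, 1, 1, 1], {0: 0, 1: 0}))
# Mixed inputs: either common decision is acceptable.
print(check_consensus([1, 0, 0, 1, 0], {0: 0, 1: 0}))
```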

Slide 5: Overview of Consensus Results
Synchronous system
At most f faulty processors
Tight bounds for message passing:

                             crash failures   Byzantine failures
number of rounds             f + 1            f + 1
total number of processors   f + 1            3f + 1
message size                 polynomial       polynomial

Slide 6: Overview of Consensus Results
Impossible in asynchronous case.
Even if we only want to tolerate a single crash failure.
True both for message passing and shared read-write memory.

Slide 7: Modeling Crash Failures
Modify failure-free definitions of admissible execution to accommodate crash failures:
All but a set of at most f processors (the faulty ones) take an infinite number of steps.
  – In synchronous case: once a faulty processor fails to take a step in a round, it takes no more steps.
In a faulty processor's last step, an arbitrary subset of the processor's outgoing messages make it into the channels.

Slide 8: Modeling Byzantine Failures
Modify failure-free definitions of admissible execution to accommodate Byzantine failures:
A set of at most f processors (the faulty ones) can send messages with arbitrary content and change state arbitrarily (i.e., not according to their transition functions).

Slide 9: Consensus Algorithm for Crash Failures
Code for each processor:
    v := my input
    at each round 1 through f+1:
        if I have not yet sent v then send v to all
        wait to receive messages for this round
        v := minimum among all received values and current value of v
        if this is round f+1 then decide on v
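The pseudocode above can be exercised with a small synchronous simulation. This is a minimal sketch under stated assumptions, not the course's implementation: the function name `crash_consensus` and the `crashes` schedule format (processor index mapped to the round in which it crashes and the set of recipients its last messages still reach) are hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical schedule format: pid -> (crash round, recipients still reached).
CrashSchedule = Dict[int, Tuple[int, Set[int]]]


def crash_consensus(inputs: List[int], f: int,
                    crashes: Optional[CrashSchedule] = None) -> Dict[int, int]:
    """Simulate f+1 synchronous rounds; return decisions of nonfaulty processors."""
    crashes = crashes or {}
    assert len(crashes) <= f, "at most f processors may crash"
    n = len(inputs)
    v = list(inputs)                      # v := my input
    last_sent: List[Optional[int]] = [None] * n
    alive = [True] * n
    for rnd in range(1, f + 2):           # rounds 1 through f+1
        inbox: List[List[int]] = [[] for _ in range(n)]
        for p in range(n):
            if not alive[p]:
                continue
            recipients: Set[int] = set(range(n))
            if p in crashes and crashes[p][0] == rnd:
                # Final step: only a subset of the messages gets out.
                recipients = crashes[p][1]
                alive[p] = False
            # v only decreases, so "not yet sent v" means v differs
            # from the value sent most recently.
            if v[p] != last_sent[p]:
                for q in recipients:
                    inbox[q].append(v[p])
                last_sent[p] = v[p]
        for p in range(n):
            if alive[p]:
                v[p] = min([v[p]] + inbox[p])
    # After round f+1, every surviving processor decides on v.
    return {p: v[p] for p in range(n) if alive[p]}


# Processor 0 crashes in round 1 and reaches only processor 1 (f = 1).
print(crash_consensus([0, 1, 1, 1], f=1, crashes={0: (1, {1})}))
```

In the example, processor 1 relays the value 0 in round 2, so the second round is what restores agreement among the survivors.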

Slide 10: Execution of Algorithm
(each step is listed with its relation to the formal model)
round 1:
  – send my input: in channels initially
  – receive round 1 msgs: deliver events
  – compute value for v: compute events
round 2:
  – send v (if this is a new value): due to previous compute events
  – receive round 2 msgs: deliver events
  – compute value for v: compute events
…
round f + 1:
  – send v (if this is a new value): due to previous compute events
  – receive round f + 1 msgs: deliver events
  – compute value for v: compute events
  – decide v: part of compute events

Slide 11: Correctness of Crash Consensus Algorithm
Termination: By the code, finish in round f+1.
Validity: Holds since processors do not introduce spurious messages: if all inputs are the same, then that is the only value ever in circulation.

Slide 12: Correctness of Crash Consensus Algorithm
Agreement: Suppose in contradiction p_j decides on a smaller value, x, than does p_i.
Then x was hidden from p_i by a chain of faulty processors:
[Figure: chain of processors q_1, q_2, …, q_f, q_{f+1} across round 1, round 2, …, round f, round f+1, together with p_j and p_i]
There are f + 1 faulty processors in this chain, a contradiction.
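The chain argument can be made concrete by stopping the algorithm one round early. The following sketch is an assumption-laden illustration: the helper `run`, its `rounds` parameter, and the crash schedule format (processor index mapped to crash round and the recipients still reached) are all hypothetical. With f = 2 and one crash per round, the value 0 is passed along a chain of faulty processors and reaches only one survivor after f rounds; the extra round f + 1 lets that survivor forward it.

```python
from typing import Dict, List, Optional, Set, Tuple


def run(inputs: List[int], rounds: int,
        crashes: Dict[int, Tuple[int, Set[int]]]) -> Dict[int, int]:
    """Run the min-flooding algorithm for a given number of rounds.

    crashes maps pid -> (crash round, recipients reached in that round).
    Returns the current value v of every surviving processor.
    """
    n = len(inputs)
    v = list(inputs)
    last_sent: List[Optional[int]] = [None] * n
    alive = [True] * n
    for rnd in range(1, rounds + 1):
        inbox: List[List[int]] = [[] for _ in range(n)]
        for p in range(n):
            if not alive[p]:
                continue
            recipients: Set[int] = set(range(n))
            if p in crashes and crashes[p][0] == rnd:
                recipients = crashes[p][1]   # partial send, then stop
                alive[p] = False
            if v[p] != last_sent[p]:         # send only a new value
                for q in recipients:
                    inbox[q].append(v[p])
                last_sent[p] = v[p]
        for p in range(n):
            if alive[p]:
                v[p] = min([v[p]] + inbox[p])
    return {p: v[p] for p in range(n) if alive[p]}


f = 2
inputs = [0, 1, 1, 1]
# One crash per round: p0 reaches only p1, then p1 reaches only p2.
chain = {0: (1, {1}), 1: (2, {2})}

print(run(inputs, f, chain))      # survivors still disagree after f rounds
print(run(inputs, f + 1, chain))  # survivors agree after f + 1 rounds
```

Hiding the value through round f + 1 would need a third faulty processor, which exceeds f = 2; that is the contradiction used in the proof.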

Slide 13: Performance of Crash Consensus Algorithm
Number of processors n > f
f + 1 rounds
At most n^2 · |V| messages, each of size log|V| bits, where V is the input set.
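These bounds are easy to evaluate for concrete parameters. The message bound follows because each of the n processors sends each of the |V| values at most once, to n recipients. The helper below is a hypothetical sketch; rounding log|V| up to a whole number of bits is an assumption made here, not something the slide states.

```python
from typing import Dict


def crash_consensus_cost(n: int, f: int, value_count: int) -> Dict[str, int]:
    """Evaluate the stated bounds for n processors, f crashes, |V| = value_count."""
    assert n > f, "the algorithm assumes n > f"
    assert value_count >= 1
    return {
        "rounds": f + 1,
        # n senders, each value sent at most once, n recipients per send.
        "max_messages": n * n * value_count,
        # ceil(log2 |V|) computed with integers (rounding up is an assumption).
        "bits_per_message": (value_count - 1).bit_length(),
    }


print(crash_consensus_cost(n=4, f=1, value_count=2))
```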

Slide 14: Lower Bound on Rounds
Assumptions:
  – n > f + 1
  – every processor is supposed to send a message to every other processor in every round
  – input set is {0,1}

Slide 15: Failure-Sparse Executions
Bad behavior for the crash algorithm was when there was one crash per round.
This is bad in general.
A failure-sparse execution has at most one crash per round.
We will deal exclusively with failure-sparse executions in this proof.

Slide 16: Valence of a Configuration
The valence of a configuration C is the set of all values decided by a nonfaulty processor in some configuration reachable from C by an admissible (failure-sparse) execution.
Bivalent: set contains 0 and 1.
Univalent: set contains only one value
  – 0-valent or 1-valent

Slide 17: Valence of a Configuration
[Figure: configuration C and configurations D, E, F, G reachable from it, annotated with the decisions reached. Legend: 0/1 = bivalent, 1 = 1-valent, 0 = 0-valent.]
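The definition of valence amounts to a reachability computation: collect every value decided in any configuration reachable from C. The sketch below runs it on a small hand-made graph. The graph, the node names, and the `successors` / `decided` dictionaries are hypothetical and are not a reconstruction of the slide's figure.

```python
from typing import Dict, List, Set


def valence(config: str, successors: Dict[str, List[str]],
            decided: Dict[str, Set[int]]) -> Set[int]:
    """Set of values decided in some configuration reachable from config."""
    seen: Set[str] = set()
    values: Set[int] = set()
    stack = [config]
    while stack:
        c = stack.pop()
        if c in seen:
            continue
        seen.add(c)
        values |= decided.get(c, set())
        stack.extend(successors.get(c, []))
    return values


def classify(values: Set[int]) -> str:
    """Name the valence of a binary-input configuration."""
    if values == {0, 1}:
        return "bivalent"
    if values == {0}:
        return "0-valent"
    if values == {1}:
        return "1-valent"
    raise ValueError("no decision is reachable")


# Hypothetical reachability graph with decisions at the leaves.
successors = {"C": ["D", "E"], "D": ["F", "G"]}
decided = {"E": {1}, "F": {0}, "G": {1}}

for name in ["C", "D", "E", "F"]:
    print(name, classify(valence(name, successors, decided)))
```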

Slide 18: Statement of Round Lower Bound
Theorem (5.3): Any crash-resilient consensus algorithm requires at least f + 1 rounds in the worst case.
Proof Strategy:
  – show there exists a bivalent initial config.
  – round 1, round 2, …, round f - 2, round f - 1: show we can keep things bivalent through round f - 1
  – round f: show we can keep a n.f. (nonfaulty) proc. from deciding in round f

Slide 19: Existence of Bivalent Initial Config.
Suppose in contradiction all initial configurations are univalent.

inputs     valency
000…00     0  (by validity condition)
000…01     ?
000…11     ?
…
001…11     ?
011…11     ?
111…11     1  (by validity condition)

Slide 20: Existence of Bivalent Initial Config.
Let
  – I_0 be a 0-valent initial config
  – I_1 be a 1-valent initial config
  – s.t. they differ only in p_i's input
Execution from I_0: p_i fails initially, no other failures. By termination, eventually the rest decide. All but p_i decide 0.
Execution from I_1 (same failure pattern): this execution looks the same as the one above to all the processors except p_i. All but p_i decide 0, which contradicts I_1 being 1-valent.
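The pair I_0, I_1 exists because of a simple walk along the chain of input vectors: the first vector is 0-valent and the last is 1-valent by validity, so if every vector is univalent, some adjacent pair must flip from 0-valent to 1-valent. The sketch below illustrates that step. The function names and the example valency list are hypothetical; the valencies are supplied by hand rather than computed from an algorithm.

```python
from typing import List, Tuple


def input_chain(n: int) -> List[Tuple[int, ...]]:
    """Input vectors 00…0, 0…01, 0…011, …, 11…1; neighbours differ in one input."""
    return [tuple([0] * (n - k) + [1] * k) for k in range(n + 1)]


def find_flip(valencies: List[int]) -> int:
    """Index k with valencies[k] == 0 and valencies[k + 1] == 1.

    Assumes every entry is 0 or 1 (all configurations univalent), the first
    entry is 0 and the last is 1, as the validity condition forces.
    """
    assert valencies[0] == 0 and valencies[-1] == 1
    for k in range(len(valencies) - 1):
        if valencies[k] == 0 and valencies[k + 1] == 1:
            return k
    raise AssertionError("unreachable: a 0…1 sequence must flip somewhere")


n = 4
chain = input_chain(n)
valencies = [0, 0, 1, 0, 1]          # hypothetical univalent labelling
k = find_flip(valencies)
differing = n - k - 1                # index of the one input that changes
print(chain[k], chain[k + 1], "differ only at processor", differing)
```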

Slide 21: Keeping Things Bivalent
Let α' be a (failure-sparse) k-1 round execution ending in a bivalent config.
  – for k - 1 < f - 1
Show there is a one-round (f-s) extension α of α' ending in a bivalent config.
  – so α has k < f rounds
Suppose in contradiction every one-round (f-s) extension of α' is univalent.

Slide 22: Keeping Things Bivalent
[Figure: one-round extensions, in round k, of α' (rounds 1 to k-1, ending in a bi-val config.):
  – failure-free round k: 1-val
  – p_i crashes: 0-val
  – chain of extensions in which p_i fails to send to ∅; …; to q_1,…,q_j (1-val); to q_1,…,q_{j+1} (0-val); …; to q_1,…,q_m
 Now focus in on these two extensions: "p_i fails to send to q_1,…,q_j" and "p_i fails to send to q_1,…,q_{j+1}".]

Slide 23: Keeping Things Bivalent
[Figure: the two round-k extensions of α' (rounds 1 to k-1):
  – p_i fails to send to q_1,…,q_j: 1-val
  – p_i fails to send to q_1,…,q_{j+1}: 0-val
 Only q_{j+1} can tell the difference.
 Each is extended so that q_{j+1} fails in rd. k+1, with no other failures; in both extensions the n.f. processors decide 1.]

Slide 24: Cannot Decide in Round f
We've shown there is an f - 1 round (failure-sparse) execution, call it α, ending in a bivalent configuration.
Extending this execution to f rounds might not preserve bivalence.
However, we can keep a processor from explicitly deciding in round f, thus requiring at least one more round (f+1).

Slide 25: Cannot Decide in Round f
Case 1: There is a 1-round (f-s) extension of α ending in a bivalent config. Then we are done.
Case 2: All 1-round (f-s) extensions of α end in univalent configs.

Slide 26: Cannot Decide in Round f
[Figure: three round-f extensions of α (rounds 1 to f-1, ending in a bi-val. config.):
  – round f failure free (p_i sends to p_j and p_k): 1-val
  – p_i fails to send to nf p_j (p_i might send to p_k): 0-val
  – p_i fails to send to nf p_j, sends to another nf p_k
 The third extension looks the same to p_k as the failure-free one and looks the same to p_j as the 0-val one.
 So in the third extension p_k is either undecided or decided 1, and p_j is either undecided or decided 0.]