DISTRIBUTED ALGORITHMS Spring 2014 Prof. Jennifer Welch Set 9: Fault Tolerant Consensus 1.



Processor Failures in Message Passing
• Crash: at some point the processor stops taking steps.
  • At the processor's final step, it might succeed in sending only a subset of the messages it is supposed to send.
• Byzantine: the processor changes state arbitrarily and sends messages with arbitrary content.

Consensus Problem
• Every processor has an input.
• Termination: Eventually every nonfaulty processor must decide on a value.
  • A decision is irrevocable!
• Agreement: All decisions by nonfaulty processors must be the same.
• Validity: If all inputs are the same, then the decision of a nonfaulty processor must equal the common input.

Examples of Consensus
• Binary inputs:
  • input vector 1,1,1,1,1: decision must be 1
  • input vector 0,0,0,0,0: decision must be 0
  • input vector 1,0,0,1,0: decision can be either 0 or 1
• Multi-valued inputs:
  • input vector 1,2,3,2,1: decision can be 1 or 2 or 3
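The agreement and validity conditions can be checked mechanically on a finished execution. The following is a minimal sketch, not part of the original slides: the function name `check_consensus` and its argument layout are assumptions made for illustration. Termination is not checked, since it is a property of infinite executions.

```python
def check_consensus(inputs, decisions):
    """Return (agreement, validity) for one finished execution.

    inputs:    list of input values, one per processor (hypothetical layout).
    decisions: dict mapping the index of each nonfaulty processor
               to the value it decided.
    """
    decided = set(decisions.values())
    # Agreement: all nonfaulty decisions are the same value.
    agreement = len(decided) <= 1
    # Validity only constrains executions in which all inputs are equal.
    if len(set(inputs)) == 1:
        validity = decided <= {inputs[0]}
    else:
        validity = True
    return agreement, validity


# Examples from the slide (all five processors assumed nonfaulty).
print(check_consensus([1, 1, 1, 1, 1], {p: 1 for p in range(5)}))
print(check_consensus([1, 0, 0, 1, 0], {p: 0 for p in range(5)}))
```

With mixed inputs such as 1,2,3,2,1, validity as stated here places no restriction on the decision, so only agreement can fail.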

Overview of Consensus Results
• Synchronous system
• At most f faulty processors
• Tight bounds for message passing:

                               crash failures    Byzantine failures
  number of rounds             f + 1             f + 1
  total number of processors   f + 1             3f + 1
  message size                 polynomial        polynomial

Overview of Consensus Results
• Impossible in asynchronous case.
  • Even if we only want to tolerate a single crash failure.
  • True both for message passing and shared read-write memory.

Modeling Crash Failures
• Modify failure-free definitions of admissible execution to accommodate crash failures:
  • All but a set of at most f processors (the faulty ones) take an infinite number of steps.
  • In synchronous case: once a faulty processor fails to take a step in a round, it takes no more steps.
  • In a faulty processor's last step, an arbitrary subset of the processor's outgoing messages make it into the channels.

Modeling Byzantine Failures
• Modify failure-free definitions of admissible execution to accommodate Byzantine failures:
  • A set of at most f processors (the faulty ones) can send messages with arbitrary content and change state arbitrarily (i.e., not according to their transition functions).

Consensus Algorithm for Crash Failures
Code for each processor:

  v := my input
  at each round 1 through f+1:
      if I have not yet sent v then send v to all
      wait to receive messages for this round
      v := minimum among all received values and current value of v
      if this is round f+1 then decide on v
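The per-processor code above can be exercised in a small synchronous simulation. This is a sketch under stated assumptions, not the course's implementation: the function name `crash_consensus` and the `crashes` argument (which maps a processor index to the round in which it crashes and the set of recipients that still receive its final message) are hypothetical conventions introduced here.

```python
def crash_consensus(inputs, f, crashes=None):
    """Simulate the (f+1)-round crash-tolerant algorithm.

    inputs:  list of input values, one per processor.
    crashes: hypothetical format {pid: (round, recipients)}; in that round
             the processor's message reaches only `recipients`, and the
             processor takes no further steps.
    Returns {pid: decision} for the processors that never crashed.
    """
    crashes = crashes or {}
    n = len(inputs)
    v = list(inputs)
    sent = [set() for _ in range(n)]   # values each processor already sent
    alive = [True] * n
    for rnd in range(1, f + 2):        # rounds 1 through f+1
        inbox = [[] for _ in range(n)]
        for p in range(n):
            if not alive[p]:
                continue
            crash = crashes.get(p)
            crashing = crash is not None and crash[0] == rnd
            if v[p] not in sent[p]:    # "if I have not yet sent v"
                sent[p].add(v[p])
                recipients = crash[1] if crashing else range(n)
                for q in recipients:
                    inbox[q].append(v[p])
            if crashing:
                alive[p] = False
        for p in range(n):
            if alive[p]:               # v := min of received values and v
                v[p] = min([v[p]] + inbox[p])
    return {p: v[p] for p in range(n) if alive[p]}


# Processor 1 holds the minimum but crashes in round 1, reaching only
# processor 0; the second round lets processor 0 relay the value.
print(crash_consensus([3, 1, 2], f=1, crashes={1: (1, {0})}))
```

In this run the surviving processors still agree, because processor 0 learns the value 1 in round 1 and forwards it in round 2.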

Execution of Algorithm
Each step is paired with its relation to the formal model (in parentheses).
• round 1:
  • send my input (in channels initially)
  • receive round 1 msgs (deliver events)
  • compute value for v (compute events)
• round 2:
  • send v, if this is a new value (due to previous compute events)
  • receive round 2 msgs (deliver events)
  • compute value for v (compute events)
• …
• round f + 1:
  • send v, if this is a new value (due to previous compute events)
  • receive round f + 1 msgs (deliver events)
  • compute value for v (compute events)
  • decide v (part of compute events)

Correctness of Crash Consensus Algorithm
• Termination: By the code, finish in round f+1.
• Validity: Holds since processors do not introduce spurious messages: if all inputs are the same, then that is the only value ever in circulation.

Correctness of Crash Consensus Algorithm
Agreement:
• Suppose in contradiction p_j decides on a smaller value, x, than does p_i.
• Then x was hidden from p_i by a chain of faulty processors:
  [Figure: chain q_1, q_2, …, q_f, q_{f+1}, p_j, with x passed along one hop per round in rounds 1, 2, …, f, f+1; p_i never receives x.]
• Each processor in the chain sent x to its successor but not to p_i, so it must have crashed in that round.
• There are f + 1 faulty processors in this chain, a contradiction.
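The chain argument can be illustrated by tracking which processors know the smallest value x when one processor crashes per round. This is a sketch with assumed conventions: the function name `who_knows_x` is hypothetical, processors are numbered 0 to n-1, only processor 0 starts with x, processor k (for k < f) crashes in round k+1 after reaching only processor k+1, and n is at least f + 2.

```python
def who_knows_x(n, f, rounds):
    """Return (alive processors that know x, alive processors)
    after the given number of rounds of the worst-case crash chain."""
    knows = {0}            # processors that have x
    fresh = {0}            # processors that have x but have not yet sent it
    alive = set(range(n))
    for rnd in range(1, rounds + 1):
        next_fresh = set()
        for p in sorted(fresh):
            if p not in alive:
                continue
            if p < f and p == rnd - 1:
                # p crashes during this round: x reaches only its successor.
                targets = {p + 1}
                alive.discard(p)
            else:
                # A nonfaulty processor sends x to everyone.
                targets = set(range(n))
            next_fresh |= targets - knows
        knows |= next_fresh
        fresh = next_fresh
    return knows & alive, alive


# With f = 2 crashes, stopping after f rounds leaves x known to only one
# surviving processor; round f + 1 spreads it to all survivors.
print(who_knows_x(n=4, f=2, rounds=2))
print(who_knows_x(n=4, f=2, rounds=3))
```

After f rounds the survivors would decide differently, while after f + 1 rounds hiding x would require an (f+1)-st crash, which the failure bound forbids.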

Performance of Crash Consensus Algorithm
• Number of processors n > f
• f + 1 rounds
• at most n^2 |V| messages, each of size log|V| bits, where V is the input set.
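The message bound follows from the rule that a processor sends each value at most once, to all n processors. A worked instance is sketched below; the numbers n = 5 and |V| = 4 are arbitrary choices for illustration, and the `\text` macro assumes the amsmath package.

```latex
\[
  \underbrace{n}_{\text{senders}} \cdot
  \underbrace{|V|}_{\text{values per sender}} \cdot
  \underbrace{n}_{\text{recipients}}
  \;=\; n^2 |V| \ \text{messages}
\]
\[
  n = 5,\ |V| = 4:\qquad
  5^2 \cdot 4 = 100 \ \text{messages of } \log_2 4 = 2 \ \text{bits each,}
  \quad\text{at most } 200 \ \text{bits in total.}
\]
```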

Lower Bound on Rounds
Assumptions:
• n > f + 1
• every processor is supposed to send a message to every other processor in every round
• Input set is {0,1}

Failure-Sparse Executions
• Bad behavior for the crash algorithm was when there was one crash per round.
• This is bad in general.
• A failure-sparse execution has at most one crash per round.
• We will deal exclusively with failure-sparse executions in this proof.

Valence of a Configuration
• The valence of a configuration C is the set of all values decided by a nonfaulty processor in some configuration reachable from C by an admissible (failure-sparse) execution.
• Bivalent: set contains 0 and 1.
• Univalent: set contains only one value
  • 0-valent or 1-valent

Valence of a Configuration
[Figure: configuration C with reachable configurations D, E, F, G and the decisions reachable from each. Legend: 0/1 = bivalent, 1 = 1-valent, 0 = 0-valent.]
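Valence can be computed directly from the definition when the reachable configurations form a small explicit tree. The sketch below is illustrative only: the tree shape, the names `valence` and `classify`, and the decisions attached to the leaves are assumptions, not a reconstruction of the figure.

```python
def valence(config, children, decision):
    """Set of values decided in some configuration reachable from `config`.

    children: dict mapping a configuration to its one-step successors.
    decision: dict mapping a configuration to the value decided there.
    """
    values = set()
    if config in decision:
        values.add(decision[config])
    for child in children.get(config, []):
        values |= valence(child, children, decision)
    return values


def classify(values):
    """Name the valence of a configuration over the input set {0, 1}."""
    if values == {0, 1}:
        return "bivalent"
    if values == {0}:
        return "0-valent"
    if values == {1}:
        return "1-valent"
    return "no decision reachable"


# Hypothetical tree: C leads to D and E; D leads to F and G.
children = {"C": ["D", "E"], "D": ["F", "G"]}
decision = {"E": 1, "F": 0, "G": 1}
for name in ["C", "D", "E", "F"]:
    print(name, classify(valence(name, children, decision)))
```

A bivalent configuration such as C here still has both outcomes open, while a univalent one such as E has its outcome fixed even if no processor has decided yet.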

Statement of Round Lower Bound
Theorem (5.3): Any crash-resilient consensus algorithm requires at least f + 1 rounds in the worst case.
Proof strategy:
• Show there exists a bivalent initial config.
• Show we can keep things bivalent through round f - 1 (rounds 1, 2, …, f - 2, f - 1).
• Show we can keep a nonfaulty processor from deciding in round f.

Existence of Bivalent Initial Config.
• Suppose in contradiction all initial configurations are univalent.

  inputs      valency
  000…000     0 (by validity condition)
  000…001     ?
  000…011     ?
  …
  001…111     ?
  011…111     ?
  111…111     1 (by validity condition)

Existence of Bivalent Initial Config.
• Let
  • I_0 be a 0-valent initial config
  • I_1 be a 1-valent initial config
  • such that they differ only in p_i's input.
• Starting from I_0: p_i fails initially, no other failures. By termination, eventually the rest decide. Since I_0 is 0-valent, all but p_i decide 0.
• Starting from I_1: the same execution looks the same as the one above to all the processors except p_i, so all but p_i decide 0.
• This contradicts I_1 being 1-valent.
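The pair I_0, I_1 comes from walking the chain of input vectors from all 0s to all 1s: the valency starts at 0 and ends at 1, so it must flip between two neighbours that differ in a single input. A minimal sketch follows; the function names and the sample valency assignment are hypothetical.

```python
def chain(n):
    """Input vectors from 00…0 to 11…1, each differing from the next
    in exactly one processor's input."""
    return [[0] * (n - k) + [1] * k for k in range(n + 1)]


def find_flip(valencies):
    """Index k such that vector k is 0-valent and vector k+1 is 1-valent.

    valencies[k] is the (assumed univalent) valency of chain vector k;
    validity forces valencies[0] == 0 and valencies[-1] == 1.
    """
    if valencies[0] != 0 or valencies[-1] != 1:
        raise ValueError("validity requires the end points to be 0 and 1")
    for k in range(len(valencies) - 1):
        if valencies[k] == 0 and valencies[k + 1] == 1:
            return k
    raise AssertionError("unreachable: a 0-to-1 flip must exist")


vectors = chain(4)
k = find_flip([0, 0, 1, 0, 1])      # an arbitrary sample assignment
i0, i1 = vectors[k], vectors[k + 1]
differing = [p for p in range(4) if i0[p] != i1[p]]
print(i0, i1, differing)
```

Whatever univalent assignment is chosen for the interior of the chain, the search succeeds, and the single differing index plays the role of p_i.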

Keeping Things Bivalent
• Let α' be a (failure-sparse) (k - 1)-round execution ending in a bivalent config.
  • for k - 1 < f - 1
• Show there is a one-round (failure-sparse) extension α of α' ending in a bivalent config.
  • so α has k < f rounds
• Suppose in contradiction every one-round (failure-sparse) extension of α' is univalent.

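Keeping Things Bivalent
[Figure: round-k extensions of α' (rounds 1 to k-1, ending in a bivalent config).]
• The failure-free round-k extension is 1-valent.
• Some round-k extension in which p_i crashes is 0-valent; in it, p_i fails to send to q_1,…,q_m.
• Between the two lie the extensions in which p_i fails to send to q_1,…,q_j, for j growing up to m.
• Somewhere the valence flips: failing to send to q_1,…,q_j gives a 1-valent extension, while failing to send to q_1,…,q_{j+1} gives a 0-valent one.
• Now focus in on these two extensions.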

Keeping Things Bivalent
[Figure: the two round-k extensions of α' (rounds 1 to k-1): p_i fails to send to q_1,…,q_j (1-valent) and p_i fails to send to q_1,…,q_{j+1} (0-valent).]
• Extend both by having q_{j+1} fail in round k+1, with no other failures.
• Only q_{j+1} can tell the difference between the two, so the nonfaulty processors decide 1 in both, contradicting the 0-valence of the second extension.
• Since k-1 < f-1 and α' is failure-sparse, fewer than f-1 processors fail in α'. Even with p_i failing in round k, fewer than f processors have failed. So we can have q_{j+1} fail in round k+1 without exceeding our budget of f failures.
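The failure budget can be written out step by step. The derivation below only restates the counting argument in symbols; the notation $\#\mathrm{fail}(\cdot)$ for the number of crashed processors is introduced here for illustration, and `\text` assumes the amsmath package.

```latex
\begin{align*}
  \#\mathrm{fail}(\alpha') &\le k - 1
     && \text{failure-sparse: at most one crash in each of the } k-1 \text{ rounds} \\
  &\le f - 2
     && \text{since } k - 1 < f - 1 \\
  \#\mathrm{fail}(\text{after round } k) &\le (f - 2) + 1 = f - 1
     && p_i \text{ crashes in round } k \\
  \#\mathrm{fail}(\text{after round } k+1) &\le (f - 1) + 1 = f
     && q_{j+1} \text{ crashes in round } k+1
\end{align*}
```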

Cannot Decide in Round f
• We've shown there is an (f - 1)-round (failure-sparse) execution, call it α, ending in a bivalent configuration.
• Extending this execution to f rounds might not preserve bivalence.
• However, we can keep a processor from explicitly deciding in round f, thus requiring at least one more round (f+1).

Cannot Decide in Round f
• Case 1: There is a 1-round (failure-sparse) extension of α ending in a bivalent config. Then we are done.
• Case 2: All 1-round (failure-sparse) extensions of α end in univalent configs.

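Cannot Decide in Round f
[Figure: three round-f extensions of α (rounds 1 to f-1, ending in a bivalent config).]
• The failure-free extension, in which p_i sends to p_j and p_k, is 1-valent.
• An extension in which p_i fails to send to nonfaulty p_j (p_i might send to p_k) is 0-valent.
• In a third extension, p_i fails to send to nonfaulty p_j but sends to another nonfaulty p_k.
  • It looks the same as the failure-free extension to p_k, so p_k is either undecided or has decided 1.
  • It looks the same as the 0-valent extension to p_j, so p_j is either undecided or has decided 0.
• By agreement, p_j and p_k cannot have decided different values, so at least one of them is still undecided at the end of round f.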