Distributed Consensus Reaching agreement is a fundamental problem in distributed computing. Some examples are Leader election / Mutual Exclusion Commit.

Slides:



Advertisements
Similar presentations
Impossibility of Distributed Consensus with One Faulty Process
Advertisements

The weakest failure detector question in distributed computing Petr Kouznetsov Distributed Programming Lab EPFL.
CSE 486/586, Spring 2012 CSE 486/586 Distributed Systems Consensus Steve Ko Computer Sciences and Engineering University at Buffalo.
6.852: Distributed Algorithms Spring, 2008 Class 7.
Distributed Computing 8. Impossibility of consensus Shmuel Zaks ©
CSE 486/586, Spring 2013 CSE 486/586 Distributed Systems Consensus Steve Ko Computer Sciences and Engineering University at Buffalo.
Sliding window protocol The sender continues the send action without receiving the acknowledgements of at most w messages (w > 0), w is called the window.
Announcements. Midterm Open book, open note, closed neighbor No other external sources No portable electronic devices other than medically necessary medical.
Distributed Algorithms – 2g1513 Lecture 10 – by Ali Ghodsi Fault-Tolerance in Asynchronous Networks.
CS 425 / ECE 428 Distributed Systems Fall 2014 Indranil Gupta (Indy) Lecture 13: Impossibility of Consensus All slides © IG.
Computer Science 425 Distributed Systems CS 425 / ECE 428 Consensus
Consensus Hao Li.
Distributed Computing 8. Impossibility of consensus Shmuel Zaks ©
Byzantine Generals Problem: Solution using signed messages.
Consensus Algorithms Willem Visser RW334. Why do we need consensus? Distributed Databases – Need to know others committed/aborted a transaction to avoid.
Faults and fault-tolerance One of the selling points of a distributed system is that the system will continue to perform even if some components / processes.
Sergio Rajsbaum 2006 Lecture 3 Introduction to Principles of Distributed Computing Sergio Rajsbaum Math Institute UNAM, Mexico.
CPSC 668Set 9: Fault Tolerant Consensus1 CPSC 668 Distributed Algorithms and Systems Fall 2006 Prof. Jennifer Welch.
CPSC 668Set 9: Fault Tolerant Consensus1 CPSC 668 Distributed Algorithms and Systems Spring 2008 Prof. Jennifer Welch.
1 Fault-Tolerant Consensus. 2 Failures in Distributed Systems Link failure: A link fails and remains inactive; the network may get partitioned Crash:
Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 5: Synchronous Uniform.
Impossibility of Distributed Consensus with One Faulty Process Michael J. Fischer Nancy A. Lynch Michael S. Paterson Presented by: Oren D. Rubin.
Distributed systems Module 2 -Distributed algorithms Teaching unit 1 – Basic techniques Ernesto Damiani University of Bozen Lesson 4 – Consensus and reliable.
CPSC 668Set 11: Asynchronous Consensus1 CPSC 668 Distributed Algorithms and Systems Fall 2006 Prof. Jennifer Welch.
CPSC 668Set 11: Asynchronous Consensus1 CPSC 668 Distributed Algorithms and Systems Fall 2009 Prof. Jennifer Welch.
 Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 12: Impossibility.
Distributed Algorithms: Agreement Protocols. Problems of Agreement l A set of processes need to agree on a value (decision), after one or more processes.
On the Cost of Fault-Tolerant Consensus When There are no Faults Idit Keidar & Sergio Rajsbaum Appears in SIGACT News; MIT Tech. Report.
CSE 486/586, Spring 2012 CSE 486/586 Distributed Systems Consensus Steve Ko Computer Sciences and Engineering University at Buffalo.
Paxos Made Simple Jinghe Zhang. Introduction Lock is the easiest way to manage concurrency Mutex and semaphore. Read and write locks. In distributed system:
Distributed Consensus Reaching agreement is a fundamental problem in distributed computing. Some examples are Leader election / Mutual Exclusion Commit.
Lecture 8-1 Computer Science 425 Distributed Systems CS 425 / CSE 424 / ECE 428 Fall 2010 Indranil Gupta (Indy) September 16, 2010 Lecture 8 The Consensus.
Distributed Algorithms – 2g1513 Lecture 9 – by Ali Ghodsi Fault-Tolerance in Distributed Systems.
CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS Fall 2011 Prof. Jennifer Welch CSCE 668 Set 11: Asynchronous Consensus 1.
For Distributed Algorithms 2014 Presentation by Ziv Ronen Based on “Impossibility of Distributed Consensus with One Faulty Process” By: Michael J. Fischer,
Consensus and Its Impossibility in Asynchronous Systems.
Computer Science 425 Distributed Systems (Fall 2009) Lecture 10 The Consensus Problem Part of Section 12.5 and Paper: “Impossibility of Distributed Consensus.
CS4231 Parallel and Distributed Algorithms AY 2006/2007 Semester 2 Lecture 8 Instructor: Haifeng YU.
1 Chapter 9 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld © 2014 Synchronization Algorithms and Concurrent Programming Synchronization.
DISTRIBUTED ALGORITHMS AND SYSTEMS Spring 2014 Prof. Jennifer Welch Set 11: Asynchronous Consensus 1.
1 Consensus Hierarchy Part 1. 2 Consensus in Shared Memory Consider processors in shared memory: which try to solve the consensus problem.
CS294, Yelick Consensus revisited, p1 CS Consensus Revisited
Distributed systems Consensus Prof R. Guerraoui Distributed Programming Laboratory.
Sliding window protocol The sender continues the send action without receiving the acknowledgements of at most w messages (w > 0), w is called the window.
Hwajung Lee. Reaching agreement is a fundamental problem in distributed computing. Some examples are Leader election / Mutual Exclusion Commit or Abort.
Chap 15. Agreement. Problem Processes need to agree on a single bit No link failures A process can fail by crashing (no malicious behavior) Messages take.
SysRép / 2.5A. SchiperEté The consensus problem.
Impossibility of Distributed Consensus with One Faulty Process By, Michael J.Fischer Nancy A. Lynch Michael S.Paterson.
Agreement in Distributed Systems n definition of agreement problems n impossibility of consensus with a single crash n solvable problems u consensus with.
Chapter 21 Asynchronous Network Computing with Process Failures By Sindhu Karthikeyan.
1 Fault tolerance in distributed systems n Motivation n robust and stabilizing algorithms n failure models n robust algorithms u decision problems u impossibility.
Alternating Bit Protocol S R ABP is a link layer protocol. Works on FIFO channels only. Guarantees reliable message delivery with a 1-bit sequence number.
Fault tolerance and related issues in distributed computing Shmuel Zaks GSSI - Feb
DISTRIBUTED ALGORITHMS Spring 2014 Prof. Jennifer Welch Set 9: Fault Tolerant Consensus 1.
CS4231 Parallel and Distributed Algorithms AY 2006/2007 Semester 2 Lecture 9 Instructor: Haifeng YU.
1 Fault-Tolerant Consensus. 2 Communication Model Complete graph Synchronous, network.
CSE 486/586 CSE 486/586 Distributed Systems Consensus Steve Ko Computer Sciences and Engineering University at Buffalo.
1 AGREEMENT PROTOCOLS. 2 Introduction Processes/Sites in distributed systems often compete as well as cooperate to achieve a common goal. Mutual Trust/agreement.
Consensus, impossibility results and Paxos Ken Birman.
The consensus problem in distributed systems
When Is Agreement Possible
CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS
Lecture 16-A: Impossibility of Consensus
Alternating Bit Protocol
Distributed Consensus
Distributed Consensus
CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS
FLP Impossibility of Consensus
CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS
CSE 486/586 Distributed Systems Consensus
Presentation transcript:

Distributed Consensus Reaching agreement is a fundamental problem in distributed computing. Some examples are Leader election / Mutual Exclusion Commit or Abort in distributed transactions R eaching agreement about which process has failed Clock phase synchronization Air traffic control system: all aircrafts must have the same view

Problem Specification Termination. Every non-faulty process must eventually decide. Agreement. The final decision of every non-faulty process must be identical. Validity. If every non-faulty process begins with the same initial value v, then their final decision must be v.

Problem Specification u3 u2 u1 u0 v v v v Here, v must be equal to the value at some input line. Finally, all outputs must be identical, even if one or more processes fail. input output p0 p1 p2 p3

Observation If there is no failure, then reaching consensus is trivial. All-to-all broadcast followed by a applying a choice function … Consensus in presence of failures can however be complex. The complexity depends on the system model and the type of failures

Asynchronous Consensus Seven members of a busy household decided to hire a cook, since they do not have time to prepare their own food. Each member separately interviewed every applicant for the cook’s position. Depending on how it went, each member voted "yes" (means “hire”) or "no" (means “don't hire”). These members will now have to communicate with one another to reach a uniform final decision about whether the applicant will be hired. The process will be repeated with the next applicant, until someone is hired. Consider various modes of communication like shared memory or message passing. Also assume that one process (i.e. a member) may crash at any time.

Asynchronous Consensus Theorem. In a purely asynchronous distributed system, the consensus problem is impossible to solve if even a single process crashes Result due to Fischer, Lynch, Patterson (commonly known as FLP 85). Received the most influential paper award of ACM PODC in 2001

Proof Bivalent and Univalent states A decision state is bivalent, if starting from that state, there exist two distinct executions leading to two distinct decision values 0 or 1. Otherwise it is univalent. A univalent state may be either 0-valent or 1-valent.

Proof (continued) Lemma. No execution can lead from a 0-valent to a 1-valent state or vice versa. Proof. Follows from the definition of 0-valent and 1-valent states.

Proof Lemma. Every consensus protocol must have a bivalent initial state. Proof by contradiction. Suppose not. Then consider the following input patterns: s[0] …0 0 0{0-valent)there must ne a j: …0 0 1s[j] is 0-valent …0 1 1s[j+1] is 1-valent …………(differ in j th position) s[n-1] …1 1 1{1-valent} What if process (j+1) crashes at the first step?

Lemma. In a consensus protocol, starting from any initial bivalent state I, there must exist a reachable bivalent state T, such that every action taken by some process p in state T leads to either a 0 valent or a 1-valent state. Note that bivalent states should not form a cycle, since it affects termination. Actions 0 and 1 from T must be taken by the same process p. Why? Proof of FLP (continued)

T T0 T1 Decision =1 Decision = 0 p reads q writes e1 e0 Starting from T, let e1 be a computation that excludes any step by p. Let p crash after reading. Then e1 is a valid computation from T0 too. To all non-faulty processes, these two computations are identical, but the outcomes are different! This is not possible! But then, starting from a 0-valent state, a computation reaches decision = 1 which is not feasible Case 1. 1-valent 0-valent Assume shared memory communication. Also assume that p ≠ q. Various cases are possible Such a computation must exist since p can crash at any time ?

Proof (continued) T T0 T1 Decision =1 Decision = 0 p writes q writes e1 e0 Both write on the same variable, and p writes first. From T, let e1 be a computation that excludes any step by p. Let p crash after writing. Then e1 is a valid computation from T0 too. To all non-faulty processes, these two computations are identical, (q overwrites the value written by p) but the outcomes are different! Case 2.1-valent 0-valent

Proof (continued) T T0 T1 Z Decision =1 Decision = 0 p writes q writes Let both p and q write, but on different variables. Then regardless of the order of these writes, both computations lead to the same intermediate global state Z, which must be univalent. Is Z 1-valent or 0-valent? Both are absurd. Case 3 0-valent 1-valent p writes q writes

Proof (continued) Similar arguments can be made for communication using the message passing model too (See Nancy Lynch’s book). These lead to the fact that p, q cannot be distinct processes, and p = q. Call p the decider process. What if p crashes in state T? No consensus is reached!

Conclusion In a purely asynchronous system, there is no solution to the consensus problem if a single process crashes.. Note that this is true for deterministic algorithms only. Solutions do exist for the consensus problem using randomized algorithm, or using the synchronous model.