Consensus and Its Impossibility in Asynchronous Systems.

A Few Questions from Protocols We Studied
Was it necessary to assume that a node can detect the failure of other nodes?
– What if we could not do that in a system, e.g., if delays are arbitrarily long?
– The protocol we constructed would not work.
– Can any protocol exist to solve this problem?
Why is this problem important?

Model with Minimal Assumptions
The goal of this work is to identify which assumptions are absolutely essential to solve the problem of interest.
– Think of this as a "game" between the "protocol designer" and the "system implementer".
The more guarantees the system implementer provides, the easier it is to design a protocol.
The more guarantees a protocol expects of the system implementer, the more restrictive the protocol is likely to be.
– For example, a protocol that assumes FIFO communication is more restrictive than one that does not require it.
One model we consider for this is "asynchronous systems".

Asynchronous Systems
What does asynchronous mean?
– Computation consists of steps. In each step, exactly one of the following happens: a process sends a message, a process receives a message, or a process performs some local computation.
– There is no bound on how long a message takes to be delivered, or on how long a process takes between steps.
This was one of the models we considered at the beginning of the semester.
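
The step model can be sketched as a small simulation. This is a minimal sketch under some assumptions. The `Process` class, the `run` function and its trace format are hypothetical. A seeded random scheduler stands in for the arbitrary, unbounded delays of a real asynchronous system.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Process:
    pid: int
    inbox: list = field(default_factory=list)  # messages this process has received
    counter: int = 0                            # local state changed by local computation


def run(n_processes: int, n_steps: int, seed: int) -> list:
    """Simulate an asynchronous run and return its trace of (kind, pid, other) steps.

    For "send", `other` is the destination; for "receive", it is the sender;
    for "local", it is the new value of the local counter.
    """
    rng = random.Random(seed)
    procs = [Process(i) for i in range(n_processes)]
    in_transit = []  # (sender, dest, payload): sent but not yet received
    trace = []
    for _ in range(n_steps):
        p = rng.choice(procs)  # any process may take the next step
        pending = [m for m in in_transit if m[1] == p.pid]
        kinds = ["send", "local"] + (["receive"] if pending else [])
        kind = rng.choice(kinds)
        if kind == "send":
            dest = rng.randrange(n_processes)
            in_transit.append((p.pid, dest, p.counter))
            trace.append(("send", p.pid, dest))
        elif kind == "receive":
            msg = rng.choice(pending)  # not necessarily the oldest: no FIFO, no delay bound
            in_transit.remove(msg)
            p.inbox.append(msg)
            trace.append(("receive", p.pid, msg[0]))
        else:
            p.counter += 1
            trace.append(("local", p.pid, p.counter))
    return trace
```

Each step does exactly one of the three things the slide lists. A message can sit in transit for any number of steps before it is received.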

Observation about Asynchronous Systems
If a process is about to perform a local computation and is delayed, it can still perform that local computation later
– irrespective of what other processes do.
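
One way to make this observation precise is as commutativity. Steps taken by different processes can be applied in either order and reach the same configuration, because each step touches only its own process's local state and the message buffer. FLP's proof relies on this commutativity property. The sketch below assumes a hypothetical configuration made of per-process integer states plus a multiset of in-transit messages. `apply_step` and its encoding are illustrative, not taken from the slides.

```python
from collections import Counter


def apply_step(config, step):
    """Return the configuration reached by taking `step`; the input is not mutated.

    config: (local, in_transit) where `local` maps pid -> int and `in_transit`
            is a Counter of (dest, payload) messages not yet received.
    step:   (pid, msg) where msg is a (dest, payload) message addressed to pid
            and currently in transit, or None for a local computation step.
    """
    local, in_transit = config
    pid, msg = step
    local, in_transit = dict(local), Counter(in_transit)
    if msg is not None:
        assert msg[0] == pid and in_transit[msg] > 0, "step not applicable"
        in_transit[msg] -= 1
        local[pid] += msg[1]   # receive: fold the payload into local state
    else:
        local[pid] += 1        # local computation
    for other in local:        # then report the new value to everyone else
        if other != pid:
            in_transit[(other, local[pid])] += 1
    return local, +in_transit  # unary + drops zero counts so equality is exact


c0 = ({0: 0, 1: 0, 2: 0}, Counter({(1, 5): 1}))
a = (0, None)      # process 0 computes locally
b = (1, (1, 5))    # process 1 receives the message (1, 5)
c = (1, None)      # process 1 computes locally
```

Take steps `a` and `b`, which belong to different processes. Applying them in either order from `c0` gives equal configurations. Steps `b` and `c` both belong to process 1, and their two orders leave different messages in transit. Delaying a process's step past other processes' steps therefore does not change where the system ends up.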

Effect of Asynchrony
It is not possible to distinguish between a slow process and a failed process.
– This is the reason why consensus is not solvable in asynchronous systems.
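
The slow-versus-failed problem can be made concrete with a follower that waits for a coordinator's value, using a timeout. `follower_decision`, its parameters, and the rule "decide your own vote after a timeout" are hypothetical, chosen only to illustrate the point.

```python
import math
from typing import Optional


def follower_decision(delay: Optional[float], timeout: float,
                      coordinator_value: int, own_value: int) -> Optional[int]:
    """Decision of a hypothetical follower that waits up to `timeout` for the coordinator.

    delay: time at which the coordinator's value arrives, or None if the
           coordinator crashed and the value never arrives.
    Returns None when the follower would wait forever.
    """
    if delay is not None and delay <= timeout:
        return coordinator_value   # value arrived in time: adopt it
    if math.isinf(timeout):
        return None                # never gives up: blocked by a crashed coordinator
    return own_value               # timed out: treat the coordinator as crashed
```

First take a finite timeout. A coordinator that is merely slow (delay 10.0 against timeout 5.0) is treated as crashed, so the follower decides 0 while the coordinator decides 1. Now take an infinite timeout. A crashed coordinator leaves the follower waiting forever. Because delays are unbounded, no finite timeout avoids the first outcome.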

Consensus Problem
Each process has a vote, either 0 or 1. Each process must decide on a value, either 0 or 1, subject to the following constraints:
– Agreement: If two processes decide, their decisions must be the same.
– Validity: If the votes of all processes are equal and no failures occur, the decision of every process (if it decides) must equal that vote.
– Termination: Every non-failed process must eventually decide.

Revisiting Safety and Liveness
Agreement and Validity are safety properties; Termination is a liveness property.
Could the problem be solved "trivially" if we only had to satisfy two of these properties?
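
The question above has a short answer: yes, any two of the properties are easy to satisfy together. Below is a minimal sketch that checks finished, failure-free runs only, an assumption made to keep it small. `check_run`, `always_holds` and the three protocol functions are hypothetical names.

```python
from itertools import product
from typing import Callable, Optional, Sequence


def check_run(votes: Sequence[int], crashed: Sequence[bool],
              decisions: Sequence[Optional[int]]) -> dict:
    """Check one complete run against Agreement, Validity and Termination.

    decisions[i] is the value process i decided, or None if it never decided.
    """
    decided = [d for d in decisions if d is not None]
    agreement = len(set(decided)) <= 1
    validity = True
    if len(set(votes)) == 1 and not any(crashed):
        # All votes equal and no failures: every decision must equal that vote.
        validity = all(d == votes[0] for d in decided)
    termination = all(d is not None for d, c in zip(decisions, crashed) if not c)
    return {"agreement": agreement, "validity": validity, "termination": termination}


# Three "trivial" protocols: each process decides alone, without sending messages.
def always_zero(vote: int) -> Optional[int]:
    return 0            # ignores its own vote


def own_vote(vote: int) -> Optional[int]:
    return vote         # decides whatever it voted


def never_decide(vote: int) -> Optional[int]:
    return None         # waits forever


def always_holds(protocol: Callable[[int], Optional[int]], n: int = 3) -> dict:
    """Which properties hold in every failure-free run over all vote vectors of size n."""
    held = {"agreement": True, "validity": True, "termination": True}
    for votes in product([0, 1], repeat=n):
        decisions = [protocol(v) for v in votes]
        result = check_run(votes, [False] * n, decisions)
        for prop in held:
            held[prop] = held[prop] and result[prop]
    return held
```

Each trivial protocol gives up exactly one property:

- `always_zero` violates Validity.
- `own_vote` violates Agreement.
- `never_decide` violates Termination.

Termination can only be judged on a complete run. On a finite prefix, a missing decision may simply not have happened yet, which is what makes it a liveness property.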

Distributed Consensus
Reaching agreement is a fundamental problem in distributed computing. Some examples are:
– Leader election / mutual exclusion
– Commit or abort in distributed transactions
– Reaching agreement about which process has failed
– Clock phase synchronization
– Air traffic control systems: all aircraft must have the same view
If there is no failure, then reaching consensus is trivial: all-to-all broadcast, followed by applying a choice function … Consensus in the presence of failures, however, can be complex.
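
The failure-free case can be sketched directly. Every process broadcasts its vote, waits for all n votes, and then applies the same deterministic choice function. This sketch assumes majority with ties broken toward 0 as the choice function, and models arbitrary arrival order with a seeded shuffle. `majority` and `all_to_all` are hypothetical names.

```python
import random


def majority(received: dict) -> int:
    """Choice function: majority of the collected votes, ties broken toward 0."""
    ones = sum(received.values())
    return 1 if ones > len(received) - ones else 0


def all_to_all(votes: list, seed: int) -> list:
    """Failure-free consensus: every process broadcasts its vote, waits for all
    n votes (in whatever order they arrive), then applies the same choice function."""
    rng = random.Random(seed)
    n = len(votes)
    decisions = []
    for _ in range(n):                 # one iteration per receiving process
        arrival = list(range(n))
        rng.shuffle(arrival)           # each receiver sees its own arrival order
        received = {}
        for sender in arrival:
            received[sender] = votes[sender]
        assert len(received) == n      # only safe because nobody crashes
        decisions.append(majority(received))
    return decisions
```

For seven votes `[1, 0, 1, 1, 0, 0, 1]`, every process decides 1, whatever order the votes arrive in. Agreement holds because every process applies the same function to the same set of votes. A single crash breaks the protocol, because the wait for all n votes never completes.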

Example of Asynchronous Consensus
Seven members of a busy household decided to hire a cook, since they did not have time to prepare their own food. Each member separately interviewed every applicant for the cook's position. Depending on how it went, each member voted "yes" (meaning "hire") or "no" (meaning "don't hire"). The members then had to communicate with one another to reach a uniform final decision about whether the applicant would be hired. The process was repeated with the next applicant until someone was hired.

Asynchronous Consensus
Theorem. In a purely asynchronous distributed system, no deterministic protocol can solve the consensus problem if even a single process may crash.
This famous result is due to Fischer, Lynch, and Paterson (commonly known as FLP, 1985).

Computation Prefix
Computation-prefix:
– A computation-prefix is a sequence in which, at each step, some process executes a local event, a send event, or a receive event.
We write computation-prefixes as follows:
– <>: the initial computation, where nothing has occurred
– Each sequence here is finite.

Computation Valence
0-valent computation
– A computation is 0-valent if 0 is the only decision reachable from it, i.e., every extension of it in which some process decides, decides 0.
1-valent computation
– A computation is 1-valent if 1 is the only decision reachable from it.

Computation Valence
Univalent computation
– A computation is univalent iff it is either 0-valent or 1-valent.
– In other words, the outcome of a univalent computation is already fixed, even if no process has decided yet.
Bivalent computation
– A computation is bivalent iff it is neither 0-valent nor 1-valent, i.e., both decisions are still reachable.
– In other words, a bivalent computation has not yet committed to a decision.
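
Valence is a property of all possible extensions, not of the prefix alone, so it can be computed by exhaustive search over a protocol's state graph. This sketch assumes a hypothetical toy protocol. `TOY` maps each computation prefix to its one-step extensions, and `DECIDED` records which prefixes contain a decision.

```python
from typing import Callable, Dict, Hashable, Iterable, Optional, Set


def reachable_decisions(start: Hashable,
                        successors: Callable[[Hashable], Iterable[Hashable]],
                        decision: Callable[[Hashable], Optional[int]]) -> Set[int]:
    """Decision values reachable from `start` by any extension (depth-first search)."""
    seen, stack, found = set(), [start], set()
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        value = decision(state)
        if value is not None:
            found.add(value)
        stack.extend(successors(state))
    return found


def valence(start, successors, decision) -> str:
    values = reachable_decisions(start, successors, decision)
    if values == {0}:
        return "0-valent"
    if values == {1}:
        return "1-valent"
    return "bivalent" if values == {0, 1} else "no decision reachable"


# Hypothetical toy protocol: prefix -> its one-step extensions.
TOY: Dict[str, list] = {
    "<>": ["<a>", "<b>"],
    "<a>": ["<a,c>"],
    "<b>": ["<b,c>", "<b,d>"],
    "<a,c>": [], "<b,c>": [], "<b,d>": [],
}
DECIDED = {"<a,c>": 0, "<b,c>": 0, "<b,d>": 1}
```

Here `valence("<a>", TOY.__getitem__, DECIDED.get)` returns `"0-valent"` even though no process has decided in `<a>` yet. `<>` and `<b>` are bivalent. Real protocols have far too many prefixes for this brute force; the sketch only makes the definitions concrete.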

Some possible protocols for consensus without failures
In all protocols, each process sends its vote to all others, and the final decision is made only after it receives all votes.
Protocol 1:
– Take the majority of all votes, with some fixed way to break a tie.
Protocol 2:
– If all votes are 0 or all are 1, decide 0 or 1, respectively.
– Otherwise, if the number of 0s among the votes received is prime, decide 1; otherwise, decide 0.
Protocol 3:
– If all processes vote 1, decide 1. Else decide 0.
Protocol 4:
– If all processes vote 0, decide 0. Else decide 1.
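
The four decision rules can be written as functions of the complete vote vector. This sketch assumes Protocol 1 breaks ties toward 0. The function names are hypothetical.

```python
from math import isqrt
from typing import Sequence


def is_prime(k: int) -> bool:
    return k >= 2 and all(k % d for d in range(2, isqrt(k) + 1))


def protocol1(votes: Sequence[int]) -> int:
    """Majority of all votes; ties are broken toward 0 (one fixed choice)."""
    ones = sum(votes)
    return 1 if ones > len(votes) - ones else 0


def protocol2(votes: Sequence[int]) -> int:
    if all(v == 0 for v in votes):
        return 0
    if all(v == 1 for v in votes):
        return 1
    return 1 if is_prime(list(votes).count(0)) else 0


def protocol3(votes: Sequence[int]) -> int:
    return 1 if all(v == 1 for v in votes) else 0


def protocol4(votes: Sequence[int]) -> int:
    return 0 if all(v == 0 for v in votes) else 1


PROTOCOLS = [protocol1, protocol2, protocol3, protocol4]


def run(protocol, votes: Sequence[int]) -> list:
    """Every process receives the same complete vote vector, so all apply the same rule."""
    return [protocol(votes) for _ in votes]


def valid(protocol, n: int) -> bool:
    """Validity check: unanimous votes must lead to that vote being decided."""
    return all(protocol((v,) * n) == v for v in (0, 1))
```

All four rules satisfy Validity, and each gives Agreement, because every process evaluates the same rule on the same votes. They can still disagree with one another. On votes `(0, 0, 1)`, Protocols 1 to 4 decide 0, 1, 0, 1, respectively. All of them wait for every vote, so none of them tolerates a crash.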

Coordinator-based solution
Everyone sends their vote to a coordinator.
– The vote in the first message received by the coordinator is the decision.
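
The coordinator rule makes bivalence visible. This minimal sketch assumes the scheduler may deliver the votes to the coordinator in any order. `coordinator_decision` and `possible_decisions` are hypothetical names.

```python
from itertools import permutations
from typing import Sequence, Set, Tuple


def coordinator_decision(votes: Sequence[int], arrival: Tuple[int, ...]) -> int:
    """The coordinator adopts the vote carried by the first message it receives."""
    return votes[arrival[0]]


def possible_decisions(votes: Sequence[int]) -> Set[int]:
    """Decisions over every arrival order the scheduler could choose (keep n small: n! orders)."""
    return {coordinator_decision(votes, order)
            for order in permutations(range(len(votes)))}
```

With votes `[0, 1, 1]` both decisions are possible, so that initial state is bivalent and the scheduler alone picks the outcome. With `[1, 1, 1]` only 1 is possible. As soon as the coordinator receives its first vote, the computation becomes univalent. Suppose the coordinator must then announce the decision to the others. In that case, a coordinator crash before the announcement leaves the others unable to decide.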

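Proof
Lemma. Every consensus protocol must have a bivalent initial state.
Proof by contradiction. Suppose not, so every initial state is univalent. Number the processes 1 to n, and let s[k] be the initial state in which processes 1 to k vote 1 and the rest vote 0:
s[0] = 0 0 … 0 0 (all votes 0, so 0-valent by Validity)
…
s[j] = 1 … 1 0 0 … 0 (0-valent)
s[j+1] = 1 … 1 1 0 … 0 (1-valent)
…
s[n] = 1 1 … 1 1 (all votes 1, so 1-valent by Validity)
Going from s[0] to s[n], the valence must switch somewhere. So there is a j with s[j] 0-valent and s[j+1] 1-valent, and these two states differ only in the vote of process j+1.
What if process j+1 crashes at the first step? Then no other process can distinguish s[j] from s[j+1]. The same schedule, with process j+1 taking no steps, therefore leads the remaining processes to the same decision from either state. This contradicts s[j] being 0-valent and s[j+1] being 1-valent.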
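
The chain argument can be replayed mechanically for any deterministic rule that maps a complete vote vector to a failure-free decision. This sketch assumes the ordering in which s[k] has the first k processes voting 1. The `decide` parameter is a hypothetical stand-in for a protocol's failure-free outcome.

```python
from typing import Callable, List, Tuple

Votes = Tuple[int, ...]


def chain(n: int) -> List[Votes]:
    """Initial states s[0..n]: in s[k], processes 1..k vote 1 and the rest vote 0."""
    return [tuple([1] * k + [0] * (n - k)) for k in range(n + 1)]


def adjacent_flip(decide: Callable[[Votes], int], n: int) -> Tuple[int, Votes, Votes]:
    """Find j with decide(s[j]) == 0 and decide(s[j+1]) == 1.

    s[j] and s[j+1] differ only in the vote of process j+1 (tuple index j).
    """
    s = chain(n)
    # Validity fixes both ends of the chain.
    assert decide(s[0]) == 0 and decide(s[n]) == 1
    for j in range(n):
        if decide(s[j]) == 0 and decide(s[j + 1]) == 1:
            return j, s[j], s[j + 1]
    raise AssertionError("unreachable: a 0 -> 1 step must exist between the ends")


def majority(votes: Votes) -> int:
    """Example decision rule (majority, ties toward 0)."""
    return 1 if 2 * sum(votes) > len(votes) else 0
```

Take majority over five processes. The adjacent pair is s[2] = (1, 1, 0, 0, 0) and s[3] = (1, 1, 1, 0, 0), which differ only in the vote of process 3. The sketch only locates the pair. The contradiction itself comes from letting that process crash before it takes any step.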

Computation Valence of <>
<> is a bivalent computation, for the bivalent initial state guaranteed by the lemma.
– Based on the definition of the consensus problem

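Existence of Decider Process
At some step, the computation must turn from a bivalent computation into a univalent computation.
– Once a process has decided v, Agreement rules out any other decision, so the computation is v-valent. A computation that stays bivalent forever therefore never decides.
– The decider process is the one whose step causes this transition.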

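If Decider Process Slows Down
Suppose the decider process slows down just when it is about to make the decision.
– The other processes cannot tell whether it is slow or has crashed, so they cannot safely wait for it forever.
– The FLP proof shows that the scheduler can always delay such a critical step so that the computation remains bivalent. Repeating this yields an infinite run in which no process ever decides, violating Termination.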

Summary
In a purely asynchronous system, there is no solution to the consensus problem if even a single process may crash.
Note that this is true for deterministic algorithms only. Solutions to the consensus problem do exist using randomized algorithms, or in the synchronous model.