Asynchronous Consensus


Asynchronous Consensus Ken Birman

Outline of talk
- Reminder about models
- Asynchronous consensus: the impossibility result
- Solutions to the problem: with an "oracle" that detects failures, and without oracles, using timeouts
- Big issues, revisited from Byzantine agreement: Is this model realistic? In what ways is it "legitimate"? Should we focus on impossibility, or "possibility"?
- Asynchronous consensus in real-world systems

Distributed Computing Models Recall that we had two models. To reason about networks and applications we need to be precise about the setting in which our protocols run. But "real world" networks are very complex: they can drop packets or reorder them, intruders might be able to intercept and modify data, and timing is totally unpredictable.

Asynchronous network model Asynchronous because we lack clocks: the network can arbitrarily delay a message. But we assume that messages are sequenced and retransmitted (an arbitrary number of times), so they eventually get through; we are "free" to say: lossless, ordered. There is no value to assumptions about process speed. Failures in the asynchronous model? Usually limited to process "crash" faults. If detectable, we call this "fail-stop", but how do we detect a crash?
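The model above can be made concrete with a tiny simulation. This is a minimal sketch (the class and names are illustrative, not from the slides): the "network" is an unordered bag of in-flight messages, and an adversarial scheduler may deliver them in any order it likes, with the one guarantee that every message is eventually delivered.

```python
import random

# Sketch of the asynchronous network model: messages in flight form an
# unordered bag; the scheduler (adversary) picks delivery order, but
# every sent message is eventually delivered.
class AsyncNetwork:
    def __init__(self, seed=0):
        self.in_flight = []            # (dst, msg) pairs, unordered
        self.rng = random.Random(seed)

    def send(self, dst, msg):
        self.in_flight.append((dst, msg))

    def deliver_one(self):
        """Adversary picks ANY pending message; returns (dst, msg)."""
        i = self.rng.randrange(len(self.in_flight))
        return self.in_flight.pop(i)

net = AsyncNetwork()
net.send("Q", "hello")
net.send("R", "world")
while net.in_flight:
    dst, msg = net.deliver_one()
    print(dst, msg)   # order is the scheduler's choice; both arrive eventually
```

A protocol that is correct in this model must be correct for every delivery order the scheduler can choose, which is exactly the leverage the FLP adversary exploits later in the talk.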

An asynchronous network (figure: message timelines between processes; delivery is not causal, and intervals of "time" can shrink or stretch arbitrarily)

Justification? If we can do something in the asynchronous model, we can probably do it even better in a real network: clocks and a-priori knowledge can only help. But today we will focus on an impossibility result. By definition, impossibility in this model means "X can't always be done".

Paradigms Fundamental problems, the solution of which yields general insight into a broad class of questions. In distributed systems:
- Agreement (on a value proposed by a leader)
- Consensus (everyone proposes a value; pick one)
- Electing a leader
- Atomic broadcast/multicast (send a message, reliably, to everyone who isn't faulty, such that concurrent messages are delivered in the same order everywhere)
- Deadlock detection, clock or process synchronization, taking a snapshot ("picture") of the system state…

Consensus problem Models distributed agreement. Comes in various forms (with subtle differences in the associated results)!
- With a leader: the leader gives an order, like "attack", and non-faulty participants either attack or do nothing, despite some limited number of failures: Byzantine Agreement.
- Without a leader: participants have an initial vote; the protocol runs and eventually all non-faulty participants choose the same outcome, and it is one of the initial votes (typically, 0 or 1): Fault-tolerant Consensus.

Consensus problem (figure: processes P, Q, R start with votes 0, 0, 1; after the protocol runs, all three decide 1)

Fault-tolerance Goal: an algorithm tolerant of one failure. Failure: a process crashes, but this is not detectable. So the algorithm must work both in the face of arbitrary message delay caused by the network and in the event of a single failure.

If some process stays up… Suppose we knew that P won't fail. Then P could simply broadcast its input, all would "decide" upon this value, and the problem is solved.

If one process stays up Indeed, suppose that P stays up only long enough to send one message. But there is only one failure, and we knew that P would "lead". Then we can relay P's message, using an all-to-all broadcast.

Algorithm
P: broadcast my input.
Each Q ≠ P: on receiving P's message for the first time, broadcast a copy.
Tolerates anything except failure of P in the first step, but we need to agree upon "P" before starting (i.e., P is the least-ranked process, using alphabetic ranking).
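The relay algorithm above can be sketched as a small simulation. This is an illustrative sketch under simplifying assumptions (messages modeled as a FIFO queue, crashes modeled as processes that never receive; the function name and parameters are our own): P broadcasts its input, every other process rebroadcasts P's value on first receipt, and everyone decides that value.

```python
# Sketch of the relay algorithm: leader broadcasts its input; every
# other process rebroadcasts on first receipt, then decides.  A single
# crash of a non-leader is masked by the relays.
def relay_consensus(processes, leader, inputs, crashed=frozenset()):
    decided = {}
    pending = []                        # (src, dst, value) messages in flight
    if leader not in crashed:
        for q in processes:             # P: broadcast my input
            pending.append((leader, q, inputs[leader]))
    relayed = set()
    while pending:
        src, dst, v = pending.pop(0)
        if dst in crashed:
            continue                    # crashed processes receive nothing
        if dst not in decided:
            decided[dst] = v            # decide on P's value
        if dst not in relayed and dst != leader:
            relayed.add(dst)            # first receipt: rebroadcast a copy
            for q in processes:
                pending.append((dst, q, v))
    return decided

# One relay (Q) is enough to mask the crash of any single non-leader.
print(relay_consensus(["P", "Q", "R"], "P", {"P": 0, "Q": 1, "R": 1},
                      crashed={"R"}))
```

Note what the sketch does not handle: a crash of P before its first send leaves `pending` empty and no one decides, which is exactly the weakness the slide points out.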

Another algorithm All processes start by broadcasting their own value to all other processes. If we knew that there is always exactly one failure, each process could wait until n-1 messages are received, then decide using any deterministic rule. But this doesn't work if sometimes we have one failure and sometimes none.

FLP result Considers the general case. Assumes an algorithm that can decide with zero or one failures, and proves that this algorithm can be prevented from reaching a decision, indefinitely.

Basic idea Think of the system state as a "configuration". A configuration is v-valent if the decision to pick v has become inevitable: all runs lead to v. If neither 0-valent nor 1-valent, the configuration is bivalent. The initial configurations include at least one 0-valent ({0,0,…,0}), at least one 1-valent ({1,1,…,1}), and at least one bivalent ({0,0,…,1,1}).

Basic idea (figure: the space of configurations, showing the 0-valent and bivalent regions)

Transitions between configurations A configuration is a set of processes and messages. Applying a message to a process changes its state, hence moves us to a new configuration. Because the system is asynchronous, we can't predict which of a set of concurrent messages will be delivered "next"; but because processes communicate only by messages, this is unimportant.

Basic Lemma Suppose that from some configuration C, the schedules σ1 and σ2 lead to configurations C1 and C2, respectively. If the sets of processes taking actions in σ1 and σ2, respectively, are disjoint, then σ2 can be applied to C1 and σ1 to C2, and both lead to the same configuration C3.
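The lemma is a commutativity claim, and a toy model makes it concrete. This sketch is illustrative, not FLP's formalism: a configuration is each process's local state, and a schedule is a list of (process, update) steps. If the two schedules touch disjoint sets of processes, applying them in either order reaches the same configuration C3.

```python
# Toy illustration of the commuting-schedules lemma: schedules that act
# on disjoint sets of processes commute.
def apply_schedule(config, schedule):
    config = dict(config)               # copy: configurations are values
    for proc, update in schedule:
        config[proc] = update(config[proc])
    return config

C = {"p": 0, "q": 0, "r": 0}
s1 = [("p", lambda x: x + 1)]                           # touches only p
s2 = [("q", lambda x: x + 5), ("r", lambda x: x * 2)]   # touches q and r

C3a = apply_schedule(apply_schedule(C, s1), s2)   # C -> C1 -> C3
C3b = apply_schedule(apply_schedule(C, s2), s1)   # C -> C2 -> C3
print(C3a == C3b)   # True: disjoint schedules commute
```

This is the "diamond" the proof leans on: it lets the adversary reorder deliveries to disjoint processes without changing where the system ends up.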

Basic Lemma (figure: the commuting diamond; C leads via σ1 to C1 and via σ2 to C2, and applying the other schedule to each yields the same C3)

Main result No consensus protocol is totally correct in spite of one fault. Note: "totally" is used in the formal sense (a guarantee of termination).

Basic FLP theorem Suppose we are in a bivalent configuration now and will later enter a univalent configuration. We can draw a form of frontier, such that a single message to a single process triggers the transition from bivalent to univalent.

Basic FLP theorem (figure: a bivalent configuration C from which events e and e' lead, via intermediate configurations, to univalent configurations D0 and D1)

Single step decides They prove that any run that goes from a bivalent state to a univalent state has a single deciding step, e. They show that it is always possible to schedule events so as to block such steps: eventually e can be scheduled, but in a state where it no longer triggers a decision.

Basic FLP theorem They show that we can delay this "magic message" and cause the system to take at least one step while remaining in a new bivalent configuration (this uses the diamond relation seen earlier). But this implies that from a bivalent state there are runs of indefinite length that remain bivalent, proving the impossibility of fault-tolerant consensus.

Notes on FLP No failures actually occur in this run, just delayed messages. The result is purely abstract; what does it "mean"? It says nothing about how probable this adversarial run might be, only that at least one such run exists.

FLP intuition Suppose that we start a system up with n processes and run for a while, until we are close to picking the value associated with process "p". Some process will do this for the first time, presumably on receiving some message from q. If we delay that message, and yet our protocol is "fault-tolerant", the system will somehow reconfigure. Now allow the delayed message to get through, but delay some other message.

Key insight FLP is about forcing a system to attempt a form of reconfiguration. This takes time, and each "unfortunate" suspected failure causes such a reconfiguration.

FLP and our first algorithm P is the leader and is supposed to send its input to Q. Q "times out", tells everyone that P has apparently failed, and can then disseminate its own value. If P wakes up, we re-admit it to the system, but it is no longer considered least-ranked. One can make such algorithms work, but they can be attacked by delaying first P, then Q, then R, etc.

FLP in the real world Real systems are subject to this impossibility result, but in fact are often subject to even more severe limitations, such as the inability to tolerate network partition failures. Also, asynchronous consensus may be too slow for our taste. And the FLP attack is not probable in a real system: it requires a very smart adversary!

Chandra/Toueg Showed that FLP applies to many problems, not just consensus. In particular, they show that FLP applies to group membership and reliable multicast. So these practical problems are impossible in asynchronous systems, in the formal sense. But they also look at the weakest condition under which consensus can be solved.

Chandra/Toueg Idea Separate the problem into the consensus algorithm itself and a "failure detector": a form of oracle that announces suspected failures, but which can change its mind. Question: what is the weakest oracle for which consensus is always solvable?

Sample properties Completeness: does the detector catch every crash?
- Strong completeness: eventually, every process that crashes is permanently suspected by every correct process.
- Weak completeness: eventually, every process that crashes is permanently suspected by some correct process.

Sample properties Accuracy: does it make mistakes?
- Strong accuracy: no process is suspected before it crashes.
- Weak accuracy: some correct process is never suspected.
- Eventual strong accuracy: there is a time after which correct processes are not suspected by any correct process.
- Eventual weak accuracy: there is a time after which some correct process is not suspected by any correct process.
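The way real systems approximate these properties is a heartbeat-plus-timeout detector, which the talk returns to later. This is a hedged sketch (the class and its adaptive-doubling policy are illustrative, not from Chandra/Toueg): suspect any process whose last heartbeat is older than a per-process timeout, and when a suspicion turns out to be a mistake, double that timeout so eventual accuracy becomes more plausible. In a truly asynchronous system no such detector can guarantee accuracy.

```python
import time

# Sketch of a timeout-based failure detector: suspicion = "no heartbeat
# within the timeout".  A late heartbeat reveals a mistake, so the
# timeout for that process is doubled (adaptive, aiming at *eventual*
# accuracy; never guaranteed in a truly asynchronous system).
class TimeoutDetector:
    def __init__(self, timeout=1.0):
        self.default = timeout
        self.timeout = {}        # per-process timeout, grows on mistakes
        self.last_beat = {}      # per-process time of last heartbeat

    def heartbeat(self, proc, now=None):
        now = time.monotonic() if now is None else now
        if self.suspected(proc, now):    # we suspected a live process:
            self.timeout[proc] = self.timeout.get(proc, self.default) * 2
        self.last_beat[proc] = now

    def suspected(self, proc, now=None):
        now = time.monotonic() if now is None else now
        if proc not in self.last_beat:
            return False
        t = self.timeout.get(proc, self.default)
        return now - self.last_beat[proc] > t

d = TimeoutDetector(timeout=1.0)
d.heartbeat("p", now=0.0)
print(d.suspected("p", now=0.5))   # False: within the timeout
print(d.suspected("p", now=2.0))   # True: suspected (perhaps wrongly)
d.heartbeat("p", now=2.0)          # late beat: timeout doubles to 2.0
print(d.suspected("p", now=3.5))   # False: the adapted timeout tolerates the delay
```

Notice how the sketch trades accuracy for completeness: it will suspect every crashed process eventually, but a slow network can make it wrong arbitrarily often, which is exactly why the properties above are stated as "eventual".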

A sampling of failure detectors

  Completeness \ Accuracy | Strong      | Weak       | Eventually Strong       | Eventually Weak
  Strong                  | Perfect (P) | Strong (S) | Eventually Perfect (◇P) | Eventually Strong (◇S)
  Weak                    | D           | Weak (W)   | ◇D                      | Eventually Weak (◇W)

Perfect Detector? Named Perfect, written P Strong completeness and strong accuracy Immediately detects all failures Never makes mistakes

Example of a failure detector The detector they call ◇W ("eventually weak", read "diamond-W"). Defined by two properties:
- There is a time after which every process that crashes is suspected by some correct process.
- There is a time after which some correct process is never suspected by any correct process.
Think: "we can eventually agree upon a leader"; if the leader crashes, "we eventually, accurately detect the crash".

◇W: Weakest failure detector They show that ◇W is the weakest failure detector for which consensus is guaranteed to be achievable. The algorithm is pretty simple: rotate a token around a ring of processes. A decision can occur once the token makes it around once without a change in failure-suspicion status for any process. Subsequently, as the token is passed, each recipient learns the decision outcome.
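The token-ring idea can be sketched in a few lines. This is a toy simplification of our own, not Chandra/Toueg's actual protocol (all names and the `suspicion_events` parameter are illustrative, and failures themselves are not modeled): the first process's value rides the token; a lap during which some process's suspicion status changed forces another lap; a clean lap locks in the decision, and one final lap spreads the outcome.

```python
# Toy sketch of the rotating-token decision rule: keep circulating the
# token until a full lap passes with no change in anyone's
# failure-suspicion status; that lap locks the decision, and a final
# lap disseminates it.
def token_ring_consensus(inputs, suspicion_events=None):
    ring = list(inputs)                      # process ids in ring order
    events = dict(suspicion_events or {})    # lap -> process whose status flips
    proposal = inputs[ring[0]]               # first process's value rides the token
    lap = 0
    while True:
        # pass the token around once; note any suspicion-status change
        changed = any(events.get(lap) == p for p in ring)
        lap += 1
        if not changed:                      # clean lap: decision is locked
            break
    lap += 1                                 # one more lap spreads the outcome
    return {p: proposal for p in ring}, lap

print(token_ring_consensus({"p": 1, "q": 0, "r": 0}))
```

With no suspicion changes the protocol finishes in two laps (2n messages for n processes), which is the message-count comparison with 2PC made on the next slide.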

Rotating a token versus 2-phase commit (figure: the 2PC message pattern, with "propose v", the acks, and "decide v" grouped into phases)

Rotating a token versus 2-phase commit Their protocol is basically a 2-phase commit, but with n processes 2PC requires 2(n-1) messages for the propose/ack phase plus (n-1) for the decide, 3(n-1) total. Passing a token only requires n messages per phase, for 2n total (when nothing fails). Tolerates f < ⌈n/2⌉ failures.

Set of problems solvable in each model (each class also contains everything solvable below it):
- Synchronous systems: clock synchronization
- Asynchronous using P: TRB, non-blocking atomic commit
- Asynchronous using ◇W: consensus, atomic broadcast
- Asynchronous (no detector): reliable broadcast
TRB: Byzantine Generals with only crash failures.

Building systems with ◇W Unfortunately, this failure detector is not implementable in a truly asynchronous system: using timeouts, we can make mistakes at arbitrary times. But with long enough timeouts, we can produce a close approximation to ◇W.

Would we want to? Question: are we solving the right problem? Consider the pros and cons of asynchronous consensus. Think about an air traffic control application: find one problem for which asynchronous consensus is a good match, and one for which the match is poor.

French ATC system (simplified) (figure: onboard systems and radar feed the controllers' consoles, backed by an X.500 directory and an air traffic database of flight plans, etc.)

Potential applications
- Maintaining replicated state within console clusters
- Distributing radar data to participants
- Distributing data over wide-area links at large geographic scale
- Management and control (administration) of the overall system
- Distributing security keys to prevent unauthorized action
- Agreement when flight control handoffs occur

Broad conclusions? The protocol seems unsuitable for high-availability applications. If the core of the system must make progress, the agreement property itself is too strong: if a process becomes unresponsive, we might not want to wait for it to recover. Also, since we can't implement any of these failure detectors exactly, the whole issue is abstract. Hence real systems don't try to solve consensus as defined and used in these kinds of protocols!

Value of FLP/Consensus A clear and elegant problem statement that highlights limitations. Perhaps with clocks we can overcome them; more likely, we need a different notion of failure: "crash failure" is too narrow, and "unreachable" is also treated as a failure in many real systems. It caused much debate about real systems.

Nature of debate We'll see many practical systems soon. Do they evade FLP in some way, or are they subject to it? If subject to it, what problem do they "solve", given that consensus (and most problems reduce to consensus) is impossible to solve? Or are they subject to even more stringent limitations? Is fault-tolerant consensus even an issue in real systems?