When Is Agreement Possible


When Is Agreement Possible? CS 188 Distributed Systems, February 24, 2015

Introduction. Basics of agreement protocols. Impossibility of agreement in an asynchronous system with failures. When is agreement possible?

Basics of Agreement Protocols. What is agreement? What are the necessary conditions for agreement?

What Do We Mean By Agreement? In the simplest case, can n processors agree that a variable takes on the value 0 or 1? Only non-faulty processors need to agree. More complex agreements can be built from this simple agreement.

Conditions for Agreement Protocols. Consistency: all participants agree on the same value, and decisions are final. Validity: participants agree on a value that at least one of them wanted. Termination: all participants choose a value in a finite number of steps.
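The three conditions can be checked mechanically on the outcome of a single run. The following is a minimal sketch, not part of the original slides: the function name `check_agreement` and the dictionary layout are assumptions made for illustration, and the check looks at one snapshot only, so it cannot verify that decisions are final.

```python
def check_agreement(proposals, decisions, faulty=frozenset()):
    """Check one finished run against the three conditions.

    proposals: dict processor id -> proposed value (0 or 1)
    decisions: dict processor id -> decided value, or None if undecided
    faulty:    ids of failed processors (only non-faulty ones must agree)
    """
    correct = [p for p in proposals if p not in faulty]
    decided = [decisions.get(p) for p in correct]
    values = {d for d in decided if d is not None}
    return {
        # Consistency: non-faulty processors never decide different values.
        "consistency": len(values) <= 1,
        # Validity: every decided value was proposed by some participant.
        "validity": values <= set(proposals.values()),
        # Termination: every non-faulty processor has decided.
        "termination": all(d is not None for d in decided),
    }


# Processor 3 failed before deciding; the survivors agree on 1.
result = check_agreement({1: 0, 2: 1, 3: 1}, {1: 1, 2: 1, 3: None}, faulty={3})
```

In this run all three conditions hold, because the undecided processor is the faulty one.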

Impossibility of Agreement in Async System With Failures. Assume a reliable, but asynchronous, message passing system. Any message may face arbitrary delays. Can a set of processors reach agreement if one of the processors fails?

Agreement Isn’t Always Possible. Not in the general case for arbitrary systems. Adding some special properties to the system may change that result, but without those properties, agreement is provably impossible. This result is sometimes abbreviated FLP, for Fischer, Lynch, and Paterson, who proved it.

Model of the System. The system consists of n processors. The goal is for all non-faulty processors to agree on value 0 or 1. Rule out the trivial case of always agreeing on 0 (or 1). Agreement depends on the protocol, the initial state, and the inputs to each processor.

Bivalent and Univalent States. A bivalent state is a system state that could lead to either value being decided. A univalent state can only lead to one of the values being decided: it is either 0-valent or 1-valent. Valency must take allowable failures into account!
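Valency is a statement about which decisions are reachable, so on a small, finite state graph it can be computed by plain graph search. The sketch below is illustrative only: `TOY_GRAPH`, `TOY_DECIDED`, and the function names are hypothetical, and the successor function is assumed to already include the steps that remain possible after an allowable failure.

```python
def reachable_decisions(config, successors, decision):
    """Collect every decision value reachable from config (depth-first search)."""
    seen, stack, found = set(), [config], set()
    while stack:
        c = stack.pop()
        if c in seen:
            continue
        seen.add(c)
        value = decision(c)          # None means "not decided in this state"
        if value is not None:
            found.add(value)
        stack.extend(successors(c))
    return found


def classify(config, successors, decision):
    found = reachable_decisions(config, successors, decision)
    if found == {0, 1}:
        return "bivalent"
    if found == {0}:
        return "0-valent"
    if found == {1}:
        return "1-valent"
    return "no decision reachable"


# Hypothetical toy graph: from C, one event leads toward 0 and another toward 1.
TOY_GRAPH = {"C": ["D0", "D1"], "D0": ["E0"], "D1": ["E1"], "E0": [], "E1": []}
TOY_DECIDED = {"E0": 0, "E1": 1}


def toy_successors(c):
    return TOY_GRAPH[c]


def toy_decision(c):
    return TOY_DECIDED.get(c)
```

Here `classify("C", toy_successors, toy_decision)` reports a bivalent state, while `D0` and `D1` are univalent.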

System Configuration. Processors have internal state. The state of the network is the set of messages sent, but not yet received. An event e is the receipt of a message m by a processor, which can lead to sending one or more new messages. Events are deterministic. A schedule is a sequence of events.
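The configuration model translates into a small data structure. This is a sketch under stated assumptions, not the lecture's own notation: `Config`, `apply_event`, `run_schedule`, and the toy `add_step` protocol are hypothetical names, and keeping the network as a set means two identical in-flight messages would collapse into one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    states: tuple        # states[p] is the internal state of processor p
    network: frozenset   # messages sent but not yet received: (dest, payload)


def apply_event(config, event, step):
    """Deliver one in-flight message; step(state, payload) -> (new_state, sent)."""
    dest, payload = event
    if event not in config.network:
        raise ValueError("message is not in flight")
    new_state, sent = step(config.states[dest], payload)
    states = config.states[:dest] + (new_state,) + config.states[dest + 1:]
    # The delivered message leaves the network; newly sent messages enter it.
    network = (config.network - {event}) | frozenset(sent)
    return Config(states, network)


def run_schedule(config, schedule, step):
    """A schedule is a sequence of events applied in order."""
    for event in schedule:
        config = apply_event(config, event, step)
    return config


def add_step(state, payload):
    # Toy deterministic protocol: accumulate the payload, send nothing.
    return state + payload, []


start = Config(states=(0, 0), network=frozenset({(0, 5), (1, 7)}))
end = run_schedule(start, [(0, 5), (1, 7)], add_step)
```

Because `step` is a pure function, the same schedule applied to the same configuration always yields the same result, which is what "events are deterministic" requires.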

Proving the Result. Let’s assume the result is false: that we can reach agreement with one failure in these conditions. Use an adversarial model: within the rules of behavior, assume the adversary can force any legal event. Look for contradictions.

What Can the Adversary Do? Force any processor to perform an event at any moment. Choose any message to be delivered to any processor when it requests a message. Delay any message arbitrarily long. Once, it can kill one processor permanently.

The Necessity of Bivalency. There has to be an initial bivalent configuration for the system. Why? If all processors started with value 1, the system would decide 1. If all processors started with value 0, the system would decide 0.

Intermediate Initial States. Suppose some processors start with value 0 and some with value 1. If no initial state were bivalent, some initial states would lead to result 1, some would lead to result 0, and every initial state would lead to one or the other. So there is a 1-valent initial state that differs from a 0-valent initial state by only one processor’s initial value.

A Graphical Representation. What’s in these states?

            State x                     State y
            (0-valent initial states)   (1-valent initial states)
Node 1:     0                           0
Node 2:     1                           1
Node 3:     1                           1
...
Node N:     0                           1

They differ in only one value.
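The existence of two such neighboring states follows from walking from the all-0 input to the all-1 input one processor at a time: the verdict has to flip somewhere along the way. A minimal sketch of that walk, assuming a hypothetical function `outcome` that maps each initial input vector to the value the system would decide if every initial state were univalent:

```python
def adjacent_flip(n, outcome):
    """Walk from all-0 inputs to all-1 inputs, flipping one input at a time.

    Returns (state_x, state_y, i): two initial states that differ only in
    processor i's input but lead to different outcomes, or None if the
    outcome never changes (the trivial case the model rules out).
    """
    inputs = [0] * n
    prev = outcome(tuple(inputs))
    for i in range(n):
        inputs[i] = 1
        cur = outcome(tuple(inputs))
        if cur != prev:
            before = tuple(inputs[:i] + [0] + inputs[i + 1:])
            return before, tuple(inputs), i
        prev = cur
    return None


def majority(inputs):
    # Hypothetical outcome function used only for this illustration.
    return 1 if 2 * sum(inputs) > len(inputs) else 0


pair = adjacent_flip(3, majority)
```

With three processors and the majority rule, the walk stops at the pair (1, 0, 0) and (1, 1, 0), which differ only in the second processor's input.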

Why Does This Imply Bivalence? What if that one differing processor is the processor that fails? The system must still reach agreement from the remaining states, which are now identical. But on what value?

Is This Possible?

            State x                     State y
            (0-valent initial states)   (1-valent initial states)
Node 1:     0                           0
Node 2:     1                           1
Node 3:     1                           1
...
Node N:     0                           1

Does the system decide on 1? Then State x wasn’t 0-valent, after all. Does the system decide on 0? Then State y wasn’t 1-valent, after all. Looks like x and y must be bivalent.

So What? So there has to be at least one bivalent initial state. Why’s that so bad? If the system never leaves a bivalent state, it never makes a decision. For agreement to be guaranteed, we must show our adversary can’t perpetually force bivalency.

The Persistence of Bivalency. Let’s assume bivalency doesn’t persist. At some point, some bivalent state must transition to a univalent state. That implies at least two events, one to go to 0-valent and one to go to 1-valent, with no events leading to bivalent states.

A Graphical Representation. [Figure: states D and D’ and the events e and e’ that lead to them.] Remember, these events are each the delivery of a message. So m and m’ must have been in the message delivery system state simultaneously.

Looking Closely at Events e and e’. What would happen if we executed e first, then e’? What would happen if we executed them in the opposite order? Well, why should I care? Would executing them in either order lead to the same state? If so, there’s a contradiction: that one state would have to be both 0-valent and 1-valent.

Order of Events e and e’. [Figure: from state C, event e leads to D and event e’ leads to D’; e’ is then applied after e, and e after e’.]

Why Should They Lead to the Same State? What if e and e’ occur on different processors? Then they’re independent events, so they should produce the same result if executed in either order. So e and e’ could not have occurred on different processors.
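The independence claim can be seen in a toy model where each processor's state is the string of payloads it has received so far. This is an illustrative sketch only; `deliver` and the tuple layout of a configuration are assumptions, not notation from the slides.

```python
def deliver(config, event):
    """config = (states, network); event = (dest, payload) currently in flight.

    Toy step: the destination appends the payload to its state string, so the
    order of deliveries at a single processor is visible in its state.
    """
    states, network = config
    dest, payload = event
    if event not in network:
        raise ValueError("message is not in flight")
    new_states = states[:dest] + (states[dest] + payload,) + states[dest + 1:]
    return new_states, network - {event}


# Events on different processors: either order gives the same configuration.
start = (("", ""), frozenset({(0, "a"), (1, "b")}))
e, e_prime = (0, "a"), (1, "b")
same = deliver(deliver(start, e), e_prime) == deliver(deliver(start, e_prime), e)

# Events on the same processor: the order can matter.
start_p = (("", ""), frozenset({(0, "a"), (0, "b")}))
f, f_prime = (0, "a"), (0, "b")
differ = deliver(deliver(start_p, f), f_prime) != deliver(deliver(start_p, f_prime), f)
```

Both `same` and `differ` come out true in this model, which is why the argument has to move on to the case where e and e’ happen at a single processor.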

Could the Events Occur on the Same Processor P? If e was first, the state became 0-valent. If e’ was first, the state became 1-valent. But what if P then fails? Since the event happened only at P, only P sees the effects. So we’re still in a bivalent state.

Recapitulating the Argument. It’s possible to start in a bivalent state. There must be some point at some processor P at which the bivalent state changes to univalent. If P fails before anyone knows the valency, the system remains bivalent and can never settle to univalency. Perpetual bivalency implies no agreement.

When Is Agreement Possible? Didn’t we show in the last class that we can reach agreement if fewer than 1/3 of our processors are faulty? Yes, but only if the message passing system is synchronous. Whether agreement is possible in a system depends on certain parameters.

Parameters for Agreement in Distributed Systems. Synchronous vs. asynchronous processors. Bounded vs. unbounded communications delay. Ordered vs. unordered messages. Point-to-point vs. broadcast communications.

Synchronous vs. Asynchronous Processors. Synchronous processors imply that all processors make progress predictably. More precisely, there is a constant s such that for every s+1 steps taken by any processor Pi, every other processor Pj will take at least one step.
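On a finite execution trace the definition can be tested directly. The sketch below assumes a trace is a list of processor ids in the order in which they took steps; `is_synchronous` is a hypothetical helper, and a finite trace can only refute the property for a given s, never prove it for an unbounded run.

```python
def is_synchronous(trace, processors, s):
    """Check the s-bound on a finite trace of processor ids, in step order.

    For every processor p and every span containing s+1 consecutive steps
    of p, each processor must take at least one step inside that span.
    """
    everyone = set(processors)
    for p in processors:
        positions = [k for k, q in enumerate(trace) if q == p]
        for a in range(len(positions) - s):
            # Span from p's a-th step to its (a+s)-th step: s+1 steps of p.
            window = trace[positions[a]: positions[a + s] + 1]
            if not everyone <= set(window):
                return False
    return True


round_robin = [0, 1, 0, 1, 0, 1]
lopsided = [0, 0, 0, 1]
```

The round-robin trace satisfies the bound with s = 1, while the lopsided trace violates it because processor 0 takes two steps in a row before processor 1 moves.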

Bounded vs. Unbounded Communications Delay. Delay is bounded if and only if all messages arrive at their destination within t steps, for some constant t. This implies no lost messages. It doesn’t imply that messages arrive in the order sent.
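A check of the bound on a recorded message log might look like the following sketch. The record layout `(sent_step, received_step)`, with `None` for a message still in flight, and the name `delay_is_bounded` are assumptions made for illustration.

```python
def delay_is_bounded(messages, t, now):
    """messages: iterable of (sent_step, received_step or None).

    A delivered message must have arrived within t steps of being sent.
    An undelivered message violates the bound once more than t steps
    have passed since it was sent.
    """
    for sent, received in messages:
        arrival = now if received is None else received
        if arrival - sent > t:
            return False
    return True


log = [(0, 3), (1, 2)]          # both delivered within 3 steps
stuck = [(0, None)]             # still in flight
```

Note that `log` passes the check with t = 3 even though the second message overtakes the first, matching the point that a bound on delay says nothing about ordering.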

Ordered vs. Unordered Messages. Messages are ordered if they are received in the same real-time order as their sending, using true real time. In some cases, merely receiving all messages in the same order at all processors is enough.
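The real-time ordering condition reduces to a comparison of send and receive timestamps. A minimal sketch, assuming each message is recorded as a hypothetical `(send_time, receive_time)` pair on one global clock:

```python
def is_ordered(messages):
    """messages: list of (send_time, receive_time) pairs on one global clock.

    Ordered means: sorting by send time never produces a receive time
    that goes backwards.
    """
    by_send = sorted(messages)
    receive_times = [received for _, received in by_send]
    return all(a <= b for a, b in zip(receive_times, receive_times[1:]))


in_order = [(0, 2), (1, 3)]
overtaken = [(0, 3), (1, 2)]    # the later message arrives first
```

The weaker variant mentioned on the slide, in which all processors merely see the same order, would compare per-processor receive sequences instead; it is not shown here.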

Point-to-Point vs. Broadcast Communications. Point-to-point communications means a given message sent by Pi is seen only by its destination Pj. Broadcast communications means that Pi can send a message to all other processors in a single atomic step, most typically by hardware broadcast.

So, When Can We Reach Agreement? Case 1: Processors are synchronous and communications delay is bounded. Case 2: Messages are ordered and the transmission medium is broadcast. Case 3: Processors are synchronous and messages are ordered. And that’s it. (Case 1 covers Byzantine agreement.)
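The three cases amount to a Boolean condition over the four parameters. The sketch below simply encodes the slide's cases; the function and parameter names are chosen for illustration.

```python
from itertools import product


def agreement_possible(sync_processors, bounded_delay, ordered, broadcast):
    """Encode the three cases in which agreement can be guaranteed."""
    case1 = sync_processors and bounded_delay
    case2 = ordered and broadcast
    case3 = sync_processors and ordered
    return case1 or case2 or case3


# Enumerate all 16 parameter settings and keep the ones where agreement works.
workable = [
    setting
    for setting in product([True, False], repeat=4)
    if agreement_possible(*setting)
]

# The FLP setting: asynchronous, unbounded delay, unordered, point-to-point.
flp_setting = agreement_possible(False, False, False, False)
```

Under this encoding the setting used in the impossibility proof falls outside all three cases.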

What Does This Result Mean? For the practical systems we really build, it does not mean that we can never reach agreement; good systems almost always do. It means that we generally can’t guarantee it, which implies that our systems should tolerate disagreements at some times, under some conditions.

When Is Disagreement OK? For preference, when it doesn’t matter, e.g., when reasonable results are possible even without agreement. Or when it eventually works itself out, with possible inconsistencies in the meantime. Or, at worst, when it is visible to people who can fix it.

When Is Disagreement Not OK? When the consequences of disagreement are dire. When it results in unfixable problems. When its consequences are invisible, but relevant. Unfortunately, we don’t always get to choose when we can avoid it.

Minimizing Chances of Disagreement. Understand when agreement is most critical. In those cases, use protocols that are less likely to fail on agreement. These usually have heavy expenses, so don’t always use them.

A Classification of Faults. More detailed than previously discussed. Produced by the fault-tolerant computing community. Divides faults into classes. A stronger class is a subset of a weaker class.

An Ordered Fault Classification. From the most general class to the most restrictive: Byzantine, Authenticated Byzantine, Incorrect Computation, Timing, Omission, Crash, Fail Stop.
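Because the classes nest, the ordering can be modeled as a simple ranked enumeration. This is a sketch that assumes the classes form one linear chain, as the slide's list suggests; the names `Fault` and `tolerates` are hypothetical.

```python
from enum import IntEnum


class Fault(IntEnum):
    # Higher value = more general class (a superset of every lower class).
    FAIL_STOP = 1
    CRASH = 2
    OMISSION = 3
    TIMING = 4
    INCORRECT_COMPUTATION = 5
    AUTHENTICATED_BYZANTINE = 6
    BYZANTINE = 7


def tolerates(designed_for, actual):
    """A protocol built for a class also covers every class nested inside it."""
    return actual <= designed_for


crash_ok = tolerates(Fault.BYZANTINE, Fault.CRASH)
timing_not_ok = tolerates(Fault.CRASH, Fault.TIMING)
```

Under this assumption a protocol designed for Byzantine faults covers crash faults, while one designed only for crash faults does not cover timing faults.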

Fail Stop Faults. A processor ceases operation, but informs the other processors in the computation that it has stopped. Relatively easy to deal with.

Crash Fault. A processor crashes or loses internal state and halts, without notification to anyone else. Hard to distinguish from a really slow processor.

Omission Faults. A processor fails to do something in time, like respond to a message. Otherwise it may still be operating correctly, or it may have crashed.

Timing Fault. A processor completes a task before or after the window when it should, or never. For example, a late acknowledgement to a message.

Incorrect Computation Fault. A processor fails to produce the correct results for a given set of inputs, which could be merely not producing the results soon enough, or could be sending back trash.

Authenticated Byzantine Fault. A processor behaves in an arbitrary or malicious way, but authentication mechanisms note any alterations made to others’ messages.

Byzantine Fault. Any and every fault, having arbitrarily bad consequences, possibly working in combination with other faults to produce really bad results. In this classification, all other faults are subclasses of Byzantine faults.