1
Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring 2008
Principles of Reliable Distributed Systems
Lecture 12: Impossibility of Fault-Tolerant Asynchronous Consensus, aka FLP (Fischer, Lynch, Paterson, 85)
Spring 2008
Prof. Idit Keidar
2
Material
Textbooks:
–Nancy Lynch, Distributed Algorithms, Ch. 12 (FLP), Ch. 25 (partial synchrony).
–Attiya & Welch, Distributed Computing, Ch. 5.
Hagen Völzer, A Constructive Proof of FLP, IPL 2004.
3
Reminder: Consensus
Each process has an input and should irrevocably decide an output.
Agreement: correct processes’ decisions are the same.
Validity: the decision is the input of some process.
Termination: eventually all correct processes decide.
Binary consensus: input values are 0 and 1.
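The three properties can be checked mechanically on the outcome of a finished run. The following is a minimal sketch, not part of the lecture: the helper name `check_consensus` and the dictionary-based representation are assumptions made for illustration. Termination is a liveness property, so on a finite run the check only asks whether every correct process has decided by the end.

```python
from typing import Dict, Optional, Set


def check_consensus(inputs: Dict[str, int],
                    decisions: Dict[str, Optional[int]],
                    correct: Set[str]) -> Dict[str, bool]:
    """Check agreement, validity and termination on the outcome of one run.

    Hypothetical helper: decisions[p] is None if p never decided.
    """
    decided_by_correct = [decisions[p] for p in correct
                          if decisions.get(p) is not None]
    # Agreement: correct processes' decisions are the same.
    agreement = len(set(decided_by_correct)) <= 1
    # Validity: every decision is the input of some process.
    validity = all(v in inputs.values()
                   for v in decisions.values() if v is not None)
    # Termination (finite-run approximation): all correct processes decided.
    termination = all(decisions.get(p) is not None for p in correct)
    return {"agreement": agreement,
            "validity": validity,
            "termination": termination}
```

For example, with inputs `{"p1": 0, "p2": 1, "p3": 1}`, a run in which p1 and p2 both decide 1 and p3 crashes before deciding satisfies all three properties.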
4
Model
Asynchronous
–Messages can be delayed arbitrarily (a non-assumption: no delay bound is assumed)
–Processes take steps at asynchronous times
Crash failures
–At most one crash failure in a run
–A process that crashes at any point in a run is faulty in that run
5
Some Definitions
For formal lower-bound proofs, we need formal definitions of what algorithms can and cannot do.
6
Configurations (Global States)
A configuration (or global state) of a distributed system is a vector consisting of the local states of all of its components:
–Process states: values in variables
–Communication link states: messages in transit
$(s_1, s_2, \ldots, s_n, c_{12}, c_{13}, \ldots, c_{n(n-1)})$
This is the view of an external observer.
7
Algorithms
Deterministic algorithm = collection of state-transition functions, one per system component
–Together: a function from configurations to configurations
8
State Transitions
A process’s algorithm defines transitions
–From a given local state and (possibly) incoming messages
–To a new state and (possibly) messages to send
The transition modifies the process state and (possibly) incoming and/or outgoing channel states
9
Runs (Executions)
A run (execution) of an algorithm = an alternating sequence of configurations and actions
Example run of a shared counter: 0, inc_A(), 1, inc_B(), 2, inc_B(), 3, inc_B(), 4
10
More on Configurations
Reachable configuration = there is a run in which it occurs
v-decided configuration: some process has decided v (the decision is stored as part of the state)
11
Environments
A run is determined by the algorithm’s actions and the environment’s actions
In a synchronous model, the environment actions are failures and message loss
In an asynchronous model, they also include scheduling of process actions and message delays
12
To Prove Lower Bounds
It’s sufficient to look at a subset of all possible runs
–A subset of possible environment actions
Simplifies the proof
Weakens the adversary, hence strengthens the lower bound
Is the same true for algorithms?
13
Simplified Asynchronous Model
Assume that processes take steps only upon message receipt
–Assume further that each process initially has a special message “start” waiting for it in an incoming channel
–Why can we assume this?
Recall that we are allowed to restrict ourselves to a subset of the runs
14
Runs of Simplified Model
A run is a sequence of steps, each of which occurs at one process p that:
–Reads a message m from an incoming channel (the channel state changes to exclude m)
–Changes the local state of p
–Puts zero or more messages on channels to other processes
15
Considered Environment Actions
(p,m)
–Process p delivers m
–Enabled when m is in a channel to p and p is correct
–Removes m from the channel
–May change p’s local state
–May change any number of p’s outgoing channels
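The action (p,m) can be sketched in code as a pure function on configurations. This is an illustration under stated assumptions rather than the lecture's formalism: the names `Config`, `Delta`, `enabled`, `step` and `toy_delta` are hypothetical, local states are plain integers, and all channels into a process are merged into one multiset per destination.

```python
from collections import Counter
from typing import Callable, Dict, FrozenSet, List, Tuple

# A configuration: one local state per process, plus the multiset of
# messages in transit to each process (simplification: one inbox per process).
Config = Tuple[Dict[str, int], Dict[str, Counter]]
# Hypothetical transition function: (process, state, message) ->
# (new state, list of (destination, message) pairs to send).
Delta = Callable[[str, int, str], Tuple[int, List[Tuple[str, str]]]]


def enabled(config: Config, p: str, m: str,
            crashed: FrozenSet[str] = frozenset()) -> bool:
    """(p,m) is enabled when m is in a channel to p and p is correct."""
    _, channels = config
    return p not in crashed and channels[p][m] > 0


def step(config: Config, p: str, m: str, delta: Delta) -> Config:
    """Apply action (p,m) and return the resulting configuration."""
    if not enabled(config, p, m):
        raise ValueError("action is not enabled")
    states, channels = config
    new_states = dict(states)
    new_channels = {q: Counter(ch) for q, ch in channels.items()}
    new_channels[p] = new_channels[p] - Counter([m])  # remove m from the channel
    new_states[p], sends = delta(p, states[p], m)     # may change p's local state
    for dest, out in sends:                           # may add to outgoing channels
        new_channels[dest][out] += 1
    return new_states, new_channels


def toy_delta(p: str, state: int, m: str):
    """Toy algorithm (assumption): count deliveries; greet the peer on 'start'."""
    peer = "p2" if p == "p1" else "p1"
    return state + 1, ([(peer, "hello")] if m == "start" else [])


c0: Config = ({"p1": 0, "p2": 0},
              {"p1": Counter(["start"]), "p2": Counter(["start"])})
c1 = step(c0, "p1", "start", toy_delta)
```

Here `step` copies the configuration instead of mutating it, so earlier configurations of a run stay available for inspection.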
16
Fair Executions
An execution is fair if for every (p,m), if (p,m) is enabled then it eventually occurs
Note: an enabled action does not stop being enabled until it occurs. Why?
Note: fairness is a condition on the environment, not the consensus protocol
Why do we care about fairness?
17
Observation
Given a fixed deterministic algorithm, the configuration at the end of a run is fully determined by the initial values and the environment actions in the run
18
Notation
$c \xrightarrow{p,m} c'$
–Action (p,m) in configuration c leads to c’
$c \Rightarrow c'$
–There exists a series $c \to c_1 \to c_2 \to \cdots \to c'$
$c \Rightarrow_{p} c'$
–There exists such a series of steps of p only
$c \Rightarrow_{-p} c'$
–There exists such a series in which p does not take steps (p is silent)
19
1-Resilient Algorithm
One process can crash
–Crashed processes stop taking actions
Implication: from every reachable configuration c, for every process p, there is some c’ s.t. $c \Rightarrow_{-p} c'$ and c’ is v-decided for some v
Why is it OK to assume p can stop taking actions? What if some other process has crashed?
20
p-Silent Decision Values
$\mathrm{val}(p,c) = \{ v \mid \exists c' : c \Rightarrow_{-p} c' \text{ and } c' \text{ is } v\text{-decided} \}$
–Not empty. Why?
c is v-uniform if: $\forall p : \mathrm{val}(p,c) = \{v\}$
c is non-uniform if it is neither 0-uniform nor 1-uniform
Examples:
–Initial configuration with all input values 0?
–1-decided configurations?
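For a protocol with a finite state space, val(p,c) can be computed by exhaustive search over all runs in which p is silent. The sketch below is illustrative only: the three-process toy protocol in `delta` is an assumption made up for this example and is not a correct consensus algorithm (some of its runs violate agreement), and the names `step`, `val`, `is_uniform` and `initial` are hypothetical. Processes are numbered 0 to 2, and each process has one inbox that merges its incoming channels.

```python
from collections import deque

START = -1  # the special "start" message
N = 3       # processes 0, 1, 2


def delta(i, state, msg):
    """Toy protocol (assumption): broadcast the input on START;
    decide min(input, first value received)."""
    inp, dec = state
    if msg == START:
        return state, [(j, inp) for j in range(N) if j != i]
    if dec is None:
        dec = min(inp, msg)
    return (inp, dec), []


def step(config, i, msg):
    """Apply action (i, msg) and return the new (hashable) configuration."""
    states, chans = config
    chans = [list(ch) for ch in chans]
    chans[i].remove(msg)  # delivering removes msg from the channel
    new_state, sends = delta(i, states[i], msg)
    for j, out in sends:
        chans[j].append(out)
    states = states[:i] + (new_state,) + states[i + 1:]
    return states, tuple(tuple(sorted(ch)) for ch in chans)


def val(p, config):
    """Values v such that a v-decided configuration is reachable with p silent."""
    seen, queue, values = {config}, deque([config]), set()
    while queue:
        c = queue.popleft()
        values |= {dec for _, dec in c[0] if dec is not None}
        for i in range(N):
            if i == p:
                continue  # p takes no steps
            for msg in set(c[1][i]):
                nxt = step(c, i, msg)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return values


def is_uniform(config, v):
    """c is v-uniform if val(p, c) == {v} for every process p."""
    return all(val(p, config) == {v} for p in range(N))


def initial(inputs):
    """Initial configuration: nobody has decided, START waits in every inbox."""
    return tuple((v, None) for v in inputs), tuple((START,) for _ in inputs)
```

With inputs (0, 1, 1), silencing process 0 leaves only the value 1 in play, while silencing process 1 or 2 lets the 0 propagate, so this initial configuration is non-uniform for the toy protocol. With inputs (0, 0, 0) the configuration is 0-uniform.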
21
Example: t-Resilient Uniform Consensus (Lecture 5)
v_i = init_i; Alive_i = P
in every round 1 ≤ k ≤ t+2:
    send v_i to all
    receive round k messages
    for all p_j:
        if (received v_j) then v_i = min(v_i, v_j)
        otherwise p_j is suspected
    if ((∀ p_j ∈ Alive_i: received v_j = v_i) && !decided) then decide v_i
    for all p_j:
        if (suspect p_j) then Alive_i = Alive_i \ {p_j}
22
What Is val(p_1, c_1)? I/II
$\mathrm{val}(p_1,c_1) = \{ v \mid \exists c' : c_1 \Rightarrow_{-p_1} c' \text{ and } c' \text{ is } v\text{-decided} \}$
[Figure: run of p_1, p_2, p_3 starting from c_1. Labels: c_2 is 0-uniform (0, {p_2, p_3}); c_3 is 0-decided (0, {p_2, p_3}).]
23
What Is val(p_1, c_1)? II/II
$\mathrm{val}(p_1,c_1) = \{ v \mid \exists c' : c_1 \Rightarrow_{-p_1} c' \text{ and } c' \text{ is } v\text{-decided} \}$
$\mathrm{val}(p_1,c_1) = \{0,1\}$
[Figure: run of p_1, p_2, p_3 starting from c_1. Labels: c’_2 is 1-uniform (1, {p_3}); c’_3 is 1-decided (1, {p_3}).]
Assuming t > 1: an algorithm that is at least 2-resilient
24
What Is val(p_2, c_1)?
$\mathrm{val}(p_2,c_1) = \{1\}$
[Figure: run of p_1, p_2, p_3 starting from c_1. Label: 1, {p_1, p_3}.]
25
Diamond Lemma
If $c \Rightarrow_{p} c_1$ and $c \Rightarrow_{-p} c_2$, then there exists c’ such that $c_1 \Rightarrow_{-p} c'$ and $c_2 \Rightarrow_{p} c'$
[Figure: diamond with c at the top, c_1 (p moves) and c_2 (p silent) on the sides, and c’ at the bottom.]
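The simplest instance of the lemma, a single step of p against a single step of another process q, can be checked by applying the two steps in both orders. This is a minimal sketch under assumptions: the two-process toy transition `delta` and the helper `step` are hypothetical and only illustrate why steps at different processes commute.

```python
from collections import Counter


def delta(proc, state, msg):
    """Toy deterministic transition (assumption): count deliveries and
    send 'ping' to the peer when 'start' is delivered."""
    peer = "q" if proc == "p" else "p"
    sends = [(peer, "ping")] if msg == "start" else []
    return state + 1, sends


def step(config, proc, msg):
    """Apply action (proc, msg) and return a fresh configuration."""
    states, chans = config
    assert chans[proc][msg] > 0, "action is not enabled"
    states = dict(states)
    chans = {r: Counter(ch) for r, ch in chans.items()}
    chans[proc] = chans[proc] - Counter([msg])  # remove msg from proc's inbox
    states[proc], sends = delta(proc, states[proc], msg)
    for dest, out in sends:
        chans[dest][out] += 1
    return states, chans


c = ({"p": 0, "q": 0}, {"p": Counter(["start"]), "q": Counter(["start"])})
c1 = step(c, "p", "start")        # p moves
c2 = step(c, "q", "start")        # p is silent
bottom_via_c1 = step(c1, "q", "start")
bottom_via_c2 = step(c2, "p", "start")
assert bottom_via_c1 == bottom_via_c2  # both paths meet in the same c'
```

The two orders agree because each step reads only its own process's state and inbox, and a message that is enabled for one process stays enabled while other processes take steps.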
26
Proposition 1
If $c \xrightarrow{p,m} c'$ then $\mathrm{val}(p,c) \subseteq \mathrm{val}(p,c')$
[Figure: from c, the step (p,m) leads to c’, and a p-silent run leads to a v-decided configuration.]
If it was possible to decide v without p, then p’s action cannot take this possibility away
27
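Proposition 2
If $c \xrightarrow{p,m} c'$ and $\mathrm{val}(q,c) = \{0\}$ then $\mathrm{val}(q,c') \neq \{1\}$
Case 1: p ≠ q. The step (p,m) is itself q-silent, so every q-silent run from c’ extends to a q-silent run from c. Hence $\mathrm{val}(q,c') \subseteq \mathrm{val}(q,c) = \{0\}$.
[Figure: c, then (p,m), then c’, then a q-silent run to a 0-decided configuration.]
Case 2: p = q. Then by Proposition 1, $0 \in \mathrm{val}(q,c')$.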
28
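Lemma 1: There Exists a Non-Uniform Initial Configuration
Assume by contradiction that no non-uniform initial configuration exists
By validity, the all-0 initial configuration is 0-uniform and the all-1 initial configuration is 1-uniform
[Figure: chain of initial configurations 00…0, …, 01…1, …, 11…1, changing one input at a time.]
So the chain contains adjacent configurations c_j (0-uniform) and c_{j+1} (1-uniform) that differ only in the state of some p_j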
29
Lemma 1 (Cont’d)
c_j is 0-uniform, so
–$c_j \Rightarrow_{-p_j} c$ where c is 0-decided
c_j and c_{j+1} differ only at p_j, so
–the same p_j-silent run applies from c_{j+1} and also reaches a 0-decided configuration
A contradiction to c_{j+1} being 1-uniform
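The chain argument is a discrete intermediate-value step, which can be sketched as follows. The helper names `input_chain` and `flip_point` are hypothetical, positions are 0-based, and the list `valences` stands for the assumed label (0-uniform or 1-uniform) of each configuration in the chain under the contradiction hypothesis.

```python
def input_chain(n):
    """c_0 = all zeros, ..., c_n = all ones; c_j and c_(j+1) differ only
    in the input at position j."""
    return [(1,) * j + (0,) * (n - j) for j in range(n + 1)]


def flip_point(valences):
    """Return j with valences[j] == 0 and valences[j + 1] == 1.

    Such a j exists whenever the list starts with 0 and ends with 1.
    """
    for j in range(len(valences) - 1):
        if valences[j] == 0 and valences[j + 1] == 1:
            return j
    raise ValueError("no flip point: list must start at 0 and end at 1")


chain = input_chain(3)
```

Whatever labels the intermediate configurations carry, `flip_point` finds an adjacent pair with labels 0 and 1, and the process at that position plays the role of p_j in the proof.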
30
Proof Strategy
Show that we can keep the system in non-uniform configurations arbitrarily long
Note: the execution must be fair!
31
Lemma 2
For each non-uniform configuration c and process p, there exists c’ s.t. $c \Rightarrow c'$ and $\mathrm{val}(p,c') = \{0,1\}$
It is always possible to reach a state from which both values can be decided without p
Proof on board.
Are we done?
32
Building a Fair Execution
Start from a non-uniform configuration (Lemma 1)
Repeat while possible:
–Choose the (p,m) that has been enabled the longest
–Use Lemma 2 to get to c s.t. $\mathrm{val}(p,c) = \{0,1\}$
–If (p,m) is still enabled, let $c \xrightarrow{p,m} c'$ happen
–By Proposition 1, $\mathrm{val}(p,c') = \{0,1\}$, so c’ is non-uniform
Fairness: every enabled (p,m) eventually occurs
33
We Have Proven:
Every asynchronous fault-tolerant consensus algorithm has a fair execution in which no process decides [FLP85]
Fault-Tolerant Asynchronous Consensus is Impossible!
34
Impossibility Revisited
Every asynchronous fault-tolerant consensus algorithm has a fair execution in which no process decides [FLP85]
It is possible to design asynchronous consensus algorithms that don’t always terminate
35
Course Summary
36
Main Topics
State machine replication for consistency and availability
–Uses Atomic Broadcast
–Uses Consensus
Asynchronous Message-Passing Models
–Consensus impossible [FLP]
–Solvable with eventual synchrony, failure detectors ◊S, Ω
–In two communication steps in the “fast” case
–Eventually reliable links are enough (Paxos)
Shared memory
–Convenient model
–Can be emulated using message-passing
–Good for “data-centric” replication
37
Course Summary (What I Hope You Learned…)
Distributed systems are subtle
–It’s very easy to get things wrong
–Lesson: don’t design a distributed system without proving the algorithm first!
Redundancy is the key to reliability
–Multiple replicas: 2t+1, 3t+1, etc.
Strong consistency is attainable but costly and has scalability limitations
38
Good luck in the exam!