Consensus and Its Impossibility in Asynchronous Systems
A Few Questions from the Protocols We Studied
Was it necessary to assume that a node can detect the failure of other nodes?
–What if a system could not do that, e.g., because message delays can be arbitrarily long?
–The protocol we constructed would not work.
–Can any protocol solve the problem in that setting?
Why is this problem important?
Model with Minimal Assumptions
The goal of this work is to identify which assumptions are absolutely essential to solve the problem of interest.
–Think of this as a ‘game’ between the ‘protocol designer’ and the ‘system implementer’.
The more guarantees the system implementer provides, the easier it is to design a protocol.
The more guarantees a protocol expects from the system implementer, the more restrictive that protocol is likely to be, i.e., the fewer systems it can run on.
–For example, a protocol that assumes FIFO communication is more restrictive than one that does not require it.
One model we consider for this is ‘asynchronous systems’.
Asynchronous Systems
What does asynchronous mean?
–There is no bound on message delays or on the relative speeds of processes.
–A computation consists of steps; in each step, one of the following happens:
  A process sends a message.
  A process receives a message.
  A process performs some local computation.
This was one of the models we considered at the beginning of the semester.
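The step model above can be sketched as a small simulation. This is a minimal sketch under assumed simplifications: the function name `run_async`, the integer "local state", and the random scheduler are all hypothetical. The scheduler may pick any process at any time and may deliver any pending message in any order (no FIFO, no delay bound), which is the freedom an asynchronous system grants.

```python
import random

def run_async(n, steps, seed=0):
    """Hypothetical simulation of an asynchronous system of n processes.

    In each step the scheduler picks some process, which then does exactly
    one of: send a message, receive a pending message, or compute locally.
    A message stays in transit until the scheduler chooses to deliver it.
    """
    rng = random.Random(seed)
    in_transit = []          # (sender, receiver, payload) not yet delivered
    local = [0] * n          # hypothetical local state: one integer per process
    trace = []               # the computation: sequence of (process, step kind)
    for _ in range(steps):
        p = rng.randrange(n)                           # no bound on relative speeds
        pending = [m for m in in_transit if m[1] == p]
        kinds = ["send", "local"] + (["receive"] if pending else [])
        kind = rng.choice(kinds)
        if kind == "send":
            q = rng.randrange(n)                       # any destination process
            in_transit.append((p, q, local[p]))
        elif kind == "receive":
            msg = rng.choice(pending)                  # any pending message, not FIFO
            in_transit.remove(msg)
            local[p] += msg[2]
        else:
            local[p] += 1                              # local computation
        trace.append((p, kind))
    return trace, in_transit

trace, undelivered = run_async(3, 20, seed=1)
print(trace[:5])
```

Messages left in `undelivered` at the end are not lost; they are simply still delayed, which the receiver cannot distinguish from a message that will never be sent.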
Observation about Asynchronous Systems
If a process is about to perform a local computation and is delayed, it can still perform that local computation later,
–irrespective of what other processes do in the meantime.
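One consequence of this observation is that steps of different processes commute. The sketch below assumes a simplified, hypothetical model in which a global state is a tuple of local states and a step changes only the acting process's local state (messages are left out for brevity). Applying a step of process 0 and a step of process 1 in either order yields the same global state; two steps of the same process generally do not commute.

```python
def apply(state, event):
    """Apply event = (process, update_fn) to a global state (tuple of local states).

    Only the acting process's local state changes.
    """
    p, fn = event
    new = list(state)
    new[p] = fn(new[p])
    return tuple(new)

s = (3, 4)                      # hypothetical initial local states of processes 0 and 1
e = (0, lambda x: x * 2)        # a local step of process 0
f = (1, lambda x: x + 5)        # a local step of process 1
g = (0, lambda x: x + 1)        # another local step of process 0

print(apply(apply(s, e), f), apply(apply(s, f), e))  # (6, 9) (6, 9): different processes commute
print(apply(apply(s, e), g), apply(apply(s, g), e))  # (7, 4) (8, 4): same process, order matters
```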
Effect of Asynchrony
It is not possible to distinguish between a slow process and a failed process.
–This is the intuition behind why consensus is not solvable in asynchronous systems in which even one process may crash.
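A timeout cannot separate the two cases. In this minimal sketch (the function `observed_by_deadline` and the timing values are hypothetical), a crashed process is modeled as one whose message never arrives. For any finite deadline an observer picks, a process whose message is delayed past that deadline produces exactly the same observation as a crashed one.

```python
import math

def observed_by_deadline(send_time, delay, deadline):
    """What an observer has received from process p by time `deadline`.

    A crashed process is modeled as one whose message never arrives (infinite delay).
    """
    arrival = send_time + delay
    return ["vote"] if arrival <= deadline else []

deadline = 10.0
slow = observed_by_deadline(send_time=0.0, delay=25.0, deadline=deadline)
crashed = observed_by_deadline(send_time=0.0, delay=math.inf, deadline=deadline)
print(slow == crashed)  # True: the observer's view is identical in both cases
```

Raising the deadline does not help: in an asynchronous system the delay can always exceed it.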
Consensus Problem
Each process has a vote, either 0 or 1.
Each process must decide on a value, either 0 or 1, subject to the following constraints:
–Agreement: if two processes decide, then their decisions must be the same.
–Validity: if the votes of all processes are equal and no failures occur, then the decision of every process (if it decides) must equal that vote.
–Termination: every non-failed process must eventually decide.
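The three constraints can be checked mechanically on a finished run. This is a sketch under an assumption: runs are finite, so termination is approximated as "every non-crashed process has decided by the end of the run". The function name and the dictionary-based run format are hypothetical.

```python
def check_consensus(votes, decisions, crashed):
    """Check one finished run against Agreement, Validity and Termination.

    votes:     dict process -> 0/1
    decisions: dict process -> 0/1, only for processes that decided
    crashed:   set of processes that failed during the run
    """
    decided = set(decisions.values())
    agreement = len(decided) <= 1
    all_same = len(set(votes.values())) == 1
    if all_same and not crashed:
        v = next(iter(votes.values()))
        validity = decided <= {v}      # every decision (if any) equals the common vote
    else:
        validity = True                # validity only constrains failure-free runs with equal votes
    # Finite-run approximation of termination: all non-failed processes decided.
    termination = all(p in decisions for p in votes if p not in crashed)
    return {"agreement": agreement, "validity": validity, "termination": termination}

print(check_consensus({0: 1, 1: 1, 2: 1}, {0: 0, 1: 0, 2: 0}, set()))
# {'agreement': True, 'validity': False, 'termination': True}
```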
Revisiting Safety and Liveness
Agreement and Validity are safety properties.
Termination is a liveness property.
Could the problem be solved ‘trivially’ if we required only two of these properties?
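Yes: dropping any one property makes the problem easy. The sketch below uses three hypothetical "protocols" on failure-free runs; each satisfies exactly two of the properties on the sample runs shown. The checker `props` is a simplified, failure-free version written just for this illustration.

```python
def always_zero(votes):
    # Agreement + Termination, but violates Validity when everyone votes 1.
    return {p: 0 for p in votes}

def never_decide(votes):
    # Agreement + Validity hold vacuously, but Termination fails.
    return {}

def own_vote(votes):
    # Validity + Termination, but Agreement fails when votes differ.
    return dict(votes)

def props(votes, decisions):
    """Failure-free run: return (agreement, validity, termination)."""
    vals = set(decisions.values())
    agreement = len(vals) <= 1
    validity = len(set(votes.values())) != 1 or vals <= set(votes.values())
    termination = set(decisions) == set(votes)
    return agreement, validity, termination

all_ones = {0: 1, 1: 1, 2: 1}
mixed = {0: 0, 1: 1, 2: 1}
for rule in (always_zero, never_decide, own_vote):
    print(rule.__name__, props(all_ones, rule(all_ones)), props(mixed, rule(mixed)))
# always_zero (True, False, True) (True, True, True)
# never_decide (True, True, False) (True, True, False)
# own_vote (True, True, True) (False, True, True)
```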
Distributed Consensus
Reaching agreement is a fundamental problem in distributed computing. Some examples:
–Leader election / mutual exclusion
–Commit or abort in distributed transactions
–Agreeing on which process has failed
–Clock phase synchronization
–Air traffic control: all aircraft must have the same view
If there are no failures, reaching consensus is trivial:
–All-to-all broadcast, followed by applying the same choice function at every process …
Consensus in the presence of failures, however, can be complex.
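The failure-free recipe can be sketched as follows. Assumptions (all hypothetical): messages are delivered in an arbitrary shuffled order, every process waits until it holds all n votes, and the choice function is majority with ties broken toward 0. Because every process applies the same deterministic function to the same vote vector, all decisions agree regardless of delivery order.

```python
import random

def broadcast_then_choose(votes, choose, seed=0):
    """Failure-free consensus: all-to-all broadcast, then a common choice function."""
    n = len(votes)
    rng = random.Random(seed)
    # One message per (sender, receiver) pair, including a process's message to itself.
    in_transit = [(s, r, votes[s]) for s in range(n) for r in range(n)]
    rng.shuffle(in_transit)                    # arbitrary delivery order
    received = [dict() for _ in range(n)]      # received[r][s] = vote of s as seen by r
    for s, r, v in in_transit:
        received[r][s] = v
    # Each process decides only after it has every vote.
    return [choose([received[r][s] for s in range(n)]) for r in range(n)]

def majority_ties_to_0(vs):
    return 1 if 2 * sum(vs) > len(vs) else 0

decisions = broadcast_then_choose([1, 0, 1, 1, 0, 0, 1], majority_ties_to_0, seed=3)
print(decisions)  # [1, 1, 1, 1, 1, 1, 1]
```

The weak point is the wait for all n votes: if a single process crashes, the others wait forever.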
Example of Asynchronous Consensus
Seven members of a busy household decided to hire a cook, since they did not have time to prepare their own food. Each member separately interviewed every applicant for the cook’s position. Depending on how it went, each member voted "yes" (meaning “hire”) or "no" (meaning “don't hire”). The members now have to communicate with one another to reach a uniform final decision about whether the applicant will be hired. The process is repeated with the next applicant until someone is hired.
Asynchronous Consensus
Theorem. In a purely asynchronous distributed system, no deterministic protocol can solve the consensus problem if even a single process may crash.
Famous result due to Fischer, Lynch, and Paterson (commonly known as FLP 85).
Computation Prefix
Computation prefix:
–A computation prefix is a sequence of steps in which, in each step, some process executes a local event, a send event, or a receive event.
We write computation prefixes as sequences of events:
–<>: the initial computation, in which nothing has occurred.
–Each such sequence is finite.
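Computation Valence
0-valent computation
–A computation is 0-valent if 0 is the only decision reachable from it: every extension in which some process decides, decides 0.
1-valent computation
–A computation is 1-valent if 1 is the only decision reachable from it: every extension in which some process decides, decides 1.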
Computation Valence
Univalent computation
–A computation is univalent iff it is either 0-valent or 1-valent.
–In other words, a univalent computation has entered the decision mode: its decision is already determined.
Bivalent computation
–A computation is bivalent iff it is neither 0-valent nor 1-valent, i.e., both 0 and 1 are still reachable.
–In other words, a bivalent computation has not yet entered the decision mode.
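Valence can be computed by exploring every extension of a prefix and collecting the reachable decisions. This sketch assumes a hypothetical, tiny protocol given explicitly as a tree: each computation prefix (a tuple of abstract event names "a", "b", "c") maps either to a final decision or to its possible next events.

```python
def reachable_decisions(tree, prefix):
    """Set of decision values reachable from computation `prefix`.

    tree[prefix] is ("decide", v) for a finished run, or ("next", [events]).
    """
    kind, data = tree[prefix]
    if kind == "decide":
        return {data}
    out = set()
    for e in data:
        out |= reachable_decisions(tree, prefix + (e,))
    return out

def valence(tree, prefix):
    d = reachable_decisions(tree, prefix)
    if d == {0}:
        return "0-valent"
    if d == {1}:
        return "1-valent"
    # Neither 0- nor 1-valent; for a terminating protocol d is {0, 1}.
    return "bivalent"

# Hypothetical protocol: from <> either event "a" or event "b" can happen first.
TREE = {
    (): ("next", ["a", "b"]),
    ("a",): ("decide", 0),
    ("b",): ("next", ["a", "c"]),
    ("b", "a"): ("decide", 0),
    ("b", "c"): ("decide", 1),
}
for p in [(), ("a",), ("b",), ("b", "c")]:
    print(p, valence(TREE, p))
```

Here <> and <b> are bivalent, <a> is 0-valent, and <b, c> is 1-valent.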
Some possible protocols for consensus without failures
In all protocols, each process sends its vote to all others and makes its final decision only after receiving all votes.
Protocol 1:
–Take a majority of all votes, with some fixed way to break a tie.
Protocol 2:
–If all 0's or all 1's: decide 0 or 1, respectively.
–Else: if the number of 0's in the votes received is prime, decide 1; otherwise, decide 0.
Protocol 3:
–If all processes vote 1, decide 1; else decide 0.
Protocol 4:
–If all processes vote 0, decide 0; else decide 1.
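The four decision rules can be written as functions of the full vote vector. For Protocol 1 the tie-break is an assumption (ties go to 0). Since each process applies the same function to the same complete vector, Agreement holds in failure-free runs; the helper `satisfies_validity` checks that each rule returns v on the all-v vector.

```python
def is_prime(k):
    return k >= 2 and all(k % d != 0 for d in range(2, int(k ** 0.5) + 1))

def protocol1(votes):
    # Majority; a tie is broken toward 0 (assumed tie-break rule).
    return 1 if 2 * sum(votes) > len(votes) else 0

def protocol2(votes):
    if all(v == 0 for v in votes):
        return 0
    if all(v == 1 for v in votes):
        return 1
    return 1 if is_prime(votes.count(0)) else 0

def protocol3(votes):
    return 1 if all(v == 1 for v in votes) else 0

def protocol4(votes):
    return 0 if all(v == 0 for v in votes) else 1

PROTOCOLS = [protocol1, protocol2, protocol3, protocol4]

def satisfies_validity(protocol, n):
    return protocol([0] * n) == 0 and protocol([1] * n) == 1

print([p([0, 0, 1, 1, 1]) for p in PROTOCOLS])  # [1, 1, 0, 1]
```

All four rules satisfy Validity, yet they can disagree on mixed votes; any of them is a correct failure-free protocol.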
Coordinator-based solution
Every process sends its vote to the coordinator.
–The vote in the first message received by the coordinator is the decision.
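In this protocol the decision depends only on message timing. The sketch below (hypothetical process names and votes) enumerates every possible arrival order at the coordinator: with mixed votes both 0 and 1 are possible outcomes, so the initial state is bivalent; with equal votes the outcome is forced, matching Validity.

```python
from itertools import permutations

def coordinator_decision(votes, arrival_order):
    """The coordinator adopts the vote in the first message it receives."""
    if not arrival_order:
        return None            # nothing received yet: no decision
    return votes[arrival_order[0]]

votes = {"A": 0, "B": 1, "C": 1}     # hypothetical votes
outcomes = {coordinator_decision(votes, order) for order in permutations(votes)}
print(sorted(outcomes))  # [0, 1]
```

If the coordinator crashes (or is merely slow) before receiving any message, no process ever decides.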
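Proof
Lemma. Every consensus protocol that tolerates the crash of one process must have a bivalent initial state.
Proof by contradiction. Suppose not, so every initial state is univalent. Consider the chain of initial states s[0], s[1], …, s[n], where in s[k] processes 1, …, k vote 1 and the remaining processes vote 0:
–s[0]: all votes 0, so 0-valent (by Validity)
–…
–s[j]: 0-valent
–s[j+1]: 1-valent (differs from s[j] only in the vote of process j+1)
–…
–s[n]: all votes 1, so 1-valent (by Validity)
Since the chain starts 0-valent and ends 1-valent, such a j must exist.
What if process (j+1) crashes at the first step?
–The other processes then never hear from process j+1, so a run from s[j] and the same run from s[j+1] look identical to them, and they must reach the same decision. This contradicts s[j] being 0-valent and s[j+1] being 1-valent.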
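The first half of the argument, that the chain must "flip" somewhere between two states differing in a single vote, can be illustrated concretely. This is only an illustration, not the proof: it assumes a hypothetical failure-free decision rule (majority, ties to 0) as a stand-in for the valence of each initial state. Any rule satisfying Validity decides 0 on s[0] and 1 on s[n], so `find_flip` always finds some j.

```python
def chain(n):
    """Initial states s[0..n]: in s[k], processes 1..k vote 1 and the rest vote 0."""
    return [[1] * k + [0] * (n - k) for k in range(n + 1)]

def find_flip(decide, n):
    """Return j such that s[j] and s[j+1] lead to different decisions."""
    s = chain(n)
    for j in range(n):
        if decide(s[j]) != decide(s[j + 1]):
            return j
    return None   # unreachable for any rule that satisfies Validity

def majority_ties_to_0(vs):
    return 1 if 2 * sum(vs) > len(vs) else 0

s = chain(5)
j = find_flip(majority_ties_to_0, 5)
print(j, s[j], s[j + 1])  # 2 [1, 1, 0, 0, 0] [1, 1, 1, 0, 0]
```

The two states differ only in the vote of process j+1 = 3 (index 2 in the list), the process whose early crash the argument exploits.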
Computation Valence of <>
<> is a bivalent computation.
–Based on the definition of the consensus problem
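Existence of Decider Process
To satisfy Termination, a run that starts from a bivalent computation must at some step turn from a bivalent computation into a univalent one.
–Call the process whose step causes this transition the decider process.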
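The transition can be located by search. This sketch reuses the coordinator idea on a hypothetical state space: a prefix is the sequence of senders whose votes the coordinator has received so far. It walks from the bivalent prefix <> until a single step leads to a univalent prefix. In this toy model every step is a receive executed by the coordinator, so the decider process is necessarily the coordinator; if it is delayed at exactly that point, no decision is ever reached.

```python
VOTES = {"A": 0, "B": 1, "C": 1}   # hypothetical votes

def successors(prefix):
    """Possible next steps: the coordinator receives one still-undelivered vote."""
    return [prefix + (s,) for s in VOTES if s not in prefix]

def decision(prefix):
    """Decision of a complete run: the vote in the first received message."""
    return VOTES[prefix[0]]

def reachable(prefix):
    nxt = successors(prefix)
    if not nxt:
        return {decision(prefix)}
    return set().union(*(reachable(q) for q in nxt))

def find_decider_step(start=()):
    """Return (G, sender) where G is bivalent and G extended by that receive is univalent."""
    frontier = [start]
    while frontier:
        g = frontier.pop()
        if len(reachable(g)) < 2:
            continue                      # already univalent: not a candidate
        for nxt in successors(g):
            if len(reachable(nxt)) == 1:
                return g, nxt[-1]
        frontier.extend(successors(g))
    return None

print(find_decider_step())  # ((), 'A')
```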
If the decider process slows down
–just about the time when it was about to make the decision, the other processes cannot tell whether it is slow or has crashed.
Summary
In a purely asynchronous system, there is no solution to the consensus problem if even a single process may crash.
Note that this holds for deterministic algorithms only.
Solutions to the consensus problem do exist using randomized algorithms, or in the synchronous model.