Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved
Chapter 8 Fault Tolerance (1)
DISTRIBUTED SYSTEMS (dDist) 2014

Plan
Basic Concepts
Process Resilience
Reliable communication
– Client-server communication
– Group communication
Distributed commit
Recovery
Tolerating failures: agreement/consensus is the core of the problem

Failure Models
Figure 8-1. Different types of failures.
Fail-stop
Fail-silent
“Byzantine” failures

The Two-Army Problem
Blue army needs to decide whether to attack red army
– Blue: soldiers
– Red: 4000 soldiers
If only one division of the blue army attacks, the result is disaster
Blue army uses a messenger
– Subject to capture by red army (an omission failure)
How can the blue divisions reach agreement on attack?
– They cannot…

Two-Army Problem
[Figure: messages exchanged by the blue divisions] Attack! / Ack / Did you receive Ack? / Ack / Did you receive Ack? / Ack

Agreement/Consensus is Fundamental
Synchronization
Electing a coordinator
Consistency
War between little blue and red guys
A prerequisite for distributed commit
– More later…

Agreement in Faulty Systems (1/2)
n processors
Process P_i has an input value v_i
The processes talk using some protocol
Each process P_j outputs a value w_j
Processes can be correct or incorrect
Consistency: ∀ correct P_i and P_j: w_i = w_j
Validity: ∀ correct P_i ∃ a correct P_j: w_i = v_j

Agreement in Faulty Systems (2/2)
Same setting, consistency condition, and validity condition as on the previous slide
In particular: if all correct processes have the same input v, then they all get v as output

Agreement in Faulty Systems
Possible “axes”:
1. Synchronous versus asynchronous systems
– Processes are synchronous iff there exists a constant s ≥ 1 such that whenever any process has taken at least s steps, all other processes have taken at least one step
2. Communication delay is bounded or not
– Delay is bounded iff all messages sent by a process arrive within r real-time steps, for some predetermined r
– The other case is called eventual delivery or unbounded
3. Message transmission is done through unicasting or multicasting
4. Message delivery is ordered or not
– Message delivery is ordered iff messages are delivered in the same order as they were sent
– Important distinction for multicast
The Two-Army Problem?

Requirements for Agreement
Consistency
– All correct processes end up agreeing on the same value
Validity
– The agreed-upon value was the input to at least one of the correct processes
Termination
– Each process decides on a value within a finite number of steps

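The consistency and validity requirements can be checked mechanically against a recorded run. The following is a minimal sketch in Python; the function name `check_agreement` and its dictionary-based arguments are assumptions made for this illustration and do not come from the slides. Termination is not checked, because a recorded run has already finished.

```python
def check_agreement(inputs, outputs, correct):
    """Check consistency and validity for one finished run.

    inputs  : dict mapping process id -> input value v_i
    outputs : dict mapping process id -> output value w_i
    correct : set of ids of the correct (non-faulty) processes
    Returns a pair (consistency, validity) of booleans.
    """
    decided = {outputs[p] for p in correct}
    # Consistency: all correct processes output the same value.
    consistency = len(decided) <= 1
    # Validity: every correct output was the input of some correct process.
    correct_inputs = {inputs[p] for p in correct}
    validity = decided <= correct_inputs
    return consistency, validity
```

Outputs of incorrect processes are ignored on purpose: the requirements only constrain the correct ones.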
Agreement in Faulty Systems
Figure 8-4. Circumstances under which distributed agreement/consensus can be reached. [Figure axes: asynchronous versus synchronous processes, multicast versus unicast transmission]
Assumes fail-stop failures
Note that Figure 8-4 is wrong in [T&S, 2007 hardcover]

Agreement in Faulty Systems
Agreement is possible in
Case 1
– Processes are synchronous and communication delay is bounded
– Can use time-outs to see if a process has failed
Case 2
– Messages are ordered and the transmission mechanism is multicast
– Each process multicasts its initial value; all non-failed processes choose the first value received
Case 3
– Processes are synchronous and messages are ordered
– Obscure algorithm with an exponential number of messages

Byzantine Generals’ Problem
A group of Byzantine generals camped around an enemy city
– Must agree on a common battle plan: Attack? Retreat?
– Some of the generals may be traitors
– Direct communication
Model
– Synchronous processes
– Bounded communication
– Unicast communication
– Maybe: Byzantine rather than fail-stop process failures…
Is agreement possible?
[Figure: four generals P1, P2, P3, P4]

Byzantine Agreement
n processors
Process P_i has an input value v_i ∈ {A,R}
The processes talk using some protocol
Each process P_j outputs a value w_j ∈ {A,R}
Processes can be correct or incorrect, and incorrect processes might send arbitrary values in the protocol
Consistency: ∀ correct P_i and P_j: w_i = w_j
Validity: if all correct processes have the same input, then that common input must become the output: ∀ correct P_i ∃ a correct P_j: w_i = v_j

A Protocol for BA?
Each of n generals decides what to do: v(i) in {Attack, Retreat}
Send v(i) to other generals
Each general decides outcome based on majority of values
– Assume biased towards attack…
Two properties wanted
– Loyal generals decide on the same plan of action
– A small number of traitors cannot cause loyal generals to adopt a bad plan
[Figure: the values General 1 receives, one of them from a traitor] What does he decide? The traitor may have sent R to others…

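To see why plain majority voting is not enough, the sketch below simulates one round with three loyal generals and one traitor who tells different generals different things. It is only an illustration: the names `naive_decide` and `run_naive` and the way the traitor's messages are passed in are assumptions made for this example.

```python
def naive_decide(received):
    """Majority vote over the received values; ties are biased towards attack ('A')."""
    attacks = sum(1 for v in received if v == "A")
    retreats = len(received) - attacks
    return "A" if attacks >= retreats else "R"


def run_naive(loyal_inputs, traitor_sends):
    """Run one round of the naive protocol.

    loyal_inputs  : dict loyal general id -> own value ('A' or 'R')
    traitor_sends : dict loyal general id -> value the traitor sends to that general
    Returns dict loyal general id -> decision.
    """
    decisions = {}
    for general in loyal_inputs:
        # Every loyal general sees all loyal values plus whatever the traitor told him.
        received = list(loyal_inputs.values()) + [traitor_sends[general]]
        decisions[general] = naive_decide(received)
    return decisions
```

With loyal inputs `{1: "A", 2: "R", 3: "R"}` and a traitor who sends `A` to general 1 but `R` to generals 2 and 3, general 1 sees a tie and attacks while the others retreat, so the loyal generals do not decide on the same plan.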
Trust is Bad
[Figure: received values A, R, R, R]

Can four generals tolerate one traitor?
For each set of values that P1 may receive: what must he decide?
– AAAAA (Hint: Validity)
– RAAAA (Hint: Consistency)
– RRAAA (Hint: Consistency)
– RRRAA (Hint: Consistency)
– RRRRA (Hint: Consistency)
– RRRRR (Hint: Validity)

Agreement in Faulty Systems
Sufficient if we can obtain the following stronger conditions:
1. Each general collects a list of the others’ votes
2. The value, v(i), for a loyal general i is used by all other loyal generals
3. Any two loyal generals use the same value of v(i) [even if i is a traitor]
Then all generals have the same view on “votes” and can make a common decision based on that

Protocol in [T&S, 2007] for 1 Traitor
Pages in the hardcover edition are drunken nonsense!
1. P_i unicasts v(i) to all
2. P_i assembles a vector of received values [v(1), …, v(n)]
3. P_i unicasts [P_i, [v(1), …, v(n)]] to all
4. P_i assembles the result vector
– A) For each j ≠ i, look at the j’th element of the vectors not received from j
– B) The j’th element of the result vector is the majority of the elements from A)

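The four steps can be simulated for four processes with one traitor. This is a hedged sketch, not the book's or the lecturer's code: the names `one_traitor_protocol` and `lie` are invented here, a missing majority falls back to an arbitrary default of `R`, and it is assumed that P_i's own vector counts among the vectors "not received from j", so that each majority is taken over three values.

```python
from collections import Counter


def majority(values, default="R"):
    """Strict majority of values, or default (an assumed tie-break) if there is none."""
    value, count = Counter(values).most_common(1)[0]
    return value if 2 * count > len(values) else default


def one_traitor_protocol(inputs, traitor, lie):
    """Simulate the protocol and return each loyal process's result vector.

    inputs  : dict process id -> v(i); the traitor's entry is never used
    traitor : id of the single faulty process
    lie     : lie(receiver, about) -> value the traitor sends to `receiver`
              as the supposed v(about); models arbitrary behaviour
    """
    ids = sorted(inputs)
    loyal = [p for p in ids if p != traitor]

    # Steps 1-2: direct[i][j] is the value i holds for j after the first round.
    direct = {
        i: {j: inputs[j] if j != traitor else lie(i, j) for j in ids}
        for i in loyal
    }

    def relayed(k, i, j):
        """Step 3: the j'th element of the vector that k sends to i."""
        return lie(i, j) if k == traitor else direct[k][j]

    # Step 4: majority over the j'th elements of all vectors not sent by j.
    result = {}
    for i in loyal:
        vector = {i: inputs[i]}
        for j in ids:
            if j == i:
                continue
            reports = [direct[i][j]]
            reports += [relayed(k, i, j) for k in ids if k not in (i, j)]
            vector[j] = majority(reports)
        result[i] = vector
    return result
```

For a loyal j, two of the three reports are honest, so the traitor cannot change the majority; for the traitor's own entry, every loyal process takes the majority over the same three first-round values, so they still agree.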
[Figure: Z = majority of x1, x2, x3, for P1, P2, P4; remaining figure labels y5 to y12]

Agreement in Faulty Systems (1/2)
In general for Byzantine failures:
– Agreement can be reached iff for m faulty processes there are at least 2m + 1 non-faulty processes, i.e., at least 3m + 1 processes in total
In particular, Byzantine agreement is impossible for 2 correct and 1 faulty process

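The bound translates into simple arithmetic: m faulty plus 2m + 1 non-faulty processes gives n ≥ 3m + 1 in total. A small sketch, with helper names chosen only for this illustration:

```python
def min_processes(m):
    """Smallest group size that tolerates m Byzantine processes: m faulty + (2m + 1) correct."""
    return 3 * m + 1


def max_faulty(n):
    """Largest m with 3m + 1 <= n, i.e. how many Byzantine processes n processes can tolerate."""
    return (n - 1) // 3
```

For example, `max_faulty(3)` is 0, which matches the impossibility for 2 correct and 1 faulty process, while `max_faulty(4)` is 1.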
Agreement in Faulty Systems (2/2)
Intuition for impossibility with 1 traitor out of 3: reporting what others sent does not help.
– If 2 reports on 1 to 1, that is just silly.
– If 2 reports on 2 to 1, that is equally silly.
– If 2 reports on 3 to 1 and there is a disagreement, how does 1 decide whether it was 3 who sent a wrong value or 2 who reported a wrong value?

Reliable Client-Server Communication
Point-to-point communication
– E.g., using TCP
– Masks omission failures via acknowledgements and retransmissions (using timeouts)
RPC
– …

RPC
[Figure: the client sends a request to the server and the server sends a reply back] Possible failures?

RPC Semantics in the Presence of Failures
Five different classes of failures that can occur in RPC systems:
1. The client is unable to locate the server.
2. The request message from the client to the server is lost.
3. The server crashes after receiving a request.
4. The reply message from the server to the client is lost.
5. The client crashes after sending a request.

Client Unable to Locate Server
Not much to do except to throw an Exception…

Request Message Lost
Client may retransmit the message after a timer expires
– This is safe if it is certain that the message was lost; otherwise the client must take care …
– Much more on this later

Lost Reply
Add sequence number on request
– Store reply when sent
– On new requests, check on server against stored sequence numbers + replies
– Need stateful server

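The sequence-number scheme can be sketched as a stateful server that remembers the reply it sent for each request and replays it when the same request arrives again. The class name `DedupServer`, the `(client_id, seq)` key, and the `executions` counter are assumptions made for this illustration.

```python
class DedupServer:
    """Stateful server that executes each (client, sequence number) request once."""

    def __init__(self, handler):
        self.handler = handler  # the actual operation to perform
        self.replies = {}       # (client_id, seq) -> reply that was sent
        self.executions = 0     # how many times the handler really ran

    def handle(self, client_id, seq, payload):
        key = (client_id, seq)
        if key in self.replies:
            # Retransmitted request: the earlier reply was lost, so resend it
            # without executing the operation a second time.
            return self.replies[key]
        self.executions += 1
        reply = self.handler(payload)
        self.replies[key] = reply  # store the reply when it is sent
        return reply
```

A real server would also need to discard old entries at some point; that is left out of this sketch.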
Client Crashes
Can lead to orphan computations at server
– Orphans may use up valuable resources
– Orphans may confuse rebooting clients when they send the reply back
Partial Solutions
– Orphan extermination
  Log RPCs to stable storage before sending them (client)
  Kill off orphans on reboot (client)
– Reincarnation
  Client broadcasts epoch when it (re)boots
  Orphans from previous epochs are killed (server)
– Gentle reincarnation
  Only kill orphans if parent cannot be reached (server)
– Expiration
  Allocate quantum of time to RPC (server)

Server Crashes
Figure 8-7. A server in client-server communication. (a) The normal case. (b) Crash after execution. (c) Crash before execution.

Server Crashes
Client cannot distinguish causes for No REP
– But correct behavior may differ for (b) and (c)
Possible approaches for stub
– At-least-once semantics
  Client retries until it gets reply
– At-most-once semantics
  Client gives up and reports failure
– Exactly-once semantics
  Impossible… But we can make the probability of failure very low

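The two achievable stub behaviours can be sketched as follows. The transport is abstracted as a `send` callable that raises `NoReply` on a timeout; `NoReply`, `RpcFailed`, and both function names are hypothetical and introduced only for this illustration.

```python
class NoReply(Exception):
    """Raised by the (hypothetical) transport when no reply arrives in time."""


class RpcFailed(Exception):
    """Reported to the caller by the at-most-once stub."""


def call_at_least_once(send, request):
    """Retry until a reply arrives; the server may execute the request several times."""
    while True:
        try:
            return send(request)
        except NoReply:
            continue  # resend the same request


def call_at_most_once(send, request):
    """Send once; on a missing reply give up and report the failure."""
    try:
        return send(request)
    except NoReply as exc:
        raise RpcFailed("no reply; the request may or may not have executed") from exc
```

Neither stub knows whether the server executed the request before the reply went missing, which is why exactly-once cannot be built from either of them alone.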
Server Crashes
Exactly-once semantics again
Example
– Client wants to print document on server
– Three events that can happen at example server:
  Send the completion message (M),
  Print the text (P),
  Crash (C).
– Two different strategies at server
  P → M
  M → P

Server Crashes
These events can occur in six different orderings:
1. M → P → C: A crash occurs after sending the completion message and printing the text.
2. M → C (→ P): A crash happens after sending the completion message, but before the text could be printed.
3. C (→ M → P): A crash happens before the server could do anything.
4. P → M → C: A crash occurs after sending the completion message and printing the text.
5. P → C (→ M): The text is printed, after which a crash occurs before the completion message could be sent.
6. C (→ P → M): A crash happens before the server could do anything.

Server Crashes
Assume server crashes, subsequently recovers, and notifies clients
– What can client do?
Client does not know whether server crashed before or after printing

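The client's dilemma can be made concrete by counting how often the text is printed for each of the six orderings under two simple client policies: always reissue the request after the crash, or never reissue it. This is a sketch under the assumption that a reissued request is printed exactly once by the recovered server; the names `ORDERINGS`, `prints` and `outcomes` are invented for the illustration.

```python
# For each ordering: the events the server completed before it crashed.
ORDERINGS = [
    ("M -> P -> C", {"M", "P"}),
    ("M -> C (-> P)", {"M"}),
    ("C (-> M -> P)", set()),
    ("P -> M -> C", {"P", "M"}),
    ("P -> C (-> M)", {"P"}),
    ("C (-> P -> M)", set()),
]


def prints(done_before_crash, reissue):
    """Number of times the text is printed in total."""
    count = 1 if "P" in done_before_crash else 0
    if reissue:
        count += 1  # assumed: the recovered server prints the reissued request
    return count


def outcomes(reissue):
    """Print count per ordering for a client that always (True) or never (False) reissues."""
    return [prints(done, reissue) for _, done in ORDERINGS]
```

`outcomes(True)` contains duplicates (2 prints) and `outcomes(False)` contains losses (0 prints), so neither fixed policy yields exactly one print in all six orderings.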
Server Crashes
If at-least-once semantics is desirable
– Resend request until reply is received
– Idempotent procedures needed
  Operations, O, such that O = OO

Idempotent Operations
Operations, O, such that O = OO
– x := 0 is idempotent
– x := x + 1 is not
– Print is not
– Read “next” block of a file is not idempotent
– Read block 3 of a file is idempotent
– Write “next” block of a file is not idempotent
– Write block 3 of a file is idempotent

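The condition O = OO can be tested on a concrete state by applying an operation once and twice and comparing the results. The sketch below models state as a dictionary; the helper `idempotent_on` and the four sample operations are assumptions for this illustration, and a check on one state is evidence rather than a proof.

```python
def idempotent_on(op, state):
    """True if applying op twice to `state` gives the same result as applying it once."""
    once = op(state)
    return op(once) == once


def set_zero(s):
    """x := 0"""
    return {**s, "x": 0}


def increment(s):
    """x := x + 1"""
    return {**s, "x": s["x"] + 1}


def read_block_3(s):
    """Read block 3 of a file."""
    return {**s, "last_read": s["blocks"][3]}


def read_next(s):
    """Read the "next" block of a file, advancing the file position."""
    return {**s, "last_read": s["blocks"][s["pos"]], "pos": s["pos"] + 1}
```

Each operation returns a new dictionary instead of mutating its argument, so the once and twice results can be compared directly.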
A Possible Solution?
Server sequence: P D M D
D = Write “printed” to a log
– After sending M, delete the entry
– After a crash, if “printed” is in the log, resend M and then resend the acknowledgement
Now at most we can get extra acknowledgements!?
No: PDC(M) versus PCDM

Window of Failure
If failure is unavoidable, minimize its probability, e.g., by minimizing the window of failure!
Server sequence: P D M D

Reliable Group Communication
Maintain a process group whose members see the same messages
Issues
– “Reliable”: what type of error do we handle?
– What if members join/leave while multicasting?
– What if members fail while multicasting?
Typically built on top of unreliable multicasting

Basic Reliable-Multicasting Schemes
Figure 8-9. A simple solution to reliable multicasting when all receivers are known and are assumed not to fail. (a) Message transmission. (b) Reporting feedback.
Omission errors only!

Nonhierarchical Feedback Control
Figure: Several receivers have scheduled a request for retransmission, but the first retransmission request leads to the suppression of others.

Hierarchical Feedback Control
Figure: The essence of hierarchical reliable multicasting. Each local coordinator forwards the message to its children and later handles retransmission requests.

Atomic Multicast – (Fail-Stop)
Guarantee that a message is delivered to all group members or to none at all
– At message delivery, need to agree on group membership (who is “all”, i.e., who is alive)
E.g., for distributed update

Virtual Synchrony
Figure: The principle of virtual synchronous multicast.

Virtual Synchrony
Reliable multicasting
– Different orderings possible
  E.g., none, FIFO, causal, total
View-based
– Multicast from non-faulty process delivered to all non-faulty processes in view
– Multicast from failed process delivered to or ignored by all non-faulty processes in view
Requires agreement on both messages and view

Implementing Virtual Synchrony
Main issue
– Handle group membership/view changes
Assume reliable, in-order point-to-point communication
– A process p that wants to multicast m uses point-to-point communication of m to each view member
– E.g., use TCP
What if p fails during multicasting?
– Some processes may have received m, others may not
– Failure detection + view change

Failure Detection
Essentially two approaches used
– Pinging
  Are you alive? Yes!
– Heartbeats
  I am alive!, I am alive!, I am alive!, I am alive!, …

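The heartbeat approach can be sketched as a detector that records when each process was last heard from and suspects those that have been silent for longer than a timeout. The class `HeartbeatDetector` and its explicit `now` arguments are assumptions made for this illustration; passing the time in explicitly keeps the example deterministic.

```python
class HeartbeatDetector:
    """Suspects a process once no heartbeat has arrived for more than `timeout` time units."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # process id -> time of the latest "I am alive!"

    def heartbeat(self, process, now):
        """Record an 'I am alive!' message from `process` received at time `now`."""
        self.last_seen[process] = now

    def suspected(self, now):
        """Return the set of processes whose last heartbeat is older than the timeout."""
        return {p for p, t in self.last_seen.items() if now - t > self.timeout}
```

A suspicion is only a guess: with unbounded message delay a slow process is indistinguishable from a crashed one.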
Implementing Virtual Synchrony
Figure: (a) Process 4 notices that process 7 has crashed and sends a view change.

Implementing Virtual Synchrony
Figure: (b) Process 6 sends out all its un-acknowledged messages, followed by a flush message.
Other non-failed processes do the same
All messages from 7 that have been received by at least one process will be delivered to all!

Implementing Virtual Synchrony
Figure: (c) Process 6 installs the new view when it has received a flush message from everyone else.

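The three figure steps (view change, forwarding of un-acknowledged messages followed by a flush, installation of the new view) can be sketched as a small state machine per group member. This is a simplified illustration under the slide's assumption of reliable, in-order point-to-point channels; the class `Member`, its method names, and the `("msg", …)` / `("flush", …)` tuples are invented for the example.

```python
class Member:
    """One group member taking part in a flush-based view change."""

    def __init__(self, pid, view):
        self.pid = pid
        self.view = set(view)
        self.unstable = []        # messages not yet known to be received by everyone
        self.delivered = []       # messages delivered to the application
        self.pending_view = None  # proposed view that is not installed yet
        self.flushes = set()      # members from which a flush has arrived

    def start_view_change(self, new_view):
        """Return what to send to every other member: unstable messages, then a flush."""
        self.pending_view = set(new_view)
        outgoing = [("msg", m) for m in self.unstable] + [("flush", self.pid)]
        self._maybe_install()
        return outgoing

    def receive(self, kind, body):
        if kind == "msg":
            if body not in self.delivered:  # ignore copies already delivered
                self.delivered.append(body)
        elif kind == "flush":
            self.flushes.add(body)
            self._maybe_install()

    def _maybe_install(self):
        """Install the new view once a flush has arrived from everyone else in it."""
        if self.pending_view is None:
            return
        if self.pending_view - {self.pid} <= self.flushes:
            self.view = self.pending_view
            self.pending_view = None
            self.flushes = set()
```

Because channels are in order, every forwarded message from a sender arrives before that sender's flush, so a member that installs the new view has already delivered everything the others held.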
Summary
Independent failure is a defining characteristic of distributed systems
– Fault tolerance and reliability are fundamental to distributed systems
Process failures
– Replication/process groups are a way of handling failure
– There are limits to fault tolerance: agreement
  Fail-stop failures
  Byzantine failures
Communication failures
– Client-server communication
– Group communication