1 DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition
DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 8 Fault Tolerance Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved

2 Fault Tolerance Basic Concepts
Being fault tolerant is strongly related to what are called dependable systems. Dependability implies the following:
Maintainability
Availability
Reliability
Safety
(mnemonic: MARS)

3 Figure 8-1. Different types of failures.
Failure Models. Node failure models: fail-stop, announced failure, Byzantine (arbitrary) failure. Figure 8-1. Different types of failures.

4 Failure Masking by Redundancy
Figure 8-2. Triple modular redundancy (TMR).
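The TMR voter can be sketched as a simple majority function (a minimal illustration, not tied to any particular hardware scheme):

```python
def tmr_vote(a, b, c):
    # Majority vote over the outputs of three replicated modules.
    # A single faulty module is outvoted by the two correct ones,
    # so its failure is masked.
    if a == b or a == c:
        return a
    if b == c:
        return b
    raise RuntimeError("no majority: more than one module failed")

print(tmr_vote(5, 5, 9))  # faulty third module is masked: prints 5
```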

5 Flat Groups versus Hierarchical Groups
Figure 8-3. (a) Communication in a flat group. (b) Communication in a simple hierarchical group.

6 Agreement in Faulty Systems (1)
Possible cases:
Synchronous versus asynchronous systems (synchronous: no more than a factor-c difference in process speeds).
Communication delay is bounded or not.
Message delivery is ordered (by real global time of send) or not.
Message transmission is done through unicasting or multicasting.

7 Agreement in Faulty Systems (2)
Figure 8-4. Circumstances under which distributed agreement can be reached.

8 Agreement in Faulty Systems (3)
Figure 8-5. The Byzantine agreement problem for three nonfaulty and one faulty process. (a) Each process sends its value to the others.

9 Agreement in Faulty Systems (4)
P1 sees – from 1: (1,2,x,4); from 2: (1,2,y,4); from 3: (a,b,c,d); from 4: (1,2,z,4). Majority: (1,2,?,4); conclude 3 is bad.
P2 sees – from 1: (1,2,x,4); from 2: (1,2,y,4); from 3: (e,f,g,h); from 4: (1,2,z,4). Hence: (1,2,?,4); 3 is bad.
So which process should 1 suspect? 2? 3? Why?
Figure 8-5. The Byzantine agreement problem for three nonfaulty and one faulty process. (b) The vectors that each process assembles based on (a). (c) The vectors that each process receives in step 3.
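The majority step can be illustrated with a short sketch (a hypothetical helper, using the vectors from the slide):

```python
from collections import Counter

def majority_vector(vectors):
    # Element-wise majority over the vectors a process holds after
    # step 3; a position with no strict majority is marked "?" --
    # that slot belongs to the faulty process.
    result = []
    for column in zip(*vectors):
        value, count = Counter(column).most_common(1)[0]
        result.append(value if count > len(vectors) / 2 else "?")
    return result

# The four vectors P1 holds (process 3 is faulty and sends garbage):
p1 = [(1, 2, "x", 4), (1, 2, "y", 4), ("a", "b", "c", "d"), (1, 2, "z", 4)]
print(majority_vector(p1))  # [1, 2, '?', 4] -> conclude process 3 is bad
```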

10 Agreement in Faulty Systems (5)
P1 sees – from 1: (1,2,x); from 2: (1,2,y); from 3: (1,b,x). So which process should 1 suspect? Suppose a=1 and c=x: the majority is (1,2,x), yet P1 cannot tell which process is bad. Likewise, P2 can decide (1,2,y) without knowing which process is bad – agreement cannot be reached. Figure 8-6. The same as Fig. 8-5, except now with two correct processes and one faulty process.

11 K-Fault Tolerance K-fault tolerant = any k failures can be tolerated
Fail-stop: k+1 redundancy suffices.
Byzantine: 2k+1 required. Even this requires (effective) atomic multicast.
Ultimately, you need to analyze the system requirements: implement fault tolerance at multiple levels, use statistical/probabilistic analysis, and have a plan B.
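As a quick sanity check, the redundancy requirements can be written down directly (a sketch only):

```python
def replicas_needed(k, byzantine=False):
    # k+1 replicas tolerate k fail-stop failures: any survivor is correct.
    # 2k+1 replicas tolerate k Byzantine failures: the k+1 correct
    # replies outvote the (up to) k arbitrary ones.
    return 2 * k + 1 if byzantine else k + 1

print(replicas_needed(2))                  # 3 replicas for 2 fail-stop faults
print(replicas_needed(2, byzantine=True))  # 5 replicas for 2 Byzantine faults
```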

12 RPC Semantics in the Presence of Failures
Five different classes of failures that can occur in RPC systems:
The client is unable to locate the server.
The request message from the client to the server is lost.
The server crashes after receiving a request.
The reply message from the server to the client is lost.
The client crashes after sending a request.

13 Server Crashes (1) Figure 8-7. A server in client-server communication. (a) The normal case. (b) Crash after execution. (c) Crash before execution.

14 Server Crashes (2) Three events that can happen at the server:
Send the completion message (M), Print the text (P), Crash (C).

15 Server Crashes (3) These events can occur in six different orderings:
M → P → C: A crash occurs after sending the completion message and printing the text.
M → C (→ P): A crash happens after sending the completion message, but before the text could be printed.
P → M → C: A crash occurs after sending the completion message and printing the text.
P → C (→ M): The text was printed, after which a crash occurs before the completion message could be sent.
C (→ P → M): A crash happens before the server could do anything.
C (→ M → P): A crash happens before the server could do anything.

16 Server Crashes (4) Figure 8-8. Different combinations of client and server strategies in the presence of server crashes.

17 Basic Reliable-Multicasting Schemes
Figure 8-9. A simple solution to reliable multicasting when all receivers are known and are assumed not to fail. (a) Message transmission. (b) Reporting feedback. Problem: “ACK implosion” (or “NAK implosion” if NAKs only)

18 Nonhierarchical Feedback Control
Figure 8-10. Several receivers have scheduled a request for retransmission, but the first retransmission request leads to the suppression of others. Problem: may have to process many NAKs per message! A “solution”: have a multicast channel per NAK number....

19 Hierarchical Feedback Control
Figure 8-11. The essence of hierarchical reliable multicasting. Each local coordinator forwards the message to its children and later handles retransmission requests. ACKs can also be provided hierarchically: a node ACKs to its parent when all of its children have ACKed.

20 Digital Fountain Approach
Use erasure correction to handle missing messages:
Use an erasure-correcting code (the receiver knows which messages are missing, hence treats them as erasures).
Intersperse recovery messages with data messages to provide the redundancy needed to recover missing messages.
As long as a receiver gets enough data messages and recovery messages, all messages can be recovered.
Avoids retransmissions at the expense of redundancy.
The rate at which recovery messages are sent can be adapted to network conditions.
See M. Luby – fountain codes, Raptor codes.
Especially applicable when you don't care who comes and goes – all receivers just “drink from the fountain”.

21 Digital Fountain Approach (2)
Simple example: the sender sends messages M1, M2, M3, along with a recovery message P3,3 that covers M1, M2, and M3.
Receiver 1 gets M2, M3, and P3,3 – and is able to recover M1 from these.
Receiver 2 gets M1, M3, and P3,3 – and is able to recover M2 from these.
Receiver 3 gets M1, M2, and P3,3 – and is able to recover M3 from these.
Receiver 4 gets M1, M2, and M3 – it doesn't care that it missed P3,3, since it doesn't need it.
If two or more of these messages are missed, the receiver has to wait for a later recovery message that also covers this range – one will eventually be sent....
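With a single XOR parity message as the recovery message (a toy stand-in for a real fountain code; here P is simply the XOR of M1–M3), the recovery on the slide looks like:

```python
def xor(a: bytes, b: bytes) -> bytes:
    # Bytewise XOR of two equal-length messages.
    return bytes(x ^ y for x, y in zip(a, b))

M1, M2, M3 = b"msg-1", b"msg-2", b"msg-3"
P = xor(xor(M1, M2), M3)  # recovery message covering M1, M2, M3

# Receiver 1 missed M1 but has M2, M3, and P:
recovered = xor(xor(M2, M3), P)
print(recovered == M1)  # True: the erasure is filled in
```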

22 Virtual Synchrony (1) Figure 8-12. The logical organization of a distributed system to distinguish between message receipt and message delivery.

23 Figure 8-13. The principle of virtual synchronous multicast.
Virtual Synchrony (2) Figure 8-13. The principle of virtual synchronous multicast. Group G defines “epochs” of membership.

24 Message Ordering (1) Four different orderings are distinguished:
Unordered multicasts
FIFO-ordered multicasts
Causally-ordered multicasts
Totally-ordered multicasts
The orderings follow the specifications given earlier: causality is determined by the “happens-before” relation on events (local order plus message send-receive), while FIFO order is local to each sender only – each induces a partial order.

25 Message Ordering (2)
Figure 8-14. Three communicating processes in the same group. The ordering of events per process is shown along the vertical axis. The order of receipt does not have to be the order of delivery: if messages were delivered in order of receipt, FIFO ordering would be violated at P3 (and hence causal order as well).

26 Message Ordering (3) Figure 8-15. Four processes in the same group with two different senders, and a possible delivery order of messages under FIFO-ordered multicasting. Each delivery order only has to obey the restriction that m1 is delivered before m2, and m3 before m4. Note that since there is no causal relationship between the messages sent at P1 and those sent at P4, this also satisfies causal order.
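A FIFO-ordered delivery layer is typically built as a hold-back queue with per-sender sequence numbers; a minimal sketch (names are illustrative):

```python
class FifoDelivery:
    # Hold back each received message until all earlier messages
    # from the same sender have been delivered.
    def __init__(self):
        self.next_seq = {}  # sender -> next sequence number to deliver
        self.held = {}      # (sender, seq) -> message

    def receive(self, sender, seq, msg):
        # Returns the list of messages that become deliverable.
        self.held[(sender, seq)] = msg
        delivered = []
        expected = self.next_seq.get(sender, 0)
        while (sender, expected) in self.held:
            delivered.append(self.held.pop((sender, expected)))
            expected += 1
        self.next_seq[sender] = expected
        return delivered

f = FifoDelivery()
print(f.receive("P1", 1, "m2"))  # [] -- m2 arrives first and is held back
print(f.receive("P1", 0, "m1"))  # ['m1', 'm2'] -- m1 releases both
```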

27 Implementing Virtual Synchrony (1)
Figure 8-16. Six different versions of virtually synchronous reliable multicasting. Atomic only implies all-or-nothing delivery with some total order – constraints on the order are orthogonal to this....

28 Impl. Virtual Synchrony (1.5)
Isis system (Birman) – messages only have to be delivered to the operational nodes in the group.
So change the group when a member joins, leaves, or dies.
Assumes the underlying unicast is reliable (TCP).
Classify each message as stable (known to have been received by all nodes in the group) or unstable (otherwise), and only deliver stable messages.
Group-change messages are sent when a node joins, leaves, or dies – to all operational nodes (in the new group).
Flush all unstable messages (to the surviving members of the old group) and mark them as stable – then send a flush message.
Install the new view when a flush message has been received from all survivors.

29 Implementing Virtual Synchrony (2)
Figure 8-17. (a) Process 4 notices that process 7 has crashed and sends a view change.

30 Implementing Virtual Synchrony (3)
Note: an unstable message only has to be sent to nodes that are not known to have received it. Figure 8-17. (b) Process 6 sends out all its unstable messages, followed by a flush message. FIFO delivery ensures the flush message is received after the unstable messages.

31 Implementing Virtual Synchrony (4)
Figure 8-17. (c) Process 6 installs the new view when it has received a flush message from everyone else.

32 Two-Phase Commit (1)
Figure 8-18. (a) The finite state machine for the coordinator in 2PC. (b) The finite state machine for a participant. Really, the commit request can come from any participant. Commit is like an AND of the participant votes – vote-commit must arrive from all participants (except that the coordinator may elect to abort anyway!).
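The coordinator's phase-1 decision rule amounts to this (a sketch; message names follow the slides):

```python
def coordinator_decide(votes, coordinator_agrees=True):
    # GLOBAL_COMMIT requires a VOTE_COMMIT from every participant;
    # the coordinator may still elect to abort on its own.
    if coordinator_agrees and all(v == "VOTE_COMMIT" for v in votes):
        return "GLOBAL_COMMIT"
    return "GLOBAL_ABORT"

print(coordinator_decide(["VOTE_COMMIT"] * 3))            # GLOBAL_COMMIT
print(coordinator_decide(["VOTE_COMMIT", "VOTE_ABORT"]))  # GLOBAL_ABORT
print(coordinator_decide(["VOTE_COMMIT"] * 3,
                         coordinator_agrees=False))       # GLOBAL_ABORT
```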

33 Two-Phase Commit (1.1)
[Timeline diagram: coordinator C and participants P1–P3. C sends Vote-req; each participant stores the transaction (W = write to stable storage, the intentions list) and sends vote-commit; C stores the decision and sends Global-Commit; each participant stores Global-Commit and sends ACK.]
Normal behavior of 2PC: all participants vote to commit, no messages lost.

34 Two-Phase Commit (1.2)
[Timeline diagram: the Vote-req to one participant is lost; C times out and resends it, and the run then completes as in the normal case.]
All participants vote to commit; the Vote-req message is lost (and resent).

35 Two-Phase Commit (1.25)
[Timeline diagram: the Vote-req to one participant is lost; C times out and decides to ABORT; participants store Global-ABORT and send ACK.]
All participants vote to commit; a Vote-req message is lost; the coordinator elects to abort.

36 Two-Phase Commit (1.3)
[Timeline diagram: one vote-commit is lost; C times out and resends Vote-req; the participant resends its vote-commit; C then stores the decision and sends Global-Commit.]
All participants vote to commit; a vote-commit (YES) message is lost. The coordinator could also decide to abort (not shown here).

37 Two-Phase Commit (1.4)
[Timeline diagram: a Global-Commit is lost; C times out waiting for the ACK and resends Global-Commit.]
All participants vote to commit; a Global-Commit message is lost (and resent).

38 Two-Phase Commit (1.5)
[Timeline diagram: an ACK is lost; C times out and resends Global-Commit; the participant resends its ACK.]
All participants vote to commit; an ACK message is lost.

39 Two-Phase Commit (1.6)
[Timeline diagram: P2 crashes during the protocol; C times out and resends.]
Behavior of 2PC when a participant crashes. All participants vote to commit, no messages lost.

40 Two-Phase Commit (1.65)
[Timeline diagram: P2 crashes and recovers; after a timeout P2 resends its vote-commit, and C resends Global-Commit.]
Behavior of 2PC when a participant crashes. All participants vote to commit, no messages lost.

41 Two-Phase Commit (1.7)
[Timeline diagram: P2 crashes after C sends Global-Commit.]
Behavior of 2PC when a participant crashes. All participants vote to commit, no messages lost.


43 Two-Phase Commit (2) Figure 8-19. Actions taken by a participant P when residing in state READY and having contacted another participant Q.
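The decision table of Figure 8-19 can be rendered as a small lookup (a sketch; action strings are paraphrased):

```python
def action_in_ready(state_of_q):
    # P is blocked in READY and asks another participant Q for its state.
    # Q in INIT means Q never voted, so the coordinator cannot have
    # decided commit -- both may safely abort.
    return {
        "COMMIT": "make transition to COMMIT",
        "ABORT": "make transition to ABORT",
        "INIT": "make transition to ABORT",
        "READY": "contact another participant",
    }[state_of_q]

print(action_in_ready("INIT"))  # make transition to ABORT
```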

44 Two-Phase Commit (3) Figure 8-20. Outline of the steps taken by the coordinator in a two-phase commit protocol.

45 Two-Phase Commit (4) Figure 8-20 (continued). Outline of the steps taken by the coordinator in a two-phase commit protocol.

46 Figure 8-21. (a) The steps taken by a participant process in 2PC.
Two-Phase Commit (5) Figure 8-21. (a) The steps taken by a participant process in 2PC.

47 Figure 8-21. (b) The steps for handling incoming decision requests.
Two-Phase Commit (7) Figure 8-21. (b) The steps for handling incoming decision requests.

48 Two-Phase Commit (1.8)
[Timeline diagram: participants store the transaction and send vote-commit; C stores and sends Global-Abort, then crashes; participants store Global-Abort and send ACK.]
Behavior of 2PC when the coordinator crashes. All participants vote to commit, but C aborts.

49 Two-Phase Commit (1.9)
[Timeline diagram: participants store the transaction and send vote-commit; C stores and sends Global-Commit, then crashes; participants store Global-Commit and send ACK.]
Behavior of 2PC when the coordinator crashes. All participants vote to commit, and C commits.

50 Two-Phase Commit (1.95)
[Timeline diagram: C sends Global-Commit and crashes; a participant that missed the decision times out, asks C (no answer), then asks P2, which has received Global-Commit.]
Behavior of 2PC when the coordinator crashes. All participants vote to commit, and C commits; at least one participant receives Global-Commit, so the others can learn the outcome from it.

51 Two-Phase Commit (1.96)
[Timeline diagram: P1 and P3 vote YES, P2 votes NO; C stores and sends Global-Abort, then crashes; P1 times out, asks C, then asks P3, which knows Global-Abort.]
Behavior of 2PC when the coordinator crashes. The coordinator sends Global-Abort; the result is known because a Global-Abort message was received.

52 Two-Phase Commit (1.97)
[Timeline diagram: P1 and P3 vote YES, P2 votes NO; C crashes; P1 times out, asks C, then asks P2; since P2 voted NO, it knows the outcome must be Global-Abort.]
Behavior of 2PC when the coordinator crashes. One participant votes to abort, forcing a global abort; the result is known even if no Global-Abort message was received.

53 Two-Phase Commit (1.98)
[Timeline diagram: all participants store the transaction and send vote-commit; C crashes before any decision arrives; the participants time out, resend their vote-commits and ask each other, but all are in READY (???), so the protocol blocks until C recovers.]
Behavior of 2PC when the coordinator crashes. All participants vote to commit, but none hear from C.

54 Three-Phase Commit (1) The states of the coordinator and each participant satisfy the following two conditions: There is no single state from which it is possible to make a transition directly to either a COMMIT or an ABORT state. There is no state in which it is not possible to make a final decision, and from which a transition to a COMMIT state can be made.

55 Three-Phase Commit (2)
Observation: no node can be in INIT if any other node is in PRECOMMIT.
If a participant is stuck in READY or PRECOMMIT, it contacts the others:
if any is in INIT or ABORT, then ABORT;
if a majority is in PRECOMMIT, then COMMIT;
else ABORT.
Figure 8-22. (a) The finite state machine for the coordinator in 3PC. (b) The finite state machine for a participant.
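The recovery rule on this slide can be written down directly (a sketch; it assumes the blocked participant can learn the states of the processes it reaches):

```python
def three_pc_recover(states):
    # A participant stuck in READY or PRECOMMIT consults the others.
    if any(s in ("INIT", "ABORT") for s in states):
        return "ABORT"   # someone hasn't voted / has aborted
    if sum(s == "PRECOMMIT" for s in states) > len(states) / 2:
        return "COMMIT"  # a majority has precommitted
    return "ABORT"

print(three_pc_recover(["PRECOMMIT", "PRECOMMIT", "READY"]))  # COMMIT
print(three_pc_recover(["READY", "INIT", "READY"]))           # ABORT
```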

56 Stable Storage
Want persistent and reliable storage (more than just a hard disk): an intentions list for 2PC and 3PC; checkpoints and message logging for more general recovery.

57 Recovery – Stable Storage
Write = {write hd0, read hd0, check; then write hd1, read hd1, check}.
Read = {read hd0 & check; read hd1 & check; then compare}.
If the value a on hd0 differs from the value a' on hd1, which one should be used? Always use a' (if its checksum is OK) – hd0 is always written first!
Figure 8-23. (a) Stable storage. (b) Crash after drive 1 is updated. (c) Bad spot.
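The read rule can be sketched with CRC checksums standing in for the per-block checksum (illustrative only):

```python
import zlib

def stable_read(hd0, hd1):
    # Each copy is a (block, checksum) pair; hd0 is always written first.
    # If both checksums are good but the blocks differ, the write to hd1
    # never completed -- following the slide's rule, use hd1's old value.
    (b0, c0), (b1, c1) = hd0, hd1
    ok0 = zlib.crc32(b0) == c0
    ok1 = zlib.crc32(b1) == c1
    if ok0 and ok1:
        return b1 if b0 != b1 else b0
    if ok1:
        return b1  # bad spot on hd0: repair from hd1
    if ok0:
        return b0  # bad spot on hd1: repair from hd0
    raise RuntimeError("both copies corrupt")

new, old = b"a", b"a'"
hd0 = (new, zlib.crc32(new))  # crash happened after hd0 was updated...
hd1 = (old, zlib.crc32(old))  # ...but before hd1 was
print(stable_read(hd0, hd1))  # b"a'" -- the update is treated as not done
```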

58 RAID Storage (1)
[Diagram: RAID levels 0 through 2; backup and parity drives are shown shaded. RAID 0: no redundancy, pipelined for speed.]

59 RAID Storage (2)
[Diagram: RAID levels 3 through 5; backup and parity drives are shown shaded. RAID 3: bit-level parity; RAID 4: strip-level parity; RAID 5: parity strips distributed across the drives.]

60 Figure 8-24. A recovery line.
Checkpointing. Figure 8-24. A recovery line. Distributed snapshot = a recording of a consistent global state – you can't have a received message that wasn't sent!

61 Checkpointing - Rollback
P2 crashes, rolls back to C23 – but this requires P1 to return to C14, which causes P2 to return to C22, causing P1 to return to C13. Now we are OK, and the system can resume.

62 Independent Checkpointing
Figure 8-25. The domino effect, a.k.a. cascading rollbacks. P2 crashes, rolls back to C23 – but this requires P1 to return to C13, which causes P2 to return to C22 (due to m), causing P1 to return to C12 (due to m'). Now P2 has to return to C21, then P1 to C11, ... until both are at the initial state!

63 Independent Checkpointing
Each process maintains dependency information: which of its checkpoint intervals depend on which checkpoint intervals of other processes. This dependency information is stored along with the checkpoint itself. When a process P rolls back to some checkpoint C, every other process has to roll back to a checkpoint that does not depend on any interval of P after the rollback point C. This may cause other processes (including P) to roll back further.
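The rollback propagation just described can be sketched as a fixpoint computation (a toy model; process and interval names are illustrative):

```python
def rollback(messages, start_proc, start_ckpt):
    # messages: (sender, send_interval, receiver, recv_interval);
    # interval k is the execution between checkpoints C_k and C_{k+1}.
    # Rolling a process back to checkpoint k discards intervals >= k,
    # "un-sending" its messages from those intervals; every receiver of
    # such a message must roll back to before it received it.
    limit = {start_proc: start_ckpt}  # proc -> checkpoint rolled back to
    changed = True
    while changed:
        changed = False
        for snd, si, rcv, ri in messages:
            if snd in limit and si >= limit[snd]:
                if limit.get(rcv, float("inf")) > ri:
                    limit[rcv] = ri
                    changed = True
    return limit

# A domino chain between P1 and P2:
messages = [
    ("P2", 2, "P1", 2),
    ("P1", 2, "P2", 1),
    ("P2", 1, "P1", 1),
    ("P1", 1, "P2", 0),
]
print(rollback(messages, "P2", 2))  # {'P2': 0, 'P1': 1} -- the cascade
```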

64 Coordinated Checkpointing
To avoid the complexity and cost of independent checkpointing with message logging, coordinated checkpointing is done: the processes take a distributed snapshot, coordinating their checkpoints so that there are no cascading dependencies.
The coordinator multicasts a CHECKPOINT-REQUEST message.
Each process takes a checkpoint, sends an ACK, and delays any outgoing application messages.
When the coordinator has received an ACK from each process, it multicasts a CHECKPOINT-DONE message.
When a process receives the CHECKPOINT-DONE message, it may resume sending application messages.

65 Message Logging (1) Checkpointing is expensive – writes to stable storage! Reduce number of checkpoints by message logging – if we replay all messages after a checkpoint, all processes can reach a globally consistent state without having to save that state in stable storage. Approach works well with piecewise deterministic computation – local operations between message sends/receives are considered deterministic – only message passing is random. Logged messages include sender, receiver, sequence # (for duplicate detection on replay), and maybe delivery order. A stable message is one that can not be lost (e.g., in stable stg). Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved

66 Message Logging (2) Each message m has set DEP(m) of processes that depend (directly or indirectly) on delivery of m. These are processes to which m is delivered, or to which a message m' is delivered that is causally dependent on delivery of m. COPY(m) is processes with copy (not yet in stable storage) of m. A process Q is an orphan process if there is a message m such that Q is in DEP(m), but every process in COPY(m) has crashed; Q depends on m, but there is no way to replay m. Avoid orphans by ... KILLING THEM!!! Well, at least make sure that if Q is in DEP(m), it is also in COPY(m). Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved

67 Message Logging (3) Pessimistic logging protocols:
At most one process depends on any nonstable message m.
A process delivering m is not allowed to send any messages until m has been written to stable storage.
Optimistic logging protocols:
Work is done after a crash occurs (hope this doesn't happen!).
If, for some message m, all processes in COPY(m) have crashed, then any orphan in DEP(m) is rolled back to a state in which it is no longer in DEP(m).
Must track dependencies – complicated.... Just use pessimistic logging!

68 Characterizing Message-Logging Schemes
Figure 8-26. Incorrect replay of messages after recovery, leading to an orphan process.

