Published by Howard Alexander. Modified over 9 years ago.
1
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Chapter 8 Fault Tolerance (2)
DISTRIBUTED SYSTEMS (dDist)
2
Distributed Commit (1/2)
Given a process group and an operation
–The operation might or might not be committable at all processes
Either everybody commits or everybody aborts
–Consistency, validity, termination
3
Distributed Commit (2/2)
Can we not just do this with Virtual Synchrony?
–Coordinator multicasts vote request
–All processes respond to request
–Coordinator multicasts vote result: COMMIT iff all vote COMMIT
This handles some error cases
But what if a participant B crashes after it votes COMMIT but before the COMMIT result is broadcast, and then comes back to life?
4
Two-Phase Commit
1) Commit → (request arrives at the coordinator)
2) Vote-request → (coordinator to participants)
3) Vote-commit ← (participants to coordinator)
4) Global-commit → (coordinator to participants)
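The message sequence above can be sketched as a single-process simulation. This is a minimal sketch with no failures modeled; the function name `two_phase_commit` and the idea of representing each participant as a callable that returns its vote are assumptions made for illustration, not part of the slides.

```python
from typing import Callable, Dict

VOTE_COMMIT = "VOTE_COMMIT"
VOTE_ABORT = "VOTE_ABORT"
GLOBAL_COMMIT = "GLOBAL_COMMIT"
GLOBAL_ABORT = "GLOBAL_ABORT"


def two_phase_commit(participants: Dict[str, Callable[[], str]]) -> str:
    """Run one failure-free round of 2PC and return the global decision.

    `participants` maps a (hypothetical) participant name to a function
    that returns that participant's vote when it gets the vote request.
    """
    # Phase 1: the coordinator sends the vote request and collects all votes.
    votes = {name: vote() for name, vote in participants.items()}

    # Phase 2: commit only if every participant voted to commit.
    if all(v == VOTE_COMMIT for v in votes.values()):
        return GLOBAL_COMMIT
    return GLOBAL_ABORT
```

A single abort vote is enough to abort the whole group, which is what gives the "everybody commits or everybody aborts" property in the failure-free case.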
5
Two-Phase Commit
Figure 8-18. (a) The finite state machine for the coordinator in 2PC. (b) The finite state machine for a participant.
Transition labels: input event / output event
6
Two-Phase Commit
2PC detects crashes via timeouts
2PC handles crashes by logging state to permanent storage, turning crash errors into reset errors
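Logging state to permanent storage could look roughly like the sketch below. The class `StateLog`, its methods, and the one-state-per-line file format are hypothetical choices for illustration; the only point is that a state is forced to disk before the process acts on it, so a restarted process can read back where it was.

```python
import os
import tempfile


class StateLog:
    """Append-only log of protocol states kept on disk (hypothetical helper)."""

    def __init__(self, path: str) -> None:
        self.path = path

    def write(self, state: str) -> None:
        # Force the record to stable storage before the caller sends any
        # message that depends on this state.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(state + "\n")
            f.flush()
            os.fsync(f.fileno())

    def last_state(self, default: str = "INIT") -> str:
        # After a restart, the last logged state tells the process where
        # it was when it crashed.
        if not os.path.exists(self.path):
            return default
        with open(self.path, encoding="utf-8") as f:
            lines = [line.strip() for line in f if line.strip()]
        return lines[-1] if lines else default


def demo_recovery() -> str:
    """Simulate a participant that crashes after logging READY."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "participant.log")
        log = StateLog(path)
        log.write("INIT")
        log.write("READY")  # logged before the commit vote is sent
        # Simulated crash: the in-memory object is lost, the file survives.
        recovered = StateLog(path)
        return recovered.last_state()
```

Calling `demo_recovery()` returns `"READY"`: the restarted participant knows it had already voted and must find out the global decision rather than start over.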
7
Coordinator Perspective
Blocks in WAIT
–Participant may have failed
–That participant might vote ABORT, in which case a GLOBAL COMMIT would be wrong and irreversible
–So, on TIMEOUT, must do a GLOBAL ABORT
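The coordinator's behaviour in WAIT can be sketched with a queue and a deadline. This is only an illustration under assumptions: the function name `coordinator_wait`, the `(sender, vote)` message tuples, and the use of `queue.Queue` as the inbox are all hypothetical, and anything other than an explicit commit vote is treated as an abort.

```python
import queue
import time
from typing import Dict, Set


def coordinator_wait(inbox: queue.Queue, participants: Set[str],
                     timeout_s: float) -> str:
    """Collect votes in state WAIT; fall back to GLOBAL_ABORT on timeout."""
    votes: Dict[str, str] = {}
    deadline = time.monotonic() + timeout_s
    while set(votes) != participants:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # A silent participant may have failed, or may have voted abort.
            return "GLOBAL_ABORT"
        try:
            sender, vote = inbox.get(timeout=remaining)
        except queue.Empty:
            return "GLOBAL_ABORT"
        if sender not in participants:
            continue  # ignore messages from unknown senders
        if vote != "VOTE_COMMIT":
            return "GLOBAL_ABORT"
        votes[sender] = vote
    # Every participant answered, and every answer was a commit vote.
    return "GLOBAL_COMMIT"
```

Aborting on timeout is the safe choice here because no participant can have committed before the coordinator sends a global commit.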
8
Coordinator Perspective
Figure 8-20. Outline of the steps taken by the coordinator in a two-phase commit protocol. ...
10
Participant Perspective
Blocks in READY
–Coordinator may have failed
What to do?
–Some participants may already have committed…
–Perhaps another participant knows what to do…?
11
Participant Perspective
Figure 8-19. Actions taken by a participant P when residing in state READY and having contacted another participant Q.
After timeout allowing all messages in transit to arrive:
–Q in COMMIT: we know that the coordinator managed to start the commit
–Q in ABORT: at least one participant aborted and the coordinator noticed
–Q in INIT: Q did not even receive the vote-request, so no one has committed yet
What if all are in READY?
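The reasoning above amounts to a lookup on Q's state. The sketch below assumes the mapping from the textbook's Figure 8-19 (Q in COMMIT: P commits; Q in ABORT or INIT: P aborts; Q in READY: P contacts another participant); the function names `decide_from_peer` and `resolve` are hypothetical.

```python
from typing import Iterable, Optional


def decide_from_peer(q_state: str) -> Optional[str]:
    """Decision for P (in READY) after learning Q's state, or None if unknown."""
    if q_state == "COMMIT":
        return "COMMIT"  # the coordinator did reach a global commit
    if q_state in ("ABORT", "INIT"):
        return "ABORT"   # nobody can have committed yet
    if q_state == "READY":
        return None      # Q is as uninformed as P; ask someone else
    raise ValueError(f"unknown state: {q_state!r}")


def resolve(peer_states: Iterable[str]) -> Optional[str]:
    """Ask peers one by one; None means every peer is READY, so P blocks."""
    for state in peer_states:
        decision = decide_from_peer(state)
        if decision is not None:
            return decision
    return None
```

For example, `resolve(["READY", "INIT"])` returns `"ABORT"`, while `resolve(["READY", "READY"])` returns `None`, which is the blocking case raised by the last question.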
12
Two-Phase Commit
Figure 8-21. (a) The steps taken by a participant process in 2PC.
13
All READY (1/2)
Why do we block when all live participants are in the READY state?
14
All READY (2/2)
Same view, but different decisions, so Yellow needs to wait for Blue or Green to come up again and inspect their log files!
15
Two-Phase Commit
Two-Phase Commit has the problem that if the coordinator and one participant crash at a bad time, the entire system freezes until one of them is up again
Getting a server up and running again typically involves human (a.k.a. very slow) intervention
16
Three-Phase Commit
Three-Phase Commit enhances Two-Phase Commit in that it is non-blocking in many more cases
As long as the live participants can make a majority decision they can continue on their own
If there are many participants, this makes it very unlikely that 3PC blocks
17
Three-Phase Commit
Figure 8-22. (a) The finite state machine for the coordinator in 3PC. (b) The finite state machine for a participant.
19
On timeout:
IF anyone in ABORT: ABORT
ELIF anyone in COMMIT: COMMIT
ELIF anyone in INIT: ABORT
ELSE: elect new coordinator among the live
New Coordinator: Go to WAIT and from there go to ABORT or PRECOMMIT
–ABORT: If a majority of participants are in READY
–PRECOMMIT: If a majority are in PRECOMMIT
–If no majority, then block
20
On timeout:
IF anyone in ABORT: ABORT
ELIF anyone in COMMIT: COMMIT
ELIF anyone in INIT: ABORT
ELSE: elect new coordinator among the live
New Coordinator: Go to WAIT and from there go to ABORT or PRECOMMIT
–ABORT: If a majority of participants are in READY
–PRECOMMIT: If a majority are in PRECOMMIT
–If no majority, then block
If anyone is in PRECOMMIT, then the original coordinator's vote is set to be PRECOMMIT, as the original coordinator must be in PRECOMMIT
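The timeout rule above can be written as one function. This is a sketch under several assumptions that are not spelled out on the slide: `live_states` holds only the live processes and excludes the crashed original coordinator, `total_processes` counts every process (dead or live, coordinator included), a majority means more than half of `total_processes`, and the note about the original coordinator is read as adding one extra PRECOMMIT vote. The names `on_timeout` and `"BLOCK"` are hypothetical.

```python
from collections import Counter
from typing import Dict


def on_timeout(live_states: Dict[str, str], total_processes: int) -> str:
    """Return ABORT, COMMIT, PRECOMMIT, or BLOCK for the live processes."""
    states = set(live_states.values())
    if "ABORT" in states:
        return "ABORT"
    if "COMMIT" in states:
        return "COMMIT"
    if "INIT" in states:
        return "ABORT"

    # Everyone alive is in READY or PRECOMMIT: a new coordinator counts votes.
    counts = Counter(live_states.values())
    ready = counts["READY"]
    precommit = counts["PRECOMMIT"]
    if precommit > 0:
        # Assumption: the crashed original coordinator counts as PRECOMMIT,
        # since it must have reached PRECOMMIT before any participant did.
        precommit += 1

    if 2 * ready > total_processes:
        return "ABORT"
    if 2 * precommit > total_processes:
        return "PRECOMMIT"
    return "BLOCK"
```

With five processes in total, three live participants in READY give `"ABORT"`, two in PRECOMMIT plus one in READY give `"PRECOMMIT"` (the coordinator's vote tips the count), and one of each gives `"BLOCK"`.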
22
More Non-Blocking
It follows from the decision rules that the live agents can always make decisions on their own unless no true majority for READY or PRECOMMIT can be found
True majority: Majority among all processes, both dead and live
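The difference between a true majority and a majority among the live processes is easy to miss, so here is a small sketch of both tests. The function names are hypothetical, and "majority" is assumed to mean strictly more than half.

```python
def is_true_majority(count: int, total_processes: int) -> bool:
    """True if `count` is more than half of ALL processes, dead or live."""
    return 2 * count > total_processes


def is_live_majority(count: int, live_processes: int) -> bool:
    """Weaker test, shown only for contrast: majority among live processes."""
    return 2 * count > live_processes
```

With 7 processes of which 3 are live, 2 processes in READY are a majority among the live ones but not a true majority, so the live processes would have to block.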
23
Correctness (1/4)
Let P and Q be any two processes which both acted as coordinator at some point
THEOREM It can never happen that P is in ABORT and Q is in COMMIT
Proof:
1. When P went to ABORT there was a true majority in READY
2. When Q went to COMMIT there was a true majority in PRECOMMIT
3. These two configurations are mutually exclusive
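Step 3 can be spelled out as a counting argument for a single point in time. The symbols are assumptions introduced here, not taken from the slide: $n$ is the total number of processes, $R$ is the set of processes in READY, $S$ is the set in PRECOMMIT, and every process is in exactly one state.

```latex
\begin{align*}
|R| > \tfrac{n}{2} \ \text{and}\ |S| > \tfrac{n}{2}
  &\implies |R| + |S| > n, \\
R \cap S = \emptyset
  &\implies |R| + |S| = |R \cup S| \le n.
\end{align*}
```

The two conclusions contradict each other, so a true majority in READY and a true majority in PRECOMMIT cannot exist at the same moment. This covers only the mutual-exclusion step; it says nothing about configurations observed at different times.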
24
Correctness (2/4)
By construction: If there is a process in ABORT, then there is a coordinator in ABORT
25
Correctness (3/4)
By construction: If there is a process in COMMIT, then there is a coordinator in COMMIT
26
Correctness (4/4)
Let P and Q be any two processes
COROLLARY It can never happen that P is in ABORT and Q is in COMMIT
27
Summary
Looked at Distributed Commit
–2PC: blocking, has a bad state
–3PC: less blocking, but not widely used in practice