Chapter 8 – Fault Tolerance, Section 8.5 Distributed Commit. Heta Desai, Dr. Yanqing Zhang. CSc 8320 - Advanced Operating Systems. October 14th, 2015.

1 Chapter 8 – Fault Tolerance, Section 8.5 Distributed Commit. Heta Desai, Dr. Yanqing Zhang. CSc 8320 - Advanced Operating Systems. October 14th, 2015

2 Today’s Presentation Outline
- Background and terms
- Problem Formulation
- Two-Phase Commit
- Three-Phase Commit
- Future work

3 Background
- What is the problem?
- Atomic multicasting is one instance of a more general problem known as distributed commit
- Why is it important?
- Atomic multicast ensures that:
  (1) The correct addressees of every message agree either to deliver it or not to deliver it
  (2) No two correct processes deliver any two messages in a different order

4 Distributed Commit
- Given a process group and an operation
- The operation might or might not be committable at all processes
- Either everybody eventually commits or everybody eventually aborts
- This includes servers that crash and come back to life
- Consistency: all nodes see the same data at the same time.
- Availability: node failures do not prevent survivors from continuing to operate.

5 Distributed Commit
- Can we not just do this with Virtual Synchrony?
- Coordinator multicasts vote request
- All processes respond to request
- Coordinator multicasts vote result
- COMMIT iff all vote COMMIT
- This handles some error cases
- But what if a participant B crashes after it votes COMMIT but before the COMMIT result is broadcast, and then comes back to life?
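The voting rule on this slide can be sketched in a few lines. This is a minimal illustration only: the function name `decide` and the representation of votes as a dictionary are assumptions made for the example, not part of the slides.

```python
from typing import Dict, Set


def decide(votes: Dict[str, str], expected: Set[str]) -> str:
    """Return the global decision for one voting round.

    COMMIT iff every expected process voted COMMIT. A missing vote
    (for example because the process crashed or timed out) counts
    as a reason to abort.
    """
    for process in expected:
        if votes.get(process) != "COMMIT":
            return "ABORT"
    return "COMMIT"


group = {"A", "B", "C"}
print(decide({"A": "COMMIT", "B": "COMMIT", "C": "COMMIT"}, group))
print(decide({"A": "COMMIT", "C": "COMMIT"}, group))  # B never answered
```

Note that this rule alone says nothing about a participant that voted COMMIT and then crashed before hearing the result, which is the case the last bullet raises.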

6 What can go wrong?

7 Two Phase Commit (2PC)

8 The finite state machine for the coordinator. The finite state machine for the participant
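The figures themselves are not part of this transcript. Assuming they show the usual 2PC state machines as given in Tanenbaum & Van Steen, the transitions could be written down as lookup tables like the ones below. The event names (`commit_requested`, `vote_request_ok`, and so on) are hypothetical labels chosen for this sketch.

```python
from typing import Dict, Tuple

Transition = Dict[Tuple[str, str], Tuple[str, str]]

# (current state, event) -> (next state, message sent)
COORDINATOR: Transition = {
    ("INIT", "commit_requested"): ("WAIT", "VOTE_REQUEST"),
    ("WAIT", "vote_abort"): ("ABORT", "GLOBAL_ABORT"),
    ("WAIT", "all_vote_commit"): ("COMMIT", "GLOBAL_COMMIT"),
}

PARTICIPANT: Transition = {
    ("INIT", "vote_request_ok"): ("READY", "VOTE_COMMIT"),
    ("INIT", "vote_request_refuse"): ("ABORT", "VOTE_ABORT"),
    ("READY", "GLOBAL_COMMIT"): ("COMMIT", "ACK"),
    ("READY", "GLOBAL_ABORT"): ("ABORT", "ACK"),
}


def step(fsm: Transition, state: str, event: str) -> Tuple[str, str]:
    """Return (next_state, message_sent); raises KeyError if the
    event is not legal in the given state."""
    return fsm[(state, event)]


state, msg = step(COORDINATOR, "INIT", "commit_requested")
print(state, msg)
```

Keeping the machines as data makes it easy to see which states are blocking: WAIT for the coordinator and READY for the participant are the only states that wait on a remote message before a final decision.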

9 Two Phase Commit (2PC)
- Failures
  - Crash and omission
  - Detect via timeouts
- Processes may recover
  - Need for logging states

10 Two Phase Commit (2PC) - Perspective
- Coordinator's view
  - Blocked in WAIT = a participant may have failed
- That participant might have voted ABORT, in which case a GLOBAL COMMIT would be wrong and irreversible
- So the coordinator must do a GLOBAL ABORT
- Participant's view
  - Blocked in READY = the coordinator may have failed
- Some participants may have already committed, so a blocked participant cannot simply abort on its own
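The asymmetry between the two views can be made concrete with a small timeout handler. This is a sketch under assumptions: the function `on_timeout` and the returned action names are invented for illustration.

```python
def on_timeout(role: str, state: str) -> str:
    """Return the action to take when a blocking state times out."""
    if role == "coordinator" and state == "WAIT":
        # A silent participant may have wanted to abort, so the only
        # safe decision is to abort globally.
        return "GLOBAL_ABORT"
    if role == "participant" and state == "READY":
        # Others may already have committed, so this participant
        # cannot decide alone and has to ask around or wait.
        return "ASK_OTHER_PARTICIPANTS"
    raise ValueError(f"{role} does not block in state {state}")


print(on_timeout("coordinator", "WAIT"))
print(on_timeout("participant", "READY"))
```

The coordinator can always resolve its own timeout, while the participant's timeout only starts a further round of communication.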

11 Two Phase Commit (2PC)

12

13 Actions taken by a participant P when residing in state READY and having contacted another participant Q
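The table for this slide is not included in the transcript. Assuming it matches the standard table in Tanenbaum & Van Steen, P's action depends only on the state Q reports, which can be sketched as follows. The names `ACTION_FOR_P` and `resolve` are hypothetical.

```python
from typing import Iterable

# State reported by Q -> action taken by P (which is blocked in READY).
ACTION_FOR_P = {
    "COMMIT": "COMMIT",          # the decision was commit, follow it
    "ABORT": "ABORT",            # the decision was abort, follow it
    "INIT": "ABORT",             # Q never voted, so commit is impossible
    "READY": "CONTACT_ANOTHER",  # Q is as uninformed as P
}


def resolve(peer_states: Iterable[str]) -> str:
    """Ask peers one by one; return COMMIT or ABORT as soon as one
    peer's state settles it, or BLOCK if every peer is in READY."""
    for q_state in peer_states:
        action = ACTION_FOR_P[q_state]
        if action != "CONTACT_ANOTHER":
            return action
    return "BLOCK"


print(resolve(["READY", "INIT"]))
print(resolve(["READY", "READY"]))
```

The `BLOCK` result is the case where all reachable participants are in READY and nobody can decide until the coordinator recovers.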

14 Two Phase Commit – Bad State: Yellow needs to wait for Blue or Green to come up again and inspect their log files.

15 Two Phase Commit (2PC)
- Two-Phase Commit has the problem that if the coordinator and one participant crash at a bad time, the entire system freezes until one of them is up again
- Getting a server up and running again typically involves human (a.k.a. very slow) intervention

16 Three Phase Commit
- Three-Phase Commit enhances Two-Phase Commit in that it is non-blocking in many more cases
- As long as the live participants can make a majority decision, they can continue on their own
- The majority is counted among all participants, not only the live ones
- If there are many participants, this makes it very unlikely that 3PC blocks
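The majority rule is easy to get wrong by counting only the survivors. A minimal sketch of the check, where `can_continue` is a hypothetical helper name:

```python
def can_continue(live: int, total: int) -> bool:
    """True if the live participants form a strict majority of ALL
    participants, not just of those currently reachable."""
    return 2 * live > total


print(can_continue(3, 5))  # 3 of 5 are alive
print(can_continue(2, 4))  # exactly half is not a majority
```

With 5 participants the group tolerates 2 crashes; with 4 it tolerates only 1, since 2 live nodes out of 4 are not a strict majority.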

17 Three Phase Commit
- The states of the coordinator and each participant satisfy the following two conditions:
- There is no single state from which it is possible to make a transition directly to either a COMMIT or an ABORT state.
- There is no state in which it is not possible to make a final decision, and from which a transition to a COMMIT state can be made.
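The first condition can be checked mechanically on a transition graph. The sketch below assumes the participant state machines from Tanenbaum & Van Steen (the slides' figures are not in this transcript); the dictionaries and the function `states_violating_condition_1` are illustrative names.

```python
from typing import Dict, List

# Participant state -> states reachable in one transition.
TWO_PC: Dict[str, List[str]] = {
    "INIT": ["READY", "ABORT"],
    "READY": ["COMMIT", "ABORT"],
}

THREE_PC: Dict[str, List[str]] = {
    "INIT": ["READY", "ABORT"],
    "READY": ["PRECOMMIT", "ABORT"],
    "PRECOMMIT": ["COMMIT"],
}


def states_violating_condition_1(graph: Dict[str, List[str]]) -> List[str]:
    """Return the states that can move directly to both COMMIT and ABORT."""
    return sorted(
        state
        for state, successors in graph.items()
        if {"COMMIT", "ABORT"} <= set(successors)
    )


print(states_violating_condition_1(TWO_PC))
print(states_violating_condition_1(THREE_PC))
```

In 2PC the READY state leads directly to both outcomes, which is exactly what makes a participant stuck there unable to decide. The extra PRECOMMIT state in 3PC separates the two paths.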

18 Three Phase Commit

19
- So what if a failure occurs?
  - Need to be able to recover to a correct state
- Backward recovery
  - Bring the system backward to a correct, previous state
  - Restore
- Forward recovery
  - Bring the system forward to a correct, new state
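Backward recovery is commonly implemented with checkpoints. The following is a minimal in-memory sketch, assuming the state fits in a dictionary; the class `Checkpointed` and its method names are made up for this example, and a real system would write checkpoints to stable storage.

```python
import copy
from typing import Any, Dict


class Checkpointed:
    """Holds a state and the last saved copy of it."""

    def __init__(self, state: Dict[str, Any]) -> None:
        self.state = state
        self._saved = copy.deepcopy(state)

    def checkpoint(self) -> None:
        # Record the current state as the last known-correct one.
        self._saved = copy.deepcopy(self.state)

    def rollback(self) -> None:
        # Backward recovery: restore the last checkpoint.
        self.state = copy.deepcopy(self._saved)


account = Checkpointed({"balance": 100})
account.state["balance"] = -999   # an erroneous update
account.rollback()
print(account.state)
```

Forward recovery has no equally generic sketch, because it requires knowing in advance which errors can occur and how to construct a correct new state from them.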

20 References
1. Mikito Takada. “Distributed Systems for fun and profit”. Online ebook. http://book.mixu.net/distsys/ebook.html#intro
2. Chapter Slides. http://www.comp.nus.edu.sg/~tankl/cs5225/2008/commit2.pdf
3. Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. 0-13-239227-5
4. Part 2 slides. https://services.brics.dk/java/courseadmin/dDist/documents/getDocument/Fault+Tolerance +(2).pdf?d=113839
5. Genuine Atomic Multicast, in Proc. 11th Int. Workshop on Distributed Algorithms (WDAG’97), Lecture Notes in Computer Science, vol. 1320, Springer, Berlin, pp. 141–154.

