1 More on Distributed Coordination. 2 Who’s in charge? Let’s have an Election. Many algorithms require a coordinator. What happens when the coordinator.

Slides:



Advertisements
Similar presentations
6.852: Distributed Algorithms Spring, 2008 Class 7.
Advertisements

CS542: Topics in Distributed Systems Distributed Transactions and Two Phase Commit Protocol.
(c) Oded Shmueli Distributed Recovery, Lecture 7 (BHG, Chap.7)
CSC 4320/6320 Operating Systems Lecture 13 Distributed Coordination
Network Protocols Mark Stanovich Operating Systems COP 4610.
COS 461 Fall 1997 Transaction Processing u normal systems lose their state when they crash u many applications need better behavior u today’s topic: how.
Consensus Algorithms Willem Visser RW334. Why do we need consensus? Distributed Databases – Need to know others committed/aborted a transaction to avoid.
Exercises for Chapter 17: Distributed Transactions
Computer Science Lecture 18, page 1 CS677: Distributed OS Last Class: Fault Tolerance Basic concepts and failure models Failure masking using redundancy.
OCT Distributed Transaction1 Lecture 13: Distributed Transactions Notes adapted from Tanenbaum’s “Distributed Systems Principles and Paradigms”
Systems of Distributed Systems Module 2 -Distributed algorithms Teaching unit 3 – Advanced algorithms Ernesto Damiani University of Bozen Lesson 6 – Two.
CS 603 Distributed Transactions February 18, 2002.
Computer Science Lecture 17, page 1 CS677: Distributed OS Last Class: Fault Tolerance Basic concepts and failure models Failure masking using redundancy.
Non-blocking Atomic Commitment Aaron Kaminsky Presenting Chapter 6 of Distributed Systems, 2nd edition, 1993, ed. Mullender.
Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database.
Distributed Systems CS Fault Tolerance- Part III Lecture 15, Oct 26, 2011 Majd F. Sakr, Mohammad Hammoud andVinay Kolar 1.
Transactions Distributed Systems Lecture 15: Transactions and Concurrency Control Transaction Notes mainly from Chapter 13 of Coulouris.
Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 5: Synchronous Uniform.
Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database.
SynchronizationCS-4513, D-Term Synchronization in Distributed Systems CS-4513 D-Term 2007 (Slides include materials from Operating System Concepts,
1 CS 194: Distributed Systems Distributed Commit, Recovery Scott Shenker and Ion Stoica Computer Science Division Department of Electrical Engineering.
Synchronization in Distributed Systems CS-4513 D-term Synchronization in Distributed Systems CS-4513 Distributed Computing Systems (Slides include.
CS 603 Three-Phase Commit February 22, Centralized vs. Decentralized Protocols What if we don’t want a coordinator? Decentralized: –Each site broadcasts.
©Silberschatz, Korth and Sudarshan19.1Database System Concepts Distributed Transactions Transaction may access data at several sites. Each site has a local.
CS 425 / ECE 428 Distributed Systems Fall 2014 Indranil Gupta (Indy) Lecture 19: Paxos All slides © IG.
1 ICS 214B: Transaction Processing and Distributed Data Management Distributed Database Systems.
Distributed Commit. Example Consider a chain of stores and suppose a manager – wants to query all the stores, – find the inventory of toothbrushes at.
CMPT Dr. Alexandra Fedorova Lecture XI: Distributed Transactions.
Distributed Systems Fall 2009 Distributed transactions.
CMPT Dr. Alexandra Fedorova Lecture XI: Distributed Transactions.
Distributed Deadlocks and Transaction Recovery.
Distributed Commit Dr. Yingwu Zhu. Failures in a distributed system Consistency requires agreement among multiple servers – Is transaction X committed?
CS162 Section Lecture 10 Slides based from Lecture and
Distributed Transactions March 15, Transactions What is a Distributed Transaction?  A transaction that involves more than one server  Network.
Chapter 19 Recovery and Fault Tolerance Copyright © 2008.
CSE 486/586, Spring 2013 CSE 486/586 Distributed Systems Replication with View Synchronous Group Communication Steve Ko Computer Sciences and Engineering.
Distributed Transactions Chapter 13
Consensus and Its Impossibility in Asynchronous Systems.
Distributed Systems CS Fault Tolerance- Part III Lecture 19, Nov 25, 2013 Mohammad Hammoud 1.
1 8.3 Reliable Client-Server Communication So far: Concentrated on process resilience (by means of process groups). What about reliable communication channels?
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved DISTRIBUTED SYSTEMS.
CSE 486/586 CSE 486/586 Distributed Systems Concurrency Control Steve Ko Computer Sciences and Engineering University at Buffalo.
CSE 486/586, Spring 2013 CSE 486/586 Distributed Systems Concurrency Control Steve Ko Computer Sciences and Engineering University at Buffalo.
Operating Systems Distributed Coordination. Topics –Event Ordering –Mutual Exclusion –Atomicity –Concurrency Control Topics –Event Ordering –Mutual Exclusion.
Distributed Transaction Management, Fall 2002Lecture Distributed Commit Protocols Jyrki Nummenmaa
Fault Tolerance CSCI 4780/6780. Distributed Commit Commit – Making an operation permanent Transactions in databases One phase commit does not work !!!
University of Tampere, CS Department Distributed Commit.
1 Distributed Coordination. 2 Topics Event Ordering Mutual Exclusion Atomicity of Transactions– Two Phase Commit (2PC) Deadlocks  Avoidance/Prevention.
Commit Algorithms Hamid Al-Hamadi CS 5204 November 17, 2009.
Distributed Transactions Chapter – Vidya Satyanarayanan.
Committed:Effects are installed to the database. Aborted:Does not execute to completion and any partial effects on database are erased. Consistent state:
Consensus and leader election Landon Cox February 6, 2015.
Two-Phase Commit Brad Karp UCL Computer Science CS GZ03 / M th October, 2008.
Lampson and Lomet’s Paper: A New Presumed Commit Optimization for Two Phase Commit Doug Cha COEN 317 – SCU Spring 05.
Fault Tolerance Chapter 7.
Revisiting failure detectors Some of you asked questions about implementing consensus using S - how does it differ from reaching consensus using P. Here.
Introduction to Distributed Databases Yiwei Wu. Introduction A distributed database is a database in which portions of the database are stored on multiple.
CS 162 Section 10 Two-phase commit Fault-tolerant computing.
Fault Tolerance Chapter 7. Goal An important goal in distributed systems design is to construct the system in such a way that it can automatically recover.
Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database.
Network Protocols Andy Wang Operating Systems COP 4610 / CGS 5765.
Distributed Databases – Advanced Concepts Chapter 25 in Textbook.
Sarah Diesburg Operating Systems COP 4610
Commit Protocols CS60002: Distributed Systems
Outline Announcements Fault Tolerance.
Distributed Systems CS
Andy Wang Operating Systems COP 4610 / CGS 5765
Distributed Transactions
Distributed Databases Recovery
Last Class: Fault Tolerance
Presentation transcript:

1 More on Distributed Coordination

2 Who’s in charge? Let’s have an Election. Many algorithms require a coordinator. What happens when the coordinator dies (or at startup)? Last time: Failure Detection Election Algorithms Today Global Agreement Atomicity of Transactions– Two Phase Commit (2PC)

3 Generals coordinate with link failures Problem:  Two generals are on two separate mountains  Can communicate only via messengers; but messengers can get lost or captured by enemy  Goal is to coordinate their attack  If attack at different times  they loose !  If attack at the same time  they win ! A B Is 11AM OK? 11AM sounds good! 11AM it is, then. Does A know that this message was delivered? Even if all previous messages get through, the generals still can’t coordinate their actions, since the last message could be lost, always requiring another confirmation message. Even if all previous messages get through, the generals still can’t coordinate their actions, since the last message could be lost, always requiring another confirmation message.

4 Distributed Transactions -- The Problem How can we atomically update state on two different systems?  Harder than on a single CPU Examples:  Atomically move a directory from server A to server B  Atomically move $100 from one bank to another Issues:  Messages exchanged by systems can be lost  Systems can crash Question:  Can one use messages and retries over an unreliable network to synchronize the actions of two machines? Distributed consensus in the presence of link failures. Answer:  Remarkably, NO !!  Even if you assume that all messages do get through !

5 Two-phase Commit Can’t solve the General’s paradox; solve a related, but simpler problem Problem: Distributed transaction  Two machines agree to do something or not do it, atomically  But, do not perform the actions at the same time !! Example:  Transfer $100 from one bank to another  Need to guarantee that both banks agree on what happened  …but the two events do not need to be perfectly synchronized Key concept behind two-phase commit protocols:  Use logs on each machine to commit a transaction

6 Two-phase Commit Protocol: Phase 1 Phase 1: Coordinator requests a transaction  Coordinator sends a REQUEST to all participants  Example: C  S1 : “delete foo from /” C  S2 : “add foo to /”  On receiving request, participants perform these actions:  Test transaction, if valid record it in local log  Write VOTE_COMMIT or VOTE_ABORT to local log  Send VOTE_COMMIT or VOTE_ABORT to coordinator Failure caseSuccess case S1 decides OK; writes “rm /foo; VOTE_COMMIT” to log; and sends VOTE_COMMIT S2 has no space on disk; so rejects the transaction; writes and sends VOTE_ABORT S1 and S2 decide OK and write updates and VOTE_COMMIT to log; send VOTE_COMMIT

7 Two-phase Commit Protocol: Phase 2 Phase 2: Coordinator commits or aborts the transaction  Coordinator decides  Case 1: coordinator receives VOTE_ABORT or time-outs  coordinator writes GLOBAL_ABORT to log and sends GLOBAL_ABORT to participants  Case 2: Coordinator receives VOTE_COMMIT from all participants  coordinator writes GLOBAL_COMMIT to log and sends GLOBAL_COMMIT to participants  Participants commit or abort the transaction  On receiving a decision, participants write GLOBAL_COMMIT or GLOBAL_ABORT to log

8 Simple Example

9 Does Two-phase Commit work? Yes … can be proved formally Consider the following cases:  What if participant crashes during the request phase before writing anything to log?  On recovery, participant does nothing; coordinator will timeout and abort transaction; and retry!  What if coordinator crashes during phase 2?  Case 1: Log does not contain GLOBAL_*  send GLOBAL_ABORT to participants and retry  Case 2: Log contains GLOBAL_ABORT  send GLOBAL_ABORT to participants  Case 3: Log contains GLOBAL_COMMIT  send GLOBAL_COMMIT to participants

10 Limitations of Two-phase Commit What if the coordinator crashes during Phase 2 (before sending the decision) and does not wake up?  All participants block forever! (They may hold resources – eg. locks!) Possible solution:  Participant, on timing out, can make progress by asking other participants (if it knows their identity)  If any participant had heard GLOBAL_ABORT  abort  If any participant sent VOTE_ABORT  abort  If all participants sent VOTE_COMMIT but no one has heard GLOBAL_*  can we commit? NO – the coordinator could have written GLOBAL_ABORT to its log (e.g., due to local error or a timeout)

11 Two-phase Commit: Summary When you need to coordinate a transaction across multiple machines, …  Don’t hack together a solution!  Use two-phase commit  For two-phase commit, identify circumstances where indefinite blocking can occur  Decide if the risk is acceptable If two-phase commit is not adequate, then …  Use advanced distributed coordination techniques  To learn more about such protocols, take a distributed computing course !!