
1 Distributed Coordination

2 Topics
- Event Ordering
- Mutual Exclusion
- Atomicity of Transactions – Two-Phase Commit (2PC)
- Deadlocks – Avoidance/Prevention, Detection
- Elections – "The King has died. Long live the King!"

3 Event Ordering
Coordination of requests (especially in a fair way) requires events (requests) to be ordered.
Stand-alone systems:
- Shared clock / memory
- Use a timestamp to determine ordering
Distributed systems:
- No global clock
- Each clock runs at a different speed
How do we order events running on physically separated systems?
Messages (the only mechanism for communicating between systems) can only be received after they have been sent.

4 Event Ordering: The Happened-Before Relation
1. If A and B are events in the same process, and A executed before B, then A → B.
2. If A is the sending of a message and B is the receipt of that message, then A → B.
3. If A → B and B → C, then A → C.

5 Happened-Before Relationship
[Space-time diagram: three processes P, Q, R with events p0–p4, q0–q5, and r0–r3, connected by messages.]
Ordered events
- p1 precedes ___
- q4 precedes ___
- q2 precedes ___
- p0 precedes ___
Unordered (concurrent) events
- q0 is concurrent with ___
- q2 is concurrent with ___
- q4 is concurrent with ___
- q5 is concurrent with ___

6 Happened-Before and Total Event Ordering
Define a notion of event ordering such that:
1. If A → B, then A precedes B.
2. If A and B are concurrent events, then nothing can be said about the ordering of A and B.
Solution (Lamport logical clocks):
1. Each processor i maintains a logical clock LCi.
2. When an event occurs locally, increment LCi.
3. When processor X sends a message to Y, it also sends LCx in the message.
4. When Y receives this message, if LCy < (LCx + 1), then set LCy = LCx + 1.
Note: If the "time" of A precedes the "time" of B, then ???
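A minimal sketch of these logical-clock rules in Python; the class name, method names, and message format are illustrative, not part of the slides.

```python
class LamportClock:
    """Logical clock implementing the increment/merge rules from the slide."""

    def __init__(self):
        self.time = 0

    def local_event(self):
        # Rule 2: any local event increments the clock.
        self.time += 1
        return self.time

    def send(self):
        # Rule 3: a send is a local event; the timestamp travels with the message.
        self.time += 1
        return self.time          # attach this value to the outgoing message

    def receive(self, msg_time):
        # Rule 4: on receipt, advance past the sender's timestamp if it is ahead.
        if self.time < msg_time + 1:
            self.time = msg_time + 1
        return self.time


# Example: X sends to Y; Y's clock jumps past X's timestamp.
x, y = LamportClock(), LamportClock()
ts = x.send()          # x.time == 1
y.receive(ts)          # y.time becomes 2, so the send logically precedes the receive
```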

7 If A → B and C → B, does A → C?
1. Yes
2. No

8 Mutual Exclusion – Centralized Approach
One known process in the system coordinates mutual exclusion.
Client:
- Send a request to the coordinator and wait for a reply.
- When the reply comes back, enter the critical section.
- When finished, send a release to the coordinator.
Coordinator:
- On receiving a request: if the mutex is available, immediately send a reply (and mark the mutex busy with the client's id); otherwise, queue the request.
- On receiving a release from the current holder: if the queue is non-empty, remove the next requestor and send it a reply; otherwise, mark the mutex available.
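A minimal sketch of the coordinator's logic, assuming a generic send(dest, msg) primitive; the function and message names are illustrative.

```python
from collections import deque

# Hypothetical transport primitive; in a real system this would be a socket or RPC call.
def send(dest, msg):
    print(f"-> {dest}: {msg}")

class MutexCoordinator:
    """Centralized mutual-exclusion coordinator as described on the slide."""

    def __init__(self):
        self.holder = None          # client id currently in the critical section
        self.waiting = deque()      # queued requestors

    def on_request(self, client):
        if self.holder is None:
            self.holder = client
            send(client, "reply")   # client may enter its critical section
        else:
            self.waiting.append(client)

    def on_release(self, client):
        assert client == self.holder
        if self.waiting:
            self.holder = self.waiting.popleft()
            send(self.holder, "reply")
        else:
            self.holder = None

# Usage mirroring slide 9: P1 requests, P2 requests, P1 releases, P2 gets the reply.
c = MutexCoordinator()
c.on_request("P1"); c.on_request("P2"); c.on_release("P1")
```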

9 Example: Centralized Approach
[Message-sequence diagram: P1 and P2 send requests to the coordinator; the coordinator replies to one at a time; P1 enters its critical section and sends a release, after which the coordinator replies to P2.]
Advantages? Disadvantages?

10 Mutual Exclusion – Decentralized Approach
Requestor K:
- Generate a timestamp TSk.
- Send request (K, TSk) to all processes.
- Wait for a reply from all processes.
- Enter the critical section.
Process K receives a request from R with timestamp TSr:
- If K is already in its critical section, defer the reply.
- Else, if K does not want to enter, send a reply.
- Else (K also wants to enter): if TSr < TSk, send a reply to R; otherwise defer the reply.
On leaving the critical section, send a reply to all deferred requests.
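A sketch of one process's requestor/receiver logic above (essentially the Ricart–Agrawala scheme), assuming hypothetical send/clock helpers and that timestamp ties are broken by process id.

```python
class DecentralizedMutex:
    """One process's view of the decentralized mutual-exclusion protocol on the slide."""

    def __init__(self, my_id, peers, clock, send):
        self.my_id, self.peers = my_id, peers
        self.clock = clock            # Lamport clock (see the earlier sketch)
        self.send = send              # hypothetical transport: send(dest, msg)
        self.requesting = False
        self.in_cs = False
        self.my_ts = None
        self.deferred = []            # peers whose replies we are holding back
        self.replies_pending = set()

    def request_cs(self):
        self.requesting = True
        self.my_ts = self.clock.send()
        self.replies_pending = set(self.peers)
        for p in self.peers:
            self.send(p, ("request", self.my_id, self.my_ts))

    def on_request(self, sender, ts):
        self.clock.receive(ts)
        # Defer if we are in the CS, or if we are requesting with an earlier (ts, id).
        mine_wins = self.requesting and (self.my_ts, self.my_id) < (ts, sender)
        if self.in_cs or mine_wins:
            self.deferred.append(sender)
        else:
            self.send(sender, ("reply", self.my_id))

    def on_reply(self, sender):
        self.replies_pending.discard(sender)
        if self.requesting and not self.replies_pending:
            self.in_cs = True         # heard from everyone: enter the critical section

    def release_cs(self):
        self.in_cs = self.requesting = False
        for p in self.deferred:       # now answer everyone we made wait
            self.send(p, ("reply", self.my_id))
        self.deferred = []
```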

11 Example – Decentralized Approach
[Message diagram: P1 requests with timestamp 1 and P2 requests with timestamp 2; P3 is not requesting and replies to both; P1's earlier timestamp wins the replies, so it enters its critical section first and defers its reply to P2 until it leaves.]
Advantages? Disadvantages?
- A lost reply hangs the entire system.

12 Distributed control vs. central control
1. Distributed control is easier, and more fault tolerant, than central control.
2. Distributed control is harder, and more fault tolerant, than central control.
3. Distributed control is easier, but less fault tolerant, than central control.
4. Distributed control is harder, but less fault tolerant, than central control.

13 Generals coordinating with link failures
Problem:
- Two generals are on two separate mountains.
- They can communicate only via messengers, but messengers can get lost or captured by the enemy.
- The goal is to coordinate their attack.
- If they attack at different times, they lose!
- If they attack at the same time, they win!
A → B: "Is 11AM OK?"   B → A: "11AM sounds good!"   A → B: "11AM it is, then."   (Does A know that this last message was delivered?)
Even if all previous messages get through, the generals still can't coordinate their actions, since the last message could be lost, always requiring another confirmation message.

14 Generals' coordination with link failures: reductio
Proof sketch:
- Take any exchange of messages that solves the generals' coordination problem.
- Take the last message m_n. Since m_n might be lost and the algorithm still succeeds, m_n must not be necessary.
- Repeat until no messages are exchanged.
- A protocol that exchanges no messages cannot be a solution, so our assumption that we have an algorithm to solve the problem must be wrong.
Distributed consensus in the presence of link failures is impossible.
- That is why timeouts are so popular in distributed algorithms.
- Success can be made probable, just not guaranteed in bounded time.

15 Distributed consensus in the presence of link failures is
1. possible
2. not possible

16 Distributed Transactions – The Problem
How can we atomically update state on two different systems?
- A generalization of the problem we discussed earlier!
Examples:
- Atomically move a file from server A to server B.
- Atomically move $100 from one bank to another.
Issues:
- Messages exchanged by systems can be lost.
- Systems can crash.
Can we use messages and retries over an unreliable network to synchronize the actions of two machines? The two-phase commit protocol allows coordination under reasonable operating conditions.

17 Two-Phase Commit Protocol: Phase 1
Phase 1: the coordinator requests a transaction.
- The coordinator sends a REQUEST to all participants.
  Example: C → S1: "delete foo from /";  C → S2: "add foo to /quux"
- On receiving the request, each participant:
  - executes the transaction locally,
  - writes VOTE_COMMIT or VOTE_ABORT to its local log, and
  - sends VOTE_COMMIT or VOTE_ABORT to the coordinator.
Success case: S1 and S2 both decide OK, write their updates and VOTE_COMMIT to their logs, and send VOTE_COMMIT.
Failure case: S1 decides OK, writes "rm /foo; VOTE_COMMIT" to its log, and sends VOTE_COMMIT; S2 has no space on disk, so it rejects the transaction and writes and sends VOTE_ABORT.
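A sketch of the participant's side of Phase 1; execute_locally, log, and send are hypothetical stand-ins for the participant's storage engine, stable log, and network layer.

```python
def participant_phase1(request, log, send, execute_locally):
    """Participant's Phase 1 behavior: execute, log the vote, then send it."""
    try:
        execute_locally(request)        # e.g. "rm /foo", holding locks until the decision
        vote = "VOTE_COMMIT"
    except Exception:                   # e.g. no space left on disk
        vote = "VOTE_ABORT"
    log.append(vote)                    # the vote must be durable *before* it is sent
    send("coordinator", vote)
    return vote
```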

18 Two-Phase Commit Protocol: Phase 2
Phase 2: the coordinator commits or aborts the transaction.
- The coordinator decides:
  - Case 1: the coordinator receives a VOTE_ABORT or times out → it writes GLOBAL_ABORT to its log and sends GLOBAL_ABORT to the participants.
  - Case 2: the coordinator receives VOTE_COMMIT from all participants → it writes GLOBAL_COMMIT to its log and sends GLOBAL_COMMIT to the participants.
- Participants complete the transaction:
  - On receiving the decision, each participant writes GLOBAL_COMMIT or GLOBAL_ABORT to its log (and commits or aborts accordingly).
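A sketch of the coordinator's two phases under the assumption that broadcast() and collect_votes() are hypothetical networking helpers, with collect_votes raising TimeoutError if a participant stays silent.

```python
def coordinator_2pc(participants, log, broadcast, collect_votes):
    """Coordinator's view of 2PC: Phase 1 collects votes, Phase 2 broadcasts the decision."""
    # Phase 1: ask everyone to prepare.
    broadcast(participants, "REQUEST")
    try:
        votes = collect_votes(participants, timeout=5.0)
    except TimeoutError:
        votes = ["VOTE_ABORT"]          # a silent participant counts as an abort

    # Phase 2: decide, make the decision durable, then tell everyone.
    if all(v == "VOTE_COMMIT" for v in votes):
        decision = "GLOBAL_COMMIT"
    else:
        decision = "GLOBAL_ABORT"
    log.append(decision)                # write-ahead: log before sending
    broadcast(participants, decision)
    return decision
```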

19 Does Two-Phase Commit work?
Yes – this can be proved formally.
Consider the following cases:
- What if a participant crashes during the request phase before writing anything to its log?
  - On recovery, the participant does nothing; the coordinator will time out, abort the transaction, and retry.
- What if the coordinator crashes during Phase 2?
  - Case 1: its log does not contain GLOBAL_* → send GLOBAL_ABORT to the participants and retry.
  - Case 2: its log contains GLOBAL_ABORT → send GLOBAL_ABORT to the participants.
  - Case 3: its log contains GLOBAL_COMMIT → send GLOBAL_COMMIT to the participants.
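The three coordinator-recovery cases above map onto a small lookup over the coordinator's log; a sketch, with the log modeled as a plain list and broadcast() as a hypothetical helper.

```python
def coordinator_recover(log, participants, broadcast):
    """Replay the coordinator's decision (or lack of one) after a crash in Phase 2."""
    if "GLOBAL_COMMIT" in log:          # Case 3: the decision was commit
        broadcast(participants, "GLOBAL_COMMIT")
    elif "GLOBAL_ABORT" in log:         # Case 2: the decision was abort
        broadcast(participants, "GLOBAL_ABORT")
    else:                               # Case 1: no decision was ever logged
        log.append("GLOBAL_ABORT")
        broadcast(participants, "GLOBAL_ABORT")
        # ...and the transaction can be retried from scratch.
```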

20 Limitations of Two-Phase Commit
What if the coordinator crashes during Phase 2 (before sending the decision) and does not wake up?
- All participants block forever! (They may hold resources, e.g. locks!)
Possible solution:
- A participant, on timing out, can make progress by asking the other participants (if it knows their identity):
  - If any participant has heard GLOBAL_ABORT → abort.
  - If any participant sent VOTE_ABORT → abort.
  - If all participants sent VOTE_COMMIT but no one has heard GLOBAL_* → can we commit?
    - NO – the coordinator could have written GLOBAL_ABORT to its log (e.g., due to a local error or a timeout).
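A sketch of that cooperative termination check from a blocked participant's point of view; ask(peer) is a hypothetical query returning each peer's last known state, with None modeling an unreachable peer.

```python
def blocked_participant_termination(peers, ask):
    """What a blocked participant can conclude by polling its peers.

    ask(peer) returns "GLOBAL_COMMIT", "GLOBAL_ABORT", "VOTE_ABORT",
    "VOTE_COMMIT", or None. Returns "commit", "abort", or "still blocked".
    """
    states = [ask(p) for p in peers]
    if "GLOBAL_COMMIT" in states:
        return "commit"                          # someone heard the decision
    if "GLOBAL_ABORT" in states or "VOTE_ABORT" in states:
        return "abort"                           # the decision can only have been abort
    # Everyone reachable voted commit but nobody heard GLOBAL_*:
    # the coordinator might still have logged GLOBAL_ABORT, so we must keep waiting.
    return "still blocked"
```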

21 Two-Phase Commit: Summary
Message complexity: 3(N-1)
- Request / reply / decision broadcast, from the coordinator to all other nodes.
When you need to coordinate a transaction across multiple machines:
- Use two-phase commit.
- For two-phase commit, identify circumstances where indefinite blocking can occur.
- Decide if the risk is acceptable.
If two-phase commit is not adequate:
- Use more advanced distributed coordination techniques.
- To learn more about such protocols, take a distributed computing course.

22 Can the two-phase commit protocol fail to terminate?
1. Yes
2. No

23 Who's in charge? Let's have an Election.
Many algorithms require a coordinator. What happens when the coordinator dies (or at startup)?
- The Bully algorithm

24 Bully Algorithm
Assumptions:
- Processes are numbered (otherwise an election is impossible).
- Using process numbers does not cause unfairness.
Algorithm idea:
- If the leader has not been heard from in a while, assume it crashed.
- The leader will be the remaining process with the highest id.
- Processes that think they are leader-worthy broadcast that information.
- During this "election campaign," processes near the top watch whether the process trying to grab power crashes (as evidenced by the lack of a message within a timeout interval).
- At the end of the time interval, if the would-be alpha process has not heard from rivals, it assumes it has won.
- If a former alpha process rises from the dead, it bullies its way back to the top. (Invariant: the highest-numbered process rules.)

25 Bully Algorithm Details
- The algorithm starts with Pi broadcasting its desire to become leader. Pi waits T seconds before declaring victory.
- If, during this time, Pi hears from some Pj with j > i, Pi waits another U seconds before trying to become leader again.
  - How long is U? Roughly 2T, or T plus the time to broadcast the new leader.
- Otherwise, if Pi has heard only from processes Pj with j < i when the T seconds expire, Pi broadcasts that it is the new leader.
- If Pi hears from some Pj with j < i that Pj is the new leader, then Pi starts the algorithm to elect itself (Pi is a bully).
- If Pi hears from some Pj with j > i that Pj is the leader, it records that fact.
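A compact sketch of one process's election logic as described above; broadcast() and start_timer() are hypothetical helpers, and timing is simplified to callbacks rather than real clocks.

```python
class BullyProcess:
    """One process's bully-election logic, following the slide's description."""

    def __init__(self, pid, broadcast, start_timer, T=1.0):
        self.pid = pid
        self.broadcast = broadcast            # broadcast(msg) to all other processes
        self.start_timer = start_timer        # start_timer(seconds, callback)
        self.T, self.U = T, 2 * T             # U ~ 2T, per the slide
        self.leader = None
        self.heard_higher = False

    def start_election(self):
        self.heard_higher = False
        self.broadcast(("ELECTION", self.pid))
        self.start_timer(self.T, self._maybe_declare_victory)

    def _maybe_declare_victory(self):
        if self.heard_higher:
            # A higher-numbered rival is running: back off U seconds, then retry.
            self.start_timer(self.U, self.start_election)
        else:
            self.leader = self.pid
            self.broadcast(("LEADER", self.pid))

    def on_election(self, sender):
        if sender > self.pid:
            self.heard_higher = True          # a higher-numbered rival is campaigning
        else:
            self.start_election()             # bully the lower-numbered challenger

    def on_leader(self, sender):
        if sender > self.pid:
            self.leader = sender              # accept a higher-numbered leader
        else:
            self.start_election()             # a lower-numbered "leader"? contest it
```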

26 In the bully algorithm, can there ever be a point where the highest-numbered process is not the leader?
1. Yes
2. No

27 Byzantine Agreement Problem
N Byzantine generals want to coordinate an attack.
- Each general is on his/her own hill.
- Generals can communicate by messenger, and messengers are reliable (a soldier can be delayed, but there is always another foot soldier).
- There might be traitors among the generals.
Goal: in the presence of at most f traitors, can the N-f loyal generals coordinate an attack?
- Yes, if N ≥ 3f + 1.
- Number of messages ~ (f+1) * N^2.
- f+1 rounds.
- (The restricted form, where traitorous generals can't lie about what other generals say, does not have the N bound.)

28 Byzantine Agreement Example
N = 4, m = 1 (one traitor).
Round 1: each process Pi broadcasts its value Vi.
- E.g., P0 hears (10, 45, 74, 88); V0 = 10.
Round 2: each process Pi broadcasts the vector of values Vj, j != i, that it received in the first round.
- E.g., P0 hears from P1: (10, 45, 74, 88)
- from P2: (10, 66, 74, 88) [so P2 or P1 is bad]
- from P3: (10, 45, 74, 88)
Take all the values reported for each process and vote; the majority wins. We need enough virtuous generals to make the majority count. We don't know who the virtuous generals are; we just know that they swamp the bad guys.
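A small simulation of this N = 4, one-traitor exchange, assuming (as in the slide) that the traitor P2 misreports P1's value when relaying in round 2; the setup and helper names are illustrative.

```python
from collections import Counter

def majority(votes):
    """Majority value; deterministic fallback (smallest) when there is a tie."""
    counts = Counter(votes).most_common()
    top = counts[0][1]
    return min(v for v, c in counts if c == top)

# Hypothetical setup: 4 processes with the slide's values; P2 is the traitor.
N, TRAITOR = 4, 2
true_values = [10, 45, 74, 88]

# Round 1: everyone broadcasts its value (honestly, in this scenario).
# received[i][j] = the value process j reported directly to process i.
received = [[true_values[j] for j in range(N)] for i in range(N)]

# Round 2: every process relays the round-1 vector it received; the traitor
# misreports P1's value (66 instead of 45) when relaying, as in the slide.
def relayed_vector(relayer, receiver):
    vec = list(received[relayer])
    if relayer == TRAITOR:
        vec[1] = 66
    return vec

relayed = [[relayed_vector(k, i) for k in range(N)] for i in range(N)]

# Each loyal process decides j's value by majority over the direct report
# and the relays from everyone except itself and j.
decisions = {}
for i in range(N):
    if i == TRAITOR:
        continue
    vector = []
    for j in range(N):
        votes = [received[i][j]] + [relayed[i][k][j] for k in range(N) if k not in (i, j)]
        vector.append(majority(votes))
    decisions[i] = vector

print(decisions)   # every loyal process ends up with the same vector: [10, 45, 74, 88]
```

With only three processes (the next slide), each majority is taken over just two honest reports and one lie, so the scheme no longer guarantees agreement.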

29 Byzantine Agreement Danger
N = 3, m = 1.
Round 1: each process Pi broadcasts its value Vi.
- E.g., P0 hears (10, 45, 75) [P2 is lying].
Round 2: each process Pi broadcasts the vector of values Vj, j != i, that it received in the first round.
- E.g., P0 hears from P1: (10, 45, 76) and from P2: (10, 45, 75).
Either:
- In round 1, P2 told P0 75 and told P1 76 → P1 is trustworthy and P2 is the liar.
Or:
- In round 1, P2 told both P0 and P1 75 → P2 is trustworthy and P1 lied about P2 in round 2.
P0 can't choose between 75 and 76, even if P1 is non-faulty.

30 Byzantine fault tolerant algorithms tend to run quickly.
1. Yes
2. No