Chapter 18.3: Distributed Coordination

Chapter 18 Distributed Coordination

Chapter 18.1
- Event Ordering
- Mutual Exclusion
- Atomicity

Chapter 18.2
- Concurrency Control
- Deadlock Handling

Chapter 18.3
- Deadlock Detection – finish up deadlock handling
- Election Algorithms – a little bit
- Reaching Agreement – a little bit

Chapter Objectives
- To present schemes for handling deadlock detection in a distributed system (we have already looked at deadlock prevention and avoidance)
- To take a brief look at election algorithms
- To take a brief look at reaching-agreement considerations

Deadlock Detection

Deadlock Detection

In deadlock prevention, we may implement an algorithm that preempts resources even if no deadlock has occurred. This is not necessarily good, and we want to avoid unnecessary preemptions wherever possible; such preemptions are a real problem with deadlock prevention. Deadlock detection helps us avoid them: we build a wait-for graph that describes the current state of resource allocation. Remember that we are considering only a single instance of each resource type, so if our wait-for graph has a cycle, we are in trouble and have a deadlock.

The wait-for-graph idea is reasonably straightforward; the issue is how to maintain the graph in a distributed system. The two techniques we consider require each site to keep its own local wait-for graph. In these graphs, nodes correspond to the processes (local and non-local) currently holding or requesting resources local to that site. As the figure on the next slide shows, we have a system consisting of two sites, each maintaining its own local wait-for graph. Note that P2 and P3 appear in both graphs, indicating that these processes have requested resources at both sites.
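To make the wait-for graph concrete, here is a minimal Python sketch of one site's local graph with a depth-first cycle check. The class and method names are mine, not the book's; the sketch assumes a single instance of each resource type, so any cycle means deadlock.

```python
class WaitForGraph:
    """One site's local wait-for graph: an edge P -> Q means P waits on Q."""

    def __init__(self):
        self.edges = {}  # process -> set of processes it is waiting for

    def add_wait(self, waiter, holder):
        # Record that `waiter` is blocked on a resource held by `holder`.
        self.edges.setdefault(waiter, set()).add(holder)

    def has_cycle(self):
        # Depth-first search; reaching a node already on the current path
        # (GRAY) means a cycle, i.e. a deadlock with single-instance resources.
        WHITE, GRAY, BLACK = 0, 1, 2
        color = {}

        def visit(p):
            color[p] = GRAY
            for q in self.edges.get(p, ()):
                c = color.get(q, WHITE)
                if c == GRAY or (c == WHITE and visit(q)):
                    return True
            color[p] = BLACK
            return False

        # Every node on a cycle has an outgoing edge, so the keys suffice as roots.
        return any(color.get(p, WHITE) == WHITE and visit(p)
                   for p in list(self.edges))
```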

Two Local Wait-For Graphs

Both local wait-for graphs are built in the accustomed manner for local processes and resources. When a process Pi at site S1 needs a resource held by process Pj at site S2, a request message is sent by Pi to site S2, and the edge Pi → Pj is then inserted into the local wait-for graph of site S2.

Of course, if any local wait-for graph has a cycle, we have deadlock. BUT the fact that there are NO cycles in the local graphs does not mean there are no deadlocks; we must look at a 'larger picture.' To show this: each graph in the figure is acyclic, yet a deadlock exists in the system. To prove that a deadlock has NOT occurred, we must show that the UNION of all local graphs is acyclic. The next slide shows this is not the case…

Global Wait-For Graph

Continuing, when we take the union of the two local wait-for graphs, it is clear that we do indeed have a cycle, which implies that the system is in a deadlocked state.

We have a number of methods for organizing the wait-for graph in a distributed system; the common approaches are centralized and fully distributed. These are very detailed, and in the interest of time (and the desire to cover another chapter after this one) we will not go into them here. Rather, we will jump to Election Algorithms and Reaching Agreement.
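Building on the sketch above, this snippet shows why the local checks alone are not enough. The edge sets are illustrative (they are not the book's figure): each local graph is acyclic on its own, yet their union contains the cycle P2 → P3 → P4 → P2.

```python
def union_graph(*graphs):
    """Union of several local wait-for graphs into one global graph."""
    g = WaitForGraph()
    for local in graphs:
        for waiter, holders in local.edges.items():
            for holder in holders:
                g.add_wait(waiter, holder)
    return g

s1, s2 = WaitForGraph(), WaitForGraph()
s1.add_wait("P1", "P2"); s1.add_wait("P2", "P3")  # site S1 (illustrative edges)
s2.add_wait("P3", "P4"); s2.add_wait("P4", "P2")  # site S2 (illustrative edges)

assert not s1.has_cycle() and not s2.has_cycle()  # each site looks deadlock-free
assert union_graph(s1, s2).has_cycle()            # yet the system is deadlocked
```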

Election Algorithms

Election Algorithms

We have discussed in a number of instances how centralized and fully distributed approaches handle the coordination of transactions. Given that we understand the role (and possible distribution) of transaction coordinators, what happens when a coordinator becomes unavailable? We must determine where a new copy of the coordinator should be restarted; this is the job of what are called election algorithms.

These algorithms assume that a unique priority number is associated with each active process in the system; assume also that the priority number of process Pi is i, and that there is a one-to-one correspondence between processes and sites. The coordinator is always the process with the largest priority number. So, when a coordinator fails, the algorithm must elect the active process with the largest priority number, and this number is then sent to each active process in the system. Also, when the former coordinator is restored, it must be able to identify the new coordinator via this algorithm.

Two algorithms are typically used to elect a new coordinator: the bully algorithm and the ring algorithm.

Bully Algorithm (1 of 2)

This algorithm is applicable to systems in which every process can send a message to every other process in the system. Given this assumption:

- If process Pi sends a request that is not answered by the coordinator within a time interval T, then Pi assumes that the coordinator has failed; Pi then acts like a bully and tries to elect itself the new coordinator.
- Pi sends an election message to every process Pj with a higher priority number, then waits for any of these processes to answer within some time T.
- If there is no response within T, Pi assumes that all processes with numbers greater than i have failed; Pi then elects itself the new coordinator.
- If an answer is received, Pi begins a time interval T′, waiting to receive a message that a process with a higher priority number has been elected.
- If no such message arrives within T′, Pi assumes the process with the higher number has failed, and Pi restarts the algorithm.

Bully Algorithm (Cont.)

If Pi is not the coordinator, then at any time during execution Pi may receive one of the following two messages from a process Pj:

- Pj is the new coordinator (j > i). Pi, in turn, records this information.
- Pj has started an election (j > i). Pi sends a response to Pj and begins its own election algorithm, provided that it has not already initiated such an election.

The process that completes its algorithm has the highest number and is elected coordinator. It will also have sent its number to all active processes with smaller numbers. After a failed process recovers, it immediately begins execution of the same algorithm (being the bully that it is). If there are no active processes with higher numbers, the recovered process forces all processes with lower numbers to let it become the coordinator, even if there is a currently active coordinator with a lower number.

You can go through the textbook's detailed example of how these elections occur…
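The following is a deliberately simplified, runnable sketch of the bully election. A real implementation uses messages and time-outs; here an `alive` set stands in for "answered within T", which keeps the cascade of elections visible without a network. The function name and this modeling choice are mine, not the book's.

```python
def bully_election(initiator, pids, alive):
    """Return the coordinator elected from `initiator`'s point of view.

    pids:  all priority numbers in the system (the largest number wins)
    alive: the processes that would answer an election message within T
    """
    higher = [p for p in pids if p > initiator and p in alive]
    if not higher:
        # No higher-numbered process answered within T: the initiator
        # elects itself and would notify all lower-numbered processes.
        return initiator
    # Some higher-numbered process answered; it now runs its own election.
    # Recursing from the lowest responder mirrors the cascade of election
    # messages up the priority ordering until the highest live process wins.
    return bully_election(min(higher), pids, alive)

pids = [1, 2, 3, 4, 5]
print(bully_election(1, pids, alive={1, 2, 3}))  # 5 and 4 are down, so 3 wins
```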

Ring Algorithm (1 of 2)

No great surprise here: this election algorithm is based on a ring structure, at least a logical ring if not a physical one. Communications are as expected: each process sends its messages to its neighbor on the right.

The active list. The main data structure used by the algorithm is an 'active list' containing the priority numbers of all processes active in the system; each process maintains its own copy.

If process Pi detects a coordinator failure, it creates a new, initially empty active list. It then sends the message elect(i) to its right neighbor and adds the number i to its active list. Note the direction of the communications.

Ring Algorithm (Cont.)

If Pi receives a message elect(j) from the process on its left, it must respond in one of three ways:

1. If this is the first elect message it has seen or sent, Pi creates a new active list with the numbers i and j. It then sends the message elect(i), followed by the message elect(j).
2. If i ≠ j (and Pi has already joined the election), Pi adds j to its active list and forwards elect(j) to its right neighbor.
3. If i = j, then Pi has received its own message elect(i) back. The active list for Pi now contains the priority numbers of all the active processes in the system, so Pi can determine the largest number in the list to identify the new coordinator process.
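Here is a runnable Python sketch of the ring election, covering both slides above. The ring and its links are simulated; the single FIFO queue models channels that do not reorder messages, which is what makes case 3's "the list is complete" claim hold. Class and variable names are mine.

```python
from collections import deque

class RingProcess:
    def __init__(self, pid, network):
        self.pid = pid
        self.network = network    # shared FIFO queue standing in for the ring links
        self.active_list = None   # None until this process joins an election
        self.right = None         # right neighbor, set once the ring is wired

    def send_right(self, j):
        self.network.append((self.right, j))  # enqueue elect(j) for the neighbor

    def start_election(self):
        # Coordinator failure detected: fresh list, send elect(i) to the right.
        self.active_list = {self.pid}
        self.send_right(self.pid)

    def receive_elect(self, j):
        if self.active_list is None:
            # Case 1: first elect message seen or sent; record i and j,
            # then send elect(i) followed by elect(j).
            self.active_list = {self.pid, j}
            self.send_right(self.pid)
            self.send_right(j)
        elif j != self.pid:
            # Case 2: add j to the active list and forward the message.
            self.active_list.add(j)
            self.send_right(j)
        else:
            # Case 3: our own number came back; the list is complete.
            print(f"P{self.pid}: new coordinator is P{max(self.active_list)}")

network = deque()
procs = [RingProcess(p, network) for p in (5, 1, 4, 2)]   # ring order
for a, b in zip(procs, procs[1:] + procs[:1]):
    a.right = b
procs[0].start_election()
while network:                      # deliver messages in FIFO order
    dst, j = network.popleft()
    dst.receive_elect(j)            # every process ends up announcing P5
```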

Reaching Agreement

Reaching Agreement (directly from book)

Normally, application processes wish to agree on a common value. Such agreement, however, may not take place because of:
- a faulty communication medium, which may result in lost or garbled messages, or
- faulty processes: processes may send garbled or otherwise incorrect messages to other processes, or may be flawed in other ways that result in unpredictable behavior.

In short, we can have a mess. We can 'hope' that processes fail in a clean manner, but processes can fail miserably and send garbled or incorrect messages to other processes, or even collaborate with other failed processes in an attempt to destroy the integrity of the system. So let's look more closely at reaching agreement:

Reaching Agreement – Unreliable Communications

Approach 1: assume processes fail in a clean manner, but the communication medium is unreliable. Suppose that some process Pi at site S1 has sent a message to process Pj at site S2 and needs to know whether Pj has received it, so that Pi can decide how to proceed with, say, its computation. For example, Pi may decide to compute a function foo if Pj has received its message, or to compute a function boo if Pj has not received the message (because of some hardware failure).

We can use a time-out scheme similar to the one described earlier to detect failures. When Pi sends out a message, it also specifies a time interval during which it is willing to wait for an acknowledgment from Pj. When Pj receives the message, it immediately sends an acknowledgment to Pi. If Pi receives the acknowledgment within the specified interval, it can safely conclude that Pj has received its message. If, however, a time-out occurs, Pi retransmits its message and waits for another acknowledgment. This procedure continues until Pi either gets the acknowledgment back or is notified by the system that site S2 is down; only then does Pi know whether to execute foo or boo. Note that, if these are the only two viable alternatives, Pi must wait until it has been notified that one of the two situations has occurred.
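A compact sketch of that retransmission loop follows. The callables `send`, `recv_ack`, and `site_is_down` are hypothetical stand-ins for the transport and the system's failure notification; they are not a real API.

```python
def deliver_or_detect_failure(send, recv_ack, site_is_down, msg, timeout):
    """Pi's view: returns True once Pj acknowledged (compute foo),
    or False once the system reports site S2 down (compute boo)."""
    while True:
        send(msg)                  # (re)transmit the message to Pj
        if recv_ack(timeout):      # ack within the interval: Pj has the message
            return True
        if site_is_down():         # system notified Pi that S2 has failed
            return False
        # Time-out with no failure notification: loop and retransmit.
```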

Reaching Agreement – Unreliable Communications

Suppose now that Pj also needs to know that Pi has received its acknowledgment message, so that it can decide how to proceed with its own computation. For example, Pj may want to compute foo only if it is assured that Pi got its acknowledgment. In other words, Pi and Pj will compute foo if and only if both have agreed on it. It turns out that, in the presence of failure, it is not possible to accomplish this task. More precisely, it is not possible in a distributed environment for processes Pi and Pj to agree completely on their respective states.

Reaching Agreement – Unreliable Communications

To prove this claim, suppose that a minimal sequence of message transfers exists such that, after the messages have been delivered, both processes agree to compute foo. Let m′ be the last message sent by Pi to Pj. Since Pi does not know whether m′ will arrive at Pj (the message may be lost due to a failure), Pi must be willing to execute foo regardless of the outcome of that delivery. Thus m′ could be removed from the sequence without affecting the decision procedure, so the original sequence was not minimal. This contradicts our assumption and shows that no such sequence exists (proof by contradiction). The processes can never be sure that both will compute foo.

End of Chapter 18.3