Fall 2007, cs425: Distributed Computing. Umar Kalim, Dept. of Communication Systems Engineering. 15/01/2008



Agenda
Synchronization:
– Logical Clocks & Applications
– Election Algorithms
– Distributed Mutual Exclusion
Ref: COMP 212, University of Liverpool

Logical Clocks
Synchronization based on "relative time". Note that (with this mechanism) there is no requirement for "relative time" to have any relation to "real time". What matters is that the processes in the distributed system agree on the ordering in which certain events occur. Such "clocks" are referred to as logical clocks.

What can go wrong?
Updating a replicated database:
– Update 1 adds 100 rupees to an account; Update 2 calculates and adds 1% interest to the same account.
Even if the clocks are synchronized, network delays may cause the two updates to arrive at different replicas in different orders, leaving the database in an inconsistent state.

Lamport's Logical Clocks
First point: if two processes do not interact, then their clocks do not need to be synchronized; they can operate concurrently without fear of interfering with each other.
Second (critical) point: it does not matter that two processes share a common notion of what the "real" current time is. What does matter is that the processes have some agreement on the order in which certain events occur.
Lamport used these two observations to define the "happens-before" relation (also often discussed in the context of Lamport's timestamps).

The "Happens-Before" Relation (1)
If A and B are events in the same process, and A occurs before B, then "A happens-before B" is true.
Equally, if A is the event of a message being sent by one process, and B is the event of the same message being received by another process, then "A happens-before B" is also true.
– Note that a message cannot be received before it is sent, since it takes a finite, nonzero amount of time to arrive (and, of course, time is not allowed to run backwards).
The relation is transitive: if "A happens-before B" and "B happens-before C", then "A happens-before C".

The "Happens-Before" Relation (2)
Now assume three processes in a distributed system, A, B and C, each with its own physical clock.
A sends a message to B.
If the send time attached to the message exceeds the time of arrival according to B's clock, things are NOT OK: the receipt of a message has to occur after it was sent, so "send happens-before receive" must hold.

The "Happens-Before" Relation (3)
The question to ask is:
– How can an event that "happens-before" another event possibly have occurred at a later time?
The answer is: it can't!
– So, Lamport's solution is to have the receiving process adjust its clock forward to one more than the sending timestamp value. This preserves the "happens-before" relation and keeps all the clocks in sync relative to each other.

Example of Lamport's Timestamps
Lamport's algorithm corrects the clock.
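The correction rule from the previous slide can be sketched in a few lines of Python. This is a minimal illustration only; the class name and method names are assumptions, not part of the lecture material:

```python
class LamportClock:
    """Minimal Lamport logical clock (illustrative sketch)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event: advance the clock by one.
        self.time += 1
        return self.time

    def send(self):
        # A send is an event; return the timestamp to attach to the message.
        return self.tick()

    def receive(self, msg_time):
        # Receipt must happen after the send: jump the clock forward past
        # the sender's timestamp, always moving it forward by at least one.
        self.time = max(self.time, msg_time) + 1
        return self.time

# The sender's clock is ahead; the receiver adjusts past it.
a, b = LamportClock(), LamportClock()
a.time = 10
t_sent = a.send()           # 11
t_recv = b.receive(t_sent)  # 12 > 11, so "send happens-before receive" holds
```

Note how `receive` never moves the clock backwards, matching the slide's rule that clocks only ever run forward.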

Summary: Lamport's Timestamps
It is hard to synchronize time across two systems.
Every message is timestamped; if the local clock disagrees with an incoming timestamp, update the local clock (and always move it forward!).
If two systems do not exchange messages, they do not need a common clock.
– If they do not communicate, there is no ordering between them.
– Is it possible to achieve total ordering of events in the system?

Applications of Lamport's Timestamps
Total ordering of events in the system: "Totally Ordered Multicasting"
– solves our problem with the replicated database.
Vector timestamps (to capture causality)
– bulletin board systems, chat rooms.
Global state
– local state of each process + messages in transit; used, e.g., for termination detection. Read the book!
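The vector timestamps mentioned above can be sketched as follows. This is an illustrative sketch, not from the slides; the class name, the fixed process count, and the helper `happened_before` are assumptions:

```python
class VectorClock:
    """Illustrative vector clock for n processes; captures causality."""

    def __init__(self, n, pid):
        self.v = [0] * n   # one counter per process
        self.pid = pid

    def tick(self):
        # Local event: increment our own component.
        self.v[self.pid] += 1

    def send(self):
        self.tick()
        return list(self.v)  # timestamp to attach to the message

    def receive(self, other):
        # Merge: component-wise max, then count the receive event itself.
        self.v = [max(a, b) for a, b in zip(self.v, other)]
        self.tick()

def happened_before(u, v):
    # u -> v iff u <= v component-wise and u != v; otherwise they are
    # either equal or concurrent.
    return all(a <= b for a, b in zip(u, v)) and u != v

p0, p1 = VectorClock(2, 0), VectorClock(2, 1)
t = p0.send()   # p0's state: [1, 0]
p1.receive(t)   # p1's state: [1, 1]
```

Unlike Lamport timestamps, comparing two vectors tells us whether two events are causally related or concurrent, which is why they suit bulletin boards and chat rooms where causal delivery matters.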

Election Algorithms
Many distributed systems require a single process to act as coordinator (for various reasons):
– the time server in the Berkeley algorithm
– the coordinator in the two-phase commit protocol
– the master process in distributed computations
– the master database server
If the coordinator fails, the distributed group of processes must execute an election algorithm to determine a new coordinator process.

Election Algorithms: Assumptions
For simplicity, we assume the following:
– Each process has a unique, positive identifier (e.g., a processor ID or an IP address).
– All processes know all other process identifiers.
– The process with the highest-valued identifier is duly elected coordinator.

Goal of Election Algorithms
The overriding goal of all election algorithms is to have all the processes in a group agree on a coordinator.

Bully Election Algorithm (1)
Bully: "jis ki lathi us ke bhens!" (Urdu proverb, roughly "whoever holds the stick owns the buffalo", i.e., might makes right)
– a person who deliberately intimidates or is cruel to weaker people.
Assumptions:
– Reliable message delivery (but processes may crash)
– The system is synchronous (timeouts can be used to detect a process failure)
– Each process knows which processes have higher identifiers and can communicate with them

Bully Election Algorithm (2)
When any process P notices that the coordinator is no longer responding, it initiates an election:
– P sends an ELECTION message to all processes with higher id numbers.
– If no one responds, P wins the election and becomes coordinator.
– If a higher process responds, it takes over; process P's job is done.

Bully Election Algorithm (3)
At any moment, a process can receive an ELECTION message from one of its lower-numbered colleagues. The receiver sends an OK back to the sender and conducts its own election.
Eventually only the bully process remains, and it announces victory to all processes in the distributed group.

Bully Election Algorithm (4)
When a process "notices" that the current coordinator is no longer responding (e.g., 4 deduces that 7 is down), it sends an ELECTION message to every higher-numbered process.
If none responds, it (i.e., 4) becomes the coordinator, sending a COORDINATOR message to all other processes to inform them of the change.
If a higher-numbered process responds to the ELECTION message with an OK message, the election is cancelled and that higher-up process starts its own election.

Bully Election Algorithm (5)
6 wins the election.
When the original coordinator (i.e., 7) comes back on-line, it simply sends out a COORDINATOR message, as it is the highest-numbered process (and it knows it).
Simply put: the process with the highest-numbered identifier bullies all others into submission.
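The election just described can be condensed into a small sketch. Message passing is abstracted away here (a higher live process "responds" simply by being in the live set), and the function name is an assumption; this is an illustration of the idea, not the real protocol:

```python
def bully_election(alive_ids, initiator):
    """Sketch of the bully algorithm over a set of live process ids."""
    higher = [p for p in alive_ids if p > initiator]
    if not higher:
        # No higher process answered: the initiator wins and would now
        # broadcast a COORDINATOR message.
        return initiator
    # A higher process responds with OK and runs its own election.
    # Repeating this hand-off, the highest live id ends up coordinator.
    return bully_election(alive_ids, min(higher))

# Process 7 has crashed; process 4 notices and starts an election.
winner = bully_election({1, 2, 3, 4, 5, 6}, 4)  # 6 wins, as on the slide
```

The hand-off chain (4 → 5 → 6) also shows why simultaneous detections are harmless: every ELECTION message funnels toward the same highest live process.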

Pop quiz
What happens when two processes detect the demise of the coordinator simultaneously and both decide to hold an election?

Mutual Exclusion
A critical section is a non-re-entrant piece of code that can only be executed by one process at a time.
Mutual exclusion is a collection of techniques for sharing resources so that different uses do not conflict and cause unwanted interactions.
A semaphore is an integer datatype used to implement mutual exclusion.
Java synchronized methods:
– synchronized int criticalSection() { … }

Mutual Exclusion: Requirements
Essential requirements for mutual exclusion:
– Safety: at most one process may execute in the critical section (CS) at a time.
– Liveness: requests to enter and exit the CS eventually succeed.
Liveness implies freedom from deadlock and from starvation (indefinite postponement of entry for a process that has requested it).

Mutual Exclusion in Distributed Systems
Two major approaches:
– Centralized: a single coordinator controls whether a process can enter a critical region.
– Distributed: the group confers to determine whether or not it is safe for a process to enter a critical region.

Centralized Algorithm
Assume a coordinator has been elected.
A process sends a message to the coordinator requesting permission to enter a critical section. If no other process is in the critical section, permission is granted.
If another process then asks permission to enter the same critical region, the coordinator does not reply (or sends "permission denied") and queues the request.
When a process exits the critical section, it sends a message to the coordinator. The coordinator takes the first entry off the queue and sends that process a message granting permission to enter the critical section.

Mutual Exclusion: A Centralized Algorithm
a) Process 1 asks the coordinator for permission to enter a critical region. Permission is granted.
b) Process 2 then asks permission to enter the same critical region. The coordinator does not reply.
c) When process 1 exits the critical region, it tells the coordinator, which then replies to process 2.
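The coordinator's bookkeeping in this three-step exchange fits in a few lines. A minimal sketch, assuming a `Coordinator` class and string replies that are not part of the original slides:

```python
from collections import deque

class Coordinator:
    """Centralized mutual-exclusion coordinator (illustrative sketch)."""

    def __init__(self):
        self.holder = None      # pid currently in the critical section
        self.queue = deque()    # waiting pids, FIFO

    def request(self, pid):
        # Grant immediately if the critical section is free; else queue
        # the request (the slide's coordinator simply does not reply).
        if self.holder is None:
            self.holder = pid
            return "GRANTED"
        self.queue.append(pid)
        return "QUEUED"

    def release(self, pid):
        assert pid == self.holder
        # Hand the critical section to the first waiter; FIFO order is
        # what makes the scheme fair and starvation-free.
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder

c = Coordinator()
r1 = c.request(1)   # "GRANTED": the region was free
r2 = c.request(2)   # "QUEUED": process 2 must wait
nxt = c.release(1)  # process 2 now holds the critical section
```

The FIFO queue is the whole fairness argument from the next slide: requests are served strictly in arrival order.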

Comments
Advantages: it works; it is fair; there is no process starvation; it is easy to implement.
Disadvantages: there is a single point of failure! The coordinator is a bottleneck on busy systems.
Critical question: when there is no reply, does this mean that the coordinator is "dead" or just busy?

A Token Ring Algorithm
Processes are organized into a logical ring, and a token circulates around the ring.
A critical region can only be entered when the token is held: no token, no access!
When the critical region is exited, the token is released.
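The cost of waiting for the token can be illustrated with a short sketch. The function below merely counts hops of the circulating token; names and the hop-counting framing are assumptions for illustration:

```python
def token_ring(n, token_at, requester):
    """Count message hops before `requester` receives the token
    in an n-process logical ring (illustrative sketch)."""
    hops = 0
    pos = token_at
    while pos != requester:
        pos = (pos + 1) % n   # pass the token to the next process in the ring
        hops += 1
    return hops

# Best case: the token is already here, so entry is immediate (0 hops).
# Worst case: the token just left, so it must travel the whole ring (n - 1).
best = token_ring(5, 2, 2)   # 0
worst = token_ring(5, 3, 2)  # 4
```

This is exactly the "0 to n − 1" entry delay that appears in the comparison table at the end of the lecture.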

Comments
Advantages: it works (there is only one token, so mutual exclusion is guaranteed); it is fair, since everyone gets a shot at grabbing the token at some stage.
Disadvantages: a lost token! How is the loss detected (is the token in use, or is it lost)? How is the token regenerated? Process failure can cause problems (a broken ring!). Every process is required to maintain the current logical ring in memory, which is not easy.

Distributed Algorithm (1)
The Ricart and Agrawala algorithm (1981) assumes there is a mechanism for the "total ordering of all events" in the system (e.g., Lamport's algorithm) and a reliable message system.
A process wanting to enter a critical section (cs) sends a message with (cs name, process id, current time) to all processes, including itself.
When a process receives a cs request from another process, it reacts based on its current state with respect to the requested cs.

Distributed Algorithm (2)
There are three possible cases:
1. If the receiver is not in the cs and does not want to enter it, it sends an OK message to the sender.
2. If the receiver is in the cs, it does not reply and queues the request.
3. If the receiver wants to enter the cs but has not yet entered, it compares the timestamp of the incoming message with the timestamp of the message it sent to everyone. The lowest timestamp wins:
– If the incoming timestamp is lower, the receiver sends an OK message to the sender.
– If its own timestamp is lower, the receiver queues the request and sends nothing.

Distributed Algorithm (3)
After a process sends out a request to enter a cs, it waits for an OK from all the other processes. When all have been received, it enters the cs.
Upon exiting the cs, it sends OK messages to all processes on its queue for that cs and deletes them from the queue.

A Distributed Algorithm
a) Two processes want to enter the same critical region at the same moment.
b) Process 0 has the lowest timestamp, so it wins.
c) When process 0 is done, it also sends an OK, so process 2 can now enter the critical region.
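The receiver-side decision from the three cases above can be written as one small function. A sketch under stated assumptions: the state names, the `(timestamp, pid)` tie-break, and the return strings are illustrative, not from the slides:

```python
def on_request(state, my_ts, incoming_ts, incoming_pid, my_pid):
    """Ricart-Agrawala receiver rule (sketch): return 'OK' or 'DEFER'.

    state is one of:
      'RELEASED' - not in the cs and not trying to enter (case 1),
      'HELD'     - currently in the cs (case 2),
      'WANTED'   - waiting to enter the cs (case 3).
    """
    if state == "RELEASED":
        return "OK"
    if state == "HELD":
        return "DEFER"   # queue the request; reply OK on exit
    # state == "WANTED": lowest (timestamp, pid) wins; the pid breaks
    # timestamp ties, which is what makes the ordering total.
    if (incoming_ts, incoming_pid) < (my_ts, my_pid):
        return "OK"
    return "DEFER"

# The slide's scenario: both processes are WANTED; the one with the
# lower timestamp defers the other and enters first.
r = on_request("WANTED", 5, 8, 1, 0)  # my request (ts 5) is earlier: DEFER
```

A process enters the cs only once every peer has answered OK, so a single DEFER anywhere is enough to hold it back.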

Comments
Advantages: it works.
– The algorithm works because, in the case of a conflict, the lowest timestamp wins, as everyone agrees on the total ordering of the events in the distributed system.
– There is no single point of failure.
Disadvantages: we now have multiple points of failure!!! A "crash" is interpreted as a denial of entry to a critical region (a patch to the algorithm requires all messages to be ACKed). Worse, all processes must maintain a list of the current processes in the group (and this can be tricky). Worse still, one overworked process can become a bottleneck for the entire system, so everyone slows down.

Comparison
None are perfect; they all have their problems!
The "centralized" algorithm is simple and efficient, but suffers from a single point of failure.
The "distributed" algorithm has nothing going for it: it is slow, complicated, inefficient in network bandwidth, and not very robust. (bekar!!, i.e., useless)
The "token ring" algorithm suffers from the fact that it can sometimes take a long time to re-enter a critical region having just exited it.
All perform poorly when a process crashes, and they are all generally poorer technologies than their non-distributed counterparts. Only in situations where crashes are very infrequent should any of these techniques be considered.

Algorithm   | Messages per entry/exit | Delay before entry (in message times) | Problems
Centralized | 3                       | 2                                     | Coordinator crash
Distributed | 2(n − 1)                | 2(n − 1)                              | Crash of any process
Token ring  | 1 to ∞                  | 0 to n − 1                            | Lost token, process crash

Summary & Questions?
That's all for today!