Synchronization in Distributed Systems
In a single-CPU system, critical regions, mutual exclusion, and other synchronization problems are generally solved using methods such as semaphores and monitors, which rely heavily on shared memory. This is not true for distributed systems, where even the simplest thing, such as determining whether event A happened before or after event B, requires careful thought.
Clock Synchronization: In general, distributed algorithms have the following properties:
–The relevant information is scattered among multiple machines.
–Processes make decisions based only on local information.
–A single point of failure in the system should be avoided.
–No common clock or other precise global time source exists.

Clock Synchronization
In a centralized system, time is unambiguous. A process makes a system call and the kernel tells it the time. If process A asks for the time, and a little later process B asks for the time, the value B gets will be higher than (or possibly equal to) the value A got. Let's use the program make as an example:
–Normally, in UNIX, a large program is split into multiple source files, so that a change to one source file requires only that file to be recompiled.
–The way make works is simple: it compares the modification time of each .o (object) file with that of the corresponding .c (source) file. From the times at which the files were last modified, make knows which source files have to be recompiled.
–In the following scenario, the newly modified output.c will not be recompiled by make because of a slightly slower clock on the editor's machine.
Figure: the computer on which the compiler runs creates output.o at time 2144 according to its local clock; the computer on which the editor runs modifies output.c afterwards, but its slower local clock stamps it 2142.
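To make the failure concrete, here is a minimal sketch, with illustrative file names and timestamps (not taken from the slides), of the timestamp comparison make performs and how the skewed clock defeats it:

```python
# Minimal sketch of make-style timestamp comparison under clock skew.
# The timestamps are illustrative, not taken from a real build.

def needs_rebuild(source_mtime: int, object_mtime: int) -> bool:
    # make recompiles only when the source looks newer than the object file.
    return source_mtime > object_mtime

# Compiler machine's clock stamps output.o at 2144.
output_o_mtime = 2144
# The editor machine's clock runs slow: output.c is modified *after* the
# compile, but gets stamped 2142 by that machine's local clock.
output_c_mtime = 2142

print(needs_rebuild(output_c_mtime, output_o_mtime))
# -> False: make wrongly concludes output.o is up to date, so the stale
#    object file is kept even though the source really is newer.
```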

Logical Clocks
Every computer has a local clock -- "timer" is a more appropriate term. A timer is usually a precisely machined quartz crystal. When kept under tension, quartz crystals oscillate at a well-defined frequency that depends on the kind of crystal, how it is cut, and the amount of tension. Associated with each crystal are two registers, a counter and a holding register. Each oscillation of the crystal decrements the counter by one. When the counter reaches zero, an interrupt is generated and the counter is reloaded from the holding register. Each interrupt is called a clock tick. For a PC, the clock ticks are 54.9 msec apart (about 18.2 per second). Within a single-CPU system it does not matter much if this clock is off by a small amount: since all processes use the same clock, they will still be internally consistent. In a distributed system, although the frequency at which a crystal oscillator runs is fairly stable, there is no way to guarantee that the crystals in different computers all run at exactly the same frequency. Crystals running at different rates result in clocks that gradually get out of sync and give different values when read. This difference in time values is called clock skew.
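As a rough illustration of clock skew (the drift rates below are made-up numbers, not measured values), consider two clocks whose oscillators differ by 10 parts per million in each direction:

```python
# Sketch: two clocks whose oscillators run at slightly different rates.
# The drift rates are illustrative assumptions.

real_seconds = 3600            # one hour of real time
rate_a = 1.0 + 1e-5            # clock A runs 10 ppm fast
rate_b = 1.0 - 1e-5            # clock B runs 10 ppm slow

clock_a = real_seconds * rate_a
clock_b = real_seconds * rate_b

skew = clock_a - clock_b       # difference between the two readings
print(f"skew after one hour: {skew * 1000:.1f} ms")   # about 72 ms
```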

Logical Clocks (continued)
Lamport (1978) showed that clock synchronization is possible and presented an algorithm for it. He also pointed out that clock synchronization need not be absolute: if two processes do not interact, it is not necessary that their clocks be synchronized, because the lack of synchronization would not be observable and thus could not cause problems. What usually matters is not that all processes agree on exactly what time it is, but rather that they agree on the order in which events occur. It is therefore the internal consistency of the clocks that matters, not whether they are particularly close to the real time. For such clocks it is conventional to speak of logical clocks. When, on the other hand, the clocks are required not only to agree but also not to deviate from real time by more than a certain amount, they are called physical clocks.

Lamport's Algorithm
Lamport defined a relation called happens-before. The expression a -> b is read "a happens before b" and means that all processes agree that first event a occurs and then afterward event b occurs. The happens-before relation can be observed directly in two situations:
–If a and b are events in the same process, and a occurs before b, then a -> b is true.
–If a is the event of a message being sent by one process, and b is the event of that message being received by another process, then a -> b is also true. A message cannot be received before it is sent, or even at the same time it is sent, since it takes a finite amount of time to arrive.
Extra notes:
–Happens-before is a transitive relation: if a -> b and b -> c, then a -> c.
–If two events x and y happen in different processes that do not exchange messages (not even indirectly through a third party), then neither x -> y nor y -> x is true; such events are said to be concurrent.
–For every event a, we assign a time value C(a) on which all processes agree.

Lamport's Algorithm (continued)
Consider the three processes in Figure 3-2(a):
–Each process runs on a different machine.
–Each machine has its own clock.
–Each clock runs at a constant speed, but the speeds differ from machine to machine.
Figure 3-2: (a) the three processes, each with its own clock running at its own rate, exchange messages A, B, C, and D; (b) Lamport's algorithm updates the receivers' clocks so that no message is received at a time earlier than or equal to the time at which it was sent.

Lamport's Algorithm (continued)
Some additional notes on Lamport's algorithm:
–Between every two events, the clock must tick at least once.
–If a process sends or receives two messages in quick succession, it must advance its clock by (at least) one tick in between them.
–In some situations, we want no two events ever to occur at exactly the same time.
–To achieve this goal, we can attach the number of the process in which the event occurs to the low-order end of the time, separated by a decimal point. Thus if events happen at processes 1 and 2, both at time 40, the former gets 40.1 and the latter gets 40.2.
–Therefore we have:
If a happens before b in the same process, C(a) < C(b).
If a and b represent the sending and receiving of a message, C(a) < C(b).
For all distinct events a and b, C(a) ≠ C(b).
–This algorithm gives us a way to provide a total ordering of all events in the system (a sketch of such a clock follows below).
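Here is a minimal sketch of such a logical clock, assuming a single-digit process number for the decimal-point tiebreak; the class and method names are my own, not from the slides:

```python
# Minimal sketch of a Lamport logical clock with process-number tiebreaking.
# Names (LamportClock, send, receive) are illustrative, not from the slides.

class LamportClock:
    def __init__(self, process_id: int):
        self.process_id = process_id   # appended as a fractional tiebreak
        self.time = 0                  # integer logical time

    def tick(self) -> None:
        # Local event: the clock must advance at least once between events.
        self.time += 1

    def send(self) -> int:
        # Advance, then stamp the outgoing message with the current time.
        self.tick()
        return self.time

    def receive(self, msg_time: int) -> None:
        # A message must be received strictly after it was sent:
        # jump past the sender's timestamp if our clock lags behind.
        self.time = max(self.time, msg_time) + 1

    def timestamp(self) -> float:
        # Total order: logical time with the process number in the low-order
        # digits, e.g. time 40 at process 1 -> 40.1, at process 2 -> 40.2.
        return self.time + self.process_id / 10.0

# Example: process 2's clock lags; receiving a message stamped 40 fixes it.
p2 = LamportClock(process_id=2)
p2.receive(40)
print(p2.time, p2.timestamp())   # 41 41.2
```

The receive rule is what repairs the situation in Figure 3-2(b): a message stamped later than the receiver's clock forces the receiver to jump ahead.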

Clock Synchronization Algorithms
In a perfect world, we would have Cp(t) = t for all p and all t. We have a perfect clock when dC/dt = 1, a fast clock when dC/dt > 1, and a slow clock when dC/dt < 1. If there exists some constant ρ such that 1 - ρ <= dC/dt <= 1 + ρ, the timer can be said to be working within its specification. The constant ρ is specified by the manufacturer and is known as the maximum drift rate. If two clocks are drifting from Universal Coordinated Time (UTC) in opposite directions, then at a time Δt after they were synchronized they may be as much as 2ρΔt apart. If the operating system designer wants to guarantee that no two clocks ever differ by more than δ, clocks must be resynchronized at least every δ/2ρ seconds.
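As a quick worked example (the drift rate and tolerance below are assumed values, not manufacturer figures), the bound translates directly into a resynchronization interval:

```python
# Worked example of the drift bound. The numbers are illustrative
# assumptions, not manufacturer specifications.

rho = 1e-5          # maximum drift rate (10 ppm)
delta = 1e-3        # maximum allowed difference between two clocks: 1 ms

# Two clocks drifting in opposite directions diverge at up to 2*rho
# seconds per second, so after dt seconds they can be 2*rho*dt apart.
resync_interval = delta / (2 * rho)
print(f"resynchronize at least every {resync_interval:.0f} s")   # 50 s
```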

Cristian's Algorithm
Periodically, and certainly no more than every δ/2ρ seconds, each machine sends a message to the time server asking for the current time. The time server responds as fast as it can with a message containing its current time. The major problem is that time must never run backward, so a correction must be introduced gradually: if, say, the timer normally adds 10 msec to the clock on every interrupt, the interrupt routine can add 11 msec per tick to advance the clock gradually, or only 9 msec to slow it down, until the correction has been made. The minor problem is that it takes a non-zero amount of time for the time server's reply to get back to the sender, and this delay may be large and may vary with the network load. The network delay can be estimated as (T1 - T0) / 2, where T0 is the time the request was sent and T1 the time the reply arrived, both measured on the requester's clock. If we know the time server's interrupt handling time I, we can refine the estimate to (T1 - T0 - I) / 2. Cristian suggested making a series of measurements and throwing away those in which T1 - T0 exceeds some threshold value, on the assumption that the discarded ones are victims of network congestion. The remaining measurements are averaged to give a better estimate of the delay, which is then added to the server's reported time. Alternatively, the message that came back fastest can be taken to be the most accurate, since it presumably encountered the least traffic.
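A minimal client-side sketch of this estimate follows; get_server_time() stands in for the request/reply exchange with the time server and is an assumed placeholder, not a real API:

```python
# Sketch of the client side of Cristian's algorithm. get_server_time() is an
# assumed placeholder for the request/reply exchange with the time server.
import time

def estimate_offset(get_server_time, samples: int = 8, threshold: float = 0.050):
    """Estimate how far the local clock is from the time server's clock."""
    offsets = []
    for _ in range(samples):
        t0 = time.time()                   # request sent (local clock)
        server_time = get_server_time()    # server's clock value in the reply
        t1 = time.time()                   # reply received (local clock)
        rtt = t1 - t0
        if rtt > threshold:
            continue                       # assume congestion; discard sample
        # Server time plus half the round trip approximates local "now" (t1);
        # the difference is the correction to apply gradually.
        offsets.append(server_time + rtt / 2 - t1)
    if not offsets:
        raise RuntimeError("every sample exceeded the congestion threshold")
    return sum(offsets) / len(offsets)
```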

The Berkeley Algorithm
While in Cristian's algorithm the time server is passive, Berkeley UNIX takes the opposite approach: the time server (time daemon) polls every machine periodically to ask what time it is there. Based on the answers, it computes an average time and tells each of the other machines either to advance its clock to the new time or to slow its clock down until the specified reduction has been achieved. This method is suitable for a system in which no machine has a WWV receiver providing UTC; the time daemon's time must be set manually by the operator periodically.
Figure: the time daemon (with its clock at 3:00) polls two machines whose clocks read 3:25 and 2:50, averages the three values, and tells each machine how much to adjust its clock.
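A small sketch of the time daemon's averaging step (names and clock readings are illustrative; polling and message transport are omitted) might look like this:

```python
# Sketch of the Berkeley time daemon's averaging step. The clock readings
# are illustrative; polling and message transport are omitted.

def berkeley_adjustments(daemon_time: float, polled_times: dict[str, float]):
    """Return the adjustment each machine (and the daemon) should apply."""
    all_times = [daemon_time] + list(polled_times.values())
    average = sum(all_times) / len(all_times)
    adjustments = {name: average - t for name, t in polled_times.items()}
    adjustments["daemon"] = average - daemon_time
    return adjustments

# Example with the figure's values (3:00, 3:25, 2:50 expressed in minutes).
print(berkeley_adjustments(180, {"m1": 205, "m2": 170}))
# -> {'m1': -20.0, 'm2': 15.0, 'daemon': 5.0}
```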

Averaging Algorithms
Both Cristian's and the Berkeley method are highly centralized, with the usual disadvantages: a single point of failure, congestion around the server, and so on. One class of decentralized clock synchronization algorithms works by dividing time into fixed-length resynchronization intervals. The i-th interval starts at T0 + iR and runs until T0 + (i+1)R, where T0 is an agreed-upon moment in the past and R is a system parameter. At the beginning of each interval, every machine broadcasts the current time according to its clock. After a machine broadcasts its time, it starts a local timer to collect all the other broadcasts that arrive during some interval S. When all the broadcasts are in, an algorithm is run to compute a new time. Some variants:
–Average all the reported times.
–Discard the m highest and m lowest values and average the rest (see the sketch below); this prevents up to m faulty clocks from sending out nonsense.
–Correct each reported value by adding to it an estimate of the propagation time from its source. This estimate can be made from the known topology of the network, or by timing how long it takes for a probe message to be echoed.
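Here is a minimal sketch of the second variant, which discards the m highest and m lowest readings before averaging; the readings and the choice of m are illustrative:

```python
# Sketch of the "discard the m highest and m lowest, then average" variant.
# The clock readings and the value of m are illustrative.

def trimmed_average(readings: list[float], m: int) -> float:
    if len(readings) <= 2 * m:
        raise ValueError("need more than 2*m readings to trim")
    trimmed = sorted(readings)[m:len(readings) - m]
    return sum(trimmed) / len(trimmed)

# One faulty clock (999.0) is discarded along with the lowest reading.
readings = [100.1, 100.3, 99.9, 100.0, 999.0]
print(trimmed_average(readings, m=1))   # averages 100.0, 100.1, 100.3
```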