Synchronization in Distributed Systems
In a single-CPU system, critical regions, mutual exclusion, and other synchronization problems are generally solved using methods such as semaphores and monitors, which rely heavily on shared memory. These methods do not carry over to distributed systems, where shared memory is not available. Even something as simple as determining whether event A happened before or after event B requires careful thought.
Clock Synchronization: In general, distributed algorithms have the following properties:
–The relevant information is scattered among multiple machines.
–Processes make decisions based only on local information.
–A single point of failure in the system should be avoided.
–No common clock or other precise global time source exists.
Clock Synchronization
In a centralized system, time is unambiguous. A process makes a system call and the kernel tells it the time. If process A asks for the time, and a little later process B asks for the time, the value that B gets will be higher than or possibly equal to the value A got.
Let’s use the program make as an example:
–Normally, in UNIX, a large program is split into multiple source files, so that a change to one source file requires only that file to be recompiled.
–The way make works is simple: it compares the modification time of each .o (object) file with that of the corresponding .c (source) file. From the times the files were last modified, make knows which source files have to be recompiled.
–In the following scenario, the newly modified output.c will not be recompiled by make, because the clock on the editor’s machine runs slightly behind the clock on the compiler’s machine.
[Figure: on the computer on which the compiler runs, output.o is created at local time 2144; on the computer on which the editor runs, output.c is modified afterward at local time 2142.]
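The timestamp comparison in the make scenario can be sketched in a few lines. This is only an illustration of the rule described above, not make's actual implementation; the function name needs_recompile is hypothetical, and the timestamps are the ones from the scenario.

```python
def needs_recompile(source_mtime: int, object_mtime: int) -> bool:
    """Rebuild only if the source file is stamped later than its object file."""
    return source_mtime > object_mtime


# output.o was stamped 2144 by the compiler machine's clock.
# output.c was edited later in real time, but the editor machine's
# slower clock stamped it 2142, so the change goes unnoticed.
rebuild = needs_recompile(source_mtime=2142, object_mtime=2144)
print("recompile output.c?", rebuild)
```

With both files stamped by the same clock, the later edit would carry the larger timestamp and the comparison would trigger a rebuild.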
Logical Clocks
Every computer has a local clock, although "timer" is a more appropriate term. A timer is usually a precisely machined quartz crystal. When kept under tension, quartz crystals oscillate at a well-defined frequency that depends on the kind of crystal, how it is cut, and the amount of tension. Associated with each crystal are two registers, a counter and a holding register. Each oscillation of the crystal decrements the counter by one. When the counter reaches zero, an interrupt is generated and the counter is reloaded from the holding register. Each interrupt is called a clock tick. On a PC, the clock ticks are 54.9 msec apart (18.2 per second).
Within a single-CPU system, it does not matter much if this clock is off by a small amount. Since all processes use the same clock, they will still be internally consistent.
In a distributed system, although the frequency at which a crystal oscillator runs is fairly stable, there is no way to guarantee that the crystals in different computers all run at exactly the same frequency. Crystals running at different rates cause the clocks to drift gradually out of sync and give different values when read out. This difference in time values is called clock skew.
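The counter and holding-register mechanism, and the skew that results from crystals running at different rates, can be imitated with a toy model. The class name CrystalTimer, the helper ticks_after, and the oscillation counts below are all made up for illustration; real timers are hardware devices.

```python
class CrystalTimer:
    """Toy model of a timer: a counter reloaded from a holding register."""

    def __init__(self, holding_register: int) -> None:
        if holding_register <= 0:
            raise ValueError("holding register must be positive")
        self.holding_register = holding_register
        self.counter = holding_register
        self.ticks = 0  # number of interrupts (clock ticks) so far

    def oscillate(self) -> bool:
        """One crystal oscillation; returns True when a clock tick occurs."""
        self.counter -= 1
        if self.counter == 0:
            self.ticks += 1
            self.counter = self.holding_register  # reload
            return True
        return False


def ticks_after(oscillations: int, holding_register: int) -> int:
    """Count the clock ticks produced by a given number of oscillations."""
    timer = CrystalTimer(holding_register)
    for _ in range(oscillations):
        timer.oscillate()
    return timer.ticks


# Two machines over the same real-time interval: the second crystal
# oscillates 1% faster, so its clock has gained one tick (clock skew).
slow_machine = ticks_after(10_000, holding_register=100)
fast_machine = ticks_after(10_100, holding_register=100)
print("skew in ticks:", fast_machine - slow_machine)
```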
Logical Clocks (continued)
Lamport (1978) showed that clock synchronization is possible and presented an algorithm for it. He also pointed out that clock synchronization need not be absolute. If two processes do not interact, it is not necessary that their clocks be synchronized, because the lack of synchronization would not be observable and thus could not cause problems.
What usually matters is not that all processes agree on exactly what time it is, but rather that they agree on the order in which events occur. Therefore, it is the internal consistency of the clocks that matters, not whether they are particularly close to the real time. In this case, it is conventional to speak of the clocks as logical clocks.
On the other hand, when the clocks are required not only to agree with each other but also to deviate from the real time by no more than a certain amount, they are called physical clocks.
Lamport’s Algorithm
Lamport defined a relation called happens-before. The expression a -> b is read “a happens before b” and means that all processes agree that first event a occurs, then afterward, event b occurs.
The happens-before relation can be observed directly in two situations:
–If a and b are events in the same process, and event a occurs before event b, then a -> b is true.
–If a is the event of a message being sent by one process, and b is the event of that message being received by another process, then a -> b is also true. A message cannot be received before it is sent, or even at the same time it is sent, since it takes a finite amount of time to arrive.
Extra notes:
–Happens-before is a transitive relation: if a -> b and b -> c, then a -> c.
–If two events, x and y, happen in different processes that do not exchange messages (not even through a third party), then neither x -> y nor y -> x is true, and these events are said to be concurrent.
–For every event a, we assign a time value C(a) on which all processes agree. These time values must satisfy: if a -> b, then C(a) < C(b).
Lamport’s Algorithm (continued)
Consider the three processes in Figure 3-2(a):
–Each process runs on a different machine.
–Each machine has its own clock.
–Each clock runs at its own speed (a constant speed, but the rates differ between machines).
[Figure 3-2: two panels showing messages A, B, C, and D exchanged among the three processes, before and after the clocks are updated.]
Lamport’s Algorithm (continued)
Some additional notes on Lamport’s algorithm:
–Between every two events, the clock must tick at least once.
–If a process sends or receives two messages in quick succession, it must advance its clock by (at least) one tick in between them.
–In some situations, we want no two events ever to occur at exactly the same time.
–To achieve this goal, we can attach the number of the process in which the event occurs to the low-order end of the time, separated by a decimal point. Thus, if events happen at processes 1 and 2, both at time 40, the former gets 40.1 and the latter gets 40.2.
–Therefore, we have:
  If a happens before b in the same process, C(a) < C(b).
  If a and b represent the sending and receiving of a message, C(a) < C(b).
  For all distinct events a and b, C(a) ≠ C(b).
–This algorithm gives us a way to provide a total ordering of all events in the system.
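A minimal sketch of a Lamport logical clock, under some assumptions: the class name LamportClock and its method names are hypothetical, the receive rule is written in the common max(local, received) + 1 form, and the process number is carried as the second element of a tuple rather than after a decimal point, so that tuple comparison breaks ties.

```python
from typing import Tuple


class LamportClock:
    """Logical clock for one process, identified by its process number."""

    def __init__(self, pid: int) -> None:
        self.pid = pid
        self.time = 0

    def local_event(self) -> Tuple[int, int]:
        """Tick once for a local event and return its (time, pid) stamp."""
        self.time += 1
        return (self.time, self.pid)

    def send(self) -> int:
        """Tick once and return the timestamp carried by the message."""
        self.time += 1
        return self.time

    def receive(self, msg_time: int) -> Tuple[int, int]:
        """Move the clock past both the local time and the message time."""
        self.time = max(self.time, msg_time) + 1
        return (self.time, self.pid)


p1, p2 = LamportClock(1), LamportClock(2)
p1.local_event()                 # p1's clock is now 1
stamp = p1.send()                # sending is an event: p1's clock is 2
received = p2.receive(stamp)     # p2 jumps from 0 to 3, after the send
tie = p1.local_event()           # p1 also reaches time 3
print(stamp, received, tie, tie < received)
```

Comparing the (time, pid) tuples gives the total ordering: two events with the same time value are ordered by process number.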
Clock Synchronization Algorithms
In a perfect world, we would have Cp(t) = t for all p and all t, where Cp(t) is the value of the clock on machine p at UTC time t. We have a perfect clock when dC/dt = 1, a fast clock when dC/dt > 1, and a slow clock when dC/dt < 1.
If there exists some constant ρ such that 1 - ρ <= dC/dt <= 1 + ρ, the timer can be said to be working within its specification. The constant ρ is specified by the manufacturer and is known as the maximum drift rate.
If two clocks are drifting from Coordinated Universal Time (UTC) in opposite directions, then at a time t after they were synchronized, they may be as much as 2 * ρ * t apart. If the operating system designer wants to guarantee that no two clocks ever differ by more than δ, clocks must be resynchronized at least every δ / (2 * ρ) seconds.
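The two bounds above can be checked with a little arithmetic. In this sketch rho stands for the maximum drift rate and delta for the largest skew the designer will tolerate; the function names and the numeric values are illustrative assumptions, not figures from any particular hardware.

```python
def max_skew(rho: float, elapsed: float) -> float:
    """Worst-case gap between two clocks drifting in opposite directions."""
    return 2 * rho * elapsed


def resync_interval(delta: float, rho: float) -> float:
    """Longest time between synchronizations that keeps the skew within delta."""
    if rho <= 0 or delta <= 0:
        raise ValueError("rho and delta must be positive")
    return delta / (2 * rho)


rho = 1e-5     # assumed drift rate: 10 microseconds per second
delta = 1e-3   # assumed tolerance: clocks may differ by at most 1 msec

interval = resync_interval(delta, rho)
print(f"resynchronize at least every {interval:.1f} s")
print(f"skew after that long: {max_skew(rho, interval):.6f} s")
```

With these assumed numbers the clocks have to be resynchronized roughly every 50 seconds.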
Cristian’s Algorithm
Periodically, at intervals of no more than δ / (2 * ρ) seconds (where δ is the largest skew allowed and ρ is the maximum drift rate), each machine sends a message to the time server asking it for the current time. The time server responds as fast as it can with a message containing its current time.
The major problem is that time must never run backward. If the sender’s clock is fast, the time reported by the server will be smaller than the sender’s current value, so the change must be introduced gradually. For example, if each clock interrupt normally adds 10 msec to the time, the interrupt routine can add 11 msec per interrupt to advance the clock, or 9 msec per interrupt to slow it down, until the correction has been made.
The minor problem is that it takes a nonzero amount of time for the time server’s reply to get back to the sender, and this delay may be large and vary with the network load. Let T0 be the time at which the request is sent and T1 the time at which the reply arrives, both measured with the sender’s clock. The one-way network delay can then be estimated as (T1 - T0) / 2. If we also know the time server’s interrupt handling time, I, the estimate improves to (T1 - T0 - I) / 2.
Cristian suggested making a series of measurements and throwing away those in which T1 - T0 exceeds some threshold value, on the assumption that the discarded entries are victims of network congestion. The remaining measurements are then averaged to give a better estimate of the network delay, which is added to the time reported by the server. Alternatively, the message that came back the fastest can be taken to be the most accurate, since it presumably encountered the least traffic.
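The delay estimate can be sketched as follows. Here t0 is the time the request was sent and t1 the time the reply arrived, both read from the requesting machine's clock; the Sample type, the function names, and the sample values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from typing import Iterable, NamedTuple


class Sample(NamedTuple):
    t0: float           # request sent (local clock)
    t1: float           # reply received (local clock)
    server_time: float  # time reported by the time server


def estimate_current_time(sample: Sample, interrupt_time: float = 0.0) -> float:
    """Server time plus the estimated one-way delay (T1 - T0 - I) / 2."""
    one_way_delay = (sample.t1 - sample.t0 - interrupt_time) / 2
    return sample.server_time + one_way_delay


def fastest(samples: Iterable[Sample]) -> Sample:
    """Pick the exchange with the shortest round trip, T1 - T0."""
    return min(samples, key=lambda s: s.t1 - s.t0)


samples = [
    Sample(t0=100.0, t1=120.0, server_time=500.0),  # 20-unit round trip
    Sample(t0=200.0, t1=210.0, server_time=600.0),  # 10-unit round trip
]
best = fastest(samples)
print(estimate_current_time(best))
print(estimate_current_time(samples[0], interrupt_time=4.0))
```

The sketch uses the fastest-reply variant; averaging the measurements that stay under a threshold would be the other option described above.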
The Berkeley Algorithm
In Cristian’s algorithm the time server is passive; Berkeley UNIX takes the opposite approach. The time server (a time daemon) is active and periodically polls every machine to ask what time it is there. Based on the answers, it computes an average time and tells all the other machines either to advance their clocks to the new time or to slow their clocks down until some specified reduction has been achieved.
This method is suitable for a system in which no machine has a WWV receiver for obtaining UTC. The time daemon’s time must be set manually by the operator from time to time.
[Figure: a time daemon and machines whose clocks read 3:00, 3:25, and 2:50.]
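The averaging step can be sketched as below, using the three clock readings 3:00, 3:25, and 2:50 expressed in minutes. Which machine holds which reading is an assumption made here for illustration (the daemon is taken to be the one reading 3:00), and the function and machine names are hypothetical.

```python
from typing import Dict


def berkeley_adjustments(readings: Dict[str, float]) -> Dict[str, float]:
    """Return, per machine, how far its clock must move to reach the average."""
    if not readings:
        raise ValueError("no clock readings")
    average = sum(readings.values()) / len(readings)
    return {machine: average - value for machine, value in readings.items()}


# Readings in minutes since midnight.
readings = {
    "daemon": 3 * 60 + 0,      # 3:00 (assumed to be the daemon's own clock)
    "machine_a": 3 * 60 + 25,  # 3:25
    "machine_b": 2 * 60 + 50,  # 2:50
}
adjustments = berkeley_adjustments(readings)
for machine, minutes in adjustments.items():
    print(f"{machine}: {minutes:+.0f} min")
```

A negative adjustment means the machine is ahead of the average, so in practice it would slow its clock down gradually rather than set it back.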
Averaging Algorithms
Both Cristian’s and the Berkeley methods are highly centralized, with the usual disadvantages: a single point of failure, congestion around the server, and so on.
One class of decentralized clock synchronization algorithms works by dividing time into fixed-length resynchronization intervals. The ith interval starts at T0 + iR and runs until T0 + (i+1)R, where T0 is an agreed-upon moment in the past and R is a system parameter.
At the beginning of each interval, every machine broadcasts the current time according to its clock. After a machine broadcasts its time, it starts a local timer to collect all other broadcasts that arrive during some interval S. When all broadcasts have arrived, an algorithm is run to compute a new time. Some algorithms:
–Average the received times.
–Discard the m highest and m lowest values and average the rest, which protects against up to m faulty clocks sending out nonsense.
–Correct each message by adding to it an estimate of the propagation time from the source. This estimate can be made from the known topology of the network, or by timing how long it takes for probe messages to be echoed.
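The second variant, which drops the m highest and m lowest values before averaging, might look like this. The function name trimmed_mean and the sample values are illustrative assumptions; propagation-delay correction is left out.

```python
from typing import Sequence


def trimmed_mean(times: Sequence[float], m: int) -> float:
    """Average the collected times after discarding the m highest and m lowest."""
    if m < 0 or len(times) <= 2 * m:
        raise ValueError("need more than 2*m values")
    kept = sorted(times)[m:len(times) - m]
    return sum(kept) / len(kept)


# Five broadcasts collected during the interval S; one clock is faulty.
collected = [10.0, 11.0, 12.0, 13.0, 1000.0]
print(trimmed_mean(collected, m=0))  # plain average, dragged up by the faulty clock
print(trimmed_mean(collected, m=1))  # faulty value discarded
```

With m = 0 the function reduces to the plain average of the first variant.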