Time and Global States ECEN5053 Software Engineering of Distributed Systems University of Colorado, Boulder.

Time and Global States ECEN5053 Software Engineering of Distributed Systems University of Colorado, Boulder

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 2 Topics  Clock synchronization  Logical clocks  Global State

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 3 How processes can synchronize  Multiple processes must be able to cooperate in granting each other temporary exclusive access to a resource  Also, multiple processes may need to agree on the ordering of events, such as whether message m1 from process P was sent before or after message m2 from process Q.

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 4 Centralized system  Time is unambiguous  If a process wants to know the time, it makes a system call and finds out  If process A asks for the time and gets it and then process B asks for the time and gets it, the time that B was told will be later than the time that A was told.  Simple, no?

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 5 Physical Clocks  Physical computer clocks are not clocks; they are timers  Quartz crystal that oscillates at a well-defined frequency that depends on physical properties  Two registers: counter and a holding register  Each oscillation decrements the counter by one  When counter reaches zero, generates an interrupt and the counter is reloaded from the holding register  Each interrupt is called a clock tick  Interrupt service procedure adds 1 to time stored in memory so the software clock is kept up to date

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 6 The one and the many  What if the clock is “off” by a little?  All processes on single machine use the same clock so they will still be internally consistent  What matters is relative time  Impossible to guarantee that crystals in different computers run at exactly the same frequency  Gradually software clocks get out of synch -- skew  A program that expects time to be independent of the machine on which it is run... fails

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 7 Hey buddy, can you spare me a second?  To provide UTC (translates as Universal Coordinated Time) to those who need precise time, NIST operates a short wave radio station WWV from Fort Collins, CO  WWV broadcasts a short pulse at the start of each second  There are stations in other countries plus satellites  Using either short wave or satellite services requires an accurate knowledge of the relative position of the sender and receiver. Why?

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 8 To WWV or not to WWV  If one computer has a WWV receiver, the goal is keeping all the others synchronized to it.  If no machines have WWV receivers, each machine keeps track of its own time  Goal -- keep all machines together as well as possible  There are many algorithms

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 9 Underlying model for synchronization models  Each machine has a timer that interrupts H times a second  Interrupt handler adds 1 to a software clock that keeps track of the number of ticks since some agreed-upon time in the past  Call the value of the clock C  Notationally, when UTC time is t, the value of the clock on machine p is C p (t)  In a perfect world, C p (t) = t for all p and all t

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 10 Back to reality  Theoretically, a timer with H=60 should generate 216,000 ticks per hour  Relative error is about 10^-5 meaning a particular machine gets a value in the range 215,998 to 216,002  There is a constant called the maximum drift rate and a timer will work with “perfect” + maximum drift rate.  If two clocks are drifting in the opposite direction at a time delta-t after they were synchronized  may be as much as twice the max drift rate apart  To differ by no more than delta, clocks must be resynchronized every (delta/2*max-drift-rate) seconds

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 11 Cristian’s algorithm  Well suited to one machine with a WWV receiver and a goal to have all other machines stay synchronized with it.  Call the one with the WWV receiver the time server  Periodically, each machine sends a message to the time server asking for the current time  Machine responds with C UTC as fast as it can  1st approximation, requester sets its clock to C UTC  What’s wrong with that?

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 12 Big Trouble  Major problem  Time really should never run backward -- why?  If sender’s clock was fast, C UTC will be smaller than the sender’s current value of C  Change must be introduced gradually  If timer generates 100 interrupts/second, each interrupt adds 10 ms to the time  To slow down, ISR adds only 9 ms until correct  To speed up, add 11 ms at each interrupt

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 13 Little Trouble  Minor problem  Takes a nonzero amount of time for the time server’s reply to get back to the sender  Delay may be large and vary with network load  Cristian attempts to measure send and receive times, subtract, divide by 2; add this to received C UTC  Better: length of time server’s ISR, I, and incoming message processing time: (T 1 - T 0 - I)/2  To improve accuracy, measure several and average

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 14 If no WWV Receiver  Berkeley UNIX algorithm  The time server (actually time daemon) is active, not passive  It polls every machine and asks what time it is  Based on answers, it computes an average time and tells all machines to adjust their clocks to the new time  The time daemon’s time is set manually by the operator periodically  Centralized algorithm though the time daemon does not have a WWV receiver

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 15 Decentralized synchronization  Cristian and Berkeley UNIX are centralized algorithms with the usual downside. What?  There are several decentralized algorithms, for example:  Divide time into fixed length resynchronization intervals  At the beginning of each interval, every machine broadcasts its current time  Each starts a local timer to collect all broadcasts arriving during a certain interval  Algorithm to compute a new time based on some/all

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 16 Internet Synchronization  New hardware and software technology in the past few years make it possible to keep millions of clocks synchronized to within a few ms of UTC  New algorithms using these synchronized clocks are beginning to appear  Synchronized clocks can be used  to achieve cache consistency  to use time-out tickets in distributed system authentication  to handle commitment in atomic transactions

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 17 Logical Clocks  See also notes from 3 weeks ago  For many purposes, it is sufficient that machines agree on the same time even if it is not the “right” time  Internal consistency of the clocks matters  Clock synchronization is possible but does not have to be absolute  If 2 processes do not interact, their clocks need not be synchronized; the lack of synch would not be seen  What is important is that all processes agree on the order in which events occur

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 18 Lamport timestamps  a happens-before b means that all processes agree that first event a occurs, then afterward, event b occurs  We write a happens-before b as a --> b  If a occurs before b in the same process, we say a --> b is true  If the event a sends a message and event b receives that message in another process, a --> b is also true because a message cannot be received until after it is sent.  happens-before is transitive

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 19 Ya cain’t say  If x and y happen in different processes that do not exchange messages, then  we cannot say x --> y  we cannot say y --> x  nothing can be said about when the events happened or which event happened first  we call these events concurrent

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 20 Invent time  Need a way of measuring time so that for every event a we can assign a time C(a) on which all processes agree.  Such that, if a --> b, then C(a) < C(b)  If a and b are two events in the same process and a happens before b, then C(a) < C(b)  If a is the sending of a msg by one process and b is the receiving of that msg by another, then C(a) and C(b) must be assigned so that everyone agrees on the values of C(a) and C(b) with C(a) < C(b)  Corrections to C can only be made by addition, never subtraction so that the clock time always goes forward

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 21 If msg leaves at time N, it arrives at >= N+1  Each message carries the time according to its sender’s clock  When it arrives, if the receiver’s clock shows a value prior to the time the message was sent, the receiver fast forwards its clock to be 1 more than the sending time  Between every two events the clock must tick at least once  If a process sends or receives 2 messages in quick succession, it must advance its clock by (at least) 1 tick in between  No 2 events ever occur at exactly the same time

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 22 Totally-ordered Multicast  Consider a bank with replicated data in San Francisco and New York City.  Customer in SF wants to add $100 to the account of $1000  Meanwhile, a bank employee in NY initiates an update by which the customer’s account will be increased with 1% interest.  Due to communication delays, the instructions could arrive at the replicated sites in different orders with differing final answers  Should have been performed at both sites in same order

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 23 Using Lamport timestamps to get totally ordered multicast  Consider group of processes multicasting messages to each other  Each message is timestamped with the current (logical) time of its sender  Conceptually, if multicast, the msg is also sent to its sender  We assume msgs from the same sender are received in the order they were sent and that no messages were lost

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 24 totally ordered multicast (cont.)  When a process receives a message, it goes into a local queue ordered according to its timestamp  The receiver multicasts an acknowledgement  Using Lamport’s algorithm for adjusting local clocks, the timestamp of the received msg is lower than the timestamp of the acknowledgement  All processes will eventually have the same copy of the local queue because each msg is multicast, plus acks  We assumed msgs are delivered in the order sent by sender

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 25  Each process inserts a received msg in its local queue according to the timestamp in that msg.  Lamport’s clocks ensure no two messages have the same timestamp  Also, the timestamps reflect a consistent global ordering of events  A process delivers a queued msg to the application it is running when that message is at the head of the queue and has been acknowledged by each other process  The msg removed from queue; associated acks removed. totally ordered multicast (cont. more)

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 26 Vector Timestamps  With Lamport timestamps, nothing can be said about the relationship between a and b simply by comparing their timestamps C(a) and C(b).  Just because C(a) < C(b), doesn’t mean a happened before b (remember concurrent events)  Consider network news where processes post articles and react to posted articles  Postings are multicast to all members  Want reactions delivered after associated postings

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 27 Will totally-ordered multicasting work?  That scheme does not mean that if msg B is delivered after msg A, B is a reaction to msg A. They may be completely independent.  What’s missing?  If causal relationships are maintained within a group of processes, then receipt of a reaction to an article should always follow the receipt of the article.  If two items are independent, their order of delivery should not matter at all

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 28 Vector Timestamps capture causality  VT(a) < VT(b) means event a causally precedes event b.  Let each process Pi maintain vector Vi such that  V i [i] is the number of events that have occurred so far at P i  If V i [j] = k then P i knows that k events have occurred at P j  We increment V i [i] at the occurrence of each new event that happens at process P i  Piggyback vectors with msgs that are sent. When P i sends msg m, it sends its current vector along as a timestamp vt.

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 29  Receiver thus knows the number of events that have occurred at P i  Receiver is also told how many events at other processes have taken place before P i sent message m.  timestamp vt of m tells the receiver how many events in other processes have preceded m and on which m may causally depend  When P j receives m, it adjusts its own vector by setting each entry V j [k] to max{V j [k], vt[k]}  The vector now reflects the # of msgs that P j must receive to have at least seen the same msgs that preceded the sending of m.  V j [i] is incremented by 1 representing the event of receiving msg m as the next message from P i

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 30 When are messages delivered?  Vector timestamps are used to deliver msgs when no causality constraints are violated.  When process Pi posts an article, it multicasts that article as a msg a with timestamp vt(a) set equal to Vi.  When another process Pj receives a, it will have adjusted its own vector such that Vj[i] > vt(a)[i]  Now suppose Pj posts a reaction by multicasting msg r with timestamp vt(r) equal to Vj. vt(r)[i] > vt(a)[i].  Both msg a and msg r will arrive at Pk in some order

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 31  When receiving r, P k inspects timestamp vt(r) and decides to postpone delivery until all msgs that causally precede r have been received as well.  In particular, r is delivered only if the following conditions are met 1. vt(r)[j] = V k [j] + 1 2. vt(r)[i] <= V k [i] for all i not equal to j 1. says r is the next msg P k was expecting from P j 2. says P k has seen no msg not seen by P j when it sent r. In particular, P k has already seen message a.

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 32 Controversy  There has been some debate about  whether support for totally-ordered and causally- ordered multicasting should be provided as part of the message-communication layer or  whether applications should handle ordering  Comm layer doesn’t know what it contains, only potential causality  2 msgs from same sender will always be marked as causally related even if they are not  Application developer may not want to think about it

Global State

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 34 Global state of a distributed system  Local state of each process  The messages that are currently in transit (sent but not received)

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 35 Who cares, globally speaking?  When it is known that local computations have stopped and that there are no more messages in transit, the system has obviously entered a state in which no more progress can be made.  deadlocked?  correctly terminated?

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 36 How to record the global state  Distributed snapshot  reflects a state in which the distributed system might have been  reflects a consistent global state  If we have recorded that process P has received a msg from another process Q, then we should also have recorded that process Q had actually sent the msg  The reverse condition (Q has sent a msg that P has not yet received) is allowed.

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 37 Cut!  A cut represents the last event that has been recorded for each of several processes.  All recorded msg receipts have a corresponding recorded send event  An inconsistent cut would have a receipt of a msg but no corresponding send event

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 38 The algorithm (Chandy & Lamport)  Assume the distributed system can be represented as a collection of processes connected to each other through uni-directional point-to-point communication channels.  Any process may initiate the algorithm.  P records its own local state  It sends a marker along each of its outgoing channels, indicating that the receiver should participate in recording the global state ...

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 39 Chandy & Lamport algorithm, cont  When process Q receives the marker through an incoming channel C, its action depends on whether or not it has already saved its local state  If it has not  it first records its local state and also sends a marker along its own outgoing channels  If it has  the marker on channel C is an indicator that Q should record the state of the channel, namely, the sequence of messages received by Q since the last time it recorded its own local state and before it received the marker.

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 40 Chandy & Lamport algorithm, cont  A process has finished its part of the algorithm when it has received a marker along each of its incoming channels and processed each one.  Its recorded local state as well as the state it recorded for each incoming channel, can be collected and sent to the process that initiated the snapshot  The initiator can subsequently analyze the current state  Meanwhile, the distributed system as a whole can continue to run normally

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 41 Photo album  Because any process can initiate the algorithm, the construction of several snapshots may be in progress at the same time  A marker is tagged with the identifier and possibly also a version number of the process that initiated the snapshot  Only after a process has received that marker through each of its incoming channels, can it finish its part in the construction of the marker’s associated snapshot

Application of a snapshot

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 43 Termination Detection  If a process Q receives the marker requesting a snapshot for the first time,  considers the process that sent that marker as its predecessor  When Q completes its part of the snapshot, it sends its predecessor a DONE msg.  By recursion, when the initiator of the distributed snapshot has received a DONE msg from all of its successors, it knows the snapshot has been completely taken

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 44 What if msgs still in transit?  A snapshot may show a global state in which msgs are still in transit  Suppose a process records that it had rec’d msgs along one of its incoming channels  between the point where it had recorded its local state  and the point where it received the marker through that channel  Cannot conclude the distributed computation is completed  Termination requires a snapshot in which all channels are empty

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 45 Modified algorithm  When a process Q finishes its part of a snapshot, it either returns DONE or CONTINUE to its predecessor  A DONE msg is returned only when  All of Q’s successors have returned a DONE msg  Q has not received any msg between the point it recorded its own local state and the point it had received the marker along each of its incoming channels  In all other cases, Q sends a CONTINUE msg to its predecessor

November 2, 2005ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado 46 Modified algorithm, continued  The original initiator of the snapshot will either receive at least one CONTINUE or only DONE msgs from its successors  When only DONE messages are received, it is known that no regular msgs are in transit  Conclusion? The computation has terminated.  If a CONTINUE appears, P initiates another snapshot and continues to do so until only DONE msgs are returned. (There are lots of other algorithms, too.)

Time and Global States ECEN5053 Software Engineering of Distributed Systems University of Colorado, Boulder.

Similar presentations

Presentation on theme: "Time and Global States ECEN5053 Software Engineering of Distributed Systems University of Colorado, Boulder."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Time and Global States ECEN5053 Software Engineering of Distributed Systems University of Colorado, Boulder.

Similar presentations

Presentation on theme: "Time and Global States ECEN5053 Software Engineering of Distributed Systems University of Colorado, Boulder."— Presentation transcript:

Similar presentations

About project

Feedback