CMPT 431 Dr. Alexandra Fedorova Lecture VIII: Time And Global Clocks
2 CMPT 431 © A. Fedorova A Distributed Computation With Data Dependencies System A System B System C Ask work from System B Compute Send result to user Ask data from System C Respond to System A A waits for B System C became unresponsive
3 CMPT 431 © A. Fedorova Detecting and Fixing the Problem User gets no response from System A How do we detect that System C is causing the problem? Easy to detect if we get a snapshot of a global state Constructing global state in an asynchronous distributed system is a difficult problem
4 CMPT 431 © A. Fedorova A Naïve Approach Each system records each event it performed and its timestamp Suppose events in the this system happened in this real order: –Time Tc0: System C sent data to System B (before C stopped responding) –Time Ta0: System A asked for work from System B –Time Tb0: System B asked for data from System C Tc0Ta0Tb0
5 CMPT 431 © A. Fedorova A Naïve Approach (cont) Ideally, we will construct real order of events from local timestamps and detect this dependency chain: System A System B System C System C sent data Tc Ta System A asked for work Tb System B asked for data
6 CMPT 431 © A. Fedorova A Naïve Approach (cont) But in reality, we do not know if Tc occurred before Ta and Tb, because in an asynchronous distributed system clocks are not synchronized! System A System B System C System C sent data Tc Ta System A asked for work Tb System B asked for data System C sent data Tc
7 CMPT 431 © A. Fedorova This Lecture’s Topic We will learn how to solve this problem in an asynchronous distributed system Our goal is to construct a consistent global state in presence of unsynchronized local clocks We will begin by creating a formal model for –Distributed computation –Event precedence in a distributed computation We will use event precedence model to define consistent global state Then we will learn to construct consistent global states
8 CMPT 431 © A. Fedorova Model of a Distributed Computation Collection of processes p 0, …, p n Processes communicate by sending messages Event send(m) enqueues message m on the sending end Event receive(m) dequeues message m on the receiving end Events e i 0, e i 1, …, e i n are steps of a distributed computation at process p i h i = e i 0, e i 1, …, e i n is a local history of p i
9 CMPT 431 © A. Fedorova Events and Causal Precedence In an asynchronous distributed system there is no notion of global clock Events cannot be ordered according to some real order We can only talk about causal event ordering, that is if one event affects the outcome of another event For example events send(m) and receive(m) are in causal order, because receive(m) could not have occurred before send(m)
10 CMPT 431 © A. Fedorova Formal Rules for Ordering of Events If local events precede one another, then they also precede one another in the global ordering: –If e i k,e i m Є h i and k < m, then e i k →e i m Sending a message always precedes receipt of that message: –If e i = send(m) and e j = receive(m), then e i →e j Event ordering is associative: –If e → e’ and e’ → e”, then e → e”
11 CMPT 431 © A. Fedorova Space-time Diagram of a Distributed Computation p1p1 e11e11 e12e12 e13e13 e14e14 e15e15 e16e16 p2p2 e21e21 e22e22 e23e23 p3p3 e31e31 e32e32 e33e33 e34e34 e35e35 e36e36 e21→e36e21→e36 e 2 2 || e 3 6
12 CMPT 431 © A. Fedorova Cuts of a Distributed Computation Suppose there is an external monitor process External monitor constructs a global state: –Asks processes to send it local history Global state constructed from these local histories is a cut of a distributed computation
13 CMPT 431 © A. Fedorova Example Cuts e11e11 e12e12 e13e13 e14e14 e15e15 e16e16 e21e21 e22e22 e23e23 e31e31 e32e32 e33e33 e34e34 e35e35 e36e36 C C’ p1p1 p2p2 p3p3
14 CMPT 431 © A. Fedorova Consistent vs. Inconsistent Cuts A cut is consistent if for any event e included in the cut, any event e’ that causally precedes e is also included in that cut For cut C: (e Є C) Λ (e’→ e) => e’ Є C
15 CMPT 431 © A. Fedorova Are These Cuts Consistent? e11e11 e12e12 e13e13 e14e14 e15e15 e16e16 e21e21 e22e22 e23e23 e31e31 e32e32 e33e33 e34e34 e35e35 e36e36 C p1p1 p2p2 p3p3
16 CMPT 431 © A. Fedorova Are These Cuts Consistent? e11e11 e12e12 e13e13 e14e14 e15e15 e16e16 e21e21 e22e22 e23e23 e31e31 e32e32 e33e33 e34e34 e35e35 e36e36 C’ inconsistent included in C causally precedes e 3 6 …but not included in C
17 CMPT 431 © A. Fedorova Cuts and Global State Recall that our goal is to construct a consistent global state We’d like to get a snapshot of a distributed system as would have been observed by a perfect observer A consistent cut corresponds to a consistent global state What’s next: we will learn how to construct consistent cuts in the absence of synchronized clocks Consistent global states can be constructed via: –Active monitoring –Passive monitoring
18 CMPT 431 © A. Fedorova Passive vs. Active Monitoring Passive monitoring: –There is a process p 0 external to the distributed computation –There are processes p i (1 ≤ i ≤ n), part of the computation –Each process p i sends to p 0 a timestamp of each event it executes Active monitoring: –There is a process p 0 external to the distributed computation –When p 0 desires to find out the state of the system it asks all processes p i to send its state
19 CMPT 431 © A. Fedorova Passive vs. Active Monitoring You need both passive and active monitoring The kind of monitoring you use depends on what property of a distributed system you are trying to detect Passive monitoring gives you more information about the distributed computation than active monitoring –Passive monitoring records the entire event history Passive monitoring uses l ogical clocks or vector clocks – a way to timestamp an event in the absence of global clock Active monitoring does not depend of special clocks, but it is also less powerful
20 CMPT 431 © A. Fedorova What Do We Need to Know to Construct a Consistent Cut? e11e11 e12e12 e13e13 e14e14 e15e15 e16e16 e21e21 e22e22 e23e23 e31e31 e32e32 e33e33 e34e34 e35e35 e36e36 C’ inconsistent included in C causally precedes e 3 6 …but not included in C We must know the causal ordering of events. If we do we can detect an inconsistent cut
21 CMPT 431 © A. Fedorova Constructing a Consistent Cut Via Passive Monitoring We must know the causal ordering of event If there was a global clock, we would detect this as follows: RC(e) – timestamp of event e given by a global clock Clock condition: e’→ e => RC(e) < RC(e’) What to do if there is no global clock? We will use logical clocks and vector clocks
22 CMPT 431 © A. Fedorova Logical Clocks Each process maintains a local value of a logical clock LC Logical clock of process p counts how many events in a distributed computation causally preceded the current event at p (including the current event). LC(e i ) – the logical clock value at process p i at event e i Suppose we had a distributed system with only a single process e11e11 e12e12 e13e13 e14e14 e15e15 e16e16 LC=1LC=2LC=3LC=4LC=5LC=6
23 CMPT 431 © A. Fedorova Logical Clocks (cont.) In a system with more than one process logical clocks are updated as follows: Each message m that is sent contains a timestamp TS(m) TS(m) is the logical clock value associated with sending event at the sending process e11e11 e12e12 e13e13 e14e14 e15e15 e16e16 LC=1 send(m) TS(m) = 1
24 CMPT 431 © A. Fedorova Logical Clocks (cont) When the receiving process receives message m, it updates its logical clock to: max{LC, TS(m)} + 1 e11e11 e12e12 e13e13 e14e14 e15e15 e16e16 LC=1 send(m) TS(m) = 1 e22e22 What is the LC value of e 2 2 ? LC=2 e21e21 LC=1
25 CMPT 431 © A. Fedorova Illustration of a Logical Clock \ p1p1 p1p1 p1p1
26 CMPT 431 © A. Fedorova Causal Delivery with Logical Clocks We have a external monitor process p 0 Each process p i sends an event notification message m to p 0 after each event e i To let p 0 construct a correct causal ordering of events, we define the following delivery rule for event notification messages: Causal delivery rule: Event notification messages are delivered to p 0 in the increasing logical clock timestamp order
27 CMPT 431 © A. Fedorova Causal Delivery Rule p1p1 p2p2 p0p0 m(LC=1) m(LC=2)m(LC=4) m(LC=3) 1234
28 CMPT 431 © A. Fedorova Gap Detection Rule p1p1 p2p2 p0p0 m(LC=1) m(LC=2)m(LC=4) m(LC=3) 1234 m(LC=1) Oops! Must not deliver a message unless 100% sure that all messages with smaller timestamps have been received
29 CMPT 431 © A. Fedorova Rules to Achieve Causal Delivery Rule #1: Do not deliver event notification message from a process p i unless all messages with smaller timestamps have been delivered (FIFO delivery) Rule #2: Do not deliver to p event notification message with a timestamp TS(m) unless p received messages with timestamps one smaller, equal or greater than TS(m) from all other processes
30 CMPT 431 © A. Fedorova Causal Delivery Rules p1p1 p2p2 p0p0 m(LC=1) m(LC=2)m(LC=4) m(LC=3) 1 m(LC=1) 12
31 CMPT 431 © A. Fedorova Causal Precedence Relation Detecting causal precedence relationship with logical clocks requires sending more than just timestamps!
32 CMPT 431 © A. Fedorova Causal Precedence or Concurrency? p1p1 p2p2 p0p e11e11 e22e22 e 1 1 and e 2 2 are concurrent
33 CMPT 431 © A. Fedorova Causal Precedence or Concurrency? p1p1 p2p2 p0p e11e11 e22e22 Now e 1 1 and e 2 2 are causally related But timestamps have not changed!!!
34 CMPT 431 © A. Fedorova Causal Precedence Relation Using just timestamps from logical clocks does not help us detect whether two events are causally related or concurrent We could fix this by including a local history in the event notification message Not a good idea: messages will become large A better idea: vector clocks
35 CMPT 431 © A. Fedorova Introduction to Vector Clocks The clock of each process is represented by a vector The vector dimension is the number of processes in the system Each entry of a vector clock corresponds to a process Suppose there are three processes: p 1, p 2, and p 3 Then the vector clock at each process will look like this: VC[1] – entry for process p 1 VC[2] – entry for process p 2 VC[3] – entry for process p 3
36 CMPT 431 © A. Fedorova Using Vector Clocks p1p1 p2p2 VC[1] = 0 VC[2] = 0 VC[1] = 0 VC[2] = 0 Initialize to 0 VC[1] = 0 VC[2] = 1 VC[1] = 1 VC[2] = 0 Increment local component to “1” for an internal of “send” event VC[1] = 1 VC[2] = 2 VC[1] = 1 VC[2] = 0 Arrives with the message Increment other processes’ components to be no less than at the incoming VC Increment the local component
37 CMPT 431 © A. Fedorova The Meaning of Vector Clocks An entry of a vector clock for process p i is the number of events at p i VC[1] = 3 – number of events at p 1 VC[2] = 0 – number of events at p 2 VC[3] = 1 – number of events at p 3 p 1 knows for sure how many events it executed But how can it be sure about p 2 and p 3 ? It can’t! So p 1 ‘s entries for p 2 and p 3 are... what p 1 thinks about p 2 and p 3
38 CMPT 431 © A. Fedorova Meaning of Vector Clocks (cont) p 1 thinks incorrectly about other processes if p 1 does not communicate with other processes Once p 1 exchanges messages with other processes, it updates its information about other processes using vector clock timestamps received from other processes
39 CMPT 431 © A. Fedorova Another Vector Clock Example p1p1 p2p2 p3p3 VC[1] = 0 VC[2] = 0 VC[3] = 1 VC[1] = 1 VC[2] = 0 VC[3] = 0 VC[1] = 0 VC[2] = 1 VC[3] = 0 VC[1] = 2 VC[2] = 1 VC[3] = 0 VC[1] = 1 VC[2] = 0 VC[3] = 2 VC[1] = 1 VC[2] = 0 VC[3] = 3 VC[1] = 3 VC[2] = 1 VC[3] = 3 VC[1] = 1 VC[2] = 0 VC[3] = 4 VC[1] = 1 VC[2] = 2 VC[3] = 4 VC[1] = 4 VC[2] = 1 VC[3] = 3 VC[1] = 4 VC[2] = 3 VC[3] = 4 VC[1] = 1 VC[2] = 0 VC[3] = 5 VC[1] = 5 VC[2] = 1 VC[3] = 3 VC[1] = 5 VC[2] = 1 VC[3] = 6 VC[1] = 6 VC[2] = 1 VC[3] = 3
40 CMPT 431 © A. Fedorova Causal Precedence or Concurrency? p1p1 p2p2 p0p0 e11e11 e22e22 e 1 1 and e 2 2 are concurrent VC[1] = 1 VC[2] = 0 VC[1] = 0 VC[2] = 1 VC[1] = 0 VC[2] = 2 VC[1] = 2 VC[2] = 0 VC[1] = 1 VC[2] = 0 VC[1] = 0 VC[2] = 1 VC[1] = 0 VC[2] = 2 VC[1] = 2 VC[2] = 0
41 CMPT 431 © A. Fedorova Causal Precedence or Concurrency? p1p1 p2p2 p0p0 e11e11 e22e22 Now e 1 1 and e 2 2 are causally related VC[1] = 1 VC[2] = 0 VC[1] = 0 VC[2] = 1 VC[1] = 1 VC[2] = 2 VC[1] = 2 VC[2] = 0 VC[1] = 1 VC[2] = 0 VC[1] = 0 VC[2] = 1 VC[1] = 1 VC[2] = 2 VC[1] = 2 VC[2] = 0
42 CMPT 431 © A. Fedorova Vector Clocks Detect Causal Precedence Let us look at timestamps received by observer p 0 : In case when events were concurrent: –From p1: –From p2: In case when events where causally related: –From p1: –From p2: VC[1] = 2 VC[2] = 0 VC[1] = 0 VC[2] = 2 VC[1] = 2 VC[2] = 0 VC[1] = 1 VC[2] = 2 Now the timestamps help detect causal precedence
43 CMPT 431 © A. Fedorova Use Vector Clocks to Construct Consistent Cuts A consistent cut is a cut that does not include pairwise inconsistent events So the first step is to understand pairwise inconsistency
44 CMPT 431 © A. Fedorova Informal Definition of Pairwise Inconsistency Look at vector timestamps of two events: e i at p i and e j at p j If at event e i process p i thinks that process p j executed more events than process p j thinks it has executed at event e j, the events are pairwise inconsistent Example: timestamp at p 3 timestamp at p 1 –The events are pairwise inconsistent –Process p 1 must know better about what it’s doing than process p 3 could ever know. VC[1] = 5 VC[2] = 1 VC[3] = 6 VC[1] = 4 VC[2] = 1 VC[3] = 3 p 3 thinks that p 1 has executed 5 events p 1 has executed 4 events!
45 CMPT 431 © A. Fedorova Formal Definition of Pairwise Inconsistency Two events e i and e j, are pairwise inconsistent if: (VC(e i )[i] < VC(e j )[i]) ٧ (VC(e j )[j] < VC(e i )[j]) the number of events p i thinks to have executed at event e i the number of events p j thinks that p i has executed at event e j
46 CMPT 431 © A. Fedorova Cut Consistency Among all events in the frontier of the cut, check each pair of events for pairwise inconsistency If at least two events are pairwise inconsistent, then the cut is inconsistent
47 CMPT 431 © A. Fedorova Is this Cut Consistent? p1p1 p2p2 p3p3 VC[1] = 0 VC[2] = 0 VC[3] = 1 VC[1] = 1 VC[2] = 0 VC[3] = 0 VC[1] = 0 VC[2] = 1 VC[3] = 0 VC[1] = 2 VC[2] = 1 VC[3] = 0 VC[1] = 1 VC[2] = 0 VC[3] = 2 VC[1] = 1 VC[2] = 0 VC[3] = 3 VC[1] = 3 VC[2] = 1 VC[3] = 3 VC[1] = 1 VC[2] = 0 VC[3] = 4 VC[1] = 1 VC[2] = 2 VC[3] = 4 VC[1] = 4 VC[2] = 1 VC[3] = 3 VC[1] = 4 VC[2] = 3 VC[3] = 4 VC[1] = 1 VC[2] = 0 VC[3] = 5 VC[1] = 5 VC[2] = 1 VC[3] = 3 VC[1] = 5 VC[2] = 1 VC[3] = 6 VC[1] = 6 VC[2] = 1 VC[3] = 3
48 CMPT 431 © A. Fedorova And What About This One? p1p1 p2p2 p3p3 VC[1] = 0 VC[2] = 0 VC[3] = 1 VC[1] = 1 VC[2] = 0 VC[3] = 0 VC[1] = 0 VC[2] = 1 VC[3] = 0 VC[1] = 2 VC[2] = 1 VC[3] = 0 VC[1] = 1 VC[2] = 0 VC[3] = 2 VC[1] = 1 VC[2] = 0 VC[3] = 3 VC[1] = 3 VC[2] = 1 VC[3] = 3 VC[1] = 1 VC[2] = 0 VC[3] = 4 VC[1] = 1 VC[2] = 2 VC[3] = 4 VC[1] = 4 VC[2] = 1 VC[3] = 3 VC[1] = 4 VC[2] = 3 VC[3] = 4 VC[1] = 1 VC[2] = 0 VC[3] = 5 VC[1] = 5 VC[2] = 1 VC[3] = 3 VC[1] = 5 VC[2] = 1 VC[3] = 6 VC[1] = 6 VC[2] = 1 VC[3] = 3
49 CMPT 431 © A. Fedorova Active Monitoring Recall: we used vector clocks so we can construct consistent global states (or cuts) from processes’ local histories via passive monitoring Remember, there is another type of monitoring – active monitoring: –Does not depend of vector clocks, but it is less powerful Active monitoring is also referred to as constructing a distributed snapshot
50 CMPT 431 © A. Fedorova Distributed Snapshot Protocol (Chandy and Lamport) Monitor p 0 sends “take snapshot” message to all processes p i If p i receives such “take snapshot” message for the first time, it: –Records its state –Stops doing any activity related to the distributed computation –Relays the “take snapshot” message on all of its outgoing channels –Starts recording a state of its incoming channels – records all messages that have been delivered after the receipt of the first “take snapshot” message
51 CMPT 431 © A. Fedorova Distributed Snapshot Protocol (Chandy and Lamport)(cont.) When p i receives a “take snapshot” message for the second time, say from process p j, it stops recording the state of the channel between itself and p j When p i receives “take snapshot” message beyond the first one from all processes in a distributed computation, it stops recording the snapshot and sends it to p o
52 CMPT 431 © A. Fedorova Properties of Chandy-Lamport Protocol Let’s see why this protocol constructs only consistent snapshots, provided that FIFO channels are used Recall the key property of a consistent snapshot (i.e., consistent cut): –If event (e → e”) and e” is included in the snapshot, then e must also be included in that snapshot Let’s see why this property holds if we use the Chandy- Lamport distributed snapshot protocol
53 CMPT 431 © A. Fedorova Properties of Chandy-Lamport Protocol e11e11 e21e21 “take snapshot” message p1p1 p2p2 p0p0 This is the only situation when e 2 1 would be included in the snapshot and e 1 1 would not be Suppose p 1 finished taking the snapshot before p 2 received the first “take snapshot” message Let’s see why this won’t happen snapshot finish Can it happen that e 2 1 is included in the snapshot and e 1 1 is not?
54 CMPT 431 © A. Fedorova Properties of Chandy-Lamport Protocol e11e11 e21e21 p1p1 p2p2 p0p0 Recall that processes relay the first “take snapshot” message on all outgoing channels So p 1 cannot have finished taking the snapshot before p 2 received the first “take snapshot” message. By protocol, p 1 cannot finish the snapshot until it receives the second “take snapshot” message from all other processes snapshot finish So that situation will not happen snapshot finish
55 CMPT 431 © A. Fedorova Using Global States for Evaluating System Properties We now know how to construct consistent global states using active and passive monitoring Let us look how consistent global states can be used to solve problems in distributed systems Global state is constructed in order to evaluate a predicate: –“Is the system in deadlock?” –“Is the memory unreachable (can it be garbage collected)?” –“Does x=y?” (For distributed debugging)
56 CMPT 431 © A. Fedorova Stable and Unstable Predicates Predicates can be stable and unstable A stable predicate does not change its value after the system has reached this state –Deadlock – once the system deadlocks it stays there –Garbage collection – once the memory is no longer referenced, no one could acquire a reference for it An unstable predicate can change its value –Does x=y? If this were true at one point, it could no longer be true later Stable predicates are easy to check – they can be checked from distributed snapshots or vector clock based causal event histories
57 CMPT 431 © A. Fedorova Unstable Predicates Are Harder to Check e12e12 e21e21 p1p1 p2p2 VC[1] = 1 VC[2] = 0 e11e11 VC[1] = 0 VC[2] = 1 x = 3x = 4 e13e13 e22e22 y = 6 y = 4 VC[1] = 2 VC[2] = 1 VC[1] = 0 VC[2] = 2 Want to evaluate predicate: (y-x) = 2 An external monitor knows causal ordering of events, not the actual Valid actual sequences are: 1.e 1 1 e 2 1 e 1 2 e 1 3 e e 1 1 e 2 1 e 1 2 e 2 2 e 1 3 They are both valid because they preserve causal ordering e 2 1 → e 1 2
58 CMPT 431 © A. Fedorova Unstable Predicates e12e12 e21e21 p1p1 p2p2 VC[1] = 1 VC[2] = 0 e11e11 VC[1] = 0 VC[2] = 1 x = 3x = 4 e13e13 e22e22 y = 6 y = 4 VC[1] = 2 VC[2] = 1 VC[1] = 0 VC[2] = 2 Valid actual sequences: e 1 1 e 2 1 e 1 2 e 1 3 e 2 2 y –x = 2 is TRUE after e 1 3 e 1 1 e 2 1 e 1 2 e 2 2 e 1 3 y –x = 2 is NOT TRUE y = 6x = 4 y = 6 y = 4x = 4 x = 3
59 CMPT 431 © A. Fedorova Detecting “Possibly True” With unstable predicates you cannot always detect whether a predicate was definitely true But you can detect if it was possibly true Need to look at all possible valid sequences of events If the predicate is true in at least one of them, then it is possibly true Detecting “possibly true” is useful for distributed debugging –If a wrong condition is possibly true, the system has a bug –Even if the bug was not triggered it is possible that it will be triggered under a different ordering of events
60 CMPT 431 © A. Fedorova Detecting “Definitely True” Sometimes you can detect definitely true Construct all valid system states If the condition is true in all of them, then the condition is definitely true Any actual ordering of events would have triggered the condition, given the correct causal ordering
61 CMPT 431 © A. Fedorova Summary Observing a global state of the system is useful for: –Distributed debugging –Distributed garbage collection –Deadlock detection Constructing a consistent global state is difficult in absence of global clocks Consistent global states can be constructed using –Active monitoring (distributed snapshots) –Passive monitoring (local event histories with vector clock timestamps)
62 CMPT 431 © A. Fedorova Summary (cont.) Distributed snapshots have limited use –Can be used to evaluate stable predicates Local event histories with vector clock timestamps contain more information – one can reconstruct a causal history of a distributed computation –Can be used to evaluate unstable predicates Unstable predicates cannot always be evaluated with certainty –One can always evaluate “possibly true” –Cannot always evaluate “definitely true”