Logical Time and Logical Clocks
Béat Hirsbrunner
References
G. Coulouris, J. Dollimore and T. Kindberg, "Distributed Systems: Concepts and Design", 4th ed., Addison-Wesley, 2005, Chap. 11.4
Michel Raynal and Mukesh Singhal, "Capturing Causality in Distributed Systems", IEEE Computer, Feb. 1996, pp. 49-56
Distributed Systems
Béat Hirsbrunner (UniFr), Peter Kropf (UniNe) and Pierre Kuonen (EiaFr)
Summer Semester 2007, Lecture 2b, 30 March 2007
Causality
In relativity, if L causes R, then the order of L and R must be the same for all observers.
Wouldn't this be a nice property for a distributed computer system?
Logical Time: Basic Observation
If two events occurred at the same process pi (i = 1, 2, …, N), then they occurred in the order observed by pi, that is, in the order →i.
When a message m is sent between two processes, the event of sending the message occurred before the event of receiving it: send(m) → receive(m).
Fig. 11.5. Events occurring at three processes
Lamport's Happened-Before Relation
HB1: for any pair of events e and e’, if there is a process pi such that e →i e’, then e → e’.
HB2: for any pair of events e and e’ and for any message m, if e = send(m) and e’ = receive(m), then e → e’.
HB3: if e, e’ and e’’ are events and if e → e’ and e’ → e’’, then e → e’’ (transitivity).
Remarks.
HB defines only a partial order, i.e. not all events are related by the relation →.
Concurrency: if neither e → e’ nor e’ → e holds, the events e and e’ are concurrent, denoted e || e’.
HB captures only potential causality, i.e. two events can be related by → even though there is no real connection between them.
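The three rules can be sketched in a few lines of Python as a transitive closure over explicit event lists. This is only an illustration: the function name `happened_before` and the event layout (three processes with events a, b / c, d / e, f and two messages b → c and d → f) are assumptions made for the example, not something the slides specify.

```python
from itertools import product


def happened_before(process_events, messages):
    """Return the set of pairs (e, e2) with e -> e2.

    process_events: one list of events per process, in local order.
    messages: (send_event, receive_event) pairs.
    Hypothetical helper written for this sketch.
    """
    hb = set()
    for events in process_events:
        # HB1: consecutive events at the same process are ordered.
        hb.update(zip(events, events[1:]))
    # HB2: send(m) -> receive(m).
    hb.update(messages)
    # HB3: transitive closure (Warshall's algorithm, k is the outer loop).
    all_events = [e for events in process_events for e in events]
    for k, i, j in product(all_events, repeat=3):
        if (i, k) in hb and (k, j) in hb:
            hb.add((i, j))
    return hb


def concurrent(hb, e, e2):
    # e || e2: neither e -> e2 nor e2 -> e.
    return (e, e2) not in hb and (e2, e) not in hb


# Assumed example layout: three processes, two messages.
hb = happened_before([["a", "b"], ["c", "d"], ["e", "f"]],
                     [("b", "c"), ("d", "f")])
print(("a", "f") in hb)          # a -> b -> c -> d -> f
print(concurrent(hb, "a", "e"))  # no chain links a and e
```

In this assumed layout a → f follows from a chain of HB1 and HB2 steps closed under HB3, while a and e are concurrent because no chain connects them in either direction.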
Lamport's Logical Clock
In a system of logical clocks, every process has a logical clock that is advanced using a set of rules.
Lamport's logical clocks capture the happened-before relation →: each process pi keeps its own clock Li. Lamport's timestamp of event e at pi is denoted by Li(e).
The rules to update Li are:
LC1: Li is incremented before each event at pi: Li := Li + 1.
LC2a: When a process pi sends a message m, it piggybacks on m the value t := Li.
LC2b: On receiving (m, t), a process pj computes Lj := max(Lj, t) and then applies LC1 before timestamping the event receive(m).
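A minimal Python sketch of rules LC1, LC2a and LC2b might look as follows. The class name `LamportClock`, its method names, and the initial clock value 0 are assumptions made for the example; the slide does not fix an initial value.

```python
class LamportClock:
    """Hypothetical helper: the logical clock Li of one process pi."""

    def __init__(self):
        self.time = 0  # assumed initial value, not stated on the slide

    def tick(self):
        # LC1: increment Li before timestamping an event.
        self.time += 1
        return self.time

    def send(self):
        # Sending is itself an event, so LC1 applies first;
        # LC2a: the returned value t := Li is piggybacked on the message.
        return self.tick()

    def receive(self, t):
        # LC2b: Lj := max(Lj, t), then LC1 before timestamping receive(m).
        self.time = max(self.time, t)
        return self.tick()


p1, p2 = LamportClock(), LamportClock()
a = p1.tick()      # local event at p1
t = p1.send()      # p1 sends m carrying t
r = p2.receive(t)  # p2 receives (m, t)
print(a, t, r)     # prints: 1 2 3
```

The receive timestamp is always larger than the piggybacked value, which is what makes send(m) → receive(m) show up as L(send(m)) < L(receive(m)).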
Example
Fig. 11.6
Remark 1. Note that e → e’ => L(e) < L(e’). Unfortunately, the converse is not true!
Remark 2. L generates only a partial order, i.e. some distinct events have numerically identical Lamport timestamps. A total order can be defined as follows: we timestamp an event e occurring at pi with local timestamp Ti by (Ti, i), and define (Ti, i) < (Tj, j) if and only if Ti < Tj, or Ti = Tj and i < j.
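The tie-breaking rule of Remark 2 is a lexicographic comparison on pairs, which can be sketched in Python as below. The function name `before` and the sample timestamps are illustrative assumptions, not values taken from Fig. 11.6.

```python
def before(ts1, ts2):
    """(Ti, i) < (Tj, j) iff Ti < Tj, or Ti = Tj and i < j."""
    (Ti, i), (Tj, j) = ts1, ts2
    return Ti < Tj or (Ti == Tj and i < j)


# Illustrative (timestamp, process index) pairs; two share the timestamp 1.
stamps = [(2, 1), (1, 3), (1, 1), (3, 2)]

# Python compares tuples lexicographically, which coincides with the rule above.
ordered = sorted(stamps)
print(ordered)                 # prints: [(1, 1), (1, 3), (2, 1), (3, 2)]
print(before((1, 1), (1, 3)))  # equal timestamps, lower process index first
```

Note that the order between (1, 1) and (1, 3) is arbitrary with respect to causality: it only makes the order total, it does not say that one event affected the other.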
Vector Clocks
Vector clocks overcome the shortcoming of Lamport's logical clocks: L(e) < L(e’) does not imply that e happened before e’.
As with Lamport's clocks, each process keeps its own vector clock Vi:
Vi[i] is the number of events that pi has timestamped,
Vi[j], i ≠ j, is the number of events that have occurred at pj and by which pi has potentially been affected.
The rules to update the vector Vi of N integers are:
VC1: Initially Vi[j] := 0 for i, j = 1, 2, …, N.
VC2: Just before pi timestamps an event, it sets Vi[i] := Vi[i] + 1.
VC3: pi piggybacks t := Vi on every message it sends.
VC4: When pi receives (m, t), it sets Vi[j] := max(Vi[j], t[j]) for all j, and then applies VC2 before timestamping the event receive(m).
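Rules VC1 to VC4 can be sketched in Python as follows. The class name `VectorClock` and its methods are assumptions made for the example, and process indices are 0-based here, whereas the slide counts from 1.

```python
class VectorClock:
    """Hypothetical helper: the vector clock Vi of process pi (0-based i)."""

    def __init__(self, i, n):
        self.i = i
        self.v = [0] * n  # VC1: all entries start at 0

    def tick(self):
        # VC2: increment the own entry just before timestamping an event.
        self.v[self.i] += 1
        return list(self.v)

    def send(self):
        # Sending is an event (VC2); VC3: piggyback a copy of Vi.
        return self.tick()

    def receive(self, t):
        # VC4: component-wise maximum, then VC2 for the receive event.
        self.v = [max(a, b) for a, b in zip(self.v, t)]
        return self.tick()


p1, p2 = VectorClock(0, 3), VectorClock(1, 3)
a = p1.tick()      # local event at p1
t = p1.send()      # p1 sends m carrying t
r = p2.receive(t)  # p2 receives (m, t)
print(a, t, r)     # prints: [1, 0, 0] [2, 0, 0] [2, 1, 0]
```

Returning copies of the vector from `tick` keeps earlier timestamps from changing when the clock advances later.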
Example
Fig. 11.7
Vector comparison definition:
V = V’ iff V[j] = V’[j] for all j
V ≤ V’ iff V[j] ≤ V’[j] for all j
V < V’ iff V ≤ V’ and V ≠ V’
Example:
V(b) < V(d), i.e. b → d
V(e) and V(d) are unordered (neither V(e) ≤ V(d) nor V(d) ≤ V(e)), i.e. e || d
Theorem: e → e’ <=> V(e) < V(e’)
A last remark: matrix clocks have been defined, whereby processes keep estimates of other processes' times as well as their own, cf. [Raynal 96].
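The comparison rules translate directly into Python. In the sketch below the function names are assumptions, and the three vectors are assumed values chosen to be consistent with the relations b → d and e || d stated in the example; they are not read off Fig. 11.7.

```python
def leq(v, w):
    # V <= V' iff V[j] <= V'[j] for all j.
    return all(a <= b for a, b in zip(v, w))


def lt(v, w):
    # V < V' iff V <= V' and V != V'.
    return leq(v, w) and v != w


def concurrent(v, w):
    # Timestamps of two distinct events that are ordered in neither direction.
    return not lt(v, w) and not lt(w, v)


# Assumed timestamps for events b, d and e.
V_b, V_d, V_e = [2, 0, 0], [2, 2, 0], [0, 0, 1]

print(lt(V_b, V_d))          # prints: True   (b -> d)
print(concurrent(V_e, V_d))  # prints: True   (e || d)
```

By the theorem, `lt` on vector timestamps decides happened-before exactly, which Lamport timestamps alone cannot do.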