Logical Clocks. Topics Logical clocks Totally-Ordered Multicasting Vector timestamps.


Logical Clocks

Topics Logical clocks Totally-Ordered Multicasting Vector timestamps

Readings
- Van Steen and Tanenbaum: Section 5.2
- Coulouris: Section 10.4
- L. Lamport, "Time, Clocks and the Ordering of Events in a Distributed System," Communications of the ACM, Vol. 21, No. 7, July 1978.
- C.J. Fidge, "Timestamps in Message-Passing Systems that Preserve the Partial Ordering," Proceedings of the 11th Australian Computer Science Conference, Brisbane, February 1988.

Ordering of Events For many applications, it is sufficient to agree on the order in which events occur; the actual time of occurrence is not needed. A logical clock can be used to order events unambiguously, and it may be totally unrelated to real time. Lamport showed this is possible (1978).

The Happened-Before Relation Lamport’s algorithm synchronizes logical clocks and is based on the happened-before relation, where a → b is read as "a happened before b". The happened-before relation is defined as follows:
- If a and b are events in the same process and a occurs before b, then a → b.
- For any message m, send(m) → rcv(m), where send(m) is the event of sending the message and rcv(m) is the event of receiving it.
- If a, b, and c are events such that a → b and b → c, then a → c.

The Happened-Before Relation If two events, x and y, happen in different processes that do not exchange messages, then x → y is not true, but neither is y → x. The happened-before relation is sometimes referred to as causality.

Example Say in process P1 you have a code segment as follows:
  1.1 x = 5;
  1.2 y = 10*x;
  1.3 send(y,P2);
Say in process P2 you have a code segment as follows:
  2.1 a = 8;
  2.2 b = 20*a;
  2.3 rcv(y,P1);
  2.4 b = b+y;
Let's say that you start P1 and P2 at the same time. You know that 1.1 occurs before 1.2, which occurs before 1.3. You know that 2.1 occurs before 2.2, which occurs before 2.3, which occurs before 2.4. You do not know whether 1.1 occurs before 2.1 or 2.1 occurs before 1.1. You do know that 1.3 occurs before 2.3 and 2.4.

Example Continuing from the previous slide: the order of actual occurrence of operations is often not consistent from execution to execution. For example:
- Execution 1 (order of occurrence): 1.1, 1.2, 1.3, 2.1, 2.2, 2.3, 2.4
- Execution 2 (order of occurrence): 2.1, 2.2, 1.1, 1.2, 1.3, 2.3, 2.4
- Execution 3 (order of occurrence): 1.1, 2.1, 2.2, 1.2, 1.3, 2.3, 2.4
We can say that 1.1 "happens before" 2.3, but not that 1.1 "happens before" 2.2 or that 2.2 "happens before" 1.1. Note that all three executions produce the same result.

Lamport’s Algorithm We need a way of measuring time such that every event a can be assigned a time value C(a) on which all processes agree, satisfying the following:
- The clock time C must increase monotonically, i.e., always go forward.
- If a → b then C(a) < C(b).
Each process p maintains a local counter C_p. The counter is adjusted based on the rules presented on the next slide.

Lamport’s Algorithm 1. C_p is incremented before each event is issued at process p: C_p = C_p + 1. 2. When p sends a message m, it piggybacks on m the value t = C_p. 3. On receiving (m,t), process q computes C_q = max(C_q, t) and then applies the first rule before timestamping the event rcv(m).
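The three rules above can be sketched in a few lines of Python. This is a minimal illustration; the class and method names are our own, not taken from the readings.

```python
class LamportClock:
    """Minimal sketch of Lamport's logical clock rules (illustrative names)."""

    def __init__(self):
        self.time = 0

    def event(self):
        # Rule 1: increment the counter before each local event.
        self.time += 1
        return self.time

    def send(self):
        # Rule 2: the sender piggybacks t = C_p on the outgoing message.
        return self.event()

    def receive(self, t):
        # Rule 3: C_q = max(C_q, t), then apply rule 1 before
        # timestamping the receive event.
        self.time = max(self.time, t)
        return self.event()

p = LamportClock()
q = LamportClock()
t = p.send()      # p's clock becomes 1; t = 1 travels with the message
q.event()         # an unrelated local event at q; q's clock becomes 1
r = q.receive(t)  # q's clock becomes max(1, 1) + 1 = 2
```

Note that the receive timestamp (2) is guaranteed to be larger than the send timestamp (1), which is exactly the send(m) → rcv(m) clock condition.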

Example [Timing diagram: three processes P1, P2, P3 with events a through l; each process's logical clock is set to 0.]

Example [The same timing diagram, now with Lamport timestamps assigned to events a through l.]

Example From the timing diagram on the previous slide, what can you say about the following events?
- Between a and b: a → b
- Between b and f: b → f
- Between e and k: concurrent
- Between c and h: concurrent
- Between k and h: k → h

Total Order A timestamp of 1 is associated with events a, e, j in processes P1, P2, P3 respectively. A timestamp of 2 is associated with events b, k in processes P1, P3 respectively. The times may be the same, but the events are distinct. We would like to create a total order of all events, i.e., for any two events a and b we would like to say that either a → b or b → a.

Total Order Create a total order by attaching a process number to an event: P_i timestamps event e with C_i(e).i. We then say that C_i(a).i happens before C_j(b).j iff:
- C_i(a) < C_j(b); or
- C_i(a) = C_j(b) and i < j.
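As a sketch, the tiebreaking comparison can be written as follows (the function name is illustrative; a timestamp is modeled as a pair of Lamport time and process id):

```python
def total_order_lt(ts_a, ts_b):
    """True iff ts_a precedes ts_b in the total order.

    ts = (C, i): Lamport clock value C, with process id i as tiebreaker.
    """
    c_a, i_a = ts_a
    c_b, i_b = ts_b
    return c_a < c_b or (c_a == c_b and i_a < i_b)

# Events a, e, j all carry Lamport time 1 in P1, P2, P3; the process id
# breaks the tie, so a (at P1) precedes e (at P2) in the total order:
print(total_order_lt((1, 1), (1, 2)))   # tie on time, 1 < 2
print(total_order_lt((2, 3), (3, 1)))   # 2 < 3, ids irrelevant
```

In Python this is exactly lexicographic tuple comparison, so `(c_a, i_a) < (c_b, i_b)` would do the same job; the explicit version just mirrors the two-case definition on the slide.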

Example (total order) [Timing diagram: processes P1, P2, P3 with events a through l; each process's logical clock starts at 0, and each event is labeled with its total-order timestamp.]

Example: Totally-Ordered Multicast An application of Lamport timestamps (with total order). Scenario:
- Replicated accounts in New York (NY) and San Francisco (SF).
- Two transactions occur at the same time and are multicast:
  - Current balance: $1,000
  - Add $100 at SF
  - Add interest of 1% at NY
- If the updates are not done in the same order at each site, then one site will record a total amount of $1,111 and the other records $1,110.
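The two interleavings and the resulting balances can be checked directly (a throwaway calculation, rounding to cents):

```python
balance = 1000.0

# SF's order: deposit $100 first, then apply 1% interest.
sf_first = round((balance + 100) * 1.01, 2)

# NY's order: apply 1% interest first, then deposit $100.
ny_first = round(balance * 1.01 + 100, 2)

print(sf_first)  # 1111.0
print(ny_first)  # 1110.0
```

The dollar of difference is why the replicas diverge unless both apply the updates in the same order.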

Example: Totally-Ordered Multicasting Updating a replicated database and leaving it in an inconsistent state.

Example: Totally-Ordered Multicasting We must ensure that the two update operations are performed in the same order at each copy. Although the final balance depends on whether the deposit is processed before the interest update or the other way around, from the point of view of consistency it does not matter which order is chosen, as long as every copy follows the same one. We need totally-ordered multicast, that is, a multicast operation by which all messages are delivered in the same order to each receiver. (Multicast refers to a sender sending a message to a collection of receivers.)

Example: Totally Ordered Multicast Algorithm:
- An update message is timestamped with the sender's logical time.
- The update message is multicast (to the sender itself as well).
- When a message is received:
  - It is put into a local queue, ordered according to timestamp.
  - An acknowledgement is multicast.

Example: Totally Ordered Multicast A message is delivered to the application only when:
- It is at the head of the queue, and
- It has been acknowledged by all involved processes.
P_i sends an acknowledgement to P_j if:
- P_i has not made an update request, or
- P_i's identifier is greater than P_j's identifier, or
- P_i's update has been processed.
The Lamport algorithm (extended for total order) ensures total ordering of events.
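The queue-and-acknowledgement logic above can be simulated in miniature. This is a sketch under strong assumptions: delivery is reliable, the network is abstracted away, acknowledgements are fed in by hand, and all names (the class, the heap-based queue) are our own choices rather than part of the algorithm as stated.

```python
import heapq

class TOProcess:
    """One replica in a totally-ordered multicast group (illustrative sketch).

    Timestamps are (lamport_time, pid) pairs, compared lexicographically,
    which gives the total order described on the earlier slides.
    """

    def __init__(self, pid, group_size):
        self.pid = pid
        self.group_size = group_size
        self.queue = []       # min-heap of (timestamp, message)
        self.acks = {}        # timestamp -> set of pids that acknowledged
        self.delivered = []   # messages handed to the application, in order

    def on_update(self, ts, msg):
        # Received an update: queue it, ordered by timestamp.
        heapq.heappush(self.queue, (ts, msg))
        self.acks.setdefault(ts, set())

    def on_ack(self, ts, from_pid):
        self.acks.setdefault(ts, set()).add(from_pid)
        self._try_deliver()

    def _try_deliver(self):
        # Deliver the head of the queue once every process has acknowledged it.
        while self.queue:
            ts, msg = self.queue[0]
            if len(self.acks.get(ts, ())) < self.group_size:
                break
            heapq.heappop(self.queue)
            self.delivered.append(msg)

# Two replicas; two concurrent updates m = (1,1) and n = (1,2):
p1, p2 = TOProcess(1, 2), TOProcess(2, 2)
for p in (p1, p2):
    p.on_update((1, 1), "add $100")
    p.on_update((1, 2), "add 1% interest")
    for ts in [(1, 1), (1, 2)]:
        p.on_ack(ts, 1)
        p.on_ack(ts, 2)

print(p1.delivered)  # both replicas deliver in the same order
print(p2.delivered)
```

Even if the updates and acknowledgements arrived in different network orders at the two replicas, the heap ordering plus the all-acks rule would force the same delivery order everywhere.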

Example: Totally Ordered Multicast On the next slide m corresponds to “Add $100” and n corresponds to “Add interest of 1%”. When sending an update message (e.g., m, n) the message will include the timestamp generated when the update was issued.

Example: Totally Ordered Multicast [Timing diagram: San Francisco (P1) issues and sends m, then receives n at 3.1; New York (P2) issues and sends n, then receives m.]

Example: Totally Ordered Multicast The sending of message m consists of sending the update operation and its time of issue, 1.1. The sending of message n consists of sending the update operation and its time of issue, 1.2. Messages are multicast to all processes in the group, including the sender itself.
- Assume that a message sent by a process to itself is received almost immediately.
- For other processes, there may be a delay.

Example: Totally Ordered Multicast At this point, the queues contain the following:
- P1: (m,1.1), (n,1.2)
- P2: (m,1.1), (n,1.2)
P1 will multicast an acknowledgement for (m,1.1) but not yet for (n,1.2). Why? P1 has issued an update request of its own, and that request precedes n in the total order (1.1 < 1.2). P2 will multicast acknowledgements for both (m,1.1) and (n,1.2). Why? P2's own request does not precede m in the total order (1.1 < 1.2).

Example: Totally Ordered Multicast P1 does not issue an acknowledgement for (n,1.2) until operation m has been processed (1 < 2). Note: the actual receipt by P1 of message (n,1.2) is assigned a timestamp of 3.1. Note: the actual receipt by P2 of message (m,1.1) is assigned a timestamp of 3.2.

Example: Totally Ordered Multicast If P2 gets (n,1.2) before (m,1.1), does it still multicast an acknowledgement for (n,1.2)? Yes! At this point, how does P2 know that there are other updates that should be done ahead of the one it issued? It doesn't; it simply does not proceed with the update specified in (n,1.2) until it gets an acknowledgement from all other processes, which in this case means P1. Does P2 multicast an acknowledgement for (m,1.1) when it receives it? Yes, it does, since 1 < 2.

Example: Totally Ordered Multicast [Timing diagram: P1 issues and sends m, receives n, sends ack(m), and receives ack(m); P2 issues and sends n, receives m, and sends ack(m). The figure does not show a process sending a message to itself, or the multicast acks a process sends for the updates it issues itself.]

Example: Totally Ordered Multicast To summarize, the following messages have been sent:
- P1 and P2 have issued update operations.
- P1 has multicast an acknowledgement message for (m,1.1).
- P2 has multicast acknowledgement messages for (m,1.1) and (n,1.2).
P1 and P2 have each received an acknowledgement for (m,1.1) from all processes. Hence, the update represented by m can proceed at both P1 and P2.

Example: Totally Ordered Multicast [Timing diagram: as before, with both P1 and P2 now processing m after receiving ack(m). The figure does not show a process sending messages to itself.]

Example: Totally Ordered Multicast When P1 has finished with m, it can then proceed to multicast an acknowledgement for (n,1.2). When P1 and P2 both have received this acknowledgement, then it is the case that acknowledgements from all processes have been received for (n,1.2). At this point, it is known that the update represented by n can proceed in both P1 and P2.

Example: Totally Ordered Multicast [Timing diagram: after processing m, P1 sends ack(n) at 6.1; the acks are received (at P2, ack(n) arrives at 7.2), and both processes then process n.]

Example: Totally Ordered Multicast What if there were a third process, e.g., P3, that issued an update (call it o) at about the same time as P1 and P2? The algorithm works as before:
- P1 will not multicast an acknowledgement for o until m has been done.
- P2 will not multicast an acknowledgement for o until n has been done.
Since an operation cannot proceed until acknowledgements from all processes have been received, o will not proceed until both m and n have finished.

Problems with Lamport Clocks Lamport timestamps do not capture causality: one cannot directly compare the timestamps of two events to determine their precedence relationship.
- If C(a) < C(b) is not true, then a → b is also not true.
- However, knowing that C(a) < C(b) is true does not allow us to conclude that a → b is true.
- Example: In the first timing diagram, C(e) = 1 and C(b) = 2; thus C(e) < C(b), but it is not the case that e → b.

Problem with Lamport Clocks The main problem is that a simple integer clock cannot order both events within a process and events in different processes. C. Fidge developed an algorithm that overcomes this problem. Fidge's clock is represented as a vector [v_1, v_2, …, v_n] with an integer clock value for each process (v_i contains the clock value of process i). This is a vector timestamp.

Fidge’s Algorithm Properties of vector timestamps:
- v_i[i] is the number of events that have occurred so far at P_i.
- If v_i[j] = k, then P_i knows that k events have occurred at P_j.

Fidge’s Algorithm The Fidge logical clock is maintained as follows: 1. Initially all clock values are set to the smallest value (e.g., 0). 2. The local clock value is incremented at least once before each primitive event in a process, i.e., v_i[i] = v_i[i] + 1. 3. The current value of the entire logical clock vector is delivered to the receiver with every outgoing message. 4. Values in the timestamp vectors are never decremented.

Fidge’s Algorithm 5. Upon receiving a message, the receiver sets each entry in its local timestamp vector to the maximum of the two corresponding values in the local vector and in the received remote vector. Let v_q be piggybacked on the message sent by process q to process p; we then have:
  for i = 1 to n do v_p[i] = max(v_p[i], v_q[i]);
  v_p[p] = v_p[p] + 1;

Fidge’s Algorithm For two vector timestamps v_a and v_b:
- v_a ≠ v_b if there exists an i such that v_a[i] ≠ v_b[i].
- v_a <= v_b if v_a[i] <= v_b[i] for all i.
- v_a < v_b if v_a[i] <= v_b[i] for all i AND v_a ≠ v_b.
Events a and b are causally related if v_a < v_b or v_b < v_a.
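Rules 1–5 and the comparison above fit in a short Python sketch (class and function names are ours; the vector is indexed by process id):

```python
class VectorClock:
    """Sketch of a Fidge-style vector timestamp for one of n processes."""

    def __init__(self, pid, n):
        self.pid = pid
        self.v = [0] * n          # rule 1: all entries start at 0

    def event(self):
        # Rule 2: increment our own entry before each local event.
        self.v[self.pid] += 1
        return list(self.v)       # return a snapshot of the vector

    def send(self):
        # Rule 3: the entire vector is piggybacked on the outgoing message.
        return self.event()

    def receive(self, remote):
        # Rule 5: component-wise maximum, then count the receive as an event.
        self.v = [max(a, b) for a, b in zip(self.v, remote)]
        return self.event()

def happened_before(va, vb):
    """v_a < v_b iff v_a[i] <= v_b[i] for all i, and v_a != v_b."""
    return all(a <= b for a, b in zip(va, vb)) and va != vb

p1, p3 = VectorClock(0, 3), VectorClock(2, 3)
ts = p1.send()        # P1's first event: [1, 0, 0]
rv = p3.receive(ts)   # P3 merges and counts the receive: [1, 0, 1]

print(happened_before(ts, rv))            # send → receive is causal
print(happened_before([2, 0, 0], [0, 0, 1]))  # neither dominates: concurrent
```

Unlike Lamport timestamps, comparing two vectors tells you directly whether the events are ordered or concurrent: if neither vector dominates the other, the events are concurrent.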

Example [Timing diagram: three processes with events a through l, labeled with vector timestamps. P1's four events carry [1,0,0], [2,0,0], [3,0,0], [4,0,0]; P2's five events carry [0,1,0], [2,2,0], [2,3,2], [2,4,2], [4,5,2]; P3's three events carry [0,0,1], [0,0,2], [0,0,3].]

Vector Timestamps and Causality We have looked at total ordering of messages, where all messages are processed in the same order at each process. It is also possible to use no ordering at all, where all you care about is that a message reaches all processes, not the order of execution. Causal order is used when a message received by a process can potentially affect any subsequent message sent by that process; those messages should be received in that order at all processes. Unrelated messages may be delivered in any order.

Causality and Modified Vector Timestamps With a slight adjustment, vector timestamps can be used to guarantee causal message delivery. We will illustrate this adjustment, the definition of causality and the motivation through an example.

Example Application:Bulletin Board The Internet’s electronic bulletin board service (network news) Users (processes) join specific groups (discussion groups). Postings, whether they are articles or reactions, are multicast to all group members. Could use a totally-ordered multicasting scheme.

Display from a Bulletin Board Program Users run bulletin board applications which multicast messages. There is one multicast group per topic (e.g., os.interesting). Reliable multicast is required, so that all members receive messages.

Bulletin board: os.interesting
  Item | From        | Subject
  23   | A.Hanlon    | Mach
  24   | G.Joseph    | Microkernels
  25   | A.Hanlon    | Re: Microkernels
  26   | T.L'Heureux | RPC performance
  27   | M.Walker    | Re: Mach
(Figure from Coulouris.)

Ordering options:
- total (makes the item numbers the same at all sites)
- FIFO (preserves sender order)
- causal (makes replies come after the original message)

Example Application: Bulletin Board A totally-ordered multicasting scheme does not imply that if message B is delivered after message A, then B is a reaction to A. Totally-ordered multicasting is too strong in this case. The receipt of an article causally precedes the posting of a reaction, so the receipt of the reaction to an article should always follow the receipt of the article.

Example Application: Bulletin Board If we look at the bulletin board example, items 26 and 27 are allowed to appear in a different order at different sites. Items 25 and 26 may also appear in a different order at different sites.

Example Application: Bulletin Board Vector timestamps can be used to guarantee causal message delivery. A slight variation of Fidge's algorithm is used: each process P_i has an array V_i, where V_i[j] denotes the number of events that P_i knows have taken place at process P_j.

Example Application: Bulletin Board Vector timestamps are updated only when posting or receiving articles, i.e., when a message is sent or received. Let V_q be piggybacked on the message sent by process q to process p.
- When p receives the message, p does the following:
  for i = 1 to n do V_p[i] = max(V_p[i], V_q[i]);
- When p sends a message, it does the following:
  V_p[p] = V_p[p] + 1;

Example Application: Bulletin Board When a process P_i posts an article, it multicasts that article as a message with its vector timestamp. Call this message a, and assume the value of its timestamp is V_i. Process P_j posts a reaction; call this message r, and assume the value of its timestamp is V_j. Note that V_j > V_i. Message r may arrive at P_k before message a does.

Example Application: Bulletin Board P_k will postpone delivery of r to the display of the bulletin board until all messages that causally precede r have been received as well. Message r is delivered iff the following conditions are met:
- V_j[j] = V_k[j] + 1: r is the next message that P_k was expecting from process P_j.
- V_j[i] <= V_k[i] for all i ≠ j: P_k has seen at least as many messages as P_j had seen when it sent r.

Example Application: Bulletin Board Initially V_j[i] = 0 and V_k[i] = 0 for all i. This makes sense, since no messages have been sent. Say P_j sends a message with V_j[j] = 1, implying this is the first message sent by P_j. Since V_k[j] is 0, P_k will accept the message, as it is expecting a first message from P_j. Now say P_j sends a message with V_j[j] = 5 (indicating a 5th message sent) while V_k[j] is 3, indicating that P_k is expecting the 4th message from P_j. The received message must then be held back.

Example Application: Bulletin Board Now assume that V_j[i] > V_k[i] for some i ≠ j. This indicates that P_i sent a message that was received by P_j but not by P_k. In this case P_k will not deliver P_j's message until it gets the missing message from P_i.
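The two delivery conditions can be written as a small predicate (the helper name is ours; vectors are plain lists indexed by process id):

```python
def can_deliver(msg_vec, sender, local_vec):
    """Causal-delivery test: deliver a message carrying msg_vec from `sender`
    at a process whose local vector is local_vec iff
      (1) it is the next message expected from the sender, and
      (2) the sender had seen no messages that the receiver has not."""
    if msg_vec[sender] != local_vec[sender] + 1:
        return False   # gap: an earlier message from the sender is missing
    return all(msg_vec[i] <= local_vec[i]
               for i in range(len(msg_vec)) if i != sender)

# P_k has seen nothing; the first post from P_j (index 1) is deliverable:
print(can_deliver([0, 1, 0], 1, [0, 0, 0]))

# The reply r from P3 (index 2) carries [1, 0, 1]: it depends on P1's
# article, which this receiver has not yet seen, so r must be buffered:
print(can_deliver([1, 0, 1], 2, [0, 0, 0]))
```

This is exactly the situation in the next two diagrams: the reply is buffered until the original article arrives, after which the predicate becomes true and the reply is delivered.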

Example Application: Bulletin Board [Timing diagram: P1 posts article a with timestamp [1,0,0]; P3 receives a and sends reply r with timestamp [1,0,1]; message a arrives at P2 before the reply r from P3 does, so both are delivered in order.]

Example Application: Bulletin Board [Timing diagram: as before, but message a arrives at P2 after the reply r from P3. The reply is not delivered right away; it is buffered until a arrives, and only then is r delivered.]

Summary There is no notion of a globally shared clock. Local (physical) clocks must be synchronized using algorithms that take network latency into account. Knowing the absolute time is not necessary: logical clocks can be used for ordering purposes.