Presentation on theme: "Global Predicate Detection and Event Ordering. Our Problem To compute predicates over the state of a distributed application."— Presentation transcript:

1 Global Predicate Detection and Event Ordering

2 Our Problem To compute predicates over the state of a distributed application

3 Model Message passing; no failures. Two possible timing assumptions: a synchronous system or an asynchronous system. In an asynchronous system there is no upper bound on message delivery time, no bound on relative process speeds, and no centralized clock.

4 Clock Synchronization External clock synchronization keeps each processor clock within some maximum deviation from an external time source. With it, systems can exchange information about timing events of different systems, can take actions at real-time deadlines, and can synchronize within 0.1 ms. Internal clock synchronization keeps processor clocks within some maximum deviation from each other. With it, one can measure the duration of distributed activities that start on one process and terminate on another, and can totally order events that occur on a distributed system.

5 The Model n processes with hardware clocks; ρ, a bound on the drift of correct clocks; δ, a bound on message delivery time; f, a bound on the number of faulty processes, whose clocks are outside of the envelope of correct clock behavior (clock time plotted against real time).

6 Requirements of Synchronization Processes adjust their clocks periodically to obtain logical clocks C that satisfy: Agreement: logical clocks are never too far apart. Accuracy: logical clocks maintain some relation to real time.

7 What accuracy can be achieved? Provably, accuracy can be no better than the drift bound ρ. Traditionally, achieved accuracy was strictly worse than ρ: synchronization caused a loss in accuracy with respect to real time, because of variable message delays and failures. Is this loss of accuracy unavoidable?

8 What accuracy can be achieved? No: optimal accuracy is achievable, namely (1+ρ)^(-1) t ≤ C_i ≤ (1+ρ) t for real time t.

9 How to synchronize? Repeat forever: Agree on “when” to resynchronize Agree on “updated clock value” at resynchronization Traditionally: Periodic resynchronization with fixed periods (e.g. every hour) Updated clock value := “average” of all clocks

10 Averaging

11 Problems with Averaging Accumulation of error Fault-tolerance

12 Clock Synchronization: Take 1 Assume an upper bound δmax and a lower bound δmin on message delivery time. Guarantee that processes stay synchronized within δmax − δmin.

13 Clock Synchronization: Take 1 Problem: message delays vary widely in practice. In a 5000-message run (IBM Almaden), most messages arrived quickly but a long tail did not (histogram of % of messages against delivery time in ms).

14 Clock Synchronization: Take 2 No upper bound on message delivery time... but a lower bound min on message delivery time. Use a timeout max to detect process failures. Slaves send messages to the master; the master averages the slaves' values, computing a fault-tolerant average. Precision: 4·max − min.
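A common way to make an average fault-tolerant, sketched below, is to discard the f largest and f smallest readings before averaging, so that up to f arbitrarily faulty clocks cannot drag the result outside the range of correct readings. The exact averaging function used on the slide is not specified, so treat this as an illustration:

```python
def ft_average(readings, f):
    """Fault-tolerant average (illustrative): drop the f lowest and the
    f highest clock readings, then average what remains. Requires more
    than 2f readings so that at least one survives the trimming."""
    if len(readings) <= 2 * f:
        raise ValueError("need more than 2f readings")
    trimmed = sorted(readings)[f:len(readings) - f]
    return sum(trimmed) / len(trimmed)

# One faulty slave reports a wildly wrong value; with f = 1 it is discarded.
print(ft_average([100, 101, 102, 999], 1))   # 101.5
```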

15 Probabilistic Clock Synchronization (Cristian) Master-Slave architecture Master is connected to external time source Slaves read master’s clock and adjust their own How accurately can a slave read the master’s clock?

16 The Idea Clock accuracy depends on message roundtrip time: if the roundtrip is small, master and slave cannot have drifted by much! Since there is no upper bound on message delivery, there is no certainty of an accurate enough reading... but a very accurate reading can be achieved by repeated attempts.

17 Asynchronous systems Weakest possible assumptions. Weak assumptions ≡ fewer vulnerabilities. Asynchronous ≠ slow. An "interesting" model w.r.t. failures.

18 Client-Server Processes exchange messages using Remote Procedure Call (RPC). A client requests a service by sending the server a message; the client blocks while waiting for a response.

19 Client-Server The server computes the response (possibly asking other servers) and returns it to the client.

20 Deadlock!

21 Goal Design a protocol by which a processor can determine whether a global predicate (say, deadlock) holds

22 Wait-For Graphs Draw an arrow from p_i to p_j if p_j has received a request from p_i but has not responded yet.

23 Wait-For Graphs Cycle in WFG ⇒ deadlock. Deadlock ⇒ eventually a cycle in WFG.
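Since a cycle in the WFG implies deadlock, the monitor's check reduces to cycle detection on the collected wait-for edges. A minimal sketch (the graph encoding is illustrative):

```python
def has_cycle(wfg):
    """Detect a cycle in a wait-for graph given as {p: set of processes
    that p waits for}. Standard three-color DFS."""
    nodes = set(wfg) | {q for waited in wfg.values() for q in waited}
    WHITE, GREY, BLACK = 0, 1, 2
    color = {p: WHITE for p in nodes}

    def visit(p):
        color[p] = GREY                      # p is on the current DFS path
        for q in wfg.get(p, ()):
            if color[q] == GREY:             # back edge: cycle found
                return True
            if color[q] == WHITE and visit(q):
                return True
        color[p] = BLACK
        return False

    return any(color[p] == WHITE and visit(p) for p in nodes)

# p1 waits for p2, p2 for p3, p3 for p1: deadlock.
print(has_cycle({1: {2}, 2: {3}, 3: {1}}))   # True
print(has_cycle({1: {2}, 2: {3}}))           # False
```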

24 The protocol p_0 sends a message to p_1 … p_3. On receipt of p_0's message, p_i replies with its state and wait-for info.

25 An execution

27 Ghost Deadlock!

28 We have a problem... Asynchronous system: no centralized clock, etc. etc. Synchrony is useful to coordinate actions and to order events.

29 Events and Histories Processes execute sequences of events. Events can be of 3 types: local, send, and receive. e_p^i is the i-th event of process p. The local history h_p of process p is the sequence of events executed by process p. h_p^k: the prefix that contains the first k events. h_p^0: the initial, empty sequence. The history H is the set h_p0 ∪ h_p1 ∪ … ∪ h_pn−1. NOTE: in H, local histories are interpreted as sets, rather than sequences, of events.

30 Ordering events Observation 1: events in a local history are totally ordered.

31 Ordering events Observation 2: for every message m, send(m) precedes receive(m).

32 Happened-before (Lamport [1978]) A binary relation → defined over events: 1. if e_i^k, e_i^l ∈ h_i and k < l, then e_i^k → e_i^l; 2. if e_i = send(m) and e_j = receive(m), then e_i → e_j; 3. if e → e' and e' → e'', then e → e''.

33 Space-Time diagrams A graphic representation of a distributed execution.

37 Space-Time diagrams H and → impose a partial order.

41 Runs and Consistent Runs A run is a total ordering of the events in H that is consistent with the local histories of the processors. Ex: h_1 h_2 … h_n is a run. A run is consistent if the total order imposed by the run is an extension of the partial order induced by →. A single distributed computation may correspond to several consistent runs!

42 Cuts A cut C is a subset of the global history H of the form C = h_1^c1 ∪ … ∪ h_n^cn.

43 Cuts The frontier of C is the set of events {e_1^c1, …, e_n^cn}: the last event of each process included in C.

44 Global states and cuts The global state of a distributed computation is an n-tuple of local states Σ = (σ_1, …, σ_n). To each cut (c_1, …, c_n) corresponds the global state (σ_1^c1, …, σ_n^cn).

45 Consistent cuts and consistent global states A cut C is consistent if, for all events e and e', e ∈ C and e' → e imply e' ∈ C. A consistent global state is one corresponding to a consistent cut.

46 What p_0 sees

47 Not a consistent global state: the cut contains the event corresponding to the receipt of the last message by p_3 but not the corresponding send event.

48 Our task Develop a protocol by which a processor can build a consistent global state Informally, we want to be able to take a snapshot of the computation Not obvious in an asynchronous system...

49 Our approach Develop a simple synchronous protocol; refine the protocol as we relax assumptions. Record: processor states and channel states. Assumptions: FIFO channels; each message m is timestamped with T(send(m)).

50 Snapshot I 1. p_0 selects t_ss. 2. p_0 sends "take a snapshot at t_ss" to all processes. 3. When the clock of p_i reads t_ss, p_i: a. records its local state σ_i; b. starts recording messages received on each of its incoming channels; c. stops recording a channel when it receives the first message with timestamp greater than or equal to t_ss.

51 Snapshot I 1. p_0 selects t_ss. 2. p_0 sends "take a snapshot at t_ss" to all processes. 3. When the clock of p_i reads t_ss, p_i: a. records its local state σ_i; b. sends an empty message along its outgoing channels; c. starts recording messages received on each of its incoming channels; d. stops recording a channel when it receives the first message with timestamp greater than or equal to t_ss.

52 Correctness Theorem: Snapshot I produces a consistent cut. Proof: we need to prove that if e' → e and e is in the cut, then e' is in the cut as well.

53 Clock Condition The Clock Condition: if e → e', then TS(e) < TS(e'). Can the Clock Condition be implemented some other way?

54 Lamport Clocks Each process maintains a local variable LC LC ( e ) = value of LC for event e

55 Increment Rules If e_i is an internal or send event: LC := LC + 1. If e_i = receive(m): LC := max(LC, TS(m)) + 1. Timestamp m with TS(m) = LC(send(m)).
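The increment rules can be sketched as a small class (a sketch; the method names are my own):

```python
class LamportClock:
    """Lamport's logical clock: LC is incremented on internal and send
    events; on receive, LC jumps past the message's timestamp."""
    def __init__(self):
        self.lc = 0

    def internal(self):
        self.lc += 1                     # LC := LC + 1
        return self.lc

    def send(self):
        return self.internal()           # message carries TS(m) = LC(send(m))

    def receive(self, ts):
        self.lc = max(self.lc, ts) + 1   # LC := max(LC, TS(m)) + 1
        return self.lc

p, q = LamportClock(), LamportClock()
m = p.send()            # p's event 1, so TS(m) = 1
print(q.receive(m))     # 2: receive(m) is ordered after send(m)
print(p.internal())     # 2: same value on a different process is possible
```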

56 Space-Time Diagrams and Logical Clocks (diagram: each event labeled with its logical clock value)

57 A subtle problem "when LC = t do S" doesn't make sense for Lamport clocks! There is no guarantee that LC will ever be exactly t. Fixes: if e is internal/send and LC = t−2, execute e and then S; if e = receive(m) ∧ (TS(m) ≥ t) ∧ (LC ≤ t−1), put the message back in the channel, re-enable e, set LC = t−1, and execute S.

58 An obvious problem No t_ss! Choose Ω large enough that it cannot be reached by applying the update rules of logical clocks.

59 An obvious problem No t_ss! Choose Ω large enough that it cannot be reached by applying the update rules of logical clocks. Doing so assumes an upper bound on message delivery time and an upper bound on relative process speeds. Better to relax it.

60 Snapshot II p_0 selects Ω. p_0 sends "take a snapshot at Ω" to all processes; it waits for all of them to reply and then sets its logical clock to Ω. When the clock of p_i reads Ω, p_i records its local state σ_i, sends an empty message along its outgoing channels, starts recording messages received on each incoming channel, and stops recording a channel when it receives the first message with timestamp greater than or equal to Ω.

61 Relaxing synchrony Between receiving "take a snapshot at Ω" and the empty messages, a process does nothing for the protocol! At Ω it records its local state, sends empty messages, and monitors channels. So: use the empty message itself to announce the snapshot!

62 Snapshot III Processor p_0 sends itself "take a snapshot". When p_i receives "take a snapshot" for the first time, from p_j, it: records its local state σ_i; sends "take a snapshot" along its outgoing channels; sets the state of the channel from p_j to empty; starts recording messages received over each of its other incoming channels. When p_i receives "take a snapshot" beyond the first time, from p_k, p_i stops recording the channel from p_k. When p_i has received "take a snapshot" on all channels, it sends the collected state to p_0 and stops.
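Snapshot III is essentially the marker algorithm. The sketch below hand-drives one two-process exchange; the Snap class, channel names, and the scripted delivery order are all illustrative, not from the slides:

```python
from collections import deque

class Snap:
    """Per-process state for the "take a snapshot" (marker) protocol."""
    def __init__(self):
        self.local_state = None   # sigma_i, recorded on first marker
        self.recorded = {}        # incoming channel -> messages recorded
        self.stopped = set()      # channels whose recording has stopped

    def on_marker(self, ch, state, in_channels):
        """Handle "take a snapshot" arriving on channel ch. Returns True on
        the first marker, when the caller must forward markers downstream."""
        if self.local_state is None:
            self.local_state = state              # record local state
            self.stopped = {ch}                   # ch's channel state: empty
            self.recorded = {c: [] for c in in_channels if c != ch}
            return True
        self.stopped.add(ch)                      # stop recording ch
        return False

    def on_message(self, ch, msg):
        """A message arriving after the snapshot started but before ch's
        marker belongs to the recorded channel state."""
        if self.local_state is not None and ch not in self.stopped:
            self.recorded[ch].append(msg)

# The p1 -> p0 channel holds one in-flight message when p0 initiates.
c10 = deque(["x"])
p0, p1 = Snap(), Snap()

p0.on_marker("self", state=7, in_channels=["c10"])  # p0 sends itself the marker
p0.on_message("c10", c10.popleft())                 # "x" arrives before p1's marker
p1.on_marker("c01", state=3, in_channels=["c01"])   # p1 gets p0's marker, replies
p0.on_marker("c10", state=7, in_channels=["c10"])   # p1's marker closes c10

print(p0.local_state, p1.local_state)   # 7 3
print(p0.recorded["c10"])               # ['x']  (the recorded channel state)
```

The in-flight message "x" ends up in the recorded state of the channel rather than in either local state, which is exactly what makes the cut consistent.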

63 Snapshots: a perspective The global state Σ^s saved by the snapshot protocol is a consistent global state. But did it ever occur during the computation? A distributed computation provides only a partial order of events; many total orders (runs) are compatible with that partial order; all we know is that Σ^s could have occurred. We are evaluating predicates on states that may have never occurred!

66 An Execution and its Lattice

80 Reachability Σ^kl is reachable from Σ^ij if there is a path from Σ^ij to Σ^kl in the lattice.

84 So, why do we care about Σ^s again? Deadlock is a stable property: deadlock ⇒ deadlock forever after. If a run R of the snapshot protocol starts in Σ^i and terminates in Σ^f, then Σ^s is reachable from Σ^i, and Σ^f is reachable from Σ^s. Hence: deadlock in Σ^s implies deadlock in Σ^f; no deadlock in Σ^s implies no deadlock in Σ^i.

86 Same problem, different approach The monitor process does not query explicitly. Instead, it passively collects information and uses it to build an observation (reactive architectures, Harel and Pnueli [1985]). An observation is an ordering of the events of the distributed computation based on the order in which the receiver is notified of the events.

87 Observations: a few observations An observation puts no constraint on the order in which the monitor receives notifications.

90 Causal delivery FIFO delivery guarantees that messages from the same sender are delivered in the order they were sent: send_i(m) before send_i(m') implies deliver_j(m) before deliver_j(m'). Causal delivery generalizes FIFO to multiple senders: send_i(m) → send_j(m') implies deliver_k(m) before deliver_k(m'). (Note the distinction between the receive event and the later deliver event.)

97 Causal Delivery in Synchronous Systems We use the upper bound δ on message delivery time.

98 Causal Delivery in Synchronous Systems DR1: at time t, p_0 delivers all messages it received with timestamp up to t − δ, in increasing timestamp order.

99 Causal Delivery with Lamport Clocks DR1.1: deliver all received messages in increasing (logical clock) timestamp order.

102 Causal Delivery with Lamport Clocks Should p_0 deliver a received message, or wait? Problem: Lamport clocks don't provide gap detection: given two events e and e' and their clock values LC(e) and LC(e'), where LC(e) < LC(e'), determine whether some event e'' exists s.t. LC(e) < LC(e'') < LC(e').

103 Stability DR2: deliver all received stable messages in increasing (logical clock) timestamp order. A message m received by p is stable at p if p will never receive a future message m' s.t. TS(m') < TS(m).

104 Implementing Stability Real-time clocks: wait for δ time units.

105 Implementing Stability Real-time clocks: wait for δ time units. Lamport clocks: wait on each channel for a message m s.t. TS(m) > LC(e). Or: design better clocks!

106 Clocks and STRONG Clocks Lamport clocks implement the clock condition: e → e' ⇒ TS(e) < TS(e'). We want new clocks that implement the strong clock condition: e → e' ⇔ TS(e) < TS(e').

107 Causal Histories The causal history of an event e in (H, →) is the set θ(e) = {e' ∈ H : e' → e} ∪ {e}.

110 How to build θ Each process p_i: initializes θ := ∅; if e_i^k is an internal or send event, then θ(e_i^k) = θ(e_i^k−1) ∪ {e_i^k}; if e_i^k is a receive event for message m, then θ(e_i^k) = θ(e_i^k−1) ∪ θ(send(m)) ∪ {e_i^k}.
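The update rules translate directly into set operations; the (pid, k) event names and the piggybacked msg_theta parameter below are my own encoding:

```python
class CausalHistory:
    """Maintains theta(e) for a process's latest event: the set of all
    events that happen before e, plus e itself."""
    def __init__(self, pid):
        self.pid = pid
        self.k = 0                 # events executed so far
        self.theta = frozenset()   # theta of the latest event

    def step(self, msg_theta=frozenset()):
        """Execute the next event. For internal/send events call with no
        argument; for receive(m), pass theta(send(m)) piggybacked on m."""
        self.k += 1
        e = (self.pid, self.k)
        self.theta = self.theta | msg_theta | {e}
        return e, self.theta

p1, p2 = CausalHistory(1), CausalHistory(2)
_, th_send = p1.step()                   # p1 executes send(m): theta = {(1,1)}
_, th_recv = p2.step(msg_theta=th_send)  # p2 receives m
print(sorted(th_recv))                   # [(1, 1), (2, 1)]
```

Shipping whole event sets on every message is what makes this impractical, which is precisely the motivation for the pruning and vector-clock slides that follow.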

111 Pruning causal histories Prune segments of history that are known to all processes (Peterson, Buchholz and Schlichting). Use a more clever way to encode θ(e).

112 Vector Clocks Consider θ_i(e), the projection of θ(e) on p_i. θ_i(e) is a prefix of h_i: θ_i(e) = h_i^k_i, so it can be encoded by the single integer k_i. Then θ(e) = θ_1(e) ∪ θ_2(e) ∪ … ∪ θ_n(e) can be encoded using n integers. Represent θ using an n-vector VC such that VC(e)[i] = k_i.

113 Update rules If e is an internal or send event of p_i: VC[i] := VC[i] + 1. If e = receive(m) at p_i: VC := max(VC, TS(m)) componentwise, then VC[i] := VC[i] + 1. Message m is timestamped with TS(m) = VC(send(m)).

114 Example (vector timestamps of the events in the diagram) [1,0,0] [0,1,0] [2,1,0] [1,0,1] [1,0,2] [1,0,3] [3,1,2] [1,2,3] [4,1,2] [5,1,2] [4,3,3] [5,1,4]
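The update rules sketched in code (the class shape is mine). Assuming, as the diagram suggests, that p_0's second event receives p_1's first message, the first few example timestamps can be replayed:

```python
class VectorClock:
    """Vector clock for process pid in a system of n processes."""
    def __init__(self, pid, n):
        self.pid = pid
        self.vc = [0] * n

    def local_event(self):
        self.vc[self.pid] += 1          # internal/send: VC[i] := VC[i] + 1
        return list(self.vc)

    def send(self):
        return self.local_event()       # message carries TS(m) = VC(send(m))

    def receive(self, ts):
        # VC := componentwise max(VC, TS(m)), then count the receive event
        self.vc = [max(a, b) for a, b in zip(self.vc, ts)]
        self.vc[self.pid] += 1
        return list(self.vc)

p0, p1 = VectorClock(0, 3), VectorClock(1, 3)
print(p0.send())          # [1, 0, 0]
m = p1.send()
print(m)                  # [0, 1, 0]
print(p0.receive(m))      # [2, 1, 0]
```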

115 Operational interpretation VC(e_i)[i] ≡ number of events executed by p_i up to and including e_i. VC(e_i)[j], j ≠ i ≡ number of events executed by p_j that happen before e_i of p_i.

118 VC properties: event ordering 1. Given two vectors V and V', less than is defined as: V < V' ≡ (V ≠ V') ∧ (∀k : 1 ≤ k ≤ n : V[k] ≤ V'[k]). Strong Clock Condition: e → e' ≡ VC(e) < VC(e'). Simple Strong Clock Condition: given e_i of p_i and e_j of p_j, where i ≠ j: e_i → e_j ≡ VC(e_i)[i] ≤ VC(e_j)[i]. Concurrency: given e_i of p_i and e_j of p_j, where i ≠ j: e_i ∥ e_j ≡ (VC(e_i)[i] > VC(e_j)[i]) ∧ (VC(e_j)[j] > VC(e_i)[j]).
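These definitions translate directly into comparison predicates; using timestamps from the example slide, [1,0,0] happens before [2,1,0], while [1,0,1] and [0,1,0] are concurrent:

```python
def vc_less(v, w):
    """V < V'  ≡  (V ≠ V') and (for all k: V[k] <= V'[k])."""
    return v != w and all(a <= b for a, b in zip(v, w))

def happened_before(ts_e, ts_f):
    """Strong clock condition: e -> f iff VC(e) < VC(f)."""
    return vc_less(ts_e, ts_f)

def concurrent(ts_e, ts_f):
    """Neither timestamp dominates the other."""
    return not vc_less(ts_e, ts_f) and not vc_less(ts_f, ts_e)

print(happened_before([1, 0, 0], [2, 1, 0]))   # True
print(concurrent([1, 0, 1], [0, 1, 0]))        # True
```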

119 VC properties: consistency Pairwise inconsistency: events e_i of p_i and e_j of p_j (i ≠ j) are pairwise inconsistent (i.e. they can't be on the frontier of the same consistent cut) if and only if (VC(e_i)[i] < VC(e_j)[i]) ∨ (VC(e_j)[j] < VC(e_i)[j]). Consistent Cut: a cut defined by (c_1, ..., c_n) is consistent if and only if ∀i, j : VC(e_i^c_i)[i] ≥ VC(e_j^c_j)[i].

120 VC properties: weak gap detection Weak gap detection: given e_i of p_i and e_j of p_j, if VC(e_i)[k] < VC(e_j)[k] for some k ≠ j, then there exists an event e_k of p_k s.t. ¬(e_k → e_i) ∧ (e_k → e_j).

122 VC properties: strong gap detection Strong gap detection: given e_i of p_i and e_j of p_j, if VC(e_i)[i] < VC(e_j)[i], then there exists an event e_i' of p_i s.t. (e_i → e_i') ∧ (e_i' → e_j).

123 VCs for Causal Delivery Each process increments the local component of its VC only for events that are notified to the monitor. Each message notifying event e is timestamped with VC(e). The monitor keeps all notification messages in a set M.

124 Stability Suppose p_0 has received m_j from p_j. When is it safe for p_0 to deliver m_j? Only when: there is no earlier message in M; there is no earlier message from p_j (checked against the number of p_j messages delivered by p_0); and there is no earlier message m_k'' from p_k (k ≠ j) that may still arrive.

128 Checking for m_k''. Let m_k' be the last message p_0 delivered from p_k. By strong gap detection, an undelivered m_k'' with send(m_k'') → send(m_j) can exist only if TS(m_k')[k] < TS(m_j)[k]. Hence, deliver m_j as soon as TS(m_j)[k] ≤ TS(m_k')[k] for all k ≠ j.

129 The protocol p_0 maintains an array D[1, ..., n] of counters, where D[i] = TS(m_i)[i] and m_i is the last message delivered from p_i. DR3: deliver m from p_j as soon as both of the following conditions are satisfied: 1. TS(m)[j] = D[j] + 1; 2. TS(m)[k] ≤ D[k] for all k ≠ j.
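DR3 can be sketched as a monitor that buffers notifications until both conditions hold. The buffering loop and the 0-based indexing of the n senders are my own choices:

```python
class CausalMonitor:
    """Monitor p_0 delivering notification messages in causal order (DR3).
    D[i] counts the sender-i messages delivered so far, i.e. the i-th
    component of the last delivered timestamp from sender i."""
    def __init__(self, n):
        self.D = [0] * n
        self.pending = []               # received but undeliverable (j, TS)

    def deliverable(self, j, ts):
        # DR3: 1. TS(m)[j] = D[j] + 1  (no earlier message from p_j missing)
        #      2. TS(m)[k] <= D[k], k != j  (no earlier message from others)
        return ts[j] == self.D[j] + 1 and all(
            ts[k] <= self.D[k] for k in range(len(ts)) if k != j)

    def receive(self, j, ts):
        """Buffer (j, ts), then return everything now deliverable, in order."""
        self.pending.append((j, ts))
        delivered, progress = [], True
        while progress:                 # a delivery may unblock others
            progress = False
            for item in list(self.pending):
                if self.deliverable(*item):
                    self.pending.remove(item)
                    self.D[item[0]] = item[1][item[0]]
                    delivered.append(item)
                    progress = True
        return delivered

m = CausalMonitor(2)
# Sender 1's notification causally follows sender 0's, but arrives first:
print(m.receive(1, [1, 1]))   # []  -- held back until [1, 0] arrives
print(m.receive(0, [1, 0]))   # [(0, [1, 0]), (1, [1, 1])]
```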

130 Multiple Monitors Create a group of monitor processes for increased performance and increased reliability. Notify through a causal multicast to the group. Each replica will construct a (possibly different) observation: if the property is stable, then if one monitor detects it, eventually all monitors do; otherwise, either use Possibly and Definitely, or use causal atomic multicast. What about failures?

