Download presentation
Presentation is loading. Please wait.
1
Global Predicate Detection and Event Ordering
2
Our Problem To compute predicates over the state of a distributed application
3
Model Message passing No failures Two possible timing assumptions: Synchronous System Asynchronous System No upper bound on message delivery time No bound on relative process speeds No centralized clock
4
Clock Synchronization External Clock Synchronization: keeps processor clock within some maximum deviation from an external time source. can exchange info about timing events of different systems can take actions at real-time deadlines synchronization within 0.1 ms Internal Clock Synchronization: keeps processor clocks within some maximum deviation from each other. can measure duration of distributed activities that start on one process and terminate on another can totally order events that occur on a distributed system
5
The Model n processes with hardware clocks , bound on drift on correct clocks , bound on message delivery time f, bound on number of faulty processes–their clocks are outside of the blue envelope (real time) (clock time)
6
Requirements of Synchronization Processes adjust their clocks periodically to obtain logical clocks C that satisfy: Agreement–logical clocks are never too far apart Accuracy–logical clocks maintain some relation to real time
7
What accuracy can be achieved? Provably, ¸ Traditionally, > synchronization caused a loss in accuracy with respect to real time... variable message delays failures Is this loss of accuracy unavoidable? (real time)
8
What accuracy can be achieved? Provably, ¸ Traditionally, > synchronization caused a loss in accuracy with respect to real time... variable message delays failures Is this loss of accuracy unavoidable? Optimal Accuracy: (1+ ) -1 t · C i · (1+ )t (real time)
9
How to synchronize? Repeat forever: Agree on “when” to resynchronize Agree on “updated clock value” at resynchronization Traditionally: Periodic resynchronization with fixed periods (e.g. every hour) Updated clock value := “average” of all clocks
10
Averaging
11
Problems with Averaging Accumulation of error Fault-tolerance
12
Clock Synchronization: Take 1 Assume an upper bound max and a lower bound min on message delivery time Guarantee that processes stay synchronized within max - min.
13
Clock Synchronization: Take 1 Assume an upper bound max and a lower bound min on message delivery time Guarantee that processes stay synchronized within max - min. Time (ms) 93.17 4.484.22 % of messages Problem: 5000 message run (IBM Almaden)
14
Clock Synchronization: Take 2 No upper bound on message delivery time......but lower bound min on message delivery time Use timeout max to detect process failures slaves send messages to master Master averages slaves value; computes fault-tolerant average Precision: 4.maxp - min
15
Probabilistic Clock Synchronization (Cristian) Master-Slave architecture Master is connected to external time source Slaves read master’s clock and adjust their own How accurately can a slave read the master’s clock?
16
The Idea Clock accuracy depends on message roundtrip time if roundtrip is small, master and slave cannot have drifted by much! Since no upper bound on message delivery, no certainty of accurate enough reading... … but very accurate reading can be achieved by repeated attempts
17
Asynchronous systems Weakest possible assumptions Weak assumptions ´ less vulnerabilities Asynchronous slow “Interesting” model w.r.t. failures
18
Client-Server Processes exchange messages using Remote Procedure Call (RPC) A client requests a service by sending the server a message. The client blocks while waiting for a response s c
19
Client-Server Processes exchange messages using Remote Procedure Call (RPC) The server computes the response (possibly asking other servers) and returns it to the client A client requests a service by sending the server a message. The client blocks while waiting for a response s #!?%! c
20
Deadlock!
21
Goal Design a protocol by which a processor can determine whether a global predicate (say, deadlock) holds
22
Draw arrow from p i to p j if p j has received a request but has not responded yet Wait-For Graphs
23
Draw arrow from p i to p j if p j has received a request but has not responded yet Cycle in WFG ) deadlock Deadlock ) ¦ cycle in WFG Wait-For Graphs
24
The protocol p 0 sends a message to p 1 p 3 On receipt of p 0 ‘s message, p i replies with its state and wait-for info
25
An execution
27
Ghost Deadlock!
28
We have a problem... Asynchronous system no centralized clock, etc. etc. Synchrony useful to coordinate actions order events
29
Events and Histories Processes execute sequences of events Events can be of 3 types: local, send, and receive e p i is the i -th event of process p The local history h p of process p is the sequence of events executed by process h p k : prefix that contains first k events h p 0 : initial, empty sequence The history H is the set h p 0 [ h p 1 [ … h p n -1 N OTE: In H, local histories are interpreted as sets, rather than sequences, of events
30
Ordering events Observation 1: Events in a local history are totally ordered time
31
Ordering events Observation 1: Events in a local history are totally ordered Observation 2: For every message m, send ( m ) precedes receive ( m ) time
32
Happened-before (Lamport[1978]) A binary relation defined over events 1. if e i k, e i l 2h i and k<l, then e i k !e i l 2. if e i = send ( m ) and e j = receive ( m ), then e i ! e j 3. if e!e ’ and e ’ ! e ‘’, then e! e ‘’
33
Space-Time diagrams A graphic representation of a distributed execution time
34
Space-Time diagrams A graphic representation of a distributed execution time
35
Space-Time diagrams A graphic representation of a distributed execution time
36
Space-Time diagrams A graphic representation of a distributed execution time
37
Space-Time diagrams A graphic representation of a distributed execution time H and impose a partial order
38
Space-Time diagrams A graphic representation of a distributed execution time H and impose a partial order
39
Space-Time diagrams A graphic representation of a distributed execution time H and impose a partial order
40
Space-Time diagrams A graphic representation of a distributed execution time H and impose a partial order
41
Runs and Consistent Runs A run is a total ordering of the events in H that is consistent with the local histories of the processors Ex: h 1, h 2, …, h n is a run A run is consistent if the total order imposed in the run is an extension of the partial order induced by ! A single distributed computation may correspond to several consistent runs!
42
Cuts A cut C is a subset of the global history of H
43
A cut C is a subset of the global history of H The frontier of C is the set of events Cuts
44
Global states and cuts The global state of a distributed computation is an tuple of n local states = ( 1... n ) To each cut ( 1 c 1,... n c n ) corresponds a global state
45
Consistent cuts and consistent global states A cut is consistent if A consistent global state is one corresponding to a consistent cut
46
What sees
47
Not a consistent global state: the cut contains the event corresponding to the receipt of the last message by p 3 but not the corresponding send event
48
Our task Develop a protocol by which a processor can build a consistent global state Informally, we want to be able to take a snapshot of the computation Not obvious in an asynchronous system...
49
Our approach Develop a simple synchronous protocol Refine protocol as we relax assumptions Record: processor states channel states Assumptions: FIFO channels Each m timestamped with with T ( send ( m ))
50
Snapshot I 1. p 0 selects t ss 2. p 0 sends “take a snapshot at t ss ” to all processes 3. when clock of p i reads t ss then a. records its local state i b. starts recording messages received on each of incoming channels c. stops recording a channel when it receives first message with timestamp greater than or equal to t ss
51
Snapshot I 1. p 0 selects t ss 2. p 0 sends “take a snapshot at t ss ” to all processes 3. when clock of p i reads t ss then a. records its local state i b. sends an empty message along its outgoing channels c. starts recording messages received on each of incoming channels d. stops recording a channel when it receives first message with timestamp greater than or equal to t ss
52
Correctness Theorem: Snapshot I produces a consistent cut Proof: Need to prove
53
Clock Condition Can the Clock Condition be implemented some other way?
54
Lamport Clocks Each process maintains a local variable LC LC ( e ) = value of LC for event e
55
Increment Rules Timestamp m with
56
Space-Time Diagrams and Logical Clocks 2 1 3 456 6 7 7 8 8 9
57
A subtle problem when LC=t do S doesn’t make sense for Lamport clocks! there is no guarantee that LC will ever be t S is anyway executed after Fixes: If e is internal/send and LC = t-2 execute e and then S If e = receive(m) Æ (TS(m) ¸ t) Æ (LC · t-1) put message back in channel re-enable e ; set LC=t-1 ; execute S
58
An obvious problem No t ss ! Choose large enough that it cannot be reached by applying the update rules of logical clocks
59
An obvious problem No t ss ! Choose large enough that it cannot be reached by applying the update rules of logical clocks Doing so assumes upper bound on message delivery time upper bound relative process speeds Better relax it
60
Snapshot II p 0 selects p 0 sends “take a snapshot at t ss ” to all processes; it waits for all of them to reply and then sets its logical clock to when clock of p i reads then p i records its local state i sends an empty message along its outgoing channels starts recording messages received on each incoming channel stops recording a channel when receives first message with timestamp greater than or equal to
61
Relaxing synchrony Process does nothing for the protocol during this time! take a snapshot at empty message: monitors channels records local state sends empty message: Use empty message to announce snapshot!
62
Snapshot III Processor p 0 sends itself “take a snapshot “ when p i receives “take a snapshot” for the first time from p j : records its local state i sends “take a snapshot” along its outgoing channels sets channel from p j to empty starts recording messages received over each of its other incoming channels when receives “take a snapshot” beyond the first time from p k : p i stops recording channel from p k when p i has received “take a snapshot” on all channels, it sends collected state to p 0 and stops.
63
Snapshots: a perspective The global state s saved by the snapshot protocol is a consistent global state
64
Snapshots: a perspective The global state s saved by the snapshot protocol is a consistent global state But did it ever occur during the computation? a distributed computation provides only a partial order of events many total orders (runs) are compatible with that partial order all we know is that s could have occurred
65
Snapshots: a perspective The global state s saved by the snapshot protocol is a consistent global state But did it ever occur during the computation? a distributed computation provides only a partial order of events many total orders (runs) are compatible with that partial order all we know is that s could have occurred We are evaluating predicates on states that may have never occurred!
66
An Execution and its Lattice
80
Reachability kl is reachable from ij if there is a path from kl to ij in the lattice
81
Reachability kl is reachable from ij if there is a path from kl to ij in the lattice
82
Reachability kl is reachable from ij if there is a path from kl to ij in the lattice
83
Reachability kl is reachable from ij if there is a path from kl to ij in the lattice
84
So, why do we care about s again? Deadlock is a stable property Deadlock If a run R of the snapshot protocol starts in i and terminates in f, then
85
So, why do we care about s again? Deadlock is a stable property Deadlock If a run R of the snapshot protocol starts in i and terminates in f, then Deadlock in s implies deadlock in f No deadlock in s implies no deadlock in i
86
Same problem, different approach Monitor process does not query explicitly Instead, it passively collects information and uses it to build an observation. (reactive architectures, Harel and Pnueli [1985]) An observation is an ordering of event of the distributed computation based on the order in which the receiver is notified of the events.
87
Observations: a few observations An observation puts no constraint on the order in which the monitor receives notifications
88
Observations: a few observations An observation puts no constraint on the order in which the monitor receives notifications
89
Observations: a few observations An observation puts no constraint on the order in which the monitor receives notifications
90
Causal delivery FIFO delivery guarantees:
91
Causal delivery FIFO delivery guarantees: Causal delivery generalizes FIFO:
92
Causal delivery FIFO delivery guarantees: Causal delivery generalizes FIFO: send event receive event deliver event
93
Causal delivery FIFO delivery guarantees: Causal delivery generalizes FIFO: send event receive event deliver event
94
Causal delivery FIFO delivery guarantees: Causal delivery generalizes FIFO: send event receive event deliver event
95
Causal delivery FIFO delivery guarantees: Causal delivery generalizes FIFO: send event receive event deliver event 1
96
Causal delivery FIFO delivery guarantees: Causal delivery generalizes FIFO: send event receive event deliver event 12
97
Causal Delivery in Synchronous Systems We use the upper bound on message delivery time
98
Causal Delivery in Synchronous Systems We use the upper bound on message delivery time DR1: At time t, p 0 delivers all messages it received with timestamp up to t - in increasing timestamp order
99
Causal Delivery with Lamport Clocks DR1.1: Deliver all received messages in increasing (logical clock) timestamp order.
100
Causal Delivery with Lamport Clocks DR1.1: Deliver all received messages in increasing (logical clock) timestamp order. 1
101
Causal Delivery with Lamport Clocks DR1.1: Deliver all received messages in increasing (logical clock) timestamp order. 14 Should p 0 deliver?
102
Causal Delivery with Lamport Clocks DR1.1: Deliver all received messages in increasing (logical clock) timestamp order. Problem: Lamport Clocks don’t provide gap detection 14 Should p 0 deliver? Given two events e and e ’ and their clock values LC(e) and LC(e ’ ) — where LC(e) < LC(e’), determine whether some e ’’ event exists s.t. LC(e) < LC(e ’’ ) < LC(e ’ )
103
Stability DR2: Deliver all received stable messages in increasing (logical clock) timestamp order. A message m received by p is stable at p if p will never receive a future message m s.t. TS(m ’ ) < TS(m)
104
Implementing Stability Real-time clocks wait for time units
105
Implementing Stability Real-time clocks wait for time units Lamport clocks wait on each channel for m s.t. TS(m) > LC(e) Design better clocks!
106
Clocks and STRONG Clocks Lamport clocks implement the clock condition: We want new clocks that implement the strong clock condition:
107
Causal Histories The causal history of an event e in (H,!) is the set
108
Causal Histories The causal history of an event e in (H,!) is the set
109
Causal Histories The causal history of an event e in (H,!) is the set
110
How to build Each process : p i initializes = 0 if e i k is an internal or send event, then if e i k is a receive event for message m, then
111
Pruning causal histories Prune segments of history that are known to all processes (Peterson, Bucholz and Schlichting) Use a more clever way to encode (e)
112
Vector Clocks Consider i (e), the projection of (e) on p i i (e) is a prefix of h i : i (e) = h i k i – it can be encoded using k i ( e ) = 1 (e) [ 2 (e) [... [ n (e) can be encoded using Represent using an n-vector VC such that
113
Update rules Message m is timestamped with
114
Example [1,0,0] [0,1,0] [2,1,0] [1,0,1] [1,0,2][1,0,3] [3,1,2] [1,2,3] [4,1,2] [5,1,2] [4,3,3] [5,1,4]
115
Operational interpretation = [1,0,0] [0,1,0] [2,1,0] [1,0,1] [1,0,2][1,0,3] [3,1,2] [1,2,3] [4,1,2] [5,1,2] [4,3,3] [5,1,4]
116
Operational interpretation ´ no. of events executed p i by up to and including e i ´ [1,0,0] [0,1,0] [2,1,0] [1,0,1] [1,0,2][1,0,3] [3,1,2] [1,2,3] [4,1,2] [5,1,2] [4,3,3] [5,1,4]
117
Operational interpretation ´ no. of events executed p i by up to and including e i ´ no. of events executed by p j that happen before e i of p i [1,0,0] [0,1,0] [2,1,0] [1,0,1] [1,0,2][1,0,3] [3,1,2] [1,2,3] [4,1,2] [5,1,2] [4,3,3] [5,1,4]
118
VC properties: event ordering 1. Given two vectors V and V +, less than is defined as: V < V + ´ (V V + ) Æ (8 k : 1· k· n : V[k] · V + [k]) Strong Clock Condition: Simple Strong Clock Condition: Given e i of p i and e j of p j, where i j Concurrency: Given e i of p i and e j of p j, where i j
119
VC properties: consistency Pairwise inconsistency Events e i of p i and e j of p j (i j) are pairwise inconsistent (i.e. can’t be on the frontier of the same consistent cut) if and only if Consistent Cut A cut defined by (c 1,..., c n ) is consistent if and only if
120
VC properties: weak gap detection Weak gap detection Given e i of p i and e j of p j, if VC(e i )[k] < VC(e j )[k] for some k j, then there exists e k s.t [2,2,2] [2,0,1] [0,0,2]
121
VC properties: weak gap detection Weak gap detection Given e i of p i and e j of p j, if VC(e i )[k] < VC(e j )[k] for some k j, then there exists e k s.t [2,2,2] [2,0,1] [0,0,2] [2,1,1] [0,0,1] [1,0,1]
122
VC properties: strong gap detection Weak gap detection Given e i of p i and e j of p j, if VC(e i )[k] < VC(e j )[k] for some k j, then there exists e k s.t Strong gap detection Given e i of p i and e j of p j, if VC(e i )[i] < VC(e j )[i] for some k j, then there exists e i ’ s.t
123
VCs for Causal Delivery Each process increments the local component of its VC only for events that are notified to the monitor Each message notifying event e is timestamped with VC(e) The monitor keeps all notification messages in a set M
124
Stability Suppose p 0 has received m j from p j. When is it safe for p 0 to deliver m j ?
125
Stability Suppose p 0 has received m j from p j When is it safe for p 0 to deliver m j ? There is no earlier message in M
126
Stability Suppose p 0 has received m j from p j When is it safe for p 0 to deliver m j ? There is no earlier message in M There is no earlier message from p j no. of p j messages delivered by p 0
127
Stability Suppose p 0 has received m j from p j When is it safe for p 0 to deliver m j ? There is no earlier message in M There is no earlier message from p j There is no earlier message m k ’’ from p k (k j) … ? no. of p j messages delivered by p 0
128
Checking for. Let m k ’ be the last message p 0 delivered from p k By strong gap detection, m k ’’ exists only if Hence, deliver m j as soon as
129
The protocol p 0 maintains an array D[1,..., n] of counters D[i] = TS(m i )[i] where m i is the last message delivered from p i DR3: Deliver m from p j as soon as both of the following conditions are satisfied: 1. 2.
130
Multiple Monitors Create a group of monitor processes increased performance increased reliability Notify through a causal multicast to the group Each replica will construct a (possibly different) observation if property stable, if one monitor detects, eventually all monitors do otherwise either use Possibly and Definitely or use causal atomic multicast What about failures?
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.