Presentation on theme: "Global Predicate Detection and Event Ordering. Our Problem To compute predicates over the state of a distributed application."— Presentation transcript:

1 Global Predicate Detection and Event Ordering

2 Our Problem To compute predicates over the state of a distributed application

3 Model Message passing; no failures. Two possible timing assumptions: a synchronous system or an asynchronous system. In an asynchronous system there is no upper bound on message delivery time, no bound on relative process speeds, and no centralized clock.

4 Clock Synchronization External clock synchronization keeps each processor clock within some maximum deviation from an external time source. With it, systems can exchange information about timing events of different systems, can take actions at real-time deadlines, and can synchronize within 0.1 ms. Internal clock synchronization keeps processor clocks within some maximum deviation from each other. With it, one can measure the duration of distributed activities that start on one process and terminate on another, and can totally order events that occur on a distributed system.

5 The Model n processes with hardware clocks; ρ, a bound on the drift of correct clocks; δ, a bound on message delivery time; f, a bound on the number of faulty processes, whose clocks are outside of the envelope of correct clock behavior (clock time plotted against real time).

6 Requirements of Synchronization Processes adjust their clocks periodically to obtain logical clocks C that satisfy: Agreement: logical clocks are never too far apart. Accuracy: logical clocks maintain some relation to real time.

7 What accuracy can be achieved? Provably, accuracy can be no better than the drift bound ρ. Traditionally, achieved accuracy was strictly worse than ρ: synchronization caused a loss in accuracy with respect to real time, because of variable message delays and failures. Is this loss of accuracy unavoidable?

8 What accuracy can be achieved? No: optimal accuracy is achievable, namely (1+ρ)^(-1) t ≤ C_i ≤ (1+ρ) t for real time t.

9 How to synchronize? Repeat forever: Agree on “when” to resynchronize Agree on “updated clock value” at resynchronization Traditionally: Periodic resynchronization with fixed periods (e.g. every hour) Updated clock value := “average” of all clocks

10 Averaging

11 Problems with Averaging Accumulation of error Fault-tolerance

12 Clock Synchronization: Take 1 Assume an upper bound δmax and a lower bound δmin on message delivery time. Guarantee that processes stay synchronized within δmax − δmin.

13 Clock Synchronization: Take 1 Problem: message delays vary widely in practice. In a 5000-message run (IBM Almaden), most messages arrived quickly but a long tail did not (histogram of % of messages against delivery time in ms).

14 Clock Synchronization: Take 2 No upper bound on message delivery time... but a lower bound min on message delivery time. Use a timeout max to detect process failures. Slaves send messages to the master; the master averages the slaves' values, computing a fault-tolerant average. Precision: 4·max − min.
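A common way to make an average fault-tolerant, sketched below, is to discard the f largest and f smallest readings before averaging, so that up to f arbitrarily faulty clocks cannot drag the result outside the range of correct readings. The exact averaging function used on the slide is not specified, so treat this as an illustration:

```python
def ft_average(readings, f):
    """Fault-tolerant average (illustrative): drop the f lowest and the
    f highest clock readings, then average what remains. Requires more
    than 2f readings so that at least one survives the trimming."""
    if len(readings) <= 2 * f:
        raise ValueError("need more than 2f readings")
    trimmed = sorted(readings)[f:len(readings) - f]
    return sum(trimmed) / len(trimmed)

# One faulty slave reports a wildly wrong value; with f = 1 it is discarded.
print(ft_average([100, 101, 102, 999], 1))   # 101.5
```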

15 Probabilistic Clock Synchronization (Cristian) Master-Slave architecture Master is connected to external time source Slaves read master’s clock and adjust their own How accurately can a slave read the master’s clock?

16 The Idea Clock accuracy depends on message roundtrip time: if the roundtrip is small, master and slave cannot have drifted by much! Since there is no upper bound on message delivery, there is no certainty of an accurate enough reading... but a very accurate reading can be achieved by repeated attempts.

17 Asynchronous systems Weakest possible assumptions. Weak assumptions ≡ fewer vulnerabilities. Asynchronous ≠ slow. An "interesting" model w.r.t. failures.

18 Client-Server Processes exchange messages using Remote Procedure Call (RPC). A client requests a service by sending the server a message; the client blocks while waiting for a response.

19 Client-Server The server computes the response (possibly asking other servers) and returns it to the client.

20 Deadlock!

21 Goal Design a protocol by which a processor can determine whether a global predicate (say, deadlock) holds

22 Wait-For Graphs Draw an arrow from p_i to p_j if p_j has received a request from p_i but has not responded yet.

23 Wait-For Graphs Cycle in WFG ⇒ deadlock. Deadlock ⇒ eventually a cycle in WFG.
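Since a cycle in the WFG implies deadlock, the monitor's check reduces to cycle detection on the collected wait-for edges. A minimal sketch (the graph encoding is illustrative):

```python
def has_cycle(wfg):
    """Detect a cycle in a wait-for graph given as {p: set of processes
    that p waits for}. Standard three-color DFS."""
    nodes = set(wfg) | {q for waited in wfg.values() for q in waited}
    WHITE, GREY, BLACK = 0, 1, 2
    color = {p: WHITE for p in nodes}

    def visit(p):
        color[p] = GREY                      # p is on the current DFS path
        for q in wfg.get(p, ()):
            if color[q] == GREY:             # back edge: cycle found
                return True
            if color[q] == WHITE and visit(q):
                return True
        color[p] = BLACK
        return False

    return any(color[p] == WHITE and visit(p) for p in nodes)

# p1 waits for p2, p2 for p3, p3 for p1: deadlock.
print(has_cycle({1: {2}, 2: {3}, 3: {1}}))   # True
print(has_cycle({1: {2}, 2: {3}}))           # False
```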

24 The protocol p_0 sends a message to p_1 … p_3. On receipt of p_0's message, p_i replies with its state and wait-for info.

25 An execution

27 Ghost Deadlock!

28 We have a problem... Asynchronous system: no centralized clock, etc. etc. Synchrony is useful to coordinate actions and to order events.

29 Events and Histories Processes execute sequences of events. Events can be of 3 types: local, send, and receive. e_p^i is the i-th event of process p. The local history h_p of process p is the sequence of events executed by process p. h_p^k: the prefix that contains the first k events. h_p^0: the initial, empty sequence. The history H is the set h_p0 ∪ h_p1 ∪ … ∪ h_pn−1. NOTE: in H, local histories are interpreted as sets, rather than sequences, of events.

30 Ordering events Observation 1: events in a local history are totally ordered.

31 Ordering events Observation 2: for every message m, send(m) precedes receive(m).

32 Happened-before (Lamport [1978]) A binary relation → defined over events: 1. if e_i^k, e_i^l ∈ h_i and k < l, then e_i^k → e_i^l; 2. if e_i = send(m) and e_j = receive(m), then e_i → e_j; 3. if e → e' and e' → e'', then e → e''.

33 Space-Time diagrams A graphic representation of a distributed execution.

37 Space-Time diagrams H and → impose a partial order.

41 Runs and Consistent Runs A run is a total ordering of the events in H that is consistent with the local histories of the processors. Ex: h_1 h_2 … h_n is a run. A run is consistent if the total order imposed by the run is an extension of the partial order induced by →. A single distributed computation may correspond to several consistent runs!

42 Cuts A cut C is a subset of the global history H of the form C = h_1^c1 ∪ … ∪ h_n^cn.

43 Cuts The frontier of C is the set of events {e_1^c1, …, e_n^cn}: the last event of each process included in C.

44 Global states and cuts The global state of a distributed computation is an n-tuple of local states Σ = (σ_1, …, σ_n). To each cut (c_1, …, c_n) corresponds the global state (σ_1^c1, …, σ_n^cn).

45 Consistent cuts and consistent global states A cut C is consistent if, for all events e and e', e ∈ C and e' → e imply e' ∈ C. A consistent global state is one corresponding to a consistent cut.

46 What p_0 sees

47 Not a consistent global state: the cut contains the event corresponding to the receipt of the last message by p_3 but not the corresponding send event.

48 Our task Develop a protocol by which a processor can build a consistent global state Informally, we want to be able to take a snapshot of the computation Not obvious in an asynchronous system...

49 Our approach Develop a simple synchronous protocol; refine the protocol as we relax assumptions. Record: processor states and channel states. Assumptions: FIFO channels; each message m is timestamped with T(send(m)).

50 Snapshot I 1. p_0 selects t_ss. 2. p_0 sends "take a snapshot at t_ss" to all processes. 3. When the clock of p_i reads t_ss, p_i: a. records its local state σ_i; b. starts recording messages received on each of its incoming channels; c. stops recording a channel when it receives the first message with timestamp greater than or equal to t_ss.

51 Snapshot I 1. p_0 selects t_ss. 2. p_0 sends "take a snapshot at t_ss" to all processes. 3. When the clock of p_i reads t_ss, p_i: a. records its local state σ_i; b. sends an empty message along its outgoing channels; c. starts recording messages received on each of its incoming channels; d. stops recording a channel when it receives the first message with timestamp greater than or equal to t_ss.

52 Correctness Theorem: Snapshot I produces a consistent cut. Proof: we need to prove that if e' → e and e is in the cut, then e' is in the cut as well.

53 Clock Condition The Clock Condition: if e → e', then TS(e) < TS(e'). Can the Clock Condition be implemented some other way?

54 Lamport Clocks Each process maintains a local variable LC LC ( e ) = value of LC for event e

55 Increment Rules If e_i is an internal or send event: LC := LC + 1. If e_i = receive(m): LC := max(LC, TS(m)) + 1. Timestamp m with TS(m) = LC(send(m)).
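The increment rules can be sketched as a small class (a sketch; the method names are my own):

```python
class LamportClock:
    """Lamport's logical clock: LC is incremented on internal and send
    events; on receive, LC jumps past the message's timestamp."""
    def __init__(self):
        self.lc = 0

    def internal(self):
        self.lc += 1                     # LC := LC + 1
        return self.lc

    def send(self):
        return self.internal()           # message carries TS(m) = LC(send(m))

    def receive(self, ts):
        self.lc = max(self.lc, ts) + 1   # LC := max(LC, TS(m)) + 1
        return self.lc

p, q = LamportClock(), LamportClock()
m = p.send()            # p's event 1, so TS(m) = 1
print(q.receive(m))     # 2: receive(m) is ordered after send(m)
print(p.internal())     # 2: same value on a different process is possible
```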

56 Space-Time Diagrams and Logical Clocks (diagram: each event labeled with its logical clock value)

57 A subtle problem "when LC = t do S" doesn't make sense for Lamport clocks! There is no guarantee that LC will ever be exactly t. Fixes: if e is internal/send and LC = t−2, execute e and then S; if e = receive(m) ∧ (TS(m) ≥ t) ∧ (LC ≤ t−1), put the message back in the channel, re-enable e, set LC = t−1, and execute S.

58 An obvious problem No t_ss! Choose Ω large enough that it cannot be reached by applying the update rules of logical clocks.

59 An obvious problem No t_ss! Choose Ω large enough that it cannot be reached by applying the update rules of logical clocks. Doing so assumes an upper bound on message delivery time and an upper bound on relative process speeds. Better to relax it.

60 Snapshot II p_0 selects Ω. p_0 sends "take a snapshot at Ω" to all processes; it waits for all of them to reply and then sets its logical clock to Ω. When the clock of p_i reads Ω, p_i records its local state σ_i, sends an empty message along its outgoing channels, starts recording messages received on each incoming channel, and stops recording a channel when it receives the first message with timestamp greater than or equal to Ω.

61 Relaxing synchrony Between receiving "take a snapshot at Ω" and the empty messages, a process does nothing for the protocol! At Ω it records its local state, sends empty messages, and monitors channels. So: use the empty message itself to announce the snapshot!

62 Snapshot III Processor p_0 sends itself "take a snapshot". When p_i receives "take a snapshot" for the first time, from p_j, it: records its local state σ_i; sends "take a snapshot" along its outgoing channels; sets the state of the channel from p_j to empty; starts recording messages received over each of its other incoming channels. When p_i receives "take a snapshot" beyond the first time, from p_k, p_i stops recording the channel from p_k. When p_i has received "take a snapshot" on all channels, it sends the collected state to p_0 and stops.
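Snapshot III is essentially the marker algorithm. The sketch below hand-drives one two-process exchange; the Snap class, channel names, and the scripted delivery order are all illustrative, not from the slides:

```python
from collections import deque

class Snap:
    """Per-process state for the "take a snapshot" (marker) protocol."""
    def __init__(self):
        self.local_state = None   # sigma_i, recorded on first marker
        self.recorded = {}        # incoming channel -> messages recorded
        self.stopped = set()      # channels whose recording has stopped

    def on_marker(self, ch, state, in_channels):
        """Handle "take a snapshot" arriving on channel ch. Returns True on
        the first marker, when the caller must forward markers downstream."""
        if self.local_state is None:
            self.local_state = state              # record local state
            self.stopped = {ch}                   # ch's channel state: empty
            self.recorded = {c: [] for c in in_channels if c != ch}
            return True
        self.stopped.add(ch)                      # stop recording ch
        return False

    def on_message(self, ch, msg):
        """A message arriving after the snapshot started but before ch's
        marker belongs to the recorded channel state."""
        if self.local_state is not None and ch not in self.stopped:
            self.recorded[ch].append(msg)

# The p1 -> p0 channel holds one in-flight message when p0 initiates.
c10 = deque(["x"])
p0, p1 = Snap(), Snap()

p0.on_marker("self", state=7, in_channels=["c10"])  # p0 sends itself the marker
p0.on_message("c10", c10.popleft())                 # "x" arrives before p1's marker
p1.on_marker("c01", state=3, in_channels=["c01"])   # p1 gets p0's marker, replies
p0.on_marker("c10", state=7, in_channels=["c10"])   # p1's marker closes c10

print(p0.local_state, p1.local_state)   # 7 3
print(p0.recorded["c10"])               # ['x']  (the recorded channel state)
```

The in-flight message "x" ends up in the recorded state of the channel rather than in either local state, which is exactly what makes the cut consistent.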

63 Snapshots: a perspective The global state Σ^s saved by the snapshot protocol is a consistent global state. But did it ever occur during the computation? A distributed computation provides only a partial order of events; many total orders (runs) are compatible with that partial order; all we know is that Σ^s could have occurred. We are evaluating predicates on states that may have never occurred!

66 An Execution and its Lattice

80 Reachability Σ^kl is reachable from Σ^ij if there is a path from Σ^ij to Σ^kl in the lattice.

84 So, why do we care about Σ^s again? Deadlock is a stable property: deadlock ⇒ deadlock forever after. If a run R of the snapshot protocol starts in Σ^i and terminates in Σ^f, then Σ^s is reachable from Σ^i, and Σ^f is reachable from Σ^s. Hence: deadlock in Σ^s implies deadlock in Σ^f; no deadlock in Σ^s implies no deadlock in Σ^i.

86 Same problem, different approach The monitor process does not query explicitly. Instead, it passively collects information and uses it to build an observation (reactive architectures, Harel and Pnueli [1985]). An observation is an ordering of the events of the distributed computation based on the order in which the receiver is notified of the events.

87 Observations: a few observations An observation puts no constraint on the order in which the monitor receives notifications.

90 Causal delivery FIFO delivery guarantees that messages from the same sender are delivered in the order they were sent: send_i(m) before send_i(m') implies deliver_j(m) before deliver_j(m'). Causal delivery generalizes FIFO to multiple senders: send_i(m) → send_j(m') implies deliver_k(m) before deliver_k(m'). (Note the distinction between the receive event and the later deliver event.)

97 Causal Delivery in Synchronous Systems We use the upper bound δ on message delivery time.

98 Causal Delivery in Synchronous Systems DR1: at time t, p_0 delivers all messages it received with timestamp up to t − δ, in increasing timestamp order.

99 Causal Delivery with Lamport Clocks DR1.1: deliver all received messages in increasing (logical clock) timestamp order.

102 Causal Delivery with Lamport Clocks Should p_0 deliver a received message, or wait? Problem: Lamport clocks don't provide gap detection: given two events e and e' and their clock values LC(e) and LC(e'), where LC(e) < LC(e'), determine whether some event e'' exists s.t. LC(e) < LC(e'') < LC(e').

103 Stability DR2: deliver all received stable messages in increasing (logical clock) timestamp order. A message m received by p is stable at p if p will never receive a future message m' s.t. TS(m') < TS(m).

104 Implementing Stability Real-time clocks: wait for δ time units.

105 Implementing Stability Real-time clocks: wait for δ time units. Lamport clocks: wait on each channel for a message m s.t. TS(m) > LC(e). Or: design better clocks!

106 Clocks and STRONG Clocks Lamport clocks implement the clock condition: e → e' ⇒ TS(e) < TS(e'). We want new clocks that implement the strong clock condition: e → e' ⇔ TS(e) < TS(e').

107 Causal Histories The causal history of an event e in (H, →) is the set θ(e) = {e' ∈ H : e' → e} ∪ {e}.

110 How to build θ Each process p_i: initializes θ := ∅; if e_i^k is an internal or send event, then θ(e_i^k) = θ(e_i^k−1) ∪ {e_i^k}; if e_i^k is a receive event for message m, then θ(e_i^k) = θ(e_i^k−1) ∪ θ(send(m)) ∪ {e_i^k}.
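The update rules translate directly into set operations; the (pid, k) event names and the piggybacked msg_theta parameter below are my own encoding:

```python
class CausalHistory:
    """Maintains theta(e) for a process's latest event: the set of all
    events that happen before e, plus e itself."""
    def __init__(self, pid):
        self.pid = pid
        self.k = 0                 # events executed so far
        self.theta = frozenset()   # theta of the latest event

    def step(self, msg_theta=frozenset()):
        """Execute the next event. For internal/send events call with no
        argument; for receive(m), pass theta(send(m)) piggybacked on m."""
        self.k += 1
        e = (self.pid, self.k)
        self.theta = self.theta | msg_theta | {e}
        return e, self.theta

p1, p2 = CausalHistory(1), CausalHistory(2)
_, th_send = p1.step()                   # p1 executes send(m): theta = {(1,1)}
_, th_recv = p2.step(msg_theta=th_send)  # p2 receives m
print(sorted(th_recv))                   # [(1, 1), (2, 1)]
```

Shipping whole event sets on every message is what makes this impractical, which is precisely the motivation for the pruning and vector-clock slides that follow.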

111 Pruning causal histories Prune segments of history that are known to all processes (Peterson, Buchholz and Schlichting). Use a more clever way to encode θ(e).

112 Vector Clocks Consider θ_i(e), the projection of θ(e) on p_i. θ_i(e) is a prefix of h_i: θ_i(e) = h_i^k_i, so it can be encoded by the single integer k_i. Then θ(e) = θ_1(e) ∪ θ_2(e) ∪ … ∪ θ_n(e) can be encoded using n integers. Represent θ using an n-vector VC such that VC(e)[i] = k_i.

113 Update rules If e is an internal or send event of p_i: VC[i] := VC[i] + 1. If e = receive(m) at p_i: VC := max(VC, TS(m)) componentwise, then VC[i] := VC[i] + 1. Message m is timestamped with TS(m) = VC(send(m)).

114 Example (vector timestamps of the events in the diagram) [1,0,0] [0,1,0] [2,1,0] [1,0,1] [1,0,2] [1,0,3] [3,1,2] [1,2,3] [4,1,2] [5,1,2] [4,3,3] [5,1,4]
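The update rules sketched in code (the class shape is mine). Assuming, as the diagram suggests, that p_0's second event receives p_1's first message, the first few example timestamps can be replayed:

```python
class VectorClock:
    """Vector clock for process pid in a system of n processes."""
    def __init__(self, pid, n):
        self.pid = pid
        self.vc = [0] * n

    def local_event(self):
        self.vc[self.pid] += 1          # internal/send: VC[i] := VC[i] + 1
        return list(self.vc)

    def send(self):
        return self.local_event()       # message carries TS(m) = VC(send(m))

    def receive(self, ts):
        # VC := componentwise max(VC, TS(m)), then count the receive event
        self.vc = [max(a, b) for a, b in zip(self.vc, ts)]
        self.vc[self.pid] += 1
        return list(self.vc)

p0, p1 = VectorClock(0, 3), VectorClock(1, 3)
print(p0.send())          # [1, 0, 0]
m = p1.send()
print(m)                  # [0, 1, 0]
print(p0.receive(m))      # [2, 1, 0]
```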

115 Operational interpretation VC(e_i)[i] ≡ number of events executed by p_i up to and including e_i. VC(e_i)[j], j ≠ i ≡ number of events executed by p_j that happen before e_i of p_i.

118 VC properties: event ordering 1. Given two vectors V and V', less than is defined as: V < V' ≡ (V ≠ V') ∧ (∀k : 1 ≤ k ≤ n : V[k] ≤ V'[k]). Strong Clock Condition: e → e' ≡ VC(e) < VC(e'). Simple Strong Clock Condition: given e_i of p_i and e_j of p_j, where i ≠ j: e_i → e_j ≡ VC(e_i)[i] ≤ VC(e_j)[i]. Concurrency: given e_i of p_i and e_j of p_j, where i ≠ j: e_i ∥ e_j ≡ (VC(e_i)[i] > VC(e_j)[i]) ∧ (VC(e_j)[j] > VC(e_i)[j]).
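These definitions translate directly into comparison predicates; using timestamps from the example slide, [1,0,0] happens before [2,1,0], while [1,0,1] and [0,1,0] are concurrent:

```python
def vc_less(v, w):
    """V < V'  ≡  (V ≠ V') and (for all k: V[k] <= V'[k])."""
    return v != w and all(a <= b for a, b in zip(v, w))

def happened_before(ts_e, ts_f):
    """Strong clock condition: e -> f iff VC(e) < VC(f)."""
    return vc_less(ts_e, ts_f)

def concurrent(ts_e, ts_f):
    """Neither timestamp dominates the other."""
    return not vc_less(ts_e, ts_f) and not vc_less(ts_f, ts_e)

print(happened_before([1, 0, 0], [2, 1, 0]))   # True
print(concurrent([1, 0, 1], [0, 1, 0]))        # True
```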

119 VC properties: consistency Pairwise inconsistency: events e_i of p_i and e_j of p_j (i ≠ j) are pairwise inconsistent (i.e. they can't be on the frontier of the same consistent cut) if and only if (VC(e_i)[i] < VC(e_j)[i]) ∨ (VC(e_j)[j] < VC(e_i)[j]). Consistent Cut: a cut defined by (c_1, ..., c_n) is consistent if and only if ∀i, j : VC(e_i^c_i)[i] ≥ VC(e_j^c_j)[i].

120 VC properties: weak gap detection Weak gap detection: given e_i of p_i and e_j of p_j, if VC(e_i)[k] < VC(e_j)[k] for some k ≠ j, then there exists an event e_k of p_k s.t. ¬(e_k → e_i) ∧ (e_k → e_j).

122 VC properties: strong gap detection Strong gap detection: given e_i of p_i and e_j of p_j, if VC(e_i)[i] < VC(e_j)[i], then there exists an event e_i' of p_i s.t. (e_i → e_i') ∧ (e_i' → e_j).

123 VCs for Causal Delivery Each process increments the local component of its VC only for events that are notified to the monitor. Each message notifying event e is timestamped with VC(e). The monitor keeps all notification messages in a set M.

124 Stability Suppose p_0 has received m_j from p_j. When is it safe for p_0 to deliver m_j? Only when: there is no earlier message in M; there is no earlier message from p_j (checked against the number of p_j messages delivered by p_0); and there is no earlier message m_k'' from p_k (k ≠ j) that may still arrive.

128 Checking for m_k''. Let m_k' be the last message p_0 delivered from p_k. By strong gap detection, an undelivered m_k'' with send(m_k'') → send(m_j) can exist only if TS(m_k')[k] < TS(m_j)[k]. Hence, deliver m_j as soon as TS(m_j)[k] ≤ TS(m_k')[k] for all k ≠ j.

129 The protocol p_0 maintains an array D[1, ..., n] of counters, where D[i] = TS(m_i)[i] and m_i is the last message delivered from p_i. DR3: deliver m from p_j as soon as both of the following conditions are satisfied: 1. TS(m)[j] = D[j] + 1; 2. TS(m)[k] ≤ D[k] for all k ≠ j.
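DR3 can be sketched as a monitor that buffers notifications until both conditions hold. The buffering loop and the 0-based indexing of the n senders are my own choices:

```python
class CausalMonitor:
    """Monitor p_0 delivering notification messages in causal order (DR3).
    D[i] counts the sender-i messages delivered so far, i.e. the i-th
    component of the last delivered timestamp from sender i."""
    def __init__(self, n):
        self.D = [0] * n
        self.pending = []               # received but undeliverable (j, TS)

    def deliverable(self, j, ts):
        # DR3: 1. TS(m)[j] = D[j] + 1  (no earlier message from p_j missing)
        #      2. TS(m)[k] <= D[k], k != j  (no earlier message from others)
        return ts[j] == self.D[j] + 1 and all(
            ts[k] <= self.D[k] for k in range(len(ts)) if k != j)

    def receive(self, j, ts):
        """Buffer (j, ts), then return everything now deliverable, in order."""
        self.pending.append((j, ts))
        delivered, progress = [], True
        while progress:                 # a delivery may unblock others
            progress = False
            for item in list(self.pending):
                if self.deliverable(*item):
                    self.pending.remove(item)
                    self.D[item[0]] = item[1][item[0]]
                    delivered.append(item)
                    progress = True
        return delivered

m = CausalMonitor(2)
# Sender 1's notification causally follows sender 0's, but arrives first:
print(m.receive(1, [1, 1]))   # []  -- held back until [1, 0] arrives
print(m.receive(0, [1, 0]))   # [(0, [1, 0]), (1, [1, 1])]
```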

130 Multiple Monitors Create a group of monitor processes for increased performance and increased reliability. Notify through a causal multicast to the group. Each replica will construct a (possibly different) observation: if the property is stable, then if one monitor detects it, eventually all monitors do; otherwise, either use Possibly and Definitely, or use causal atomic multicast. What about failures?

