Global Predicate Detection and Event Ordering
Our Problem: to compute predicates over the state of a distributed application.
Model
- Message passing
- No failures
- Two possible timing assumptions:
  1. Synchronous system
  2. Asynchronous system: no upper bound on message delivery time; no bound on relative process speeds; no centralized clock
Clock Synchronization
- External clock synchronization keeps each processor clock within some maximum deviation from an external time source:
  - can exchange information about the timing of events on different systems
  - can take actions at real-time deadlines
  - synchronization within 0.1 ms
- Internal clock synchronization keeps processor clocks within some maximum deviation from each other:
  - can measure the duration of distributed activities that start on one process and terminate on another
  - can totally order events that occur on a distributed system
The Model
- n processes, each with a hardware clock
- a bound on the drift of correct hardware clocks with respect to real time
- a bound on message delivery time
- f, a bound on the number of faulty processes (their clocks drift outside the envelope of correct clocks)
[Figure: clock time vs. real time, showing the envelope within which correct clocks stay]
Requirements of Synchronization
Processes adjust their clocks periodically to obtain logical clocks C that satisfy:
- Agreement: logical clocks are never too far apart
- Accuracy: logical clocks maintain some relation to real time
What accuracy can be achieved?
Provably, some loss of accuracy with respect to real time is unavoidable; traditionally, synchronization algorithms lost even more, because of variable message delays and failures. Is this extra loss of accuracy unavoidable?
Optimal accuracy: (1 + ρ)^(-1) · t ≤ C_i(t) ≤ (1 + ρ) · t, where ρ is the hardware drift bound.
How to synchronize?
Repeat forever:
1. agree on "when" to resynchronize
2. agree on the "updated clock value" at resynchronization
Traditionally:
- periodic resynchronization with fixed periods (e.g., every hour)
- updated clock value := "average" of all clocks
Averaging
Problems with Averaging: accumulation of error; fault tolerance.
Clock Synchronization: Take 1
Assume an upper bound max and a lower bound min on message delivery time. Guarantee that processes stay synchronized within max − min.
Problem: [Figure: histogram of message delivery times from a 5000-message run (IBM Almaden), % of messages vs. time (ms)]
Clock Synchronization: Take 2
- No upper bound on message delivery time... but a lower bound min on message delivery time
- Use a timeout max to detect process failures
- Slaves send messages to the master
- The master averages the slaves' values and computes a fault-tolerant average
- Precision: 4·max_p − min
Probabilistic Clock Synchronization (Cristian)
- Master-slave architecture
- The master is connected to an external time source
- Slaves read the master's clock and adjust their own
How accurately can a slave read the master's clock?
The Idea
Clock accuracy depends on the message roundtrip time: if the roundtrip is small, master and slave cannot have drifted apart by much! Since there is no upper bound on message delivery time, there is no certainty of obtaining an accurate enough reading... but a very accurate reading can be achieved by repeated attempts.
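As a rough illustration of this idea, here is a minimal Python sketch of Cristian-style probabilistic reading of the master's clock. MIN_DELAY (the assumed lower bound on one-way delay), the simulated rpc_read_master_clock stub, and all names are illustrative assumptions, not part of the slides' protocol.

```python
import random
import time

MIN_DELAY = 0.001   # assumed lower bound on one-way message delay, in seconds

def rpc_read_master_clock():
    """Hypothetical stand-in for the real RPC: simulates variable network delay
    around a master whose clock is 50 ms ahead of ours."""
    time.sleep(MIN_DELAY + random.expovariate(500))   # request delay
    master_time = time.time() + 0.05
    time.sleep(MIN_DELAY + random.expovariate(500))   # reply delay
    return master_time

def read_master(target_error=0.004, attempts=10):
    """Poll the master until the half-roundtrip error bound is small enough."""
    best = None   # (error_bound, estimated_master_time)
    for _ in range(attempts):
        start = time.monotonic()
        master_time = rpc_read_master_clock()
        roundtrip = time.monotonic() - start
        # The master read its clock somewhere inside the roundtrip, so the estimate
        # master_time + roundtrip/2 is off by at most roundtrip/2 - MIN_DELAY.
        error_bound = roundtrip / 2 - MIN_DELAY
        if best is None or error_bound < best[0]:
            best = (error_bound, master_time + roundtrip / 2)
        if error_bound <= target_error:
            break   # accurate enough; a real slave would now adjust its clock
    return best

print(read_master())
```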
Asynchronous Systems
- Weakest possible assumptions
- Weak assumptions ≡ fewer vulnerabilities
- Asynchronous subsumes slow: a system that is merely slow still satisfies the asynchronous model
- An "interesting" model with respect to failures
Client-Server
Processes exchange messages using Remote Procedure Call (RPC):
- a client requests a service by sending the server a message; the client blocks while waiting for the response
- the server computes the response (possibly asking other servers) and returns it to the client
Deadlock!
Goal: design a protocol by which a processor can determine whether a global predicate (say, deadlock) holds.
Wait-For Graphs
- Draw an arrow from p_i to p_j if p_j has received a request from p_i but has not responded yet
- Cycle in WFG ⇒ deadlock
- Deadlock ⇒ eventually a cycle in WFG
The Protocol
- p_0 sends a message to p_1 ... p_3
- On receipt of p_0's message, p_i replies with its state and wait-for info
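A small sketch, in Python, of what p_0 might do with the collected replies: assemble the wait-for graph and test it for a cycle. The {process_id: [ids it waits for]} encoding is an assumption, not the slides' message format; and, as the next slides show, a cycle found in states collected inconsistently may be a ghost deadlock.

```python
def has_cycle(wait_for):
    """Detect a cycle in a wait-for graph given as {pid: iterable of pids}."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on current DFS path / done
    color = {p: WHITE for p in wait_for}

    def visit(p):
        color[p] = GREY
        for q in wait_for.get(p, ()):
            c = color.get(q, WHITE)
            if c == GREY:                 # back edge: cycle found
                return True
            if c == WHITE and visit(q):
                return True
        color[p] = BLACK
        return False

    return any(color[p] == WHITE and visit(p) for p in list(wait_for))

# p_1 waits for p_2, p_2 waits for p_3, p_3 waits for p_1: a cycle, hence (apparent) deadlock
print(has_cycle({1: [2], 2: [3], 3: [1]}))   # True
```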
An execution
Ghost Deadlock!
We have a problem...
- Asynchronous system: no centralized clock, etc.
- Synchrony is useful to coordinate actions and to order events
Events and Histories
- Processes execute sequences of events
- Events can be of 3 types: local, send, and receive
- e_p^i is the i-th event of process p
- The local history h_p of process p is the sequence of events executed by p
  - h_p^k: the prefix containing the first k events
  - h_p^0: the initial, empty sequence
- The history H is the set h_{p_0} ∪ h_{p_1} ∪ ... ∪ h_{p_{n-1}}
NOTE: in H, local histories are interpreted as sets, rather than sequences, of events
Ordering Events
- Observation 1: events in a local history are totally ordered
- Observation 2: for every message m, send(m) precedes receive(m)
Happened-Before (Lamport [1978])
A binary relation → defined over events:
1. if e_i^k, e_i^l ∈ h_i and k < l, then e_i^k → e_i^l
2. if e_i = send(m) and e_j = receive(m), then e_i → e_j
3. if e → e′ and e′ → e″, then e → e″
Space-Time Diagrams: a graphic representation of a distributed execution. H and → impose a partial order.
Runs and Consistent Runs
- A run is a total ordering of the events in H that is consistent with the local histories of the processors (e.g., h_1, h_2, ..., h_n is a run)
- A run is consistent if the total order it imposes is an extension of the partial order induced by →
- A single distributed computation may correspond to several consistent runs!
Cuts
- A cut C is a subset of the global history H, of the form C = h_1^{c_1} ∪ h_2^{c_2} ∪ ... ∪ h_n^{c_n}
- The frontier of C is the set of its most recent events, {e_i^{c_i} : 1 ≤ i ≤ n}
Global States and Cuts
- The global state of a distributed computation is an n-tuple of local states Σ = (σ_1, ..., σ_n)
- To each cut (c_1, ..., c_n) corresponds a global state Σ = (σ_1^{c_1}, ..., σ_n^{c_n})
Consistent Cuts and Consistent Global States
- A cut C is consistent if, for all events e and e′: (e ∈ C) ∧ (e′ → e) ⇒ e′ ∈ C
- A consistent global state is one corresponding to a consistent cut
What p_0 sees
Not a consistent global state: the cut contains the event corresponding to the receipt of the last message by p_3 but not the corresponding send event.
Our Task
- Develop a protocol by which a processor can build a consistent global state
- Informally, we want to be able to take a snapshot of the computation
- Not obvious in an asynchronous system...
Our Approach
- Develop a simple synchronous protocol
- Refine the protocol as we relax assumptions
- Record: processor states and channel states
- Assumptions: FIFO channels; each message m is timestamped with T(send(m))
Snapshot I
1. p_0 selects t_ss
2. p_0 sends "take a snapshot at t_ss" to all processes
3. when the clock of p_i reads t_ss, p_i:
   a. records its local state σ_i
   b. sends an empty message along its outgoing channels
   c. starts recording the messages received on each of its incoming channels
   d. stops recording a channel when it receives the first message with timestamp greater than or equal to t_ss
Correctness
Theorem: Snapshot I produces a consistent cut.
Proof: we need to prove that if the cut contains an event e, then it also contains every event e′ such that e′ → e.
Clock Condition
If e → e′, then the timestamp of e is smaller than the timestamp of e′.
Can the Clock Condition be implemented some other way, without synchronized real-time clocks?
Lamport Clocks
- Each process maintains a local variable LC
- LC(e) = value of LC for event e
Increment Rules
- if e_i is an internal or send event: LC := LC + 1
- if e_i = receive(m): LC := max(LC, TS(m)) + 1
- message m is timestamped with TS(m) = LC(send(m))
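A minimal sketch of a per-process Lamport clock following these increment rules (class and method names are illustrative, not from the slides):

```python
class LamportClock:
    """One Lamport clock, owned by a single process."""

    def __init__(self):
        self.lc = 0

    def local_event(self):
        """Internal or send event: LC := LC + 1."""
        self.lc += 1
        return self.lc

    def send(self):
        """Timestamp TS(m) to attach to an outgoing message."""
        return self.local_event()

    def receive(self, ts_m):
        """Receive event for a message with timestamp ts_m: LC := max(LC, TS(m)) + 1."""
        self.lc = max(self.lc, ts_m) + 1
        return self.lc
```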
Space-Time Diagrams and Logical Clocks
A subtle problem
"when LC = t do S" doesn't make sense for Lamport clocks!
- there is no guarantee that LC will ever be t
- even when LC reaches t, S is anyway executed after the event that set it
Fixes:
- if e is internal/send and LC = t − 2: execute e and then S
- if e = receive(m) ∧ (TS(m) ≥ t) ∧ (LC ≤ t − 1): put the message back in the channel, re-enable e, set LC := t − 1, and execute S
An obvious problem: there is no t_ss!
Choose t_ss large enough that it cannot be reached by applying the update rules of logical clocks. But doing so assumes:
- an upper bound on message delivery time
- an upper bound on relative process speeds
Better to relax these assumptions.
Snapshot II
1. p_0 selects t_ss (a logical-clock value too large to be reached by the update rules alone)
2. p_0 sends "take a snapshot at t_ss" to all processes; it waits for all of them to reply and then sets its logical clock to t_ss
3. when the clock of p_i reads t_ss, p_i:
   a. records its local state σ_i
   b. sends an empty message along its outgoing channels
   c. starts recording the messages received on each incoming channel
   d. stops recording a channel when it receives the first message with timestamp greater than or equal to t_ss
Relaxing synchrony
Between receiving "take a snapshot at t_ss" and its clock reaching t_ss, a process does nothing for the protocol! Only at t_ss does it record its local state, send the empty message, and start monitoring its channels. So: use the empty message itself to announce the snapshot!
Snapshot III
1. p_0 sends itself "take a snapshot"
2. when p_i receives "take a snapshot" for the first time, from p_j, it:
   - records its local state σ_i
   - sends "take a snapshot" along its outgoing channels
   - sets the channel from p_j to empty
   - starts recording messages received over each of its other incoming channels
3. when p_i receives "take a snapshot" beyond the first time, from p_k: p_i stops recording the channel from p_k
4. when p_i has received "take a snapshot" on all channels, it sends the collected state to p_0 and stops
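The per-process logic of Snapshot III can be sketched as follows (Python, assuming FIFO channels). The constructor arguments and the send(channel, message) callback are assumed plumbing; only the marker-handling rules come from the slide.

```python
MARKER = "take a snapshot"

class SnapshotParticipant:
    """Snapshot III logic for one process p_i (channels are assumed FIFO)."""

    def __init__(self, pid, in_channels, out_channels, send):
        self.pid = pid
        self.in_channels = set(in_channels)      # ids of incoming channels
        self.out_channels = list(out_channels)   # ids of outgoing channels
        self.send = send                         # callback: send(channel_id, message)
        self.local_state = None                  # recorded sigma_i
        self.channel_state = {}                  # channel id -> recorded in-flight messages
        self.recording = set()                   # channels still being recorded

    def on_message(self, channel, msg, current_state):
        if msg == MARKER:
            if self.local_state is None:
                # First marker: record state, forward the marker, and record every
                # other incoming channel; the channel the marker arrived on is empty.
                self.local_state = current_state
                for ch in self.out_channels:
                    self.send(ch, MARKER)
                self.channel_state = {ch: [] for ch in self.in_channels}
                self.recording = self.in_channels - {channel}
            else:
                self.recording.discard(channel)   # stop recording this channel
            if not self.recording:
                # Marker seen on all incoming channels: report collected state to p_0.
                return ("snapshot", self.pid, self.local_state, self.channel_state)
        elif channel in self.recording:
            # An application message received while recording is in-flight channel state.
            self.channel_state[channel].append(msg)
        return None
```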
Snapshots: a perspective
- The global state Σ^s saved by the snapshot protocol is a consistent global state
- But did it ever occur during the computation?
  - a distributed computation provides only a partial order of events
  - many total orders (runs) are compatible with that partial order
  - all we know is that Σ^s could have occurred
- We are evaluating predicates on states that may have never occurred!
An Execution and its Lattice
Reachability: Σ^{kl} is reachable from Σ^{ij} if there is a path from Σ^{ij} to Σ^{kl} in the lattice.
So, why do we care about Σ^s again?
- Deadlock is a stable property: once the system is deadlocked, it stays deadlocked
- If a run R of the snapshot protocol starts in Σ^i and terminates in Σ^f, then Σ^s is reachable from Σ^i and Σ^f is reachable from Σ^s
- Hence: deadlock in Σ^s implies deadlock in Σ^f; no deadlock in Σ^s implies no deadlock in Σ^i
Same problem, different approach
- The monitor process does not query explicitly
- Instead, it passively collects information and uses it to build an observation (reactive architectures, Harel and Pnueli [1985])
- An observation is an ordering of the events of the distributed computation based on the order in which the receiver is notified of the events
Observations: a few observations An observation puts no constraint on the order in which the monitor receives notifications
Causal Delivery
- FIFO delivery guarantees: send_i(m) → send_i(m′) ⇒ deliver_j(m) → deliver_j(m′)
- Causal delivery generalizes FIFO to different senders: send_i(m) → send_k(m′) ⇒ deliver_j(m) → deliver_j(m′)
- Note the distinction between the receive event (the message arrives at the process) and the deliver event (the message is handed to the application)
Causal Delivery in Synchronous Systems
We use the upper bound δ on message delivery time.
DR1: at time t, p_0 delivers all messages it received with timestamp up to t − δ, in increasing timestamp order.
Causal Delivery with Lamport Clocks
DR1.1: deliver all received messages in increasing (logical clock) timestamp order.
[Figure: p_0 has received messages with timestamps 1 and 4.] Should p_0 deliver?
Problem: Lamport clocks don't provide gap detection: given two events e and e′ with clock values LC(e) < LC(e′), determine whether some event e″ exists such that LC(e) < LC(e″) < LC(e′).
Stability
DR2: deliver all received stable messages in increasing (logical clock) timestamp order.
A message m received by p is stable at p if p will never receive a future message m′ such that TS(m′) < TS(m).
Implementing Stability
- Real-time clocks: wait for δ time units
- Lamport clocks: before delivering m, wait on each channel for a message m′ with TS(m′) > TS(m) (relies on FIFO channels)
- Or: design better clocks!
Clocks and STRONG Clocks
- Lamport clocks implement the clock condition: e → e′ ⇒ LC(e) < LC(e′)
- We want new clocks TC that implement the strong clock condition: e → e′ ⇔ TC(e) < TC(e′)
Causal Histories: the causal history of an event e in (H, →) is the set θ(e) = {e′ ∈ H : e′ → e} ∪ {e}.
How to build θ
Each process p_i:
- initializes θ to the empty set
- if e_i^k is an internal or send event, then θ(e_i^k) = θ(e_i^{k−1}) ∪ {e_i^k}
- if e_i^k is a receive event for message m, then θ(e_i^k) = θ(e_i^{k−1}) ∪ θ(send(m)) ∪ {e_i^k}
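A tiny sketch of this construction, representing events as (process, index) pairs and causal histories as Python sets; this illustrates the rules, not an efficient encoding.

```python
def initial_history():
    """theta before any event: the empty set."""
    return frozenset()

def internal_or_send(theta_prev, event):
    """theta(e_i^k) = theta(e_i^(k-1)) U {e_i^k}"""
    return theta_prev | {event}

def receive(theta_prev, event, theta_of_send):
    """theta(e_i^k) = theta(e_i^(k-1)) U theta(send(m)) U {e_i^k}"""
    return theta_prev | theta_of_send | {event}

def happened_before(e, theta_of_e2):
    """Test e in theta(e'); for distinct events e and e', this is exactly e -> e'."""
    return e in theta_of_e2

# Example: p1's send is in the causal history of p2's receive of that message
t_send = internal_or_send(initial_history(), ("p1", 1))
t_recv = receive(initial_history(), ("p2", 1), t_send)
print(happened_before(("p1", 1), t_recv))   # True
```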
Pruning causal histories
- Prune segments of history that are known to all processes (Peterson, Buchholz, and Schlichting)
- Use a cleverer way to encode θ(e)
Vector Clocks
- Consider θ_i(e), the projection of θ(e) on p_i
- θ_i(e) is a prefix of h_i: θ_i(e) = h_i^{k_i}, so it can be encoded by the single integer k_i
- θ(e) = θ_1(e) ∪ θ_2(e) ∪ ... ∪ θ_n(e) can therefore be encoded using n integers
- Represent θ(e) using an n-vector VC such that VC(e)[i] = k_i
Update rules
- if e_i is an internal or send event: VC(e_i)[i] := VC[i] + 1
- if e_i = receive(m): VC(e_i) := max(VC, TS(m)) componentwise, and VC(e_i)[i] := VC[i] + 1
- message m is timestamped with TS(m) = VC(send(m))
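A minimal sketch of a per-process vector clock implementing these update rules (processes are indexed 0..n−1 here; names are illustrative):

```python
class VectorClock:
    """Vector clock owned by process number `i` out of `n` processes."""

    def __init__(self, n, i):
        self.i = i
        self.vc = [0] * n

    def local_event(self):
        """Internal or send event: VC[i] := VC[i] + 1."""
        self.vc[self.i] += 1
        return list(self.vc)

    def send(self):
        """Timestamp TS(m) carried by an outgoing message."""
        return self.local_event()

    def receive(self, ts_m):
        """Receive event: componentwise max with TS(m), then count this event."""
        self.vc = [max(a, b) for a, b in zip(self.vc, ts_m)]
        self.vc[self.i] += 1
        return list(self.vc)
```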
Example
[Figure: space-time diagram of three processes, with events labeled by their vector clocks: [1,0,0], [0,1,0], [2,1,0], [1,0,1], [1,0,2], [1,0,3], [3,1,2], [1,2,3], [4,1,2], [5,1,2], [4,3,3], [5,1,4]]
Operational interpretation
- VC(e_i)[i] ≡ number of events executed by p_i up to and including e_i
- VC(e_i)[j], j ≠ i ≡ number of events executed by p_j that happen before e_i of p_i
VC properties: event ordering
1. Given two vectors V and V′, less-than is defined as: V < V′ ≡ (V ≠ V′) ∧ (∀ k, 1 ≤ k ≤ n : V[k] ≤ V′[k])
2. Strong Clock Condition: e → e′ ⇔ VC(e) < VC(e′)
3. Simple Strong Clock Condition: given e_i of p_i and e_j of p_j, where i ≠ j: e_i → e_j ⇔ VC(e_i)[i] ≤ VC(e_j)[i]
4. Concurrency: given e_i of p_i and e_j of p_j, where i ≠ j: e_i ∥ e_j ⇔ (VC(e_i)[i] > VC(e_j)[i]) ∧ (VC(e_j)[j] > VC(e_i)[j])
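The same properties written as small helper predicates over vector clocks represented as plain lists (a sketch with illustrative names, 0-based process indices):

```python
def vc_less(v, w):
    """V < V'  :=  V != V' and V[k] <= V'[k] for every k."""
    return v != w and all(a <= b for a, b in zip(v, w))

def happened_before(vc_ei, vc_ej, i):
    """Simple strong clock condition: e_i -> e_j iff VC(e_i)[i] <= VC(e_j)[i] (i != j)."""
    return vc_ei[i] <= vc_ej[i]

def concurrent(vc_ei, vc_ej, i, j):
    """e_i || e_j iff VC(e_i)[i] > VC(e_j)[i] and VC(e_j)[j] > VC(e_i)[j]."""
    return vc_ei[i] > vc_ej[i] and vc_ej[j] > vc_ei[j]

# Example with vectors from the diagram above: [2,1,0] on p_1 vs. [1,0,2] on p_3
print(happened_before([2, 1, 0], [1, 0, 2], 0))   # False
print(concurrent([2, 1, 0], [1, 0, 2], 0, 2))     # True: the two events are concurrent
```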
VC properties: consistency
- Pairwise inconsistency: events e_i of p_i and e_j of p_j (i ≠ j) are pairwise inconsistent (i.e., they cannot both be on the frontier of the same consistent cut) if and only if (VC(e_i)[i] < VC(e_j)[i]) ∨ (VC(e_j)[j] < VC(e_i)[j])
- Consistent cut: a cut defined by (c_1, ..., c_n) is consistent if and only if, for all i and j: VC(e_i^{c_i})[i] ≥ VC(e_j^{c_j})[i]
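And the consistency checks as executable predicates (again a sketch; frontier_vcs[i] is the vector clock of e_i^{c_i}, 0-based):

```python
def pairwise_inconsistent(vc_ei, vc_ej, i, j):
    """e_i and e_j cannot both be on the frontier of the same consistent cut."""
    return vc_ei[i] < vc_ej[i] or vc_ej[j] < vc_ei[j]

def consistent_cut(frontier_vcs):
    """A cut is consistent iff no two of its frontier events are pairwise inconsistent."""
    n = len(frontier_vcs)
    return all(not pairwise_inconsistent(frontier_vcs[i], frontier_vcs[j], i, j)
               for i in range(n) for j in range(n) if i != j)

# The frontier {[2,1,0], [0,1,0], [1,0,1]} from the diagram above is a consistent cut
print(consistent_cut([[2, 1, 0], [0, 1, 0], [1, 0, 1]]))   # True
```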
VC properties: gap detection
- Weak gap detection: given e_i of p_i and e_j of p_j, if VC(e_i)[k] < VC(e_j)[k] for some k ≠ j, then there exists an event e_k of p_k such that ¬(e_k → e_i) ∧ (e_k → e_j)
- Strong gap detection: given e_i of p_i and e_j of p_j, if VC(e_i)[i] < VC(e_j)[i], then there exists an event e_i′ of p_i such that (e_i → e_i′) ∧ (e_i′ → e_j)
VCs for Causal Delivery
- Each process increments the local component of its VC only for events that are notified to the monitor
- Each message notifying event e is timestamped with VC(e)
- The monitor keeps all notification messages in a set M
Stability
Suppose p_0 has received m_j from p_j. When is it safe for p_0 to deliver m_j? Only when p_0 is sure that:
- there is no earlier undelivered message in M
- there is no earlier message from p_j itself (checked using the number of p_j's messages already delivered by p_0)
- there is no earlier message m_k″ from some p_k (k ≠ j) still to arrive... how do we check this?
Checking for. Let m k ’ be the last message p 0 delivered from p k By strong gap detection, m k ’’ exists only if Hence, deliver m j as soon as
The protocol
p_0 maintains an array D[1..n] of counters, where D[i] = TS(m_i)[i] and m_i is the last message delivered from p_i.
DR3: deliver m from p_j as soon as both of the following conditions are satisfied:
1. D[j] = TS(m)[j] − 1
2. D[k] ≥ TS(m)[k] for all k ≠ j
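A sketch of the monitor side of DR3 in Python: buffer incoming notifications and deliver them once both conditions hold. The message format and names are assumptions; processes are indexed 0..n−1 here.

```python
class CausalDeliveryMonitor:
    """Monitor p_0: delivers notification messages in causal order using DR3."""

    def __init__(self, n):
        self.D = [0] * n   # D[k] = TS(m_k)[k] of the last message delivered from p_k
        self.M = []        # received but not yet delivered notifications

    def _deliverable(self, j, ts):
        next_from_j = (self.D[j] == ts[j] - 1)                              # condition 1
        no_gap = all(self.D[k] >= ts[k] for k in range(len(ts)) if k != j)  # condition 2
        return next_from_j and no_gap

    def receive(self, j, ts, payload):
        """Buffer (sender j, timestamp ts, payload), then deliver whatever has become safe."""
        self.M.append((j, ts, payload))
        delivered = []
        progress = True
        while progress:
            progress = False
            for entry in list(self.M):
                jj, tts, pl = entry
                if self._deliverable(jj, tts):
                    self.M.remove(entry)
                    self.D[jj] = tts[jj]
                    delivered.append(pl)
                    progress = True
        return delivered   # payloads in causal delivery order
```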
Multiple Monitors
- Create a group of monitor processes, for increased performance and increased reliability
- Notify events through a causal multicast to the group
- Each replica will construct a (possibly different) observation:
  - if the property is stable: if one monitor detects it, eventually all monitors do
  - otherwise: either use Possibly and Definitely, or use causal atomic multicast
- What about failures?