Distributed Computing 5. Snapshot Shmuel Zaks ©


2 The snapshot algorithm (Chandy and Lamport)

3

4

5 Goal: design a snapshot (= global-state-detection) algorithm that: (1) records a collection of states of all system components, which together form a global system state; (2) does not change the underlying computation; (3) does not freeze the underlying computation.

6 A process can: record its own state; send and receive messages; record the messages it sends and receives; cooperate with other processes. Processes do not share clocks or memory. Processes cannot record their states at precisely the same instant.

7 Motivation. Many problems in distributed systems can be stated in terms of detecting global states: stable property detection problems (termination detection, deadlock detection, etc.) and checkpointing.

8 Stable Property Detection Problem. D: a distributed system. y: a predicate defined on the set of global states of D. S, S': global states of D. y is stable if y(S) implies y(S') for all S' reachable from S.
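The definition of a stable predicate can be checked mechanically on a small, finite transition graph. The sketch below is only an illustration: the two-state system, the `TRANSITIONS` table, and the predicate names are assumptions made up for the example, not part of the slides.

```python
from collections import deque

def reachable(transitions, start):
    """Return the set of global states reachable from `start` (including it)."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in transitions.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def is_stable(y, transitions, states):
    """y is stable iff y(S) implies y(S') for every S' reachable from S."""
    return all(
        y(s_prime)
        for s in states if y(s)
        for s_prime in reachable(transitions, s)
    )

# Hypothetical toy system: "running" may become "done"; "done" never changes.
TRANSITIONS = {"running": ["running", "done"], "done": ["done"]}
STATES = ["running", "done"]

def terminated(s):
    return s == "done"          # once true, it stays true

def still_running(s):
    return s == "running"       # can become false later

stable_terminated = is_stable(terminated, TRANSITIONS, STATES)
stable_running = is_stable(still_running, TRANSITIONS, STATES)
```

Here `terminated` passes the check, while `still_running` fails it because "done" is reachable from "running".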

9 Many distributed algorithms are structured as a sequence of phases. A phase consists of a transient part followed by a stable part. Phase termination is distinct from computation termination. Our view of the problem: (i) detect the termination of a phase; (ii) initiate a new phase. Notice that "the kth phase has terminated" is a stable property.

10 Model. The distributed system D is a finite, labeled, directed graph. [Figure: processes p and q connected by channels C1 and C2.] Channels have infinite buffers, are error-free, and deliver messages in FIFO order. Message delay is bounded, but unknown.

11 State of a Channel. [Figure: processes p and q connected by channels C1 and C2.] [1, 2, 3]: the sequence X of messages that were sent. [1]: the sequence Y of messages that were received (a prefix of X). [2, 3]: the state of C1, namely X \ Y.
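The channel state X \ Y can be computed directly from the two sequences. This is a minimal sketch; the function name `channel_state` is an assumption introduced for illustration.

```python
def channel_state(sent, received):
    """Return the messages sent on a FIFO channel but not yet received.

    `received` must be a prefix of `sent`, as required for a FIFO channel.
    """
    if list(sent[:len(received)]) != list(received):
        raise ValueError("received must be a prefix of sent on a FIFO channel")
    return list(sent[len(received):])

# The example from the slide: X = [1, 2, 3], Y = [1].
in_transit = channel_state([1, 2, 3], [1])
```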

12 Example: System. Distributed system: processes p and q connected by channels C1 and C2. [Figure: initial global state, with process states A and B and both channels empty (Ø).] State transitions (same for p and q): two states, A and B, with transitions labeled send and receive.

13 Global state transition diagram (deterministic). [Figure: the global states of the system, linked by the events "p sends", "q receives", "q sends", "p receives".] A computation corresponds to a path in the diagram.

14 Example: System. Distributed system: processes p and q connected by channels C1 and C2. State transitions: p has states A and B, q has states C and D; in both, the transitions are labeled send and receive.

15 Global state transition diagram (non-deterministic). [Figure: global states over process states A, B (for p) and C, D (for q), linked by the events "p sends", "q sends", "p receives", "q receives".]

16 We look at the following sequence of events: p sends, q sends, p receives. [Figure: the corresponding path in the global state transition diagram.]

17 In the snapshot algorithm: each process records its own state; p and q cooperate to record the state of C. [Figure: p, channel C, q.]

18 Example: System. [Figure: recording steps labeled "Record C", "Record q", "Record p"; the recorded state of p, C, and q contains no token.]

19 Example: System. [Figure: recording steps labeled "Record p", "Record C", "Record q"; the recorded state contains two tokens.]

20 [Figure: two timelines, each ordering the events "C's state recorded", "p sends a message on C", and "p's state recorded"; one corresponds to the steps Record p, Record C, Record q, the other to Record C, Record q, Record p.]
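The "no token" and "two tokens" outcomes can be reproduced with a few lines of code. The sketch below is an illustration under assumptions: a single token starts at p, the only application event is "p sends the token on C", and the step names are hypothetical.

```python
def count_tokens(order):
    """Replay 'p sends the token on C' with recordings taken in the given order.

    `order` is a list of steps drawn from "record_p", "record_C", "send".
    q never holds the token in this scenario, so its recording is omitted.
    Returns the number of tokens visible in the recorded state.
    """
    p_has_token, channel = True, []        # live state: the token starts at p
    recorded = {}
    for step in order:
        if step == "send":
            p_has_token, channel = False, ["token"]
        elif step == "record_p":
            recorded["p"] = p_has_token
        elif step == "record_C":
            recorded["C"] = list(channel)
    return int(recorded["p"]) + len(recorded["C"])

no_token = count_tokens(["record_C", "send", "record_p"])
two_tokens = count_tokens(["record_p", "send", "record_C"])
one_token = count_tokens(["record_p", "record_C", "send"])
```

Only the third ordering, where no send falls between the two recordings, yields a recorded state with exactly one token.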

21 [Figure: p, channel C, q.] q will record the state of C. q starts recording C after it records its own state, and stops when it receives a special marker from p, so p and q have to coordinate using this marker. But how does q know when to record its own state?

22 The snapshot algorithm. Who starts? We assume a single process. Homework: extend the discussion and the proof to any number of starters.

23 (Intuition for the algorithm) [Figure: p, channel C, q.] Who will record the state of channel C? q. How does q know when to stop recording? p sends a marker right after it records its state and before sending any other message. q starts recording after it records its own state.

24 The snapshot algorithm: channel recording. [Figure: p, channel C, q.] Recording of C starts when q records its own state and ends when q receives the marker along C. Note: for any $q \neq p_0$, the channel along which the marker arrived first is recorded as empty (Ø).

25 The snapshot algorithm: state recording. $p_0$ starts: it records its state and then broadcasts the marker. (Shout algorithm = PI (propagation of information) = hot potato = …) When q receives the marker for the first time, it records its own state.

26 The snapshot algorithm.
Marker-Sending Rule for a process p: 1. record the state of p; 2. send a marker along c before sending any other message along c.
Marker-Receiving Rule for a process q, on receiving a marker along channel c: if q's state is not recorded: 1. record q's state; 2. record c's state as empty (Ø); else: c's state is the sequence of messages received along c since q recorded its state.
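The two rules can be exercised in a small single-threaded simulation. This is a sketch under stated assumptions, not the authors' implementation: the `Process` class, the `MARKER` sentinel, the `connect` and `transfer` helpers, and the bank-style integer states are all hypothetical, and the event order is scripted by hand.

```python
from collections import deque

MARKER = "MARKER"  # hypothetical sentinel standing in for the marker

class Process:
    def __init__(self, name, state):
        self.name, self.state = name, state
        self.out = {}              # destination name -> FIFO channel (deque)
        self.inc = {}              # source name -> FIFO channel (deque)
        self.has_recorded = False
        self.recorded_state = None
        self.channel_record = {}   # source name -> recorded message list
        self.recording = set()     # incoming channels still being recorded

    def record(self):
        """Marker-sending rule: record own state, then send a marker on every
        outgoing channel before any other message."""
        self.recorded_state = self.state
        self.has_recorded = True
        self.recording = set(self.inc)
        self.channel_record = {src: [] for src in self.inc}
        for chan in self.out.values():
            chan.append(MARKER)

    def receive(self, src):
        """Deliver the message at the head of the channel from `src`."""
        msg = self.inc[src].popleft()
        if msg == MARKER:
            if not self.has_recorded:
                self.record()             # channel from src stays recorded as []
            self.recording.discard(src)   # stop recording this channel
        else:
            if src in self.recording:
                self.channel_record[src].append(msg)
            self.state += msg             # application step: add the amount

def connect(a, b):
    """Create a FIFO channel directed from a to b."""
    chan = deque()
    a.out[b.name] = chan
    b.inc[a.name] = chan

def transfer(src, dest, amount):
    """Application send: debit src and put the amount on the channel to dest."""
    src.state -= amount
    src.out[dest.name].append(amount)

p, q = Process("p", 10), Process("q", 20)
connect(p, q)
connect(q, p)

transfer(p, q, 5)   # p: 5,  channel p->q: [5]
p.record()          # p initiates: records 5, sends a marker to q
transfer(q, p, 3)   # q: 17, channel q->p: [3]
p.receive("q")      # p gets 3 after recording: logged as channel state
q.receive("p")      # q gets 5 before recording: q becomes 22
q.receive("p")      # q gets the marker: records 22, channel p->q is empty
p.receive("q")      # p gets the marker: stops recording channel q->p

total = (p.recorded_state + q.recorded_state
         + sum(sum(msgs) for msgs in p.channel_record.values())
         + sum(sum(msgs) for msgs in q.channel_record.values()))
```

In this run the recorded total equals the initial total of 30, even though the two process states were recorded at different moments.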

27 Termination. Assumption: no marker remains forever in an input channel. Claim: if the graph is strongly connected and at least one process records its state, then all processes will record their state in finite time. Proof: by induction.
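The claim can be illustrated by following the markers through the graph: every process that records sends a marker on each outgoing channel, and the first marker to reach a process makes it record. The sketch below assumes every marker is eventually delivered; the graphs and the function name are hypothetical examples.

```python
from collections import deque

def processes_that_record(graph, initiator):
    """Return the set of processes that eventually record their state.

    `graph` maps each process to the processes its outgoing channels lead to.
    """
    recorded = {initiator}
    pending = deque([initiator])
    while pending:
        p = pending.popleft()
        for q in graph[p]:             # p sends a marker on each outgoing channel
            if q not in recorded:      # first marker to reach q: q records
                recorded.add(q)
                pending.append(q)
    return recorded

# A directed ring is strongly connected, so everyone records.
ring = {"p0": ["p1"], "p1": ["p2"], "p2": ["p0"]}
all_recorded = processes_that_record(ring, "p0")

# Without strong connectivity, some process may never see a marker.
not_strong = {"p0": ["p1"], "p1": [], "p2": ["p0"]}
some_recorded = processes_that_record(not_strong, "p0")
```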

28 The Recorded Global State. Example system: processes p and q connected by channels C1 and C2. State transitions: p has states A and B, q has states C and D; in both, the transitions are labeled send and receive.

29 [Figure: global state transition diagram of the example system, with the events "p sends", "q sends", "p receives".]

30 What did we get?

31 An event e in process p is an atomic action: it can change the state of p and the state of at most one channel c incident on p (by sending or receiving a message M along c). e is defined by the tuple $\langle p, s, s', M, c \rangle$, where s and s' are the states of p before and after e. $e = \langle p, s, s', M, c \rangle$ may occur in global state S if (1) the state of p in S is s, and (2) if c is directed towards p, then M is at the head of c's state in S.

32 Process State and Global State. A process: a set of states, an initial state, and a set of events. A global state S: a collection of process states and channel states. Initially, each process is in its initial state and all channels are empty. next(S, e) is the global state after event e is applied to global state S.

33 Process State and Global State. $seq = (e_i : i = 0, \dots, n)$ is a computation of the system iff, for every i, $e_i$ may occur in $S_i$ and $S_{i+1} = next(S_i, e_i)$, where $S_0$ is the initial global state.
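The definitions of an event, of "may occur", of next(S, e), and of a computation translate almost line by line into code. The sketch below is one possible encoding and rests on assumptions: a global state is a pair (process states, channel states), channels are named by (source, destination) pairs, and the concrete events in the example are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple

class Event(NamedTuple):
    p: str                               # process in which the event occurs
    s: str                               # state of p before the event
    s_new: str                           # state of p after the event
    M: Optional[str] = None              # message sent or received, if any
    c: Optional[Tuple[str, str]] = None  # channel (source, destination), if any

def may_occur(S, e):
    """e may occur in S if p is in state s and, for a receive, M heads c."""
    proc, chan = S
    if proc[e.p] != e.s:
        return False
    if e.c is not None and e.c[1] == e.p:      # c is directed towards p
        return bool(chan[e.c]) and chan[e.c][0] == e.M
    return True

def next_state(S, e):
    """Global state after e is applied to S (S itself is left unchanged)."""
    proc, chan = dict(S[0]), dict(S[1])
    proc[e.p] = e.s_new
    if e.c is not None:
        if e.c[1] == e.p:
            chan[e.c] = chan[e.c][1:]          # receive: remove the head
        else:
            chan[e.c] = chan[e.c] + (e.M,)     # send: append at the tail
    return proc, chan

def is_computation(S0, seq):
    """Check that each event may occur in the state produced by its predecessors."""
    S = S0
    for e in seq:
        if not may_occur(S, e):
            return False
        S = next_state(S, e)
    return True

# Hypothetical example: p goes A -> B by sending m; q goes C -> D by receiving it.
S0 = ({"p": "A", "q": "C"}, {("p", "q"): (), ("q", "p"): ()})
send = Event("p", "A", "B", "m", ("p", "q"))
recv = Event("q", "C", "D", "m", ("p", "q"))

valid = is_computation(S0, [send, recv])
invalid = is_computation(S0, [recv, send])   # nothing to receive yet
```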

34 The Recorded Global State. $seq = (e_i : i \ge 0)$: a distributed computation. $S_i$: the state of the system right before $e_i$ occurs. $S_0$: the initial state of the system. $S_t$: the state of the system at the termination of the algorithm. S*: the recorded global state.

35 Definition. Event $e_j$ is called pre-recording if $e_j$ is in a process p and p records its state after $e_j$ in seq. Event $e_j$ is called post-recording if $e_j$ is in a process p and p records its state before $e_j$ in seq. Assume that $e_{j-1}$ is a post-recording event that immediately precedes a pre-recording event $e_j$ in seq.

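36 Claim: the sequence obtained by interchanging $e_{j-1}$ and $e_j$ is a computation. Proof: $e_{j-1}$ occurs in a process p and $e_j$ in a different process q; they cannot be in the same process, because within one process every post-recording event follows every pre-recording event. No message can be sent at $e_{j-1}$ and received at $e_j$: p sent a marker on that channel before the message, so by FIFO q would have received the marker, and recorded its state, before $e_j$, making $e_j$ post-recording. Hence event $e_j$ can occur in global state $S_{j-1}$. The state of process p is not altered by $e_j$, hence $e_{j-1}$ can occur after $e_j$.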

37 Proof. Swap events until all post-recording events appear after all pre-recording events. The resulting computation is seq'. All that is left to show: S* is the global state after all pre-recording events and before all post-recording events. This has to be checked for (1) process states and (2) channel states.
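The reordering step can be pictured as repeated swaps of adjacent (post-recording, pre-recording) pairs. The sketch below only performs the reordering; that each swap keeps the sequence a computation is the claim proved in the slides, not something the code checks. The event names and labels are hypothetical.

```python
def reorder(seq):
    """Move every pre-recording event ahead of every post-recording event.

    `seq` is a list of (name, label) pairs with label "pre" or "post".
    Only adjacent (post, pre) pairs are swapped. Returns the reordered list
    and the number of swaps performed.
    """
    events = list(seq)
    swaps = 0
    changed = True
    while changed:
        changed = False
        for j in range(1, len(events)):
            if events[j - 1][1] == "post" and events[j][1] == "pre":
                events[j - 1], events[j] = events[j], events[j - 1]
                swaps += 1
                changed = True
    return events, swaps

seq = [("e0", "pre"), ("e1", "post"), ("e2", "pre"), ("e3", "post"), ("e4", "pre")]
seq_prime, swaps = reorder(seq)
names = [name for name, _ in seq_prime]
```

The relative order of the pre-recording events, and of the post-recording events, is unchanged by the swaps.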

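38 Claim: the state of a channel in S* is (the sequence of messages corresponding to pre-recording sends) \ (the sequence of messages corresponding to pre-recording receives). Proof: the state of channel c from process p to process q recorded in S* is the sequence of messages received on c by q after q records its state and before q receives a marker on c. The messages sent by p before its marker are exactly those corresponding to pre-recording sends on c. Since c is FIFO, q receives all of them before the marker, and those it received before recording its own state are the pre-recording receives, so what is recorded is the difference of the two sequences.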

39 [Figure: an execution of the example system with the events "p sends", "q sends", "p receives", with events labeled post, pre, post.]

40 (Another execution) [Figure: an execution of the example system with the events "q sends", "p sends", "p receives", with events labeled pre and post.]

41 What did we get? A configuration that could have happened.

42 (The Recorded Global State) $seq = (e_i : i \ge 0)$: a distributed computation. $S_i$: the state of the system right before $e_i$ occurs. $S_0$: the initial state of the system. $S_t$: the state of the system at the termination of the algorithm. S*: the recorded global state.

43 (The Recorded Global State)

44 Stable Detection. D: a distributed system. y: a predicate defined on the set of global states of D. S, S': global states of D. y is a stable property of D if y(S) implies y(S') for all S' reachable from S.

45 Algorithm. Input: a stable property y. Output: a boolean value b with the property $y(S_0) \Rightarrow b$ and $b \Rightarrow y(S_t)$. Algorithm: begin record a global state S*; b := y(S*) end.
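The detection algorithm is a thin wrapper around the snapshot: evaluate y on the recorded state. The sketch below assumes a snapshot routine is supplied by the caller; the `terminated` predicate, the (process states, channel states) layout, and the two sample snapshots are hypothetical.

```python
def terminated(S):
    """Hypothetical stable property: every process passive, every channel empty."""
    process_states, channel_states = S
    return (all(st == "passive" for st in process_states.values())
            and all(len(msgs) == 0 for msgs in channel_states.values()))

def detect(y, record_global_state):
    """b := y(S*), where S* is whatever the snapshot routine returns."""
    s_star = record_global_state()
    return y(s_star)

# Two made-up recorded states standing in for the output of a snapshot.
def snapshot_quiet():
    return ({"p": "passive", "q": "passive"}, {("p", "q"): [], ("q", "p"): []})

def snapshot_busy():
    return ({"p": "passive", "q": "passive"}, {("p", "q"): ["work"], ("q", "p"): []})

b_quiet = detect(terminated, snapshot_quiet)
b_busy = detect(terminated, snapshot_busy)
```

A message still in a channel makes the second snapshot non-terminated even though both processes are passive, which is why channel states are part of S*.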

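46 Correctness. 1. S* is reachable from $S_0$. 2. $S_t$ is reachable from S*. 3. $y(S) \Rightarrow y(S')$ for all S' reachable from S. Along $S_0 \to S^* \to S_t$: $y(S^*) = \text{true} \Rightarrow y(S_t) = \text{true}$, and $y(S^*) = \text{false} \Rightarrow y(S_0) = \text{false}$.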

47 References. K. M. Chandy and L. Lamport, "Distributed Snapshots: Determining Global States of Distributed Systems," ACM Transactions on Computer Systems, 3(1):63-75, February 1985.