Distributed Computing 5. Snapshot Shmuel Zaks ©


1 Distributed Computing 5. Snapshot Shmuel Zaks zaks@cs.technion.ac.il ©


5 Motivation
Many problems in distributed systems can be stated in terms of the problem of detecting global states:
- Stable property detection problems: termination detection, deadlock detection, etc.
- Checkpointing
(snapshot)

6 A Process Can…
- send and receive messages
- record its own state
- record messages it sends and receives
- cooperate with other processes
Processes do not share clocks or memory.
Processes cannot record their state precisely at the same instant.

7 The Global-State-Detection Algorithm
- Conducts state recording so that the recorded set of states forms a global system state
- Does not change the underlying computation
- Does not freeze the underlying computation

8 Stable Property Detection Problem
D: a distributed system
y: a predicate function defined on the set of global states of D
S, S': global states of D
y is a stable property of D if y(S) implies y(S') for all S' reachable from S.
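The definition of a stable property can be checked mechanically on a small, explicitly enumerated state graph. The sketch below is only an illustration: the function name `is_stable`, the dictionary `graph`, and the toy "work remaining" counter are assumptions, not part of the slides. It checks one-step successors, which is enough because closure under single steps implies closure under reachability.

```python
from typing import Callable, Dict, Hashable, List

def is_stable(y: Callable[[Hashable], bool],
              next_states: Dict[Hashable, List[Hashable]]) -> bool:
    """Return True if y(S) implies y(S') for every one-step successor S' of S.

    By induction on path length this also gives y(S') for every S'
    reachable from S, which is the definition of a stable property.
    """
    return all(y(t)
               for s, succs in next_states.items() if y(s)
               for t in succs)

# Hypothetical toy system: the global state is the amount of work left.
# Work only decreases, and state 0 ("terminated") can only stay at 0.
graph = {3: [2], 2: [1], 1: [0], 0: [0]}

terminated = lambda s: s == 0      # holds forever once it holds
two_left = lambda s: s == 2        # can become false again

stable_terminated = is_stable(terminated, graph)
stable_two_left = is_stable(two_left, graph)
```

Here `terminated` is stable while `two_left` is not, because the state 2 has a successor in which the predicate no longer holds.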

9
- Many distributed algorithms are structured as a sequence of phases.
- A phase: a transient part, then a stable part.
- Phase termination vs. computation termination.
- Our view of the problem:
  i. detect the termination of a phase
  ii. initiate a new phase
Notice that "the kth phase has terminated" is a stable property.

10 Model
- Distributed system D is a finite, labeled, directed graph.
  [Figure: processes p and q connected by channels C1 and C2]
- Channels have infinite buffers, are error-free, and preserve FIFO order.
- Message delay is bounded, but unknown.

11 State of a Channel
[Figure: processes p and q connected by channels C1 and C2, with messages 1, 2, 3 on C1]
- [1, 2, 3]: the sequence X of messages that were sent
- [1]: the sequence Y of received messages, an initial subsequence of X
- [2, 3]: the state of C1, X \ Y
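The channel-state definition translates directly into a few lines of code. This is a minimal sketch; the function name `channel_state` is an assumption made for illustration.

```python
from typing import List, Sequence

def channel_state(sent: Sequence, received: Sequence) -> List:
    """State of a FIFO channel: the sent sequence X minus its received prefix Y."""
    if list(sent[:len(received)]) != list(received):
        raise ValueError("received must be an initial subsequence of sent")
    return list(sent[len(received):])

# The example from the slide: X = [1, 2, 3], Y = [1].
in_transit = channel_state([1, 2, 3], [1])
```

With the slide's values, `in_transit` is `[2, 3]`, the messages still in the channel.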

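12
- An event e in process p is an atomic action: it can change the state of p and the state of at most one channel c incident on p (by sending or receiving a message M along c).
- e is defined by the tuple <p, s, s', M, c>: the process p, its state s before the event, its state s' after the event, the message M, and the channel c.
- e = <p, s, s', M, c> may occur in global state S if
  1. the state of p in S is s, and
  2. if c is directed towards p, then M is at the head of c's state in S.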

13 Process State and Global State
- A process: a set of states, an initial state, and a set of events.
- A global state S: a collection of process states and channel states. Initially, each process is in its initial state and all channels are empty.
- next(S, e) is the global state after event e is applied to global state S.

14 (Process State and Global State)
seq = (e_i : 0 ≤ i ≤ n) is a computation of the system iff, for every i, e_i may occur in S_i and S_{i+1} = next(S_i, e_i), where S_0 is the initial global state.
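The definitions of "may occur", next(S, e), and "computation" can be sketched as follows. Everything named here is an assumption made for illustration: the `Event` fields follow the tuple (process, state before, state after, message, channel) used in the Chandy and Lamport paper, the channel directions in `CHANNELS` are hypothetical, and the two-event example models a single token passed from p to q.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)
class Event:
    proc: str             # process in which the event occurs
    before: str           # state of proc immediately before the event
    after: str            # state of proc immediately after the event
    msg: Optional[str]    # message sent or received (None if no channel is touched)
    chan: Optional[str]   # channel whose state changes, or None

# Assumed topology: channel name -> (sender, receiver).
CHANNELS = {"C1": ("p", "q"), "C2": ("q", "p")}

GlobalState = Tuple[Dict[str, str], Dict[str, tuple]]

def may_occur(e: Event, S: GlobalState) -> bool:
    procs, chans = S
    if procs[e.proc] != e.before:
        return False
    if e.chan is not None and CHANNELS[e.chan][1] == e.proc:
        # Receive: the message must be at the head of the channel.
        return len(chans[e.chan]) > 0 and chans[e.chan][0] == e.msg
    return True

def next_state(S: GlobalState, e: Event) -> GlobalState:
    procs, chans = S
    procs = {**procs, e.proc: e.after}
    chans = dict(chans)
    if e.chan is not None:
        if CHANNELS[e.chan][0] == e.proc:
            chans[e.chan] = chans[e.chan] + (e.msg,)   # send: append at the tail
        else:
            chans[e.chan] = chans[e.chan][1:]          # receive: remove the head
    return procs, chans

def is_computation(seq: List[Event], S0: GlobalState) -> bool:
    S = S0
    for e in seq:
        if not may_occur(e, S):
            return False
        S = next_state(S, e)
    return True

S0 = ({"p": "s1", "q": "s0"}, {"C1": (), "C2": ()})
seq = [Event("p", "s1", "s0", "token", "C1"),   # p sends the token
       Event("q", "s0", "s1", "token", "C1")]   # q receives the token
forward_ok = is_computation(seq, S0)
reversed_ok = is_computation(list(reversed(seq)), S0)
```

The sequence is a computation in the given order, but not when reversed, because the receive cannot occur while C1 is still empty.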

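15 Example: System
[Figure: distributed system with processes p and q connected by channels C1 and C2; state-transition diagram with states s0 and s1 and transitions labeled send and receive; initial global state with process states s1 and s0 and empty channels]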

16 Global state transition diagram
[Figure: global states of the system with processes p, q and channels C1, C2 (process states s0, s1; channels empty); transitions labeled p sends, q receives, q sends, p receives]
A computation corresponds to a path in the diagram.
This system is deterministic.
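A global state transition diagram of this kind can be generated by exploring successors from the initial global state. The sketch below assumes a single-token system: a process in s1 holds the token and may send it, a process in s0 may receive it, C1 runs from p to q, and C2 runs from q to p. These details, and the names `successors` and `reachable`, are assumptions made for illustration.

```python
from typing import List, Set, Tuple

# Global state: (state of p, state of q, contents of C1, contents of C2).
State = Tuple[str, str, tuple, tuple]

def successors(S: State) -> List[Tuple[str, State]]:
    p, q, c1, c2 = S
    out = []
    if p == "s1":
        out.append(("p sends", ("s0", q, c1 + ("token",), c2)))
    if q == "s1":
        out.append(("q sends", (p, "s0", c1, c2 + ("token",))))
    if c1 and q == "s0":
        out.append(("q receives", (p, "s1", c1[1:], c2)))
    if c2 and p == "s0":
        out.append(("p receives", ("s1", q, c1, c2[1:])))
    return out

def reachable(S0: State) -> Set[State]:
    seen, frontier = {S0}, [S0]
    while frontier:
        S = frontier.pop()
        for _, T in successors(S):
            if T not in seen:
                seen.add(T)
                frontier.append(T)
    return seen

S0 = ("s1", "s0", (), ())
states = reachable(S0)
deterministic = all(len(successors(S)) == 1 for S in states)
```

Under these assumptions the exploration finds four global states, each with exactly one enabled event, so every computation follows the same cycle in the diagram.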

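17 Example: System
[Figure: distributed system with processes p and q connected by channels C1 and C2; state-transition diagrams for p and q over the states A, B and C, D, with transitions labeled send and receive]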

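18 Global state transition diagram
[Figure: global states of the system with processes p, q and channels C1, C2 (state labels: A D empty, B C, B D, A C); transitions labeled p sends, q sends, p receives, q receives]
This system is non-deterministic.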

19 Intuition for the algorithm
[Figure: processes p and q connected by channel C]
- Each process records its own state.
- p and q cooperate to record the state of C.

20 Example: System
[Figure: processes p and q connected by channel C; system state labels s1, s0, empty]
Recorded state: no token.

21 Example: System
[Figure: processes p and q connected by channel C1; system state labels s1, s0, empty]
Recorded state: two tokens.

22
[Figure: two timelines, each placing the events "C's state recorded", "p sends a message on C", and "p's state recorded" along a time axis]

23 (Intuition for the Algorithm)
[Figure: processes p and q connected by channel C]
- Who will record the state of channel C? q.
- How does q know when to stop recording? p sends a marker right after it records its state, and before sending any other message.
- q starts recording after it records its state.

24 The snapshot algorithm
Marker-Sending Rule for a process p:
  1. record the state of p;
  2. on each outgoing channel c, send a marker along c before sending any other message.
Marker-Receiving Rule for a process q, on receiving a marker along channel c:
  if q's state is not recorded:
    1. record q's state (following the Marker-Sending Rule);
    2. record c's state = empty;
  else c's state is the sequence of messages received along c after q's state was recorded and before receiving the marker.
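The two marker rules can be exercised in a small single-threaded simulation. This is a sketch under stated assumptions, not the presenter's implementation: the classes `Node` and `Network`, the integer "balance" used as process state, and the hand-scripted schedule of sends and deliveries are all hypothetical. Application messages are integers (amounts transferred), so the recorded snapshot should preserve the total.

```python
from collections import deque

MARKER = "MARKER"   # sentinel; application messages in this sketch are ints

class Node:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance    # application state
        self.recorded = None      # recorded process state, once taken
        self.chan_state = {}      # incoming channel -> recorded messages
        self.recording = set()    # incoming channels still being recorded

class Network:
    def __init__(self, balances):
        self.nodes = {n: Node(n, b) for n, b in balances.items()}
        # One FIFO channel per ordered pair of distinct processes.
        self.chan = {(a, b): deque()
                     for a in balances for b in balances if a != b}

    def send(self, src, dst, amount):
        self.nodes[src].balance -= amount
        self.chan[(src, dst)].append(amount)

    def record(self, name):
        """Marker-sending rule: record own state, then put a marker on
        every outgoing channel before any further message."""
        node = self.nodes[name]
        node.recorded = node.balance
        node.recording = {c for c in self.chan if c[1] == name}
        for c in self.chan:
            if c[0] == name:
                self.chan[c].append(MARKER)

    def deliver(self, src, dst):
        """Receive the head of channel (src, dst); apply the marker-receiving rule."""
        c = (src, dst)
        msg = self.chan[c].popleft()
        node = self.nodes[dst]
        if msg == MARKER:
            if node.recorded is None:
                self.record(dst)
                node.chan_state[c] = []        # channel recorded as empty
            node.chan_state.setdefault(c, [])  # else: keep what was collected
            node.recording.discard(c)          # stop recording this channel
        else:
            node.balance += msg
            if c in node.recording:
                node.chan_state.setdefault(c, []).append(msg)

net = Network({"p": 100, "q": 50})
net.send("p", "q", 10)    # 10 is in transit on (p, q)
net.record("p")           # p initiates the snapshot
net.send("q", "p", 5)     # q has not recorded yet
net.deliver("p", "q")     # q receives 10
net.deliver("p", "q")     # q receives the marker and records its state
net.deliver("q", "p")     # p receives 5 while recording channel (q, p)
net.deliver("q", "p")     # p receives the marker; channel (q, p) is done

recorded_total = (sum(n.recorded for n in net.nodes.values())
                  + sum(sum(msgs) for n in net.nodes.values()
                        for msgs in n.chan_state.values()))
```

In this schedule p records 90, q records 55, and the channel from q to p is recorded as `[5]`, so `recorded_total` equals the initial total of 150 even though the processes recorded at different moments.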

25 Termination of the Algorithm
Assumption: no marker remains forever in an input channel.
Claim: if the graph is strongly connected and at least one process records its state, then all processes will record their state in finite time.
Proof: by induction.
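The induction behind the termination claim follows the spread of markers along channels: a process that has recorded sends markers to its out-neighbors, each of which records on the first marker it receives. The sketch below traces that spread as a breadth-first search. The function name `recording_rounds` and the example graphs are assumptions, and the hop counts are not real times, since actual message delays are arbitrary.

```python
from collections import deque
from typing import Dict, List

def recording_rounds(out_edges: Dict[str, List[str]], initiator: str) -> Dict[str, int]:
    """Map each process reached by markers to its hop distance from the initiator."""
    rounds = {initiator: 0}
    queue = deque([initiator])
    while queue:
        u = queue.popleft()
        for v in out_edges[u]:        # u sends a marker on every outgoing channel
            if v not in rounds:       # first marker at v: v records its state
                rounds[v] = rounds[u] + 1
                queue.append(v)
    return rounds

# Strongly connected ring: every process is reached from any initiator.
ring = {"p": ["q"], "q": ["r"], "r": ["p"]}
ring_rounds = recording_rounds(ring, "q")

# Not strongly connected: q has no outgoing channel, so p never gets a marker.
chain = {"p": ["q"], "q": []}
chain_rounds = recording_rounds(chain, "q")
```

On the ring every process records; on the chain only the initiator does, which shows why the claim assumes strong connectivity.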

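26 The Recorded Global State
Example: System
[Figure: distributed system with processes p and q connected by channels C1 and C2; state-transition diagrams for p and q over the states A, B and C, D, with transitions labeled send and receive]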

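27
[Figure: global state transition diagram of the system with processes p, q and channels C1, C2 (state labels: A D empty, B C, B D, A C; transitions labeled p sends, q sends, p receives), shown together with the state A D empty]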

28 What did we get?

29 (The Recorded Global State)
seq = (e_i : i ≥ 0): a distributed computation
S_i: the state of the system right before e_i occurs
S_0: the initial state of the system
S_t: the state of the system at the termination of the algorithm
S*: the recorded global state

31 Definition
- Event e_j is called prerecording iff e_j is in a process p and p records its state after e_j in seq.
- Event e_j is called postrecording iff e_j is in a process p and p records its state before e_j in seq.
Assume that e_{j-1} is a postrecording event immediately before a prerecording event e_j in seq.

32 Claim: The sequence obtained by interchanging e_{j-1} and e_j is a computation.
Proof:
- e_{j-1} occurs in a process p and e_j in a process q other than p.
- There cannot be a message sent at e_{j-1} and received at e_j: p sent its marker before e_{j-1}, so by FIFO q would receive the marker first, and e_j would be postrecording.
- Hence, event e_j can occur in global state S_{j-1}.
- The state of process p is not altered by e_j, hence e_{j-1} can occur after e_j.

33 Proof
Swap adjacent events until all postrecording events appear after all prerecording events. The resulting computation is seq'.
All that is left to show: S* is the global state in seq' after all prerecording events and before all postrecording events.
1. Process states
2. Channel states
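The reordering step of the proof can be sketched as repeated adjacent swaps. The code below only rearranges labels; it does not check that each swap yields a valid computation, which is what the preceding claim establishes. The function name `permute` and the event names are assumptions made for illustration.

```python
from typing import List, Tuple

def permute(seq: List[Tuple[str, str]]) -> Tuple[List[Tuple[str, str]], int]:
    """Swap each adjacent (postrecording, prerecording) pair until none remains.

    Each event is a pair (name, kind) with kind "pre" or "post".
    Returns the reordered sequence and the number of swaps performed.
    """
    seq = list(seq)
    swaps = 0
    changed = True
    while changed:
        changed = False
        for j in range(1, len(seq)):
            if seq[j - 1][1] == "post" and seq[j][1] == "pre":
                seq[j - 1], seq[j] = seq[j], seq[j - 1]
                swaps += 1
                changed = True
    return seq, swaps

original = [("e0", "pre"), ("e1", "post"), ("e2", "pre"),
            ("e3", "post"), ("e4", "pre")]
reordered, swap_count = permute(original)
order = [name for name, _ in reordered]
```

Only a postrecording event followed directly by a prerecording event is ever swapped, so the relative order of events within each class is preserved, and the recorded state S* sits at the boundary between the two groups.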

34 Claim: The state of a channel in S* is
(sequence of messages corresponding to prerecording sends) - (sequence of messages corresponding to prerecording receives)
Proof: The state of channel c from process p to process q recorded in S* is the sequence of messages received on c by q after q records its state and before q receives a marker on c. The sequence of messages sent by p before the marker is the sequence corresponding to prerecording sends on c.

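35
[Figure: global state transition diagram of the system with processes p, q and channels C1, C2 (state labels: A D empty, B C, D, A C, B; transitions labeled p sends, q sends, p receives), with events marked post, pre, post and the state A D empty highlighted]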

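36 (Another execution)
[Figure: global state transition diagram of the system with processes p, q and channels C1, C2 (state labels: A D empty, A D, D, A C, B; transitions labeled q sends, p sends, p receives), with events marked pre, post and the state A D empty highlighted]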

37 What did we get? A configuration that could have happened

38 Stability Detection Algorithm
Input: a stable property y
Output: a boolean value definite with the property: (y(S_0) → definite) and (definite → y(S_t))
Algorithm:
begin
  record a global state S*
  definite := y(S*)
end
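The stability detection algorithm is a thin wrapper around the snapshot. The sketch below assumes a caller-supplied function that returns a recorded global state; `detect`, `terminated`, and the two hard-coded recorded states are hypothetical names and data used only to illustrate the idea, with termination as the stable property.

```python
from typing import Callable, Dict, Tuple

GlobalState = Tuple[Dict[str, str], Dict[str, tuple]]

def detect(y: Callable[[GlobalState], bool],
           record_global_state: Callable[[], GlobalState]) -> bool:
    """Record a global state S* and return definite := y(S*)."""
    s_star = record_global_state()
    return y(s_star)

def terminated(S: GlobalState) -> bool:
    """Example stable property: all processes idle and all channels empty."""
    procs, chans = S
    return (all(state == "idle" for state in procs.values())
            and all(len(msgs) == 0 for msgs in chans.values()))

# Two hypothetical recorded states, standing in for the snapshot algorithm.
busy_snapshot = ({"p": "idle", "q": "idle"}, {"C1": ("work",), "C2": ()})
idle_snapshot = ({"p": "idle", "q": "idle"}, {"C1": (), "C2": ()})

definite_busy = detect(terminated, lambda: busy_snapshot)
definite_idle = detect(terminated, lambda: idle_snapshot)
```

A message still in a channel makes the recorded state non-terminated even when both processes are idle, which is why channel states must be part of the snapshot.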

39 (Stability Detection Algorithm) Correctness
1. S* is reachable from S_0
2. S_t is reachable from S*
3. y(S) → y(S') for all S' reachable from S
S_0 → S* → S_t
y(S*) = true ⇒ y(S_t) = true
y(S*) = false ⇒ y(S_0) = false

40 References K. M. Chandy and L. Lamport, Distributed Snapshots: Determining Global States of Distributed Systems

