1
Distributed Computing 5. Snapshot Shmuel Zaks zaks@cs.technion.ac.il ©
5
Motivation: Many problems in distributed systems can be stated in terms of detecting global states: stable property detection problems (termination detection, deadlock detection, etc.) and checkpointing. (snapshot)
6
A Process Can: send and receive messages; record its own state; record the messages it sends and receives; cooperate with other processes. Processes do not share clocks or memory, so they cannot record their states at precisely the same instant.
7
The Global-State-Detection Algorithm: conducts state recording so that the recorded set of states forms a global system state. It does not change the underlying computation and does not freeze it.
8
Stable Property Detection Problem: D is a distributed system; y is a predicate function defined on the set of global states of D; S and S' are global states of D. y is a stable property of D if y(S) implies y(S') for all S' reachable from S.
9
Many distributed algorithms are structured as a sequence of phases. A phase consists of a transient part followed by a stable part. Phase termination is distinct from computation termination. Our view of the problem: (i) detect the termination of a phase; (ii) initiate a new phase. Notice that "the kth phase has terminated" is a stable property.
10
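Model: Distributed system D is a finite, labeled, directed graph whose vertices are processes and whose edges are channels. [Figure: processes p and q connected by channels C1 and C2.] Channels have infinite buffers, are error-free, and deliver messages in FIFO order. Message delay is finite but unknown.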
11
State of a Channel: [Figure: p has sent messages 1, 2, 3 along channel C1 and q has received message 1.] X = [1, 2, 3] is the sequence of messages that were sent. Y = [1] is the sequence of received messages, an initial subsequence of X. The state of C1 is X \ Y = [2, 3].
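The channel-state definition can be checked with a few lines of Python. This is a minimal sketch, assuming messages are plain list items; the function name `channel_state` is hypothetical and not part of the slides.

```python
def channel_state(sent, received):
    """Return X \\ Y: the sent sequence X with the received prefix Y removed."""
    sent, received = list(sent), list(received)
    # FIFO delivery means the received sequence must be an initial subsequence of X.
    if sent[:len(received)] != received:
        raise ValueError("received is not an initial subsequence of sent")
    return sent[len(received):]


# The slide's example: X = [1, 2, 3], Y = [1].
print(channel_state([1, 2, 3], [1]))  # [2, 3]
```

The prefix check encodes the FIFO assumption of the model: without it, "X \ Y" would not be well defined as a sequence.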
12
Event: An event e in process p is an atomic action. It can change the state of p and the state of at most one channel c incident on p (by sending or receiving a message M along c). e is defined by the 5-tuple e = <p, s, s', M, c>, where s and s' are the states of p before and after e. e may occur in global state S if: 1. the state of p in S is s; 2. if c is directed towards p, then M is at the head of c's state in S.
13
Process State and Global State: A process is defined by a set of states, an initial state, and a set of events. A global state S is a collection of process states and channel states. Initially, each process is in its initial state and all channels are empty. next(S, e) is the global state after event e is applied to global state S.
14
(Process State and Global State) $seq = (e_i : i = 0 \ldots n)$ is a computation of the system iff, for every $i$, $e_i$ may occur in $S_i$ and $S_{i+1} = next(S_i, e_i)$, where $S_0$ is the initial global state.
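The definitions of "may occur", next, and computation can be made concrete with a small model. The following is a sketch under stated assumptions: the record fields, the `kind` tag ("send" or "recv"), the channel name `C1` directed from p to q, and the function names are all illustrative choices, and `next_state` stands in for next to avoid shadowing the Python builtin.

```python
from collections import namedtuple

# An event of process p: state s -> s_next, sending or receiving M on channel c.
Event = namedtuple("Event", "p s s_next kind M c")


def may_occur(S, e):
    """S is a pair (process states, channel contents as tuples)."""
    procs, chans = S
    if procs[e.p] != e.s:
        return False
    if e.kind == "recv":
        # The received message must be at the head of the channel.
        return len(chans[e.c]) > 0 and chans[e.c][0] == e.M
    return True


def next_state(S, e):
    """Return the global state after applying e to S (S itself is not modified)."""
    procs, chans = S
    procs = {**procs, e.p: e.s_next}
    chans = dict(chans)
    if e.kind == "send":
        chans[e.c] = chans[e.c] + (e.M,)
    elif e.kind == "recv":
        chans[e.c] = chans[e.c][1:]
    return procs, chans


def is_computation(S0, seq):
    """Check that each event may occur in the state produced by its predecessors."""
    S = S0
    for e in seq:
        if not may_occur(S, e):
            return False
        S = next_state(S, e)
    return True


# Token-passing example: p starts in s1, q in s0, both channels empty.
S0 = ({"p": "s1", "q": "s0"}, {"C1": (), "C2": ()})
seq = [
    Event("p", "s1", "s0", "send", "token", "C1"),
    Event("q", "s0", "s1", "recv", "token", "C1"),
]
print(is_computation(S0, seq))                  # True
print(is_computation(S0, list(reversed(seq))))  # False
```

Reversing the two events fails because q cannot receive a token that is not yet at the head of the channel.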
15
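Example: System. Distributed system: processes p and q connected by channels C1 and C2. State transitions: each process has states s0 and s1 and moves between them by send and receive events. Initial global state: p in s1, q in s0, both channels empty.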
16
Global state transition diagram: [Figure: global states of p, q, C1 and C2 linked by the transitions p sends, q receives, q sends, p receives.] A computation corresponds to a path in the diagram. This system is deterministic.
17
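Example: System. Distributed system: processes p and q connected by channels C1 and C2. State transitions: p has states A and B, q has states C and D; each process moves between its two states by send and receive events.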
18
Global state transition diagram: [Figure: global states built from the process states A, B, C, D and the channel contents, linked by the transitions p sends, q sends, p receives, q receives.] This system is non-deterministic.
19
Intuition for the algorithm: Each process records its own state. For a channel C from p to q, p and q cooperate to record the state of C.
20
Example: System. [Figure: a recorded state of p, C and q in which no token appears.]
21
Example: System. [Figure: a recorded state of p, C1 and q in which two tokens appear.]
22
[Figure: two timelines, each ordering three events: C's state is recorded, P sends a message on C, and P's state is recorded.]
23
(Intuition for the Algorithm) Who records the state of channel C from p to q? q does. How does q know when to stop recording? p sends a marker right after it records its state and before it sends any other message. q starts recording C after it records its own state and stops when the marker arrives on C.
24
The snapshot algorithm
Marker-Sending Rule for a process p:
1. record the state of p;
2. on each outgoing channel c, send a marker along c before sending any other message along c.
Marker-Receiving Rule for a process q, on receiving a marker along channel c:
if q's state is not recorded: 1. record q's state (following the Marker-Sending Rule); 2. record c's state as empty;
else: record c's state as the sequence of messages received along c after q's state was recorded and before the marker was received.
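The two marker rules can be exercised in a small single-threaded simulation. This is a sketch, not a definitive implementation: it assumes each process state is an integer balance, each message is an amount moved between processes, and FIFO channels are `deque` objects. The class, method, and constant names are hypothetical.

```python
from collections import deque

MARKER = "MARKER"  # hypothetical sentinel standing for the marker message


class Process:
    def __init__(self, name, state):
        self.name, self.state = name, state
        self.out = {}         # destination name -> outgoing FIFO channel
        self.inc = {}         # source name -> incoming FIFO channel
        self.recorded = None  # recorded local state (None until recorded)
        self.chan = {}        # source name -> recorded state of that channel
        self.open = set()     # incoming channels still being recorded

    def record(self):
        """Marker-sending rule: record own state, then send markers."""
        self.recorded = self.state
        self.chan = {src: [] for src in self.inc}
        self.open = set(self.inc)
        for ch in self.out.values():
            ch.append(MARKER)

    def send(self, dst, amount):
        self.state -= amount
        self.out[dst].append(amount)

    def receive(self, src):
        msg = self.inc[src].popleft()
        if msg == MARKER:
            # Marker-receiving rule.
            if self.recorded is None:
                self.record()       # channel from src is recorded as empty
            self.open.discard(src)  # stop recording this channel
        else:
            self.state += msg
            if self.recorded is not None and src in self.open:
                self.chan[src].append(msg)  # message was in transit


def connect(a, b):
    """Create a FIFO channel from a to b."""
    ch = deque()
    a.out[b.name] = ch
    b.inc[a.name] = ch


p, q = Process("p", 10), Process("q", 5)
connect(p, q)
connect(q, p)

p.send("q", 3)    # p = 7, message 3 in transit
p.record()        # p records 7 and sends a marker after the 3
q.send("p", 2)    # q = 3, message 2 in transit
q.receive("p")    # q gets 3, q = 6
q.receive("p")    # q gets the marker: records 6, channel p->q empty
p.receive("q")    # p gets 2 while recording: channel q->p = [2]
p.receive("q")    # p gets the marker: recording of q->p stops

total = p.recorded + q.recorded + sum(p.chan["q"]) + sum(q.chan["p"])
print(p.recorded, q.recorded, p.chan, q.chan, total)
```

The recorded total equals the initial total of 15, which is the kind of consistency the marker rules are meant to guarantee: no amount is lost or counted twice.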
25
Termination of the Algorithm. Assumption: no marker remains forever in an input channel. Claim: if the graph is strongly connected and at least one process records its state, then all processes record their states in finite time. Proof: by induction.
26
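The Recorded Global State. Ex: System. State transitions: p has states A and B, q has states C and D; each process moves between its two states by send and receive events. [Figure: p and q connected by channels C1 and C2.]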
27
[Figure: global state transition diagram of the example, with the transitions p sends, q sends, p receives, alongside the recorded state (A, D, empty).]
28
What did we get?
29
(The Recorded Global State) $seq = (e_i : i \ge 0)$ is a distributed computation. $S_i$ is the state of the system right before $e_i$ occurs. $S_0$ is the initial state of the system. $S_t$ is the state of the system at the termination of the algorithm. $S^*$ is the recorded global state.
31
Definition: Event $e_j$ is called prerecording iff $e_j$ is in a process p and p records its state after $e_j$ in seq. Event $e_j$ is called postrecording iff $e_j$ is in a process p and p records its state before $e_j$ in seq. Assume that $e_{j-1}$ is a postrecording event that immediately precedes a prerecording event $e_j$ in seq.
32
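Claim: The sequence obtained by interchanging $e_{j-1}$ and $e_j$ is a computation. Proof: $e_{j-1}$ occurs in p and $e_j$ in q (other than p). There cannot be a message sent at $e_{j-1}$ and received at $e_j$: p sent a marker on that channel before $e_{j-1}$, so by FIFO q would have received the marker and recorded its state before $e_j$, contradicting that $e_j$ is prerecording. Hence event $e_j$ can occur in global state $S_{j-1}$. The state of process p is not altered by $e_j$, hence $e_{j-1}$ can occur after $e_j$.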
33
Proof: Swap adjacent events until all postrecording events appear after all prerecording events. The resulting computation is seq'. All that is left to show: $S^*$ is the global state of seq' after all prerecording events and before all postrecording events. This must be checked for: 1. process states; 2. channel states.
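The swapping argument can be sketched as a reordering that only ever exchanges a postrecording event with the prerecording event immediately after it. This is an illustration under assumptions: events are hypothetical (process, label, kind) tuples with the kind already known, and the code does not re-check that each intermediate sequence is a computation, which is what the preceding claim establishes.

```python
def reorder(seq, is_pre):
    """Move prerecording events ahead of postrecording ones by adjacent swaps."""
    seq = list(seq)
    swapped = True
    while swapped:
        swapped = False
        for j in range(1, len(seq)):
            # Swap only a postrecording event directly followed by a prerecording one.
            if not is_pre(seq[j - 1]) and is_pre(seq[j]):
                seq[j - 1], seq[j] = seq[j], seq[j - 1]
                swapped = True
    return seq


seq = [
    ("p", "e0", "pre"),
    ("p", "e1", "post"),
    ("q", "e2", "pre"),
    ("p", "e3", "post"),
    ("q", "e4", "pre"),
]
seq_prime = reorder(seq, lambda e: e[2] == "pre")
print([e[1] for e in seq_prime])  # ['e0', 'e2', 'e4', 'e1', 'e3']
```

Because only post/pre pairs are exchanged, events of the same kind keep their relative order, so the per-process order of events is preserved.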
34
Claim: The state of a channel c in $S^*$ is (the sequence of messages corresponding to prerecording sends on c) excluding (the sequence of messages corresponding to prerecording receives on c). Proof: The state of channel c from process p to process q recorded in $S^*$ is the sequence of messages received on c by q after q records its state and before q receives a marker on c. The sequence of messages sent by p before it sends the marker along c is the sequence corresponding to prerecording sends on c.
35
[Figure: the example's global state transition diagram with the transitions p sends, q sends, p receives, events labelled post, pre, post, and the recorded state (A, D, empty).]
36
(Another execution) [Figure: the same diagram for the execution q sends, p sends, p receives, with events labelled pre and post, and the recorded state (A, D, empty).]
37
What did we get? A configuration that could have happened
38
Stability Detection Algorithm. Input: a stable property y. Output: a boolean value definite with the property: $y(S_0) \Rightarrow definite$ and $definite \Rightarrow y(S_t)$. Algorithm: begin record a global state $S^*$; definite := $y(S^*)$ end
39
(Stability Detection Algorithm) Correctness: 1. $S^*$ is reachable from $S_0$. 2. $S_t$ is reachable from $S^*$. 3. $y(S) \Rightarrow y(S')$ for all $S'$ reachable from $S$. Hence $y(S^*) = true$ implies $y(S_t) = true$, and $y(S^*) = false$ implies $y(S_0) = false$. [Figure: $S_0 \to S^* \to S_t$.]
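The stability detection algorithm is short enough to sketch directly. The snippet below assumes a recorded global state is a dictionary of process states and channel contents, and uses "every process is idle and every channel is empty" as an example predicate; that predicate is stable only under the further assumption that an idle process becomes active solely by receiving a message. All names are hypothetical.

```python
def stability_detection(record_global_state, y):
    """Return definite := y(S*), where S* is a recorded global state."""
    s_star = record_global_state()
    return y(s_star)


def terminated(S):
    """Example property: all processes idle and all channels empty."""
    return (all(state == "idle" for state in S["processes"].values())
            and all(len(msgs) == 0 for msgs in S["channels"].values()))


quiet = {"processes": {"p": "idle", "q": "idle"},
         "channels": {("p", "q"): [], ("q", "p"): []}}
busy = {"processes": {"p": "idle", "q": "idle"},
        "channels": {("p", "q"): ["work"], ("q", "p"): []}}

print(stability_detection(lambda: quiet, terminated))  # True
print(stability_detection(lambda: busy, terminated))   # False
```

In a real system `record_global_state` would run the snapshot algorithm; here it is a stand-in that returns a prepared state. A `True` answer says the property holds at termination of the algorithm, and a `False` answer says it did not hold initially.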
40
References K. M. Chandy and L. Lamport, Distributed Snapshots: Determining Global States of Distributed Systems