
SNAPSHOT ALGORITHM

WHAT IS A SNAPSHOT - INTUITION
Given a system of processors and communication channels between them, we want each processor to have a “picture” of the global system state.
Each processor, however, can only take a “small picture” of the global system (only itself).
But if we put together all the “small pictures”, we would have a complete description of the global state of the system.
The “big picture” we put together must be meaningful and informative to be called a snapshot of the system.

SNAPSHOT - WHY DO WE WANT IT
Stability detection.
A stable system: if the system holds a certain property in a given state, and all the possible next states of the system hold that property too, then we can call the system stable.
Examples of stability:
  Deadlock
  No tokens in a token ring
  Computation has terminated

THE DISTRIBUTED SYSTEM MODEL
Representation: a directed graph.
  Vertices represent the processors.
  Edges represent the communication channels.
Assumptions:
  No synchronization (no clocks).
  Channels have infinite buffers.
  Channels are error-free.
  Channels deliver messages in the order sent (FIFO).
  A message in a channel can be delayed for an arbitrary but finite time (all messages eventually arrive at their destination).

THE DISTRIBUTED SYSTEM MODEL - DEFINITIONS
State of a channel: the sequence of messages sent along the channel, excluding the messages received along the channel.
State of a processor: a single element of some finite set.
Example, for a channel c out of processor p (newest message on the left):
  No messages sent: the state of c is empty.
  p sent M_1: the state of c is M_1.
  p sent M_2: the state of c is M_2 M_1.
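
The channel definition above can be sketched as a FIFO queue. This is only an illustration: the class name `Channel` and its methods are hypothetical, and the state is listed head first (oldest message first), which is the reverse of the left-to-right order drawn on the slide.

```python
from collections import deque


class Channel:
    """Hypothetical FIFO channel: its state is the messages sent but not yet received."""

    def __init__(self):
        self._queue = deque()

    def send(self, message):
        # A sent message is added at the tail of the channel.
        self._queue.append(message)

    def receive(self):
        # A received message is removed from the head of the channel.
        return self._queue.popleft()

    def state(self):
        # Head first: the oldest undelivered message comes first.
        return list(self._queue)


c = Channel()
empty_state = c.state()      # no messages sent
c.send("M1")
after_m1 = c.state()
c.send("M2")
after_m2 = c.state()
```

Receiving from `c` at this point would return `"M1"`, since FIFO delivery removes the oldest message first.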

THE DISTRIBUTED SYSTEM MODEL - DEFINITIONS CONT'D
Event: an event e is the tuple (p, s, s', c, M), where:
  p is the processor in which the event occurs
  s is the state of p before the event
  s' is the state of p after the event
  c is the channel whose state is changed by the event (can be null)
  M is the message sent or received by p through the channel c (can be null)
Less formally: an event is an atomic action of a processor p that may change the state of p and the state of at most one channel connected to p.

EXAMPLE - THE SINGLE TOKEN CONSERVATION SYSTEM
The system properties: two processors, two communication channels, one token.
Processor states: s_0 (no token), s_1 (has token).
Initial state of p: s_1. Initial state of q: s_0. Initial state of the channels: empty.
Events in the system: e_1, e_2, and so on.
[Figure: processor state diagram with states s_0 and s_1 and events e_1, e_2]

THE DISTRIBUTED SYSTEM MODEL - DEFINITIONS CONT'D
Global state: the set of the processor states and the channel states.
Initial global state: a global state in which each processor is in its initial state and each channel is empty.
next(S, e): a function whose value is the global state immediately after the occurrence of the event e in the global state S. next(S, e) is defined only if event e can occur in the global state S.
For a global state S and an event e = (p, s, s', c, M), if next(S, e) = S' then:
  the state of p in S' is s'
  the state of the channel c in S' is its state in S with the message M added to its tail (M was sent) or removed from its head (M was received)

EXAMPLE - THE SINGLE TOKEN CONSERVATION SYSTEM
The possible global states of the single token conservation system are S_0, S_1, S_2 and S_3. The events e_0, e_1, e_2 and e_3 connect them:
  next(S_0, e_0) = S_1
  next(S_1, e_1) = S_2
  next(S_2, e_2) = S_3
  next(S_3, e_3) = S_0
[Figure: the four global states S_0 to S_3, each showing the states of p and q]

THE DISTRIBUTED SYSTEM MODEL - DEFINITIONS CONT'D
Computation of the system: a sequence of events in the system.
More formally: given a sequence of events seq = (e_0, e_1, ..., e_i, ..., e_n), seq is a computation of the system iff, for every i, event e_i can occur in state S_i and next(S_i, e_i) = S_{i+1} (S_0 is the initial global state).
In the previous example, the computation of the system was (e_0, e_1, e_2, e_3), but the sequence (e_0, e_2) cannot be a computation.
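
The definitions of next(S, e) and of a computation can be tried out on the single token conservation system. The following is a minimal sketch under stated assumptions: the names `next_state`, `is_computation` and `CHANNELS` are hypothetical, and the four event tuples are an assumed reading of the example (p sends the token along c, q receives it, q sends it back along c', p receives it), not tuples taken from the slides.

```python
# Assumed topology: channel name -> (source processor, destination processor).
CHANNELS = {"c": ("p", "q"), "c'": ("q", "p")}


def next_state(S, e):
    """Return the global state after event e occurs in S, or None if e cannot occur in S."""
    proc, s, s_after, c, M = e
    states, queues = S
    if states[proc] != s:
        return None                          # proc is not in state s, so e cannot occur
    states = {**states, proc: s_after}
    if c is not None:
        src, dst = CHANNELS[c]
        queue = queues[c]
        if proc == src:
            queue = queue + (M,)             # send: M is added to the tail of c
        elif proc == dst and queue[:1] == (M,):
            queue = queue[1:]                # receive: M is removed from the head of c
        else:
            return None                      # M is not at the head of c
        queues = {**queues, c: queue}
    return (states, queues)


def is_computation(S0, seq):
    """True iff every event of seq can occur, in order, starting from S0."""
    S = S0
    for e in seq:
        S = next_state(S, e)
        if S is None:
            return False
    return True


S0 = ({"p": "s1", "q": "s0"}, {"c": (), "c'": ()})
e0 = ("p", "s1", "s0", "c", "token")     # assumed: p sends the token along c
e1 = ("q", "s0", "s1", "c", "token")     # assumed: q receives the token from c
e2 = ("q", "s1", "s0", "c'", "token")    # assumed: q sends the token along c'
e3 = ("p", "s0", "s1", "c'", "token")    # assumed: p receives the token from c'

valid = is_computation(S0, [e0, e1, e2, e3])
invalid = is_computation(S0, [e0, e2])
```

Under these assumptions `valid` is true and `invalid` is false: after e0 the token is still in c, so q is in s0 and cannot yet send the token.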

THE ALGORITHM REQUIREMENTS
The snapshot algorithm must run concurrently with the system computation.
The snapshot algorithm cannot alter the computation in any way.
Any messages sent for recording purposes must not interfere with the computation of the system.

SNAPSHOT ALGORITHM - FIRST IDEA
Each processor adds its state to the recorded snapshot at some point of the computation. (Let's assume we can also see the channel states and record them in the same fashion.)
What can happen?

FIRST IDEA - THE PROBLEM
Let's take a look at the single token conservation system:
The system is in global state S_0 (“token in p”). p decides to record itself.
The system moves to global state S_1 (“token in c”). c, c', q decide to record themselves.
The snapshot received is S*: p is recorded with the token (s_1) and c is recorded with the token in it, so the snapshot shows two tokens.
There is no such global state reachable from S_0!

FIRST IDEA - THE PROBLEM CONT'D
What happened?
  p was recorded before it sent a message.
  c was recorded after p sent a message.
  The snapshot had too many messages in it.
Let us denote:
  n = number of messages in the channel right before its source was recorded
  n' = number of messages in the channel right before the channel was recorded
In our case: n = 0, n' = 1.
Can we conclude that if n < n' the snapshot is inconsistent?

FIRST IDEA - THE PROBLEM CONT'D
Let's take a look again at the single token conservation system:
The system is in global state S_0 (“token in p”). c decides to record itself.
The system moves to global state S_1 (“token in c”). p, c', q decide to record themselves.
The snapshot received is S*: p and q are both recorded without the token (s_0) and c is recorded empty, so the snapshot shows no token at all.
There is no such global state reachable from S_0!

FIRST IDEA - THE PROBLEM CONT'D
What happened?
  c was recorded before p sent a message.
  p was recorded after it sent a message.
  We lost messages in the snapshot.
Remember the notation:
  n = number of messages in the channel right before its source was recorded
  n' = number of messages in the channel right before the channel was recorded
In our case: n = 1, n' = 0.
Can we conclude that if n > n' the snapshot is inconsistent?

FIRST IDEA - CONCLUSIONS
The problem in both cases was that we didn't have a means to monitor the messages that went through the channel while the recording was done.
We need the algorithm to ensure that the snapshot we take reflects the messages passing through the channel.

THE SNAPSHOT ALGORITHM CONDITIONS
Notation: for two processors p, q and a channel c from p to q:
  n = number of messages sent through c before p was recorded
  n' = number of messages sent through c before c was recorded
  m = number of messages received from c before q was recorded
  m' = number of messages received from c before c was recorded
The following conditions are required from the snapshot:
  n = n' and m = m'
  n' ≥ m' and n ≥ m
  if n' = m', the recorded state of c must be the empty sequence
  if n' > m', the recorded state of c must contain the messages (m'+1), ..., (n'), with the (m'+1)-th message at the head and the n'-th message at the tail
[Figure: messages M_1 to M_6 sent by p along c, with m' and n' marking the (m'+1)-th and the n'-th message]

THE SNAPSHOT ALGORITHM CONDITIONS CONT'D
In a less formal way: the recorded state of c must be the sequence of messages sent along c before the state of p is recorded, excluding the sequence of messages received along c before the state of q is recorded.
[Figure: messages M_1 to M_6 on c, marking where p is recorded, where q is recorded, and the recording of c]
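
The condition on the recorded channel state amounts to taking a slice of the messages sent along c. A small sketch, where the function name `recorded_channel_state` and the values n = 5, m = 2 are chosen only for illustration (the slide figure does not state them):

```python
def recorded_channel_state(sent, n, m):
    """Messages (m+1)..n of `sent`, head first; the empty list when n == m.

    sent: all messages p sent along c, oldest first
    n:    number of messages sent along c before p was recorded
    m:    number of messages received from c before q was recorded
    """
    if not 0 <= m <= n <= len(sent):
        raise ValueError("need 0 <= m <= n <= number of messages sent")
    return sent[m:n]


sent = ["M1", "M2", "M3", "M4", "M5", "M6"]
in_transit = recorded_channel_state(sent, n=5, m=2)   # hypothetical values of n and m
nothing = recorded_channel_state(sent, n=3, m=3)      # n == m: empty sequence
```

With these assumed values the recorded state of c is M3, M4, M5: the messages sent before p was recorded, excluding those q had already received.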

THE ALGORITHM OUTLINE
p sends a special message called a marker after the n-th message it sent (and before sending any other message).
q records the state of channel c. The recorded state is the sequence of messages received by q after q recorded its state and before q received the marker.
q records its state spontaneously, or immediately after the marker is received, that is, before receiving (or sending) any other messages.

THE ALGORITHM CREATORS
K. Mani Chandy, Leslie Lamport, E. W. Dijkstra

THE ALGORITHM
Marker-Sending Rule for a processor p:
  For each channel c directed away from p, p sends one marker along c right after p records its state and before p sends further messages along c.
Marker-Receiving Rule for a processor q:
  On receiving a marker along a channel c:
    if q has not recorded its state then
      q records its state
      q records the state of c as the empty sequence
    else
      q records the state of c as the sequence of messages received along c after q's state was recorded and before q received the marker along c.
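
The two rules can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the `Processor` class, the `MARKER` sentinel and the driver steps are all hypothetical. The driver replays the two-processor system with states {A, B} and {C, D} that appears later in the slides, under the assumption that q sends N along c' and moves to D.

```python
from collections import deque

MARKER = object()   # hypothetical sentinel standing for the marker message


class Processor:
    def __init__(self, state, incoming, outgoing):
        self.state = state
        self.incoming = incoming       # names of channels directed towards this processor
        self.outgoing = outgoing       # names of channels directed away from it
        self.recorded_state = None     # None until the processor records itself
        self.channel_record = {}       # incoming channel name -> recorded messages
        self.recording = set()         # incoming channels still being recorded

    def record(self, channels):
        """Record own state, then apply the marker-sending rule."""
        self.recorded_state = self.state
        self.recording = set(self.incoming)
        self.channel_record = {c: [] for c in self.incoming}
        for c in self.outgoing:
            channels[c].append(MARKER)     # one marker per outgoing channel

    def receive(self, c, channels):
        """Take the head message of channel c, applying the marker-receiving rule."""
        msg = channels[c].popleft()
        if msg is MARKER:
            if self.recorded_state is None:
                self.record(channels)      # first marker: record state, c stays empty
            self.recording.discard(c)      # the recording of c is now complete
            return None
        if c in self.recording:
            self.channel_record[c].append(msg)   # received after recording, before marker
        return msg


channels = {"c": deque(), "c'": deque()}
p = Processor("A", incoming=["c'"], outgoing=["c"])
q = Processor("C", incoming=["c"], outgoing=["c'"])

p.record(channels)                              # p records A and sends a marker along c
channels["c"].append("M"); p.state = "B"        # p sends M
channels["c'"].append("N"); q.state = "D"       # assumed: q sends N
p.receive("c'", channels); p.state = "A"        # p receives N while recording c'
q.receive("c", channels)                        # q receives the marker: records D, c empty
p.receive("c'", channels)                       # p receives q's marker: c' is done

snapshot = {"p": p.recorded_state, "c": q.channel_record["c"],
            "q": q.recorded_state, "c'": p.channel_record["c'"]}
```

In this run the snapshot comes out as p = A, c = empty, q = D, c' = N, while the message M is still in transit on c and is not part of the snapshot.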

THE ALGORITHM - RUNNING EXAMPLE
Processor states: s_0 (no token), s_1 (has token).
p sends the token, then records itself.
p sends a marker.
q receives the token, and then receives the marker. q records itself and the incoming channel c.
q sends a marker.
p receives the marker. It has already recorded itself, so it only needs to record the state of its incoming channel c'.
The snapshot: p = s_0, c = empty, q = s_1, c' = empty.

SOME NOTES ABOUT THE ALGORITHM
The algorithm can be initiated by one or more processors. Each initiating processor records its state spontaneously (without receiving markers from other processors).
The collection of the snapshot “pieces” from each processor is a topic for a separate discussion. But if we recall the synchronization algorithm for asynchronous systems (with some variations), we can come up with ways to form the “big picture” for each processor.

TERMINATION OF THE ALGORITHM

TERMINATION OF THE ALGORITHM CONT'D

EXAMPLE - NON-DETERMINISTIC SYSTEM
The system properties:
  two processors: p, q
  two communication channels: c, c'
  p has 2 states {A, B}
  q has 2 states {C, D}
  p can send the message M while in state A. Sending the message causes it to move to state B.
  p can receive the message N while in state B. Receiving the message causes it to move back to state A.
  q works symmetrically to p.
The computation starts in the initial global state S_0 and passes through S_1, S_2 and S_3 by the events e_0, e_1 and e_2.
Note that the computation in this case is not deterministic: for example, from S_0 a different event e_0 could have occurred instead.
[Figure: the global states S_0 to S_3 with the states of p and q in each]

THE ALGORITHM - RUNNING EXAMPLE 2
The system is in global state S_0. p records itself and sends the marker.
The system goes to global state S_1.
The system goes to global state S_2.
The system goes to global state S_3.
q receives the marker. q records itself and the incoming channel c. q sends the marker.
p receives the marker. It has already recorded itself, so it only needs to record the state of c'.
The snapshot: p = A, c = empty, q = D, c' = N.
What is strange in this snapshot?

THE NON-DETERMINISTIC EXAMPLE - ANALYSIS
The snapshot the algorithm takes is not necessarily a global state the system was actually in. So what does the snapshot represent?
The answer is that the snapshot is a reachable global state of the system. In addition, if the events had occurred in a different order, the snapshot would have been one of the global states reached.
This makes the snapshot consistent with its system.

THEOREM
Given:
  seq = (e_i, i ≥ 0), a computation of some system
  S_i, the global state of the system before event e_i
  S_j, the global state of the system when the algorithm was initiated
  S_k, the global state of the system when the algorithm terminated (0 ≤ j ≤ k)
  S*, the global state the algorithm recorded (the snapshot)
Then there is a computation seq' of the system such that:
  for all i with i < j or i ≥ k, e'_i = e_i
  for all i with i ≤ j or i ≥ k, S'_i = S_i
  the subsequence (e'_i, j ≤ i < k) is a permutation of the subsequence (e_i, j ≤ i < k)
  there exists some t, j ≤ t ≤ k, such that S* = S'_t
[Figure: seq as a line of events e_0, e_1, ..., e_{j-1}, e_j, ..., e_{k-1}, e_k, with S_j and S_k marked]

PROOF - DEFINITIONS
Pre-recording event: an event that occurred in processor p before p recorded its state.
Post-recording event: an event that occurred in processor p after p recorded its state.
Note: for an event e_i in seq, if i < j then e_i is a pre-recording event, and if i ≥ k then e_i is a post-recording event.
Note: for an event e_i in seq such that j < i < k, the event e_{i-1} can be a post-recording event and the event e_i can be a pre-recording event if they occurred in different processors. If they occurred in the same processor and e_{i-1} is a post-recording event, then both must be post-recording events.

PROOF - DETAILS
Let e_{i-1} be an event in processor p with channel c and message M, and let e_i be an event in processor q with channel c' and message M'.
Let's assume that e_{i-1} is a post-recording event and e_i is a pre-recording event.
Can M = M' and c' = c? That is, can q be receiving the message p sent?
The answer is no. e_{i-1} is a post-recording event, which means that a marker was sent along c before M was sent. Because the channel is FIFO, the same marker was received by q before M reached it. When q received the marker it recorded itself, so if e_i is the receipt of M it can only be a post-recording event, in contradiction to the assumption that e_i is a pre-recording event.

PROOF - DETAILS CONT'D
We saw that e_{i-1} and e_i are independent of each other. This means we can swap their order in the computation seq.
The new computation ..., e_{i-2}, e_i, e_{i-1} ends in the same global state as the original computation ..., e_{i-2}, e_{i-1}, e_i.
[Figure: seq before and after the swap; the global states up to S_{i-1} and from S_{i+1} on are unchanged, and S_i is replaced by S'_i]
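
The swap argument can be checked on the two-processor example with states {A, B} and {C, D}. A minimal sketch, assuming a hypothetical `apply` function and two hypothetical events: p sends M along c, and q sends N along c' (the second follows from "q works symmetrically to p" and is an assumption here).

```python
def apply(state, event):
    """Global state after `event`; state is (p, c, q, c2), channels are tuples, head first."""
    p, c, q, c2 = state
    proc, kind, msg = event
    if proc == "p" and kind == "send" and p == "A":
        return ("B", c + (msg,), q, c2)      # p sends along c and moves A -> B
    if proc == "q" and kind == "send" and q == "C":
        return (p, c, "D", c2 + (msg,))      # assumed: q sends along c' and moves C -> D
    raise ValueError("event cannot occur in this state (or is not modelled here)")


S0 = ("A", (), "C", ())
e_p = ("p", "send", "M")    # an event in processor p
e_q = ("q", "send", "N")    # an independent event in processor q

one_order = apply(apply(S0, e_p), e_q)      # e_p first, then e_q
other_order = apply(apply(S0, e_q), e_p)    # swapped order
```

Both orders end in the same global state, although the intermediate states differ, which is what the swap in the proof relies on.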

PROOF - DETAILS CONT'D
Let seq' be a computation in which every post-recording event that occurs right before a pre-recording event is swapped with it.
We repeat the swapping until seq' has all pre-recording events before all post-recording events.
Note:
  seq' is a computation of the system
  for all i with i < j or i ≥ k, e'_i = e_i
  for all i with i ≤ j or i ≥ k, S'_i = S_i
[Figure: seq and seq' side by side; only the events between e_j and e_{k-1} are reordered]
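
The repeated swapping is essentially a bubble pass over the events. A sketch under stated assumptions: events are hypothetical triples (name, processor, kind), and the example sequence uses the pre/post labels of the non-deterministic example.

```python
def permute(events):
    """Swap adjacent (post, pre) pairs until every pre-recording event comes first."""
    seq = list(events)
    swapped = True
    while swapped:
        swapped = False
        for i in range(1, len(seq)):
            if seq[i - 1][2] == "post" and seq[i][2] == "pre":
                # By the proof, such a pair occurs in different processors.
                assert seq[i - 1][1] != seq[i][1]
                seq[i - 1], seq[i] = seq[i], seq[i - 1]
                swapped = True
    return seq


# (name, processor, kind) with the labels of the non-deterministic example
events = [("e0", "p", "post"), ("e1", "q", "pre"), ("e2", "p", "post")]
permuted = permute(events)
names = [name for name, _, _ in permuted]
```

Only adjacent pairs are exchanged, so the relative order of the events within each processor is preserved, and the result is still a permutation of the original sequence.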

PROOF - DETAILS CONT'D
Let's look at the global state of the system after the last pre-recording event and before the first post-recording event of seq'. We denote this state S_t (j ≤ t ≤ k); it is the state S'_t of the theorem.
For some processor p, assume the last state p was in before recording is a (that is, p recorded a as its state).
In the global state S_t, p is in state a, because all of its pre-recording events and none of its post-recording events have occurred.
In the snapshot S*, the state of p is also a (because p recorded a).
We conclude that the state of each processor in S_t is the same as in S*.

PROOF - DETAILS CONT'D
For some channel c from p to q:
In S_t, the messages in c are the ones p sent before sending a marker along c (that is, before p recorded itself), without the messages q received before recording itself.
In the snapshot S*, c contains all the messages q received along c after it recorded itself and before it received a marker along c.
We conclude that the messages in c in the global state S_t and in the snapshot S* are the same.

PROOF - CONCLUSIONS
We have now proven our theorem. There is a computation seq' of the system such that:
  for all i with i < j or i ≥ k, e'_i = e_i
  for all i with i ≤ j or i ≥ k, S'_i = S_i
  the subsequence (e'_i, j ≤ i < k) is a permutation of the subsequence (e_i, j ≤ i < k)
  there exists some t, j ≤ t ≤ k, such that S* = S'_t

EXAMPLE - PERMUTE A COMPUTATION
Recall the non-deterministic example. The computation we saw was:
  S_0: e_0 is post-recording, next(S_0, e_0) = S_1
  S_1: e_1 is pre-recording, next(S_1, e_1) = S_2
  S_2: e_2 is post-recording, next(S_2, e_2) = S_3
and the recorded global state was S*: p = A, c = empty, q = D, c' = N.
Now let's swap the events so that all pre-recording events precede all post-recording events:
  S'_0: e'_0 is pre-recording, next(S'_0, e'_0) = S'_1
  S'_1 = S*: e'_1 is post-recording, next(S'_1, e'_1) = S'_2
  S'_2: e'_2 = e_2 is post-recording, next(S'_2, e'_2) = S'_3
The global state S'_1 of this computation is exactly the snapshot of the original computation.

THE ALGORITHM - FINAL CONCLUSIONS
We saw that S_t = S*. From this we can see:
  that the snapshot S* is reachable from S_j
  that S_k is reachable from the snapshot S*
We saw that S* could have been a global state of the computation if the events had occurred in a different order.
This means the snapshot is indeed valuable and informative when judging the stability of a system.

REFERENCES
Chandy, K. M. and Lamport, L. Distributed Snapshots: Determining Global States of Distributed Systems.
Dijkstra, E. W. The distributed snapshot of K. M. Chandy and L. Lamport.