Distributed Snapshots: Determining Global States of Distributed Systems - K. Mani Chandy and Leslie Lamport.

Outline What this paper is about. Stable property of a system. Distributed System model –Definitions. –Token Conservation system example. –Non-Deterministic Computation example. Global State Determination Algorithm –Requirements of Consistent Global State. –Termination of the algorithm –Properties of the recorded global state. (theorem + example) Stability Detection Algorithm

Goal/Theme Algorithms by which a process in a distributed system can determine the global state of the system during a computation. Processes need to cooperate to record state. Individual processes do not share clocks or memory. Global state = the set of all process states and channel states. The algorithm must run concurrently with the underlying computation without altering it.

Stable State Why does this appear in the paper? –The global state detection algorithm can be used to solve the stability detection problem. –Several distributed systems problems come down to detecting a stable property. y(S): a predicate defined on the global states of the distributed system D. y is a stable property of D if y(S) => y(S’) for all global states S’ of D reachable from S. E.g. “computation terminated”, “system deadlocked”.

Distributed System Model Finite set of processes and channels (a directed graph). Channels: infinite buffers, error free, deliver messages in the order sent, with finite delay. Channel state: the sequence of messages sent along the channel, excluding those already received. A process is defined by: an initial state, a set of states, a set of events. An event is e = <p, s, s’, M, c>: it occurs in process p, takes p from state s to state s’, and sends or receives message M along channel c. Events are atomic and can change the state of p and of at most one channel. M and c are null if e does not change the state of any channel.

Contd.

Initial global state S_0: every process is in its initial state and every channel holds the empty sequence. next(S, e): the global state immediately after event e occurs in global state S. Let seq = (e_i : 0 <= i <= n) be a sequence of events in the component processes of a distributed system. seq is legal iff: –the system starts in S_0, each e_i can occur in S_i, and S_{i+1} = next(S_i, e_i) for 0 <= i <= n.
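The definitions of next and of a legal sequence can be sketched in Python. This is a minimal sketch, not code from the paper: the names Event, CHANNELS, next_state and is_legal, the state labels, and the channel name "c2" (standing in for c’) are all assumptions made for illustration.

```python
from typing import NamedTuple, Optional

# Hypothetical topology: channel name -> (sender, receiver).
CHANNELS = {"c": ("p", "q"), "c2": ("q", "p")}


class Event(NamedTuple):
    p: str                # process in which the event occurs
    s: str                # state of p immediately before the event
    s_next: str           # state of p immediately after the event
    M: Optional[str]      # message sent or received, or None
    c: Optional[str]      # channel whose state changes, or None


def next_state(S, e):
    """Return the global state after e occurs in S, or None if e cannot occur."""
    procs, chans = S
    if procs[e.p] != e.s:
        return None
    procs = {**procs, e.p: e.s_next}
    chans = dict(chans)
    if e.c is not None:
        src, dst = CHANNELS[e.c]
        if e.p == src:                      # send: append M to the tail of c
            chans[e.c] = chans[e.c] + (e.M,)
        elif e.p == dst:                    # receive: M must be at the head of c
            if not chans[e.c] or chans[e.c][0] != e.M:
                return None
            chans[e.c] = chans[e.c][1:]
        else:
            return None                     # c is not incident on p
    return procs, chans


def is_legal(S0, seq):
    """True iff every event of seq can occur in turn, starting from S0."""
    S = S0
    for e in seq:
        S = next_state(S, e)
        if S is None:
            return False
    return True


# Single-token system: p holds the token and passes it to q along c.
S0 = ({"p": "has_token", "q": "no_token"}, {"c": (), "c2": ()})
e0 = Event("p", "has_token", "no_token", "token", "c")   # p sends the token
e1 = Event("q", "no_token", "has_token", "token", "c")   # q receives the token
```

Here is_legal(S0, [e0, e1]) holds, while is_legal(S0, [e1, e0]) does not, because q cannot receive from an empty channel.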

Example 2.1

Contd.

Example 2.2

Contd.

M and M’: the messages sent by p and q, respectively. In Example 2.1 only one event was possible in each global state; that is not the case here. Different orderings of the events lead to different global states: non-determinism. For example, the events “p sends M” and “q sends M’” may both occur in the initial state, and the global states that follow them are different.

Algorithm Steps: Each process records its own state. The two processes that a channel is incident on cooperate in recording the channel state. Process and channel states cannot all be recorded at the same instant, since there is no global clock. The recorded global state must be “consistent”. The algorithm runs concurrently with the underlying computation: it sends messages and requires processes to carry out computation, but it must not affect the underlying computation.

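Example 3.1 Case 1: record p’s state in global state in-p, then record the states of q, c and c’ in global state in-c. The recorded global state shows 2 tokens: n < n’. (n: number of messages sent along c before p’s state is recorded; n’: number of messages sent along c before c’s state is recorded.) Case 2: record c’s state in global state in-p, then record the states of p, q and c’ in global state in-c. The recorded global state shows 0 tokens: n > n’. Hence we need n = n’ for a consistent global state.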

Contd. Likewise, we need m = m’, where m: number of messages received along c before q’s state is recorded, and m’: number of messages received along c before c’s state is recorded. The recorded state of a channel c must be the sequence of messages sent along c before the sender’s state is recorded, excluding the sequence of messages received along c before the receiver’s state is recorded. The marker has no effect on the underlying computation.
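Because channels deliver messages in order, the messages already received are always a prefix of the messages sent, so the recorded channel state is a slice of the sent sequence. A minimal sketch, where the function name recorded_channel_state and the sample messages are assumptions made for illustration:

```python
def recorded_channel_state(sent, n, m):
    """State of channel c to record.

    sent: every message sent along c so far, in sending order
    n:    number of messages sent along c before the sender's state is recorded
    m:    number of messages received along c before the receiver's state is recorded
    """
    if not 0 <= m <= n <= len(sent):
        raise ValueError("need 0 <= m <= n <= len(sent)")
    # Messages sent before the sender recorded (sent[:n]),
    # minus those received before the receiver recorded (sent[:m]).
    return sent[m:n]


in_flight = recorded_channel_state(["a", "b", "c", "d"], n=3, m=1)
```

With three messages sent before the sender recorded and one received before the receiver recorded, the recorded channel state is the two messages in between.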

Global State Algorithm Outline
Marker sending rule for a process p:
–For each channel c incident on, and directed away from, p:
–p sends one marker along c after p records its state and before p sends any further messages along c.
Marker receiving rule for a process q:
–On receiving a marker along channel c:
if q has not recorded its state then
begin q records its state; q records the state of c as the empty sequence end
else q records the state of c as the sequence of messages received along c after q’s state was recorded and before q received the marker along c.
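The two marker rules can be exercised in a small single-threaded simulation. This is a sketch under stated assumptions rather than the paper's presentation: the classes Process and Network, the MARKER sentinel, the integer "balance" state, and the hand-picked delivery order are all hypothetical.

```python
from collections import deque

MARKER = "MARKER"  # hypothetical sentinel standing in for the marker message


class Process:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance   # application state: tokens held
        self.recorded = None     # recorded process state, once taken
        self.chan_state = {}     # recorded state of each incoming channel
        self.open = set()        # incoming channels still being recorded


class Network:
    def __init__(self, balances):
        self.procs = {n: Process(n, b) for n, b in balances.items()}
        names = list(balances)
        # One FIFO channel for every ordered pair of distinct processes.
        self.chans = {(a, b): deque() for a in names for b in names if a != b}

    def send(self, src, dst, amount):
        """Application event: src sends `amount` tokens to dst."""
        self.procs[src].balance -= amount
        self.chans[(src, dst)].append(amount)

    def record(self, name, via=None):
        """Record name's state, then apply the marker sending rule."""
        p = self.procs[name]
        p.recorded = p.balance
        for ch in self.chans:
            if ch[1] == name:
                p.chan_state[ch] = []      # the marker's own channel stays empty
                if ch != via:
                    p.open.add(ch)         # keep recording until a marker arrives
        for ch, queue in self.chans.items():
            if ch[0] == name:
                queue.append(MARKER)       # sent before any further message on ch

    def deliver(self, src, dst):
        """Deliver the head of channel (src, dst), applying the receiving rule."""
        msg = self.chans[(src, dst)].popleft()
        p = self.procs[dst]
        if msg == MARKER:
            if p.recorded is None:
                self.record(dst, via=(src, dst))
            else:
                p.open.discard((src, dst))
        else:
            if (src, dst) in p.open:
                p.chan_state[(src, dst)].append(msg)
            p.balance += msg


net = Network({"p": 10, "q": 5})
net.send("p", "q", 3)
net.record("p")            # p initiates the snapshot
net.send("q", "p", 2)
net.deliver("p", "q")      # q receives 3 tokens
net.deliver("q", "p")      # p receives 2 tokens after recording: channel state
net.deliver("p", "q")      # q receives the marker and records its state
net.deliver("q", "p")      # p receives the marker: recording of (q, p) ends

procs = {n: p.recorded for n, p in net.procs.items()}
chans = {c: m for p in net.procs.values() for c, m in p.chan_state.items()}
total = sum(procs.values()) + sum(sum(m) for m in chans.values())
done = all(p.recorded is not None and not p.open for p in net.procs.values())
```

In this run the recorded process states plus the recorded channel contents add up to the 15 tokens the system started with, which is the conservation check from the token example.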

Algorithm Termination To ensure termination in finite time, each process must ensure that: –L1: no marker remains forever in an incident input channel. –L2: it records its state within finite time of the initiation of the algorithm. Given these conditions, one can prove that the algorithm terminates in finite time.

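Properties of Global State The recorded global state may not be the same as any global state that actually occurred during the computation. However, it is consistent: it is reachable from the state in which the algorithm was initiated, and the state in which the algorithm terminated is reachable from it. See the next theorem.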

Contd. Let seq = (e_i : 0 <= i <= n) be a distributed computation, and let S_i be the global state immediately before the event e_i in seq. Let the algorithm be initiated in global state S_l and let it terminate in global state S_φ. The recorded global state S* may be different from all global states S_k, l <= k <= φ. We show that: –S* is reachable from S_l, and –S_φ is reachable from S*.

Contd. Specifically, we show that there exists a computation seq’ where –seq’ is a permutation of seq, such that S_l, S* and S_φ occur as global states in seq’. –S_l = S* or S_l occurs earlier than S* in seq’. –S_φ = S* or S* occurs earlier than S_φ in seq’.

Theorem 1. There exists a computation seq’ = (e_i’ : 0 <= i) where
–for all i where i < l or i >= φ: e_i’ = e_i
–the subsequence (e_i’ : l <= i < φ) is a permutation of the subsequence (e_i : l <= i < φ)
–for all i where i <= l or i >= φ: S_i’ = S_i
–there exists some k, l <= k <= φ, such that S* = S_k’

Example 4.1 To show how seq’ can be derived from seq. Example 2.2, fig 7. –e0: p sends M, changes state to B (post-recording event). –e1: q sends M’, changes state to D (pre-recording event). –e2: p gets M’, changes state to A (post-recording event). We can interchange e0 and e1 to form seq’, in which the global state recorded in fig 8 occurs (see the state after e0’).
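The interchange can be checked mechanically. The sketch below is an illustration under assumptions: it takes the initial states to be A for p and C for q, takes the recorded state S* to be (p in A, q in D, c empty, c’ holding M’), and the helper names apply_event and run are hypothetical.

```python
def apply_event(state, event):
    """Apply one (kind, process, new_state, channel, message) event to a global state."""
    kind, proc, new_state, chan, msg = event
    procs = dict(state[0])
    chans = {name: list(msgs) for name, msgs in state[1].items()}
    if kind == "send":
        chans[chan].append(msg)
    else:  # receive: the message must be at the head of the channel
        if not chans[chan] or chans[chan][0] != msg:
            raise ValueError("message is not at the head of the channel")
        chans[chan].pop(0)
    procs[proc] = new_state
    return procs, chans


def run(state, seq):
    """Return the list of global states visited, starting with the initial one."""
    states = [state]
    for event in seq:
        states.append(apply_event(states[-1], event))
    return states


# Assumed initial global state: p in A, q in C, both channels empty.
S0 = ({"p": "A", "q": "C"}, {"c": [], "c'": []})
e0 = ("send", "p", "B", "c", "M")      # post-recording event
e1 = ("send", "q", "D", "c'", "M'")    # pre-recording event
e2 = ("recv", "p", "A", "c'", "M'")    # post-recording event

states = run(S0, [e0, e1, e2])          # seq
states_prime = run(S0, [e1, e0, e2])    # seq' with e0 and e1 interchanged

# Assumed recorded global state S*.
S_star = ({"p": "A", "q": "D"}, {"c": [], "c'": ["M'"]})
```

Under these assumptions S* never occurs in seq, occurs in seq’ right after its first event, and both sequences end in the same global state.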

Stability Detection Stability detection algorithm: Input: a stable property y. Output: a Boolean value definite with the property (y(S_l) -> definite) and (definite -> y(S_φ)), where S_l and S_φ are the global states in which the algorithm is initiated and terminates. Solution to the stability detection problem: begin record a global state S*; definite := y(S*) end. Correctness follows from the properties discussed earlier: S* is reachable from S_l, S_φ is reachable from S*, and y is stable.
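The solution is short enough to write out directly. In this sketch the snapshot is passed in as a callable, and the predicate terminated along with the two sample global states are assumptions made for illustration; in a real system record_global_state would run the global state recording algorithm.

```python
def detect_stability(record_global_state, y):
    """Record a global state S* and evaluate the stable property y on it."""
    s_star = record_global_state()
    definite = y(s_star)
    return definite


def terminated(state):
    """Hypothetical stable property: every process idle and every channel empty."""
    procs, chans = state
    all_idle = all(s == "idle" for s in procs.values())
    all_empty = all(len(msgs) == 0 for msgs in chans.values())
    return all_idle and all_empty


# Two hand-written recorded states standing in for the output of a snapshot.
quiet = ({"p": "idle", "q": "idle"}, {"c": [], "c2": []})
busy = ({"p": "idle", "q": "idle"}, {"c": ["work"], "c2": []})

result_quiet = detect_stability(lambda: quiet, terminated)
result_busy = detect_stability(lambda: busy, terminated)
```

A message still in a channel makes the predicate false on the recorded state, so definite is false even though both processes are idle.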