1 Distributed Systems CS 425 / CSE 424 / ECE 428 Global Snapshots Reading: Sections 11.5 (4 th ed), 14.5 (5 th ed)  2010, I. Gupta, K. Nahrtstedt, S.

Slides:



Advertisements
Similar presentations
Global States.
Advertisements

From Coulouris, Dollimore, Kindberg and Blair Distributed Systems: Concepts and Design Edition 5, © Addison-Wesley 2012 Slides for Chapter 14: Time and.
CS 542: Topics in Distributed Systems Diganta Goswami.
CSE 486/586, Spring 2014 CSE 486/586 Distributed Systems Reliable Multicast Steve Ko Computer Sciences and Engineering University at Buffalo.
Uncoordinated Checkpointing The Global State Recording Algorithm Cristian Solano.
Time and Global States Part 3 ECEN5053 Software Engineering of Distributed Systems University of Colorado, Boulder.
Synchronization Chapter clock synchronization * 5.2 logical clocks * 5.3 global state * 5.4 election algorithm * 5.5 mutual exclusion * 5.6 distributed.
CS542 Topics in Distributed Systems Diganta Goswami.
Distributed Computing 5. Snapshot Shmuel Zaks ©
Computer Science 425 Distributed Systems CS 425 / CSE 424 / ECE 428 Fall 2011 August 30, 2011 Lecture 3 Time and Synchronization Reading: Sections
CS514: Intermediate Course in Operating Systems Professor Ken Birman Vivek Vishnumurthy: TA.
CS 582 / CMPE 481 Distributed Systems
Causality & Global States. P1 P2 P Physical Time 4 6 Include(obj1 ) obj1.method() P2 has obj1 Causality violation occurs when order.
Ordering and Consistent Cuts Presented By Biswanath Panda.
CMPT 431 Dr. Alexandra Fedorova Lecture VIII: Time And Global Clocks.
Distributed Systems Fall 2009 Logical time, global states, and debugging.
Slides for Chapter 10: Time and Global State
Ordering and Consistent Cuts Presented by Chi H. Ho.
EEC-681/781 Distributed Computing Systems Lecture 11 Wenbing Zhao Cleveland State University.
Clock Synchronization and algorithm
Cloud Computing Concepts
Computer Science Lecture 10, page 1 CS677: Distributed OS Last Class: Clock Synchronization Physical clocks Clock synchronization algorithms –Cristian’s.
Lecture 3-1 Computer Science 425 Distributed Systems Lecture 3 Logical Clock and Global States/ Snapshots Reading: Chapter 11.4&11.5 Klara Nahrstedt.
CIS 720 Distributed algorithms. “Paint on the forehead” problem Each of you can see other’s forehead but not your own. I announce “some of you have paint.
Distributed Computing 5. Snapshot Shmuel Zaks ©
© Oxford University Press 2011 DISTRIBUTED COMPUTING Sunita Mahajan Sunita Mahajan, Principal, Institute of Computer Science, MET League of Colleges, Mumbai.
1 Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms Author: Ozalp Babaoglu and Keith Marzullo Distributed Systems: 526.
Chapter 9 Global Snapshot. Global state  A set of local states that are concurrent with each other Concurrent states: no two states have a happened before.
Lecture 6-1 Computer Science 425 Distributed Systems CS 425 / ECE 428 Fall 2013 Indranil Gupta (Indy) September 12, 2013 Lecture 6 Global Snapshots Reading:
Chapter 10: Time and Global States
Distributed Snapshot. Think about these -- How many messages are in transit on the internet? --What is the global state of a distributed system of N processes?
CS 425/ECE 428/CSE424 Distributed Systems (Fall 2009) Lecture 8 Leader Election Section 12.3 Klara Nahrstedt.
Distributed Systems Fall 2010 Logical time, global states, and debugging.
CSE 486/586, Spring 2013 CSE 486/586 Distributed Systems Global States Steve Ko Computer Sciences and Engineering University at Buffalo.
Lecture 11-1 Computer Science 425 Distributed Systems CS 425 / CSE 424 / ECE 428 Fall 2010 Indranil Gupta (Indy) September 28, 2010 Lecture 11 Leader Election.
Distributed Snapshot. One-dollar bank Let a $1 coin circulate in a network of a million banks. How can someone count the total $ in circulation? If not.
Real-Time & MultiMedia Lab Synchronization Distributed System Jin-Seung,KIM.
Lecture 5-1 Computer Science 425 Distributed Systems CS 425 / ECE 428 Fall 2013 Indranil Gupta (Indy) September 10, 2013 Lecture 5 Time and Synchronization.
Election Distributed Systems. Algorithms to Find Global States Why? To check a particular property exist or not in distributed system –(Distributed) garbage.
Lecture 12-1 Computer Science 425 Distributed Systems CS 425 / CSE 424 / ECE 428 Fall 2012 Indranil Gupta (Indy) October 4, 2012 Lecture 12 Mutual Exclusion.
Ordering of Events in Distributed Systems UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department CS 739 Distributed Systems Andrea C. Arpaci-Dusseau.
CSE 486/586 CSE 486/586 Distributed Systems Global States Steve Ko Computer Sciences and Engineering University at Buffalo.
Lecture 4-1 Computer Science 425 Distributed Systems (Fall2009) Lecture 4 Chandy-Lamport Snapshot Algorithm and Multicast Communication Reading: Section.
Distributed Systems Lecture 6 Global states and snapshots 1.
Global state and snapshot
CSE 486/586 Distributed Systems Reliable Multicast --- 1
Consistent cut A cut is a set of events.
Global state and snapshot
Lecture 3: State, Detection
Vector Clocks and Distributed Snapshots
CSE 486/586 Distributed Systems Global States
Distributed Snapshot.
EECS 498 Introduction to Distributed Systems Fall 2017
Distributed Snapshot.
湖南大学-信息科学与工程学院-计算机与科学系
Slides for Chapter 14: Time and Global States
Time And Global Clocks CMPT 431.
Chapter 5 (through section 5.4)
Slides for Chapter 11: Time and Global State
Slides for Chapter 14: Time and Global States
ITEC452 Distributed Computing Lecture 8 Distributed Snapshot
Distributed Snapshot.
CSE 486/586 Distributed Systems Global States
Jenhui Chen Office number:
CIS825 Lecture 5 1.
Consistent cut If this is not true, then the cut is inconsistent
Slides for Chapter 14: Time and Global States
CSE 486/586 Distributed Systems Reliable Multicast --- 1
Chandy-Lamport Example
Distributed Snapshot.
Presentation transcript:

1 Distributed Systems CS 425 / CSE 424 / ECE 428 Global Snapshots Reading: Sections 11.5 (4 th ed), 14.5 (5 th ed)  2010, I. Gupta, K. Nahrtstedt, S. Mitra, N. Vaidya, M. T. Harandi, J. Hou

2 Last Lecture Time synchronization –Berkeley algorithm –Cristian’s algorithm –NTP –Is it possible to synchronize two servers’ clocks with error=0? Lamport’s timestamps –Logical timestamps –Do the clock values of two servers need to be the same? –What are “concurrent” events? Vector Timestamps

3 [United Nations photo by Paul Skipworth for Eastman Kodak Company ©1995 ] Example of a Global State

4 The distributed version is challenging and important How would you take this photograph if each country’s premier were sitting in their respective capital, and sending messages to each other? That’s the challenge of distributed global snapshots! In a cloud: multiple servers handling multiple concurrent events and interacting with each other Without the ability to obtain a global photograph of the system, it would be a chaotic system (with potentially lots of inconsistencies)

5 Detecting Global Properties

6 Algorithms to Find Global States Why? –(Distributed) garbage collection –(Distributed) deadlock detection, termination –Two clients buy the last flight ticket at around the same time What? –Global state = state of all processes + state of all communication channels –Capture the instantaneous state of each process –And the instantaneous state of each communication channel, i.e., messages in transit on the channels How? –We’ll see this lecture!

7 Obvious First Solution… Synchronize clocks of all processes Ask all processes to record their states at some time t Time synchronization possible only approximately What about messages in transit? Synchronization not required – causality is enough!

8 Two Processes and Their Initial States

9 Execution of the Processes p 1 p 2 (empty) (empty) c 2 c 1 1. Global state S 0 2. Global state S 1 3. Global state S 2 4. Global state S 3 p 1 p 2 (Order 10, $100) (empty) c 2 c 1 p 1 p 2 (Order 10, $100) (five widgets) c 2 c 1 p 1 p 2 (Order 10, $100) (empty) c 2 c 1 Send 5 widgets

10 Process Histories and States  For a process P i, where events e i 0, e i 1, … occur: history(P i ) = h i = prefix history(P i k ) = h i k = S i k : P i ’s state immediately after k th event  For a set of processes P 1, …,P i, …. : global history: H =  i (h i ) global state: S =  i (S i k i ) a cut C  H = h 1 c1  h 2 c2  …  h n cn the frontier of C = {e i ci, i = 1,2, … n}

11 Consistent States  A cut C is consistent if and only if  e  C (if f  e then f  C)  A global state S is consistent if and only if it corresponds to a consistent cut P1 P2 P3 e10e10 e11e11 e12e12 e13e13 e20e20 e21e21 e22e22 e30e30 e31e31 e32e32 Inconsistent cut Consistent cut Lamport’s “happens-before”

12 The “Snapshot” Algorithm  Records a set of process and channel states such that the combination is a consistent global state.  Assumptions (System Model!):  There is a communication channel between each pair of processes process: N-1 in and N-1 out)  Communication channels are unidirectional and FIFO-ordered  No failure, all messages arrive intact, exactly once  Any process may initiate the snapshot (by sending “Marker” message)  Snapshot does not interfere with normal execution  Each process is able to record its state and the state of its incoming channels (no central collection)

13 The “Snapshot” Algorithm (2) 1. Marker sending rule for initiator process P 0  Record own state. After P 0 has recorded its own state for each outgoing channel C, send a marker message on C 2. Marker receiving rule for a process P k on receipt of a marker over channel C  if P k has not yet recorded its own state -record P k ’s own state -record the state of C as “empty” -for each outgoing channel C, send a marker on C -turn on recording of messages over other incoming channels -else -record the state of C as all the messages received over C since P k saved its own state; stop recording state of C

14 Chandy and Lamport’s ‘Snapshot’ Algorithm Marker receiving rule for process p i On p i ’s receipt of a marker message over channel c: if (p i has not yet recorded its state) it records its process state now; records the state of c as the empty set; turns on recording of messages arriving over other incoming channels; else p i records the state of c as the set of messages it has received over c since it saved its state. end if Marker sending rule for process p i After p i has recorded its state, for each outgoing channel c: p i sends one marker message over c (before it sends any other message over c).

15 Snapshot Example P1 P2 P3 e10e10 e20e20 e23e23 e30e30 e13e13 a b M e 1 1,2 M 1- P1 initiates snapshot: records its state (S1); sends Markers to P2 & P3; turns on recording for channels C21 and C31 e 2 1, 2,3 M M 2- P2 receives Marker over C12, records its state (S2), sets state(C12) = {} sends Marker to P1 & P3; turns on recording for channel C32 e14e14 3- P1 receives Marker over C21, sets state(C21) = {a} e 3, 1, 2,3 M M 4- P3 receives Marker over C13, records its state (S3), sets state(C13) = {} sends Marker to P1 & P2; turns on recording for channel C23 e24e24 5- P2 receives Marker over C32, sets state(C32) = {b} e35e35 6- P3 receives Marker over C23, sets state(C23) = {} e15e15 7- P1 receives Marker over C31, sets state(C31) = {}

16 Earlier Example with Snapshot Algorithm p 1 p 2 (empty) (empty) c 2 c 1 1. Global state S 0 2. Global state S 1 3. Global state S 2 4. Global state S 3 p 1 p 2 (Order 10, $100), M (empty) c 2 c 1 p 1 p 2 (Order 10, $100), M (five widgets) c 2 c 1 p 1 p 2 (Order 10, $100) M c 2 c 1 Send 5 widgets recorded C 1 channel state = (five widgets) recorded C 2 channel state = empty

17 Provable Assertion: Chandy-Lamport algo. determines a consistent cut Let e i and e j be events occurring at p i and p j, respectively such that e i  e j The snapshot algorithm ensures that if e j is in the cut then e i is also in the cut. if e j  p j records its state, then it must be true that e i  p i records its state. Why?  A stable predicate that is true in S-snap must be true in S-final

18 Global States useful for detecting Global Predicates  A cut is consistent if and only if it does not violate causality  A Run is a total ordering of events in H that is consistent with each h i ’s ordering  A Linearization is a run consistent with happens- before (  ) relation in H.  Linearizations pass through consistent global states.  A global state S k is reachable from global state S i, if there is a linearization, L, that passes through S i and then through S k.  The distributed system evolves as a series of transitions between global states S 0, S 1, ….

19 Global State Predicates  A global-state-predicate is a function from the set of global states to {true, false},e.g., deadlock, termination  If P is a global-state predicate of reaching termination, then a global state S0 satisfies liveness if: liveness(P(S 0 ))   L  linearizations from S0, S L :L passes through S L & P(S L ) = true  A stable global-state-predicate is one that once it becomes true, it remains true in subsequent global states, e.g., an object O is orphaned  if P is a global-state-predicate of being deadlocked, then a global state S0 satisfies this safety if: safety(P(S 0 ))   S reachable from S 0, P(S) = false

20 Quick Note – Liveness versus Safety Can be confusing, but terms are relevant outside CS too: Liveness=guarantee that something good will happen eventually –“Guarantee of termination” is a liveness property –Guarantee that “at least one of the atheletes in the 100m final will win gold” is liveness –A criminal will eventually be jailed Safety=guarantee that something bad will never happen –Deadlock avoidance algorithms provide safety –A peace treaty between two nations provides safety –An innocent person will never be jailed Can be difficult to satisfy both liveness and safety!

21 Summary, Announcements This class: importance of global snapshots, Chandy and Lamport algorithm, violation of causality Next topic: Multicast, broadcast, impossibility of consensus in asynchronous systems (see course website for readings, to be posted soon)

22

23 Side Issue: Causality Violation P1 P2 P Physical Time 4 6 Include(obj1) obj1.method() P2 has obj1 Causality violation occurs when order of messages causes an action based on information that another host has not yet received. In designing a DS, potential for causality violation is important

24 Detecting Causality Violation P1 P2 P3 (1,0,0) (2,0,0) Physical Time (2,0,2) Potential causality violation can be detected by vector timestamps. If the vector timestamp of a message is less than the local vector timestamp, on arrival, there is a potential causality violation. 0,0,0 1,0,0 2,0,1 2,2,2 2,1,2 2,0,2 2,0,0 Violation: (1,0,0) < (2,1,2)