Determining Global States of Distributed Systems Presented by Sanjeev R. Kulkarni.

Presentation transcript:

Slide 1: Determining Global States of Distributed Systems. Presented by Sanjeev R. Kulkarni.

Slide 2: References.
1. "Distributed Snapshots: Determining Global States of Distributed Systems", K. Mani Chandy and Leslie Lamport, ACM Transactions on Computer Systems, vol. 3, no. 1, Feb. 1985.
2. "PUBLISHING: A Reliable Broadcast Communication Mechanism", Michael L. Powell and David L. Presotto, Proceedings of the Ninth ACM Symposium on Operating Systems Principles, Oct.
3. "Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms", Ozalp Babaoglu and Keith Marzullo, in Distributed Systems, Sape J. Mullender (ed.), Addison-Wesley, 1993.

Slide 3: Outline of the talk. Complexities of state detection in distributed systems; the notion of consistent states; the Distributed Snapshots algorithm; application to detecting stable properties and to checkpointing; another approach to state recording: Publishing.

Slide 4: Model of Computation. A finite set of processes. Processes send messages on a finite set of unidirectional channels. Channels are error-free, FIFO, and have infinite buffers. Messages experience arbitrary but finite delays. The network is strongly connected.

Slide 5: Model of Computation (cont.). A computation is a sequence of events. An event is an atomic action that changes the state of a process and the state of at most one channel incident on that process. [Figure: timelines of processes p and q passing through states Sp0, Sp1, Sp2, Sp3 and Sq0, Sq1, Sq2, Sq3.]

Slide 6: Happened Before Relation. For events e and e' of the same process: if e happens before e', then e → e'. For e and e' in two different processes: if e = send(m) and e' = recv(m), then e → e'. Transitivity: if e → e' and e' → e'', then e → e''.
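
As a rough illustration of the three rules on this slide, the Python sketch below computes the happened-before relation for a small, made-up execution. The function name happened_before, the event labels (p1, q1, ...) and the input format are assumptions chosen for the example; they do not come from the talk.

```python
from itertools import product


def happened_before(process_events, messages):
    """Return the set of pairs (e, f) such that e happened before f.

    process_events: dict mapping a process name to its events in local order.
    messages: iterable of (send_event, recv_event) pairs.
    """
    events = [e for evs in process_events.values() for e in evs]
    hb = set()
    # Rule 1: events of the same process are ordered by local execution order.
    for evs in process_events.values():
        for i in range(len(evs)):
            for j in range(i + 1, len(evs)):
                hb.add((evs[i], evs[j]))
    # Rule 2: send(m) happens before recv(m).
    hb.update(messages)
    # Rule 3: transitivity, computed as a Warshall-style closure
    # (the intermediate event k is the outermost loop variable).
    for k, i, j in product(events, repeat=3):
        if (i, k) in hb and (k, j) in hb:
            hb.add((i, j))
    return hb


# Hypothetical execution: p's second event sends a message that q's first event receives.
hb = happened_before({"p": ["p1", "p2"], "q": ["q1", "q2"]}, [("p2", "q1")])
print(("p1", "q2") in hb)  # True: p1 -> p2 -> q1 -> q2
print(("q1", "p2") in hb)  # False: the relation is not symmetric
```

Two events for which neither (e, f) nor (f, e) is in the result are concurrent, which is exactly the situation that makes global state recording non-trivial.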

Slide 7: Determining Global States. Global state: "The global state of a distributed computation is the set of local states of all individual processes involved in the computation plus the state of the communication channels."

Slide 8: More on States. Process state: memory state + register state + signal masks + open files + kernel buffers + …, or application-specific information such as transactions completed, functions executed, etc. Channel state: "messages in transit", i.e. those messages that have been sent but not yet received.

Slide 9: What's the need for global states? Many problems in distributed computing can be cast as executing some action on reaching a particular state, e.g. distributed deadlock detection (finding a cycle in the wait-for graph), termination detection, checkpointing, and many more.

Slide 10: Why is global state determination difficult in distributed systems? Distributed state: information that is spread across several machines has to be collected. Only local knowledge: a process in the computation does not know the state of the other processes.

Slide 11: Difficulties. Instantaneous recording is not possible. No global clock: distributed recording of local states cannot be synchronized based on time. Random network delays: a single process cannot make all processes record their states at the same instant.

Slide 12: Difficulties due to Non-Determinism. Deterministic computation: at any point in the computation there is at most one event that can happen next. Non-deterministic computation: at any point in the computation there can be more than one event that can happen next.

Slide 13: Deterministic Computation Example. A variant of the producer-consumer example. Producer code: while (1) { produce m; send m; wait for ack; } Consumer code: while (1) { recv m; consume m; send ack; }
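
To make "at most one event can happen next" concrete, here is a small Python sketch that models the producer and consumer loops as a state machine and checks that every global state has exactly one enabled event. The state encoding (a 4-tuple) and the names enabled, run and INITIAL are assumptions made for this illustration, and the "produce" and "consume" steps are folded into the send and receive events for brevity.

```python
# Global state: (producer, m_in_transit, consumer, ack_in_transit)
INITIAL = ("ready", False, "idle", False)


def enabled(state):
    """Return the list of (event name, next state) pairs enabled in `state`."""
    producer, m_in_transit, consumer, ack_in_transit = state
    moves = []
    if producer == "ready":
        # Producer produces and sends m, then waits for the ack.
        moves.append(("send m", ("waiting", True, consumer, ack_in_transit)))
    if m_in_transit and consumer == "idle":
        # Consumer receives (and consumes) m.
        moves.append(("recv m", (producer, False, "holding", ack_in_transit)))
    if consumer == "holding":
        # Consumer sends the ack.
        moves.append(("send ack", (producer, m_in_transit, "idle", True)))
    if ack_in_transit and producer == "waiting":
        # Producer receives the ack and can produce again.
        moves.append(("recv ack", ("ready", m_in_transit, consumer, False)))
    return moves


def run(state, steps):
    """Execute up to `steps` events, checking determinism at every state."""
    trace = []
    for _ in range(steps):
        moves = enabled(state)
        assert len(moves) <= 1, "more than one enabled event: not deterministic"
        if not moves:
            break
        name, state = moves[0]
        trace.append(name)
    return trace, state


trace, final = run(INITIAL, 4)
print(trace)  # ['send m', 'recv m', 'send ack', 'recv ack']
print(final == INITIAL)  # True: the computation cycles through four global states
```

Because only one event is ever enabled, a process that observes its own local event can infer the whole global state, which is the point the talk makes later about deterministic computations.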

Slides 14-19: Example, starting from the initial state. [Figure sequence of six frames: frames 14-16 show the message m, frames 17-19 show the acknowledgement a.]

Slide 20: Deterministic state diagram. [Figure: state diagram.]

Slide 21: Non-deterministic computation. Three processes p, q and r, with messages m1, m2 and m3. [Figure.]

Slide 22: Three possible runs. [Figure: three orderings of m1, m2 and m3 among p, q and r.]

Slide 23: A Non-Deterministic Computation. All these states are feasible.

Slide 24: Feasible and Actual States. Any state that an external observer could have observed is a feasible state. A state that an external observer did observe is an actual state.

Slide 25: A Non-Deterministic Computation. Only some states are actual.

Slide 26: Non-Determinism. In a deterministic computation, a local event reveals everything about the global state: the process knows the other processes' states. This is not so for a non-deterministic computation!

Slide 27: A naïve snapshot algorithm. Processes record their state at any arbitrary point. A designated process collects these states. Advantage: very simple. But is it correct?

Slides 28-31: Example (producer-consumer problem). p records its state; later, q records its state; the recorded states are then combined. [Figure sequence: p, q and the message m.]

Slide 32: Where did we err? What did we do? [Figure: p, q and the message m.]

Slide 33: Error! The sender has no record of the sending, but the receiver has a record of the receipt. Result: the global state records the receive event but not the send event, violating the happened-before relation.

Slide 34: The notion of Consistency. A global state is consistent if it could have been observed by an external observer. If e → e', then it is never the case that e' is observed by the external observer and e is not. All feasible states are consistent.
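
The consistency condition can be checked mechanically for message events: whenever a receive is inside the recorded state, the matching send must be inside it too. The Python sketch below is one way to express that, under an assumed representation in which a recorded state is a "cut" mapping each process to the number of its events included, and each message is a tuple (sender, send_no, receiver, recv_no) with 1-based event numbers. The function names and the example data are hypothetical and are not taken from the slides' figures.

```python
def is_consistent(cut, messages):
    """True if every message received inside the cut was also sent inside it."""
    return all(
        cut[sender] >= send_no
        for (sender, send_no, receiver, recv_no) in messages
        if cut[receiver] >= recv_no
    )


def in_transit(cut, messages):
    """Messages sent inside the cut but not yet received inside it (channel state)."""
    return [
        m for m in messages
        if cut[m[0]] >= m[1] and cut[m[2]] < m[3]
    ]


# Hypothetical execution: p's 1st event sends m, q's 2nd event receives it.
messages = [("p", 1, "q", 2)]

# p recorded after sending, q recorded before receiving: consistent, m is in the channel.
print(is_consistent({"p": 1, "q": 1}, messages))  # True
print(in_transit({"p": 1, "q": 1}, messages))     # [('p', 1, 'q', 2)]

# The naive algorithm's mistake: p recorded before sending, q recorded after receiving.
print(is_consistent({"p": 0, "q": 2}, messages))  # False
```

The last call reproduces the error from the naïve snapshot: the cut contains a receive without its send.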

Slide 35: An Example. [Figure: timelines of p and q with states Sp0 to Sp3 and Sq0 to Sq3, and messages m1, m2, m3.]

Slides 36-37: A consistent state? (Sp1, Sq1). Yes.

Slides 38-39: A consistent state? (Sp2, Sq3), with m3 in the channel. Yes.

Slide 40: An inconsistent state: (Sp1, Sq3).

Slide 41: Chandy and Lamport Algorithm. Features: it does not promise to give us exactly the state the system was in, but it does give us a consistent state.

Slide 42: A brief sketch of the algorithm (from process p's perspective). p sends a marker message along all its outgoing channels after it records its state and before it sends any other messages. On receipt of a marker message from channel c: if p has not recorded its state, p records its state and sets state(c) = EMPTY; else state(c) = the messages received on c since p recorded its state, excluding the marker.
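
The marker rules above can be tried out in a small single-threaded simulation. The Python sketch below is only an illustration under stated assumptions: the application is a made-up money-transfer system (each process state is a balance), message delivery is driven by hand through deliver(), and the class and method names (Network, Process, record, deliver, snapshot) are invented for this example rather than taken from the talk or the paper.

```python
from collections import deque

MARKER = "MARKER"


class Process:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance   # application state
        self.recorded = None     # recorded local state, None until recorded
        self.channel_state = {}  # sender -> messages recorded as in transit
        self.open = set()        # incoming channels still being recorded


class Network:
    def __init__(self, balances):
        self.procs = {n: Process(n, b) for n, b in balances.items()}
        names = list(balances)
        # One FIFO channel per ordered pair of distinct processes.
        self.chan = {(s, r): deque() for s in names for r in names if s != r}

    def send(self, src, dst, amount):
        """Application message: transfer `amount` from src to dst."""
        self.procs[src].balance -= amount
        self.chan[(src, dst)].append(amount)

    def record(self, name):
        """Record local state, then send a marker on every outgoing channel."""
        p = self.procs[name]
        p.recorded = p.balance
        p.open = {s for (s, r) in self.chan if r == name}
        p.channel_state = {s: [] for s in p.open}
        for (s, r), queue in self.chan.items():
            if s == name:
                queue.append(MARKER)

    def deliver(self, src, dst):
        """Deliver the oldest message on channel (src, dst)."""
        msg = self.chan[(src, dst)].popleft()
        p = self.procs[dst]
        if msg == MARKER:
            if p.recorded is None:
                self.record(dst)  # state(c) stays empty for this channel
            p.open.discard(src)   # stop recording channel (src, dst)
        else:
            p.balance += msg
            if p.recorded is not None and src in p.open:
                p.channel_state[src].append(msg)  # message was in transit

    def snapshot(self):
        states = {n: p.recorded for n, p in self.procs.items()}
        channels = {(s, n): msgs
                    for n, p in self.procs.items()
                    for s, msgs in p.channel_state.items()}
        return states, channels


net = Network({"p": 10, "q": 10})
net.record("q")        # q initiates the snapshot and sends a marker to p
net.send("p", "q", 3)  # p sends 3 before the marker reaches it
net.deliver("q", "p")  # p receives the marker, records, sends its own marker
net.deliver("p", "q")  # q receives 3 after recording: part of the channel state
net.deliver("p", "q")  # q receives p's marker and closes the channel
print(net.snapshot())  # ({'p': 7, 'q': 10}, {('q', 'p'): [], ('p', 'q'): [3]})
```

In this run the recorded balances plus the recorded in-transit amount add up to the initial total of 20, which is what a consistent state of a money-conserving system should show.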

Slide 43: Algorithm in Action. [Figure: timelines of p and q with states Sp0 to Sp3 and Sq0 to Sq3, and messages m1, m2, m3.]

Slide 44: Algorithm in Action. q records its state as Sq1 and sends a marker to p.

Slide 45: Algorithm in Action. p records its state as Sp2 and the channel state as empty.

Slide 46: Algorithm in Action. q records the channel state as m3.

Slide 47: Algorithm in Action. Recorded global state = ((Sp2, Sq1), (∅, m3)).

Slide 48: Why this is consistent. Proof that if recv(m) is recorded then send(m) is also recorded: channels are FIFO and the marker M is sent before any later message, so a message sent after the sender recorded its state arrives after the marker, by which time the receiver has already recorded its state. [Figure: p, q, the message m and the marker M.]

Slide 49: Algorithm in Action. Recorded global state = ((Sp2, Sq1), (∅, m3)). Moral: the computation may never have passed through the recorded state!

Slide 50: What have we recorded? The recorded consistent state can be anything!

Slide 51: Properties of the recorded global state. If Si and Sj are the global states when the Chandy-Lamport algorithm started and finished, respectively, and S* is the state recorded by the algorithm, then S* is reachable from Si, and Sj is reachable from S*.

Slide 52: S* is reachable from Si. [Figure: Si, Sj.]

Slide 53: Sj is reachable from S*. [Figure: Si, Sj.]

Slide 54: Still, what good is it? Stable properties: a property y is called a stable property iff y(S) implies y(S') for all states S' reachable from S. E.g. deadlock, termination, token loss.

Slides 55-56: Stable Properties. [Figure: Si, Sj, S*.]

Slide 57: Detection of Stable Properties. outcome = false; while (outcome == false) { determine global state S; outcome = y(S); } Here y is the stable property being tested.
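
The detection loop on this slide can be written as a short Python function. This is a sketch under assumptions: take_snapshot stands for whatever mechanism determines a consistent global state (for example a run of the snapshot algorithm), y is the stable property as a predicate on that state, and max_rounds is an extra safety bound added here so the example terminates; none of these names appear in the talk.

```python
def detect_stable_property(take_snapshot, y, max_rounds=None):
    """Repeatedly determine a global state S until y(S) holds.

    Returns (S, rounds) for the first snapshot satisfying y, or (None, rounds)
    if max_rounds snapshots were taken without y holding.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        rounds += 1
        state = take_snapshot()  # determine global state S
        if y(state):             # outcome = y(S)
            return state, rounds
    return None, rounds


# Hypothetical termination detection: a state has "terminated" when no process
# is active and no message is in transit.
fake_snapshots = iter([
    {"active": 2, "in_transit": 1},
    {"active": 0, "in_transit": 1},
    {"active": 0, "in_transit": 0},
])


def terminated(s):
    return s["active"] == 0 and s["in_transit"] == 0


state, rounds = detect_stable_property(lambda: next(fake_snapshots), terminated)
print(state, rounds)  # {'active': 0, 'in_transit': 0} 3
```

The loop is sound for stable properties because the final state Sj is reachable from the recorded state S*: if y holds in S*, it still holds when the algorithm finishes.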

Slide 58: Checkpointing. S* serves as a checkpoint. On a failure, restart the computation from S*. Problem: the computation cannot be restored to Sj. [Figure: Si, Sj, S*.]

Slide 59: Solution: Publishing. A broadcast medium. A central recorder process records all the messages received by each process. Processes record their states at their own time and send them to the recorder.

Slide 60: Architecture of Publishing. [Figure: recorder, p and q; states Sp1, Sq1.]

Slide 61: q sends the message m1. [Figure: recorder, p and q; states Sp1, Sq2.]

Slide 62: p sends an ack; the recorder records m1. [Figure: recorder, p and q; states Sp2, Sq2.]

Slide 63: Determining Global State. The recorder can construct the global state from the checkpointed states of all processes plus the messages received since the last checkpoint.

Slide 64: Problems. Publishing keeps track of all messages received by each process, which is expensive. Solution: the recorder takes a checkpoint of process p at time t and deletes all messages received by p before t.
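
The recorder's bookkeeping can be sketched in a few lines of Python. This is an assumption-laden toy, not the mechanism from the Publishing paper: processes are assumed to be deterministic functions of their checkpointed state and the messages they receive, and the names Recorder, publish, take_checkpoint, recover and the apply callback are invented for the example.

```python
class Recorder:
    """Toy central recorder: one checkpoint and one message log per process."""

    def __init__(self):
        self.checkpoint = {}  # process -> last checkpointed state
        self.log = {}         # process -> messages received since that checkpoint

    def publish(self, receiver, message):
        """Record a message that `receiver` has received."""
        self.log.setdefault(receiver, []).append(message)

    def take_checkpoint(self, process, state):
        """Store a new checkpoint and drop the messages received before it."""
        self.checkpoint[process] = state
        self.log[process] = []

    def recover(self, process, apply):
        """Rebuild the state: reinstate the checkpoint, then replay the log."""
        state = self.checkpoint[process]
        for message in self.log.get(process, []):
            state = apply(state, message)
        return state


# Hypothetical process p whose state is a running sum of received numbers.
recorder = Recorder()
recorder.take_checkpoint("p", 0)
recorder.publish("p", 5)
recorder.publish("p", 7)
print(recorder.recover("p", lambda s, m: s + m))  # 12

# After a new checkpoint the old messages are deleted, so the log stays short.
recorder.take_checkpoint("p", 12)
print(recorder.log["p"])                          # []
print(recorder.recover("p", lambda s, m: s + m))  # 12
```

Truncating the log at each checkpoint bounds the storage cost per process, but every received message still passes through the single recorder.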

Slide 65: p checkpoints. [Figure: recorder, p and q; states Sp2, Sq2.]

Slide 66: The recorder stores Sp2 and deletes m1. [Figure: recorder, p and q; states Sp2, Sq2.]

Slide 67: The initial situation. [Figure: recorder, p and q; states Sp2, Sq2.]

Slide 68: Say p crashes. [Figure: recorder, p and q; state Sq2.]

Slide 69: The recorder reinstates p to Sp1. [Figure: recorder, p and q; states Sp1, Sq2.]

Slide 70: The recorder replays m1. [Figure: recorder, p and q; states Sp2, Sq2; message m1.]

Slide 71: q crashes. [Figure: recorder, p and q; state Sp2.]

Slide 72: The recorder reinstates q to Sq1. [Figure: recorder, p and q; states Sp2, Sq1.]

Slide 73: Ignore m1. [Figure: recorder, p and q; states Sp2, Sq1; message m1.]

Slide 74: Comparison.

Slide 75: Summary. Global state detection is difficult in distributed systems. The snapshot algorithm may not give an actual state, but it is very helpful in detecting stable properties. Publishing gives an asynchronous way of determining global states, but it does not scale.