Global state and snapshot

Global state and snapshot
- A distributed system consists of processes that communicate asynchronously with each other through message passing.
- Each process has a local state and a state for each of its communication channels.
- The local state of a process is the state of the process's local memory.
- The state of a communication channel is the set of messages sent over the channel less the messages that have been received along the channel.
- The global state of a distributed system is the collection of the local states of all the processes and the states of all their communication channels.
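The definition of a channel state as "messages sent less messages received" can be sketched in a few lines of Python. This is only an illustration: the function names `channel_state` and `global_state`, the log dictionaries, and the use of a multiset (`Counter`) so that identical messages are counted separately are all assumptions, not part of the slides.

```python
from collections import Counter

def channel_state(sent, received):
    """Messages sent on a channel that have not yet been received."""
    # Counter subtraction keeps only the messages with a positive count.
    return Counter(sent) - Counter(received)

def global_state(local_states, sent_log, received_log):
    """Combine the local states with the state of every channel.

    local_states: {process: state}
    sent_log, received_log: {(src, dst): [messages]}
    """
    channels = {
        ch: channel_state(msgs, received_log.get(ch, []))
        for ch, msgs in sent_log.items()
    }
    return {"processes": dict(local_states), "channels": channels}

# Hypothetical two-process system: p1 has sent one message p2 has not received.
state = global_state(
    local_states={"p1": {"balance": 50}, "p2": {"balance": 100}},
    sent_log={("p1", "p2"): ["transfer 50"], ("p2", "p1"): []},
    received_log={("p1", "p2"): []},
)
in_flight = state["channels"][("p1", "p2")]
```

Here `in_flight` holds the single message still in transit from p1 to p2, while the channel in the other direction is empty.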

Snapshot
- Recording a snapshot of a distributed system is important for many applications:
  - Failure recovery
  - Debugging a distributed system
- Because there is no global clock, obtaining a consistent global snapshot of a distributed system is not trivial.
- Example: transferring money between two accounts located at two different sites.
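The money-transfer example can be made concrete with a little arithmetic. The balances, the transfer amount, and the order in which the two sites record their states are made up for illustration; the point is only that recording the two accounts at different moments, while ignoring the channel, can lose money that is still in transit.

```python
# Two accounts at different sites; the true total is always 200.
a, b = 100, 100
in_transit = 0

# Site A withdraws 50 and sends it to site B.
a -= 50
in_transit += 50

# Without a global clock the sites record at different moments:
recorded_a = a   # site A records after sending
recorded_b = b   # site B records before the message arrives

# Site B eventually receives the transfer.
b += in_transit
in_transit = 0

naive_total = recorded_a + recorded_b   # ignores the message in the channel
actual_total = a + b + in_transit
```

The naive snapshot adds up to 150 although 200 is in the system at every instant. Recording the channel state as well (the 50 in transit) would restore the correct total.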

Consistent global state
- A global state of a distributed system is consistent if the two conditions below are satisfied:
  - C1: If the sending of a message is recorded in the state of the sender, then the message must be recorded either in the state of the channel or as received in the local state of the receiver.
  - C2: A message whose sending has not been recorded by the sender must not be recorded by the receiver, neither in its local state nor in the channel state.
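Conditions C1 and C2 can be written as a small per-message check. This is a sketch under stated assumptions: the `MessageRecord` type and its three flags are hypothetical, and C1 is read as "exactly one of channel state or receiver state", which avoids counting a message twice.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MessageRecord:
    """How one message appears in a recorded global state (hypothetical type)."""
    send_recorded: bool      # the send is part of the sender's recorded state
    receive_recorded: bool   # the receipt is part of the receiver's recorded state
    in_channel_state: bool   # the message is part of the recorded channel state

def satisfies_c1(m: MessageRecord) -> bool:
    # A recorded send must show up exactly once on the receiving side.
    if not m.send_recorded:
        return True
    return m.receive_recorded != m.in_channel_state

def satisfies_c2(m: MessageRecord) -> bool:
    # An unrecorded send must not show up on the receiving side at all.
    if m.send_recorded:
        return True
    return not (m.receive_recorded or m.in_channel_state)

def is_consistent(records) -> bool:
    return all(satisfies_c1(m) and satisfies_c2(m) for m in records)

in_transit = MessageRecord(send_recorded=True, receive_recorded=False, in_channel_state=True)
lost = MessageRecord(send_recorded=True, receive_recorded=False, in_channel_state=False)
orphan = MessageRecord(send_recorded=False, receive_recorded=True, in_channel_state=False)
```

A state containing only `in_transit` passes the check. `lost` violates C1 because a recorded send is recorded nowhere on the receiving side, and `orphan` violates C2 because a receipt is recorded without its send.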

Issues in recording a global state
- How to distinguish the messages to be recorded in the snapshot from those not to be recorded:
  - Any message sent by a process before recording its snapshot must be recorded in the global snapshot (from C1).
  - Any message sent by a process after recording its snapshot must not be recorded in the global snapshot (from C2).
- How to determine the instant when a process takes its snapshot:
  - A process pj must record its snapshot before processing a message mij that was sent by process pi after pi recorded its snapshot.

Snapshot algorithm for FIFO channels
- The Chandy-Lamport algorithm uses a control message, called a marker, to separate messages in the channels.
- After a site has recorded its snapshot, it sends a marker along all of its outgoing channels before sending out any more messages.
- A marker separates the messages in the channel into those to be included in the snapshot and those not to be recorded in the snapshot.
- A process must record its snapshot no later than when it receives a marker on any of its incoming channels.

- The algorithm can be initiated by any process by executing the "Marker Sending Rule", by which it records its local state and sends a marker on each outgoing channel.
- A process executes the "Marker Receiving Rule" on receiving a marker. If the process has not yet recorded its local state, it records the state of the channel on which the marker is received as empty and executes the "Marker Sending Rule" to record its local state. Otherwise, it records the state of that channel as the messages received on it after the local state was recorded and before the marker arrived.
- The algorithm terminates after each process has received a marker on all of its incoming channels.
- All the local snapshots are then disseminated to all other processes, so that every process can determine the global state.

Marker Sending Rule for process i
  1. Process i records its state.
  2. For each outgoing channel C on which a marker has not been sent, i sends a marker along C.

Marker Receiving Rule for process j
  On receiving a marker along channel C:
    if j has not recorded its state then
      Record the state of C as the empty set
      Follow the "Marker Sending Rule"
    else
      Record the state of C as the set of messages received along C after j's state was recorded and before j received the marker along C
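The two marker rules can be tried out in a small single-threaded simulation. This is a minimal sketch, not the authors' implementation: the `Network` and `Process` classes, the bank-balance application, the fully connected topology, and the round-robin delivery order in `drain` are all assumptions made for illustration. Each channel is a FIFO queue, as the algorithm requires.

```python
from collections import deque

MARKER = object()  # hypothetical sentinel standing in for the marker message

class Process:
    def __init__(self, balance):
        self.balance = balance
        self.recorded_state = None   # local state captured by the snapshot
        self.channel_state = {}      # sender name -> messages recorded for that channel
        self.recording = set()       # incoming channels still being recorded

class Network:
    def __init__(self, balances):
        self.procs = {n: Process(b) for n, b in balances.items()}
        names = list(balances)
        # One FIFO channel per ordered pair of distinct processes.
        self.channels = {(s, d): deque() for s in names for d in names if s != d}

    def send(self, src, dst, amount):
        self.procs[src].balance -= amount
        self.channels[(src, dst)].append(amount)

    def record_and_send_markers(self, name):
        """Marker Sending Rule: record local state, then send markers."""
        p = self.procs[name]
        p.recorded_state = p.balance
        p.recording = {s for (s, d) in self.channels if d == name}
        for (s, d), queue in self.channels.items():
            if s == name:
                queue.append(MARKER)

    def deliver(self, src, dst):
        """Deliver the oldest message on channel src -> dst."""
        msg = self.channels[(src, dst)].popleft()
        p = self.procs[dst]
        if msg is MARKER:
            # Marker Receiving Rule.
            if p.recorded_state is None:
                self.record_and_send_markers(dst)
            # Empty if nothing was recorded on this channel before the marker.
            p.channel_state.setdefault(src, [])
            p.recording.discard(src)
        else:
            p.balance += msg
            if src in p.recording:
                # Received after recording and before the marker: in transit.
                p.channel_state.setdefault(src, []).append(msg)

    def drain(self):
        """Deliver everything still in the channels, round-robin."""
        while any(self.channels.values()):
            for (s, d), queue in self.channels.items():
                if queue:
                    self.deliver(s, d)

    def snapshot_total(self):
        return sum(
            p.recorded_state + sum(sum(m) for m in p.channel_state.values())
            for p in self.procs.values()
        )

net = Network({"p1": 100, "p2": 100, "p3": 100})
net.send("p1", "p2", 30)            # sent before p1 records: left of the marker
net.record_and_send_markers("p1")   # p1 initiates the snapshot
net.send("p2", "p1", 20)            # sent before p2 records, still in transit
net.drain()
total = net.snapshot_total()
```

In this run the recorded local states and channel states add up to the 300 units that exist in the system, and the transfer of 20 from p2 to p1 is captured as the state of that channel rather than being lost.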

Correctness
- Due to the FIFO property of channels, no message sent after the marker on a channel is recorded in that channel's state. Thus, condition C2 is satisfied.
- When a process pj receives a message mij that precedes the marker on channel Cij, it acts as follows:
  - If pj has not taken its snapshot yet, it processes mij, so the receipt of mij is included in the local state it later records.
  - Otherwise, it records mij in the state of the channel Cij.
- Thus, condition C1 is satisfied.

Reviews
- Why is obtaining a consistent global snapshot difficult in a distributed system? Use an example to support your argument.
- What conditions must be satisfied for a global state to be consistent?
- Modify the Chandy-Lamport algorithm so that all the channel states are recorded as empty.
- Modify the Chandy-Lamport algorithm so that it handles the case in which several sites initiate the snapshot algorithm simultaneously. Minimize the number of messages exchanged as much as possible.

Readings
- K.M. Chandy and L. Lamport, Distributed snapshots: determining global states of distributed systems, ACM Transactions on Computer Systems, 3(1), 1985, 63-75.
- M. Spezialetti and P. Kearns, Efficient distributed snapshots, Proceedings of the 6th Intl. Conf. on Distributed Computing Systems, 1986, 382-388.