Distributed Snapshots & Termination detection

Distributed Snapshots & Termination detection. Presented by Subashini Balachandran.

What is a snapshot? A snapshot of a distributed system is a global state in which the local states of all processes and of all communication channels are recorded "simultaneously". Obtaining such a causally consistent state is difficult in a distributed system that has no common clock.

Where is a snapshot used? Detecting deadlock in a distributed system; computing monotonic functions of the global state, such as lower bounds on the simulation time; checkpointing and recovery of distributed databases; monitoring and debugging of distributed systems.

Consistent and inconsistent cuts: A cut is consistent if no message arrow starts in the future and ends in the past (e.g., AB). Otherwise it is inconsistent (e.g., CD). [Figure: cuts AB and CD, with labels 1 2 3 4]
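The consistency condition can be checked mechanically. The following is a minimal sketch, assuming each message is described by the positions of its send and receive events on the two process timelines; the names `Message` and `is_consistent` are hypothetical and do not come from the slides.

```python
from typing import Dict, List, NamedTuple


class Message(NamedTuple):
    sender: str
    send_idx: int   # position of the send event on the sender's timeline
    receiver: str
    recv_idx: int   # position of the receive event on the receiver's timeline


def is_consistent(cut: Dict[str, int], messages: List[Message]) -> bool:
    """cut[p] is the number of events of process p that lie in the past of the cut."""
    for m in messages:
        sent_in_past = m.send_idx < cut[m.sender]
        received_in_past = m.recv_idx < cut[m.receiver]
        # A message received in the past but sent in the future breaks the cut.
        if received_in_past and not sent_in_past:
            return False
    return True


msgs = [Message("P1", 1, "P2", 2)]
print(is_consistent({"P1": 2, "P2": 3}, msgs))  # True: send and receive are both in the past
print(is_consistent({"P1": 1, "P2": 3}, msgs))  # False: received in the past, sent in the future
```

A message that is sent in the past and received in the future does not violate consistency; it is simply in transit across the cut.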

Consistent cut algorithm: A consistent cut for non-FIFO systems is obtained by piggybacking a one-bit status onto basic messages. Every process is initially white and turns red when it takes its local snapshot. Every message sent by a white (red) process is colored white (red). Every process takes its local snapshot at its convenience, but before it can possibly receive a red message.
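The coloring rule can be sketched as follows. This is an illustrative model rather than the presenter's implementation: the local state is assumed to be a single integer balance, and the names `Process`, `BasicMessage`, `take_snapshot`, `send` and `receive` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

WHITE, RED = "white", "red"


@dataclass
class BasicMessage:
    amount: int
    color: str  # the piggybacked one-bit status


@dataclass
class Process:
    name: str
    state: int = 0                  # assumed local state: an integer balance
    color: str = WHITE
    snapshot: Optional[int] = None  # local state recorded at the cut

    def take_snapshot(self) -> None:
        # Record the local state once and turn red; later calls do nothing.
        if self.color == WHITE:
            self.snapshot = self.state
            self.color = RED

    def send(self, amount: int) -> BasicMessage:
        self.state -= amount
        return BasicMessage(amount, self.color)  # message inherits the sender's color

    def receive(self, msg: BasicMessage) -> None:
        # A white process must take its snapshot before processing a red message.
        if msg.color == RED:
            self.take_snapshot()
        self.state += msg.amount


p, q = Process("P1", state=100), Process("P2", state=50)
p.take_snapshot()       # P1 records 100 and turns red
q.receive(p.send(10))   # the red message forces P2 to record 50 first
print(p.snapshot, q.snapshot, q.state)  # 100 50 60
```

Because the snapshot is taken before the red message is applied, the recorded state of P2 does not contain the effect of a message sent after the cut.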

Example 1: The snapshot continues until the white phase has ended for every process. [Figure: timelines of P1, P2, P3, P4]

Example 2: The local snapshot is taken before the process looks at or processes the red message. [Figure: timelines of P1, P2, P3, P4]

Consistent cut algorithm (cont.): The cut defined by the white events is consistent, because no red message (sent after the cut) is received by a white process (before the cut). For this to hold, a white process must be able to take a local snapshot at the moment it receives a red basic message.

Catching the messages in transit: The messages in transit are precisely the white messages that are received by red processes. So whenever a red process receives a white message, it can send a copy of it to the snapshot initiator.
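A minimal sketch of this forwarding rule, assuming the initiator is reachable through a direct method call (in a real system this would be a control message). The names `Initiator`, `report` and `Process` are hypothetical.

```python
from typing import List, Tuple

WHITE, RED = "white", "red"


class Initiator:
    """Collects copies of the messages that crossed the cut."""

    def __init__(self) -> None:
        self.in_transit: List[Tuple[str, str]] = []  # (receiver, payload)

    def report(self, receiver: str, payload: str) -> None:
        self.in_transit.append((receiver, payload))


class Process:
    def __init__(self, name: str, initiator: Initiator) -> None:
        self.name = name
        self.color = WHITE
        self.initiator = initiator
        self.delivered: List[str] = []

    def receive(self, payload: str, msg_color: str) -> None:
        if msg_color == RED and self.color == WHITE:
            self.color = RED  # the local snapshot would be taken here
        elif msg_color == WHITE and self.color == RED:
            # White message arriving at a red process: it was in transit at the cut.
            self.initiator.report(self.name, payload)
        self.delivered.append(payload)


init = Initiator()
p2 = Process("P2", init)
p2.receive("a", WHITE)  # white process, white message: belongs to the past
p2.receive("b", RED)    # turns P2 red
p2.receive("c", WHITE)  # white message at a red process: copied to the initiator
print(init.in_transit)  # [('P2', 'c')]
```

The basic message is still delivered normally; only a copy goes to the initiator, so the underlying computation is not disturbed.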

The snapshot principle: Once the snapshot initiator has received the local snapshots of all processes and the last copy of the in-transit messages, it knows that the snapshot is complete. [Figure: timelines of P1, P2, P3, P4 up to "End", showing local snapshots and copies of messages in transit sent to the initiator]

Termination detection: A process is considered active if it is white and passive otherwise. Only white messages are considered. The white computation has then terminated if no process is white and no white messages are in transit. Problem: it cannot be determined when the last white message has been received.

Deficiency counting (TD): Each process keeps a counter as part of its process state: count = (# of basic messages the process has sent) - (# of basic messages it has received from any other process). From the local snapshots, which include these counters, the total number of messages in transit can be determined, and thus the end of the snapshot.
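The arithmetic behind deficiency counting can be sketched as below. The counters are assumed to be the values recorded in each local snapshot, and the function names `messages_in_transit` and `snapshot_complete` are hypothetical.

```python
from typing import Dict


def messages_in_transit(counters: Dict[str, int]) -> int:
    """counters[p] = (# sent) - (# received), as recorded in p's local snapshot."""
    # Every message received before the cut cancels one send, so the sum
    # over all processes leaves exactly the messages that crossed the cut.
    return sum(counters.values())


def snapshot_complete(counters: Dict[str, int], copies_received: int, n_processes: int) -> bool:
    """True once all local snapshots and all in-transit copies have arrived."""
    have_all_snapshots = len(counters) == n_processes
    return have_all_snapshots and copies_received == messages_in_transit(counters)


# P1 sent 2 and received 0; P2 sent 0 and received 1; P3 sent and received nothing.
counters = {"P1": 2, "P2": -1, "P3": 0}
print(messages_in_transit(counters))      # 1
print(snapshot_complete(counters, 0, 3))  # False: one copy is still missing
print(snapshot_complete(counters, 1, 3))  # True
```

The initiator needs only the sum, so each process can report a single integer alongside its local state.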

Vector counter principle (TD): Every process Pi counts the number of white messages it has sent to Pj (i ≠ j) in the j-th component of a local vector Vi of length n (n = number of processes). When Pi receives a white message, it decrements its own component: Vi[i] = Vi[i] - 1. A control vector C circulates on the ring, accumulates the local vectors and resets them to zero: C = C + Vi; Vi = 0.

Vector counter principle (TD, cont.): At the end of the first round, C[i] indicates the number of white messages that are in transit to Pi for the cut. No new white messages are generated after that. A second round is necessary if C[i] > 0 for some i. In the second round the control vector waits at each Pi until all (white) in-transit messages have been received, that is, until Vi[i] + C[i] <= 0. All in-transit messages are collected in this way, which guarantees termination after 2 control rounds.
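The two control rounds can be simulated in a few lines. This is a simplified sketch, assuming processes are indexed from 0 and the ring is visited by a plain loop rather than by real control messages; `Process`, `control_round` and `may_pass` are hypothetical names.

```python
from typing import List


class Process:
    def __init__(self, i: int, n: int) -> None:
        self.i = i
        self.v = [0] * n  # local vector V_i

    def on_send(self, j: int) -> None:
        self.v[j] += 1        # one more white message sent to P_j

    def on_receive(self) -> None:
        self.v[self.i] -= 1   # one white message received by P_i


def may_pass(p: Process, c: List[int]) -> bool:
    """Second-round wait condition at P_i: V_i[i] + C[i] <= 0."""
    return p.v[p.i] + c[p.i] <= 0


def control_round(c: List[int], ring: List[Process]) -> List[int]:
    """One trip of the control vector: C = C + V_i; V_i = 0 at every process."""
    for p in ring:
        for k in range(len(c)):
            c[k] += p.v[k]
        p.v = [0] * len(c)
    return c


ring = [Process(i, 3) for i in range(3)]
ring[0].on_send(1)
ring[1].on_receive()   # P0 -> P1 has been delivered
ring[1].on_send(2)     # P1 -> P2 is still in transit

c = control_round([0, 0, 0], ring)
print(c)                     # [0, 0, 1]: one message in transit to P2
print(may_pass(ring[2], c))  # False: the control vector must wait at P2

ring[2].on_receive()         # the in-transit message arrives
print(may_pass(ring[2], c))  # True
print(control_round(c, ring))  # [0, 0, 0]: nothing left in transit
```

Sends and receives cancel component by component, so an all-zero control vector after a round means that every counted white message has been received.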

Example. [Figure: processes P1, P2, P3, P4 with the accumulated control vector C]

Conclusion: A new algorithm for computing snapshots was presented. The basic idea is to use 2 colors that indicate the process states and so identify the past and the future of the cut. Termination is detected using the vector counter method.

Thank You :-)