CS4231 Parallel and Distributed Algorithms AY 2006/2007 Semester 2 Lecture 5 Instructor: Haifeng YU.

Slides:



Advertisements
Similar presentations
Global States.
Advertisements

Last Class: Clock Synchronization
Scalable Algorithms for Global Snapshots in Distributed Systems
CS4231 Parallel and Distributed Algorithms AY 2006/2007 Semester 2 Lecture 6 Instructor: Haifeng YU.
CS4231 Parallel and Distributed Algorithms AY 2006/2007 Semester 2 Lecture 4 Instructor: Haifeng YU.
CSE 486/586, Spring 2014 CSE 486/586 Distributed Systems Reliable Multicast Steve Ko Computer Sciences and Engineering University at Buffalo.
Parallel and Distributed Simulation Global Virtual Time - Part 2.
Uncoordinated Checkpointing The Global State Recording Algorithm.
Uncoordinated Checkpointing The Global State Recording Algorithm Cristian Solano.
Synchronization Chapter clock synchronization * 5.2 logical clocks * 5.3 global state * 5.4 election algorithm * 5.5 mutual exclusion * 5.6 distributed.
1 Causality. 2 The “happens before” relation happens before (causes)
CS4231 Parallel and Distributed Algorithms AY 2006/2007 Semester 2 Lecture 7 Instructor: Haifeng YU.
Logical Time Each event is assigned a logical time from a totally ordered set T The logical times for the events must respect any possible dependencies.
Dr. Kalpakis CMSC 621, Advanced Operating Systems. Logical Clocks and Global State.
CS514: Intermediate Course in Operating Systems Professor Ken Birman Vivek Vishnumurthy: TA.
CS 582 / CMPE 481 Distributed Systems
Ordering and Consistent Cuts Presented By Biswanath Panda.
CMPT 431 Dr. Alexandra Fedorova Lecture VIII: Time And Global Clocks.
Distributed Systems Fall 2009 Logical time, global states, and debugging.
CPSC 668Set 12: Causality1 CPSC 668 Distributed Algorithms and Systems Fall 2009 Prof. Jennifer Welch.
Lecture 13 Synchronization (cont). EECE 411: Design of Distributed Software Applications Logistics Last quiz Max: 69 / Median: 52 / Min: 24 In a box outside.
EEC-681/781 Distributed Computing Systems Lecture 11 Wenbing Zhao Cleveland State University.
Cloud Computing Concepts
Lecture 12 Synchronization. EECE 411: Design of Distributed Software Applications Summary so far … A distributed system is: a collection of independent.
Computer Science Lecture 10, page 1 CS677: Distributed OS Last Class: Clock Synchronization Physical clocks Clock synchronization algorithms –Cristian’s.
Dr. Kalpakis CMSC 621, Advanced Operating Systems. Fall 2003 URL: Logical Clocks and Global State.
Chapter 17 Theoretical Issues in Distributed Systems
Lecture 3-1 Computer Science 425 Distributed Systems Lecture 3 Logical Clock and Global States/ Snapshots Reading: Chapter 11.4&11.5 Klara Nahrstedt.
Chapter 5.
CIS 720 Distributed algorithms. “Paint on the forehead” problem Each of you can see other’s forehead but not your own. I announce “some of you have paint.
1 Distributed Systems CS 425 / CSE 424 / ECE 428 Global Snapshots Reading: Sections 11.5 (4 th ed), 14.5 (5 th ed)  2010, I. Gupta, K. Nahrtstedt, S.
TreadMarks Distributed Shared Memory on Standard Workstations and Operating Systems Pete Keleher, Alan Cox, Sandhya Dwarkadas, Willy Zwaenepoel.
CS4231 Parallel and Distributed Algorithms AY 2006/2007 Semester 2 Lecture 10 Instructor: Haifeng YU.
CS4231 Parallel and Distributed Algorithms AY 2006/2007 Semester 2 Lecture 3 (26/01/2006) Instructor: Haifeng YU.
Chapter 9 Global Snapshot. Global state  A set of local states that are concurrent with each other Concurrent states: no two states have a happened before.
Issues with Clocks. Context The tree correction protocol was based on the idea of local detection and correction. Protocols of this type are complex to.
Lecture 6-1 Computer Science 425 Distributed Systems CS 425 / ECE 428 Fall 2013 Indranil Gupta (Indy) September 12, 2013 Lecture 6 Global Snapshots Reading:
Distributed Snapshot. Think about these -- How many messages are in transit on the internet? --What is the global state of a distributed system of N processes?
CS4231 Parallel and Distributed Algorithms AY 2006/2007 Semester 2 Lecture 8 Instructor: Haifeng YU.
Logical Clocks. Topics Logical clocks Totally-Ordered Multicasting Vector timestamps.
Lamport’s Logical Clocks & Totally Ordered Multicasting.
“Virtual Time and Global States of Distributed Systems”
Communication & Synchronization Why do processes communicate in DS? –To exchange messages –To synchronize processes Why do processes synchronize in DS?
Distributed Systems Fall 2010 Logical time, global states, and debugging.
CSE 486/586, Spring 2013 CSE 486/586 Distributed Systems Global States Steve Ko Computer Sciences and Engineering University at Buffalo.
Distributed Snapshot. One-dollar bank Let a $1 coin circulate in a network of a million banks. How can someone count the total $ in circulation? If not.
Hwajung Lee. -- How many messages are in transit on the internet? --What is the global state of a distributed system of N processes? How do we compute.
Feb 15, 2001CSCI {4,6}900: Ubiquitous Computing1 Announcements.
Logical Clocks. Topics r Logical clocks r Totally-Ordered Multicasting.
Ordering of Events in Distributed Systems UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department CS 739 Distributed Systems Andrea C. Arpaci-Dusseau.
CS4231 Parallel and Distributed Algorithms AY 2006/2007 Semester 2 Lecture 9 Instructor: Haifeng YU.
CSE 486/586 CSE 486/586 Distributed Systems Global States Steve Ko Computer Sciences and Engineering University at Buffalo.
Lecture 4-1 Computer Science 425 Distributed Systems (Fall2009) Lecture 4 Chandy-Lamport Snapshot Algorithm and Multicast Communication Reading: Section.
1 Chapter 11 Global Properties (Distributed Termination)
Hwajung Lee. -- How many messages are in transit on the internet? --What is the global state of a distributed system of N processes? How do we compute.
Efficient Algorithms for Distributed Snapshots and Global Virtual Time Approximation Author: Friedermann Mattern Presented By: Shruthi Koundinya.
Distributed Systems Lecture 6 Global states and snapshots 1.
CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS
Global State Recording
Lecture 3: State, Detection
CSE 486/586 Distributed Systems Global States
Distributed Snapshots & Termination detection
Global State Recording
湖南大学-信息科学与工程学院-计算机与科学系
Time And Global Clocks CMPT 431.
Non-Distributed Excercises
Chapter 5 (through section 5.4)
Chap 5 Distributed Coordination
CSE 486/586 Distributed Systems Global States
CIS825 Lecture 5 1.
Presentation transcript:

CS4231 Parallel and Distributed Algorithms AY 2006/2007 Semester 2 Lecture 5 Instructor: Haifeng YU

CS4231 Parallel and Distributed Algorithms AY2006/2007 Semester 22 Review of Last Lecture  Chapter 7 “Models and Clocks”  Goal: Define “time” in a distributed system  Logical clock  “happened before”  smaller clock value  Vector clock  “happened before”  smaller clock value  Matrix clock  Gives a process knowledge about what other processes know

CS4231 Parallel and Distributed Algorithms AY2006/2007 Semester 23 Today’s Roadmap  Chapter 9 “Global Snapshot”  What is our goal?  Formalizing consistent global snapshot  Protocol for capturing a consistent global snapshot

CS4231 Parallel and Distributed Algorithms AY2006/2007 Semester 24 Motivation  Goal: Take a snapshot of the global computation  A snapshot of local states on n processes taken at exactly the same time  Note: The book uses two terms “global state” and “global snapshot” – we will stick to “global snapshot” and avoid the other term  Useful for debugging  Useful for backup/check-pointing  Useful for calculating global predicate  E.g., Exactly how much currency do we have in the country (notice that money flows among people constantly)?

CS4231 Parallel and Distributed Algorithms AY2006/2007 Semester 25 Motivation  If each process have access to completely accurate physical time – then trivial  All processes record local state at 1:00:00pm  We always get $200 total regardless when we take the snapshot user1 starts with $100 user2 starts with $100 give $50 $100 $50 $100 $150

CS4231 Parallel and Distributed Algorithms AY2006/2007 Semester 26 Motivation  But we don’t have completely accurate physical time  user1 and user2 may record states in slightly different time  Inaccuracy needs to be below minimum message propagation delay  Impossible (due to theory of relativity)! user1 starts with $100 user2 starts with $100 give $50 $100 $50 $100 $150

CS4231 Parallel and Distributed Algorithms AY2006/2007 Semester 27 Weaken Our Goal  Goal: Take a snapshot of the global computation  A snapshot of local states on n processes taken at exactly the same time  A snapshot of local states on n processes such that the global snapshot could have happened sometime in the past  “In the past”  We don’t know when it occurred  “Could have happened” – User cannot tell the difference  But in reality, may not have happened  Despite all these compromises, still useful (e.g., debugging, backup, computing global predicates)

CS4231 Parallel and Distributed Algorithms AY2006/2007 Semester 28 Review of “Happened-Before” Relation  A partial order among events (i.e., local computation, send, receive)  Process-order, send-receive order, transitivity user1 (process1) user2 (process2) user3 (process3) A B C D

CS4231 Parallel and Distributed Algorithms AY2006/2007 Semester 29 Consistent Snapshot  Consistent snapshot:  A snapshot of local states on n processes such that the global snapshot could have happened sometime in the past  The green cut could have happened in the past  The red cut could never happen in the past user1 (process1) user2 (process2) user3 (process3)

CS4231 Parallel and Distributed Algorithms AY2006/2007 Semester 210 Formalizing Consistent Snapshot  Global snapshot:  A set of events such that if e2 is in the set and e1 is before e2 in process order, then e1 must be in the set  Consistent global snapshot:  A global snapshot such that if e2 is in the set and e1 is before e2 in send-receive order, then e1 must be in the set  Do we need to include transitive relations as in “Happened-before”? user1 (process1) user2 (process2) user3 (process3)

CS4231 Parallel and Distributed Algorithms AY2006/2007 Semester 211 Existence of Consistent Global Snapshot  Consider the ordered events e_1, e_2, … on any given process  Theorem: For any m, there exists a consistent global snapshot S such that e_i is in S for i  m, and e_i is not in S for i > m

CS4231 Parallel and Distributed Algorithms AY2006/2007 Semester 212 Capturing a Consistent Global Snapshot  Communication model  No message loss  Communication channels are unidirectional  FIFO delivery on each channel  Ensuring FIFO:  Each process maintains a message number counter for each channel and stamps each message sent  Receiver will only deliver messages in order

CS4231 Parallel and Distributed Algorithms AY2006/2007 Semester 213 Protocol Intuition process1 process2  process 1 takes a snapshot here process 2 takes a snapshot, then the cut is consistent  

CS4231 Parallel and Distributed Algorithms AY2006/2007 Semester 214 Protocol Intuition process1 process2  process 1 takes a snapshot here process 2 takes a snapshot, then the cut is not consistent  process 2 takes a snapshot, then the cut is consistent  The key to make a consistent global snapshot, is to avoid this scenario

CS4231 Parallel and Distributed Algorithms AY2006/2007 Semester 215 Protocol Intuition process1 process2  process 1 takes a snapshot here process 2 takes a snapshot, then the cut is not consistent  Require process 2 to take a snapshot before M is delivered application-level message M special message used for the snapshot protocol 

CS4231 Parallel and Distributed Algorithms AY2006/2007 Semester 216 Chandy&Lamport’s Protocol: Taking local snapshots  Each process is either  Red (it has taken the local snapshot), OR  White (it has not taken the local snapshot)  Protocol initiated by a single process by turning itself from White to Red  Once a process turns to Red, it immediately send out Marker messages to all other processes  Upon receiving Marker, a process turns Red

CS4231 Parallel and Distributed Algorithms AY2006/2007 Semester 217 Chandy&Lamport’s Protocol: Taking snapshots for messages on the fly process1 process2   Note: Figure 9.2 on the textbook cannot really happen with FIFO channel – ignore the figure please.  Such application message is impossible

CS4231 Parallel and Distributed Algorithms AY2006/2007 Semester 218 process1 process2   process2 records all messages received in this window – These are the exact set of messages that are only the “fly” Chandy&Lamport’s Protocol: Taking snapshots for messages on the fly

CS4231 Parallel and Distributed Algorithms AY2006/2007 Semester 219 Summary  Chapter 9 “Global Snapshot”  What is our goal?  Formalizing consistent global snapshot  Protocol for capturing a consistent global snapshot

CS4231 Parallel and Distributed Algorithms AY2006/2007 Semester 220 Homework Assignment  Page 161  Problem 9.5 (assuming the you are given the set of ALL events) – Prove that the global snapshot you get is a consistent one  Prove the following  If G and H are both consistent global snapshot, then G  H and G  H are also consistent global snapshots  Homework due a week from today  Read Chapter 12