Operating Systems, 2011, Danny Hendler & Amnon Meisels 1 Distributed Synchronization: outline  Introduction  Causality and time o Lamport timestamps.

Slides:



Advertisements
Similar presentations
1 Process groups and message ordering If processes belong to groups, certain algorithms can be used that depend on group properties membership create (
Advertisements

Distributed Snapshots: Determining Global States of Distributed Systems - K. Mani Chandy and Leslie Lamport.
Global States.
Lecture 8: Asynchronous Network Algorithms
Time and Global States Part 3 ECEN5053 Software Engineering of Distributed Systems University of Colorado, Boulder.
Synchronization Chapter clock synchronization * 5.2 logical clocks * 5.3 global state * 5.4 election algorithm * 5.5 mutual exclusion * 5.6 distributed.
Page 1 Mutual Exclusion* Distributed Systems *referred to slides by Prof. Paul Krzyzanowski at Rutgers University and Prof. Mary Ellen Weisskopf at University.
Distributed Systems Dinesh Bhat - Advanced Systems (Some slides from 2009 class) CS 6410 – Fall 2010 Time Clocks and Ordering of events Distributed Snapshots.
S NAPSHOT A LGORITHM. W HAT IS A S NAPSHOT - INTUITION Given a system of processors and communication channels between them, we want each processor to.
Operating Systems, 112 Practical Session 14, Distributed synchronization.
Distributed Systems Spring 2009
CS 582 / CMPE 481 Distributed Systems
Causality & Global States. P1 P2 P Physical Time 4 6 Include(obj1 ) obj1.method() P2 has obj1 Causality violation occurs when order.
Ordering and Consistent Cuts Presented By Biswanath Panda.
CMPT 431 Dr. Alexandra Fedorova Lecture VIII: Time And Global Clocks.
Distributed Systems Fall 2009 Logical time, global states, and debugging.
Group Communications Group communication: one source process sending a message to a group of processes: Destination is a group rather than a single process.
CPSC 668Set 12: Causality1 CPSC 668 Distributed Algorithms and Systems Fall 2009 Prof. Jennifer Welch.
20101 Synchronization in distributed systems A collection of independent computers that appears to its users as a single coherent system.
Operating Systems, 2011, Danny Hendler & Amnon Meisels 1 Distributed Mutual Exclusion  Introduction  Ricart and Agrawala's algorithm  Raymond's algorithm.
Lecture 13 Synchronization (cont). EECE 411: Design of Distributed Software Applications Logistics Last quiz Max: 69 / Median: 52 / Min: 24 In a box outside.
Ordering and Consistent Cuts Presented by Chi H. Ho.
EEC-681/781 Distributed Computing Systems Lecture 11 Wenbing Zhao Cleveland State University.
1 Distributed Process Management: Distributed Global States and Distributed Mutual Exclusion.
Lecture 12 Synchronization. EECE 411: Design of Distributed Software Applications Summary so far … A distributed system is: a collection of independent.
Computer Science Lecture 10, page 1 CS677: Distributed OS Last Class: Clock Synchronization Physical clocks Clock synchronization algorithms –Cristian’s.
Time, Clocks, and the Ordering of Events in a Distributed System Leslie Lamport (1978) Presented by: Yoav Kantor.
Chapter 5.
CIS 720 Distributed algorithms. “Paint on the forehead” problem Each of you can see other’s forehead but not your own. I announce “some of you have paint.
Distributed Mutex EE324 Lecture 11.
Logical Clocks n event ordering, happened-before relation (review) n logical clocks conditions n scalar clocks condition implementation limitation n vector.
Page 1 Logical Clocks Paul Krzyzanowski Distributed Systems Except as otherwise noted, the content of this presentation is.
University of Tampere, CS Department Distributed Transaction Management Jyrki Nummenmaa
CS425 /CSE424/ECE428 – Distributed Systems – Fall 2011 Material derived from slides by I. Gupta, M. Harandi, J. Hou, S. Mitra, K. Nahrstedt, N. Vaidya.
Presenter: Long Ma Advisor: Dr. Zhang 4.5 DISTRIBUTED MUTUAL EXCLUSION.
Synchronization. Why we need synchronization? It is important that multiple processes do not access shared resources simultaneously. Synchronization in.
Lamport’s Logical Clocks & Totally Ordered Multicasting.
Communication & Synchronization Why do processes communicate in DS? –To exchange messages –To synchronize processes Why do processes synchronize in DS?
Distributed Systems Fall 2010 Logical time, global states, and debugging.
Real-Time & MultiMedia Lab Synchronization Distributed System Jin-Seung,KIM.
EEC 688/788 Secure and Dependable Computing Lecture 10 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
CIS825 Lecture 2. Model Processors Communication medium.
D u k e S y s t e m s Asynchronous Replicated State Machines (Causal Multicast and All That) Jeff Chase Duke University.
Distributed Process Coordination Presentation 1 - Sept. 14th 2002 CSE Spring 02 Group A4:Chris Sun, Min Fang, Bryan Maden.
Building Dependable Distributed Systems, Copyright Wenbing Zhao
D ISTRIBUTED S YSTEM UNIT-2 Theoretical Foundation for Distributed Systems Prepared By: G.S.Mishra.
Distributed Algorithms Luc J. B. Onana Alima Seif Haridi.
Page 1 Mutual Exclusion & Election Algorithms Paul Krzyzanowski Distributed Systems Except as otherwise noted, the content.
Logical Clocks. Topics r Logical clocks r Totally-Ordered Multicasting.
Event Ordering. CS 5204 – Operating Systems2 Time and Ordering The two critical differences between centralized and distributed systems are: absence of.
Lecture 12-1 Computer Science 425 Distributed Systems CS 425 / CSE 424 / ECE 428 Fall 2012 Indranil Gupta (Indy) October 4, 2012 Lecture 12 Mutual Exclusion.
Ordering of Events in Distributed Systems UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department CS 739 Distributed Systems Andrea C. Arpaci-Dusseau.
Lecture 7- 1 CS 425/ECE 428/CSE424 Distributed Systems (Fall 2009) Lecture 7 Distributed Mutual Exclusion Section 12.2 Klara Nahrstedt.
Lecture 4-1 Computer Science 425 Distributed Systems (Fall2009) Lecture 4 Chandy-Lamport Snapshot Algorithm and Multicast Communication Reading: Section.
1 Chapter 11 Global Properties (Distributed Termination)
COMP 655: Distributed/Operating Systems Summer 2011 Dr. Chunbo Chu Week 6: Synchronyzation 3/5/20161 Distributed Systems - COMP 655.
CIS 825 Review session. P1: Assume that processes are arranged in a ring topology. Consider the following modification of the Lamport’s mutual exclusion.
Mutual Exclusion Algorithms. Topics r Defining mutual exclusion r A centralized approach r A distributed approach r An approach assuming an organization.
Logical Clocks event ordering, happened-before relation (review) logical clocks conditions scalar clocks  condition  implementation  limitation vector.
CSE 486/586, Spring 2014 CSE 486/586 Distributed Systems Logical Time Steve Ko Computer Sciences and Engineering University at Buffalo.
Operating Systems, 2012, Danny Hendler & Roie Zivan 1 Distributed Synchronization: outline  Introduction  Causality and time o Lamport timestamps o Vector.
Distributed Systems Lecture 6 Global states and snapshots 1.
Ordering of Events in Distributed Systems UNIVERSITY of WISCONSIN-MADISON Computer Sciences Department CS 739 Distributed Systems Andrea C. Arpaci-Dusseau.
Practical Session 13 Distributed synchronization
Distributed Mutex EE324 Lecture 11.
Lecture 9: Asynchronous Network Algorithms
湖南大学-信息科学与工程学院-计算机与科学系
Distributed Synchronization: outline
Distributed Mutual Exclusion
Jenhui Chen Office number:
Presentation transcript:

Operating Systems, 2011, Danny Hendler & Amnon Meisels 1 Distributed Synchronization: outline  Introduction  Causality and time o Lamport timestamps o Vector timestamps o Causal communication  Snapshots This presentation is based on the book: “Distributed operating-systems & algorithms” by Randy Chow and Theodore Johnson

2 Operating Systems, 2011, Danny Hendler & Amnon Meisels  A distributed system is a collection of independent computational nodes, communicating over a network, that is abstracted as a single coherent system o Grid computing o Cloud computing (“software as a service”) o Peer-to-peer computing o Sensor networks o …  A distributed operating system allows sharing of resources and coordination of distributed computation in a transparent manner Distributed systems (a), (b) – a distributed system (c) – a multiprocessor

3 Operating Systems, 2011, Danny Hendler & Amnon Meisels  Underlies distributed operating systems and algorithms  Processes communicate via message passing (no shared memory)  Inter-node coordination in distributed systems challenged by o Lack of global state o Lack of global clock o Communication links may fail o Messages may be delayed or lost o Nodes may fail  Distributed synchronization supports correct coordination in distributed systems o May no longer use shared-memory based locks and semaphores Distributed synchronization

4 Operating Systems, 2011, Danny Hendler & Amnon Meisels Distributed Synchronization: outline  Introduction  Causality and time o Lamport timestamps o Vector timestamps o Causal communication  Snapshots

5 Operating Systems, 2011, Danny Hendler & Amnon Meisels  Events o Sending a message o Receiving a message o Timeout, internal interrupt  Processors send control messages to each other o send(destination, action; parameters)  Processes may declare that they are waiting for events: o Wait for A 1, A 2, …., A n A 1 (source; parameters) code to handle A 1.. A n (source; parameters) code to handle A n Distributed computation model

6 Operating Systems, 2011, Danny Hendler & Amnon Meisels  A distributed system has no global state nor global clock  no global order on all events may be determined  Each processor knows total orders on events occurring in it  There is a causal relation between the sending of a message and its receipt Causality and events ordering Lamport's happened-before relation H 1.e 1 < p e 2  e 1 < H e 2 (events within same processor are ordered) 2.e 1 < m e 2  e 1 < H e 2 (each message m is sent before it is received) 3.e 1 < H e 2 AND e 2 < H e 3  e 1 < H e 3 (transitivity) Leslie Lamport (1978): “Time, clocks, and the ordering of events in a distributed system”

7 Operating Systems, 2011, Danny Hendler & Amnon Meisels Causality and events ordering (cont'd) Time P1P1 P2P2 P3P3 e1e1 e2e2 e3e3 e4e4 e5e5 e6e6 e7e7 e8e8 e 1 < H e 7 ? Yes. e 1 < H e 3 ? Yes. e 1 < H e 8 ? Yes. e 5 < H e 7 ? No.

8 Operating Systems, 2011, Danny Hendler & Amnon Meisels 1 Initially my_TS=0 2 Upon event e, 3 if e is the receipt of message m 4 my_TS=max(m.TS, my_TS) 5 my_TS++ 6 e.TS=my_TS 7 If e is the sending of message m 8 m.TS=my_TS Lamport's timestamp algorithm To create a total order, ties are broken by process ID

9 Operating Systems, 2011, Danny Hendler & Amnon Meisels Lamport's timestamps (cont'd) Time P1P1 P2P2 P3P3 e1e1 e2e2 e3e3 e4e4 e5e5 e6e6 e7e7 e8e

10 Operating Systems, 2011, Danny Hendler & Amnon Meisels Distributed Synchronization: outline  Introduction  Causality and time o Lamport timestamps o Vector timestamps o Causal communication  Snapshots

11 Operating Systems, 2011, Danny Hendler & Amnon Meisels  Lamport's timestamps define a total order o e 1 < H e 2  e 1.TS < e 2.TS o However, e 1.TS < e 2.TS  e 1 < H e 2 does not hold, in general. Vector timestamps - motivation Definition: Message m 1 casually precedes message m 2 (written as m 1 < c m 2 ) if s(m 1 ) < H s(m 2 ) (sending m 1 happens before sending m 2 ) Definition: causality violation occurs if m 1 < c m 2 but r(m 2 ) < p r(m 1 ). In other words, m 1 is sent to processor p before m 2 but is received after it.  Lamport's timestamps do not allow to detect (nor prevent) causality violations.

12 Operating Systems, 2011, Danny Hendler & Amnon Meisels Causality violations – an example P1P1 P2P2 P3P3 Migrate O to P 2 Where is O? M1 On p 2 Where is O? I don’t know M2 M3 ERROR! Causality violation between… M 1 and M 3.

13 Operating Systems, 2011, Danny Hendler & Amnon Meisels Vector timestamps 1 Initially my_VT=[0,…,0] 2 Upon event e, 3 if e is the receipt of message m 4 for i=1 to M 5 my_VT[i]=max(m.VT[i], my_VT[i]) 6My_VT[self]++ 7e.VT=my_VT 8if e is the sending of message m 9 m.VT=my_VT For vector timestamps it does hold that: e 1 < VT e 2 e 1 < H e 2 e 1.VT ≤ V e 2.VT if and only if e 1.VT[i] ≤ e 2.VT[i], for every i=1,…,M e 1.VT < V e 2.VT if and only if e 1.VT ≤ V e 2.VT and e 1.VT ≠ e 2.VT

14 Operating Systems, 2011, Danny Hendler & Amnon Meisels Vector timestamps (cont'd) P1P1 P2P2 P3P3 P4P e e.VT=(3,6,4,2)

15 Operating Systems, 2011, Danny Hendler & Amnon Meisels VTs can be used to detect causality violations P1P1 P2P2 P3P3 Migrate O to P 2 Where is O? M1 On p 2 Where is O? I don’t know M2 M3 ERROR! Causality violation between… M 1 and M 3. (1,0,0) (0,0,1) (2,0,1) (3,0,1) (3,0,2) (3,0,3) (3,1,3) (3,2,3) (3,3,3) (3,2,4)

16 Operating Systems, 2011, Danny Hendler & Amnon Meisels Distributed Synchronization: outline  Introduction  Causality and time o Lamport timestamps o Vector timestamps o Causal communication  Snapshots

17 Operating Systems, 2011, Danny Hendler & Amnon Meisels A processor cannot control the order in which it receives messages… But it may control the order in which they are delivered to applications Preventing causality violations Application FIFO ordering Message passing Deliver in FIFO order Assign timestamps Message arrival order x1x1 Deliver x 1 x2x2 Deliver x 2 x4x4 Buffer x 4 x5x5 Buffer x 5 x3x3 Deliever x 3 x 4 x 5 Protocol for FIFO message delivery (as in, e.g., TCP)

18 Operating Systems, 2011, Danny Hendler & Amnon Meisels Preventing causality violations (cont'd)  Senders attach a timestamp to each message  Destination delays the delivery of out-of-order messages  Hold back a message m until we are assured that no message m’ < H m will be delivered o For every other process p, maintain the earliest timestamp of a message m that may be delivered from p o Do not deliver a message if an earlier message may still be delivered from another process

19 Operating Systems, 2011, Danny Hendler & Amnon Meisels Algorithm for preventing causality violations 1earliest[1..M] initially [,, …, ] 2blocked[1…M] initially [ {}, …, {} ] 3Upon the receipt of message m from processor p 4Delivery_list = {} 5If (blocked[p] is empty) 6 earliest[p]=m.timestamp 7Add m to the tail of blocked[p] 8While (  k such that blocked[k] is non-empty AND  i  {1,…,M} (except k and Self) not_earlier(earliest[i], earliest[k]) 9 remove the message at the head of blocked[k], put it in delivery_list 10 if blocked[k] is non-empty 11 earliest[k]  m'.timestamp, where m' is at the head of blocked[k] 12 else 13 increment the k'th element of earliest[k] 14End While 15Deliver the messages in delivery_list, in causal order For each processor p, the earliest timestamp with which a message from p may still be delivered

20 Operating Systems, 2011, Danny Hendler & Amnon Meisels Algorithm for preventing causality violations 1earliest[1..M] initially [,, …, ] 2blocked[1…M] initially [ {}, …, {} ] 3Upon the receipt of message m from processor p 4Delivery_list = {} 5If (blocked[p] is empty) 6 earliest[p]=m.timestamp 7Add m to the tail of blocked[p] 8While (  k such that blocked[k] is non-empty AND  i  {1,…,M} (except k and Self) not_earlier(earliest[i], earliest[k]) 9 remove the message at the head of blocked[k], put it in delivery_list 10 if blocked[k] is non-empty 11 earliest[k]  m'.timestamp, where m' is at the head of blocked[k] 12 else 13 increment the k'th element of earliest[k] 14End While 15Deliver the messages in delivery_list, in causal order For each process p, the messages from p that were received but were not delivered yet

21 Operating Systems, 2011, Danny Hendler & Amnon Meisels Algorithm for preventing causality violations 1earliest[1..M] initially [,, …, ] 2blocked[1…M] initially [ {}, …, {} ] 3Upon the receipt of message m from processor p 4Delivery_list = {} 5If (blocked[p] is empty) 6 earliest[p]=m.timestamp 7Add m to the tail of blocked[p] 8While (  k such that blocked[k] is non-empty AND  i  {1,…,M} (except k and Self) not_earlier(earliest[i], earliest[k]) 9 remove the message at the head of blocked[k], put it in delivery_list 10 if blocked[k] is non-empty 11 earliest[k]  m'.timestamp, where m' is at the head of blocked[k] 12 else 13 increment the k'th element of earliest[k] 14End While 15Deliver the messages in delivery_list, in causal order List of messages to be delivered as a result of m’s receipt

22 Operating Systems, 2011, Danny Hendler & Amnon Meisels Algorithm for preventing causality violations 1earliest[1..M] initially [,, …, ] 2blocked[1…M] initially [ {}, …, {} ] 3Upon the receipt of message m from processor p 4Delivery_list = {} 5If (blocked[p] is empty) 6 earliest[p]=m.timestamp 7Add m to the tail of blocked[p] 8While (  k such that blocked[k] is non-empty AND  i  {1,…,M} (except k and Self) not_earlier(earliest[i], earliest[k]) 9 remove the message at the head of blocked[k], put it in delivery_list 10 if blocked[k] is non-empty 11 earliest[k]  m'.timestamp, where m' is at the head of blocked[k] 12 else 13 increment the k'th element of earliest[k] 14End While 15Deliver the messages in delivery_list, in causal order If no blocked messages from p, update earliest[p]

23 Operating Systems, 2011, Danny Hendler & Amnon Meisels Algorithm for preventing causality violations 1earliest[1..M] initially [,, …, ] 2blocked[1…M] initially [ {}, …, {} ] 3Upon the receipt of message m from processor p 4Delivery_list = {} 5If (blocked[p] is empty) 6 earliest[p]=m.timestamp 7Add m to the tail of blocked[p] 8While (  k such that blocked[k] is non-empty AND  i  {1,…,M} (except k and Self) not_earlier(earliest[i], earliest[k]) 9 remove the message at the head of blocked[k], put it in delivery_list 10 if blocked[k] is non-empty 11 earliest[k]  m'.timestamp, where m' is at the head of blocked[k] 12 else 13 increment the k'th element of earliest[k] 14End While 15Deliver the messages in delivery_list, in causal order m is now the most recent message from p not yet delivered

24 Operating Systems, 2011, Danny Hendler & Amnon Meisels Algorithm for preventing causality violations 1earliest[1..M] initially [,, …, ] 2blocked[1…M] initially [ {}, …, {} ] 3Upon the receipt of message m from processor p 4Delivery_list = {} 5If (blocked[p] is empty) 6 earliest[p]=m.timestamp 7Add m to the tail of blocked[p] 8While (  k such that blocked[k] is non-empty AND  i  {1,…,M} (except k and Self) not_earlier(earliest[i], earliest[k]) 9 remove the message at the head of blocked[k], put it in delivery_list 10 if blocked[k] is non-empty 11 earliest[k]  m'.timestamp, where m' is at the head of blocked[k] 12 else 13 increment the k'th element of earliest[k] 14End While 15Deliver the messages in delivery_list, in causal order If there is a process k with delayed messages for which it is now safe to deliver its earliest message…

25 Operating Systems, 2011, Danny Hendler & Amnon Meisels Algorithm for preventing causality violations 1earliest[1..M] initially [,, …, ] 2blocked[1…M] initially [ {}, …, {} ] 3Upon the receipt of message m from processor p 4Delivery_list = {} 5If (blocked[p] is empty) 6 earliest[p]=m.timestamp 7Add m to the tail of blocked[p] 8While (  k such that blocked[k] is non-empty AND  i  {1,…,M} (except k and Self) not_earlier(earliest[i], earliest[k]) 9 remove the message at the head of blocked[k], put it in delivery_list 10 if blocked[k] is non-empty 11 earliest[k]  m'.timestamp, where m' is at the head of blocked[k] 12 else 13 increment the k'th element of earliest[k] 14End While 15Deliver the messages in delivery_list, in causal order Remove k'th earliest message from blocked queue, make sure it is delivered

26 Operating Systems, 2011, Danny Hendler & Amnon Meisels Algorithm for preventing causality violations 1earliest[1..M] initially [,, …, ] 2blocked[1…M] initially [ {}, …, {} ] 3Upon the receipt of message m from processor p 4Delivery_list = {} 5If (blocked[p] is empty) 6 earliest[p]=m.timestamp 7Add m to the tail of blocked[p] 8While (  k such that blocked[k] is non-empty AND  i  {1,…,M} (except k and Self) not_earlier(earliest[i], earliest[k]) 9 remove the message at the head of blocked[k], put it in delivery_list 10 if blocked[k] is non-empty 11 earliest[k]  m'.timestamp, where m' is at the head of blocked[k] 12 else 13 increment the k'th element of earliest[k] 14End While 15Deliver the messages in delivery_list, in causal order If there are additional blocked messages of k, update earliest[k] to be the timestamp of the earliest such message

27 Operating Systems, 2011, Danny Hendler & Amnon Meisels Algorithm for preventing causality violations 1earliest[1..M] initially [,, …, ] 2blocked[1…M] initially [ {}, …, {} ] 3Upon the receipt of message m from processor p 4Delivery_list = {} 5If (blocked[p] is empty) 6 earliest[p]=m.timestamp 7Add m to the tail of blocked[p] 8While (  k such that blocked[k] is non-empty AND  i  {1,…,M} (except k and Self) not_earlier(earliest[i], earliest[k]) 9 remove the message at the head of blocked[k], put it in delivery_list 10 if blocked[k] is non-empty 11 earliest[k]  m'.timestamp, where m' is at the head of blocked[k] 12 else 13 increment the k'th element of earliest[k] 14End While 15Deliver the messages in delivery_list, in causal order Otherwise the earliest message that may be delivered from k would have previous timestamp with the k'th component incremented

28 Operating Systems, 2011, Danny Hendler & Amnon Meisels Algorithm for preventing causality violations 1earliest[1..M] initially [,, …, ] 2blocked[1…M] initially [ {}, …, {} ] 3Upon the receipt of message m from processor p 4Delivery_list = {} 5If (blocked[p] is empty) 6 earliest[p]=m.timestamp 7Add m to the tail of blocked[p] 8While (  k such that blocked[k] is non-empty AND  i  {1,…,M} (except k and Self) not_earlier(earliest[i], earliest[k]) 9 remove the message at the head of blocked[k], put it in delivery_list 10 if blocked[k] is non-empty 11 earliest[k]  m'.timestamp, where m' is at the head of blocked[k] 12 else 13 increment the k'th element of earliest[k] 14End While 15Deliver the messages in delivery_list, in causal order Finally, deliver set of messages that will not cause causality violation (if there are any).

29 Operating Systems, 2011, Danny Hendler & Amnon Meisels Execution of the algorithm as multicast P1P1 P2P2 P3P3 (0,1,0) Since the algorithm is “interested” only in causal order of sending events, vector timestamp is incremented only upon send events. (1,1,0) p 3 state earliest[1] earliest[2] earliest[3] (1,0,0)(0,1,0)(0,0,1) m1m1 m2m2 (1,1,0) (0,1,0)(0,0,1) Upon receipt of m 2 Upon receipt of m 1 (1,1,0)(0,1,0)(0,0,1) deliver m 1 (1,1,0)(0,2,0)(0,0,1) deliver m 2 (2,1,0)(0,2,0)(0,0,1)

30 Operating Systems, 2011, Danny Hendler & Amnon Meisels  Introduction  Causality and time o Lamport timestamps o Vector timestamps o Causal communication  Snapshots Distributed Synchronization: outline

31 Operating Systems, 2011, Danny Hendler & Amnon Meisels  Assume we would like to implement a distributed deadlock-detector  We record process states to check if there is a waits-for-cycle Snapshots motivation – phantom deadlock P1P1 r1r1 r2r2 P2P2 request OK request OK release request OK p2p2 r1r1 r2r2 p1p1 Actual waits-for graph p2p2 r1r1 r2r2 p1p1 Observed waits-for graph request

 Global system state: o S=(s 1, s 2, …, s M ) – local processor states o The contents L i,j =(m 1, m 2, …, m k ) of each communication channel C i,j (channels assumed to be FIFO) These are messages sent but not yet received  Global state must be consistent o If we observe in state s i that p i received message m from p k, then in observation s k,k must have sent m. o Each L i,j must contain exactly the set of messages sent by p i but not yet received by p j, as reflected by s i, s j. What is a snapshot? Snapshot state much be consistent, one that might have existed during the computation. Observations must be mutually concurrent that is, no observation casually precedes another observation

 Upon joining the algorithm a process records its local state  The process that initiates the snapshot sends snapshot tokens to its neighbors (before sending any other messages) o Neighbors send them to their neighbors – broadcast  Upon receiving a snapshot token: o a process must record its state prior to receiving additional messages o Must then send tokens to all its other neighbors  How shall we record sets L p,q ? o q receives token from p and that is the first time q receives token L p,q ={ } o q receives token from p but q received token before L p,q ={all messages received by q from p since q received token} Snapshot algorithm – informal description

34 Operating Systems, 2011, Danny Hendler & Amnon Meisels Snapshot algorithm – data structures  The algorithm supports multiple ongoing snapshots – one per process  Different snapshots from same process distinguished by version number Per process variables integer my_version initially 0 integer current_snap[1..M] initially [0,…,0] integer tokens_received[1..M] processor_state S[1…M] channel_state [1..M][1..M] The version number of my current snapshot

35 Operating Systems, 2011, Danny Hendler & Amnon Meisels Snapshot algorithm – data structures  The algorithm supports multiple ongoing snapshots – one per process  Different snapshots from same process distinguished by version number Per process variables integer my_version initially 0 integer current_snap[1..M] initially [0,…,0] integer tokens_received[1..M] processor_state S[1…M] channel_state [1..M][1..M] current_snap[r] contains version number of snapshot initiated by processor r

36 Operating Systems, 2011, Danny Hendler & Amnon Meisels Snapshot algorithm – data structures  The algorithm supports multiple ongoing snapshots – one per process  Different snapshots from same process distinguished by version number Per process variables integer my_version initially 0 integer current_snap[1..M] initially [0,…,0] integer tokens_received[1..M] processor_state S[1…M] channel_state [1..M][1..M] tokens_received[r] contains the number of tokens received for the snapshot initiated by processor r

37 Operating Systems, 2011, Danny Hendler & Amnon Meisels Snapshot algorithm – data structures  The algorithm supports multiple ongoing snapshots – one per process  Different snapshots from same process distinguished by version number Per process variables integer my_version initially 0 integer current_snap[1..M] initially [0,…,0] integer tokens_received[1..M] processor_state S[1…M] channel_state [1..M][1..M] processor_state[r] contains the state recorded for the snapshot of processor r

38 Operating Systems, 2011, Danny Hendler & Amnon Meisels Snapshot algorithm – data structures  The algorithm supports multiple ongoing snapshots – one per process  Different snapshots from same process distinguished by version number Per process variables integer my_version initially 0 integer current_snap[1..M] initially [0,…,0] integer tokens_received[1..M] processor_state S[1…M] channel_state [1..M][1..M] channel_state[r][q] records the state of channel from q to current processor for the snapshot initiated by processor r

39 Operating Systems, 2011, Danny Hendler & Amnon Meisels Snapshot algorithm – pseudo-code execute_snapshot() Wait for a snapshot request or a token Snapshot request: 1 my_version++, current_snap[self]=my_version 2 S[self]  my_state 3 for each outgoing channel q, send(q, TOEKEN, my_version) 4 tokens_received[self]=0 TOKEN(q; r,version): 5.If current_snap[r] < version 6. S[r]  my state 7. current_snap[r]=version 8. L[r][q]  empty, send token(r, version) on each outgoing channel 9. tokens_received[r]  1 10.else 11. tokens_received[r] L[r][q]  all messages received from q since first receiving token(r,version) 13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished

40 Operating Systems, 2011, Danny Hendler & Amnon Meisels Snapshot algorithm – pseudo-code execute_snapshot() Wait for a snapshot request or a token Snapshot request: 1 my_version++, current_snap[self]=my_version 2 S[self]  my_state 3 for each outgoing channel q, send(q, TOEKEN, my_version) 4 tokens_received[self]=0 TOKEN(q; r,version): 5.If current_snap[r] < version 6. S[r]  my state 7. current_snap[r]=version 8. L[r][q]  empty, send token(r, version) on each outgoing channel 9. tokens_received[r]  1 10.else 11. tokens_received[r] L[r][q]  all messages received from q since first receiving token(r,version) 13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished

41 Operating Systems, 2011, Danny Hendler & Amnon Meisels Snapshot algorithm – pseudo-code execute_snapshot() Wait for a snapshot request or a token Snapshot request: 1 my_version++, current_snap[self]=my_version 2 S[self]  my_state 3 for each outgoing channel q, send(q, TOEKEN, my_version) 4 tokens_received[self]=0 TOKEN(q; r,version): 5.If current_snap[r] < version 6. S[r]  my state 7. current_snap[r]=version 8. L[r][q]  empty, send token(r, version) on each outgoing channel 9. tokens_received[r]  1 10.else 11. tokens_received[r] L[r][q]  all messages received from q since first receiving token(r,version) 13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished Increment version number of this snapshot, record my local state

42 Operating Systems, 2011, Danny Hendler & Amnon Meisels Snapshot algorithm – pseudo-code execute_snapshot() Wait for a snapshot request or a token Snapshot request: 1 my_version++, current_snap[self]=my_version 2 S[self]  my_state 3 for each outgoing channel q, send(q, TOEKEN; self, my_version) 4 tokens_received[self]=0 TOKEN(q; r,version): 5.If current_snap[r] < version 6. S[r]  my state 7. current_snap[r]=version 8. L[r][q]  empty, send token(r, version) on each outgoing channel 9. tokens_received[r]  1 10.else 11. tokens_received[r] L[r][q]  all messages received from q since first receiving token(r,version) 13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished Send snapshot-token on all outgoing channels, initialize number of received tokens for my snapshot to 0

43 Operating Systems, 2011, Danny Hendler & Amnon Meisels Snapshot algorithm – pseudo-code execute_snapshot() Wait for a snapshot request or a token Snapshot request: 1 my_version++, current_snap[self]=my_version 2 S[self]  my_state 3 for each outgoing channel q, send(q, TOEKEN; self, my_version) 4 tokens_received[self]=0 TOKEN(q; r,version): 5.If current_snap[r] < version 6. S[r]  my state 7. current_snap[r]=version 8. L[r][q]  empty, send token(r, version) on each outgoing channel 9. tokens_received[r]  1 10.else 11. tokens_received[r] L[r][q]  all messages received from q since first receiving token(r,version) 13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished Upon receipt from q of TOKEN for snapshot (r,version)

44 Operating Systems, 2011, Danny Hendler & Amnon Meisels Snapshot algorithm – pseudo-code execute_snapshot() Wait for a snapshot request or a token Snapshot request: 1 my_version++, current_snap[self]=my_version 2 S[self]  my_state 3 for each outgoing channel q, send(q, TOEKEN; self, my_version) 4 tokens_received[self]=0 TOKEN(q; r,version): 5.If current_snap[r] < version 6. S[r]  my state 7. current_snap[r]=version 8. L[r][q]  empty, send token(r, version) on each outgoing channel 9. tokens_received[r]  1 10.else 11. tokens_received[r] L[r][q]  all messages received from q since first receiving token(r,version) 13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished If this is first token for snapshot (r,version)

45 Operating Systems, 2011, Danny Hendler & Amnon Meisels Snapshot algorithm – pseudo-code execute_snapshot() Wait for a snapshot request or a token Snapshot request: 1 my_version++, current_snap[self]=my_version 2 S[self]  my_state 3 for each outgoing channel q, send(q, TOEKEN; self, my_version) 4 tokens_received[self]=0 TOKEN(q; r,version): 5.If current_snap[r] < version 6. S[r]  my state 7. current_snap[r]=version 8. L[r][q]  empty, send token(r, version) on each outgoing channel 9. tokens_received[r]  1 10.else 11. tokens_received[r] L[r][q]  all messages received from q since first receiving token(r,version) 13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished Record local state for r'th snapshot Update version number of r'th snapshot

46 Operating Systems, 2011, Danny Hendler & Amnon Meisels Snapshot algorithm – pseudo-code execute_snapshot() Wait for a snapshot request or a token Snapshot request: 1 my_version++, current_snap[self]=my_version 2 S[self]  my_state 3 for each outgoing channel q, send(q, TOEKEN; self, my_version) 4 tokens_received[self]=0 TOKEN(q; r,version): 5.If current_snap[r] < version 6. S[r]  my state 7. current_snap[r]=version 8. L[r][q]  empty, send token(r, version) on each outgoing channel 9. tokens_received[r]  1 10.else 11. tokens_received[r] L[r][q]  all messages received from q since first receiving token(r,version) 13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished Set of messages on channel from q is empty Send token on all outgoing channels Initialize number of received tokens to 1

47 Operating Systems, 2011, Danny Hendler & Amnon Meisels Snapshot algorithm – pseudo-code execute_snapshot() Wait for a snapshot request or a token Snapshot request: 1 my_version++, current_snap[self]=my_version 2 S[self]  my_state 3 for each outgoing channel q, send(q, TOEKEN; self, my_version) 4 tokens_received[self]=0 TOKEN(q; r,version): 5.If current_snap[r] < version 6. S[r]  my state 7. current_snap[r]=version 8. L[r][q]  empty, send token(r, version) on each outgoing channel 9. tokens_received[r]  1 10.else 11. tokens_received[r] L[r][q]  all messages received from q since first receiving token(r,version) 13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished Not the first token for (r,version)

48 Operating Systems, 2011, Danny Hendler & Amnon Meisels Snapshot algorithm – pseudo-code execute_snapshot() Wait for a snapshot request or a token Snapshot request: 1 my_version++, current_snap[self]=my_version 2 S[self]  my_state 3 for each outgoing channel q, send(q, TOEKEN; self, my_version) 4 tokens_received[self]=0 TOKEN(q; r,version): 5.If current_snap[r] < version 6. S[r]  my state 7. current_snap[r]=version 8. L[r][q]  empty, send token(r, version) on each outgoing channel 9. tokens_received[r]  1 10.else 11. tokens_received[r] L[r][q]  all messages received from q since first receiving token(r,version) 13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished Yet another token for snapshot (r,version)

49 Operating Systems, 2011, Danny Hendler & Amnon Meisels Snapshot algorithm – pseudo-code execute_snapshot() Wait for a snapshot request or a token Snapshot request: 1 my_version++, current_snap[self]=my_version 2 S[self]  my_state 3 for each outgoing channel q, send(q, TOEKEN; self, my_version) 4 tokens_received[self]=0 TOKEN(q; r,version): 5.If current_snap[r] < version 6. S[r]  my state 7. current_snap[r]=version 8. L[r][q]  empty, send token(r, version) on each outgoing channel 9. tokens_received[r]  1 10.else 11. tokens_received[r] L[r][q]  all messages received from q since first receiving token(r,version) 13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished These messages are the state of the channel from q for snapshot (r,version)

50 Operating Systems, 2011, Danny Hendler & Amnon Meisels Snapshot algorithm – pseudo-code execute_snapshot() Wait for a snapshot request or a token Snapshot request: 1 my_version++, current_snap[self]=my_version 2 S[self]  my_state 3 for each outgoing channel q, send(q, TOEKEN; self, my_version) 4 tokens_received[self]=0 TOKEN(q; r,version): 5.If current_snap[r] < version 6. S[r]  my state 7. current_snap[r]=version 8. L[r][q]  empty, send token(r, version) on each outgoing channel 9. tokens_received[r]  1 10.else 11. tokens_received[r] L[r][q]  all messages received from q since first receiving token(r,version) 13. if tokens_received = #incoming channels, local snapshot for (r,version) is finished If all tokens of snapshot (r,version) arrived, snapshot computation is over