Termination Detection Part 1

Goal: Study the development of a protocol for termination detection with the help of invariants.

Termination Detection
Rules:
– A process is either active or passive
– An active process can become passive at any time
– A passive process can become active only if it receives a computation message
– Only active processes can send computation messages; all processes can receive them
– Any process can send control messages, i.e., messages sent for detecting termination

A system is said to be terminated if
– All processes are passive
– No computation messages are in transit
Reminder: We distinguish between computation messages and control messages (messages sent for detecting termination). Any process can send and receive control messages, and they do not change the status of a process.
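
The definition above can be written as a predicate over a global state. The following is a minimal sketch; the `Process` class and the `is_terminated` function are illustrative names, not part of the slides, and the sketch assumes an observer that can see the whole state at once, which a real distributed protocol cannot.

```python
from dataclasses import dataclass


@dataclass
class Process:
    name: str
    active: bool


def is_terminated(processes, computation_msgs_in_transit):
    """True when every process is passive and no computation message is in transit."""
    all_passive = all(not p.active for p in processes)
    return all_passive and computation_msgs_in_transit == 0


procs = [Process("p0", False), Process("p1", False)]

# All processes are passive, but one computation message is still in transit:
# the message may reactivate its receiver, so the system has not terminated.
pending_case = is_terminated(procs, 1)

# All processes are passive and the channels are empty: terminated.
quiet_case = is_terminated(procs, 0)
```

The second condition matters because a passive process can be reactivated by a message that has not arrived yet.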

Application
A solution for termination detection lets one ensure that all tasks in a system are indeed complete, even though tasks may create additional tasks that run on other processors in the system.

Observation
Termination is a stable property
– Once true, it remains true forever
Detecting such properties is important for many problems, including
– Garbage collection
– Deadlock detection

We will consider two algorithms
– Based on the idea of diffusion
– Based on the idea of global snapshot
  – We will study these aspects later.

Approach 1: Dijkstra-Scholten
Assumptions
– Initially one process is active
– No failures, lost messages, etc.

Each process j maintains a variable P.j that is its parent in the tree
– At the root, P.root = root
– Initially, for all other processes, P.j = NULL

Predicate in Invariant (1)
The set of active processes forms a tree
– True in the initial state
– Ensure that this remains true during computation

When a Process Becomes Active
Consider the case when j changes from passive to active
– It must be the case that j received a computation message from some process, say k
  P.j = k
  Become active

Action (1)
P.j = NULL /\ j receives a message from k --> P.j = k, j becomes active

When a Process Becomes Passive
Consider the case when j changes from active to passive
– It must be the case that j has no children

Action (2)
P.j = NULL /\ j receives a message from k --> P.j = k, j becomes active
j is active /\ j wants to become passive /\ j has no children --> j becomes passive, P.j = NULL

Alternatives
j is active --> j is passive, and for all children of j
– Set their parent to P.j
– Or set their parent to root
There are some papers with this idea, but we will skip them here.

Problem?
The actions so far do not deal with messages in transit.

Predicate in Invariant (2)
The set of active processes forms a tree
If j is passive, then all messages it sent have been received
– True initially
– Preserve this predicate during computation

Action (3)
Maintain a variable oc.j that denotes the number of messages that j has sent and that are not yet acknowledged

Action (2) corrected
P.j = NULL /\ j receives a message from k --> P.j = k, j becomes active
j is active /\ j wants to become passive /\ j has no children /\ oc.j = 0 --> j becomes passive, P.j = NULL

The actions on the previous slide can be used to implement termination detection.
Consider the second action:
j is active /\ j wants to become passive /\ j has no children /\ oc.j = 0 --> j becomes passive, P.j = NULL
Is it possible to drop 'j has no children' from the guard?

Answer
We could, if we guarantee that
– oc.j = 0 => j has no children
– Equivalently, j has children => oc.j > 0
This could be achieved if a child does not acknowledge at least one of its parent's messages (the first one?) while it remains a child.
Thus, checking that oc.j is 0 is sufficient.

Action (3)
P.j = NULL /\ j receives a message from k --> P.j = k, j becomes active (do not send an ack for this message)
j is active /\ j wants to become passive /\ oc.j = 0 --> j becomes passive, P.j = NULL; send ack to parent
j is active /\ j receives a message from k --> send ack to k
Other simple actions for maintaining oc.j

Summarizing Approach 1
Goal
– Active processes form a rooted tree
  If process k activates j, then j sets its parent to k
– If a process is passive, all messages it sent have been received
  Acknowledge each message (at some time)
– A process becomes passive only when all its children are passive; in other words, force a process to wait for its children to become passive.
  This is achieved if the children do not send an acknowledgment for the first message received from the parent until they become passive.

Actions
Passive --> Active
– If j is passive and receives a computation message from k, then
  P.j = k
  Become active

Actions
Active --> Active
– If j is active and receives a computation message from l
  Send an acknowledgment to l

Actions
Message send
– If j wants to send a message (it must be active)
  oc.j ++ (the number of outstanding acknowledgments is increased)
Acknowledgment receive
– oc.j = oc.j - 1

Actions
Active --> Passive
– If j wants to be passive and oc.j = 0
  Send an acknowledgment to P.j (observe that the first message from the parent was not immediately acknowledged)
  Become passive
  If j is the root, then declare termination
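
The four actions of Approach 1 can be put together in a small single-threaded simulation. This is a sketch under stated assumptions rather than a definitive implementation: the class names `Node` and `DijkstraScholten`, the method names, and the single FIFO queue standing in for all channels are illustrative choices that do not come from the slides.

```python
from collections import deque


class Node:
    def __init__(self, name, is_root=False):
        self.name = name
        self.is_root = is_root
        self.parent = name if is_root else None  # P.j (P.root = root)
        self.active = is_root                    # initially only the root is active
        self.oc = 0                              # oc.j: messages sent, not yet acknowledged


class DijkstraScholten:
    def __init__(self, names, root):
        self.nodes = {n: Node(n, n == root) for n in names}
        self.channel = deque()  # assumption: one FIFO queue models every channel
        self.terminated = False

    def send(self, src, dst):
        """Message send: only an active process may send; oc.src is incremented."""
        sender = self.nodes[src]
        assert sender.active, "only active processes send computation messages"
        sender.oc += 1
        self.channel.append(("msg", src, dst))

    def deliver(self):
        """Deliver the oldest message or acknowledgment."""
        kind, src, dst = self.channel.popleft()
        receiver = self.nodes[dst]
        if kind == "ack":
            receiver.oc -= 1                         # acknowledgment receive
        elif receiver.active:
            self.channel.append(("ack", dst, src))   # active --> active: ack at once
        else:
            receiver.parent = src                    # passive --> active: adopt parent,
            receiver.active = True                   # withhold the ack for now

    def try_passive(self, name):
        """Active --> passive, allowed only when oc.j = 0."""
        node = self.nodes[name]
        if not node.active or node.oc != 0:
            return False
        node.active = False
        if node.is_root:
            self.terminated = True                   # root declares termination
        else:
            self.channel.append(("ack", name, node.parent))  # the withheld ack
            node.parent = None
        return True


sim = DijkstraScholten(["r", "a", "b"], root="r")
sim.send("r", "a")
sim.deliver()                       # a becomes active with parent r
sim.send("a", "b")
sim.deliver()                       # b becomes active with parent a
root_blocked = not sim.try_passive("r")   # oc.r = 1, so r must wait
a_blocked = not sim.try_passive("a")      # oc.a = 1, so a must wait
sim.try_passive("b")
sim.deliver()                       # b's withheld ack reaches a
sim.try_passive("a")
sim.deliver()                       # a's withheld ack reaches r
sim.try_passive("r")                # oc.r = 0: root declares termination
```

In this run the root cannot become passive while `a` is still its child, because the acknowledgment for the activating message is withheld until `a` itself becomes passive.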

Diffusing Computation
Crucial for various applications
General outline
– root(?) sends the diffusion message
– Every node that receives the diffusion message for the first time forwards it to its neighbors
  The first node from which the diffusion message is received is called the parent
– All subsequent diffusion messages are acknowledged
– Upon receiving acknowledgments from all neighbors, a node completes the diffusion and sends an acknowledgment to its parent
– When the root completes the diffusion, the diffusing computation is complete
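
The outline above can be sketched as a message-queue simulation on an undirected graph. The function name `diffuse`, the adjacency-dictionary input, and the single FIFO queue are assumptions made for illustration. The sketch also assumes that a node forwards the diffusion message to every neighbor except its parent, which the slide does not specify.

```python
from collections import deque


def diffuse(graph, root):
    """Run one diffusing computation; return (parent map, completion order)."""
    parent = {root: root}
    pending = {}       # node -> number of acknowledgments still awaited
    completed = []     # nodes in the order they complete the diffusion
    queue = deque()    # assumption: one FIFO queue models every channel

    def maybe_complete(node):
        if pending[node] == 0:
            completed.append(node)
            if node != root:
                queue.append(("ack", node, parent[node]))

    def forward(node):
        # Assumption: forward to all neighbors except the parent.
        targets = [n for n in graph[node] if n != parent[node]]
        pending[node] = len(targets)
        for n in targets:
            queue.append(("diffuse", node, n))
        maybe_complete(node)  # a node with no targets completes immediately

    forward(root)
    while queue:
        kind, src, dst = queue.popleft()
        if kind == "diffuse":
            if dst not in parent:
                parent[dst] = src                  # first diffusion message: src is the parent
                forward(dst)
            else:
                queue.append(("ack", dst, src))    # subsequent message: acknowledge at once
        else:
            pending[dst] -= 1
            maybe_complete(dst)
    return parent, completed


triangle = {"r": ["a", "b"], "a": ["r", "b"], "b": ["r", "a"]}
tree, order = diffuse(triangle, "r")
```

Each diffusion message is acknowledged exactly once, either immediately or when the receiver completes, so the root completes last.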

Termination Detection II

Approach
Arrange processes in a (hypothetical) ring
– The ring is used only for the sake of termination detection
– Any process can communicate with any other process

Approach
Each process maintains a variable c.j
c.j = (number of messages sent by j) - (number of messages received by j)
Initially, all c values are 0

Action (1)
When j wants to send a message: c.j := c.j + 1
When j receives a message: c.j := c.j - 1

Observation
– Number of messages in transit = c.0 + c.1 + … + c.n
– Detect the value of c.0 + c.1 + … + c.n
– Ensure that when c.j is read, j is passive
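
The observation follows from the two counter updates: every send adds 1 to some counter and every receive subtracts 1 from some counter, so the counters sum to the number of messages sent but not yet received. A minimal sketch, with `CounterSystem` and its methods as hypothetical names:

```python
class CounterSystem:
    """Bookkeeping for c.j = (messages sent by j) - (messages received by j)."""

    def __init__(self, n):
        self.c = [0] * n
        self.in_transit = []  # destinations of messages sent but not yet received

    def send(self, j, k):
        self.c[j] += 1
        self.in_transit.append(k)

    def receive(self, k):
        self.in_transit.remove(k)  # raises ValueError if nothing is in transit to k
        self.c[k] -= 1


system = CounterSystem(3)
system.send(0, 1)
system.send(0, 2)
system.send(1, 2)
system.receive(2)

# Three messages were sent and one was received, so two are in transit.
counter_sum = sum(system.c)
```

Individual counters can be negative (here process 2 has received more than it sent); only the sum is meaningful.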

Action (2)
Send a token along the ring to compute c.0 + c.1 + … + c.n
When 0 sends a token (the token is sent only when the previous token has been received and process 0 is passive):
  token.sum := c.0
Forwarding the token by process j, j <> 0 (the token is forwarded only when j is passive):
  token.sum := token.sum + c.j

Remark
Observe that the token is taking a snapshot of the system
– The global snapshot consists of the local snapshot of every process

Invariant (1)
P1 = (token is between k and k+1 => ((token.sum = c.0 + c.1 + c.2 + … + c.k) /\ (processes 0..k are passive)))

If P1 were true and the token were between n and 0, then token.sum would equal the sum of all c values and all processes would be passive.

Problem with P1
After a token is sent by k, some process in the range 0..k may receive a message, thereby violating P1
To deal with this, we have two options
– Strengthen P1 so that such a message cannot be received
– Weaken P1 so that the invariant contains the states that are reached due to such messages
  We need to follow this second approach

Question
What can be said of the states reached due to a violation of P1?

Invariant (2)
P1 \/ P2, where
P2 = (token is between k and k+1 => (token.sum + c.(k+1) + c.(k+2) + … + c.n > 0))
If we start from a state where P1 is true and P1 becomes false, then P2 is true in the resulting state

Problem with P1 \/ P2
Consider the case where
– The token is between k and k+1
– Some process j, j < k, is active
– j sends a message to a process l, l > k
In this scenario, we want to make sure that l invalidates the token circulation so that process 0 ignores token.sum

Introduce a color for each process
When j receives a message:
  c.j := c.j - 1; color.j := purple   // Addition to the previous action

Invariant (3)
P1 \/ P2 \/ P3, where
P3 = (token is between k and k+1 => (exists l : l > k : color.l = purple))

Problem
Consider the case where l is purple and intends to ensure that the token circulation is invalidated
– What happens if l forwards the token?
– All of the predicates P1, P2, P3 can be false. Also, the color of l needs to be changed back to yellow so that a later token circulation can eventually be valid

Solution
Introduce a color.token
– Initially yellow
Forwarding the token by process j, j <> 0 (remember: the token is sent only when j is passive):
  token.sum := token.sum + c.j
  if (color.j = purple)
    color.token := purple
    color.j := yellow
  else
    // Preserve the token color that you received.
    // Basically, do nothing.

Invariant (4)
P1 \/ P2 \/ P3 \/ P4, where
P4 = (color.token = purple)

When Should color.token Be Reset?
– At process 0?
– Is P1 \/ P2 \/ P3 \/ P4 violated in such circumstances? If not, which of these predicates is guaranteed to be true?

When Is Termination Detected?
– The token returns to 0,
– color.token = yellow, and
– token.sum = 0
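
The counters, the process colors, and the token color can be exercised together in a step-by-step simulation. This is a sketch under stated assumptions: the `Ring` class and its method names are hypothetical, and two choices go beyond the slides, namely that process 0 resets its own color when it starts a token and that process 0 also checks its own color and state before declaring termination. The run below reproduces the scenario in which the token sum is 0 although a process is still active, so only the colors prevent a false detection.

```python
YELLOW, PURPLE = "yellow", "purple"


class Ring:
    """Processes 0..n-1 on a hypothetical ring; the token visits 0, 1, ..., n-1."""

    def __init__(self, n, initially_active):
        self.c = [0] * n
        self.color = [YELLOW] * n
        self.active = [j in initially_active for j in range(n)]
        self.in_transit = []  # destinations of undelivered computation messages

    def send(self, j, k):
        assert self.active[j], "only active processes send computation messages"
        self.c[j] += 1
        self.in_transit.append(k)

    def receive(self, k):
        self.in_transit.remove(k)
        self.c[k] -= 1
        self.color[k] = PURPLE   # the receiver marks itself
        self.active[k] = True

    def start_token(self):
        assert not self.active[0], "process 0 sends the token only when passive"
        self.color[0] = YELLOW   # assumption: not specified in the slides
        return {"sum": self.c[0], "color": YELLOW}

    def forward(self, token, j):
        assert j != 0 and not self.active[j], "j forwards only when passive"
        token["sum"] += self.c[j]
        if self.color[j] == PURPLE:
            token["color"] = PURPLE
            self.color[j] = YELLOW

    def detect(self, token):
        # The last two conjuncts are assumptions added for process 0 itself.
        return (token["sum"] == 0 and token["color"] == YELLOW
                and self.color[0] == YELLOW and not self.active[0])

    def full_round(self):
        token = self.start_token()
        for j in range(1, len(self.c)):
            self.forward(token, j)
        return self.detect(token)


ring = Ring(3, initially_active={2})
token = ring.start_token()
ring.forward(token, 1)      # token is now between 1 and 2
ring.send(2, 1)
ring.receive(1)             # visited process 1 becomes active again (P1 is violated)
ring.send(1, 2)
ring.receive(2)             # process 2 turns purple
ring.active[2] = False
ring.forward(token, 2)      # token.sum is 0, but the token turns purple
first = ring.detect(token)

ring.active[1] = False
second = ring.full_round()  # process 1 is still purple from its earlier receive
third = ring.full_round()   # all yellow, sum 0
```

The first circulation ends with `token["sum"] == 0` while process 1 is still active, which shows that the sum alone is not enough. The second circulation is invalidated by the leftover purple color, and the third one detects termination.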

Can we Deduce Termination from Invariant? How?
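
One possible line of reasoning, sketched here under the assumption that P1 \/ P2 \/ P3 \/ P4 holds in the state where process 0 makes its decision and that the token being between n and 0 corresponds to k = n. The slides leave the handling of process 0's own color open, so this is a sketch rather than a complete proof.

```latex
\begin{align*}
\text{color.token} = \text{yellow} &\implies \neg P_4 \\
\text{no process } l > n \text{ exists} &\implies \neg P_3 \\
\text{token.sum} = 0 \text{ and } k = n &\implies \neg P_2
  \quad (\text{the sum } c.(k+1) + \dots + c.n \text{ is empty}) \\
P_1 \lor P_2 \lor P_3 \lor P_4 &\implies P_1 \\
P_1 \text{ and } k = n &\implies \text{token.sum} = \sum_{j=0}^{n} c.j
  \;\land\; \text{processes } 0..n \text{ are passive} \\
\sum_{j=0}^{n} c.j = 0 &\implies \text{no computation message is in transit}
\end{align*}
```

With all processes passive and no computation message in transit, the system satisfies the definition of termination.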