Distributed algorithms

CIS 720: Distributed algorithms

Broadcast over a tree

Broadcast in an arbitrary graph

Initiator:
    num_recd = 0; sum = x_init
    send bcast() to all nbrs
    while (num_recd != num_nbrs)
        receive m from j
        if m == ack(y): sum = sum + y; num_recd++
        if m == nack(): num_recd++
        if m == bcast(): send nack() to j
    end while

Any other site i:
    num_recd = 0
    receive bcast() from j; parent = j
    send bcast() to all nbrs except j
    sum = x_i
    while (num_recd != num_nbrs - 1)
        receive m from j
        if m == ack(y): sum = sum + y; num_recd++
        if m == nack(): num_recd++
        if m == bcast(): send nack() to j
    end while
    send ack(sum) to parent
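The broadcast/convergecast pseudocode above can be exercised with a small single-threaded simulation. This is a sketch under stated assumptions: the graph is connected and undirected, `echo_sum`, `adj`, `values` and `seed` are hypothetical names, and asynchrony is modelled by delivering in-flight messages in a random order (no FIFO assumption is needed).

```python
import random

def echo_sum(adj, values, initiator, seed=0):
    """Simulate the bcast/ack/nack algorithm on a connected undirected graph.

    adj: dict node -> list of neighbours; values: dict node -> local value x_i.
    Returns (sum computed at the initiator, parent map of the spanning tree).
    """
    rng = random.Random(seed)
    in_flight = []                          # (dst, src, kind, payload)
    parent = {initiator: None}
    total = dict(values)                    # running 'sum' at each site
    num_recd = {v: 0 for v in adj}
    result = None

    def send(src, dst, kind, payload=None):
        in_flight.append((dst, src, kind, payload))

    def expected(v):
        # The initiator waits for all neighbours; other sites skip their parent.
        return len(adj[v]) if v == initiator else len(adj[v]) - 1

    for j in adj[initiator]:
        send(initiator, j, "bcast")
    while in_flight:
        dst, src, kind, y = in_flight.pop(rng.randrange(len(in_flight)))
        if kind == "bcast":
            if dst in parent:               # already reached: refuse with nack()
                send(dst, src, "nack")
                continue
            parent[dst] = src               # the first bcast() fixes the parent
            for k in adj[dst]:
                if k != src:
                    send(dst, k, "bcast")
        else:                               # ack(y) or nack()
            if kind == "ack":
                total[dst] += y
            num_recd[dst] += 1
        if num_recd[dst] == expected(dst):  # every bcast() sent has been answered
            if dst == initiator:
                result = total[dst]
            else:
                send(dst, parent[dst], "ack", total[dst])
    return result, parent
```

Every bcast() a site sends receives exactly one reply (ack or nack), which is why counting replies up to `num_nbrs` (or `num_nbrs - 1`) is enough to know when the subtree sum is complete.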

Pulse-based algorithm: knowledge of the network diameter is needed.

Depth First Search

Every site: not_visited = neighbor list

Initiator:
    select j from not_visited; remove j from not_visited; send visit() to j

Any other site i:
    receive visit() from k
    visited_i = true; parent = k
L:  if (not_visited != {})
        select j from not_visited; remove j from not_visited; send visit() to j
    else
        send backtrack() to parent   (the initiator, which has no parent, terminates)

On receiving visit() from k after having been visited:
    remove k from not_visited; send ack() to k

On receiving backtrack() or ack():
    go to L
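Because only one message (the "token") is ever in flight in this DFS, it can be simulated sequentially. A minimal sketch, assuming hypothetical names `dfs_tokens` and `choose` (the rule for "select j from not_visited", here the smallest id), and following the slide in not removing the parent from `not_visited` (visiting the parent simply returns an ack()):

```python
def dfs_tokens(adj, initiator, choose=min):
    """Single-token simulation of the visit/ack/backtrack DFS.

    Returns (visit order, parent map, number of messages sent).
    """
    not_visited = {v: set(adj[v]) for v in adj}
    visited = {initiator}
    parent = {initiator: None}
    order = [initiator]
    msgs = 0

    def step_L(i):
        # Label L: visit an unexplored neighbour, else backtrack (initiator stops).
        if not_visited[i]:
            j = choose(not_visited[i])
            not_visited[i].remove(j)
            return ("visit", i, j)
        if parent[i] is None:
            return None
        return ("backtrack", i, parent[i])

    msg = step_L(initiator)
    while msg is not None:
        msgs += 1
        kind, src, dst = msg
        if kind == "visit" and dst not in visited:
            visited.add(dst)                 # first visit(): adopt src as parent
            parent[dst] = src
            order.append(dst)
            msg = step_L(dst)
        elif kind == "visit":
            not_visited[dst].discard(src)    # already visited: refuse with ack()
            msg = ("ack", dst, src)
        else:                                # backtrack() or ack(): go to L
            msg = step_L(dst)
    return order, parent, msgs
```

For example, on `{0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}` from site 0 the sites are visited in the order 0, 1, 2, 3 and the parent pointers form the path 0-1-2-3.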

Breadth First Search

“Paint on the forehead” problem: Each of you can see everyone else's forehead but not your own. I announce: “Some of you have paint on your forehead.” Rounds start at 5pm and last one minute each. At the beginning of a round, if you know you have paint on your forehead, send an email to everyone; it will arrive before the end of the round. You all depart to your offices.

Round 1: no email is received. Round 6: 6 emails are received. Assumptions: synchronized clocks, everyone reasons perfectly, and everyone knows that everyone else reasons perfectly.
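The reasoning behind "6 emails in round 6" is an induction: if no email has arrived after r - 1 rounds, everyone knows at least r people are painted, so a person who sees exactly r - 1 painted foreheads concludes they are painted. A small sketch of that rule (the function name `paint_rounds` is hypothetical):

```python
def paint_rounds(painted):
    """painted[i] is True if person i has paint.

    Returns (first round in which emails are sent, list of senders).
    Rule: in round r, if no email has arrived yet, a person who sees
    exactly r - 1 painted foreheads knows they are painted too.
    """
    total = sum(painted)
    if total == 0:
        raise ValueError("the announcement says at least one person has paint")
    for r in range(1, len(painted) + 1):
        # A painted person sees total - 1 painted foreheads; others see total.
        senders = [i for i, p in enumerate(painted) if total - p == r - 1]
        if senders:
            return r, senders
```

With 6 painted people out of 10, `paint_rounds([True] * 6 + [False] * 4)` reports round 6 with the six painted people as senders, matching the slide.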

Event Ordering: The execution of a process is characterized by a sequence of events: (a) internal events, (b) send events, and (c) receive events.

Happened Before Relation: Let a and b be events.
- If a and b are events in the same process and a occurred before b, then a → b.
- If a is the event of sending a message M in one process and b is the event of receiving M in another process, then a → b.
- If a → b and b → c, then a → c.

Happened Before Relation: The relation → is an irreflexive, transitive relation on the set of events (a strict partial order), and is therefore also antisymmetric.
- Irreflexive: for all a, not (a → a).
- Transitive: for all a, b and c, if a → b and b → c, then a → c.
- Antisymmetric: for all a and b, a → b implies not (b → a).

Causality: Event a causally affects b if a → b. Two events a and b are concurrent (a || b) if not (a → b) and not (b → a). An event a can influence another event b only if a causally precedes b.
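The three rules defining → translate directly into a computation over a recorded trace: add same-process edges, add send-to-receive edges, then take the transitive closure. A sketch assuming a hypothetical trace encoding in which event k (1-based) of process p is the pair `(p, k)`; `happened_before` and `concurrent` are illustrative names:

```python
def happened_before(num_events, messages):
    """num_events[p] = number of events at process p.
    messages: list of (send_event, receive_event) pairs.
    Returns the set of pairs (a, b) such that a -> b."""
    events = [(p, k) for p, n in enumerate(num_events) for k in range(1, n + 1)]
    hb = set()
    for p, k in events:                      # rule 1: earlier event, same process
        for l in range(k + 1, num_events[p] + 1):
            hb.add(((p, k), (p, l)))
    hb.update(messages)                      # rule 2: send -> receive
    for m in events:                         # rule 3: transitive closure (Warshall)
        for a in events:
            if (a, m) in hb:
                for b in events:
                    if (m, b) in hb:
                        hb.add((a, b))
    return hb

def concurrent(a, b, hb):
    """a || b: distinct events, neither happened before the other."""
    return a != b and (a, b) not in hb and (b, a) not in hb
```

For two processes with two events each and one message from (0, 1) to (1, 2), the result contains (0, 1) → (1, 2), while (0, 2) || (1, 2).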

Detecting Event Ordering: Lamport’s Logical Clocks
- Each process Pi has a clock Ci.
- Each event e at Pi is assigned a clock value T(e), the timestamp of e.
- Clock condition: if e1 → e2, then T(e1) < T(e2).

Implementation of Clocks: Each process Pi increments Ci between any two successive events. If a = send(m) at Pi, then Pi assigns the timestamp T(m) = T(a) to m, and this timestamp is sent along with the message. On receiving m, Pj sets Cj = max(T(m) + 1, Cj).
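A minimal sketch of these rules for a single process (the class name `LamportClock` is hypothetical). It reads "increments Ci between two successive events" as also applying to the receive event, so the clock is incremented before the max rule is applied; either way the clock condition holds.

```python
class LamportClock:
    def __init__(self):
        self.c = 0

    def local_event(self):
        # Increment C_i between successive events; the new value is T(e).
        self.c += 1
        return self.c

    def send(self):
        # T(m) = T(send event); the caller piggybacks it on the message.
        return self.local_event()

    def receive(self, t_m):
        # The receive is itself an event: increment, then C_j = max(T(m) + 1, C_j).
        self.c += 1
        self.c = max(t_m + 1, self.c)
        return self.c
```

Usage: if P1 performs an internal event (T = 1) and then sends m (T(m) = 2), P2 receiving m with a fresh clock assigns the receive event T = 3, so T(send) < T(receive) as the clock condition requires.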

Total Ordering: The set of all events can be totally ordered as follows. Let a and b be events at sites i and j, respectively. Then a ⇒ b iff either T(a) < T(b), or T(a) = T(b) and i < j.
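Breaking timestamp ties by site number is exactly lexicographic comparison of (T, site) pairs. A short sketch with hypothetical event names:

```python
def precedes(a, b):
    """a, b are (T, site) pairs; True iff a => b in the total order."""
    (ta, i), (tb, j) = a, b
    return ta < tb or (ta == tb and i < j)

# Python compares tuples lexicographically, which is the same rule,
# so sorting by the (T, site) pair yields the total order.
events = {"x": (3, 1), "y": (2, 2), "z": (3, 0)}
ordered = sorted(events, key=events.get)   # ['y', 'z', 'x']
```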

Limitations of Lamport’s Clock: If a → b, then T(a) < T(b). If T(a) < T(b), then either a → b or a || b; the timestamps alone cannot tell which. If T(a) = T(b), then a || b.

Mutual Exclusion Algorithm: A single resource can be held by at most one process at a time. Each site issues a request to acquire permission to access the resource. Lamport’s clock is used to define the order in which the resource is accessed.

Mutual Exclusion Algorithm: Let req1 and req2 be two request events. If req1 → req2, then req1 must be satisfied before req2. Otherwise, the requests are concurrent and can be satisfied in any order.

Algorithm: Each site Pi maintains a request queue RQi, which stores requests sorted by timestamp. The model is asynchronous message passing over FIFO channels. There are three types of messages: Request, Reply, and Release. All messages carry a timestamp.

Algorithm: When Pi wants to enter its CS, it sends a Request(tsi, i) message to all sites, where tsi is the timestamp of the request event, and places the request in RQi. When Pj receives Request(tsi, i), it returns a Reply(tsj, j) message to Pi and places the request in RQj.

Algorithm: Pi can enter its CS when both of the following conditions hold:
- Pi has received a message with timestamp larger than (tsi, i) from all other sites.
- Pi’s request is at the front of RQi.
On exiting the CS, Pi removes its request from RQi and sends a Release message to all sites. On receiving the Release message, each site removes Pi’s request from its queue.
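The algorithm can be checked with a small discrete simulation: FIFO channels are deques, and at each step either the head of some channel is delivered or a site in its CS exits, chosen at random. This is a sketch, not a reference implementation; `Site`, `simulate`, the message tuples and the random scheduler are all hypothetical, and timestamps are compared as (ts, site) pairs as in the total order defined earlier.

```python
import heapq
import random
from collections import deque

class Site:
    def __init__(self, pid, n):
        self.pid, self.n = pid, n
        self.clock = 0
        self.rq = []                               # RQ_i: min-heap of (ts, pid)
        self.latest = [(0, j) for j in range(n)]   # latest (ts, j) heard from j
        self.my_req = None
        self.in_cs = False

    def tick(self, msg_ts=0):
        # Lamport clock update for local events and receives.
        self.clock = max(self.clock, msg_ts) + 1
        return self.clock

    def request(self, broadcast):
        self.my_req = (self.tick(), self.pid)
        heapq.heappush(self.rq, self.my_req)
        broadcast(("request", self.my_req[0], self.pid))

    def on_message(self, msg, send_to):
        kind, ts, j = msg
        self.tick(ts)
        self.latest[j] = (ts, j)
        if kind == "request":
            heapq.heappush(self.rq, (ts, j))
            send_to(j, ("reply", self.tick(), self.pid))
        elif kind == "release":
            self.rq = [r for r in self.rq if r[1] != j]
            heapq.heapify(self.rq)

    def can_enter(self):
        return (self.my_req is not None and not self.in_cs
                and self.rq[0] == self.my_req
                and all(self.latest[j] > self.my_req
                        for j in range(self.n) if j != self.pid))

    def release(self, broadcast):
        self.in_cs = False
        self.rq = [r for r in self.rq if r != self.my_req]
        heapq.heapify(self.rq)
        self.my_req = None
        broadcast(("release", self.tick(), self.pid))

def simulate(n, requesters, seed=0):
    """Returns the (ts, pid) requests in the order they entered the CS."""
    rng = random.Random(seed)
    sites = [Site(i, n) for i in range(n)]
    chans = {(i, j): deque() for i in range(n) for j in range(n) if i != j}

    def broadcaster(i):
        def broadcast(msg):
            for j in range(n):
                if j != i:
                    chans[(i, j)].append(msg)
        return broadcast

    for p in requesters:
        sites[p].request(broadcaster(p))
    order = []
    while True:
        for s in sites:
            if s.can_enter():
                s.in_cs = True
                order.append(s.my_req)
        assert sum(s.in_cs for s in sites) <= 1, "mutual exclusion violated"
        actions = [("deliver", c) for c, q in chans.items() if q]
        actions += [("exit", s.pid) for s in sites if s.in_cs]
        if not actions:
            return order
        kind, arg = rng.choice(actions)
        if kind == "exit":
            sites[arg].release(broadcaster(arg))
        else:
            i, j = arg
            sites[j].on_message(chans[arg].popleft(),
                                lambda dst, msg: chans[(j, dst)].append(msg))
```

If sites 2, 0 and 3 all request at once, every request gets timestamp 1, so the ties are broken by site number and the CS is granted in the order 0, 2, 3 regardless of the message interleaving; the assertion inside the loop checks safety at every step.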

Vector Clocks: Each process Pi maintains a clock vector Ci[0..N-1]. Ci[i] is incremented before assigning the timestamp to an event. Let a = send(M) at Pi and b be the receipt of M at Pj. The vector clock is sent along with the message.

Vector Clocks: On receiving M, Pj first increments Cj[j] and then updates Cj as follows: for all k, Cj[k] = max(Cj[k], tm[k]), where tm denotes the vector carried by M.

Vector Clocks: Assertion: for all i and j, Ci[i] >= Cj[i].
Comparison of vector clocks:
- Ta = Tb iff for all i, Ta[i] = Tb[i]
- Ta < Tb iff for all i, Ta[i] <= Tb[i], and there exists j such that Ta[j] < Tb[j]
- Ta || Tb iff not (Ta < Tb) and not (Tb < Ta)

Vector Clocks a b iff Ta < Tb a || b iff Ta || Tb

Broadcast: A broadcast message is addressed to all processes in a group. Several messages may be sent concurrently to processes in the group.

Causal Ordering: If send(m1) → send(m2), then every recipient of both m1 and m2 must receive m1 before m2. The network is point-to-point, asynchronous, and complete.

Algorithm (Birman, Schiper and Stephenson): All communication is assumed to be broadcast in nature. Each process Pi maintains a vector clock VTi, where VTi[i] = the number of messages Pi has broadcast so far.

To broadcast M, Pi increments VTi[i] and assigns VTi to M as its timestamp. On receiving M with timestamp MT from Pi, Pj delays its delivery until both conditions hold:
- VTj[i] = MT[i] - 1, and
- for all k != i, VTj[k] >= MT[k].

Algorithm: When M is delivered, Pj updates VTj as follows: for all k, VTj[k] = max(VTj[k], MT[k]).
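The delivery condition and update rule can be sketched as a per-process buffer that delivers a message only once it is causally ready, rechecking the buffer after every delivery. `CausalBroadcaster` and its methods are hypothetical names, and the sender delivering its own broadcast immediately is an assumption of this sketch.

```python
class CausalBroadcaster:
    def __init__(self, pid, n):
        self.pid = pid
        self.vt = [0] * n
        self.pending = []        # received but not yet deliverable
        self.delivered = []

    def broadcast(self, payload):
        self.vt[self.pid] += 1
        self.delivered.append(payload)   # assumption: deliver own message at once
        return (self.pid, tuple(self.vt), payload)

    def _deliverable(self, sender, mt):
        # VTj[i] = MT[i] - 1 and VTj[k] >= MT[k] for all k != i.
        return self.vt[sender] == mt[sender] - 1 and all(
            self.vt[k] >= mt[k] for k in range(len(mt)) if k != sender)

    def receive(self, msg):
        self.pending.append(msg)
        progress = True
        while progress:                   # one delivery may unblock others
            progress = False
            for m in list(self.pending):
                sender, mt, payload = m
                if self._deliverable(sender, mt):
                    self.pending.remove(m)
                    self.vt = [max(x, y) for x, y in zip(self.vt, mt)]
                    self.delivered.append(payload)
                    progress = True
```

Usage: P0 broadcasts m1; P1 delivers m1 and then broadcasts m2, so send(m1) → send(m2). If P2 receives m2 first, it buffers m2 (its VT2[0] = 0 < MT[0] = 1) and delivers m1 followed by m2 once m1 arrives.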

Global State: P1, P2, …, Pn is the set of processes, and si is the state of process Pi.

Token-passing example

Channel States: Cj is the sequence of messages sent along channel j, excluding the messages already received along that channel.

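Global State: A global state GS of a system is a set of process states and channel states, GS = { s1, …, sn, C1, …, Cm }. A global state is consistent if it contains no inconsistent message, i.e., no message whose receipt is recorded in the receiver’s state but whose sending is not recorded in the sender’s state.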

Inconsistent messages
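Given a recorded cut, both inconsistent messages and the messages that belong in channel states can be read off directly. A sketch assuming a hypothetical encoding: `cut[p]` is the number of events of process p included in the recorded state, and each message is `(sender, send_idx, receiver, recv_idx)` with 1-based event indices.

```python
def inconsistent_messages(cut, messages):
    """Receipt recorded in the receiver's state, sending not recorded."""
    return [(s, si, r, ri) for s, si, r, ri in messages
            if ri <= cut[r] and si > cut[s]]

def in_transit(cut, messages):
    """Sent inside the cut but received outside it: part of a channel state."""
    return [(s, si, r, ri) for s, si, r, ri in messages
            if si <= cut[s] and ri > cut[r]]
```

For a message sent as P0's 2nd event and received as P1's 1st event, the cut `[1, 1]` is inconsistent, while the cut `[2, 0]` is consistent and records the message as in transit on the channel from P0 to P1.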

Algorithm: Chandy-Lamport’s algorithm
- There exists a basic distributed computation whose state is being recorded.
- Communication is assumed to be FIFO.
- The network is point-to-point and asynchronous.
- Global state = snapshot of the computation.
- Communication is reliable.

Algorithm: There is a single initiator, and a marker message is used.
Marker sending rule:
- Pi records its state.
- For each outgoing channel C, Pi sends a marker along C.
- No computation message is sent between recording the state and sending the marker messages.

Algorithm: Marker receiving rule. On receiving a marker along channel C, Pj does the following:
- If Pj has not recorded its state: record its state, record the state of C as empty, and follow the marker sending rule.
- Otherwise: record the state of C as the sequence of messages received along C after Pj recorded its state and before the marker arrived along C.

Algorithm: A marker divides the messages on a channel into those that are included in the recorded state and those that are logically after it. The algorithm can be initiated by any number of processes concurrently.
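A "bank" simulation is a convenient way to see why the recorded state is consistent: money moves between processes over FIFO channels while a snapshot is taken at a random moment, and the recorded balances plus the recorded channel contents must add up to the initial total. This sketch assumes a complete graph, a single initiator, and hypothetical names (`chandy_lamport`, `MARKER`, `transfers`); the random choice of the next action models asynchrony.

```python
import random
from collections import deque

MARKER = object()  # sentinel distinguishing markers from money transfers

def chandy_lamport(balances, transfers, initiator=0, seed=0):
    """Returns (recorded process states, recorded channel states)."""
    rng = random.Random(seed)
    n = len(balances)
    bal = list(balances)
    chans = {(i, j): deque() for i in range(n) for j in range(n) if i != j}
    recorded = [None] * n
    chan_state = {}       # (i, j) -> amounts recorded as in transit on i -> j
    recording = set()     # incoming channels whose state is still being recorded
    pending = deque(transfers)
    started = False

    def record_and_send_markers(p):
        # Marker sending rule: record own state, then a marker on every
        # outgoing channel; start recording every incoming channel.
        recorded[p] = bal[p]
        for j in range(n):
            if j != p:
                chans[(p, j)].append(MARKER)
                chan_state[(j, p)] = []
                recording.add((j, p))

    while True:
        actions = [("deliver", c) for c, q in chans.items() if q]
        if pending:
            actions.append(("transfer", None))
        if not started:
            actions.append(("snapshot", None))
        if not actions:
            return recorded, chan_state
        kind, c = rng.choice(actions)
        if kind == "snapshot":
            started = True
            record_and_send_markers(initiator)
        elif kind == "transfer":
            src, dst, amount = pending.popleft()
            bal[src] -= amount
            chans[(src, dst)].append(amount)
        else:
            msg = chans[c].popleft()
            if msg is MARKER:
                # Marker receiving rule: if not yet recorded, record now
                # (channel C is then empty); in both cases stop recording C.
                if recorded[c[1]] is None:
                    record_and_send_markers(c[1])
                recording.discard(c)
            else:
                bal[c[1]] += msg
                if c in recording:
                    chan_state[c].append(msg)
```

Whatever interleaving the seed produces, the sum of the recorded balances and the recorded in-transit amounts equals the initial total, which is the conservation property a consistent global state must satisfy.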

Stable Predicate: A predicate A is stable if, once A becomes true, it remains true. A recorded global state may never have actually occurred, but it could have: the current state is reachable from it. Therefore, if a stable predicate A is true in the recorded state, then A is also true in the current state.

Termination Detection:
- A process may be either active or passive.
- Only active processes may send messages.
- An active process can become passive at any time.
- A passive process may become active on receiving a computation message.
- Messages sent by the termination detection algorithm are called control messages.

Checkpointing: A checkpoint is a saved local state of a process. Each process creates a checkpoint periodically. Rollback recovery is performed when a failure occurs: the system is rolled back using the checkpointed states.

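Uncoordinated Checkpoints: Each process takes checkpoints independently. Upon a failure, processes must find a consistent set of checkpoints from which to restart. Problem: the domino effect, in which rolling back one process forces others to roll back, possibly all the way to the initial state.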
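The domino effect can be reproduced with a simple rollback-propagation loop: start from each process's latest checkpoint and, while some message is recorded as received but not as sent (an orphan), roll the receiver back to an earlier checkpoint. This is a sketch with hypothetical names (`recovery_line`) and a hypothetical encoding: `checkpoints[p]` lists the event indices after which p saved its state, always including 0 for the initial state, and messages are `(sender, send_idx, receiver, recv_idx)`.

```python
def recovery_line(checkpoints, messages):
    """Returns the checkpoint index each process rolls back to."""
    line = [cps[-1] for cps in checkpoints]      # start from the latest checkpoints
    changed = True
    while changed:
        changed = False
        for s, si, r, ri in messages:
            if ri <= line[r] and si > line[s]:   # orphan: received but not sent
                # Roll the receiver back to its latest checkpoint before the receipt.
                line[r] = max(c for c in checkpoints[r] if c < ri)
                changed = True
    return line
```

In the hypothetical trace below, both processes checkpoint after events 2 and 4, but the messages zig-zag across every checkpoint interval, so each rollback forces another one and both processes end up back at their initial states: `recovery_line([[0, 2, 4], [0, 2, 4]], [(0, 5, 1, 4), (1, 3, 0, 4), (0, 3, 1, 2), (1, 1, 0, 2)])` gives `[0, 0]`.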

Coordinated Checkpoints: Processes use a global state recording algorithm, which ensures that the most recent set of checkpoints is consistent.

Recovery: Coordination is needed during recovery, even if the most recent states are consistent. A two-phase algorithm is needed.