Advanced Topics in Concurrency and Reactive Programming: Time and State Majeed Kassis.


Time and State: Election Algorithms. “Synchronization is … doing the right thing at the right time.” Synchronization in distributed systems is closely tied to communication, and it is complicated by the lack of a global clock and of shared memory. Logical clocks provide a global ordering of events.

Election Algorithms. Many algorithms used in distributed systems require a coordinator; for example, the centralized mutual exclusion algorithm. Most of the time it does not matter which process acts as coordinator: in general, all processes in the system are equally suitable for the role. Election algorithms choose a unique coordinator that all other machines agree upon, and do so in efficient runtime. Any process can serve as coordinator, and any process can “call an election” (initiate the algorithm to choose a new coordinator); there is no harm, other than extra message traffic, in having multiple concurrent elections. Elections may be needed when the system is initialized, or when the coordinator crashes or retires. Example: the Berkeley clock synchronization algorithm, where a new master node must be chosen if the current master fails. Algorithms covered here: the bully algorithm and the ring algorithm.

The Bully Algorithm. Setup: every process has a unique ID (e.g., its network address or a process number); every process knows the IDs and addresses of all other processes in the system, although not which of them are currently up or down; communication is assumed reliable. Process groups (as with the ISIS toolkit or MPI) satisfy these requirements. Goal: make the live process with the highest ID the new coordinator. Initialization: when a process P notices that the coordinator is no longer responding, it initiates an election. High-numbered processes “bully” lower-numbered processes out of the election until only one remains. When a crashed process reboots, it holds an election; if it is now the highest-numbered live process, it wins.

The Bully Algorithm. P broadcasts an ELECTION message to all processes with higher IDs and expects an “I am alive” response from them. If P receives no response, it declares victory and broadcasts a victory (COORDINATOR) message to all processes in the system. If P hears from a process with a higher ID, P waits a certain amount of time for the victory message; if none arrives, it re-broadcasts the election message. If P receives an ELECTION message from a process with a lower ID, P sends an “I am alive” message back and starts its own election, thereby “bullying” the lower-ID process out of becoming coordinator. Multiple processes may detect the coordinator's failure, so more than one process may initiate an election; if several elections run in parallel, the algorithm still ensures a single victor, the live process with the highest ID.
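
The following is a minimal, single-machine sketch of the bully election, assuming an in-memory registry of nodes in place of real network messages, timeouts, and failure detection; the names Node, elect, and announce_victory are illustrative only, not part of any standard library.

    # Sketch only: synchronous method calls stand in for the ELECTION / "I am
    # alive" / COORDINATOR messages, and a shared dict stands in for the peers.
    class Node:
        def __init__(self, node_id, cluster):
            self.id = node_id
            self.cluster = cluster        # id -> Node, for every known process
            self.alive = True
            self.coordinator = None

        def elect(self):
            """Called when this node suspects the coordinator has failed."""
            higher = [n for i, n in self.cluster.items() if i > self.id and n.alive]
            if not higher:
                self.announce_victory()   # nobody outranks us: declare victory
                return
            # "Send" ELECTION to every live higher-ranked node; each of them
            # bullies this node out of the race and runs its own election.
            for n in higher:
                n.elect()

        def announce_victory(self):
            for n in self.cluster.values():
                if n.alive:
                    n.coordinator = self.id

    # Usage: five nodes, the highest (4) has crashed, node 1 starts an election.
    cluster = {}
    for i in range(5):
        cluster[i] = Node(i, cluster)
    cluster[4].alive = False
    cluster[1].elect()
    print(cluster[0].coordinator)         # -> 3, the highest live ID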

The Bully Algorithm: Example. (a) Process 7 has crashed; process 4 finds out, starts an election, nominates itself, and waits for responses from the higher-ID nodes. (b) Processes 5 and 6 receive 4's message and return “OK”; process 4 stops and waits. (c) Processes 5 and 6 each hold their own election: 5 sends to 6 and 7, and 6 sends to 7.

The Bully Algorithm: Example (continued). (d) Process 5 receives “OK” from 6, so 5 halts and waits; process 6 receives no response from 7. (e) Process 6 declares victory and sends a coordinator message to all nodes.

Ring Algorithm. Processes are arranged in a logical ring, and each process knows the structure (order) of the ring. A process initiates an election either when it has just recovered from a failure or when it notices that the coordinator has failed. The initiator sends an ELECTION message to its closest live downstream node, and the message is then forwarded around the ring, with each process adding its own ID to it. When the ELECTION message comes back to the original node, the initiator picks the node with the highest ID and sends a COORDINATOR message announcing the winner of the election. Multiple elections can be in progress at once; eventually, all messages will carry the same set of values. Processes are able to “skip” faulty nodes: instead of sending to process j, send to j + 1. Faulty nodes are those that do not respond within a fixed amount of time.
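
Here is a small sketch of the ring election under the same simplification as before: direct method calls replace network messages, and RingNode, on_election, and on_coordinator are illustrative names, not a standard API.

    # In-memory sketch of the ring election; an election message is a plain list
    # of IDs, and "sending" is a method call on the next live node in the ring.
    class RingNode:
        def __init__(self, node_id):
            self.id = node_id
            self.alive = True
            self.successor = None          # next node in the logical ring
            self.coordinator = None

        def next_live(self):
            node = self.successor
            while not node.alive:          # skip crashed nodes
                node = node.successor
            return node

        def start_election(self):          # called on suspecting coordinator failure
            self.next_live().on_election([self.id])

        def on_election(self, ids):
            if self.id in ids:
                # The message completed the circuit: the highest ID wins, and a
                # COORDINATOR announcement is circulated around the ring.
                self.on_coordinator(max(ids), origin=self.id)
            else:
                self.next_live().on_election(ids + [self.id])

        def on_coordinator(self, winner, origin):
            self.coordinator = winner
            nxt = self.next_live()
            if nxt.id != origin:           # stop once the announcement has gone round
                nxt.on_coordinator(winner, origin)

    # Usage: logical ring 0 -> 1 -> ... -> 6 -> 0; node 5 has crashed,
    # node 2 misses the coordinator and starts an election.
    nodes = [RingNode(i) for i in range(7)]
    for i, n in enumerate(nodes):
        n.successor = nodes[(i + 1) % 7]
    nodes[5].alive = False
    nodes[2].start_election()
    print(nodes[0].coordinator)            # -> 6, the highest live ID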

Ring Algorithm: Example. Process P thinks the coordinator has crashed, so it builds an ELECTION message containing its own ID and sends it to its first live successor. Each process adds its own ID and forwards the message to the next live node. When the message returns to P, it sees its own ID in the list and knows that the circuit is complete; P then circulates a COORDINATOR message announcing the new highest ID. Here, both 2 and 5 start elections and both elect 6, circulating the lists [5,6,0,1,2,3,4] and [2,3,4,5,6,0,1]; it is fine to have two elections running at once.

Bully vs Ring runtime. Assume n processes and one election in progress. Bully algorithm: worst case, the initiator is the node with the lowest ID, which triggers n-2 further elections at higher-ranked nodes, for O(n^2) messages; best case, an immediate election with n-2 messages. Ring algorithm: always 2(n-1) messages.
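
As a rough worked example under these assumptions: with n = 8 processes, the ring algorithm always costs 2(n-1) = 14 messages, while the bully algorithm costs as few as n-2 = 6 messages in the best case but on the order of n^2 (several dozen messages) in the worst case, when the lowest-ID node starts the election.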

Elections in Wireless Networks. Issues: the network is unreliable, processes may move, and the topology is constantly changing, so traditional algorithms are not appropriate; we cannot assume reliable message passing or a stable network configuration. In addition, wireless algorithms try to find the best node to be coordinator, whereas traditional algorithms are satisfied with any node. Algorithm: any node (the source) can start an election by sending an ELECTION message to its neighbors, i.e., the nodes within range. When a node receives an ELECTION message for the first time, it designates the sender as its parent, forwards the message to its other neighbors, and then waits for their responses; responses may carry resource information. When a node receives an ELECTION message for the second time, it just ignores it.
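
As a rough illustration, the sketch below builds the spanning tree and aggregates a "best node" value back toward the source with a depth-first walk over an in-memory graph; a real wireless election floods messages asynchronously, and the graph, capacity values, and function names here are made-up examples.

    # Each node adopts the first sender it hears from as its parent, forwards the
    # election to its remaining neighbours, and reports the best (capacity, node)
    # pair found in its subtree back to its parent.
    def run_election(graph, capacity, source):
        """graph: node -> set of neighbours; capacity: node -> resource value."""
        parent = {source: None}

        def probe(node):
            best = (capacity[node], node)
            for nbr in graph[node]:
                if nbr in parent:         # already reached: nbr just acknowledges
                    continue
                parent[nbr] = node        # first ELECTION received: adopt a parent
                best = max(best, probe(nbr))
            return best

        _, leader = probe(source)
        return leader                     # the source would now broadcast the winner

    # Usage on a small ad-hoc network (letters loosely follow the lecture figures).
    graph = {
        'a': {'b', 'j'}, 'b': {'a', 'c', 'g'}, 'c': {'b', 'd'}, 'd': {'c'},
        'e': {'g', 'f'}, 'f': {'e'}, 'g': {'b', 'e', 'h', 'j'},
        'h': {'g', 'i'}, 'i': {'h'}, 'j': {'a', 'g'},
    }
    capacity = {n: ord(n) % 7 for n in graph}   # arbitrary "battery level" values
    print(run_election(graph, capacity, 'a'))   # prints the most capable node ('h' here)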

Elections: Wireless Network Example. (a) Initial state; node a is the source. (b) Node a broadcasts an ELECTION message to its nearby neighbors. Messages carry a unique ID to manage possible concurrent elections.

Elections: Wireless Network Example (continued). (c) Node g receives the message from b first and forwards it to its neighbors; when it later receives the message again from j, it ignores it. Node c receives the message from b and forwards it to its neighbors. (d) Node d receives the message from c, and node e receives it from g. In general, when a node R receives its first ELECTION message, it designates the sender Q as its parent and forwards the message to all neighbors except Q; when R receives an ELECTION message from a non-parent, it just acknowledges it.

Elections: Wireless Network Example (continued). (e) Node f receives the message from e first, and node i receives it from h first. (f) Now i, f, and d return responses with their own values: these are the leaf (edge) nodes of the tree. Each receiving node compares its own value with the value in the received message and passes the higher of the two up the tree. In the final step, node a receives the values, chooses the highest one, and designates that node as coordinator; a then broadcasts a message to the entire network with the new coordinator's address. If all of R's neighbors already have parents, R is a leaf; otherwise it waits for its children to forward the message to their neighbors. When R has collected acknowledgements from all its neighbors, it acknowledges the message from Q. Acknowledgements flow back up the tree to the original source, and at each stage the “most eligible” or “best” node is passed along from child to parent. Once the source node has received all the replies, it can choose the new coordinator; when the selection is made, it is broadcast to all nodes in the network.

What about very large networks? More than one node is selected; these nodes are called super-peers (or supernodes). Nodes are organized into peers and super-peers: elections are held within each peer group, the super-peers coordinate among themselves, and each super-peer keeps its own “internal” network up to date.

Advanced Topics in Concurrency and Reactive Programming: Time and State Majeed Kassis

Example of a global snapshot!

But that was easy: in our system of world leaders, we were able to capture their “state” (i.e., likeness) easily, because they were synchronized in space and synchronized in time. How would we take a global snapshot if the leaders were all at home? And what if Obama told Trudeau that he should really put on a shirt? That message is part of our system state!

Global snapshot is global state Each distributed application has a number of processes (leaders) running on a number of physical servers. These processes communicate with each other via channels (text messaging). A snapshot captures the local states of each process (e.g., program variables) along with the state of each communication channel.

Why do we need Global State? There are innumerable uses, for instance: finding the total number of files in a distributed file system where files may be moved from one file server to another; finding the total space occupied by files in such a distributed file system; and, in general, detecting global properties of the distributed system such as garbage, deadlock, or termination.

Global State: the states of the participating PROCESSES, together with the states of the CHANNELS through which data (i.e., the files) pass when being transferred between these processes.

Example: Distributed Garbage Collection. (Figure: two processes exchanging a message, with a garbage object and an object reference.) A garbage collector frees up memory which is no longer in use by checking whether a reference to that memory still exists. What about in a distributed system? A distributed system consists of multiple processes, each located on a different computer, with no sharing of processors or memory, and each process can only determine its own “state”. Garbage: an object is considered to be garbage if there are no longer any references to it anywhere in the distributed system.

How to record snapshots? Simple solution: create a new process that collects the states of every other process; every process saves its state at a specific time and sends it to this collector. Problem? This is based on the assumption that all processes work on a synchronized global clock, and that does not hold. Example: each process records its state when its own clock reads 1 PM. Process p's clock reaches 1 PM before it sends message m, so its recorded state has no record of sending m; process q's clock reaches 1 PM only after it has received m, so its recorded state does show the receipt. The resulting global state does not show p sending m, so there is confusion as to where m came from; this breaks the consistency concept.

Example – Global Clock Issue. Account A holds $400 and account B holds $300, so the true total is $700. The picture of A is taken while it still holds $400; A then sends $100 to B; the picture of B is taken after the transfer arrives, showing $400. The recorded total is $800: the in-flight $100 has been counted at both A and B.

Consistent Picture Let us consider the happened-before relation. If e1 ➝ e2 then e1 happened before e2 and could have caused it. A consistent picture of the global state is obtained if we include in our computation a set of possible events, H, such that: ei ∈ H ∧ ej ➝ ei => ej ∈ H If ei were in H, but ej were not, then the set of events would include the effect of an event (for instance, the receipt of a file), but not the event causing it (the sending of the file), and an inconsistent picture would arise.

Consistent Global State. The consistent GLOBAL STATE is then defined by GS(H) = the state of each process pi after pi's last event in H, plus, for each channel, the sequence of messages sent in H but not received in H. Consistent cut: the frontier formed by the last event that has been recorded for each process.
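
As a tiny illustration of the condition from the previous slides, the sketch below checks only the message part of consistency: a cut H may not contain a receive event whose matching send event is missing. The event names and the is_consistent helper are made up for the example.

    # H is a set of event names; messages is a list of (send_event, receive_event)
    # pairs. The cut is (message-)consistent if every receive in H has its send in H.
    def is_consistent(H, messages):
        return all(send in H for send, recv in messages if recv in H)

    # Two processes p and q: p sends m1 to q, then q sends m2 to p.
    messages = [("p_send_m1", "q_recv_m1"), ("q_send_m2", "p_recv_m2")]

    print(is_consistent({"p_send_m1", "q_recv_m1"}, messages))  # True
    print(is_consistent({"q_recv_m1"}, messages))               # False: receive without its send
    print(is_consistent({"p_send_m1"}, messages))               # True: sent but not yet received is fine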

A possible computation

Example: Consistent Cut

Example: Consistent Cut

Example: Inconsistent Cut

How to Construct H? Idea: The CUT and associated (consistent) set of events, H, are constructed by including specific control messages (MARKERS) in the stream of ordinary messages. Remember that we assume that: A transmitted marker will be received (and dealt with) within a FINITE TIME.

Chandy-Lamport algorithm. Problem: record a global snapshot (a state for each process and each channel). Model: N processes in the system with no failures; there are two FIFO unidirectional channels between every process pair; all messages arrive intact and are not duplicated; the only events in the system which can give rise to changes in state are communication events. Later work relaxes these assumptions.

System requirements Taking a snapshot shouldn’t interfere with normal application behavior. Don’t stop sending messages. Don’t stop the application! Each process can record its own state Collect state in a distributed manner Any process can initiate a snapshot

Initiating a snapshot. Say process Pi initiates the snapshot: Pi records its own state and prepares a special marker message (distinct from application messages), sends the marker message to all other processes (using its N-1 outbound channels), and starts recording all incoming messages on channels Cji for j not equal to i.

Propagating a snapshot. For every process Pj (including the initiator), consider a message arriving on channel Ckj. If this is the first marker Pj has seen: Pj records its own state and marks Ckj as empty, sends the marker message to all other processes (using its N-1 outbound channels), and starts recording all incoming messages on channels Cij for i not equal to j or k. Otherwise (Pj has already recorded its state): Pj records the state of Ckj as the set of messages received on it since recording began, and stops recording that channel.

Terminating a snapshot. The snapshot is complete when every process has received a marker (and recorded its own state), and every process has received a marker on each of its N-1 incoming channels (and recorded those channel states). Later, a central server can gather the partial states to build the global snapshot.
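
The following is a compact, single-machine simulation of the marker rules from the last three slides, under the no-failure, FIFO-channel assumptions stated earlier. Channels are in-memory queues, and the names Process, run_until_idle, and the bank-account scenario are purely illustrative; this is a sketch, not a production implementation.

    from collections import deque

    MARKER = "MARKER"   # the special control message, distinct from app messages

    class Process:
        def __init__(self, pid, state):
            self.pid = pid
            self.state = state             # application state (here: an account balance)
            self.inbound = {}              # sender pid -> deque acting as a FIFO channel
            self.outbound = {}             # receiver pid -> Process object
            self.recorded_state = None     # local snapshot, None until recorded
            self.channel_log = {}          # sender pid -> messages recorded for that channel
            self.recording = set()         # inbound channels currently being recorded

        def connect(self, other):
            self.outbound[other.pid] = other
            other.inbound[self.pid] = deque()

        def send(self, to, msg):
            self.outbound[to].inbound[self.pid].append(msg)

        def snapshot(self):                # initiation rule: record state, emit markers
            self.recorded_state = self.state
            for to in self.outbound:
                self.send(to, MARKER)
            self.recording = set(self.inbound)

        def receive(self, frm, msg):
            if msg == MARKER:
                if self.recorded_state is None:
                    self.snapshot()                    # first marker: record own state
                self.channel_log.setdefault(frm, [])   # channel state is now final
                self.recording.discard(frm)            # (empty if nothing was recorded)
            else:
                if frm in self.recording:              # in-flight message: part of the
                    self.channel_log.setdefault(frm, []).append(msg)  # channel's state
                self.state += msg                      # ordinary application processing

    def run_until_idle(procs):
        """Deliver queued messages one at a time until every channel is drained."""
        busy = True
        while busy:
            busy = False
            for proc in procs:
                for frm, chan in proc.inbound.items():
                    if chan:
                        proc.receive(frm, chan.popleft())
                        busy = True

    # Usage: two accounts with $400 and $300; $100 is on the wire from p to q
    # when q initiates the snapshot.
    p, q = Process("p", 400), Process("q", 300)
    p.connect(q); q.connect(p)
    p.state -= 100
    p.send("q", 100)
    q.snapshot()
    run_until_idle([p, q])
    print(p.recorded_state, q.recorded_state, q.channel_log)
    # -> 300 300 {'p': [100]}: the in-flight $100 is recorded as channel state,
    #    so the snapshot still accounts for the full $700.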

Example 1: The Algorithm In Action

Example 2: The Algorithm In Action.

How is the Global Snapshot Then Created? In a practical implementation, the recorded local snapshots must be put together to create a global snapshot of the distributed system. How? Several policies are possible: each process sends its local snapshot to the initiator of the algorithm; or each process sends the information it records along all of its outgoing channels, and each process receiving such information for the first time propagates it along its own outgoing channels.

How is that possible?!

The algorithm finds a global state based on a partial ordering ➝ of events. For instance, we know that e1 ➝ e3 and e2 ➝ e5 BUT we have no knowledge about the timing relationship of e3 and e5. With respect to ➝ , e3 and e5 are incomparable! We cannot determine what the true sequence of these events is!

So Why Record the Global State? Stable property: a property that persists once it holds, such as termination or deadlock. Idea: if a stable property holds in the system before the snapshot begins, it holds in the recorded global snapshot. A recorded global state is therefore useful for DETECTING STABLE PROPERTIES.