Synchronous Algorithms I Barrier Synchronizations and Computing LBTS.

Outline
Barrier synchronizations and a simple synchronous algorithm
Implementation of Barrier mechanisms
–Centralized Barriers
–Tree Barrier
–Butterfly Barrier
Computing LBTS

Barrier Synchronization
Barrier synchronization: when a process invokes the barrier primitive, it blocks until all other processes have also invoked the barrier primitive. When the last process invokes the barrier, all processes can resume execution.
[Figure: four processes along a wallclock-time axis; each waits at the barrier until the last process arrives, then all proceed.]

Synchronous Execution
Recall the goal is to ensure each LP processes events in time stamp order.
Basic idea: each process cycles through the following steps:
Determine the events that are safe to process
–Compute a Lower Bound on the Time Stamp (LBTS_i) of events that LP_i might later receive
–Events with time stamp ≤ LBTS are safe to process
Process safe events, exchange messages
Global synchronization (barrier)
Messages generated in one cycle are not eligible for processing until the next cycle.

A Simple Synchronous Algorithm
Assume any LP can communicate with any other LP
Assume instantaneous message transmission (revisited later)
N_i = time of next event in LP_i
LA_i = lookahead of LP_i

WHILE (unprocessed events remain)
    receive messages generated in previous iteration
    LBTS = min(N_i + LA_i) over all LPs
    process events with time stamp ≤ LBTS
    barrier synchronization
END WHILE

LBTS_i = a lower bound on the time stamp of messages LP_i might receive in the future. If LBTS_i is the same for all LPs, it is simply called LBTS.
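The loop above can be sketched as a small sequential simulation in Python. This is only an illustration of the control flow, not an implementation: the function name and the `lps` data layout are assumptions, and message exchange between LPs is omitted (all events sit in the initial per-LP heaps).

```python
import heapq

def run_synchronous(lps):
    """Sequentially simulate the simple synchronous algorithm.

    lps maps an LP name to a dict with 'lookahead' (LA_i) and
    'events', a list of (timestamp, event) pairs. The layout is
    illustrative; message exchange between LPs is omitted.
    """
    for lp in lps.values():
        heapq.heapify(lp["events"])
    processed = []
    while any(lp["events"] for lp in lps.values()):
        # LBTS = min over all LPs of (next event time N_i + lookahead LA_i)
        lbts = min(lp["events"][0][0] + lp["lookahead"]
                   for lp in lps.values() if lp["events"])
        # Process events with time stamp <= LBTS, then (implicitly)
        # barrier-synchronize before the next iteration.
        for name, lp in lps.items():
            while lp["events"] and lp["events"][0][0] <= lbts:
                ts, ev = heapq.heappop(lp["events"])
                processed.append((ts, name, ev))
    return processed
```

Each pass through the outer loop corresponds to one cycle of the algorithm: with two LPs A (LA=3, next event at 1) and B (LA=2, next event at 2), the first iteration computes LBTS = min(1+3, 2+2) = 4 and processes only the events at times 1 and 2.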

Synchronous Execution Example
[Figure: events for LP A (LA=3), LP B (LA=2), LP C (LA=3), and LP D (LA=5) on a simulation-time axis. Successive iterations compute LBTS=3, LBTS=7, and LBTS=12; in each iteration, events with time stamp ≤ LBTS are safe to process.]

Issues
Implementing the barrier mechanism
Computing LBTS (global minimum)
Transient messages (discussed later)

Barrier Using a Centralized Controller
A central controller process is used to implement the barrier. Overall, a two-step process:
–Controller determines when the barrier is reached
–Broadcast message to release processes from the barrier
Barrier primitive for non-controller processes:
–Send a message to the central controller
–Wait for a reply
Barrier primitive for the controller process:
–Receive barrier messages from the other processes
–When a message has been received from each process, broadcast a message to release the barrier
Performance:
–Controller must send and receive N-1 messages
–Potential bottleneck

Broadcast Barrier
One-step approach; no central controller. Each process:
–Broadcasts a message when the barrier primitive is invoked
–Waits until a message is received from each other process
N (N-1) messages

Tree Barrier
Organize processes into a tree. A process sends a message to its parent process when:
–The process has reached the barrier point, and
–A message has been received from each of its children processes
The root detects completion of the barrier and broadcasts a message to release the processes (e.g., sends messages down the tree)
2 log N time if all processes reach the barrier at the same time

Butterfly Barrier
N processes (here, assume N is a power of 2)
Sequence of log2 N pairwise barriers (let k = log2 N)
Pairwise barrier:
–Send message to partner process
–Wait until message is received from that process
Let b_k b_(k-1) … b_1 be the binary representation of process p
Step i: perform a pairwise barrier with process b_k … b_i' … b_1 (the i-th bit of the binary representation is complemented)
Example: Process 3 (011)
–Step 1: pairwise barrier with process 2 (010)
–Step 2: pairwise barrier with process 1 (001)
–Step 3: pairwise barrier with process 7 (111)
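Complementing the i-th bit is just an XOR with 2^(i-1), so the partner sequence is easy to compute; a small sketch, with the function name assumed for illustration:

```python
def butterfly_partners(p, n):
    """Partners of process p in a butterfly barrier over n = 2**k processes.

    At step i (counting from 1), the i-th least significant bit of p's
    binary representation is complemented, i.e., XORed with 2**(i-1).
    """
    k = n.bit_length() - 1  # k = log2(n)
    return [p ^ (1 << (i - 1)) for i in range(1, k + 1)]
```

For process 3 of 8 this reproduces the example above: partners 2, 1, and 7.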

Butterfly Barrier Example
[Figure: eight processes performing a butterfly barrier in three pairwise-barrier steps along wallclock time.]
The communication pattern forms a tree from the perspective of any process

Butterfly: Superimpose Trees
After log2 N steps, each process is notified that the barrier operation has completed.
An N-node butterfly can be viewed as N trees superimposed over each other.

Outline
Barrier synchronizations and a simple synchronous algorithm
Implementation of Barrier mechanisms
–Centralized Barriers
–Tree Barrier
–Butterfly Barrier
Computing LBTS

Computing LBTS
[Figure: butterfly communication pattern over three steps along wallclock time, carrying minimum values.]
It is trivial to extend any of these barrier algorithms to also compute a global minimum (LBTS):
–Piggyback the local time value on each barrier message
–Compute a new minimum among the local value and incoming message(s)
–Transmit the new minimum in the next step
After log N steps, (1) LBTS has been computed, and (2) each process has the LBTS value
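The piggybacking can be sketched as a round-based simulation of the butterfly: at each step, every process replaces its value with the minimum of its own value and its partner's message. The function name `butterfly_lbts` is hypothetical, and the rounds are simulated synchronously in one loop.

```python
def butterfly_lbts(local_mins):
    """Compute the global minimum (LBTS) with a butterfly pattern.

    local_mins[p] is process p's local value (e.g., N_p + LA_p); the
    list length must be a power of two. Returns each process's value
    after log2(N) pairwise exchanges -- all equal to the global min.
    """
    n = len(local_mins)
    vals = list(local_mins)
    k = n.bit_length() - 1
    for i in range(k):                  # one round per bit position
        new_vals = vals[:]
        for p in range(n):
            partner = p ^ (1 << i)      # pairwise barrier partner at step i+1
            # Piggyback: min of our value and the partner's message.
            new_vals[p] = min(vals[p], vals[partner])
        vals = new_vals                 # rounds proceed synchronously
    return vals
```

After log2 N rounds every process holds the same value, so the barrier completes and LBTS is known everywhere simultaneously.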

Summary
Synchronous algorithms use a global barrier to ensure events are processed in time stamp order
–Requires computation of a Lower Bound on Time Stamp (LBTS) on messages each LP might later receive
There are several ways to implement barriers:
–Central controller
–Broadcast
–Tree
–Butterfly
The LBTS computation can be "piggybacked" onto the barrier synchronization algorithm