Parallel and Distributed Simulation
Deadlock Detection & Recovery: Performance; Barrier Mechanisms
Outline
Deadlock detection and recovery algorithm
–Empirical performance measurements
Synchronous algorithms
–Barrier mechanisms
–Centralized barriers
–Tree barrier
–Butterfly barrier
Performance
T = arrival time of job
Q = waiting time in queue
S = service time
Example: tandem first-come-first-serve queues
–“Classical” approach (lookahead?): LP 1 sends the arrival event to LP 2 only when it processes the departure event at simulated time T+Q+S
–Optimized to exploit lookahead: maintain a variable indicating the departure time of the previous job, so when a job arrives at time T, LP 1 already knows when it begins service (T+Q) and can immediately send the arrival event to LP 2 with timestamp T+Q+S
[Figure: timelines for LP 1 and LP 2 showing arrival at T, begin service at T+Q, and departure at T+Q+S under the classical and lookahead-optimized approaches]
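A minimal sketch of the lookahead-optimized approach (the FcfsLP class and on_arrival function are illustrative names, not from the slides): because the queue is first-come-first-serve and the service time is known on arrival, the departure timestamp T+Q+S can be computed and sent downstream right away rather than when simulated time reaches T+Q+S.

```python
# Minimal sketch, assuming an illustrative FcfsLP logical process (not from the slides).
def on_arrival(lp, arrival_time, service_time):
    # Waiting time Q: the job waits if the previous job has not yet departed.
    begin_service = max(arrival_time, lp.prev_departure_time)   # = T + Q
    departure_time = begin_service + service_time               # = T + Q + S
    lp.prev_departure_time = departure_time                     # remember for the next job
    # Lookahead: the departure timestamp is already known, so the arrival event
    # for the downstream LP can be sent now, at simulated time T.
    lp.send_downstream(("arrival at LP 2", departure_time))

class FcfsLP:
    def __init__(self):
        self.prev_departure_time = 0.0
        self.sent = []
    def send_downstream(self, event):
        self.sent.append(event)

lp1 = FcfsLP()
on_arrival(lp1, arrival_time=5.0, service_time=2.0)   # departs at 7.0
on_arrival(lp1, arrival_time=6.0, service_time=2.0)   # waits until 7.0, departs at 9.0
print(lp1.sent)   # [('arrival at LP 2', 7.0), ('arrival at LP 2', 9.0)]
```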
Efficiency of Queueing Network Simulation
[Figure: efficiency of a parallel simulation of a central server queueing network (with fork and merge stations), deadlock detection and recovery algorithm, 5 processors]
Speedup of Queueing Network Simulation
[Figure: speedup, deadlock detection and recovery algorithm, 5 processors]
Exploiting lookahead is essential to obtain good performance
Synchronous Execution
Basic idea: each process cycles through the following steps:
–Determine the events that are safe to process
–Process events, exchange messages
–Global synchronization (barrier)
Messages generated in one cycle are not eligible for processing until the next cycle
Issues:
–Barrier mechanism, transient messages
–Determining safe events
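A minimal sketch of this cycle using Python threads (the ring-shaped message pattern, the fixed cycle count, and the second barrier used to publish messages are illustrative choices, not part of the algorithm as stated): messages generated in cycle k are only moved into a process's inbox after the barrier, so they become processable in cycle k+1.

```python
import threading

N_PROCS = 4
N_CYCLES = 3
barrier = threading.Barrier(N_PROCS)
inbox = [[] for _ in range(N_PROCS)]      # messages eligible for processing this cycle
pending = [[] for _ in range(N_PROCS)]    # messages generated during this cycle

def process(pid):
    for cycle in range(N_CYCLES):
        # 1. process the events that are safe/eligible this cycle
        for msg in inbox[pid]:
            pass                          # event processing would go here
        # 2. exchange messages (each pending list has exactly one writer in this ring)
        pending[(pid + 1) % N_PROCS].append((pid, cycle))
        # 3. global synchronization: wait until every process finishes the cycle
        barrier.wait()
        # publish this cycle's messages so they become eligible next cycle
        inbox[pid], pending[pid] = pending[pid], []
        barrier.wait()                    # second barrier: no sends until publishing is done

threads = [threading.Thread(target=process, args=(p,)) for p in range(N_PROCS)]
for t in threads: t.start()
for t in threads: t.join()
```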
Barrier Synchronization
Barrier synchronization: when a process invokes the barrier primitive, it blocks until all other processes have also invoked the barrier primitive. When the last process invokes the barrier, all processes are released and resume execution.
[Figure: four processes over wallclock time, each blocking (waiting) at the barrier until the last one arrives]
Barrier Implementation: Centralized Message-Passing Approach
Central controller used to implement the barrier
2-step process:
–Determine when the barrier has been reached
–Broadcast a message to release processes from the barrier
Barrier primitive for non-controller processes:
–Send a message to the central controller
–Wait for a reply
Barrier primitive for the controller process:
–Receive barrier messages from the other processes
–When a message has been received from each process, broadcast a message to release the barrier
Performance:
–Controller must send and receive N-1 messages
–Potential bottleneck
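A minimal sketch of the centralized barrier, with the send/receive messages modeled as Python queues between threads (the queue-based transport and process 0 acting as the controller are illustrative assumptions):

```python
import threading, queue

N = 4                                        # process 0 acts as the central controller
to_controller = queue.Queue()                # "barrier reached" messages
release = [queue.Queue() for _ in range(N)]  # per-process release channels

def barrier(pid):
    if pid == 0:
        for _ in range(N - 1):               # receive a message from every other process
            to_controller.get()
        for q in range(1, N):                # broadcast the release (N-1 sends)
            release[q].put("go")
    else:
        to_controller.put(pid)               # tell the controller we reached the barrier
        release[pid].get()                   # block until the controller replies

def worker(pid):
    print(f"process {pid} reached the barrier")
    barrier(pid)
    print(f"process {pid} released")

threads = [threading.Thread(target=worker, args=(p,)) for p in range(N)]
for t in threads: t.start()
for t in threads: t.join()
```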
Broadcast Barrier
1-step approach:
–Each process broadcasts a message when it reaches the barrier
–Each process waits until a message is received from every other process
N(N-1) messages
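A corresponding sketch of the broadcast barrier under the same queue-as-message-channel assumption; each process sends N-1 messages and waits for N-1 messages, giving N(N-1) messages overall:

```python
import threading, queue

N = 4
inbox = [queue.Queue() for _ in range(N)]    # one message channel per process

def barrier(pid):
    for q in range(N):                       # broadcast: tell every other process
        if q != pid:
            inbox[q].put(pid)
    for _ in range(N - 1):                   # wait for a message from every other process
        inbox[pid].get()

def worker(pid):
    barrier(pid)
    print(f"process {pid} passed the barrier")

threads = [threading.Thread(target=worker, args=(p,)) for p in range(N)]
for t in threads: t.start()
for t in threads: t.join()
```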
Tree Barrier
Organize the processes into a tree
A process sends a message to its parent process when
–The process has reached the barrier point, and
–A message has been received from each of its child processes
The root detects completion of the barrier, then broadcasts a message to release the processes (e.g., sends messages down the tree)
2 log N time if all processes reach the barrier at the same time
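A minimal sketch of the tree barrier with queue-based messages; the binary-heap parent/child numbering is an illustrative choice, and the release message is propagated back down the same tree:

```python
import threading, queue

N = 8
up = [queue.Queue() for _ in range(N)]       # child -> parent "barrier reached" messages
down = [queue.Queue() for _ in range(N)]     # parent -> child release messages

def children(p):
    return [c for c in (2 * p + 1, 2 * p + 2) if c < N]

def parent(p):
    return (p - 1) // 2

def barrier(pid):
    for _ in children(pid):                  # wait until every child has reached the barrier
        up[pid].get()
    if pid != 0:
        up[parent(pid)].put(pid)             # report to the parent
        down[pid].get()                      # wait for the release from the parent
    for c in children(pid):                  # release the subtree (the root starts this)
        down[c].put("go")

def worker(pid):
    barrier(pid)
    print(f"process {pid} released")

threads = [threading.Thread(target=worker, args=(p,)) for p in range(N)]
for t in threads: t.start()
for t in threads: t.join()
```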
Butterfly Barrier
N processes (here, assume N is a power of 2)
Sequence of log2 N pairwise barriers (let k = log2 N)
Pairwise barrier:
–Send a message to the partner process
–Wait until a message is received from that process
Process p: b_k b_(k-1) … b_1 = binary representation of p
Step i: perform a pairwise barrier with process b_k … b_i' … b_1 (complement the ith bit of the binary representation)
Example: process 3 (011)
–Step 1: pairwise barrier with process 2 (010)
–Step 2: pairwise barrier with process 1 (001)
–Step 3: pairwise barrier with process 7 (111)
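A minimal sketch of one invocation of the butterfly barrier, again modeling messages as per-step queues (an illustrative transport); XOR-ing the process id with 1 << i complements the bit the slides call bit i+1 to find that step's partner:

```python
import threading, queue

N = 8                                        # must be a power of 2
K = N.bit_length() - 1                       # k = log2 N pairwise-barrier steps
inbox = [[queue.Queue() for _ in range(N)] for _ in range(K)]  # per-step channels

def barrier(pid):
    for i in range(K):
        partner = pid ^ (1 << i)             # complement bit i+1 (slides number bits from 1)
        inbox[i][partner].put(pid)           # pairwise barrier: send to the partner...
        inbox[i][pid].get()                  # ...and wait for the partner's message

def worker(pid):
    barrier(pid)
    print(f"process {pid} passed the butterfly barrier")

threads = [threading.Thread(target=worker, args=(p,)) for p in range(N)]
for t in threads: t.start()
for t in threads: t.join()
```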
Butterfly Barrier Example
[Figure: the three steps of pairwise barriers among 8 processes, shown over wallclock time]
The communication pattern forms a tree from the perspective of any process
Butterfly: Superimpose Trees
After log2 N steps, each process is notified that the barrier operation has completed
Summary
Deadlock detection and recovery algorithm
–Performance critically dependent on lookahead
Barrier mechanisms
–Simple barriers using broadcast or a central controller are OK for a small number of processors
–Tree or butterfly barriers give more scalable performance