PADS 2002 Conservative Simulation using Distributed-Shared Memory. Teo, Y. M., Ng, Y. K. and Onggo, B. S. S., Department of Computer Science, National University of Singapore.



Objectives
Improve the performance of SPaDES/Java by reducing overhead:
- Synchronization of events
- Distributed communications
Study the memory requirements of parallel simulations.

Presentation Outline
- Parallel Simulation
- Null Message Protocol
- Performance Improvement
- Memory Requirement
- Conclusion

Parallel Simulation
A sequential simulation executes on a single thread on one processor. Ideally, parallelizing the simulation improves its real-time performance, since the workload is distributed across processors. However, causality must be maintained throughout a parallel simulation, which requires event synchronization protocols. These protocols add inter-process communication and become a new bottleneck.

Null Message Protocol
First proposed by Chandy and Misra (1979) to prevent deadlock between logical processes (LPs). At the end of every simulation pass, LP_i sends a null message to each of its neighbours, with timestamp equal to the local virtual time of LP_i. The timestamp T on a null message indicates that the source LP will not send any message to other LPs with timestamp earlier than T.
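The rule above can be sketched as follows. This is a minimal illustration, not the SPaDES/Java implementation; the class and field names (`LogicalProcess`, `localVirtualTime`, `neighbours`) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of CMB null-message sending (illustrative names only,
// not the SPaDES/Java API).
class LogicalProcess {
    final int id;
    double localVirtualTime = 0.0;      // this LP's simulation clock
    double lastNullReceived = 0.0;      // latest null-message timestamp seen
    final List<LogicalProcess> neighbours = new ArrayList<>();

    LogicalProcess(int id) { this.id = id; }

    // At the end of every simulation pass, promise each neighbour that no
    // message with timestamp earlier than our local virtual time will follow.
    // (Implementations typically add the LP's lookahead to this promise.)
    void sendNullMessages() {
        for (LogicalProcess n : neighbours) {
            n.receiveNull(localVirtualTime);
        }
    }

    void receiveNull(double timestamp) {
        // A null message lets the receiver safely advance up to `timestamp`.
        lastNullReceived = Math.max(lastNullReceived, timestamp);
    }
}
```

The promise carried by the null message is what lets a blocked neighbour advance its own clock without waiting for a real event.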

[Figure: Null Message Protocol — an LP with clock = 4 and FEL entries timestamped 4 and 7, exchanging null messages with neighbouring LPs]

Performance Improvement
Chandy-Misra-Bryant's (CMB) protocol performs poorly due to high null message overhead: it transmits null messages on every simulation pass, so the null message ratio (NMR) approaches 1 over nearly all of [0, T). Optimizations incorporated:
- Carrier-null message scheme
- Flushing mechanism
- Demand-driven null message algorithm
- Remote communications using JavaSpaces
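The demand-driven variant in the list above inverts the sending rule: an LP emits a null message only when a blocked neighbour requests one, rather than on every pass. A sketch, with hypothetical names not taken from SPaDES/Java:

```java
// Demand-driven null messaging (sketch): the LP answers explicit requests
// from blocked neighbours instead of broadcasting on every simulation pass.
class DemandDrivenLP {
    double localVirtualTime = 0.0;
    int nullMessagesSent = 0;           // counts replies, for comparison with CMB

    // A blocked neighbour asks for a time promise; reply with our local
    // virtual time, as in the basic protocol, but only on demand.
    double onNullRequest() {
        nullMessagesSent++;
        return localVirtualTime;
    }
}
```

Because no null message is generated while every neighbour is making progress, this is one source of the NMR reductions reported later in the talk.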

Carrier-Null Message Algorithm
The basic protocol has a problem with cyclic topologies. The carrier-null message algorithm (Wood, Turner, 1996) avoids transmitting redundant null messages around such cycles.

Performance Improvement
[Figure: demand-driven null messaging plus flushing — Logical Process (A) sends a REQ on a request channel to Logical Process (B), which holds a FEL and a Flusher; a message timestamped 30 is returned on the output channel]

Performance Evaluation
Experiments were conducted on a PC cluster of 8 nodes running RedHat Linux 7.0. Each node has a Pentium II 400 MHz processor with 256 MB of memory; the nodes are connected by a 100 Mbps switch. Two benchmark programs were used:
- PHOLD system
- Linear Pipeline

[Figure: PHOLD (3x3, m) — a closed system of nodes]

[Figure: Linear Pipeline (4, λ) — an open system of four service centers in series; customers arrive from a population, pass through each service center, and depart]

[Graph: PHOLD (n x n, m) results for CMB, + Carrier-Null, + Flushing, + Demand-driven null messaging]

[Graph: Linear Pipeline (n, λ) results for CMB, + Carrier-Null, + Flushing, + Demand-driven null messaging]

Performance Summary
Percentage reduction in NMR relative to CMB:
- PHOLD system: Carrier-null 30%; with flushing 42%; with demand-driven null messaging 55%
- Linear Pipeline: Carrier-null 0%; with flushing 23%; with demand-driven null messaging 35%

Distributed Communications
Originally, SPaDES/Java used the RMI library to transmit messages between remote LPs, but the serialization phase is a bottleneck. A previous performance optimization effort used message deflation. The only remaining way to reduce remote communication overhead is to send fewer messages. How? Target the null messages.

JavaSpaces
A Java-Jini service developed by Sun Microsystems, built on top of Java RMI, that mimics a tuple space. It is an abstract platform for developing complex distributed applications and provides distributed data persistence. A space holds objects, known as entries, with variable attribute types; the key concept is matching on attribute types and values.

JavaSpaces
A JavaSpace supports four generic operations: write, read, take, and notify.
[Figure: a client issuing write, read, and take operations on the space, with a Notifier delivering notify events]
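The matching semantics behind these operations can be mimicked with a small in-memory sketch. This is not the real Jini `JavaSpace` API (which requires a running Jini infrastructure); `MiniSpace`, `Entry`, and their fields are illustrative stand-ins, and `notify` is omitted for brevity. An entry matches a template when every non-null template field equals the entry's field; a null field acts as a wildcard.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// In-memory stand-in for a tuple space (not the Jini JavaSpace API).
class MiniSpace {
    static class Entry {
        final String kind;     // e.g. "NullMsg" or "SProcess"
        final Integer sender;  // null in a template means "any sender"

        Entry(String kind, Integer sender) {
            this.kind = kind;
            this.sender = sender;
        }

        boolean matches(Entry template) {
            return (template.kind == null || template.kind.equals(kind))
                && (template.sender == null || template.sender.equals(sender));
        }
    }

    private final List<Entry> entries = new LinkedList<>();

    void write(Entry e) { entries.add(e); }   // deposit an entry in the space

    Entry read(Entry template) {              // copy out a match, leave it in place
        for (Entry e : entries)
            if (e.matches(template)) return e;
        return null;
    }

    Entry take(Entry template) {              // remove and return a match
        for (Iterator<Entry> it = entries.iterator(); it.hasNext(); ) {
            Entry e = it.next();
            if (e.matches(template)) { it.remove(); return e; }
        }
        return null;
    }
}
```

In the real API the blocking variants of read and take wait until a matching entry appears, which is what makes the space usable as a message channel between LPs.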

Distributed Communications
Replace the RMI communication module in SPaDES/Java with one running on a single JavaSpace. We use a FrontEndSpace, which permits crash recovery of entries in the space. Processes and null messages transmitted between remote hosts travel through the FrontEndSpace as space entries.

[Figure: Space Communications: Processes — at time 0 and time t > 0, LP1 and LP2 exchange SProcess entries (with sender and receiver attributes) through the space]

[Figure: Space Communications: Null Messages — NullMsg and Req entries carrying a sender attribute flow between LP1, LP2, LP3 and LP4 through the space]

Performance Evaluation — PHOLD (n x n, m)
[Graph: RMI vs. JavaSpace on 4 and 8 processors]

Overall Performance Evaluation — PHOLD (n x n, m)
[Graph: CMB, + Carrier-Null, + Flushing, + Demand-driven null messaging, JavaSpace (4 processors), JavaSpace (8 processors)]

Performance Summary
Percentage reduction in NMR relative to CMB:
- Carrier-null: 30%
- With flushing: 42%
- With demand-driven null messaging: 55%
- JavaSpace (4 processors): 63%
- JavaSpace (8 processors): 74%

Memory Requirement
M_prob = Σ_{i=1..n} MaxQueueSize(LP_i)
M_ord  = Σ_{i=1..n} MaxFELSize(LP_i)
M_sync = Σ_{i=1..n} MaxNullMsgBufferSize(LP_i)
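Each of the three totals above is a plain sum of per-LP maxima. As a sketch (the class and method names are illustrative, not part of SPaDES/Java):

```java
// Each component of memory demand is a sum of per-LP observed maxima.
class MemoryRequirement {
    // maxQueue[i], maxFel[i], maxNullBuf[i] hold the maxima for LP i.
    static int mProb(int[] maxQueue)   { return sum(maxQueue);   } // problem memory
    static int mOrd(int[] maxFel)      { return sum(maxFel);     } // event-ordering memory (FELs)
    static int mSync(int[] maxNullBuf) { return sum(maxNullBuf); } // synchronization memory

    private static int sum(int[] xs) {
        int total = 0;
        for (int x : xs) total += x;
        return total;
    }
}
```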

Memory Requirement
[Graph: memory requirement results]

Achievements & Conclusion
- Enhanced the performance of SPaDES/Java through various synchronization optimizations, achieving an excellent NMR of below 30%.
- Implemented a new discrete-event simulation library based on the concept of shared memory in a JavaSpace.
- Implemented a TSA in SPaDES/Java that can be used as a testbed for memory-usage studies in parallel simulations.

Acknowledgments
- Port of Singapore Authority (PSA)
- Ministry of Education, Singapore
- Constructive feedback from the referees

References
- SPaDES/Java homepage
- Current project webpage
- MSG homepage