Parallel Simulations on High-Performance Clusters C.D. Pham RESAM laboratory Univ. Lyon 1, France

Outline
Background
  – Discrete Event Simulation (DES)
  – Parallel DES and the synchronization problems
The CSAM Tool
  – Architecture of the simulator kernel
  – The communication network model
Results
  – On mono-processor cluster
  – On multi-processor cluster

Simulation
To simulate is to reproduce the behavior of a physical system with a model
In practice, computers are used to numerically simulate a logical model
Simulations are used for performance evaluation and prediction of complex systems
  – fluid dynamics, chemical reactions (continuous)
  – communication network models: routing, congestion avoidance, mobile… (discrete)
Simulation is more flexible than analytical methods

Discrete Event Simulation (DES)
assumption: a system changes its state at discrete points in simulation time
[Figure: timeline with events a1–a4 and d1–d3, states S1–S3, and a time axis marked 0, t, 2t, …, 6t (time-step)]

DES concepts
fundamental concepts:
  – system state (variables)
  – state transitions (events)
  – simulation time: totally ordered set of values representing time in the system being modeled
the system state can only be modified upon reception of an event
modeling can be
  – event-oriented
  – process-oriented

Life cycle of a DES
a DES system can be viewed as a collection of simulated objects and a sequence of event computations
each event computation contains a time stamp indicating when that event occurs in the physical system
each event computation may:
  – modify state variables
  – schedule new events into the simulated future
events are stored in a local event list
  – events are processed in time-stamp order
  – usually, no more events = termination
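The life cycle above can be sketched as a small sequential kernel. This is a minimal illustration, not the kernel of any particular simulator: the `Simulator` class, its `schedule`/`run` methods and the arrival/departure handlers are hypothetical names chosen for the example, and the FIFO tie-break for equal timestamps is an assumption.

```python
import heapq
import itertools


class Simulator:
    """Minimal sequential DES kernel: a clock plus a time-ordered event list."""

    def __init__(self):
        self.now = 0.0
        self._event_list = []          # heap of (timestamp, seq, handler, payload)
        self._seq = itertools.count()  # assumed tie-break: equal timestamps run FIFO

    def schedule(self, delay, handler, payload=None):
        # Events may only be scheduled at the current time or in the simulated future.
        if delay < 0:
            raise ValueError("cannot schedule an event in the past")
        heapq.heappush(self._event_list,
                       (self.now + delay, next(self._seq), handler, payload))

    def run(self):
        # Process events in time-stamp order; an empty event list means termination.
        while self._event_list:
            self.now, _, handler, payload = heapq.heappop(self._event_list)
            handler(self, payload)


def demo():
    trace = []

    def arrival(sim, name):
        trace.append((sim.now, "arrival", name))  # modify state
        sim.schedule(2.0, departure, name)        # schedule a new future event

    def departure(sim, name):
        trace.append((sim.now, "departure", name))

    sim = Simulator()
    sim.schedule(3.0, arrival, "job-2")
    sim.schedule(1.0, arrival, "job-1")
    sim.run()
    return trace
```

Calling `demo()` returns the events in time-stamp order even though they were scheduled out of order, because the heap always yields the smallest timestamp first.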

A simple DES model
[Figure: local event list; nodes A and B joined by a link labelled 5]
link model delay = 5
send processing time = 5
receive processing time = 1
packet arrival: P1 at 5, P2 at 12, P3 at 22
events:
  – e1: A receives packet P1
  – e2: A sends P1 to B
  – e3: A receives packet P2
  – e4: B receives P1 from A
  – e5: B sends ACK(P1) to A
  – e6: A sends P2 to B
  – e7: A receives ACK(P1)
  – e8: B receives P2 from A
  – e9: A receives packet P3
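The model above can be replayed with a single event list. The delay values and arrival times come from the slide; how they chain together is an assumption made for this sketch (A sends a packet one send-processing time after it arrives, B acknowledges one receive-processing time after reception, and every transfer takes one link delay). Under these assumed rules the events come out in the same order as the slide's labels e1 to e9, with the last two tied at time 22. The function name `simulate` and the event labels are illustrative.

```python
import heapq
import itertools

LINK_DELAY = 5        # from the slide
SEND_PROCESSING = 5   # from the slide
RECV_PROCESSING = 1   # from the slide
ARRIVALS = {"P1": 5, "P2": 12, "P3": 22}  # packet arrival times at A, from the slide


def simulate(end_time=22):
    event_list = []
    seq = itertools.count()  # FIFO tie-breaker for equal timestamps (assumption)
    trace = []

    def schedule(time, kind, packet):
        heapq.heappush(event_list, (time, next(seq), kind, packet))

    for packet, time in ARRIVALS.items():
        schedule(time, "A receives packet", packet)

    # Stop once the next event lies beyond end_time (the slide stops at e9, t = 22).
    while event_list and event_list[0][0] <= end_time:
        now, _, kind, packet = heapq.heappop(event_list)
        trace.append((now, f"{kind} {packet}"))
        # Assumed wiring of the delays: the slide gives the values, not the rules.
        if kind == "A receives packet":
            schedule(now + SEND_PROCESSING, "A sends to B", packet)
        elif kind == "A sends to B":
            schedule(now + LINK_DELAY, "B receives from A", packet)
        elif kind == "B receives from A":
            schedule(now + RECV_PROCESSING, "B sends ACK for", packet)
        elif kind == "B sends ACK for":
            schedule(now + LINK_DELAY, "A receives ACK for", packet)
    return trace
```

The two events at time 22 are simultaneous, so their relative order depends only on the tie-breaking rule.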

Why does it work?
events are processed in time stamp order
an event at time t can only generate future events with timestamp greater than or equal to t (no event in the past)
generated events are inserted into the event list, sorted by timestamp
  – the event with the smallest timestamp is always processed first,
  – causality constraints are implicitly maintained.

Why change? It's so simple!
models become larger and larger
the simulation time is overwhelming or the simulation is simply intractable
examples:
  – parallel programs with millions of lines of code,
  – mobile networks with millions of mobile hosts,
  – ATM networks with hundreds of complex switches,
  – multicast models with thousands of sources,
  – ever-growing Internet,
  – and much more...

Some figures to convince...
ATM network models
  – simulation at the cell level,
  – 200 switches,
  – 1000 traffic sources, 50Mbits/s,
  – 155Mbits/s links,
  – 1 simulation event per cell arrival.
  – simulation time increases as link speed increases,
  – usually more than 1 event per cell arrival,
  – how scalable is traditional simulation?
More than 26 billion events to simulate 1 second!
30 hours if 1 event is processed in 1 µs

Parallel simulation - principles
execution of a discrete event simulation on a parallel or distributed system with several physical processors.
the simulation model is decomposed into several sub-models that can be executed in parallel
  – spatial partitioning,
  – temporal partitioning,
radically different from simple simulation replications.

Parallel simulation - pros & cons
pros
  – reduction of the simulation time,
  – increase of the model size,
cons
  – causality constraints are difficult to maintain,
  – need for special mechanisms to synchronize the different processors,
  – increases the complexity of both the model and the simulation kernel.
challenges
  – ease of use, transparency.

Parallel simulation - example
[Figure: logical processes (LP) running in parallel; labels: packet, event]

A simple PDES model
[Figure: local event list; nodes A and B joined by a link labelled 5; one point on the time axis t is marked "causality error, violation"]
link model delay = 5
send processing time = 5
receive processing time = 1
packet arrival: P1 at 5, P2 at 12, P3 at 22
events:
  – e1: A receives packet P1
  – e2: A sends P1 to B
  – e3: A receives packet P2
  – e4: B receives P1 from A
  – e5: B sends ACK(P1)
  – e6: A sends P2 to B
  – e7: A receives ACK(P1)
  – e8: B receives P2 from A
  – e9: A receives packet P3
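A causality error of this kind can be reproduced with one unsynchronized logical process. The sketch below is an illustration under assumptions: the `LogicalProcess` class is hypothetical, the timestamps (5, 12, 22 for the packet arrivals, 21 for the ACK) follow from one plausible reading of the slide's delays, and the choice of the ACK as the late message is an interpretation of the figure rather than something the transcript states.

```python
import heapq


class LogicalProcess:
    """An LP with its own clock and local event list, with no synchronization."""

    def __init__(self, name):
        self.name = name
        self.clock = 0
        self.event_list = []
        self.violations = []

    def receive(self, timestamp, label):
        # A message stamped earlier than the local clock arrives "in the past".
        if timestamp < self.clock:
            self.violations.append((timestamp, label, self.clock))
        heapq.heappush(self.event_list, (timestamp, label))

    def process_all(self):
        # Greedily process every local event without waiting for other LPs.
        processed = []
        while self.event_list:
            self.clock, label = heapq.heappop(self.event_list)
            processed.append((self.clock, label))
        return processed


def demo():
    a = LogicalProcess("A")
    # A's purely local events (assumed timestamps, taken from the packet arrival times).
    for ts, label in [(5, "A rec. packet P1"),
                      (12, "A rec. packet P2"),
                      (22, "A rec. packet P3")]:
        a.receive(ts, label)
    a.process_all()                  # A races ahead to local time 22
    a.receive(21, "A rec. ACK(P1)")  # the ACK from B arrives late, stamped 21
    return a.clock, a.violations
```

`demo()` reports one violation: A has already advanced to time 22 when a message stamped 21 arrives, which is exactly what a synchronization algorithm has to prevent or repair.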

Synchronization problems
fundamental concepts
  – each Logical Process (LP) can be at a different simulation time
  – local causality constraints: events in each LP must be executed in time stamp order
synchronization algorithms
  – Conservative: avoids local causality violations by waiting until it's safe
  – Optimistic: allows local causality violations, but provides mechanisms to recover from them at runtime
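The conservative rule, "wait until it's safe", can be sketched as follows. This is a generic illustration of channel clocks and null messages, not the algorithm of any specific simulator. It assumes that messages on each input channel arrive in non-decreasing timestamp order; the class `ConservativeLP`, the channel names and the lookahead value of 5 are hypothetical.

```python
from collections import deque


class ConservativeLP:
    """LP that only processes events proven safe by its input channels."""

    def __init__(self, input_channels, lookahead):
        self.clock = 0
        self.lookahead = lookahead
        # One FIFO queue and one channel clock per input link.
        self.queues = {ch: deque() for ch in input_channels}
        self.channel_clock = {ch: 0 for ch in input_channels}

    def receive(self, channel, timestamp, payload=None):
        # payload None models a null message: it only advances the channel clock.
        self.channel_clock[channel] = timestamp
        if payload is not None:
            self.queues[channel].append((timestamp, payload))

    def safe_time(self):
        # No future message can carry a timestamp below the smallest channel clock.
        return min(self.channel_clock.values())

    def process_safe_events(self):
        processed = []
        while True:
            bound = self.safe_time()
            heads = [(q[0][0], ch) for ch, q in self.queues.items()
                     if q and q[0][0] <= bound]
            if not heads:
                break  # nothing provably safe: the LP must block
            ts, ch = min(heads)
            _, payload = self.queues[ch].popleft()
            self.clock = ts
            processed.append((ts, payload))
        return processed

    def null_message_timestamp(self):
        # Promise to downstream LPs: nothing will be sent before this time.
        earliest = min(q[0][0] if q else self.channel_clock[ch]
                       for ch, q in self.queues.items())
        return earliest + self.lookahead


def demo():
    lp = ConservativeLP(["from_A", "from_C"], lookahead=5)
    lp.receive("from_A", 15, "P1")
    first = lp.process_safe_events()   # blocked: from_C is still at time 0
    lp.receive("from_C", 20)           # null message from C
    second = lp.process_safe_events()  # the event at 15 is now safe
    return first, second, lp.null_message_timestamp()
```

In `demo()` the event stamped 15 stays blocked until the null message raises the second channel clock to 20. A larger lookahead lets each LP promise more, which reduces the number of null messages needed to make progress.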

CSAM (Pham, UCBL)
CSAM: Conservative Simulator for ATM network Model
Simulation at the cell level
Conservative and/or sequential
C++ programming style, predefined generic models of sources, switches, links…
New models can be easily created by deriving from base classes
Configuration file that describes the topology

CSAM - Kernel characteristics
Exploits the lookahead of communication links: transparent for the user
Virtual Input Channels
  – reduces overhead for event manipulation,
  – reduces overhead for null-message handling.
Cyclic event execution
Message aggregation
  – static aggregation size,
  – asymmetric aggregation size on CLUMPS,
  – sender-initiated,
  – receiver-initiated.
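Message aggregation with a static aggregation size can be illustrated with a small buffer in front of the transport. The slides do not describe how CSAM implements or triggers aggregation, so this is only a generic sketch: `AggregatingChannel`, `send_batch` and `aggregation_size` are hypothetical names, and flushing when the buffer is full is an assumed policy.

```python
class AggregatingChannel:
    """Buffers outgoing messages and ships them in batches of a fixed size."""

    def __init__(self, send_batch, aggregation_size):
        self.send_batch = send_batch            # callable performing one physical transfer
        self.aggregation_size = aggregation_size
        self.buffer = []

    def send(self, timestamp, payload):
        self.buffer.append((timestamp, payload))
        # Assumed policy: flush as soon as the static aggregation size is reached.
        if len(self.buffer) >= self.aggregation_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.send_batch(list(self.buffer))
            self.buffer.clear()


def demo():
    wire = []  # stands in for the network: one entry per physical transfer
    channel = AggregatingChannel(wire.append, aggregation_size=3)
    for i in range(7):
        channel.send(i, f"cell-{i}")
    channel.flush()  # ship the remaining partial batch
    return [len(batch) for batch in wire]
```

Seven cells travel in three physical transfers instead of seven. The cost is that buffered messages reach the receiver later, so a conservative receiver may stay blocked longer while it waits for them.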

CSAM - Life cycle

Test case: 78-switch ATM network
Distance-Vector Routing with dynamic link cost functions
Connection setup, admission control protocols

Why is it difficult?
Very small granularity: 1 message represents 1 cell transfer
  – high level of message synchronisation
  – very small computation/communication ratio
Load imbalance between links
  – large number of control messages
  – partitioning and load balancing are difficult

CSAM - Some results...
Routing protocol's reconfiguration time

CSAM - Some results...

Parallel Simulation on High Performance Clusters
Myrinet-based cluster of 12 Pentium Pro at 200MHz, 64 MBytes, Linux
Myrinet-based cluster of 4 dual Pentium Pro 450MHz, 128 MBytes, Linux
Myrinet board with LANai 4.1, 256KB
BIP, BIP-SMP, MPI/BIP, MPI/BIP-SMP communication libraries

Speedup on a Myrinet cluster
Pentium Pro 200MHz
More than 53 million events to simulate 0.31s

Speedup with CLUMPS Dual Pentium Pro 450MHz

Increasing the model size (CLUMPS) Dual Pentium Pro 450MHz, 4x2 int

Speedup on SGI/Cray Origin 2000

Conclusions
Parallel simulation is very sensitive to latency
High-performance clusters are a good alternative to traditional massively parallel computers
CLUMPS architectures are very attractive, as the cost of the communication cards can be cut in half