1
Parallel Simulations on High-Performance Clusters
C.D. Pham
RESAM laboratory, Univ. Lyon 1, France
cpham@resam.univ-lyon1.fr
2
Outline
Background
  – Discrete Event Simulation (DES)
  – Parallel DES and the synchronization problems
The CSAM Tool
  – Architecture of the simulator kernel
  – The communication network model
Results
  – On mono-processor cluster
  – On multi-processor cluster
3
Simulation
To simulate is to reproduce the behavior of a physical system with a model.
In practice, computers are used to numerically simulate a logical model.
Simulations are used for performance evaluation and prediction of complex systems:
  – fluid dynamics, chemical reactions (continuous),
  – communication network models: routing, congestion avoidance, mobile… (discrete).
Simulation is more flexible than analytical methods.
4
Discrete Event Simulation (DES)
Assumption: a system changes its state at discrete points in simulation time.
[Figure: timeline with events a1 to a4 and d1 to d3, states S1 to S3, and a time axis marked 0, t, 2t, ..., 6t (time-step)]
5
DES concepts
Fundamental concepts:
  – system state (variables),
  – state transitions (events),
  – simulation time: a totally ordered set of values representing time in the system being modeled.
The system state can only be modified upon reception of an event.
Modeling can be:
  – event-oriented,
  – process-oriented.
6
Life cycle of a DES
A DES system can be viewed as a collection of simulated objects and a sequence of event computations.
Each event computation carries a time stamp indicating when that event occurs in the physical system.
Each event computation may:
  – modify state variables,
  – schedule new events into the simulated future.
Events are stored in a local event list:
  – events are processed in time stamp order,
  – usually, the simulation terminates when no events remain.
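The life cycle above can be sketched as a small sequential kernel. This is a minimal illustration, not the kernel of any particular simulator: the names `Simulator`, `schedule`, `run` and the two handlers are hypothetical, and the event list is assumed to be a binary heap ordered by time stamp.

```python
import heapq
import itertools


class Simulator:
    """Minimal sequential DES kernel: a clock plus a time-ordered event list."""

    def __init__(self):
        self.now = 0
        self._events = []              # heap of (timestamp, seq, handler, payload)
        self._seq = itertools.count()  # tie-breaker: equal timestamps keep insertion order

    def schedule(self, timestamp, handler, payload=None):
        # An event may only be scheduled at the current time or in the simulated future.
        if timestamp < self.now:
            raise ValueError("cannot schedule an event in the simulated past")
        heapq.heappush(self._events, (timestamp, next(self._seq), handler, payload))

    def run(self):
        # Process events in time stamp order; stop when the event list is empty.
        while self._events:
            timestamp, _, handler, payload = heapq.heappop(self._events)
            self.now = timestamp
            handler(self, payload)


def demo():
    """Two packet arrivals, each scheduling a departure 5 time units later."""
    log = []

    def arrival(sim, name):
        log.append((sim.now, "arrival", name))
        sim.schedule(sim.now + 5, departure, name)  # new event in the simulated future

    def departure(sim, name):
        log.append((sim.now, "departure", name))

    sim = Simulator()
    sim.schedule(12, arrival, "P2")  # inserted out of order on purpose
    sim.schedule(5, arrival, "P1")
    sim.run()
    return log
```

Calling `demo()` returns the events in time stamp order even though they were scheduled out of order, which is the property the event list exists to provide.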
7
A simple DES model
[Figure: hosts A and B connected by a link, sharing one local event list]
Link model: delay = 5, send processing time = 5, receive processing time = 1
Packet arrivals: P1 at 5, P2 at 12, P3 at 22
Event list:
  – e1: A receives packet P1
  – e2: A sends P1 to B
  – e3: A receives packet P2
  – e4: B receives P1 from A
  – e5: B sends ACK(P1) to A
  – e6: A sends P2 to B
  – e7: A receives ACK(P1)
  – e8: B receives P2 from A
  – e9: A receives packet P3
8
Why does it work?
Events are processed in time stamp order.
An event at time t can only generate future events with a time stamp greater than or equal to t (no event in the past).
Generated events are inserted into the event list, sorted by time stamp:
  – the event with the smallest time stamp is always processed first,
  – causality constraints are implicitly maintained.
9
Why change? It's so simple!
Models become larger and larger.
The simulation time is overwhelming, or the simulation is simply intractable.
Examples:
  – parallel programs with millions of lines of code,
  – mobile networks with millions of mobile hosts,
  – ATM networks with hundreds of complex switches,
  – multicast models with thousands of sources,
  – the ever-growing Internet,
  – and much more...
10
Some figures to convince...
ATM network models:
  – simulation at the cell level,
  – 200 switches,
  – 1000 traffic sources at 50 Mbit/s,
  – 155 Mbit/s links,
  – 1 simulation event per cell arrival.
Observations:
  – simulation time increases as link speed increases,
  – usually more than 1 event per cell arrival,
  – how scalable is traditional simulation?
More than 26 billion events to simulate 1 second!
30 hours if 1 event is processed in 1 µs.
11
Parallel simulation - principles
Execution of a discrete event simulation on a parallel or distributed system with several physical processors.
The simulation model is decomposed into several sub-models that can be executed in parallel:
  – spatial partitioning,
  – temporal partitioning.
Radically different from simple simulation replications.
12
Parallel simulation - pros & cons
Pros:
  – reduction of the simulation time,
  – increase of the model size.
Cons:
  – causality constraints are difficult to maintain,
  – special mechanisms are needed to synchronize the different processors,
  – both the model and the simulation kernel become more complex.
Challenges:
  – ease of use, transparency.
13
Parallel simulation - example
[Figure: logical processes (LPs) running in parallel; labels: packet, event, t]
14
A simple PDES model
[Figure: hosts A and B, each with its own local event list, plotted against time t]
Link model: delay = 5, send processing time = 5, receive processing time = 1
Packet arrivals: P1 at 5, P2 at 12, P3 at 22
Events at A:
  – e1: A receives packet P1
  – e2: A sends P1 to B
  – e3: A receives packet P2
  – e6: A sends P2 to B
  – e7: A receives ACK(P1)
  – e9: A receives packet P3
Events at B:
  – e4: B receives P1 from A
  – e5: B sends ACK(P1)
  – e8: B receives P2 from A
Causality error (violation) marked in the figure.
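A causality error of the kind marked in the figure can be reproduced with a few lines. This is only a sketch under stated assumptions: the `LogicalProcess` class is hypothetical, and only the arrival times 5, 12 and 22 come from the slide. The other time stamps (sends at 10 and 17, ACK(P1) reaching A at 21) are derived by assuming that A sends 5 time units after an arrival, that the link adds 5, and that B acknowledges 1 time unit after reception.

```python
class LogicalProcess:
    """Hypothetical LP that records events arriving behind its local clock."""

    def __init__(self, name):
        self.name = name
        self.clock = 0        # local simulation time of this LP
        self.processed = []   # events in the order they were actually executed
        self.violations = []  # (timestamp, label, local clock at that moment)

    def process(self, timestamp, label):
        if timestamp < self.clock:
            # The event belongs to the LP's past: local causality is violated.
            self.violations.append((timestamp, label, self.clock))
        else:
            self.clock = timestamp
        self.processed.append((timestamp, label))


def demo():
    a = LogicalProcess("A")
    # A executes its local events without waiting for B.
    a.process(5, "A receives P1")
    a.process(10, "A sends P1 to B")   # assumed: arrival + send processing time
    a.process(12, "A receives P2")
    a.process(17, "A sends P2 to B")   # assumed: arrival + send processing time
    a.process(22, "A receives P3")     # A's clock is now 22
    # B's acknowledgement reaches A late in wall-clock time, stamped 21 (assumed).
    a.process(21, "A receives ACK(P1)")
    return a.violations
```

In a sequential run the single event list would have placed the acknowledgement before the arrival of P3; here A has already advanced to 22 when the message stamped 21 shows up.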
15
Synchronization problems
Fundamental concepts:
  – each Logical Process (LP) can be at a different simulation time,
  – local causality constraint: events in each LP must be executed in time stamp order.
Synchronization algorithms:
  – Conservative: avoids local causality violations by waiting until it is safe to process an event,
  – Optimistic: allows local causality violations but provides mechanisms to recover from them at runtime.
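The conservative approach can be illustrated with a null-message scheme in the style of Chandy, Misra and Bryant. This is a generic sketch, not the algorithm of a specific tool: the class `ConservativeLP`, the channel names and the time stamps 21, 22 and 27 are illustrative assumptions. Each input channel is assumed to deliver messages in non-decreasing time stamp order, so the smallest channel clock is a lower bound on anything that can still arrive.

```python
import heapq


class ConservativeLP:
    """Hypothetical LP that only executes events proven safe by its input channels."""

    def __init__(self, input_channels):
        # Time stamp of the last message (real or null) seen on each input channel.
        self.channel_clock = {ch: 0 for ch in input_channels}
        self.pending = []  # heap of (timestamp, label) waiting to become safe
        self.clock = 0

    def receive(self, channel, timestamp, label=None):
        """Accept a message; label=None denotes a null message (a pure time promise)."""
        if timestamp < self.channel_clock[channel]:
            raise ValueError("channel time stamps must be non-decreasing")
        self.channel_clock[channel] = timestamp
        if label is not None:
            heapq.heappush(self.pending, (timestamp, label))

    def safe_time(self):
        # No future message can carry a time stamp below the slowest channel clock.
        return min(self.channel_clock.values())

    def process_safe_events(self):
        done = []
        bound = self.safe_time()
        while self.pending and self.pending[0][0] <= bound:
            timestamp, label = heapq.heappop(self.pending)
            self.clock = timestamp
            done.append((timestamp, label))
        return done


def demo():
    lp = ConservativeLP(["source", "B"])
    lp.receive("source", 22, "packet P3")
    first = lp.process_safe_events()   # nothing is safe: channel B is still at 0
    lp.receive("B", 21, "ACK(P1)")
    second = lp.process_safe_events()  # ACK(P1) is safe, P3 must still wait
    lp.receive("B", 27)                # null message: B promises nothing before 27
    third = lp.process_safe_events()   # P3 is now safe
    return first, second, third
```

The null message is what unblocks the LP: without it, the packet stamped 22 would wait indefinitely for a guarantee from B. In practice the promised time is the sender's clock plus the lookahead of the link.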
16
CSAM (Pham, UCBL)
CSAM: Conservative Simulator for ATM network Model
Simulation at the cell level
Conservative and/or sequential
C++ programming style, with predefined generic models of sources, switches, links…
New models can easily be created by deriving from base classes
A configuration file describes the topology
17
CSAM - Kernel characteristics
Exploits the lookahead of communication links: transparent for the user
Virtual Input Channels:
  – reduce overhead for event manipulation,
  – reduce overhead for null-message handling.
Cyclic event execution
Message aggregation:
  – static aggregation size,
  – asymmetric aggregation size on CLUMPS,
  – sender-initiated,
  – receiver-initiated.
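Message aggregation with a static aggregation size can be sketched as a buffered channel. This is not CSAM's code: `AggregatingChannel`, `send` and `flush` are hypothetical names, and the reading of "sender-initiated" as a flush triggered when the sender's buffer is full, and "receiver-initiated" as a flush requested from outside, is an assumption made for this illustration.

```python
class AggregatingChannel:
    """Hypothetical channel that groups small messages into one transmission."""

    def __init__(self, aggregation_size, transmit):
        if aggregation_size < 1:
            raise ValueError("aggregation size must be at least 1")
        self.aggregation_size = aggregation_size  # static size, fixed at creation
        self.transmit = transmit                  # callable that takes a list of messages
        self.buffer = []

    def send(self, message):
        # Assumed sender-initiated policy: transmit as soon as the batch is full.
        self.buffer.append(message)
        if len(self.buffer) >= self.aggregation_size:
            self.flush()

    def flush(self):
        # Can also be called on demand, e.g. when the receiver asks for pending messages.
        if self.buffer:
            batch, self.buffer = self.buffer, []
            self.transmit(batch)


def demo():
    sent = []
    channel = AggregatingChannel(3, sent.append)
    for i in range(7):
        channel.send(("cell", i))
    channel.flush()  # push out the incomplete last batch
    return sent
```

With one message per cell, batching trades a little delivery delay for far fewer network transmissions; an asymmetric setup would simply give channels different `aggregation_size` values.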
18
CSAM - Life cycle
19
Test case: 78-switch ATM network
Distance-Vector Routing with dynamic link cost functions
Connection setup, admission control protocols
20
Why is it difficult?
Very small granularity: 1 message represents 1 cell transfer
  – high level of message synchronization,
  – very small computation/communication ratio.
Load imbalance between links
  – large number of control messages,
  – partitioning and load balancing are difficult.
21
CSAM - Some results... Routing protocol’s reconfiguration time
22
CSAM - Some results...
23
Parallel Simulation on High Performance Clusters
Myrinet-based cluster of 12 Pentium Pro at 200 MHz, 64 MB, Linux
Myrinet-based cluster of 4 dual Pentium Pro at 450 MHz, 128 MB, Linux
Myrinet board with LANai 4.1, 256 KB
BIP, BIP-SMP, MPI/BIP, MPI/BIP-SMP communication libraries
24
Speedup on a Myrinet cluster
Pentium Pro 200 MHz
More than 53 million events to simulate 0.31 s
25
Speedup with CLUMPS Dual Pentium Pro 450MHz
26
Increasing the model size (CLUMPS) Dual Pentium Pro 450MHz, 4x2 int
27
Speedup on SGI/Cray Origin 2000
28
Conclusions
Parallel simulation is very sensitive to latency.
High-performance clusters are a good alternative to traditional massively parallel computers.
CLUMPS architectures are very attractive, as the cost of the communication cards can be cut in half.