Parallel Simulations on High-Performance Clusters C.D. Pham RESAM laboratory Univ. Lyon 1, France


1 Parallel Simulations on High-Performance Clusters C.D. Pham RESAM laboratory Univ. Lyon 1, France cpham@resam.univ-lyon1.fr

2 Outline Background –Discrete Event Simulation (DES) –Parallel DES and the synchronization problems The CSAM Tool –Architecture of the simulator kernel –The communication network model Results –On a mono-processor cluster –On a multi-processor cluster

3 Simulation To simulate is to reproduce the behavior of a physical system with a model. In practice, computers are used to numerically simulate a logical model. Simulations are used for performance evaluation and prediction of complex systems –fluid dynamics, chemical reactions (continuous) –communication network models: routing, congestion avoidance, mobile… (discrete) Simulation is more flexible than analytical methods

4 Discrete Event Simulation (DES) Assumption: a system changes its state at discrete points in simulation time. [Figure: time axis from 0 to 6t in time-steps of t, with labels a1 to a4, d1 to d3 and states S1 to S3]

5 DES concepts fundamental concepts: –system state (variables) –state transitions (events) –simulation time: totally ordered set of values representing time in the system being modeled the system state can only be modified upon reception of an event modeling can be –event-oriented –process-oriented

6 Life cycle of a DES a DES system can be viewed as a collection of simulated objects and a sequence of event computations each event computation contains a time stamp indicating when that event occurs in the physical system each event computation may: –modify state variables –schedule new events into the simulated future events are stored in a local event list –events are processed in time stamp order –usually, no more events = termination

7 A simple DES model Two nodes A and B connected by a link, with one local event list. Parameters: link model delay = 5, send processing time = 5, receive processing time = 1. Packet arrivals: P1 at 5, P2 at 12, P3 at 22. Events: –e1 A receives packet P1 –e2 A sends P1 to B –e3 A receives packet P2 –e4 B receives P1 from A –e5 B sends ACK(P1) to A –e6 A sends P2 to B –e7 A receives ACK(P1) –e8 B receives P2 from A –e9 A receives packet P3
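The life cycle and the two-node model above can be sketched as a small sequential kernel. This is a minimal illustration, not the author's simulator: the function name `run_two_node_model` and the event labels are hypothetical, and the way the three parameters are applied (A sends 5 time units after receiving a packet, the link adds 5, B acknowledges 1 time unit after receiving) is one assumed reading of the slide, chosen because it reproduces the order e1 to e7. Events with equal timestamps are processed in insertion order, which is an arbitrary tie-break.

```python
import heapq
from itertools import count

# Parameters taken from the slide; how they combine is an assumption.
SEND_PROCESSING = 5
LINK_DELAY = 5
RECEIVE_PROCESSING = 1


def run_two_node_model(arrivals):
    """Run a sequential DES of nodes A and B and return the processed events."""
    event_list = []   # the local event list, kept as a heap ordered by timestamp
    seq = count()     # insertion counter used only to break timestamp ties
    trace = []

    def schedule(timestamp, kind, packet):
        heapq.heappush(event_list, (timestamp, next(seq), kind, packet))

    for timestamp, packet in arrivals:
        schedule(timestamp, "A receives", packet)

    now = 0
    while event_list:                       # no more events = termination
        timestamp, _, kind, packet = heapq.heappop(event_list)
        assert timestamp >= now             # an event never lands in the past
        now = timestamp
        trace.append((now, kind, packet))
        # Each event computation may schedule new events in the simulated future.
        if kind == "A receives":
            schedule(now + SEND_PROCESSING, "A sends", packet)
        elif kind == "A sends":
            schedule(now + LINK_DELAY, "B receives", packet)
        elif kind == "B receives":
            schedule(now + RECEIVE_PROCESSING, "B sends ACK", packet)
        elif kind == "B sends ACK":
            schedule(now + LINK_DELAY, "A receives ACK", packet)
    return trace


trace = run_two_node_model([(5, "P1"), (12, "P2"), (22, "P3")])
for entry in trace:
    print(entry)
```

Because every handler schedules at `now` plus a non-negative delay, popping the smallest timestamp first is enough to keep the run causally consistent.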

8 Why does it work? events are processed in time stamp order an event at time t can only generate future events with a timestamp greater than or equal to t (no event in the past) generated events are inserted into the event list and sorted according to their timestamp –the event with the smallest timestamp is always processed first, –causality constraints are implicitly maintained.

9 Why change? It's so simple! models become larger and larger the simulation time is overwhelming or the simulation is simply intractable examples: –parallel programs with millions of lines of code, –mobile networks with millions of mobile hosts, –ATM networks with hundreds of complex switches, –multicast models with thousands of sources, –ever-growing Internet, –and much more...

10 Some figures to convince... ATM network models –Simulation at the cell level, –200 switches, –1000 traffic sources, 50 Mbit/s, –155 Mbit/s links, –1 simulation event per cell arrival. –simulation time increases as link speed increases, –usually more than 1 event per cell arrival, –how scalable is traditional simulation? More than 26 billion events to simulate 1 second! 30 hours if 1 event is processed in 1 µs

11 Parallel simulation - principles execution of a discrete event simulation on a parallel or distributed system with several physical processors. the simulation model is decomposed into several sub-models that can be executed in parallel –spatial partitioning, –temporal partitioning, radically different from simple simulation replications.

12 Parallel simulation - pros & cons pros –reduction of the simulation time, –increase of the model size, cons –causality constraints are difficult to maintain, –need for special mechanisms to synchronize the different processors, –increased complexity of both the model and the simulation kernel. challenges –ease of use, transparency.

13 Parallel simulation - example [Figure: the model split into logical processes (LPs) running in parallel; labels: packet, event, t]

14 A simple PDES model Two nodes A and B connected by a link, with local event lists. Parameters: link model delay = 5, send processing time = 5, receive processing time = 1. Packet arrivals: P1 at 5, P2 at 12, P3 at 22. Events: –e1 A rec. packet P1 –e2 A sends P1 to B –e3 A rec. packet P2 –e4 B rec. P1 from A –e5 B sends ACK(P1) –e6 A sends P2 to B –e7 A rec. ACK(P1) –e8 B rec. P2 from A –e9 A rec. packet P3 The figure marks a causality error (violation) on the time axis t.
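A causality error of the kind marked on this slide can be reproduced with a logical process that runs ahead on its own event list without synchronizing. The sketch below is illustrative only: the class `GreedyLP` is hypothetical, and the timestamp 21 for the ACK is an assumed value (it is not stated on the slide) chosen so that the message arrives after A has already advanced to the arrival of P3 at 22.

```python
import heapq


class GreedyLP:
    """Logical process that executes its local events without any synchronization."""

    def __init__(self, name):
        self.name = name
        self.clock = 0
        self.events = []       # local event list (heap ordered by timestamp)
        self.violations = []   # (timestamp, label, local clock) of rejected events

    def schedule(self, timestamp, label):
        # An event in the local past breaks the local causality constraint.
        if timestamp < self.clock:
            self.violations.append((timestamp, label, self.clock))
            return False
        heapq.heappush(self.events, (timestamp, label))
        return True

    def run_until_empty(self):
        while self.events:
            self.clock, _ = heapq.heappop(self.events)


lp_a = GreedyLP("A")
for timestamp, label in [(5, "A rec. packet P1"),
                         (12, "A rec. packet P2"),
                         (22, "A rec. packet P3")]:
    lp_a.schedule(timestamp, label)
lp_a.run_until_empty()                      # A is now at simulation time 22

# The ACK from B reaches A late in wall-clock time but carries an earlier timestamp.
accepted = lp_a.schedule(21, "A rec. ACK(P1)")
print(accepted, lp_a.violations)
```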

15 Synchronization problems fundamental concepts –each Logical Process (LP) can be at a different simulation time –local causality constraints: events in each LP must be executed in time stamp order synchronization algorithms –Conservative: avoids local causality violations by waiting until it is safe –Optimistic: allows local causality violations but provides mechanisms to recover from them at runtime
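The conservative rule ("wait until it is safe") can be sketched as follows. This is a generic illustration in the style of the classic null-message approach, not the CSAM kernel: the class `ConservativeLP` and the channel names are hypothetical, and the sketch assumes that each input channel delivers messages in non-decreasing timestamp order. A message without payload plays the role of a null message and only advances the channel clock.

```python
from collections import deque


class ConservativeLP:
    """Logical process that only executes events proven safe."""

    def __init__(self, input_channels):
        self.clock = 0
        self.queues = {ch: deque() for ch in input_channels}
        # Last timestamp seen per channel: a lower bound on anything still to come.
        self.channel_clock = {ch: 0 for ch in input_channels}

    def receive(self, channel, timestamp, payload=None):
        if timestamp < self.channel_clock[channel]:
            raise ValueError("channel timestamps must be non-decreasing")
        self.channel_clock[channel] = timestamp
        if payload is not None:            # payload None = null message
            self.queues[channel].append((timestamp, payload))

    def process_safe_events(self):
        processed = []
        while True:
            heads = [(q[0][0], ch) for ch, q in self.queues.items() if q]
            if not heads:
                break
            timestamp, channel = min(heads)
            # An empty channel could still deliver a message as early as its clock.
            bound = min((self.channel_clock[ch]
                         for ch, q in self.queues.items() if not q),
                        default=timestamp)
            if timestamp > bound:
                break                      # not safe yet: the LP has to wait
            _, payload = self.queues[channel].popleft()
            self.clock = timestamp
            processed.append((timestamp, payload))
        return processed


lp = ConservativeLP(["from_A", "from_B"])
lp.receive("from_A", 5, "P1")
print(lp.process_safe_events())   # nothing is safe while from_B is still at time 0
lp.receive("from_B", 7, "X")
print(lp.process_safe_events())   # the event at time 5 is now safe
```

The blocking behavior shown in the first call is what makes null messages necessary: without extra information on an idle channel, the LP could wait forever.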

16 CSAM (Pham, UCBL) CSAM: Conservative Simulator for ATM network Model. Simulation at the cell level. Conservative and/or sequential. C++ programming style, predefined generic models of sources, switches, links… New models can be easily created by deriving from base classes. Configuration file that describes the topology.

17 CSAM - Kernel characteristics Exploits the lookahead of communication links: transparent for the user Virtual Input Channels –reduces overhead for event manipulation, –reduces overhead for null-messages handling. Cyclic event execution Message aggregation –static aggregation size, –asymmetric aggregation size on CLUMPS, –sender-initiated, –receiver-initiated.
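Two of the ideas listed here, link lookahead for null messages and sender-initiated aggregation with a static size, can be sketched together. This is an assumption-laden illustration and not CSAM code: the class `OutputChannel`, its parameters and the `transport` callback are all hypothetical. The lookahead is taken to be the link delay, so an LP at local time T can promise that nothing with a timestamp below T + lookahead will be sent on that link.

```python
class OutputChannel:
    """Outgoing link that batches messages and derives null messages from lookahead."""

    def __init__(self, lookahead, aggregation_size, transport):
        self.lookahead = lookahead                # e.g. the delay of the modeled link
        self.aggregation_size = aggregation_size  # static number of messages per batch
        self.transport = transport                # callable that ships one batch
        self.buffer = []

    def send(self, timestamp, payload):
        # Sender-initiated aggregation: ship only when the batch is full.
        self.buffer.append((timestamp, payload))
        if len(self.buffer) >= self.aggregation_size:
            self.flush()

    def send_null(self, local_time):
        # Null message: a promise that no earlier timestamp will follow on this link.
        self.buffer.append((local_time + self.lookahead, None))
        self.flush()                              # the receiver may be blocked on it

    def flush(self):
        if self.buffer:
            self.transport(list(self.buffer))
            self.buffer.clear()


batches = []
channel = OutputChannel(lookahead=5, aggregation_size=3, transport=batches.append)
channel.send(10, "cell-1")
channel.send(11, "cell-2")
channel.send(12, "cell-3")      # third message fills the batch and triggers a flush
channel.send(13, "cell-4")
channel.send_null(local_time=20)
print(batches)
```

Larger batches reduce per-message communication overhead but delay the information the receiver needs to advance, which is one reason the aggregation size matters for fine-grained models.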

18 CSAM - Life cycle

19 Test case: 78-switch ATM network Distance-Vector Routing with dynamic link cost functions Connection setup, admission control protocols

20 Why is it difficult? Very small granularity: 1 message represents 1 cell transfer –high level of message synchronisation –very small computation/communication ratio Load imbalance between links –large number of control messages –partitioning and load balancing are difficult

21 CSAM - Some results... Routing protocol’s reconfiguration time

22 CSAM - Some results...

23 Parallel Simulation on High Performance Clusters –Myrinet-based cluster of 12 Pentium Pro at 200MHz, 64 MBytes, Linux –Myrinet-based cluster of 4 dual Pentium Pro 450MHz, 128 MBytes, Linux –Myrinet board with LANai 4.1, 256KB –BIP, BIP-SMP, MPI/BIP, MPI/BIP-SMP communication libraries

24 Speedup on a Myrinet cluster Pentium Pro 200MHz More than 53 million events to simulate 0.31s

25 Speedup with CLUMPS Dual Pentium Pro 450MHz

26 Increasing the model size (CLUMPS) Dual Pentium Pro 450MHz, 4x2 int

27 Speedup on SGI/Cray Origin 2000

28 Conclusions Parallel simulation is very sensitive to latency. High-performance clusters are a good alternative to traditional massively parallel computers. CLUMPS architectures are very attractive, as the cost of the communication cards can be cut in half.

