Download presentation
Presentation is loading. Please wait.
1
BigSim: Simulating PetaFLOPS Supercomputers
Gengbin Zheng Parallel Programming Laboratory University of Illinois at Urbana-Champaign
2
5th Annual Workshop on Charm++ and its Applications
Introduction Petascale machines are being built It is hard to understand parallel application without actually running it Tools that allow one to predict the performance of applications before machines are built, also help debugging and tuning Allows easier "offline" experimentation Provides a feedback for machine designers 4/10/2019 5th Annual Workshop on Charm++ and its Applications
3
5th Annual Workshop on Charm++ and its Applications
Overview Prediction accuracy Sequential performance Parallel performance (message passing) Flexibility - different levels of fidelity/tradeoffs Challenges for usability Memory constraints Time cost How to analyze result - scalable performance analysis tools Portability Scalability BigSim system 4/10/2019 5th Annual Workshop on Charm++ and its Applications
4
Summary of Our Approach
Parallel Discrete Event Simulation (PDES) the operation of a system is represented as a chronological sequence of events Event-driven simulation method Two phase simulation BigSim Emulator, capable of running application Charm++, Adaptive MPI / MPI Predict sequential execution blocks Generate trace logs Parallel Discrete Event Simulator (POSE-based) Predict message passing performance Implemented on Charm++ 4/10/2019 5th Annual Workshop on Charm++ and its Applications
5
BigSim System Architecture
Performance visualization (Projections) Offline PDES Network Simulator BigSim Simulator Simulation output trace logs Charm++ Runtime Load Balancing Module Performance counters Instruction Sim (RSim, IBM, ..) Simple Network Model BigSim Emulator Charm++ and MPI applications 4/10/2019 5th Annual Workshop on Charm++ and its Applications 5
6
BigSim Emulator – the first step
Actual execution of real applications on much smaller machines Model each target processor as virtual processor Using light-weight (Converse) threads simulating each target processor Charm++/AMPI applications without change No global variables/swap-globals/copy-globals 4/10/2019 5th Annual Workshop on Charm++ and its Applications
7
Emulation on a Parallel Machine
Target Nodes Simulating (Host) Processor Hardware thread 4/10/2019 5th Annual Workshop on Charm++ and its Applications
8
Predicting Sequential Performance
Different level of fidelity: Direct mapping Performance counter Instruction-level simulator Tradeoff between time cost and accuracy Exploit inherent determinacy of Charm++ application Out of order messages do not change the computation behavior (structured dagger) Different approaches 4/10/2019 5th Annual Workshop on Charm++ and its Applications
9
5th Annual Workshop on Charm++ and its Applications
Trace Logs Event logs are generated for each target processor Predicted time on execution blocks with timestamps Event dependencies Message sending events [22] name:msgep (srcnode:0 msgID:21) ep:1 [[ recvtime: startTime: endTime: ]] backward: forward: [23] [23] name:Chunk_atomic_0 (srcnode:-1 msgID:-1) ep:0 [[ recvtime: startTime: endTime: ]] msgID:3 sent: recvtime: dstPe:7 size:208 msgID:4 sent: recvtime: dstPe:1 size:208 backward: [22] forward: [24] [24] name:Chunk_overlap_0 (srcnode:-1 msgID:-1) ep:0 [[ recvtime: startTime: endTime: ]] backward: [0x80a7af0 23] forward: [25] [28] 4/10/2019 5th Annual Workshop on Charm++ and its Applications
10
Predicting message passing performance – second step
Parallel Discrete Event Simulation Needed for correcting causality errors due to out-of-order messages Timestamps are re-adjusted without rerunning the actual application Simulate network behaviours: packetization, routing, contention, etc simple latency based network model, and contention-based network model 4/10/2019 5th Annual Workshop on Charm++ and its Applications
11
5th Annual Workshop on Charm++ and its Applications
PDES and Charm++ PDES is a naturally message-driven activity Large number of entities naturally maps to virtual processors Fine grained PDES Migratable PDES entities and load balancing 4/10/2019 5th Annual Workshop on Charm++ and its Applications
12
Network Simulator Design
4/10/2019 5th Annual Workshop on Charm++ and its Applications 12
13
Network Simulator: Data Flow
4/10/2019 5th Annual Workshop on Charm++ and its Applications 13
14
Network Communication Pattern Analysis
NAMD with apoa1 15 timestep 4/10/2019 5th Annual Workshop on Charm++ and its Applications
15
Network Communication Pattern Analysis
Data transferred (KB) in a single time step 4/10/2019 5th Annual Workshop on Charm++ and its Applications
16
NAMD network simulation (scalability)
Simulating NAMD on 2048 processors with 3D Torus topology Tests ran on Turing Apple cluster with Myrinet Terry L. Wilmarth, Gengbin Zheng, Eric J. Bohm, Yogesh Mehta, Praveen Jagadishprasad, Laxmikant V. Kalé, ``Performance Prediction using Simulation of Large-scale Interconnection Networks in POSE'', PADS 2005 4/10/2019 5th Annual Workshop on Charm++ and its Applications
17
LeanMD Performance Analysis
Benchmark 3-away ER-GRE 36573 atoms 1.6 million objects 8 step simulation 64k BG processors Running on PSC Lemieux 4/10/2019 5th Annual Workshop on Charm++ and its Applications
18
Integration with Instruction level simulators
Instruction level simulators typically are Standalone applications Sequential execution Very slow An interpolation-based scheme Use result of a smaller scale instruction level simulation to interpolate for large dataset do a least-squares fit to determine the coefficients of an approximation polynomial function 4/10/2019 5th Annual Workshop on Charm++ and its Applications
19
Case study: BigSim / Mambo
void func( ) { StartBigSim( ) … EndBigSim( ) } Mambo Prediction for Target System Cycle-accurate prediction of sequential blocks on POWER7 processor BigSim Parallel Emulation Interpolation BigSim Parallel Simulation Results are not available + Replace sequential timing Trace files Parameter files for sequential blocks Adjusted trace files 4/10/2019 5th Annual Workshop on Charm++ and its Applications
20
5th Annual Workshop on Charm++ and its Applications
Future Work Use performance counter to predict sequential timing Out-of-core execution for memory bound applications 4/10/2019 5th Annual Workshop on Charm++ and its Applications
21
Out-of-core Emulation
Motivation Physical memory is shared VM system would not handle well Message driven execution Peek msg queue => what execute next? (prefetch) 4/10/2019 5th Annual Workshop on Charm++ and its Applications
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.