Cross-Platform Performance Prediction Using Partial Execution
Leo T. Yang, Xiaosong Ma*, Frank Mueller
Department of Computer Science and Center for High Performance Simulations (CHiPS), North Carolina State University
(* Joint Faculty with Oak Ridge National Laboratory)
Supercomputing 2005
Presentation Roadmap
- Introduction
- Model and approach
- Performance results
- Conclusion and future work
Cross-Platform Performance Prediction
- Users face a wide selection of machines
- Cross-platform performance prediction is needed to:
  - Choose a platform to use or purchase
  - Estimate resource usage
  - Estimate job wall time
- Machines and applications both grow larger and more complex
  - Modeling- and simulation-based approaches become harder and more expensive
  - Performance data is not reused in performance prediction
Observation-based Performance Prediction
- Observe cross-platform behavior
  - Treat applications and platforms as black boxes
  - Avoid case-by-case model building
  - Cover the entire application: computation, communication, and I/O
  - Convenient with third-party libraries
- Performance translation
  - Observation: a "reference platform" exists
  - Goal: a cross-platform meta-predictor
  - Approach: based on relative performance
[Figure: a job taking T = 20 hrs on one platform; its time T = ? hrs on another platform is to be predicted]
Main Idea: Utilizing Partial Execution
- Observation: the majority of scientific applications are iteration-based
  - Highly repetitive behavior; phases map to timesteps
- Execute small partial executions
  - Low-cost "test drives"
  - Simple API (indicate the number of timesteps k); quit after k timesteps
  - A worked example follows
[Figure: Full-1 and Partial-1 run on the reference system, Partial-2 on the target system; a relative performance of 0.6 between the partial runs yields the predicted Full-2 time]
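As a concrete illustration of the figure (the per-timestep times here are our own, chosen to produce the 0.6 shown): if the instrumented timesteps average 30 s on the reference system and 50 s on the target system, the relative performance is 30/50 = 0.6, so a reference full run of 20 hours is predicted to take 20/0.6, roughly 33 hours, on the target.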
Application Model
- The execution of a parallel simulation is modeled as the regular expression I(C*[W])*F
  - I: one-time initialization phase
  - C: computation phase
  - W: optional I/O phase
  - F: one-time finalization phase
  - e.g., a run that performs I/O every k timesteps matches I(C^k W)*F
- Different phases likely have different cross-platform relative performance
- Major challenges
  - Avoid the impact of initially unstable performance
  - Predict the correct mixture of C and W phases
Partial Execution
- Terminates the application prematurely
- API
  - init_timestep(): optional, useful with a large setup phase
  - begin_timestep()
  - end_timestep(maxsteps)
  - The "begin" and "end" calls bracket a C or CW phase
  - Execution is terminated after maxsteps timesteps
- Easy-to-use interface: 2-3 lines of code inserted into the source code (see the sketch below)
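A minimal sketch of how these three calls instrument a timestep loop. The simulation routines are placeholders, and the stub bodies stand in for the real partial-execution library, which supplies init_timestep/begin_timestep/end_timestep:

```c
/*
 * Sketch: instrumenting an iterative simulation with the three-call API.
 * Only the names and semantics of init_timestep/begin_timestep/end_timestep
 * come from the slide; everything else is an illustrative placeholder.
 */
#include <stdio.h>
#include <stdlib.h>

static int steps_done = 0;

void init_timestep(void)  { /* optional: marks the end of a large setup phase */ }
void begin_timestep(void) { /* opens an instrumented C or CW phase */ }
void end_timestep(int maxsteps) {
    /* closes the phase; terminates the run after maxsteps timesteps */
    if (++steps_done >= maxsteps) {
        printf("partial execution: stopped after %d timesteps\n", steps_done);
        exit(0);
    }
}

static void setup(void)    { /* one-time initialization (phase I) */ }
static void compute(void)  { /* one computation timestep (phase C) */ }
static void maybe_io(void) { /* optional periodic I/O (phase W) */ }

int main(void) {
    setup();
    init_timestep();
    for (int step = 0; step < 100000; step++) {
        begin_timestep();
        compute();
        maybe_io();
        end_timestep(10); /* a 10-timestep "test drive" of the full run */
    }
    /* one-time finalization (phase F) is reached only in a full run */
    return 0;
}
```

Only the three marked calls are added to the application; the rest is the unmodified simulation, which is what keeps the interface down to 2-3 inserted lines.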
Base Prediction Model
- Given a reference platform and a target platform:
  - Perform one or more partial executions
  - Compute the average execution time of a timestep on both platforms
  - Compute the relative performance
  - Compute an overall execution time estimate for the target platform (formalized below)
- Prediction accuracy is reported as the predicted-to-actual ratio
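A minimal formalization of these steps (the symbols are ours, not the slide's): with average per-timestep times from the partial executions on each platform and a known full-run time on the reference platform,

```latex
r = \frac{\bar{t}_{\mathrm{ref}}}{\bar{t}_{\mathrm{target}}}, \qquad
\widehat{T}_{\mathrm{target}} = \frac{T_{\mathrm{ref}}}{r}
  = \frac{\bar{t}_{\mathrm{target}}}{\bar{t}_{\mathrm{ref}}}\, T_{\mathrm{ref}}, \qquad
\text{prediction ratio} = \frac{\widehat{T}_{\mathrm{target}}}{T^{\mathrm{actual}}_{\mathrm{target}}}.
```

A prediction ratio of 1 means a perfect estimate; the worked example after the "Main Idea" slide instantiates these formulas with r = 0.6.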
Refined Prediction Model
- Problem 1: initial performance fluctuations
  - Variance due to cache warm-up, etc.
  - May span dozens of timesteps
- Problem 2: periodic I/O phases
  - I/O frequency is often configurable and determined at run time
- Unified solution (sketched below)
  - Monitor per-timestep performance variance at runtime
  - Identify anomalies and repeated patterns
  - Filter out early, unstable timestep measurements
    - Consider only later results once performance stabilizes
    - Fold early timestep overheads into the initialization cost
  - Compute sliding-window averages of per-timestep overheads
    - Use multiples of the observed pattern length as the window size
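A minimal sketch of this filtering, under our own assumptions rather than the paper's exact algorithm: a sliding window sized as a multiple of the observed I/O pattern length smooths the periodic I/O, and early windows are rejected until two consecutive window averages agree within a tolerance.

```c
/*
 * Sketch of sliding-window filtering of per-timestep times. The
 * stabilization test, tolerance, and synthetic trace are illustrative
 * assumptions, not the paper's exact algorithm.
 */
#include <math.h>
#include <stdio.h>

/* Average of times[i .. i+w-1]. */
static double window_avg(const double *times, int i, int w) {
    double sum = 0.0;
    for (int j = 0; j < w; j++)
        sum += times[i + j];
    return sum / w;
}

/*
 * Slide a window of size w (a multiple of the detected I/O period) over
 * the measurements; accept the first position where the next window's
 * average matches within tol. Timesteps before that position would be
 * folded into the initialization cost by the caller. Returns -1.0 if no
 * stable region is found.
 */
static double stable_timestep_cost(const double *times, int n, int w, double tol) {
    for (int i = 0; i + 2 * w <= n; i++) {
        double a = window_avg(times, i, w);
        double b = window_avg(times, i + w, w);
        if (fabs(a - b) / a < tol)
            return (a + b) / 2.0;
    }
    return -1.0;
}

int main(void) {
    /* Synthetic trace: cache warm-up inflates the first 6 steps, and
     * every 5th step pays an extra I/O cost. */
    double t[40];
    for (int i = 0; i < 40; i++)
        t[i] = (i < 6 ? 2.0 : 1.0) + (i % 5 == 4 ? 0.5 : 0.0);

    /* Window of 10 = 2 x the I/O period of 5, so each window contains
     * the same C/W mixture. Prints ~1.1 s/step. */
    printf("stable cost: %.3f s/step\n", stable_timestep_cost(t, 40, 10, 0.05));
    return 0;
}
```

Sizing the window as a whole number of I/O periods means every window contains the same mixture of C and W phases, so the periodic I/O spikes no longer register as variance.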
Proof-of-Concept Experiments
- Questions:
  - Is the relative performance observed in a very short early period indicative of the overall relative performance?
  - Can we reuse partial execution data to predict executions with different configurations?
- Experiment settings
  - Large-scale codes: two ASCI Purple benchmarks (sphot and sPPM), a fusion code (Gyro), and a rocket simulation (GENx); full runs take more than 5 hours
  - 10 supercomputers at SDSC, NCSA, ORNL, LLNL, UIUC, NCSU, and NERSC
  - 7 architectures (SP3, SP4, Altix, Cray X1, and three clusters: G5, Xeon, Itanium)
Base Model Accuracy (sphot)
- High accuracy with a very short partial execution
Refined Model (sPPM, Ram -> Henry2)
- Issues:
  - Ram: initialization variance
  - Henry2: I/O in 1 of every 10 timesteps
- Smarter algorithms:
  - Initialization filter
  - Sliding window to handle anomalies and periodic I/O
[Figure: normalized per-timestep times on Ram and Henry2]
Application with Variable Problem Size
- GENx rocket simulation (CSAR, UIUC), run on Turing and Frost
- Limited accuracy with variable timesteps
Reusing Partial Execution Data
- Scientists often repeat runs with different configurations:
  - Number of processors
  - Input size and data content
  - Computation tasks
- Results from the Gyro fusion simulation on 5 platforms
[Figures: average error of 12.1%-25.8% in one set of reuse experiments and 5.6%-37.9% in another]
Conclusion
- Empirical performance prediction works!
  - Real-world production codes
  - Multiple parallel platforms
  - Highly accurate predictions
- Limitations with:
  - Variable problem sizes
  - Input-size/processor scaling
- Observation-based prediction is:
  - Simple
  - Portable
  - Low cost (a few timesteps)
[Figure: example cross-platform translations: T = 20 hrs -> T = 2 hrs, T = 10 hrs -> T = 1 hr]
Related Work
- Parallel program performance prediction
  - Application-specific analytical models
  - Compiler/instrumentation tools
  - Simulation-based predictions
- Cross-platform performance studies
  - Mostly examine multiple platforms individually
- Grid job schedulers
  - Do not offer cross-platform performance translation
Ongoing and Future Work
- Evaluate with AMR applications
- Automated partial execution
  - Automatic computation-phase identification
  - Binary rewriting to avoid source code modification
- Extend to non-dedicated systems
  - For job schedulers