
Slide 1: Cross-Platform Performance Prediction Using Partial Execution
Leo T. Yang, Xiaosong Ma*, Frank Mueller
Department of Computer Science, Center for High Performance Simulations (CHiPS), North Carolina State University
(* Joint faculty with Oak Ridge National Laboratory)

Slide 2: Presentation Roadmap
Introduction
Model and approach
Performance results
Conclusion and future work

Slide 3: Cross-Platform Performance Prediction
Users face a wide selection of machines
Need cross-platform performance prediction to
 Choose which platform to use / purchase
 Estimate resource usage
 Estimate job wall time
Machines and applications both grow larger and more complex
 Modeling- and simulation-based approaches become harder and more expensive
 Performance data is not reused in performance prediction

Slide 4: Observation-Based Performance Prediction
Observe cross-platform behavior
 Treat applications and platforms as black boxes
 Avoid case-by-case model building
 Cover the entire application: computation, communication, I/O
 Convenient with third-party libraries
Performance translation
 Observation: existence of a "reference platform"
 Goal: a cross-platform meta-predictor
 Approach: based on relative performance
(Figure: a run taking T = 20 hrs on the reference platform, T = ? hrs on the target platform)

Slide 5: Presentation Roadmap
Introduction
Model and approach
Performance results
Conclusion and future work

Slide 6: Main Idea: Utilizing Partial Execution
Observation: the majority of scientific applications are iteration-based
 Highly repetitive behavior
 Phases -> timesteps
Execute small partial executions
 Low-cost "test drives"
 Simple API (indicate the number of timesteps, k)
 Quit after k timesteps
(Figure: Full-1 and Partial-1 on the reference system, Partial-2 and predicted Full-2 on the target system; relative performance = 0.6)

Slide 7: Application Model
The execution of a parallel simulation is modeled as the regular expression I(C*[W])*F
 I: one-time initialization phase
 C: computation phase
 W: optional I/O phase
 F: one-time finalization phase
 Different phases likely have different cross-platform relative performance (see the example below)
Major challenges
 Avoid the impact of initially unstable performance
 Predict the correct mixture of C and W phases
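For illustration only (the phase counts are made up): a run that initializes, performs three groups of four computation timesteps each ending in a checkpoint write, and then finalizes would unfold as

    I CCCCW CCCCW CCCCW F

which matches I(C*[W])*F; a run with periodic I/O disabled would reduce to I CC...C F.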

Slide 8: Partial Execution
Terminate applications prematurely
API
 init_timestep(): optional, useful with a large setup phase
 begin_timestep()
 end_timestep(maxsteps)
"begin" and "end" calls bracket a C or CW phase
Execution is terminated after maxsteps timesteps
Easy-to-use interface
 2-3 lines of code inserted into the source code (see the sketch below)
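A minimal sketch of how these calls might sit in a typical time-stepping code. Only the three API names and the fact that end_timestep(maxsteps) quits after maxsteps timesteps come from the slide; the stub bodies, do_timestep(), write_checkpoint(), NSTEPS, and IO_FREQ are hypothetical placeholders, not the real library.

    /* Hypothetical simulation main loop instrumented for partial execution.
       The *_timestep() bodies below are illustrative stand-ins so the sketch
       compiles; the real library provides them. */
    #include <stdlib.h>

    static int g_step = 0;
    void init_timestep(void)        { g_step = 0; }      /* optional: marks end of setup */
    void begin_timestep(void)       { /* library would start a per-step timer */ }
    void end_timestep(int maxsteps) { if (++g_step >= maxsteps) exit(0); }  /* quit after k steps */

    void do_timestep(int step)      { (void)step; /* computation phase (C) */ }
    void write_checkpoint(int step) { (void)step; /* optional I/O phase (W) */ }

    #define NSTEPS   1000   /* full production run length */
    #define MAXSTEPS 20     /* partial execution: quit after 20 timesteps */
    #define IO_FREQ  10     /* checkpoint every 10 timesteps */

    int main(void)
    {
        /* ... one-time initialization phase (I) ... */
        init_timestep();

        for (int step = 0; step < NSTEPS; step++) {
            begin_timestep();                 /* brackets the C or CW phase */
            do_timestep(step);
            if ((step + 1) % IO_FREQ == 0)
                write_checkpoint(step);
            end_timestep(MAXSTEPS);
        }

        /* ... one-time finalization phase (F) ... */
        return 0;
    }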

Slide 9: Base Prediction Model
Given a reference platform and a target platform:
 Perform 1 or more partial executions on each
 Compute the average per-timestep execution time on both platforms
 Compute the relative performance
 Compute the overall execution time estimate for the target platform (see the formulas below)
Prediction performance = predicted-to-actual ratio
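One plausible written-out form of these steps (the notation is not taken from the slide): let \bar{t}_{ref} and \bar{t}_{tgt} be the average per-timestep times measured during the partial executions, and T_{ref} the known full execution time on the reference platform. Then

    r = \frac{\bar{t}_{tgt}}{\bar{t}_{ref}}, \qquad
    \hat{T}_{tgt} = r \cdot T_{ref}, \qquad
    \text{prediction performance} = \frac{\hat{T}_{tgt}}{T_{tgt}^{actual}}

Under this reading, the relative performance of 0.6 from the earlier illustration applied to a 20-hour reference run would predict a 12-hour target run.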

Slide 10: Refined Prediction Model
Problem 1: initial performance fluctuations
 Variance due to cache warm-up, etc.
 May span dozens of timesteps
Problem 2: periodic I/O phases
 I/O frequency is often configurable and determined at run time
Unified solution
 Monitor per-timestep performance variance at runtime to identify anomalies and repeated patterns
 Filter out early, unstable timestep measurements: consider only later results once performance stabilizes, and fold early timestep overheads into the initialization cost
 Compute sliding-window averages of per-timestep overheads, using multiples of the observed pattern length as the window size (see the sketch below)
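A minimal sketch of the sliding-window idea, assuming per-timestep times have already been collected into an array. The window length, stability tolerance, function names, and sample data are all illustrative assumptions; the paper's actual anomaly and pattern detection is more involved.

    /* Sliding-window averaging over per-timestep times: skip the unstable
       prefix, then average stable windows.  Thresholds and data are made up. */
    #include <stdio.h>
    #include <math.h>

    /* Average of times[start .. start+win-1]. */
    static double window_avg(const double *times, int start, int win)
    {
        double sum = 0.0;
        for (int i = 0; i < win; i++)
            sum += times[start + i];
        return sum / win;
    }

    /* Return the first window start after which consecutive window averages
       differ by less than tol (relative), i.e. performance has stabilized.
       Earlier timesteps would be folded into the initialization cost. */
    static int first_stable_window(const double *times, int n, int win, double tol)
    {
        double prev = window_avg(times, 0, win);
        for (int s = win; s + win <= n; s += win) {
            double cur = window_avg(times, s, win);
            if (fabs(cur - prev) / prev < tol)
                return s;                /* steps [s, n) considered stable */
            prev = cur;
        }
        return n;                        /* never stabilized within n steps */
    }

    int main(void)
    {
        /* Made-up per-timestep times: cache warm-up in the first few steps,
           then a steady pattern with heavier I/O every 4th step. */
        double t[20] = { 9.0, 7.5, 6.0, 5.2, 5.0, 5.1, 5.0, 8.0,
                         5.0, 5.1, 5.0, 8.1, 5.0, 5.0, 5.1, 8.0,
                         5.0, 5.1, 5.0, 8.0 };
        int win = 4;                     /* multiple of the observed I/O period */
        int stable = first_stable_window(t, 20, win, 0.05);
        double avg = window_avg(t, stable, win);
        printf("stable from step %d, avg per-step time %.2f\n", stable, avg);
        return 0;
    }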

Slide 11: Presentation Roadmap
Introduction
Model and approach
Performance results
Conclusion and future work

Slide 12: Proof-of-Concept Experiments
Questions:
 Is the relative performance observed in a very short early period indicative of the overall relative performance?
 Can we reuse partial execution data when predicting executions with different configurations?
Experiment settings
 Large-scale codes: 2 ASCI Purple benchmarks (sphot and sPPM), a fusion code (Gyro), a rocket simulation (GENx)
 Full runs take >5 hours
 10 supercomputers at SDSC, NCSA, ORNL, LLNL, UIUC, NCSU, NERSC
 7 architectures (SP3, SP4, Altix, Cray X1, and 3 clusters: G5, Xeon, Itanium)

Slide 13: Base Model Accuracy (sphot)
 High accuracy with a very short partial execution

Slide 14: Refined Model (sPPM, Ram -> Henry2)
Issues:
 Ram: initial variance
 Henry2: I/O in 1 out of every 10 steps
(Figure: per-timestep times, normalized)
Smarter algorithms:
 Initialization filter
 Sliding window to handle anomalies and periodic I/O

Slide 15: Application with Variable Problem Size
GENx rocket simulation (CSAR, UIUC), Turing -> Frost
 Limited accuracy with variable timesteps

Slide 16: Reusing Partial Execution Data
Scientists often repeat runs with different configurations
 Number of processors
 Input size and data content
 Computation tasks
Results from the Gyro fusion simulation on 5 platforms
(Figure annotations: avg. error 12.1% - 25.8% and 5.6% - 37.9%)

Slide 17: Presentation Roadmap
Introduction
Model and approach
Performance results
Conclusion and future work

Slide 18: Conclusion
Empirical performance prediction works!
 Real-world production codes
 Multiple parallel platforms
 Highly accurate predictions
 Limitations with variable problem sizes (input-size/processor scaling)
Observation-based prediction is
 Simple
 Portable
 Low cost (a few timesteps)
(Figure: example predicted wall times, T = 20 hrs, 2 hrs, 10 hrs, 1 hr)

Slide 19: Related Work
Parallel program performance prediction
 Application-specific analytical models
 Compiler/instrumentation tools
 Simulation-based predictions
Cross-platform performance studies
 Mostly examine multiple platforms individually
Grid job schedulers
 Do not offer cross-platform performance translation

Slide 20: Ongoing and Future Work
Evaluate with AMR applications
Automated partial execution
 Automatic computation-phase identification
 Binary rewriting to avoid source code modification
Extend to non-dedicated systems
 For job schedulers

