
Slide 1: Cross-Platform Performance Prediction Using Partial Execution
Leo T. Yang, Xiaosong Ma*, Frank Mueller
Department of Computer Science, Center for High Performance Simulations (CHiPS), North Carolina State University
(* Joint faculty with Oak Ridge National Laboratory)

Slide 2: Presentation Roadmap
Introduction
Model and approach
Performance results
Conclusion and future work

Slide 3: Cross-Platform Performance Prediction
Users face a wide selection of machines
Need cross-platform performance prediction to
 Choose which platform to use / purchase
 Estimate resource usage
 Estimate job wall time
Machines and applications both grow larger and more complex
 Modeling- and simulation-based approaches become harder and more expensive
 Performance data is not reused in performance prediction

Slide 4: Observation-Based Performance Prediction
Observe cross-platform behavior
 Treat applications and platforms as black boxes
 Avoid case-by-case model building
 Cover the entire application: computation, communication, I/O
 Convenient with third-party libraries
Performance translation
 Observation: existence of a "reference platform"
 Goal: a cross-platform meta-predictor
 Approach: based on relative performance
(Figure: a run taking T = 20 hrs on the reference platform, T = ? hrs on the target platform)

Slide 5: Presentation Roadmap
Introduction
Model and approach
Performance results
Conclusion and future work

Slide 6: Main Idea: Utilizing Partial Execution
Observation: the majority of scientific applications are iteration-based
 Highly repetitive behavior
 Phases -> timesteps
Execute small partial executions
 Low-cost "test drives"
 Simple API (indicate the number of timesteps, k)
 Quit after k timesteps
(Figure: Full-1 and Partial-1 on the reference system, Partial-2 and predicted Full-2 on the target system; relative performance = 0.6)

Slide 7: Application Model
The execution of a parallel simulation is modeled as the regular expression I(C*[W])*F
 I: one-time initialization phase
 C: computation phase
 W: optional I/O phase
 F: one-time finalization phase
 Different phases likely have different cross-platform relative performance (see the example below)
Major challenges
 Avoid the impact of initially unstable performance
 Predict the correct mixture of C and W phases
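For illustration only (the phase counts are made up): a run that initializes, performs three groups of four computation timesteps each ending in a checkpoint write, and then finalizes would unfold as

    I CCCCW CCCCW CCCCW F

which matches I(C*[W])*F; a run with periodic I/O disabled would reduce to I CC...C F.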

Slide 8: Partial Execution
Terminate applications prematurely
API
 init_timestep(): optional, useful with a large setup phase
 begin_timestep()
 end_timestep(maxsteps)
"begin" and "end" calls bracket a C or CW phase
Execution is terminated after maxsteps timesteps
Easy-to-use interface
 2-3 lines of code inserted into the source code (see the sketch below)
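A minimal sketch of how these calls might sit in a typical time-stepping code. Only the three API names and the fact that end_timestep(maxsteps) quits after maxsteps timesteps come from the slide; the stub bodies, do_timestep(), write_checkpoint(), NSTEPS, and IO_FREQ are hypothetical placeholders, not the real library.

    /* Hypothetical simulation main loop instrumented for partial execution.
       The *_timestep() bodies below are illustrative stand-ins so the sketch
       compiles; the real library provides them. */
    #include <stdlib.h>

    static int g_step = 0;
    void init_timestep(void)        { g_step = 0; }      /* optional: marks end of setup */
    void begin_timestep(void)       { /* library would start a per-step timer */ }
    void end_timestep(int maxsteps) { if (++g_step >= maxsteps) exit(0); }  /* quit after k steps */

    void do_timestep(int step)      { (void)step; /* computation phase (C) */ }
    void write_checkpoint(int step) { (void)step; /* optional I/O phase (W) */ }

    #define NSTEPS   1000   /* full production run length */
    #define MAXSTEPS 20     /* partial execution: quit after 20 timesteps */
    #define IO_FREQ  10     /* checkpoint every 10 timesteps */

    int main(void)
    {
        /* ... one-time initialization phase (I) ... */
        init_timestep();

        for (int step = 0; step < NSTEPS; step++) {
            begin_timestep();                 /* brackets the C or CW phase */
            do_timestep(step);
            if ((step + 1) % IO_FREQ == 0)
                write_checkpoint(step);
            end_timestep(MAXSTEPS);
        }

        /* ... one-time finalization phase (F) ... */
        return 0;
    }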

Slide 9: Base Prediction Model
Given a reference platform and a target platform:
 Perform 1 or more partial executions on each
 Compute the average per-timestep execution time on both platforms
 Compute the relative performance
 Compute the overall execution time estimate for the target platform (see the formulas below)
Prediction performance = predicted-to-actual ratio
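One plausible written-out form of these steps (the notation is not taken from the slide): let \bar{t}_{ref} and \bar{t}_{tgt} be the average per-timestep times measured during the partial executions, and T_{ref} the known full execution time on the reference platform. Then

    r = \frac{\bar{t}_{tgt}}{\bar{t}_{ref}}, \qquad
    \hat{T}_{tgt} = r \cdot T_{ref}, \qquad
    \text{prediction performance} = \frac{\hat{T}_{tgt}}{T_{tgt}^{actual}}

Under this reading, the relative performance of 0.6 from the earlier illustration applied to a 20-hour reference run would predict a 12-hour target run.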

Slide 10: Refined Prediction Model
Problem 1: initial performance fluctuations
 Variance due to cache warm-up, etc.
 May span dozens of timesteps
Problem 2: periodic I/O phases
 I/O frequency is often configurable and determined at run time
Unified solution
 Monitor per-timestep performance variance at runtime to identify anomalies and repeated patterns
 Filter out early, unstable timestep measurements: consider only later results once performance stabilizes, and fold early timestep overheads into the initialization cost
 Compute sliding-window averages of per-timestep overheads, using multiples of the observed pattern length as the window size (see the sketch below)
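A minimal sketch of the sliding-window idea, assuming per-timestep times have already been collected into an array. The window length, stability tolerance, function names, and sample data are all illustrative assumptions; the paper's actual anomaly and pattern detection is more involved.

    /* Sliding-window averaging over per-timestep times: skip the unstable
       prefix, then average stable windows.  Thresholds and data are made up. */
    #include <stdio.h>
    #include <math.h>

    /* Average of times[start .. start+win-1]. */
    static double window_avg(const double *times, int start, int win)
    {
        double sum = 0.0;
        for (int i = 0; i < win; i++)
            sum += times[start + i];
        return sum / win;
    }

    /* Return the first window start after which consecutive window averages
       differ by less than tol (relative), i.e. performance has stabilized.
       Earlier timesteps would be folded into the initialization cost. */
    static int first_stable_window(const double *times, int n, int win, double tol)
    {
        double prev = window_avg(times, 0, win);
        for (int s = win; s + win <= n; s += win) {
            double cur = window_avg(times, s, win);
            if (fabs(cur - prev) / prev < tol)
                return s;                /* steps [s, n) considered stable */
            prev = cur;
        }
        return n;                        /* never stabilized within n steps */
    }

    int main(void)
    {
        /* Made-up per-timestep times: cache warm-up in the first few steps,
           then a steady pattern with heavier I/O every 4th step. */
        double t[20] = { 9.0, 7.5, 6.0, 5.2, 5.0, 5.1, 5.0, 8.0,
                         5.0, 5.1, 5.0, 8.1, 5.0, 5.0, 5.1, 8.0,
                         5.0, 5.1, 5.0, 8.0 };
        int win = 4;                     /* multiple of the observed I/O period */
        int stable = first_stable_window(t, 20, win, 0.05);
        double avg = window_avg(t, stable, win);
        printf("stable from step %d, avg per-step time %.2f\n", stable, avg);
        return 0;
    }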

Slide 11: Presentation Roadmap
Introduction
Model and approach
Performance results
Conclusion and future work

Slide 12: Proof-of-Concept Experiments
Questions:
 Is the relative performance observed in a very short early period indicative of the overall relative performance?
 Can we reuse partial execution data when predicting executions with different configurations?
Experiment settings
 Large-scale codes: 2 ASCI Purple benchmarks (sphot and sPPM), a fusion code (Gyro), a rocket simulation (GENx)
 Full runs take >5 hours
 10 supercomputers at SDSC, NCSA, ORNL, LLNL, UIUC, NCSU, NERSC
 7 architectures (SP3, SP4, Altix, Cray X1, and 3 clusters: G5, Xeon, Itanium)

Slide 13: Base Model Accuracy (sphot)
 High accuracy with a very short partial execution

Slide 14: Refined Model (sPPM, Ram -> Henry2)
Issues:
 Ram: initial variance
 Henry2: I/O in 1 out of every 10 steps
(Figure: per-timestep times, normalized)
Smarter algorithms:
 Initialization filter
 Sliding window to handle anomalies and periodic I/O

Slide 15: Application with Variable Problem Size
GENx rocket simulation (CSAR, UIUC), Turing -> Frost
 Limited accuracy with variable timesteps

Slide 16: Reusing Partial Execution Data
Scientists often repeat runs with different configurations
 Number of processors
 Input size and data content
 Computation tasks
Results from the Gyro fusion simulation on 5 platforms
(Figure annotations: avg. error 12.1% - 25.8% and 5.6% - 37.9%)

Slide 17: Presentation Roadmap
Introduction
Model and approach
Performance results
Conclusion and future work

Slide 18: Conclusion
Empirical performance prediction works!
 Real-world production codes
 Multiple parallel platforms
 Highly accurate predictions
 Limitations with variable problem sizes (input-size/processor scaling)
Observation-based prediction is
 Simple
 Portable
 Low cost (a few timesteps)
(Figure: example predicted wall times, T = 20 hrs, 2 hrs, 10 hrs, 1 hr)

Slide 19: Related Work
Parallel program performance prediction
 Application-specific analytical models
 Compiler/instrumentation tools
 Simulation-based predictions
Cross-platform performance studies
 Mostly examine multiple platforms individually
Grid job schedulers
 Do not offer cross-platform performance translation

Slide 20: Ongoing and Future Work
Evaluate with AMR applications
Automated partial execution
 Automatic computation-phase identification
 Binary rewriting to avoid source code modification
Extend to non-dedicated systems
 For job schedulers

