Published by Joseph Daniels. Modified over 8 years ago.
1
A Component Infrastructure for Performance and Power Modeling of Parallel Scientific Applications
Boyana Norris, Argonne National Laboratory
Van Bui, Lois Curfman McInnes, Li Li, Argonne National Laboratory
Oscar Hernandez, Barbara Chapman, University of Houston
Kevin Huck, University of Oregon
2
Outline
Motivation
Performance/Power Models
Component Infrastructure
Experiments
Conclusions and Future Work
Acknowledgements
CBHPC, Karlsruhe, Germany, October 17, 2008
3
Component-Based Software Engineering
A component is a functional unit with well-defined interfaces and dependencies
Components interact through ports
Benefits: software reuse, management of complex software, code generation, available “services”
Drawbacks: more restrictive software engineering, need for a runtime framework
4
Motivation
Use of CBSE is increasing in HPC
Power is increasing in importance
There is a need for simpler processes for performance/power measurement and analysis
― Performance tools can be applied at the component abstraction layer
― Opportunities for automation
5
Power vs. Energy
Power: the rate at which a system performs work
Power = Work / ΔTime
Energy: the total work performed over a period of time
Energy = Power * ΔTime
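The two definitions above can be checked with a short calculation. The sketch below is illustrative only: the function names and the wattage and run-time figures are hypothetical, not measurements from the talk. It shows how a run that draws more power can still consume less energy if it finishes sooner.

```python
def power_watts(work_joules: float, elapsed_s: float) -> float:
    """Power = Work / delta-Time (joules per second, i.e. watts)."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return work_joules / elapsed_s


def energy_joules(power_w: float, elapsed_s: float) -> float:
    """Energy = Power * delta-Time."""
    return power_w * elapsed_s


# Hypothetical runs: (average power in W, run time in s).
slow_run = energy_joules(100.0, 60.0)  # lower power, longer run
fast_run = energy_joules(120.0, 40.0)  # higher power, shorter run

print(f"slow run: {slow_run:.0f} J, fast run: {fast_run:.0f} J")
```

With these made-up numbers the faster run draws 20% more power yet uses less total energy, because energy depends on both power and elapsed time.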
6
Power Trends
Cameron, K. W., Ge, R., and Feng, X. 2005. High-Performance, Power-Aware Distributed Computing for Scientific Applications. Computer 38, 11 (Nov. 2005), 40-47.
7
Power Reduction Techniques
Circuit and logic level
Low power interconnect
Low power memories and memory hierarchy
Low power processor architecture adaptations
Dynamic voltage scaling
Resource hibernation
Compiler level power management
Application level power management
8
Goals and Approach
Provide a component-based system that
― Facilitates performance/power measurement and analysis
― Computes high-level performance metrics
― Integrates existing tools behind a uniform interface
― End goal: static and dynamic optimizations based on offline/online analyses
9
System Diagram
[Diagram; recoverable labels: Analysis Infrastructure; Control Infrastructure; Interactive Analysis and Model Building; Instrumented Component Application Runs; Performance/Power Databases (persistent & runtime); Machine Learning; Control System (parameter changes and component substitution); CQoS-Enabled Component Application; Component A, Component B, Component C; Substitution Set; Substitution Assertion Database]
10
Performance Model I
FLP Inefficiency – PD: problem size dependent variant
FLP Inefficiency – PI: problem size independent variant
Metric | Formula
Global Stalls | stall_cycles / total_cycles
% FLP Stalls | FLP_stalls / stall_cycles
FLP Inefficiency – PD | FLP_OPS * stalls / cycles
FLP Inefficiency – PI | (FLP_OPS / retired_inst) * stalls / cycles
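As a rough illustration, the four metrics on this slide can be computed from raw counter totals as sketched below. The `Counters` class, its field names, and the sample values are hypothetical. The sketch also assumes that "stalls/cycles" in the two inefficiency formulas means the same ratio as Global Stalls (stall_cycles / total_cycles), which the slide does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    """Hypothetical container for hardware counter totals of one run."""
    total_cycles: int
    stall_cycles: int
    flp_stalls: int
    flp_ops: int
    retired_inst: int


def model_one_metrics(c: Counters) -> dict:
    """Compute the Performance Model I metrics from counter totals."""
    # Assumption: stalls/cycles is the global stall fraction.
    stall_fraction = c.stall_cycles / c.total_cycles
    return {
        "global_stalls": stall_fraction,
        "flp_stall_share": c.flp_stalls / c.stall_cycles,
        # Problem size dependent: scales with the raw FLP operation count.
        "flp_inefficiency_pd": c.flp_ops * stall_fraction,
        # Problem size independent: normalized by retired instructions.
        "flp_inefficiency_pi": (c.flp_ops / c.retired_inst) * stall_fraction,
    }


# Made-up counter values, for illustration only.
example = model_one_metrics(
    Counters(total_cycles=1000, stall_cycles=250, flp_stalls=50,
             flp_ops=400, retired_inst=800)
)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Normalizing by retired instructions is what makes the PI variant comparable across runs with different problem sizes.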
11
Performance Model II
Core logic Stalls = L1D_register_stalls + branch_misprediction + instruction_miss + stack_engine_stalls + floating_point_stalls + pipeline_inter_register_dependency + processor_frontend_flush
Memory Stalls = L1_hits * L1_latency + L2_hits * L2_latency + L3_hits * L3_latency + local_mem_access * local_mem_latency + remote_mem_access * remote_mem_latency + TLB_miss * TLB_miss_penalty
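The two sums above translate directly into code. The sketch below is a minimal illustration: the event and latency names are copied from the slide, while the dictionary layout and all numeric values are hypothetical (real latencies depend on the target machine).

```python
# Event names as they appear in the Core logic Stalls formula.
CORE_STALL_EVENTS = (
    "L1D_register_stalls",
    "branch_misprediction",
    "instruction_miss",
    "stack_engine_stalls",
    "floating_point_stalls",
    "pipeline_inter_register_dependency",
    "processor_frontend_flush",
)

# (event count, per-event cost) pairs from the Memory Stalls formula.
MEMORY_TERMS = (
    ("L1_hits", "L1_latency"),
    ("L2_hits", "L2_latency"),
    ("L3_hits", "L3_latency"),
    ("local_mem_access", "local_mem_latency"),
    ("remote_mem_access", "remote_mem_latency"),
    ("TLB_miss", "TLB_miss_penalty"),
)


def core_logic_stalls(counts: dict) -> int:
    """Sum the stall cycles attributed to each core-logic source."""
    return sum(counts[name] for name in CORE_STALL_EVENTS)


def memory_stalls(counts: dict, latencies: dict) -> int:
    """Weight each memory event count by its latency or penalty in cycles."""
    return sum(counts[event] * latencies[cost] for event, cost in MEMORY_TERMS)


# Hypothetical counts and latencies, for illustration only.
sample_counts = {name: 1 for name in CORE_STALL_EVENTS}
sample_counts.update({"L1_hits": 10, "L2_hits": 5, "L3_hits": 2,
                      "local_mem_access": 1, "remote_mem_access": 1,
                      "TLB_miss": 3})
sample_latencies = {"L1_latency": 1, "L2_latency": 5, "L3_latency": 12,
                    "local_mem_latency": 100, "remote_mem_latency": 200,
                    "TLB_miss_penalty": 20}

print(core_logic_stalls(sample_counts))
print(memory_stalls(sample_counts, sample_latencies))
```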
12
Power Model
Based on on-die components
Leverages hardware performance counters
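The slide does not give the model's formula. One common counter-based formulation, used here purely as an assumption, estimates each on-die component's power as its counter-derived activity rate times its maximum power, plus a fixed idle term. The component names, wattages, and the function name below are all hypothetical.

```python
def estimate_power_watts(access_rates: dict, max_power_w: dict,
                         idle_power_w: float) -> float:
    """Estimate total power from per-component activity rates.

    access_rates: component -> activity in [0, 1], derived from counters.
    max_power_w:  component -> power at full activity (assumed known).
    idle_power_w: power drawn regardless of activity.

    Assumed formulation, not necessarily the one used in the talk.
    """
    total = idle_power_w
    for component, rate in access_rates.items():
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"activity rate for {component} must be in [0, 1]")
        total += rate * max_power_w[component]
    return total


# Hypothetical components and values, for illustration only.
rates = {"fpu": 0.5, "l1_cache": 0.25}
max_power = {"fpu": 8.0, "l1_cache": 4.0}
print(estimate_power_watts(rates, max_power, idle_power_w=10.0))
```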
13
Die Photo for SiCortex
14
Performance Measurement and Analysis System
Components
― TAU: performance measurement http://www.cs.uoregon.edu/research/tau/home.php
― Performance Database Component(s)
― PerfExplorer: performance and power analysis http://www.cs.uoregon.edu/research/tau/docs/perfexplorer/
[Diagram; recoverable labels: Component App; TAU Component; Database Components; PerfExplorer Component; Runtime Optimization; Compiler feedback; User/tool analysis]
15
PerfExplorer Component
Loads a Python analysis script
Performs performance and power analysis
Supports data mining, inference rules, and comparison of different experimental runs
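To give a feel for what rule-based analysis and run comparison involve, here is a plain-Python sketch. It does not use PerfExplorer's scripting API: the rule format, metric names, thresholds, and function names are all hypothetical.

```python
def apply_rules(metrics: dict, rules: list) -> list:
    """Return the message of every rule whose predicate holds."""
    return [message for predicate, message in rules if predicate(metrics)]


def relative_change(baseline: float, candidate: float) -> float:
    """Fractional change of candidate relative to baseline."""
    return (candidate - baseline) / baseline


# Hypothetical inference rules with made-up thresholds.
RULES = [
    (lambda m: m["global_stalls"] > 0.5,
     "more than half of all cycles are stall cycles"),
    (lambda m: m["flp_stall_share"] > 0.3,
     "floating-point stalls account for a large share of stalls"),
]

# Hypothetical metrics for one run, and energy totals for two runs.
run_metrics = {"global_stalls": 0.6, "flp_stall_share": 0.1}
findings = apply_rules(run_metrics, RULES)
energy_delta = relative_change(baseline=100.0, candidate=80.0)

print(findings)
print(f"energy change: {energy_delta:+.0%}")
```

Keeping rules as data rather than hard-coded branches lets the same analysis be reused across experiments with different rule sets.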
16
Study I: Performance-Power Trade-offs
Experiment: effect of compiler optimization levels on performance and power
Experimental Details
― Machine: SGI Altix 300
― MPI Processes: 16
― Compiler: OpenUH
― Code: GenIDLEST
― Optimization levels: -O0, -O1, -O2, -O3
― Performance tools: TAU, PerfExplorer, and PAPI
17
Linux/ccNUMA
18
Results
Aggressive optimizations lead to higher power
― IPC ~ power dissipation
Aggressive optimizations lead to lower energy
― Operation count ~ energy consumption
19
Performance/Power Study With PETSc Codes
PETSc: Portable Extensible Toolkit for Scientific Computation
― http://www.mcs.anl.gov/petsc/
Experimental Details
― Machine: SGI Altix 3600
― Compiler: GCC
― MPI Processes: 32
― Application: 2-D simulation of cavity flow
   Krylov subspace linear solvers: FGMRES, GMRES, BCGS (BiCGStab)
   Preconditioner: Block Jacobi
   Problem size: 16x16 per processor (weak scaling)
― Performance tools: TAU, PerfExplorer, PAPI
20
Inefficiency
― Bottlenecks in the methods used to solve the linear system
― Bottleneck also in the preconditioner
21
Results
FGMRES has good performance initially
― Not very power efficient
BCGS is optimal for performance and power efficiency
22
Conclusions
Modern systems offer little or no hardware and software support for detailed power measurement and analysis
More integrated toolsets are needed that support both performance and power measurement, analysis, and optimization
Combining tools with component-based software engineering can improve the efficiency and effectiveness of the tuning process
23
Future Directions
Integration of components into a framework
Dynamic selection of algorithms and parameters based on offline/online analyses
Compiler-based performance and power cost modeling
Continued performance and power analysis of PETSc-based codes
Extension of the performance and power models to more modern architectures
24
References
Jarp, S. A Methodology for Using the Itanium-2 Performance Counters for Bottleneck Analysis. Tech. rep., HP Labs, August 2002.
Bircher, W. L. and John, L. K. Complete System Power Estimation: A Trickle-Down Approach Based on Performance Events. International Symposium on Performance Analysis of Systems & Software, pp. 158-168, 2007.
Isci, C. and Martonosi, M. Runtime Power Monitoring in High-End Processors: Methodology and Empirical Data. In Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture (December 3-5, 2003).
Huck, K., Hernandez, O., Bui, V., Chandrasekaran, S., Chapman, B., Malony, A. D., McInnes, L. C., and Norris, B. Capturing Performance Knowledge for Automated Analysis. Supercomputing, 2008. http://www2.cs.uh.edu/~vtbui/sc.pdf
25
Acknowledgments
Professors/Advisors: Boyana Norris, Lois Curfman McInnes, Barbara Chapman, Allen Malony, Danesh Tafti
Students: Oscar Hernandez, Kevin Huck, Sunita Chandrasekaran, Li Li
SiCortex: Lawrence Stuart and Dan Jackson
MCS Division, Argonne National Laboratory
NSF, DOE, NCSA, NASA