Published by Annice Constance Terry. Modified over 9 years ago.
1
HPCToolkit: Multi-platform Tools for Performance Analysis
John Mellor-Crummey, Robert Fowler, Nathan Tallent, Gabriel Marin
Department of Computer Science, Rice University
Los Alamos Computer Science Institute
http://www.hipersoft.rice.edu/hpctoolkit/
2
The Big Picture
Long-term: compiler and architecture research requires detailed performance understanding
- identify performance bottlenecks in complex applications
- understand the mismatch between application needs and architecture capabilities
- automate strategies for performance improvement
Short-term result: programmer-accessible tools for understanding application performance
3
Performance Analysis and Tuning
Increasingly necessary
- the gap between typical and peak performance is growing
Increasingly hard
- complex architectures are harder to program effectively
  - deeply pipelined microprocessors: VLIW or superscalar
  - complex memory hierarchy: non-blocking, multi-level caches
- large-scale scientific applications pose challenges for tools
4
LACSI HPCToolkit
Support large, multi-lingual applications
- a mix of Fortran, C, and C++
- hundreds of thousands of lines, many procedures
- external libraries
Eliminate manual labor from the run, analyze, tune cycle
- use optimized application binaries directly
  - no manual instrumentation, build-process changes, or recompilation
Platform, language, and compiler independence
- emphasis on LANL ASC platforms (Origin, AlphaServer, Opteron)
- multiple data sources; cross-platform comparisons
Scalable data collection
Effective presentation of analysis results
- intuitive, top-down user interface
  - hierarchical program structure with loop-level metrics
5
HPCToolkit System Overview
[Figure: HPCToolkit workflow. Application source is compiled and linked into binary object code; profile execution of the binary produces a performance profile; binary analysis of the binary recovers program structure; interpret profile and source correlation combine the profile, program structure, and application source into a hyperlinked database, which is browsed with hpcviewer.]
6
HPCToolkit System Overview: profile execution
- launch unmodified, optimized application binaries
- collect statistical profiles of events of interest
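A statistical profile can be pictured as a histogram over instruction addresses. The sketch below is a minimal Python illustration, not HPCToolkit's own collector: it assumes some sampling mechanism (for example, a hardware-counter overflow or an interval timer) has already delivered the program-counter value at each sample, and that each sample stands for `period` occurrences of the event. The function name `build_flat_profile` and the sample addresses are hypothetical.

```python
from collections import Counter
from typing import Iterable


def build_flat_profile(sampled_pcs: Iterable[int], period: int) -> dict[int, int]:
    """Turn sampled program-counter values into estimated event counts.

    Assumption: one sample is taken every `period` events, so each sample
    is charged `period` events at the address where it landed.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    hits = Counter(sampled_pcs)  # number of samples per instruction address
    return {pc: n * period for pc, n in hits.items()}


if __name__ == "__main__":
    # Hypothetical PCs captured at successive counter overflows.
    samples = [0x400A10, 0x400A14, 0x400A10, 0x400B00, 0x400A10]
    profile = build_flat_profile(samples, period=10_000)
    for pc, events in sorted(profile.items(), key=lambda kv: -kv[1]):
        print(f"{pc:#x}: ~{events} events")
```

Because the counts are estimates scaled from samples, hot addresses stand out while rarely executed code may receive no samples at all.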
7
HPCToolkit System Overview: interpret profile
- decode instructions and combine with profile data
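Combining binary information with profile data starts with mapping each sampled address to the code that contains it. A minimal sketch, assuming procedure address ranges have already been read from the executable's symbol table; the `Symbol` type, the `attribute` function, and all addresses are hypothetical. It uses a binary search over the sorted start addresses.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    name: str
    start: int  # first address of the procedure (inclusive)
    end: int    # one past its last address (exclusive)


def attribute(profile: dict[int, int], symbols: list[Symbol]) -> dict[str, int]:
    """Sum per-address event counts into per-procedure totals."""
    syms = sorted(symbols, key=lambda s: s.start)
    starts = [s.start for s in syms]
    totals: dict[str, int] = {}
    for pc, count in profile.items():
        # Index of the last procedure that starts at or before pc.
        i = bisect.bisect_right(starts, pc) - 1
        name = syms[i].name if i >= 0 and pc < syms[i].end else "<unknown>"
        totals[name] = totals.get(name, 0) + count
    return totals


if __name__ == "__main__":
    symbols = [Symbol("solve", 0x400A00, 0x400B00), Symbol("matvec", 0x400B00, 0x400C00)]
    profile = {0x400A10: 30_000, 0x400A14: 10_000, 0x400B00: 10_000}
    print(attribute(profile, symbols))
```

The same lookup works at finer granularity (source lines or loops) once the corresponding address ranges are known.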
8
HPCToolkit System Overview: binary analysis
- extract loop nesting information from executables
9
HPCToolkit System Overview: source correlation
- synthesize new metrics by combining metrics
- relate metrics, structure, and program source
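A derived metric such as cycles per instruction (CPI) is a formula evaluated per program scope over the measured metrics. A minimal sketch; the `derive` helper, the metric names, and the numbers are hypothetical and do not reflect HPCToolkit's own configuration syntax.

```python
from typing import Callable, Mapping

Metrics = Mapping[str, float]


def derive(scopes: Mapping[str, Metrics], name: str,
           formula: Callable[[Metrics], float]) -> dict[str, dict[str, float]]:
    """Return a copy of `scopes` with metric `name` computed for every scope."""
    out: dict[str, dict[str, float]] = {}
    for scope, base in scopes.items():
        row = dict(base)
        try:
            row[name] = formula(base)
        except ZeroDivisionError:
            row[name] = float("nan")  # undefined, e.g. no instructions sampled
        out[scope] = row
    return out


if __name__ == "__main__":
    # Made-up measurements for illustration only.
    measured = {
        "loop at line 120": {"cycles": 9.0e9, "instructions": 3.0e9},
        "loop at line 140": {"cycles": 1.0e9, "instructions": 2.0e9},
    }
    rows = derive(measured, "CPI", lambda m: m["cycles"] / m["instructions"])
    for scope, row in rows.items():
        print(f"{scope}: CPI = {row['CPI']:.2f}")
```

Keeping the formula as a plain function means any combination of base metrics (ratios, differences, weighted sums) can be added without changing the data layout.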
10
HPCToolkit System Overview: hpcviewer
- support top-down analysis with an interactive viewer
- analyze results anytime, anywhere
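Top-down analysis relies on inclusive costs: a scope's own (exclusive) cost plus the cost of everything nested inside it, so a user can start at the whole program and repeatedly open the most expensive child. A minimal sketch over a hypothetical procedure and loop tree with made-up costs; `Scope` and `hot_path` are illustrative names, not part of hpcviewer.

```python
from dataclasses import dataclass, field


@dataclass
class Scope:
    name: str
    exclusive: float = 0.0  # cost attributed directly to this scope
    children: list["Scope"] = field(default_factory=list)

    def inclusive(self) -> float:
        """Own cost plus the cost of all nested scopes."""
        return self.exclusive + sum(c.inclusive() for c in self.children)


def hot_path(root: Scope) -> list[tuple[str, float]]:
    """Drill down from the root, always entering the most expensive child."""
    path = []
    node = root
    while True:
        path.append((node.name, node.inclusive()))
        if not node.children:
            return path
        node = max(node.children, key=Scope.inclusive)


if __name__ == "__main__":
    program = Scope("program", 0, [
        Scope("main", 1, [
            Scope("solve", 2, [Scope("loop at line 120", 50), Scope("loop at line 140", 10)]),
            Scope("io", 5),
        ]),
    ])
    for name, cost in hot_path(program):
        print(f"{name}: {cost}")
```

Recomputing `inclusive()` at every step keeps the sketch short; a real tool would compute all inclusive values once in a single post-order pass.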
11
HPCViewer Screenshot
[Figure: hpcviewer screenshot with panes labeled Navigation, Metrics, and Annotated Source View.]
12
Impact on LANL Code Teams
HPCToolkit deployed on Origin
- improved SAGE by 2x on one example (see next slide)
First performance workshop (Feb 03)
- Feedback: needed on Q; smaller database for large codes
- Improvements: sophisticated support for the Alpha/Tru64 platform; new Java browser using a compact database
Second performance workshop (July 03)
- Feedback: ready to use; binary analysis too slow on large codes
- Improvement: sped up binary analysis on large codes by 30x
HPCToolkit deployed on secure machines (July 03)
- used to evaluate FLAG for the ASCI burn code review (Aug 03)
Ongoing interactions
- Feedback: better support for shared libraries and Opteron
- Improvement: new support for shared libraries, installed on Q
- Ongoing work: LANL/Rice collaboration for Opteron support
13
Sage Solver Performance Improvement
14
Future
Collect and present dynamic context
- which call paths lead to expensive computations
- accurate call-graph profiling of unmodified binaries
- analysis and presentation of dynamic context to explain performance, for example:
  - the solver is slow only when called on non-preconditioned matrices
  - MPI wait cost is incurred in the backsolve
Statistical clustering
- effective analysis of large collections of processes
Performance diagnosis
- why rather than what
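A calling context tree is one common way to keep the dynamic context that a flat profile discards: each sampled call stack is merged into a tree keyed by frame, so the same procedure reached along different paths keeps separate costs. The sketch assumes the call stacks have already been unwound, which is the hard part for optimized, unmodified binaries; `CCTNode`, `build_cct`, and all frame names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CCTNode:
    frame: str
    samples: int = 0  # samples whose innermost frame is this node
    children: dict[str, "CCTNode"] = field(default_factory=dict)

    def inclusive(self) -> int:
        """Samples at this node plus all samples in its calling subtree."""
        return self.samples + sum(c.inclusive() for c in self.children.values())


def build_cct(stacks: list[list[str]]) -> CCTNode:
    """Merge call stacks (outermost frame first) into a calling context tree."""
    root = CCTNode("<root>")
    for stack in stacks:
        node = root
        for frame in stack:
            if frame not in node.children:
                node.children[frame] = CCTNode(frame)
            node = node.children[frame]
        node.samples += 1  # charge the sample to the full calling context
    return root


def dump(node: CCTNode, depth: int = 0) -> None:
    print("  " * depth + f"{node.frame}: {node.inclusive()} samples (inclusive)")
    for child in node.children.values():
        dump(child, depth + 1)


if __name__ == "__main__":
    stacks = (
        [["main", "solve", "matvec"]] * 3
        + [["main", "precondition", "solve", "matvec"]]
        + [["main", "solve", "mpi_wait"]] * 2
    )
    dump(build_cct(stacks))
```

In this made-up data, a flat profile would merge both calls to `solve`; the tree shows that most of its cost comes from the call made directly from `main`.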