Performance Technology for Scalable Parallel Systems
Allen D. Malony
Department of Computer and Information Science
University of Oregon
Performance Technology
- Tools for performance problem solving
  - Empirical-based performance optimization process
  - Performance technology concerns
[Figure: performance optimization process. Stages: Performance Tuning, Performance Diagnosis, Performance Experimentation, Performance Observation. Links between stages: hypotheses, properties, characterization. Performance technology boxes: (1) data mining, models, expert systems; (2) experiment management, performance storage; (3) instrumentation, measurement, analysis, visualization.]
TAU Parallel Performance System Project
- Tuning and Analysis Utilities (15+ year project effort)
- Performance system framework for HPC systems
  - Integrated, scalable, and flexible
  - Target parallel programming paradigms
- Integrated toolkit for performance problem solving
  - Instrumentation, measurement, analysis, and visualization
  - Portable performance profiling and tracing facility
  - Performance data management and data mining
- Partners
  - LLNL, ANL, LANL
  - Research Centre Jülich, TU Dresden
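The slide above lists a "profiling and tracing facility" without showing how the two measurement styles differ. The following is a minimal sketch of that difference, not TAU's API: every name in it (`instrument`, `profile`, `trace`, `inner`, `outer`) is hypothetical and chosen only for illustration. A profile keeps an aggregated summary per routine (call count, inclusive time), while a trace keeps a time-ordered log of individual enter and exit events.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical names throughout; this is an illustration, not TAU's API.
profile = defaultdict(lambda: {"calls": 0, "inclusive_s": 0.0})  # aggregated summary
trace = []  # time-ordered event log: (event, routine, timestamp)


def instrument(func):
    """Wrap func so each call updates the profile and appends trace events."""
    name = func.__name__

    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        trace.append(("enter", name, start))
        try:
            return func(*args, **kwargs)
        finally:
            end = time.perf_counter()
            trace.append(("exit", name, end))
            entry = profile[name]
            entry["calls"] += 1
            # Inclusive time: includes time spent in instrumented callees.
            entry["inclusive_s"] += end - start

    return wrapper


@instrument
def inner(n):
    return sum(i * i for i in range(n))


@instrument
def outer(n):
    return inner(n) + inner(n)


result = outer(1000)

for routine, stats in profile.items():
    print(routine, stats["calls"], f"{stats['inclusive_s']:.6f} s")
```

The profile stays the same size no matter how long the program runs, whereas the trace grows with every call. That trade-off is one reason a measurement system might offer both modes.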
TAU Parallel Performance System Goals
- Portable (open source) parallel performance system
  - Computer system architectures and operating systems
  - Different programming languages and compilers
- Multi-level, multi-language performance instrumentation
- Flexible and configurable performance measurement
- Support for multiple parallel programming paradigms
  - Multi-threading, message passing, mixed-mode, hybrid, object oriented, component-based
- Support for performance mapping
- Integration of leading performance technology
- Scalable (very large) parallel performance analysis
TAU Performance System Components
[Figure: TAU architecture and its component tools.]
- TAU Architecture
- Program Analysis: PDT
- Parallel Profile Analysis: ParaProf
- Performance Data Mining: PerfExplorer
- PerfDMF
- Performance Monitoring: TAUoverSupermon
TAU Instrumentation and Measurement
TAU Analysis
TAU on HPC Platforms with Intel Processors
- ARL (JVN / MJM, x86_64 Linux NetworX): 14.7 TF / 52.8 TF, 2048 / 4400 processors
- AFRL (Hawk, Eagle; ia64 SGI Altix): 59 TF, 9216 processors
- NCSA (Abe, x86_64 Dell): 89.47 TF, 9600 cores
- NASA (Columbia, ia64 SGI Altix): 60.96 TF, 10240 processors
- MHPCC (Jaws, x86_64 Dell): 60 TF, 5120 processors
- TACC (Lonestar, x86_64 Dell): 62 TF, 5840 processors
TAU on Leadership Class Facilities and TeraGrid
- Argonne National Laboratory: IBM BG/P, 111 TF, 32768 processors
- Oak Ridge National Laboratory: Cray XT-4, 119 TF peak, 23416 cores (AMD Opteron)
- Texas Advanced Computing Center: Sun Blade 8000, 504 TF peak, 62976 cores (AMD Opteron)
Recent Funding
- A. Malony, S. Shende, N. Nystrom, S. Moore, R. Kufrin, SDCI HPC Improvement: High-Productivity Performance Engineering (Tools, Methods, Training) for NSF HPC Applications, NSF Software Development for Cyberinfrastructure (SDCI), 11/1/2007-10/31/2010.
- A. Malony, S. Shende, Knowledge-based Parallel Performance Technology, DOE Office of Science, 9/1/2007-8/31/2010.
- P. Beckman, A. Malony, Extreme Performance Scalable Operating Systems, DOE Office of Science, 12/1/2004-1/31/2008.
- A. Malony, S. Shende, Application-Specific Performance Technology for Productive Parallel Computing, DOE Office of Science, 5/1/2005-4/30/2008.
- S. McKee, A. Malony, G. Tyson, ST-HEC: Collaborative Research: Scalable, Interoperable Tools to Support Autonomic Optimization of High-End Applications, NSF High-End Computing (HEC), 11/1/2004-10/31/2007.
- A. Malony, Multi-core Parallel Programming and Performance Tools, Intel equipment grant, 9/15/2006.
- A. Malony, M. Sottile, Multi-core Parallel Programming, Intel equipment grant, 11/1/2007.
Intel Contacts
- Justin Rattner, Intel Senior Fellow, Vice-President
  - Director, Corporate Technology Group
  - In 1988 Rattner (at Intel Scientific Computers) suggested implementing a performance monitor for the iPSC/2 hypercube
  - A. Malony, D. Reed, “A Hardware-Based Performance Monitor for the Intel iPSC/2 Hypercube,” ICS 1990.
- David Kuck, Intel Fellow
  - Software and Solutions Group
  - Director, Parallel and Distributed Solutions Division
  - Worked for Kuck at the Center for Supercomputing Research and Development, University of Illinois, Urbana-Champaign
- Tim Mattson, Senior Research Scientist
  - Computational Software Laboratory
- Werner Krotz-Vogel
  - Technical Marketing Engineer, Intel Cluster Tools
Support Acknowledgements
- Department of Energy (DOE)
  - Office of Science
  - ASCR, Argonne National Lab
  - ASC/NNSA
  - University of Utah ASC/NNSA Level 1
  - ASC/NNSA, Lawrence Livermore National Lab
- Department of Defense (DoD)
  - HPC Modernization Office (HPCMO)
- NSF Software Development for Cyberinfrastructure
- Los Alamos National Laboratory
- Research Centre Jülich, TU Dresden
- ParaTools, Inc.