The TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen.

Slides:



Advertisements
Similar presentations
K T A U Kernel Tuning and Analysis Utilities Department of Computer and Information Science Performance Research Laboratory University of Oregon.
Advertisements

Introductions to Parallel Programming Using OpenMP
Dynamic performance measurement control Dynamic event grouping Multiple configurable counters Selective instrumentation Application-Level Performance Access.
Workload Characterization using the TAU Performance System Sameer Shende, Allen D. Malony, Alan Morris University of Oregon {sameer,
Sameer Shende, Allen D. Malony, and Alan Morris {sameer, malony, Steven Parker, and J. Davison de St. Germain {sparker,
Robert Bell, Allen D. Malony, Sameer Shende Department of Computer and Information Science Computational Science.
Scalability Study of S3D using TAU Sameer Shende
Sameer Shende Department of Computer and Information Science Neuro Informatics Center University of Oregon Tool Interoperability.
Allen D. Malony, Sameer Shende Department of Computer and Information Science Computational Science Institute University.
Profiling S3D on Cray XT3 using TAU Sameer Shende
TAU: Tuning and Analysis Utilities. TAU Performance System Framework  Tuning and Analysis Utilities  Performance system framework for scalable parallel.
TAU Parallel Performance System DOD UGC 2004 Tutorial Allen D. Malony, Sameer Shende, Robert Bell Univesity of Oregon.
Nick Trebon, Alan Morris, Jaideep Ray, Sameer Shende, Allen Malony {ntrebon, amorris, Department of.
On the Integration and Use of OpenMP Performance Tools in the SPEC OMP2001 Benchmarks Bernd Mohr 1, Allen D. Malony 2, Rudi Eigenmann 3 1 Forschungszentrum.
Allen D. Malony Department of Computer and Information Science Performance Research Laboratory University of Oregon Performance Technology.
Allen D. Malony, Sameer Shende Department of Computer and Information Science Computational Science Institute University.
The TAU Performance System: Advances in Performance Mapping Sameer Shende University of Oregon.
TAU Performance System Alan Morris, Sameer Shende, Allen D. Malony University of Oregon {amorris, sameer,
Performance Tools BOF, SC’07 5:30pm – 7pm, Tuesday, A9 Sameer S. Shende Performance Research Laboratory University.
Performance Instrumentation and Measurement for Terascale Systems Jack Dongarra, Shirley Moore, Philip Mucci University of Tennessee Sameer Shende, and.
Allen D. Malony Department of Computer and Information Science Computational Science Institute University of Oregon TAU Performance.
June 2, 2003ICCS Performance Instrumentation and Measurement for Terascale Systems Jack Dongarra, Shirley Moore, Philip Mucci University of Tennessee.
Allen D. Malony Department of Computer and Information Science Performance Research Laboratory NeuroInformatics Center University.
TAU Parallel Performance System DOD UGC 2004 Tutorial Part 1: TAU Overview and Architecture.
Performance Evaluation of S3D using TAU Sameer Shende
TAU: Performance Regression Testing Harness for FLASH Sameer Shende
Allen D. Malony, Sameer Shende Department of Computer and Information Science Computational Science Institute University.
TAU Performance Toolkit (WOMPAT 2004 OpenMP Lab) Sameer Shende, Allen D. Malony University of Oregon {sameer,
Scalability Study of S3D using TAU Sameer Shende
Allen D. Malony, Sameer Shende, Robert Bell Department of Computer and Information Science Computational Science Institute, NeuroInformatics.
Kai Li, Allen D. Malony, Robert Bell, Sameer Shende Department of Computer and Information Science Computational.
The TAU Performance System Sameer Shende, Allen D. Malony, Robert Bell University of Oregon.
Sameer Shende, Allen D. Malony Computer & Information Science Department Computational Science Institute University of Oregon.
Performance Observation Sameer Shende and Allen D. Malony cs.uoregon.edu.
Performance Technology for Complex Parallel Systems REFERENCES.
SC’01 Tutorial Nov. 7, 2001 TAU Performance System Framework  Tuning and Analysis Utilities  Performance system framework for scalable parallel and distributed.
Allen D. Malony Performance Research Laboratory (PRL) Neuroinformatics Center (NIC) Department.
Paradyn Week – April 14, 2004 – Madison, WI DPOMP: A DPCL Based Infrastructure for Performance Monitoring of OpenMP Applications Bernd Mohr Forschungszentrum.
Using TAU on SiCortex Alan Morris, Aroon Nataraj Sameer Shende, Allen D. Malony University of Oregon {amorris, anataraj, sameer,
John Mellor-Crummey Robert Fowler Nathan Tallent Gabriel Marin Department of Computer Science, Rice University Los Alamos Computer Science Institute HPCToolkit.
Profile Analysis with ParaProf Sameer Shende Performance Reseaerch Lab, University of Oregon
Performance Analysis Tool List Hans Sherburne Adam Leko HCS Research Laboratory University of Florida.
Profiling, Tracing, Debugging and Monitoring Frameworks Sathish Vadhiyar Courtesy: Dr. Shirley Moore (University of Tennessee)
Dynamic performance measurement control Dynamic event grouping Multiple configurable counters Selective instrumentation Application-Level Performance Access.
ASC Tri-Lab Code Development Tools Workshop Thursday, July 29, 2010 Lawrence Livermore National Laboratory, P. O. Box 808, Livermore, CA This work.
Portable Parallel Performance Tools Shirley Browne, UTK Clay Breshears, CEWES MSRC Jan 27-28, 1998.
Allen D. Malony, Sameer S. Shende, Alan Morris, Robert Bell, Kevin Huck, Nick Trebon, Suravee Suthikulpanit, Kai Li, Li Li
Preparatory Research on Performance Tools for HPC HCS Research Laboratory University of Florida November 21, 2003.
Tool Visualizations, Metrics, and Profiled Entities Overview [Brief Version] Adam Leko HCS Research Laboratory University of Florida.
1 SciDAC High-End Computer System Performance: Science and Engineering Jack Dongarra Innovative Computing Laboratory University of Tennesseehttp://
Allen D. Malony, Sameer S. Shende, Robert Bell Kai Li, Li Li, Kevin Huck Department of Computer.
Shangkar Mayanglambam, Allen D. Malony, Matthew J. Sottile Computer and Information Science Department Performance.
Integrated Performance Views in Charm++: Projections meets TAU Scott Biersdorff Allen D. Malony Department Computer and Information Science University.
Allen D. Malony Department of Computer and Information Science Performance Research Laboratory.
Parallel Performance Measurement of Heterogeneous Parallel Systems with GPUs Allen D. Malony, Scott Biersdorff, Sameer Shende, Heike Jagode†, Stanimire.
SDM Center High-Performance Parallel I/O Libraries (PI) Alok Choudhary, (Co-I) Wei-Keng Liao Northwestern University In Collaboration with the SEA Group.
Performane Analyzer Performance Analysis and Visualization of Large-Scale Uintah Simulations Kai Li, Allen D. Malony, Sameer Shende, Robert Bell Performance.
Online Performance Analysis and Visualization of Large-Scale Parallel Applications Kai Li, Allen D. Malony, Sameer Shende, Robert Bell Performance Research.
Kai Li, Allen D. Malony, Sameer Shende, Robert Bell
Introduction to the TAU Performance System®
Performance Technology for Scalable Parallel Systems
TAU integration with Score-P
Tutorial Outline – Part 1
Allen D. Malony, Sameer Shende
TAU Parallel Performance System
TAU Parallel Performance System
TAU: A Framework for Parallel Performance Analysis
Outline Introduction Motivation for performance mapping SEAA model
Parallel Program Analysis Framework for the DOE ACTS Toolkit
TAU Performance DataBase Framework (PerfDBF)
Presentation transcript:

The TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen D. Malony, Robert Bell University of Oregon {sameer, malony,

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, Outline  Motivation  Part I: Instrumentation  Part II: Measurement  Part III: Analysis Tools  Conclusion

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, TAU Performance System Framework  Tuning and Analysis Utilities  Performance system framework for scalable parallel and distributed high- performance computing  Targets a general complex system computation model  nodes / contexts / threads  Multi-level: system / software / parallelism  Measurement and analysis abstraction  Integrated toolkit for performance instrumentation, measurement, analysis, and visualization  Portable, configurable performance profiling/tracing facility  Open software approach  University of Oregon, LANL, FZJ Germany 

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, TAU Performance System Architecture EPILOG Paraver paraprof

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, TAU Analysis  Parallel profile analysis  pprof  parallel profiler with text-based display  paraprof  Graphical, scalable, parallel profile analysis and display  Trace analysis and visualization  Trace merging and clock adjustment (if necessary)  Trace format conversion (ALOG, SDDF, VTF, Paraver)  Trace visualization using Vampir (Pallas/Intel)

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, Pprof Output (ESMF CoupledFlowSolver)  IBM AIX  F95, C++, C, MPI  Profile - Node - Context - Thread  Events - code - MPI

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, Terminology – Example  For routine “int main( )”:  Exclusive time  =10 secs  Inclusive time  100 secs  Calls  1 call  Subrs (no. of child routines called)  3  Inclusive time/call  100secs int main( ) { /* takes 100 secs */ f1(); /* takes 20 secs */ f2(); /* takes 50 secs */ f1(); /* takes 20 secs */ /* other work */ } /* Time can be replaced by counts */

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, Performance Analysis and Visualization  Analysis of parallel profile and trace measurement  Parallel profile analysis  ParaProf  Cube Profile Browser (UTK, FZJ)  Profile generation from trace data  Performance data management framework (PerfDMF)  Parallel trace analysis  Translation to VTF 3.0 and EPILOG  Integration with VNG (Technical University of Dresden)  Online parallel analysis and visualization

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, TAU’s ParaProf Framework Architecture  Portable, extensible, and scalable tool for profile analysis  Try to offer “best of breed” capabilities to analysts  Build as profile analysis framework for extensibility

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, Profile Manager Window  Structured AMR toolkit (SAMRAI++), LLNL

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, Paraprof: CoupledFlowApp (ESMF) on 4 Nodes

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, Paraprof Mean Profile (4 nodes)

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, Individual Node (0) Profile in Paraprof

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, MPI Routines

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, Text Profile Window

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, k-Level Callpath Implementation in TAU  TAU maintains a performance event (routine) callstack  Profiled routine (child) looks in callstack for parent  Previous profiled performance event is the parent  A callpath profile structure created first time parent calls  TAU records parent in a callgraph map for child  String representing k-level callpath used as its key  “a( )=>b( )=>c()” : name for time spent in “c” when called by “b” when “b” is called by “a”  Map returns pointer to callpath profile structure  k-level callpath is profiled using this profiling data  Set environment variable TAU_CALLPATH_DEPTH to depth  Build upon TAU’s performance mapping technology  Measurement is independent of instrumentation  Use –PROFILECALLPATH to configure TAU

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, k-Level Callpath Implementation in TAU

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, Examining Callpaths

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, Unique Callpaths

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, Gprof Style Parent, Routine, Children Display

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, Clickable Callpath Entities

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, Paraprof

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, Tracking I/O on Node 0 in ESMF

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, Calling Path for MPI_Recv( )

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, CUBE (UTK, FZJ) Browser [Sept. 2004]

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, Using TAU with Vampir (Intel Trace Analyzer)  Configure TAU with -TRACE option % configure –TRACE –mpi …  Execute application % poe CoupledFlowApp –procs 4  This generates TAU traces and event descriptors  Merge all traces using tau_merge % tau_merge *.trc app.trc  Convert traces to Vampir Trace format using tau_convert % tau_convert –pv app.trc tau.edf app.pv Note: Use –vampir instead of –pv for multi-threaded traces  Load generated trace file in Vampir % vampir app.pv

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, Global Timeline Display with Parallelism View

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, Vampir: Zooming In…

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, Vampir: IO on Node 0

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, Vampir: Communication Matrix Display

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, Vampir: Calltree View

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, Summary Chart

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, TAU Performance System Status  Computing platforms (selected)  IBM SP / pSeries, SGI Origin 2K/3K, Cray T3E / SV-1 / X1, HP (Compaq) SC (Tru64), Sun, Hitachi SR8000, NEC SX-5/6, Linux clusters (IA-32/64, Alpha, PPC, PA- RISC, Power, Opteron), Apple (G4/5, OS X), Windows  Programming languages  C, C++, Fortran 77/90/95, HPF, Java, OpenMP, Python  Thread libraries  pthreads, SGI sproc, Java,Windows, OpenMP  Compilers (selected)  Intel KAI (KCC, KAP/Pro), PGI, GNU, Fujitsu, Sun, Microsoft, SGI, Cray, IBM (xlc, xlf), Compaq, NEC, Intel

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, Concluding Remarks  Complex parallel systems and software pose challenging performance analysis problems that require robust methodologies and tools  To build more sophisticated performance tools, existing proven performance technology must be utilized  Performance tools must be integrated with software and systems models and technology  Performance engineered software  Function consistently and coherently in software and system environments  TAU performance system offers robust performance technology that can be broadly integrated

Using TAU Performance Technology in ESMF NRL D.C. BYOC Workshop Aug 11, Support Acknowledgements  Department of Energy (DOE)  Office of Science contracts  University of Utah DOE ASCI Level 1 sub-contract  DOE ASCI Level 3 (LANL, LLNL)  NSF National Young Investigator (NYI) award  Research Centre Juelich  John von Neumann Institute for Computing  Dr. Bernd Mohr  Los Alamos National Laboratory