Parallel Program Analysis Framework for the DOE ACTS Toolkit

Slides:



Advertisements
Similar presentations
Machine Learning-based Autotuning with TAU and Active Harmony Nicholas Chaimov University of Oregon Paradyn Week 2013 April 29, 2013.
Advertisements

Dynamic performance measurement control Dynamic event grouping Multiple configurable counters Selective instrumentation Application-Level Performance Access.
Robert Bell, Allen D. Malony, Sameer Shende Department of Computer and Information Science Computational Science.
Sameer Shende Department of Computer and Information Science Neuro Informatics Center University of Oregon Tool Interoperability.
Profiling S3D on Cray XT3 using TAU Sameer Shende
TAU: Tuning and Analysis Utilities. TAU Performance System Framework  Tuning and Analysis Utilities  Performance system framework for scalable parallel.
TAU Parallel Performance System DOD UGC 2004 Tutorial Allen D. Malony, Sameer Shende, Robert Bell Univesity of Oregon.
The TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen.
On the Integration and Use of OpenMP Performance Tools in the SPEC OMP2001 Benchmarks Bernd Mohr 1, Allen D. Malony 2, Rudi Eigenmann 3 1 Forschungszentrum.
Allen D. Malony, Sameer Shende Department of Computer and Information Science Computational Science Institute University.
The TAU Performance System: Advances in Performance Mapping Sameer Shende University of Oregon.
Performance Instrumentation and Measurement for Terascale Systems Jack Dongarra, Shirley Moore, Philip Mucci University of Tennessee Sameer Shende, and.
June 2, 2003ICCS Performance Instrumentation and Measurement for Terascale Systems Jack Dongarra, Shirley Moore, Philip Mucci University of Tennessee.
Kai Li, Allen D. Malony, Robert Bell, Sameer Shende Department of Computer and Information Science Computational.
The TAU Performance System Sameer Shende, Allen D. Malony, Robert Bell University of Oregon.
Sameer Shende, Allen D. Malony Computer & Information Science Department Computational Science Institute University of Oregon.
SC’01 Tutorial Nov. 7, 2001 TAU Performance System Framework  Tuning and Analysis Utilities  Performance system framework for scalable parallel and distributed.
1 Developing Native Device for MPJ Express Advisor: Dr. Aamir Shafi Co-advisor: Ms Samin Khaliq.
9/13/20151 Threads ICS 240: Operating Systems –William Albritton Information and Computer Sciences Department at Leeward Community College –Original slides.
ICOM 5995: Performance Instrumentation and Visualization for High Performance Computer Systems Lecture 7 October 16, 2002 Nayda G. Santiago.
Crossing The Line: Distributed Computing Across Network and Filesystem Boundaries.
Case Study in Computational Science & Engineering - Lecture 2 1 Parallel Architecture Models Shared Memory –Dual/Quad Pentium, Cray T90, IBM Power3 Node.
Issues Autonomic operation (fault tolerance) Minimize interference to applications Hardware support for new operating systems Resource management (global.
Dynamic performance measurement control Dynamic event grouping Multiple configurable counters Selective instrumentation Application-Level Performance Access.
Portable Parallel Performance Tools Shirley Browne, UTK Clay Breshears, CEWES MSRC Jan 27-28, 1998.
Debugging parallel programs. Breakpoint debugging Probably the most widely familiar method of debugging programs is breakpoint debugging. In this method,
Preparatory Research on Performance Tools for HPC HCS Research Laboratory University of Florida November 21, 2003.
1 SciDAC High-End Computer System Performance: Science and Engineering Jack Dongarra Innovative Computing Laboratory University of Tennesseehttp://
Shangkar Mayanglambam, Allen D. Malony, Matthew J. Sottile Computer and Information Science Department Performance.
Integrated Performance Views in Charm++: Projections meets TAU Scott Biersdorff Allen D. Malony Department Computer and Information Science University.
Performance Technology for Complex Parallel Systems Part 1 – Overview and TAU Introduction Allen D. Malony.
Performance Technology for Complex Parallel Systems Part 1 – Overview and TAU Introduction Allen D. Malony.
Martin Kruliš by Martin Kruliš (v1.1)1.
Parallel OpenFOAM CFD Performance Studies Student: Adi Farshteindiker Advisors: Dr. Guy Tel-Zur,Prof. Shlomi Dolev The Department of Computer Science Faculty.
Sung-Dong Kim, Dept. of Computer Engineering, Hansung University Java - Introduction.
Chapter Goals Describe the application development process and the role of methodologies, models, and tools Compare and contrast programming language generations.
Computer System Structures
CIS 375 Bruce R. Maxim UM-Dearborn
Chapter 4: Threads Modified by Dr. Neerja Mhaskar for CS 3SH3.
Kai Li, Allen D. Malony, Sameer Shende, Robert Bell
Productive Performance Tools for Heterogeneous Parallel Computing
Chapter 4: Multithreaded Programming
Current Generation Hypervisor Type 1 Type 2.
Performance Technology for Scalable Parallel Systems
TAU integration with Score-P
Tutorial Outline – Part 1
课程名 编译原理 Compiling Techniques
Wireless Instant Messaging Using J2ME
CMPE419 Mobile Application Development
Allen D. Malony, Sameer Shende
TAU Parallel Performance System
Performance Technology for Complex Parallel and Distributed Systems
A configurable binary instrumenter
TAU Parallel Performance System
Chapter 4: Threads.
TAU: A Framework for Parallel Performance Analysis
Chapter 4: Threads.
Software Engineering with Reusable Components
Lecture Topics: 11/1 General Operating System Concepts Processes
Allen D. Malony Computer & Information Science Department
CHAPTER 4:THreads Bashair Al-harthi OPERATING SYSTEM
MPJ: A Java-based Parallel Computing System
Outline Introduction Motivation for performance mapping SEAA model
Chapter 4: Threads & Concurrency
Performance Technology for Complex Parallel and Distributed Systems
Introduction to Virtual Machines
HPC User Forum: Back-End Compiler Technology Panel
Introduction to Virtual Machines
CMPE419 Mobile Application Development
ONAP Architecture Principle Review
Presentation transcript:

Parallel Program Analysis Framework for the DOE ACTS Toolkit Allen D. Malony, Sameer Shende, Robert Ansell-Bell {malony,sameer,bertie}@cs.uoregon.edu Computer & Information Science Department Computational Science Institute University of Oregon

Performance Needs Performance Technology Observe/analyze/understand performance behavior Multiple levels of software and hardware Different types and detail of performance data Alternative performance problem solving methods Multiple targets of software and system application Robust AND ubiquitous performance technology Broad scope of performance observability Flexible and configurable mechanisms Technology integration and extension Cross-platform portability Open layered and modular framework architecture April 19, 2019

Complexity Challenges in DOE ACTS Environs Computing system environment complexity Observation integration and optimization Access, accuracy, and granularity constraints Diverse/specialized observation capabilities/technology Restricted modes limit performance problem solving Sophisticated software development environments Programming paradigms and performance models Performance data mapping to software abstractions Uniformity of performance abstraction across platforms Rich observation capabilities and flexible configuration Common performance problem solving methods April 19, 2019

Computation Model for Performance Technology How to address dual performance technology goals? Robust capabilities + widely available methodologies Contend with problems of system diversity Flexible tool composition/configuration/integration Base technology on abstract computation model general architecture and software execution features map features/methods to existing system types develop capabilities that can adapt and be optimized April 19, 2019

General Complex System Computation Model Node: physically distinct shared memory machine Message passing node interconnection network Context: distinct virtual memory space within node Thread: execution threads (user/system) in context Network Node Node Node memory node memory SMP memory VM space …    … Context Threads April 19, 2019

TAU Performance Framework Tuning and Analysis Utilities Performance system framework for scalable parallel and distributed high-performance computing Targets a general complex system computation model nodes / contexts / threads multi-level: system / software / parallelism measurement and analysis abstraction Integrated toolkit for performance instrumentation, measurement, analysis, and visualization portable performance profiling/tracing facility open software approach April 19, 2019

TAU Architecture Dynamic April 19, 2019

TAU Instrumentation Flexible, multiple instrumentation mechanisms Source code manual automatic using PDT (tau_instrumentor) Object code pre-instrumented libraries (e.g., POOMA) statically linked (e.g., MPI wrapper library) dynamically linked (e.g., JVM profiling interface) Executable code dynamic instrumentation using DynInstAPI (tau_run) Virtual machine April 19, 2019

TAU Instrumentation (continued) Common target measurement interface (TAU API) C++ (object-based) design and implementation Macro-based, using constructor/destructor techniques Function, classes, and templates Uniquely identify functions and templates name and type signature (name registration) static object creates performance entry dynamic object receives static object pointer runtime type identification for template instantiations C and Fortran instrumentation variants Instrumentation and measurement optimization April 19, 2019

TAU Measurement Performance information Organization High resolution timer library (real-time / virtual clocks) Generalized software counter library Hardware performance counters PCL (Performance Counter Library) (ZAM, Germany) PAPI (Performance API) (UTK, Ptools Consortium) consistent, portable API Organization Node, context, thread levels Profile groups for collective events (runtime selective) Mapping between software levels April 19, 2019

TAU Measurement (continued) Profiling Function-level, block-level, statement-level Supports user-defined events TAU profile (function) database (PD) Function callstack Hardware counts instead of time Tracing Profile-level events Interprocess communication events Timestamp synchronization User-controlled configuration (configure) April 19, 2019

TAU Analysis Profile analysis Trace analysis Pprof Racy parallel profiler with texted based display Racy graphical interface to pprof Trace analysis Trace merging and clock adjustment (if necessary) Trace format conversion (ALOG, SDDF, PV, Vampir) Vampir (Pallas) April 19, 2019

TAU Status Usage (selective) Platforms Languages IBM SP, SGI Origin 2K, Intel Teraflop, Cray T3E, HP, Sun, Windows 95/98/NT, Alpha/Pentium Linux cluster, IA-64 Languages C, C++, Fortran 77/90, HPF, pC++, HPC++, Java Communication libraries MPI, PVM, Nexus, Tulip, ACLMPL Thread libraries pthreads, Tulip, SMARTS, Java,Windows, OpenMP Compilers KAI (KCC and KAP/Pro), PGI, GNU, Fujitsu, Sun, Microsoft, SGI, Cray, IBM April 19, 2019

TAU Status (continued) Application libraries Blitz++, A++/P++, ACLVIS, PAWS Application frameworks POOMA, POOMA-2, MC++, Conejo, PaRP Other projects ACPC, University of Vienna: Opus/HPF KAI and Pallas: OpenMP/MPI TAU profiling and tracing toolkit (Version 2.8) Extensive TAU User’s Guide http://www.acl.lanl.gov/tau http://www.cs.uoregon.edu/research/paracomp/tau April 19, 2019

TAU Application Scenarios Instrumentation examples Instrumentation of C++ source and templates Instrumentation of multi-threaded code Object-oriented (C++) template libraries Template-derived code performance measurement Array classes and expression transformation Source code performance mapping Multi-level and asynchronous computation Multi-threaded parallel execution Asynchronous runtime system scheduling Parallel performance mapping April 19, 2019

TAU Application Scenarios (continued) Hardware performance measurement Integration of external performance technology Cross-platform hardware counter API Virtual machine execution Abstract thread-based performance measurement Performance measurement integration in virtual machine Hierarchical, hybrid (mixed model) parallel systems Portable shared memory and message passing APIs Combined task and data parallel execution Performance system configuration and model mapping April 19, 2019

TAU Java Instrumentation Architecture Java program TAU package mpiJava package Thread API JNI MPI profiling interface Event notification TAU TAU wrapper Native MPI library JVMPI Profile DB April 19, 2019

Program Database Toolkit (PDT) Program code analysis framework for developing source-based tools High-level interface to source code information Integrated toolkit for source code parsing, database creation, and database query commercial grade front end parsers portable IL analyzer, database format, and access API open software approach for tool development Target and integrate multiple source languages http://www.acl.lanl.gov/pdtoolkit April 19, 2019

PDT Architecture and Tools April 19, 2019

PDT and TAU Instrumentation Manual source instrumentation time consuming and error prone Automatic source instrumentation need function and method signature need parameter type information need source file and line information generate instrumentation statement insert instrumentation in source file Use PDT to create/access program code information Develop instrumentation tool April 19, 2019

PDT Summary Program Database Toolkit (Version 1.2) EDG C++ Front End (Version 2.41.2) C++ IL Analyzer and DUCTAPE library tools: pdbmerge, pdbconv, pdbtree, pdbhtml standard C++ system header files (KAI KCC 3.4c) Fortran 90 IL Analyzer in progress Automated TAU performance instrumentation Program analysis support for SILOON (ACL CD) “A Tool Framework for Static and Dynamic Analysis of Object-Oriented Software” (SC 2000) April 19, 2019

TAU Future Plans Platforms IA-64, Compaq ASCI Q, Sun Starfire, Linux SC,... Languages / Programming Environments OpenMP + MPI, Java (Java Grande), Opus / Java, … Instrumentation Automatic (F90, Java), DynInst, DPCL, DITools Measurement Extend tracing support to include event data (e.g., HW counts) Dynamic performance measurement control Application libraries PETSc, GloalArrays, ScaLAPACK, SuperLU Automatic performance analysis (APART Esprit WG) Distributed Performance Monitoring Performance database technology Target next-generation DOE ACTS aims April 19, 2019

Conclusions Complex parallel computing environments require robust and widely available performance technology Portable, cross-platform, multi-level, integrated Able to bridge and reuse existing technology Technology savvy and open TAU is only a performance technology framework General computation model and core services Mapping, extension, and refinement Integration of additional performance technology Need for higher-level framework layers Computational and performance model archetypes Performance diagnosis April 19, 2019