CCA Common Component Architecture Performance Technology for Component Software - TAU Allen D. Malony (U. Oregon) Sameer Shende (U. Oregon) Craig Rasmussen.

Presentation transcript:

CCA Common Component Architecture Performance Technology for Component Software - TAU Allen D. Malony (U. Oregon) Sameer Shende (U. Oregon) Craig Rasmussen (LANL) Jaideep Ray (SNL, CA) Matt Sottile (LANL)

Overview
Complexity and performance technology
TAU performance system
Developing performance interfaces for CCA
Performance modeling and prediction issues
Conclusions

Focus on Component Technology
Emerging component technology for HPC and Grid
Component: software object embedding functionality
Component architecture (CA): how components connect
Component framework: implements a CA
Common Component Architecture (CCA)
–Standard foundation for scientific component architecture
–Component descriptions
    Scientific Interface Description Language (SIDL)
–CCA ports for component interactions
–CCA framework services (CCAFEINE)

Problem Statement
How do we create robust and ubiquitous performance technology for the analysis and tuning of component software in the presence of (evolving) complexity challenges?
How do we apply performance technology effectively for the variety and diversity of performance problems that arise in the context of CCA components?

Tuning and Analysis Utilities
Performance system framework for scalable parallel and distributed high-performance computing
Targets a general complex system computation model
–nodes / contexts / threads
–Multi-level: system / software / parallelism
–Measurement and analysis abstraction
Integrated toolkit for performance instrumentation, measurement, analysis, and visualization
–Portable, configurable performance profiling/tracing facility
–Open software approach
University of Oregon, LANL, FZJ Germany

TAU Performance System Architecture
[Architecture diagram; surviving labels: EPILOG, Paraver]

TAU Instrumentation
Flexible instrumentation mechanisms at multiple levels
–Source code
    manual (TAU API, CCA Measurement Port API)
    automatic using Program Database Toolkit (PDT), OPARI (for OpenMP programs), Babel SIDL compiler (proposed)
–Object code
    pre-instrumented libraries (e.g., MPI using PMPI)
    statically linked
    dynamically linked (e.g., virtual machine instrumentation)
    fast breakpoints (compiler generated)
–Executable code
    dynamic instrumentation (pre-execution) using DyninstAPI

Program Database Toolkit
[Diagram: PDT tool flow]
Application / Library
C / C++ parser, Fortran 77/90 parser
IL
C / C++ IL analyzer, Fortran 77/90 IL analyzer
Program Database Files
DUCTAPE
–PDBhtml: program documentation
–SILOON: application component glue
–CHASM: C++ / F90 interoperability
–TAU_instr: automatic source instrumentation

Program Database Toolkit (PDT)
Program code analysis framework for developing source-based tools for C99, C++ and F90 [U. Oregon, LANL, FZJ Germany]
High-level interface to source code information
Widely portable:
–IBM, SGI, Compaq, HP, Sun, Linux clusters, Windows, Apple, Hitachi, Cray T3E...
Integrated toolkit for source code parsing, database creation, and database query
–commercial grade front end parsers (EDG for C99/C++, Mutek for F90)
–Intel/KAI C++ headers for std. C++ library distributed with PDT
–portable IL analyzer, database format, and access API
–open software approach for tool development
Target and integrate multiple source languages
Used in CCA for automated generation of SIDL [CHASM]
Used in TAU to build automated performance instrumentation tools (tau_instrumentor)
Can be used to generate code for performance ports in CCA

Extended Component Design
[Diagram labels: generic component, PKC, POC]
PKC: Performance Knowledge Component
POC: Performance Observability Component

Performance Observation
Ability to observe execution performance is important
–Empirically-derived performance knowledge
    Does not require measurement integration in component
–Monitor during execution to make dynamic decisions
    Measurement integration is key
Performance observation integration
–Component integration: core and variant
–Runtime measurement and data collection
–On-line and off-line performance analysis

Performance Observation Component (POC)
Performance observation in a performance-engineered component model
Functional extension of original component design
–Include new component methods and ports for other components to access measured performance data
–Allow original component to access performance data
    Encapsulate as tightly-coupled and co-resident performance observation object
    POC “provides” ports allow use of optimized interfaces to access “internal” performance observations

Performance Observation Component
One performance component per context
Performance component provides a Measurement Port
–Measurement Port allows a user to create and access:
    Timer (start/stop, set name/type/group)
    Event (trigger)
    Control (enable/disable groups)
    Query (get functions, metrics, counters, dump to disk)
[Diagram: Performance Component with a Measurement Port exposing Timer, Event, Control, Query]

Measurement Port in CCAFEINE
    namespace performance {
    namespace ccaports {
      class Measurement : public virtual classic::gov::cca::Port {
      public:
        virtual ~Measurement() {}

        /* Create a Timer */
        virtual performance::Timer* createTimer(void) = 0;
        virtual performance::Timer* createTimer(string name) = 0;
        virtual performance::Timer* createTimer(string name, string type) = 0;
        virtual performance::Timer* createTimer(string name, string type, string group) = 0;

        /* Create a Query interface */
        virtual performance::Query* createQuery(void) = 0;

        /* Create a User Defined Event interface */
        virtual performance::Event* createEvent(void) = 0;
        virtual performance::Event* createEvent(string name) = 0;

        /**
         * Create a Control interface for selectively enabling and disabling
         * the instrumentation based on groups
         */
        virtual performance::Control* createControl(void) = 0;
      };
    }
    }
Performance Component API

    namespace performance {
      class Timer {
      public:
        virtual ~Timer() {}

        /* Start the Timer. Implement these methods in
         * a derived class to provide required functionality. */
        virtual void start(void) = 0;

        /* Stop the Timer. */
        virtual void stop(void) = 0;

        virtual void setName(string name) = 0;
        virtual string getName(void) = 0;
        virtual void setType(string name) = 0;
        virtual string getType(void) = 0;

        /** Set the group name associated with the Timer
         * (e.g., All MPI calls can be grouped into an "MPI" group) */
        virtual void setGroupName(string name) = 0;
        virtual string getGroupName(void) = 0;
        virtual void setGroupId(unsigned long group) = 0;
        virtual unsigned long getGroupId(void) = 0;
      };
    }
CCA Timer Interface
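The slides do not show how a derived class fills in the Timer interface. The following is a minimal sketch of one possibility, assuming wall-clock timing with std::chrono. The class name ChronoTimer and the accessors elapsedSeconds() and calls() are hypothetical and are not part of the presentation or of TAU; the interface is restated in abbreviated form only so the sketch is self-contained.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <utility>

namespace performance {

// Abbreviated restatement of the Timer interface from the slide
// (type and group accessors omitted to keep the sketch short).
class Timer {
public:
    virtual ~Timer() {}
    virtual void start() = 0;
    virtual void stop() = 0;
    virtual void setName(std::string name) = 0;
    virtual std::string getName() = 0;
};

}  // namespace performance

// Hypothetical implementation: accumulates wall-clock time across
// start/stop pairs and counts completed calls.
class ChronoTimer : public performance::Timer {
public:
    explicit ChronoTimer(std::string name = "") : name_(std::move(name)) {}

    void start() override {
        assert(!running_ && "start() called twice without stop()");
        running_ = true;
        begin_ = Clock::now();
    }

    void stop() override {
        assert(running_ && "stop() called without start()");
        total_ += Clock::now() - begin_;
        running_ = false;
        ++calls_;
    }

    void setName(std::string name) override { name_ = std::move(name); }
    std::string getName() override { return name_; }

    // Not part of the slide's interface: accessors added for the sketch.
    double elapsedSeconds() const {
        return std::chrono::duration<double>(total_).count();
    }
    long calls() const { return calls_; }

private:
    using Clock = std::chrono::steady_clock;
    std::string name_;
    Clock::time_point begin_{};
    Clock::duration total_{Clock::duration::zero()};
    bool running_ = false;
    long calls_ = 0;
};
```

A monotonic clock is used here so that accumulated time cannot go backwards if the system clock is adjusted; a real measurement library may instead read hardware counters or other metrics.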

Control Class Interface
    namespace performance {
      class Control {
      public:
        virtual ~Control() {}

        /* Control instrumentation. Enable group Id. */
        virtual void enableGroupId(unsigned long id) = 0;

        /* Control instrumentation. Disable group Id. */
        virtual void disableGroupId(unsigned long id) = 0;

        /* Control instrumentation. Enable group name. */
        virtual void enableGroupName(string name) = 0;

        /* Control instrumentation. Disable group name. */
        virtual void disableGroupName(string name) = 0;

        /* Control instrumentation. Enable all groups. */
        virtual void enableAllGroups(void) = 0;

        /* Control instrumentation. Disable all groups. */
        virtual void disableAllGroups(void) = 0;
      };
    }
CCA Instrumentation Control Interface
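As an illustration of group-based control, the sketch below implements the Control interface with a name-to-id table and a set of enabled ids. GroupControl, registerGroup() and isEnabled() are hypothetical names introduced for this example; the slides do not describe how TAU represents groups internally, so this is only one plausible arrangement.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

namespace performance {

// Restatement of the Control interface from the slide.
class Control {
public:
    virtual ~Control() {}
    virtual void enableGroupId(unsigned long id) = 0;
    virtual void disableGroupId(unsigned long id) = 0;
    virtual void enableGroupName(std::string name) = 0;
    virtual void disableGroupName(std::string name) = 0;
    virtual void enableAllGroups() = 0;
    virtual void disableAllGroups() = 0;
};

}  // namespace performance

// Hypothetical implementation backed by standard containers.
class GroupControl : public performance::Control {
public:
    // Hypothetical helper: bind a group name to an id; new groups start enabled.
    void registerGroup(const std::string& name, unsigned long id) {
        assert(!name.empty());
        ids_[name] = id;
        enabled_.insert(id);
    }

    void enableGroupId(unsigned long id) override { enabled_.insert(id); }
    void disableGroupId(unsigned long id) override { enabled_.erase(id); }

    // Unknown names are ignored rather than treated as errors.
    void enableGroupName(std::string name) override {
        auto it = ids_.find(name);
        if (it != ids_.end()) enabled_.insert(it->second);
    }
    void disableGroupName(std::string name) override {
        auto it = ids_.find(name);
        if (it != ids_.end()) enabled_.erase(it->second);
    }

    void enableAllGroups() override {
        for (const auto& entry : ids_) enabled_.insert(entry.second);
    }
    void disableAllGroups() override { enabled_.clear(); }

    // Instrumentation would consult this before starting a timer in a group.
    bool isEnabled(unsigned long id) const { return enabled_.count(id) != 0; }

private:
    std::map<std::string, unsigned long> ids_;
    std::set<unsigned long> enabled_;
};
```

With such a table, a timer assigned to the "MPI" group can be switched off at runtime by calling disableGroupName("MPI") without touching the instrumented code.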

Query Class Interface
    namespace performance {
      class Query {
      public:
        virtual ~Query() {}

        /* Get the list of Timer names */
        virtual void getTimerNames(const char **& functionList, int& numFuncs) = 0;

        /* Get the list of Counter names */
        virtual void getCounterNames(const char **& counterList, int& numCounters) = 0;

        /* getTimerData. Returns lists of metrics. */
        virtual void getTimerData(const char **& inTimerList, int numTimers,
                                  double **& counterExclusive, double **& counterInclusive,
                                  int*& numCalls, int*& numChildCalls,
                                  const char **& counterNames, int& numCounters) = 0;

        virtual void dumpProfileData(void) = 0;
        virtual void dumpProfileDataIncremental(void) = 0;  // timestamped dump
        virtual void dumpTimerNames(void) = 0;
        virtual void dumpTimerData(const char **& inTimerList, int numTimers) = 0;
        virtual void dumpTimerDataIncremental(const char **& inTimerList, int numTimers) = 0;
      };
    }
CCA Performance Query Interface

Event Class Interface
    namespace performance {
      class Event {
      public:
        /**
         * Destructor
         */
        virtual ~Event() {}

        /**
         * Register the name of the event
         */
        virtual void trigger(double data) = 0;
        /* e.g., size of a message, error in an iteration, memory allocated */
      };
    }
CCA User Defined Event Interface
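A user-defined event records a value each time it is triggered, for example a message size. The sketch below shows one way a derived class might summarize those values, assuming that count, minimum, maximum and mean are the statistics of interest. StatEvent and its accessors are hypothetical names, not taken from the presentation.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

namespace performance {

// Restatement of the Event interface from the slide.
class Event {
public:
    virtual ~Event() {}
    virtual void trigger(double data) = 0;
};

}  // namespace performance

// Hypothetical implementation: keeps running statistics of triggered values.
class StatEvent : public performance::Event {
public:
    void trigger(double data) override {
        ++count_;
        sum_ += data;
        min_ = std::min(min_, data);
        max_ = std::max(max_, data);
    }

    long count() const { return count_; }
    double minValue() const { return min_; }
    double maxValue() const { return max_; }

    // The mean is only defined once the event has been triggered.
    double mean() const {
        assert(count_ > 0);
        return sum_ / static_cast<double>(count_);
    }

private:
    long count_ = 0;
    double sum_ = 0.0;
    double min_ = std::numeric_limits<double>::infinity();
    double max_ = -std::numeric_limits<double>::infinity();
};
```

A component could call trigger() with the byte count of each message it sends and later read the summary, without storing every individual value.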

Measurement Port Implementation
TAU component implements the MeasurementPort
–Implements Timer, Event, Query and Control classes
–Registers the port with the CCAFEINE framework
Components target the generic MeasurementPort interface
–Runtime selection of TAU component during execution
–Instrumentation code independent of underlying tool
–Instrumentation code independent of measurement choice
–TauMeasurement_CCA port implementation uses a specific TAU measurement library
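The point that instrumentation code stays independent of the underlying tool can be sketched with two interchangeable back ends behind one interface. Everything below is a simplified stand-in written for illustration: the struct names, CountingMeasurement, NullMeasurement and the integrate() function are hypothetical, and no CCA framework or TAU library is involved.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Abbreviated stand-ins for the Timer and Measurement interfaces.
struct Timer {
    virtual ~Timer() {}
    virtual void start() = 0;
    virtual void stop() = 0;
};
struct Measurement {
    virtual ~Measurement() {}
    virtual Timer* createTimer(std::string name) = 0;
};

// Hypothetical back end 1: counts completed start/stop pairs.
struct CountingTimer : Timer {
    int calls = 0;
    void start() override {}
    void stop() override { ++calls; }
};
class CountingMeasurement : public Measurement {
public:
    Timer* createTimer(std::string) override {
        timers_.push_back(std::make_unique<CountingTimer>());
        return timers_.back().get();  // the port keeps ownership
    }
    int totalCalls() const {
        int n = 0;
        for (const auto& t : timers_) n += t->calls;
        return n;
    }
private:
    std::vector<std::unique_ptr<CountingTimer>> timers_;
};

// Hypothetical back end 2: measurement switched off entirely.
struct NullTimer : Timer {
    void start() override {}
    void stop() override {}
};
class NullMeasurement : public Measurement {
public:
    Timer* createTimer(std::string) override { return &timer_; }
private:
    NullTimer timer_;
};

// Component code: written once against the generic interface only.
double integrate(Measurement& port, int count) {
    assert(count >= 0);
    Timer* t = port.createTimer("IntegrateTimer");
    t->start();
    double sum = 0.0;
    for (int i = 0; i < count; i++) sum += 1.0;  // placeholder for real work
    t->stop();
    return sum;
}
```

Because integrate() only sees the Measurement interface, the choice of back end can be made where components are connected, which is the role the framework's port connection plays in the presentation.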

Using MeasurementPort
    #include "ports/Measurement_CCA.h"
    …
    double MonteCarloIntegrator::integrate(double lowBound, double upBound, int count)
    {
      classic::gov::cca::Port * port;
      double sum = 0.0;

      // Get Measurement port
      port = frameworkServices->getPort("MeasurementPort");
      if (port)
        measurement_m = dynamic_cast<performance::ccaports::Measurement *>(port);
      if (measurement_m == 0) {
        // [error report via cerr; statement text lost in extraction]
      }

      // Create the timer through the port, then time the integration loop
      performance::Timer* t = measurement_m->createTimer(string("IntegrateTimer"));
      t->start();
      for (int i = 0; i < count; i++) {
        double x = random_m->getRandomNumber();
        sum = sum + function_m->evaluate(x);
      }
      t->stop();
Using the Timer Interface: An Example

TAU Component in CCAFEINE
    repository get TauMeasurement
    repository get Driver
    repository get MidpointIntegrator
    repository get MonteCarloIntegrator
    repository get RandomGenerator
    repository get LinearFunction
    repository get NonlinearFunction
    repository get PiFunction
    create LinearFunction lin_func
    create NonlinearFunction nonlin_func
    create PiFunction pi_func
    create MonteCarloIntegrator mc_integrator
    create RandomGenerator rand
    create TauMeasurement tau
    connect mc_integrator RandomGeneratorPort rand RandomGeneratorPort
    connect mc_integrator FunctionPort nonlin_func FunctionPort
    connect mc_integrator MeasurementPort tau MeasurementPort
    create Driver driver
    connect driver IntegratorPort mc_integrator IntegratorPort
    go driver Go
    quit

SIDL interface for Timers
    //
    // File: performance.sidl
    //
    version performance 1.0;
    package performance {
      class Timer {
        void start();
        void stop();
        void setName(in string name);
        string getName();
        void setType(in string name);
        string getType();
        void setGroupName(in string name);
        string getGroupName();
        void setGroupId(in long group);
        long getGroupId();
      }
    }

Using SIDL Interface for Timers
    // SIDL:
    #include "performance_Timer.hh"

    int main(int argc, char* argv[])
    {
      performance::Timer t = performance::Timer::_create();
      ...
      t.setName("Integrate timer");
      t.start();
      // Computation
      for (int i = 0; i < count; i++) {
        double x = random_m->getRandomNumber();
        sum = sum + function_m->evaluate(x);
      }
      ...
      t.stop();
      return 0;
    }

Performance Knowledge Component
Describe and store “known” component’s performance
–Benchmark characterizations in performance database
–Empirical or analytical performance models
Saved information about component performance
–Use for performance-guided selection and deployment
–Use for runtime adaptation
Representation must be in common forms with standard means for accessing the performance information

Performance Knowledge Repository
Component performance repository
–Implement in component architecture framework
–Similar to CCA component repository [Alexandria]
–Access by component infrastructure
View performance knowledge as component (PKC)
–PKC ports give access to performance knowledge
    to other components
    back to original component
–Store performance model for performance prediction
–Component composition performance knowledge

Component Performance Model
User specified
Inferred automatically by performance tool
–Prior performance data
–Expression
–Parametric model
Estimate performance of a single component by
–Querying runtime performance data
–Passing this to performance model for evaluation
Integration of performance observation and knowledge components key to runtime selection of components
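To make the idea of a parametric model concrete, the sketch below fits a straight line, seconds = a + b * n, to prior measurements by ordinary least squares and then evaluates it for a new problem size. The linear form, the Sample and LinearModel types, and the fit() function are all assumptions made for illustration; the presentation does not prescribe a model form or a fitting method.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One prior observation: problem size and measured time.
struct Sample {
    double n;
    double seconds;
};

// Hypothetical parametric model: seconds = a + b * n.
struct LinearModel {
    double a = 0.0;  // fixed overhead
    double b = 0.0;  // cost per unit of problem size
    double predict(double n) const { return a + b * n; }
};

// Ordinary least-squares fit of the two parameters to prior data.
LinearModel fit(const std::vector<Sample>& data) {
    assert(data.size() >= 2);
    double sn = 0.0, st = 0.0, snn = 0.0, snt = 0.0;
    for (const Sample& s : data) {
        sn += s.n;
        st += s.seconds;
        snn += s.n * s.n;
        snt += s.n * s.seconds;
    }
    const double m = static_cast<double>(data.size());
    const double denom = m * snn - sn * sn;
    assert(denom != 0.0 && "need at least two distinct problem sizes");
    LinearModel model;
    model.b = (m * snt - sn * st) / denom;
    model.a = (st - model.b * sn) / m;
    return model;
}
```

In the runtime-selection setting described above, the samples would come from a performance query, and a framework could compare predict(n) across candidate components and connect the one with the lowest estimate.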

Applications: Uintah (U. Utah)
Scalability analysis

Applications: VTF (ASCI ASAP Caltech)
C++, C, F90, Python
PDT, MPI

Applications: SAMRAI (LLNL)
C++
PDT, MPI
SAMRAI timers (groups)

TAU Status
Instrumentation supported:
–Source, preprocessor, compiler, MPI, runtime, virtual machine
Languages supported:
–C++, C, F90, Java, Python
–HPF, ZPL, HPC++, pC++...
Packages supported:
–PAPI [UTK], PCL [FZJ] (hardware performance counter access),
–Opari, PDT [UO, LANL, FZJ], DyninstAPI [U. Maryland] (instrumentation),
–EXPERT, EPILOG [FZJ], Vampir [Pallas], Paraver [CEPBA] (visualization)
Platforms supported:
–IBM SP, SGI Origin, Sun, HP Superdome, HP/Compaq Tru64 ES,
–Linux clusters (IA-32, IA-64, PowerPC, Alpha), Apple, Windows,
–Hitachi SR8000, NEC SX, Cray T3E...
Compiler suites supported:
–GNU, Intel KAI (KCC, KAP/Pro), Intel, SGI, IBM, Compaq, HP, Fujitsu, Hitachi, Sun, Apple, Microsoft, NEC, Cray, PGI, Absoft, …
Thread libraries supported:
–Pthreads, SGI sproc, OpenMP, Windows, Java, SMARTS

Concluding Remarks
Complex component systems pose challenging performance analysis problems that require robust methodologies and tools
New performance problems will arise
–Instrumentation and measurement
–Data analysis and presentation
–Diagnosis and tuning
Performance engineered components
–Performance knowledge, observation, query and control
Integration of performance technology

Support Acknowledgement
TAU and PDT support:
–Department of Energy (DOE)
    DOE 2000 ACTS contract
    DOE MICS contract
    DOE ASCI Level 3 (LANL, LLNL)
    U. of Utah DOE ASCI Level 1 subcontract
–DARPA
–NSF National Young Investigator (NYI) award