Sameer Shende Department of Computer and Information Science NeuroInformatics Center University of Oregon Generating Proxy Components using PDT

Boulder CCA Meeting, Apr. 15, 2004

Outline
- Overview of the TAU and PDT projects
- Proxy Component
- Auto-generation of proxies
- Applications
- Concluding remarks

TAU Performance System Framework
- Tuning and Analysis Utilities
- Performance system framework for scalable parallel and distributed high-performance computing
- Targets a general complex system computation model
  - nodes / contexts / threads
  - Multi-level: system / software / parallelism
  - Measurement and analysis abstraction
- Integrated toolkit for performance instrumentation, measurement, analysis, and visualization
  - Portable, configurable performance profiling/tracing facility
  - Open software approach
- University of Oregon, LANL, FZJ Germany

TAU Performance System Architecture

[Figure: architecture diagram; surviving labels: EPILOG, Paraver]

TAU's ParaProf Profile Browser (ESMF Data)

Callpath Profiling in TAU

Program Database Toolkit

[Figure: PDT tool chain. Labels, in order:
- Component source / Library
- C / C++ parser; Fortran 77/90/95 parser
- IL
- C / C++ IL analyzer; Fortran 77/90/95 IL analyzer
- Program Database Files
- DUCTAPE
- tau_pg: Proxy Component
- SILOON: Application component glue
- CHASM: C++ / F90 interoperability
- TAU_instr: Automatic source instrumentation]

Program Database Toolkit (PDT)
- Program code analysis framework for developing source-based tools for C99, C++ and F90
- High-level interface to source code information
- Widely portable:
  - IBM (AIX, Linux Power4), SGI, Compaq, HP, Sun, Linux clusters, Windows, Apple, Hitachi, Cray X1, T3E, RedStorm...
- Integrated toolkit for source code parsing, database creation, and database query
  - commercial grade front end parsers
    - EDG for C99/C++
    - Mutek Solutions for F90
    - Cleanscape Flint Parser for F77/F90/F95
  - Intel/KAI C++ headers for std. C++ library distributed with PDT
  - portable IL analyzer, database format, and access API
  - open software approach for tool development
- Target and integrate multiple source languages
- Used in TAU to build automated performance instrumentation tools
- Used in CHASM, XMLGEN, Component method signature extraction,…

CCA Performance Observation Component
- Design measurement port and measurement interfaces
  - Timer
    - start/stop
    - set name/type/group
  - Control
    - enable/disable groups
  - Query
    - get timer names
    - metrics, counters, dump to disk
  - Event
    - user-defined events

CCA C++ (CCAFFEINE) Performance Interface

    namespace performance {
    namespace ccaports {
      class Measurement: public virtual classic::gov::cca::Port {
      public:
        virtual ~Measurement() {}

        /* Create a Timer interface */
        virtual performance::Timer* createTimer(void) = 0;
        virtual performance::Timer* createTimer(string name) = 0;
        virtual performance::Timer* createTimer(string name, string type) = 0;
        virtual performance::Timer* createTimer(string name, string type, string group) = 0;

        /* Create a Query interface */
        virtual performance::Query* createQuery(void) = 0;

        /* Create a user-defined Event interface */
        virtual performance::Event* createEvent(void) = 0;
        virtual performance::Event* createEvent(string name) = 0;

        /* Create a Control interface for selectively enabling and disabling
         * the instrumentation based on groups */
        virtual performance::Control* createControl(void) = 0;
      };
    }
    }

(Slide callouts: Measurement port; Measurement interfaces)

CCA Timer Interface Declaration

    namespace performance {
      class Timer {
      public:
        virtual ~Timer() {}

        /* Implement methods in a derived class to provide functionality */

        /* Start and stop the Timer */
        virtual void start(void) = 0;
        virtual void stop(void) = 0;

        /* Set name and type for Timer */
        virtual void setName(string name) = 0;
        virtual string getName(void) = 0;
        virtual void setType(string name) = 0;
        virtual string getType(void) = 0;

        /* Set the group name and group type associated with the Timer */
        virtual void setGroupName(string name) = 0;
        virtual string getGroupName(void) = 0;
        virtual void setGroupId(unsigned long group) = 0;
        virtual unsigned long getGroupId(void) = 0;
      };
    }

(Slide callout: Timer interface methods)
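
As a rough illustration of what a class derived from this interface might look like, the sketch below implements a wall-clock timer with std::chrono. It is not TAU's implementation: the class name ChronoTimer and the accessors calls() and totalMicroseconds() are hypothetical, and the interface is abbreviated (the type and group methods are left out) so the example stays self-contained.

```cpp
#include <chrono>
#include <string>

namespace performance {

// Abbreviated copy of the Timer interface (type and group methods omitted).
class Timer {
public:
    virtual ~Timer() {}
    virtual void start() = 0;
    virtual void stop() = 0;
    virtual void setName(std::string name) = 0;
    virtual std::string getName() = 0;
};

} // namespace performance

// Hypothetical wall-clock implementation; an assumption, not TAU code.
class ChronoTimer : public performance::Timer {
public:
    explicit ChronoTimer(std::string name) : name_(name) {}

    void start() override {
        begin_ = std::chrono::steady_clock::now();
        running_ = true;
    }

    void stop() override {
        if (!running_) return;  // ignore stop() without a matching start()
        total_ += std::chrono::steady_clock::now() - begin_;
        ++calls_;
        running_ = false;
    }

    void setName(std::string name) override { name_ = name; }
    std::string getName() override { return name_; }

    // Accessors below are not part of the interface on the slide.
    unsigned long calls() const { return calls_; }
    double totalMicroseconds() const {
        return std::chrono::duration<double, std::micro>(total_).count();
    }

private:
    std::string name_;
    std::chrono::steady_clock::time_point begin_{};
    std::chrono::steady_clock::duration total_{0};
    unsigned long calls_ = 0;
    bool running_ = false;
};
```

A caller would construct ChronoTimer("IntegrateTimer"), bracket the region of interest with start() and stop(), and read the accumulated time afterwards.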

Use of Observation Component in CCA Example

    #include "ports/Measurement_CCA.h"
    ...
    double MonteCarloIntegrator::integrate(double lowBound, double upBound, int count) {
      classic::gov::cca::Port * port;
      double sum = 0.0;

      // Get Measurement port
      port = frameworkServices->getPort ("MeasurementPort");
      if (port)
        measurement_m = dynamic_cast<performance::ccaports::Measurement *>(port);
      if (measurement_m == 0){
        cerr << "Connected to something other than a Measurement port";
        return -1;
      }

      static performance::Timer* t = measurement_m->createTimer(string("IntegrateTimer"));
      t->start();
      for (int i = 0; i < count; i++) {
        // random_m: assumed name for the random-number generator port member
        double x = random_m->getRandomNumber ();
        sum = sum + function_m->evaluate (x);
      }
      t->stop();
    }

Measurement Port Implementation
- Use of Measurement port (i.e., instrumentation)
  - independent of choice of measurement tool
  - independent of choice of measurement type
- TAU performance observability component
  - Implements the Measurement port
  - Implements Timer, Control, Query, Event
  - Port can be registered with the CCAFFEINE framework
- Components instrument to generic Measurement port
  - Runtime selection of TAU component during execution
  - TauMeasurement_CCA port implementation uses a specific TAU library for choice of measurement type
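
The tool independence described above can be sketched in a few lines: the component below asks a services object for "MeasurementPort" and works with whichever implementation was registered. Everything here is a simplified stand-in. Port, Services, NullMeasurement and CountingMeasurement are hypothetical names, not the CCAFFEINE or TAU API, and the Measurement interface is reduced to a single createTimer method.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins for the framework types (assumptions, not CCAFFEINE).
struct Port { virtual ~Port() {} };

struct Timer {
    virtual ~Timer() {}
    virtual void start() = 0;
    virtual void stop() = 0;
};

struct Measurement : Port {
    virtual std::unique_ptr<Timer> createTimer(const std::string& name) = 0;
};

// Implementation 1: does nothing, e.g. for uninstrumented runs.
struct NullTimer : Timer {
    void start() override {}
    void stop() override {}
};
struct NullMeasurement : Measurement {
    std::unique_ptr<Timer> createTimer(const std::string&) override {
        return std::unique_ptr<Timer>(new NullTimer());
    }
};

// Implementation 2: counts completed timer regions.
struct CountingTimer : Timer {
    explicit CountingTimer(int& stops) : stops_(stops) {}
    void start() override {}
    void stop() override { ++stops_; }
    int& stops_;
};
struct CountingMeasurement : Measurement {
    int stops = 0;
    std::unique_ptr<Timer> createTimer(const std::string&) override {
        return std::unique_ptr<Timer>(new CountingTimer(stops));
    }
};

// Minimal port registry standing in for the framework services object.
class Services {
public:
    void addProvidesPort(const std::string& name, std::shared_ptr<Port> port) {
        ports_[name] = port;
    }
    Port* getPort(const std::string& name) const {
        auto it = ports_.find(name);
        return it == ports_.end() ? nullptr : it->second.get();
    }
private:
    std::map<std::string, std::shared_ptr<Port>> ports_;
};

// Component code: depends only on the generic Measurement port.
int instrumentedWork(const Services& services) {
    auto* measurement = dynamic_cast<Measurement*>(services.getPort("MeasurementPort"));
    if (!measurement) return -1;  // nothing (or the wrong thing) connected
    std::unique_ptr<Timer> t = measurement->createTimer("work");
    t->start();
    int sum = 0;
    for (int i = 1; i <= 10; ++i) sum += i;
    t->stop();
    return sum;
}
```

Swapping NullMeasurement for CountingMeasurement in the registry changes what is recorded without touching instrumentedWork, which is the property the slide attributes to the generic Measurement port.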

What's Going On Here?

[Figure labels: application component; performance component; TAU API; other API; runtime TAU performance data; alternative implementations of performance component; two instrumentation paths using TAU API; two query and control paths using TAU API]

Proxy Component
- Interpose a proxy component for each port
- Inside the proxy, track caller/callee invocations, timings
- Automate the process of proxy component creation
  - Using PDT for static analysis of components

[Figure labels: MidpointIntegrator; IntegratorPort; Go; Driver; IntegratorPort; IntegratorProxy Component; IntegratorPortUses; IntegratorPortProvides; MeasurementPort; Performance]

TAU's Proxy Generator for Classic C++ Interface
- Proxy generator arguments: -p -t -c -d -o -f -x
  e.g.,
    % tau_pg -c integrators::ccaports::Integrator -t integrators.ccaports.Integrator -p IntegratorPort -d ParallelIntegrator_CCA.pdb -o Proxy.cc -h ports/Integrator_CCA.h -f select.dat -x ParallelInt
- Creating PDB file:
    % cxxparse -I -D
  creates file.pdb.
    % pdbmerge -o merged.pdb file1.pdb file2.pdb …

Selective Instrumentation
- Exclude or include list of routines

    % tau_pg … -f select.dat
    % cat select.dat
    # Selective instrumentation: Specify an exclude/include list of routines/files.
    BEGIN_EXCLUDE_LIST
    void quicksort(int *, int, int)
    void sort_5elements(int *)
    void interchange(int *, int *)
    END_EXCLUDE_LIST
    # or use BEGIN_INCLUDE_LIST END_INCLUDE_LIST to bracket the event names
    # Instruments routines in Main.cpp, Foo?.c and *.C files only
    # Use BEGIN_[FILE]_INCLUDE_LIST with END_[FILE]_INCLUDE_LIST

Flame Reaction-Diffusion Demonstration

[Figure: CCAFFEINE]

CFRFS Profiles using Proxy Generator

NODE 0;CONTEXT 0;THREAD 0:

    %Time  Exclusive  Inclusive   #Call  #Subrs  Inclusive  Name
                msec  total msec                 usec/call

    [,374 1:]        int main(int, char **)
    [,177 1:]        driver_proxy::go()
    [,023 48,]       rk2_proxy::Advance()
    [,914 45,]       ee_proxy::Regrid()
    [,368 34,]       flux_proxy::compute()
    [,862 21,]       sc_proxy::compute()
    [9.0 9,089 9,]   efm_proxy::compute()
    [3.7 2,216 3,]   grace_proxy::GC_Synch()
    [,]              grace_proxy::GC_regrid_above_threshold()
                     icc_proxy::restrict()
                     icc_proxy::prolong()
                     MPI_Init()
                     TAU_GET_FUNCTION_VALUES()
                     MPI_Isend()
                     c_proxy::compute()
                     stats_proxy::compute()
                     MPI_Waitsome()
                     MPI_Comm_dup()
                     MPI_Allreduce()
                     rk2_proxy::GetStableTimestep()
                     compute void ( VectorFieldVariable *)
                     bc_proxy::compute()
                     MPI_Barrier()

[Only the bracketed fragments of the numeric columns survive.]

Performance Modeling
- Use MasterMind Component [IPDPS'04] with Measurement Component to track each argument invocation
- Proxy for MasterMind component currently tracks all methods.
- Develop performance models based on measurements
- Specify performance models to a model evaluator library (being developed at UO/Sandia [Nick Trebon, J. Ray]) for evaluating performance models of component ensembles
- Specifying performance models research
- Performance Database ties in historical performance data

TAU Performance Database Framework
- profile data only
- XML representation
- project / experiment / trial

[Figure labels: Raw performance data; PerfDML translators; Performance data description; ORDB (PostgreSQL); PerfDB; Performance analysis and query toolkit; Performance analysis programs; Other tools]

Proxy Generator for other Applications
- PDT based proxy component for:
  - QoS tracking [Boyana, ANL]
  - Debugging Port Monitor (tracks arguments)
  - SCIRun2 Perfume components [Venkat, U. Utah]
- Exploring Babel for auto-generation of proxies:
  - Direct SIDL to proxy code generation
  - Generating client component interface in C++, using PDT for generating proxies

Concluding Remarks
- Complex component systems pose challenging performance analysis problems that require robust methodologies and tools
- Automating Instrumentation of Component Software
  - Performance Measurement
  - Performance Prediction
  - Debugging
- Performance-aware (QoS) intelligent components
- Performance engineered components
- Performance knowledge, observation, query and control

Support Acknowledgement
- TAU and PDT support:
  - Department of Energy (DOE)
    - DOE MICS contracts
    - DOE ASCI Level 3 (LANL, LLNL)
    - U. of Utah DOE ASCI Level 1 subcontract
  - NSF National Young Investigator (NYI) award