Generating Proxy Components using PDT

Presentation transcript:

Generating Proxy Components using PDT Sameer Shende sameer@cs.uoregon.edu Department of Computer and Information Science NeuroInformatics Center University of Oregon

Outline
- Overview of the TAU and PDT projects
- Proxy Component
- Auto-generation of proxies
- Applications
- Concluding remarks
Apr. 15, 2004

TAU Performance System Framework
- Tuning and Analysis Utilities
- Performance system framework for scalable parallel and distributed high-performance computing
- Targets a general complex system computation model
  - nodes / contexts / threads
  - Multi-level: system / software / parallelism
  - Measurement and analysis abstraction
- Integrated toolkit for performance instrumentation, measurement, analysis, and visualization
  - Portable, configurable performance profiling/tracing facility
  - Open software approach
- University of Oregon, LANL, FZJ Germany
- http://www.cs.uoregon.edu/research/paracomp/tau

TAU Performance System Architecture
[Architecture diagram; visible labels: Paraver, EPILOG]

TAU’s ParaProf Profile Browser (ESMF Data)

Callpath Profiling in TAU

Program Database Toolkit
[Diagram labels: Component source / Library; C / C++ parser; Fortran 77/90/95 parser; IL; IL analyzer; Program Database Files; DUCTAPE]
[Tools built on DUCTAPE: tau_pg (proxy), SILOON (application component glue), CHASM (C++ / F90 interoperability), TAU_instr (automatic source instrumentation)]

Program Database Toolkit (PDT)
- Program code analysis framework for developing source-based tools for C99, C++ and F90
- High-level interface to source code information
- Widely portable: IBM (AIX, Linux Power4), SGI, Compaq, HP, Sun, Linux clusters, Windows, Apple, Hitachi, Cray X1, T3E, RedStorm...
- Integrated toolkit for source code parsing, database creation, and database query
  - commercial grade front end parsers
    - EDG for C99/C++
    - Mutek Solutions for F90
    - Cleanscape Flint Parser for F77/F90/F95
  - Intel/KAI C++ headers for std. C++ library distributed with PDT
  - portable IL analyzer, database format, and access API
  - open software approach for tool development
- Target and integrate multiple source languages
- Used in TAU to build automated performance instrumentation tools
- Used in CHASM, XMLGEN, Component method signature extraction, …

CCA Performance Observation Component
- Design measurement port and measurement interfaces
  - Timer: start/stop; set name/type/group
  - Control: enable/disable groups
  - Query: get timer names, metrics, counters; dump to disk
  - Event: user-defined events

CCA C++ (CCAFFEINE) Performance Interface
    namespace performance {
    namespace ccaports {
      class Measurement : public virtual classic::gov::cca::Port {
      public:
        virtual ~Measurement() {}
        /* Create a Timer interface */
        virtual performance::Timer* createTimer(void) = 0;
        virtual performance::Timer* createTimer(string name) = 0;
        virtual performance::Timer* createTimer(string name, string type) = 0;
        virtual performance::Timer* createTimer(string name, string type,
                                                string group) = 0;
        /* Create a Query interface */
        virtual performance::Query* createQuery(void) = 0;
        /* Create a user-defined Event interface */
        virtual performance::Event* createEvent(void) = 0;
        virtual performance::Event* createEvent(string name) = 0;
        /* Create a Control interface for selectively enabling and disabling
         * the instrumentation based on groups */
        virtual performance::Control* createControl(void) = 0;
      };
    } // namespace ccaports
    } // namespace performance
Slide callouts: Measurement port; Measurement interfaces

CCA Timer Interface Declaration
    namespace performance {
      class Timer {
      public:
        virtual ~Timer() {}
        /* Implement methods in a derived class to provide functionality */
        /* Start and stop the Timer */
        virtual void start(void) = 0;
        virtual void stop(void) = 0;
        /* Set name and type for Timer */
        virtual void setName(string name) = 0;
        virtual string getName(void) = 0;
        virtual void setType(string name) = 0;
        virtual string getType(void) = 0;
        /* Set the group name and group type associated with the Timer */
        virtual void setGroupName(string name) = 0;
        virtual string getGroupName(void) = 0;
        virtual void setGroupId(unsigned long group) = 0;
        virtual unsigned long getGroupId(void) = 0;
      };
    }
Slide callout: Timer interface methods
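The comment in the Timer declaration asks for the methods to be implemented in a derived class. A minimal sketch of what such a class could look like follows. It is not the TAU implementation: the reduced `Timer` base (only start/stop and the name accessors), the `ChronoTimer` class, and its `calls()` and `totalSeconds()` accessors are all hypothetical names introduced so the example compiles on its own.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <utility>

// Reduced stand-in for the slide's Timer interface (assumption: only
// start/stop and the name accessors are kept to keep the sketch short).
class Timer {
public:
    virtual ~Timer() {}
    virtual void start() = 0;
    virtual void stop() = 0;
    virtual void setName(std::string name) = 0;
    virtual std::string getName() = 0;
};

// Hypothetical derived class that accumulates wall-clock time with
// std::chrono::steady_clock. Not the TAU measurement library.
class ChronoTimer : public Timer {
public:
    explicit ChronoTimer(std::string name = "") : name_(std::move(name)) {}

    void start() override {
        if (running_) return;  // ignore a start() while already running
        begin_ = std::chrono::steady_clock::now();
        running_ = true;
    }

    void stop() override {
        if (!running_) return;  // ignore a stop() without a matching start()
        total_ += std::chrono::steady_clock::now() - begin_;
        running_ = false;
        ++calls_;
    }

    void setName(std::string name) override { name_ = std::move(name); }
    std::string getName() override { return name_; }

    // Accessors that are not part of the slide's interface.
    long calls() const { return calls_; }
    double totalSeconds() const {
        return std::chrono::duration<double>(total_).count();
    }

private:
    std::string name_;
    std::chrono::steady_clock::duration total_ =
        std::chrono::steady_clock::duration::zero();
    std::chrono::steady_clock::time_point begin_{};
    bool running_ = false;
    long calls_ = 0;
};
```

A caller would construct `ChronoTimer t("IntegrateTimer")`, bracket the region of interest with `t.start()` and `t.stop()`, and read `t.totalSeconds()` afterwards.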

Use of Observation Component in CCA Example
    #include "ports/Measurement_CCA.h"
    ...
    double MonteCarloIntegrator::integrate(double lowBound, double upBound,
                                           int count) {
      classic::gov::cca::Port * port;
      double sum = 0.0;
      // Get Measurement port
      port = frameworkServices->getPort("MeasurementPort");
      if (port)
        measurement_m = dynamic_cast<performance::ccaports::Measurement *>(port);
      if (measurement_m == 0) {
        cerr << "Connected to something other than a Measurement port";
        return -1;
      }
      // Create the timer once and reuse it on later calls
      static performance::Timer* t =
          measurement_m->createTimer(string("IntegrateTimer"));
      t->start();
      for (int i = 0; i < count; i++) {
        double x = random_m->getRandomNumber();
        sum = sum + function_m->evaluate(x);
      }
      t->stop();
      // ... return of the integral estimate is not shown on the slide
    }
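The example pairs `t->start()` and `t->stop()` by hand, so any return added between the two calls would leave the timer running. One way to make the pairing automatic is a small RAII guard, sketched below. The `StartStop` interface, `ScopedTimer`, `CountingTimer`, and `sumUntilNegative` are hypothetical names for this illustration and are not part of the CCA or TAU interfaces.

```cpp
#include <cassert>

// Hypothetical minimal interface; the slide's performance::Timer has
// more methods, but only start() and stop() matter for the guard.
struct StartStop {
    virtual ~StartStop() {}
    virtual void start() = 0;
    virtual void stop() = 0;
};

// RAII guard: starts the timer on construction, stops it on destruction,
// so every exit path from the enclosing scope stops the timer.
class ScopedTimer {
public:
    explicit ScopedTimer(StartStop& t) : timer_(t) { timer_.start(); }
    ~ScopedTimer() { timer_.stop(); }
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    StartStop& timer_;
};

// Stand-in timer that only counts how often it was started and stopped.
struct CountingTimer : StartStop {
    int starts = 0;
    int stops = 0;
    void start() override { ++starts; }
    void stop() override { ++stops; }
};

// Example routine with an early return; the guard still stops the timer.
inline double sumUntilNegative(StartStop& t, const double* values, int count) {
    ScopedTimer guard(t);
    double sum = 0.0;
    for (int i = 0; i < count; ++i) {
        if (values[i] < 0.0) return sum;  // early exit
        sum += values[i];
    }
    return sum;
}
```

The same guard could wrap a pointer obtained from `createTimer` by adapting it to the full Timer interface; that adaptation is left out here.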

Measurement Port Implementation
- Use of Measurement port (i.e., instrumentation)
  - independent of choice of measurement tool
  - independent of choice of measurement type
- TAU performance observability component
  - Implements the Measurement port
  - Implements Timer, Control, Query, Event
  - Port can be registered with the CCAFFEINE framework
- Components instrument to generic Measurement port
- Runtime selection of TAU component during execution
- TauMeasurement_CCA port implementation uses a specific TAU library for choice of measurement type

… What’s Going On Here?
[Diagram labels: application component; performance component; TAU API; other API; runtime TAU performance data]
[Diagram captions: Two instrumentation paths using TAU API; Two query and control paths using TAU API; Alternative implementations of performance component]

IntegratorProxy Component
- Interpose a proxy component for each port
- Inside the proxy, track caller/callee invocations, timings
- Automate the process of proxy component creation
  - Using PDT for static analysis of components
[Diagram labels: Go; Driver; IntegratorPort; MidpointIntegrator; IntegratorProxy Component (IntegratorPortUses, IntegratorPortProvides); MeasurementPort; Performance]
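The interposition idea can be sketched by hand for a single method. The code below is an illustration only and is not output of tau_pg: `IntegratorPort` is a hypothetical interface modelled on the `integrate()` signature shown earlier, the proxy records timings in a local map instead of going through a MeasurementPort, and `MidpointIntegrator` here is a toy callee that integrates f(x) = x.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Hypothetical port interface modelled on the integrate() signature above.
class IntegratorPort {
public:
    virtual ~IntegratorPort() {}
    virtual double integrate(double lowBound, double upBound, int count) = 0;
};

// Per-method record kept by the proxy (assumption: call count and
// accumulated wall-clock seconds are enough for the illustration).
struct CallStats {
    long calls = 0;
    double seconds = 0.0;
};

// The proxy provides the same port as the callee and forwards every call,
// timing it on the way through.
class IntegratorProxy : public IntegratorPort {
public:
    explicit IntegratorProxy(IntegratorPort& target) : target_(target) {}

    double integrate(double lowBound, double upBound, int count) override {
        auto begin = std::chrono::steady_clock::now();
        double result = target_.integrate(lowBound, upBound, count);
        auto end = std::chrono::steady_clock::now();
        CallStats& s = stats_["integrate"];
        ++s.calls;
        s.seconds += std::chrono::duration<double>(end - begin).count();
        return result;
    }

    const std::map<std::string, CallStats>& stats() const { return stats_; }

private:
    IntegratorPort& target_;
    std::map<std::string, CallStats> stats_;
};

// Toy callee: midpoint rule applied to f(x) = x.
class MidpointIntegrator : public IntegratorPort {
public:
    double integrate(double lowBound, double upBound, int count) override {
        double h = (upBound - lowBound) / count;
        double sum = 0.0;
        for (int i = 0; i < count; ++i) sum += lowBound + (i + 0.5) * h;
        return sum * h;
    }
};
```

The caller holds an `IntegratorPort&` and cannot tell whether it is connected to the integrator directly or through the proxy, which is what lets the proxy be inserted without changing either component.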

TAU’s Proxy Generator for Classic C++ Interface
- Proxy generator arguments:
    -p <port name>
    -t <type>
    -c <component>
    -d <PDB file>
    -o <output file>
    -f <selective instrumentation file>
    -x <Component tag>
- e.g.,
    % tau_pg -c integrators::ccaports::Integrator -t integrators.ccaports.Integrator -p IntegratorPort -d ParallelIntegrator_CCA.pdb -o Proxy.cc -h ports/Integrator_CCA.h -f select.dat -x ParallelInt
- Creating PDB file:
    % cxxparse <file.cpp> -I<dir> -D<flags>
  creates file.pdb.
    % pdbmerge -o merged.pdb file1.pdb file2.pdb …

Selective Instrumentation
- Exclude or include list of routines
    % tau_pg … -f select.dat
    % cat select.dat
    # Selective instrumentation: Specify an exclude/include list of routines/files.
    BEGIN_EXCLUDE_LIST
    void quicksort(int *, int, int)
    void sort_5elements(int *)
    void interchange(int *, int *)
    END_EXCLUDE_LIST
    # or use BEGIN_INCLUDE_LIST END_INCLUDE_LIST to bracket the event names
    # Instruments routines in Main.cpp, Foo?.c and *.C files only
    # Use BEGIN_[FILE]_INCLUDE_LIST with END_[FILE]_INCLUDE_LIST

Flame Reaction-Diffusion Demonstration CCAFFEINE

CFRFS Profiles using Proxy Generator
NODE 0;CONTEXT 0;THREAD 0:
---------------------------------------------------------------------------------------
%Time  Exclusive   Inclusive   #Call  #Subrs   Inclusive  Name
            msec  total msec                   usec/call
---------------------------------------------------------------------------------------
100.0      3,374    1:40.763       1       7   100763742  int main(int, char **)
 95.8      1,177    1:36.525       1     391    96525455  driver_proxy::go()
 48.6      9,023      48,947      15    9640     3263162  rk2_proxy::Advance()
 44.8     43,914      45,151       7     138     6450244  ee_proxy::Regrid()
 34.3      3,368      34,559     594    7129       58180  flux_proxy::compute()
 21.7     21,862      21,862    1188       0       18403  sc_proxy::compute()
  9.0      9,089       9,089    1188       0        7651  efm_proxy::compute()
  3.7      2,216       3,707     210   11250       17655  grace_proxy::GC_Synch()
  1.2        841       1,225       3    1496      408607  grace_proxy::GC_regrid_above_threshold()
  1.0        980         980     123       0        7970  icc_proxy::restrict()
  0.9        943         943    4460       0         212  icc_proxy::prolong()
  0.9        863         863       1      39      863722  MPI_Init()
  0.8        772         772   16764       0          46  TAU_GET_FUNCTION_VALUES()
  0.6        613         614     869     869         707  MPI_Isend()
  0.4        432         436      15      24       29093  c_proxy::compute()
  0.4        288         409      30     120       13665  stats_proxy::compute()
  0.4        393         393     954       0         413  MPI_Waitsome()
  0.2        217         217       6      18       36282  MPI_Comm_dup()
  0.2        182         182     215       0         849  MPI_Allreduce()
  0.1          3         126      15      75        8402  rk2_proxy::GetStableTimestep()
  0.1        101         120      15      45        8062  compute void ( VectorFieldVariable *)
  0.1         62          62     533       0         118  bc_proxy::compute()
  0.0         17          17       3       0        5729  MPI_Barrier()
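Two columns of the profile can be derived from the others, which helps when reading it. Inclusive time covers a routine and everything it calls, exclusive time covers the routine alone, and the last numeric column is inclusive time divided by the number of calls. The sketch below spells out that arithmetic; `inclusiveUsecPerCall` and `calleeMsec` are hypothetical helper names. For flux_proxy::compute(), 34,559 msec over 594 calls is about 58,180 usec per call, which agrees with the table; other rows can differ in the last digit because the displayed msec values are rounded.

```cpp
#include <cassert>

// Inclusive microseconds per call from inclusive milliseconds and call count.
// Returns 0 when the routine was never called.
inline double inclusiveUsecPerCall(double inclusiveMsec, long calls) {
    return calls > 0 ? inclusiveMsec * 1000.0 / calls : 0.0;
}

// Time spent in callees: inclusive time minus exclusive time.
inline double calleeMsec(double inclusiveMsec, double exclusiveMsec) {
    return inclusiveMsec - exclusiveMsec;
}
```

Applied to the same row, `calleeMsec(34559.0, 3368.0)` gives 31,191 msec, which is the time flux_proxy::compute() spent inside the routines it called.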

Performance Modeling
- Use MasterMind Component [IPDPS’04] with Measurement Component to track each argument invocation
- Proxy for MasterMind component currently tracks all methods.
- Develop performance models based on measurements
- Specify performance models to a model evaluator library (being developed at UO/Sandia [Nick Trebon, J. Ray]) for evaluating performance models of component ensembles
- Specifying performance models research
- Performance Database ties in historical performance data

TAU Performance Database Framework
[Diagram labels: Raw performance data; PerfDML translators; Performance data description (XML representation; profile data only; project / experiment / trial); PerfDB (ORDB, PostgreSQL); Performance analysis and query toolkit; Performance analysis programs; Other tools]

Proxy Generator for other Applications
- PDT based proxy component for:
  - QoS tracking [Boyana, ANL]
  - Debugging Port Monitor (tracks arguments)
  - SCIRun2 Perfume components [Venkat, U. Utah]
- Exploring Babel for auto-generation of proxies:
  - Direct SIDL to proxy code generation
  - Generating client component interface in C++, using PDT for generating proxies

Concluding Remarks
- Complex component systems pose challenging performance analysis problems that require robust methodologies and tools
- Automating Instrumentation of Component Software
  - Performance Measurement
  - Performance Prediction
  - Debugging
- Performance-aware (QoS) intelligent components
- Performance engineered components
- Performance knowledge, observation, query and control

Support Acknowledgement
- TAU and PDT support:
  - Department of Energy (DOE)
    - DOE MICS contracts
    - DOE ASCI Level 3 (LANL, LLNL)
    - U. of Utah DOE ASCI Level 1 subcontract
  - NSF National Young Investigator (NYI) award