Advances in the TAU Performance System
Allen D. Malony, Sameer Shende
Department of Computer and Information Science, Computational Science Institute, University of Oregon

Dagstuhl, August 2002. Advances in the TAU Performance System

Outline
- Complexity and performance technology
- What is TAU?
- Problems currently being investigated
  - Instrumentation control
    - Selective instrumentation
  - Performance mapping
    - Callpath profiling
  - Performance data interaction and steering
    - Online performance analysis and visualization
  - Performance analysis for component software
- Concluding remarks

Complexity in Parallel and Distributed Systems
- Complexity in computing system architecture
  - Diverse parallel and distributed system architectures
    - shared / distributed memory, cluster, hybrid, NOW, Grid, …
  - Sophisticated processor / memory / network architectures
- Complexity in parallel software environment
  - Diverse parallel programming paradigms
  - Optimizing compilers and sophisticated runtime systems
  - Advanced numerical libraries and application frameworks
  - Hierarchical, multi-level software architectures
  - Multi-component, coupled simulation models

Complexity Determines Performance Requirements
- Performance observability requirements
  - Multiple levels of software and hardware
  - Different types and detail of performance data
  - Alternative performance problem solving methods
  - Multiple targets of software and system application
- Performance technology requirements
  - Broad scope of performance observation
  - Flexible and configurable mechanisms
  - Technology integration and extension
  - Cross-platform portability
  - Open, layered, and modular framework architecture

Complexity Challenges for Performance Tools
- Computing system environment complexity
  - Observation integration and optimization
  - Access, accuracy, and granularity constraints
  - Diverse/specialized observation capabilities/technology
  - Restricted modes limit performance problem solving
- Sophisticated software development environments
  - Programming paradigms and performance models
  - Performance data mapping to software abstractions
  - Uniformity of performance abstraction across platforms
  - Rich observation capabilities and flexible configuration
  - Common performance problem solving methods

General Problems (Performance Technology)

How do we create robust and ubiquitous performance technology for the analysis and tuning of parallel and distributed software and systems in the presence of (evolving) complexity challenges?

How do we apply performance technology effectively for the variety and diversity of performance problems that arise in the context of complex parallel and distributed computer systems?

TAU Performance System Framework
- Tuning and Analysis Utilities (aka Tools Are Us)
- Performance system framework for scalable parallel and distributed high-performance computing
- Targets a general complex system computation model
  - nodes / contexts / threads
  - Multi-level: system / software / parallelism
  - Measurement and analysis abstraction
- Integrated toolkit for performance instrumentation, measurement, analysis, and visualization
  - Portable performance profiling/tracing facility
  - Open software approach

TAU Performance System Architecture
[Figure: TAU architecture diagram; surviving labels: EPILOG, Paraver]

Instrumentation Control
- Selection of which performance events to observe
  - Could depend on scope, type, level of interest
  - Could depend on instrumentation overhead
- How is selection supported in instrumentation system?
  - No choice
  - Include / exclude lists (TAU)
  - Environment variables
  - Static vs. dynamic
- Problem: Controlling instrumentation of small routines
  - High relative measurement overhead
  - Significant intrusion and possible perturbation

Rule-Based Overhead Analysis (N. Trebon, UO)
- Analyze the performance data to determine events with high (relative) overhead performance measurements
- Create a select list for excluding those events
- Rule grammar (used in TAUreduce tool)
    [GroupName:] Field Operator Number
  - GroupName indicates rule applies to events in group
  - Field is an event metric attribute (from profile statistics)
    - numcalls, numsubs, percent, usec, cumusec, totalcount, stdev, usecs/call, counts/call
  - Operator is one of >, <, or =
  - Number is any number
  - Compound rules possible using & between simple rules
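To make the rule grammar concrete, here is a minimal C++ sketch of how a rule of the form `[GroupName:] Field Operator Number` could be evaluated against one event's profile statistics. It is not the TAUreduce implementation: the `Event` struct and the functions `trim`, `matchesSimpleRule`, and `matchesRule` are hypothetical names introduced for illustration, and error handling is kept to a minimum.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical event record: group name plus metric attributes (field -> value).
struct Event {
    std::string group;
    std::map<std::string, double> fields;
};

// Strip leading and trailing whitespace.
inline std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    std::string::size_type b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::string::size_type e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Evaluate one simple rule "[GroupName:] Field Operator Number".
// Returns true when the rule matches, i.e. the event is a candidate for exclusion.
// std::stod accepts scientific notation and throws on a malformed number.
inline bool matchesSimpleRule(const std::string& rule, const Event& ev) {
    std::string body = rule;
    std::string::size_type colon = body.find(':');
    if (colon != std::string::npos) {
        if (trim(body.substr(0, colon)) != ev.group) return false;
        body = body.substr(colon + 1);
    }
    std::string::size_type opPos = body.find_first_of("<>=");
    if (opPos == std::string::npos) return false;
    const std::string field = trim(body.substr(0, opPos));
    const char op = body[opPos];
    const double number = std::stod(body.substr(opPos + 1));
    std::map<std::string, double>::const_iterator it = ev.fields.find(field);
    if (it == ev.fields.end()) return false;
    switch (op) {
        case '<': return it->second < number;
        case '>': return it->second > number;
        default:  return it->second == number;  // '='
    }
}

// Compound rule: every simple rule joined by '&' must match.
inline bool matchesRule(const std::string& rule, const Event& ev) {
    std::istringstream in(rule);
    std::string part;
    bool any = false;
    while (std::getline(in, part, '&')) {
        any = true;
        if (!matchesSimpleRule(part, ev)) return false;
    }
    return any;
}
```

With an event in group `TAU_USER` whose `usec` is 500 and `numcalls` is 1, both `matchesRule("TAU_USER:usec < 1000", e)` and `matchesRule("usec < 1000 & numcalls = 1", e)` return true under this sketch. Separate rules on separate lines would be combined with OR by the caller.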

Example Rules

    #Exclude all events that are members of TAU_USER
    #and use less than 1000 microseconds
    TAU_USER:usec < 1000

    #Exclude all events that have less than 1000
    #microseconds and are called only once
    usec < 1000 & numcalls = 1

    #Exclude all events that have less than 1000 usecs per
    #call OR have a (total inclusive) percent less than 5
    usecs/call < 1000
    percent < 5

- Scientific notation can be used

TAUReduce Example
- tau_reduce implements overhead reduction in TAU
- Consider klargest example
  - Find kth largest element in N elements
  - Compare two methods: quicksort, select_kth_largest
  - i = 2324, N = (uninstrumented)
    - quicksort: (wall clock) = secs
    - select_kth_largest: (wall clock) = secs
    - Total: (P3/1.2GHz time) = 0.340u 0.020s 0:00.37
- Execution with all routines instrumented
- Execution with rule-based selective instrumentation
    usec>1000 & numcalls> & usecs/call 25
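For readers unfamiliar with the klargest benchmark, the two approaches it compares can be sketched as follows. This is an illustration only, not the benchmark's source: the function names are hypothetical, `std::sort` stands in for the benchmark's quicksort, and `std::nth_element` stands in for its `select_kth_largest` routine, whose actual algorithm is not shown on the slide. Both functions assume 1 <= k <= v.size().

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Method 1: sort all N elements in descending order, then pick position k (1-based).
inline int kthLargestBySort(std::vector<int> v, std::size_t k) {
    std::sort(v.begin(), v.end(), std::greater<int>());
    return v[k - 1];
}

// Method 2: selection. std::nth_element only partially orders the data so that
// the element at position k-1 is the one a full descending sort would put there.
inline int kthLargestBySelect(std::vector<int> v, std::size_t k) {
    std::nth_element(v.begin(),
                     v.begin() + static_cast<std::ptrdiff_t>(k - 1),
                     v.end(),
                     std::greater<int>());
    return v[k - 1];
}
```

Selection routines of this kind tend to call many very small helper functions, which is exactly the situation where per-call measurement overhead dominates and rule-based exclusion pays off.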

Simple sorting example on one processor

Before selective instrumentation reduction
[Profile table for NODE 0;CONTEXT 0;THREAD 0. Columns: %Time, Exclusive msec, Inclusive msec, #Call, #Subrs, Inclusive usec/call, Name. Routines listed: int main, void quicksort, int kth_largest_qs, int select_kth_largest, void sort_5elements, void interchange, void setup, int ceil]

After selective instrumentation reduction
[Profile table for NODE 0;CONTEXT 0;THREAD 0. Columns: %Time, Exclusive msec, Inclusive total msec, #Call, #Subrs, Inclusive usec/call, Name. Routines listed: int main, int kth_largest_qs, int select_kth_largest, void setup, int ceil]

Performance Mapping
- Associate performance with “significant” entities (events)
- Source code points are important
  - Functions, regions, control flow events, user events
- Execution process and thread entities are important
- Some entities are more abstract, harder to measure
- Consider callgraph (callpath) profiling
  - Measure time (metric) along an edge (path) of callgraph
  - Incident edge gives parent / child view
  - Edge sequence (path) gives parent / descendant view
- Problem: Callpath profiling when callgraph is unknown
  - Determine callgraph dynamically at runtime
  - Map performance measurement to dynamic call path state

Callgraph (Callpath) Profiling (S. Shende, UO)
[Figure: callgraph with nodes A, B, C, D, E, F, G, H, I]
- 0-level callpath
  - Callgraph node
  - A
- 1-level callpath
  - Immediate descendant
  - A → B, E → I, D → H
  - C → H ?
- k-level callpath (k>1)
  - k call descendant
  - 2-level: A → D, C → I
  - 2-level: A → I ?
  - 3-level: A → H
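The notion of a k-level callpath (a parent and a descendant exactly k calls away) can be illustrated with a short C++ sketch. The `CallGraph` alias and `descendantsAtLevel` are hypothetical names, and any edge set passed in is an assumption, since the slide's figure did not survive extraction.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical representation: routine name -> routines it calls directly.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Return every routine reachable from 'root' by exactly k calls.
// k = 0 yields the node itself, k = 1 its immediate descendants, and so on.
inline std::set<std::string> descendantsAtLevel(const CallGraph& g,
                                                const std::string& root,
                                                int k) {
    std::set<std::string> frontier;
    frontier.insert(root);
    for (int level = 0; level < k; ++level) {
        std::set<std::string> next;
        for (const std::string& node : frontier) {
            CallGraph::const_iterator it = g.find(node);
            if (it == g.end()) continue;  // leaf routine: calls nothing
            next.insert(it->second.begin(), it->second.end());
        }
        frontier = next;
    }
    return frontier;
}
```

For example, with the assumed edges A→B, A→C, B→D, B→E, C→E, D→H, and E→I, level 1 from A is {B, C}, level 2 is {D, E}, and level 3 is {H, I}, which agrees with the slide's "2-level: A → D" and "3-level: A → H" examples.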

1-Level Callpath Implementation in TAU
- TAU maintains a performance event (routine) callstack
- Profiled routine (child) looks in callstack for parent
  - Previous profiled performance event is the parent
  - A callpath profile structure created first time parent calls
  - TAU records parent in a callgraph map for child
  - String representing 1-level callpath used as its key
    - “a( )=>b( )” : name for time spent in “b” when called by “a”
- Map returns pointer to callpath profile structure
  - 1-level callpath is profiled using this profiling data
- Build upon TAU’s performance mapping technology
- Measurement is independent of instrumentation
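The mechanism described above (a callstack, a map keyed by a "parent=>child" string, and a profile structure created on first use) can be sketched in a few lines of C++. This is a simplified illustration under stated assumptions, not TAU's code: `CallpathProfiler`, `CallpathProfile`, and their members are hypothetical names, and timing uses `std::chrono::steady_clock` rather than TAU's measurement layer.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-callpath data: call count and inclusive time in microseconds.
struct CallpathProfile {
    long calls = 0;
    double inclusiveUsec = 0.0;
};

class CallpathProfiler {
    using Clock = std::chrono::steady_clock;
    struct Frame {
        std::string name;
        CallpathProfile* profile;
        Clock::time_point start;
    };
    std::vector<Frame> stack_;                          // event callstack
    std::map<std::string, CallpathProfile> profiles_;   // keyed by "parent=>child"

public:
    // Called on routine entry: the previous entry on the callstack is the parent.
    void enter(const std::string& routine) {
        const std::string key =
            stack_.empty() ? routine : stack_.back().name + "=>" + routine;
        // operator[] creates the profile structure the first time this path is seen;
        // pointers into std::map stay valid across later insertions.
        CallpathProfile* prof = &profiles_[key];
        stack_.push_back(Frame{routine, prof, Clock::now()});
    }

    // Called on routine exit: charge the elapsed time to the callpath's profile.
    void exit() {
        if (stack_.empty()) return;
        Frame f = stack_.back();
        stack_.pop_back();
        f.profile->calls += 1;
        f.profile->inclusiveUsec +=
            std::chrono::duration<double, std::micro>(Clock::now() - f.start).count();
    }

    const std::map<std::string, CallpathProfile>& profiles() const { return profiles_; }
};
```

Calling `enter("a()")`, `enter("b()")`, `exit()`, `exit()` leaves two entries in the map, `a()` and `a()=>b()`, the second holding the time spent in `b` when called by `a`.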

Performance Monitoring and Steering
- Desirable to monitor performance during execution
  - Long-running applications
  - Steering computations for improved performance
- Large-scale parallel applications complicate solutions
  - More parallel threads of execution producing data
  - Large amount of performance data (relative) to access
  - Analysis and visualization more difficult
- Problem: Online performance data access and analysis
  - Incremental profile sampling (based on files)
  - Integration in computational steering system
  - Dynamic performance measurement and access

Online Performance Analysis (K. Li, UO)
[Figure: online performance analysis pipeline. Labels: Application, Performance Steering, Performance Visualizer, Performance Analyzer, Performance Data Reader, TAU Performance System, Performance Data Integrator, SCIRun (Univ. of Utah), performance data streams, performance data output, file system, sample sequencing, reader synchronization, accumulated samples]

2D Field Performance Visualization in SCIRun
[Figure: SCIRun program]

Uintah Computational Framework (UCF)
- University of Utah
- UCF analysis
  - Scheduling
  - MPI library
  - Components
- 500 processes
- Use for online and offline visualization
- Apply SCIRun steering

Performance Analysis of Component Software
- Complexity in scientific problem solving addressed by
  - advances in software development environments
  - rich layered software middleware and libraries
- Increases complexity in performance problem solving
- Integration barriers for performance technology
  - Incompatible with advanced software technology
  - Inconsistent with software engineering process
- Problem: Performance engineering for component systems
  - Respect software development methodology
  - Leverage software implementation technology
  - Look for opportunities for synergy and optimization

Focus on Component Technology and CCA
- Emerging component technology for HPC and Grid
  - Component: software object embedding functionality
  - Component architecture (CA): how components connect
  - Component framework: implements a CA
- Common Component Architecture (CCA)
  - Standard foundation for scientific component architecture
  - Component descriptions
    - Scientific Interface Description Language (SIDL)
  - CCA ports for component interactions (provides and uses)
  - CCA services: directory, registry, connection, event
- High-performance components and interactions

Extended Component Design
- POC and PKC are compliant with component architecture
- Component composition performance engineering
- Utilize technology and services of component framework
[Figure label: generic component]

Architecture of a Performance Component
- Each component advertises its services
- Performance component:
  - Timer (start/stop)
  - Event (trigger)
  - Query (timers…)
  - Knowledge (component performance model)
- Prototype implementation of timer
  - CCAFFEINE reference framework
  - SIDL
  - Instantiate with TAU functionality
[Figure: Performance Component with Timer, Event, Query, and Knowledge ports]

TimerPort Interface Declaration in CCAFFEINE
- Create Timer port abstraction

    namespace performance {
    namespace ccaports {
      /**
       * This abstract class declares the Timer interface.
       * Inherit from this class to provide functionality.
       */
      class Timer :                      /* implementation of port */
        public virtual gov::cca::Port {  /* inherits from port spec */
      public:
        virtual ~Timer() { }
        /**
         * Start the Timer. Implement this function in
         * a derived class to provide required functionality.
         */
        virtual void start(void) = 0;    /* virtual methods with */
        virtual void stop(void) = 0;     /* null implementations */
        ...
      };
    } // namespace ccaports
    } // namespace performance
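The slide says to inherit from the abstract `Timer` class to provide functionality. A minimal sketch of such a derived class follows. It is not the TAU-backed component from the talk: `WallClockTimer` and `elapsedSeconds` are hypothetical, the timing uses `std::chrono`, and `gov::cca::Port` is replaced by an empty stand-in class so that the example compiles without the CCA headers. Only the `start` and `stop` methods shown on the slide are declared.

```cpp
#include <chrono>

// Stand-in for the framework's port base class (assumption: the real one
// comes from the CCA headers and may declare more than a virtual destructor).
namespace gov { namespace cca {
class Port {
public:
    virtual ~Port() { }
};
} }

namespace performance { namespace ccaports {
// Abstract Timer port, reduced to the two methods shown on the slide.
class Timer : public virtual gov::cca::Port {
public:
    virtual ~Timer() { }
    virtual void start(void) = 0;
    virtual void stop(void) = 0;
};
} }

// Hypothetical concrete timer: accumulates wall-clock time between start/stop pairs.
class WallClockTimer : public performance::ccaports::Timer {
    using Clock = std::chrono::steady_clock;
    Clock::time_point begin_;
    Clock::duration elapsed_ = Clock::duration::zero();
    bool running_ = false;

public:
    void start(void) override {
        begin_ = Clock::now();
        running_ = true;
    }
    void stop(void) override {
        if (!running_) return;  // ignore stop without a matching start
        elapsed_ += Clock::now() - begin_;
        running_ = false;
    }
    double elapsedSeconds() const {
        return std::chrono::duration<double>(elapsed_).count();
    }
};
```

Client code holds only a `performance::ccaports::Timer*`, so swapping this class for a TAU-backed one would not change the caller, which is the point of the port abstraction.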

Using Performance Component Timer
- Component uses framework services to get TimerPort
- Use of this TimerPort interface is independent of TAU

    // Get Timer port from CCA framework services from CCAFFEINE
    port = frameworkServices->getPort("TimerPort");
    if (port)
      timer_m = dynamic_cast<performance::ccaports::Timer *>(port);
    if (timer_m == 0) {
      cerr << "Connected to something, not a Timer port" << endl;
      return -1;
    }
    string s = "IntegrateTimer";  // give name for timer
    timer_m->setName(s);          // assign name to timer
    timer_m->start();             // start timer (independent of tool)
    for (int i = 0; i < count; i++) {
      double x = random_m->getRandomNumber();
      sum = sum + function_m->evaluate(x);
    }
    timer_m->stop();              // stop timer

Using SIDL for Language Interoperability
- Can create Timer interface in SIDL for creating stubs

    //
    // File: performance.sidl
    //
    version performance 1.0;
    package performance {
      class Timer {
        void start();
        void stop();
        void setName(in string name);
        string getName();
        void setType(in string name);
        string getType();
        void setGroupName(in string name);
        string getGroupName();
        void setGroupId(in long group);
        long getGroupId();
      }
    }

Using SIDL Interface for Timers
- C++ program that uses the SIDL Timer interface
- Again, independent of timer implementations (e.g., TAU)

    // SIDL:
    #include "performance_Timer.hh"
    int main(int argc, char* argv[]) {
      performance::Timer t = performance::Timer::_create();
      ...
      t.setName("Integrate timer");
      t.start();
      // Computation
      for (int i = 0; i < count; i++) {
        double x = random_m->getRandomNumber();
        sum = sum + function_m->evaluate(x);
      }
      ...
      t.stop();
      return 0;
    }

Using TAU Component in CCAFFEINE

    repository get TauTimer              /* get TAU component from repository */
    repository get Driver                /* get application components */
    repository get MidpointIntegrator
    repository get MonteCarloIntegrator
    repository get RandomGenerator
    repository get LinearFunction
    repository get NonlinearFunction
    repository get PiFunction
    create LinearFunction lin_func       /* create component instances */
    create NonlinearFunction nonlin_func
    create PiFunction pi_func
    create MonteCarloIntegrator mc_integrator
    create RandomGenerator rand
    create TauTimer tau                  /* create TAU component instance */
    /* connecting components and running */
    connect mc_integrator RandomGeneratorPort rand RandomGeneratorPort
    connect mc_integrator FunctionPort nonlin_func FunctionPort
    connect mc_integrator TimerPort tau TimerPort
    create Driver driver
    connect driver IntegratorPort mc_integrator IntegratorPort
    go driver Go
    quit

Concluding Remarks
- Complex software and parallel computing systems pose challenging performance analysis problems that require robust methodologies and tools
- To build more sophisticated performance tools, existing proven performance technology must be utilized
- Performance tools must be integrated with software and systems models and technology
  - Performance engineered software
  - Function consistently and coherently in software and system environments
- TAU performance system offers robust performance technology that can be broadly integrated
