1
Analysis Infrastructure for CQoS using TAU
Sameer Shende, Allen D. Malony and Alan Morris
{sameer, malony, amorris}@cs.uoregon.edu
Department of Computer and Information Science
Performance Research Laboratory, NeuroInformatics Center
University of Oregon
2
2 Acknowledgement
- Jaideep Ray, SNL
- Lois McInnes, ANL
- David Bernholdt, ORNL
- Boyana Norris, ANL
- Robert Yelle, U. Oregon
3
3 Outline
- Motivation: CQoS
- Instrumentation
- Measurement
- Analysis tools
4
4 CQoS in GAMESS
Robert Yelle, PRL, U. Oregon (ryelle@uoregon.edu)
Calculate the energy of the thiophene molecule using different algorithms:
    FINAL U-B3LYP ENERGY IS -552.9083139587 AFTER 21 ITERATIONS
    FINAL U-BLYP ENERGY IS -552.9861184848 AFTER 22 ITERATIONS
    FINAL UHF ENERGY IS -551.3483315053 AFTER 11 ITERATIONS
    FINAL U-SVWN ENERGY IS -550.2734639639 AFTER 22 ITERATIONS
5
5 TAU Performance System Framework
- Tuning and Analysis Utilities
- Performance system framework for scalable parallel and distributed high-performance computing
- Targets a general complex system computation model
  - nodes / contexts / threads
  - Multi-level: system / software / parallelism
  - Measurement and analysis abstraction
- Integrated toolkit for performance instrumentation, measurement, analysis, and visualization
  - Portable, configurable performance profiling/tracing facility
  - Open software approach
- University of Oregon, LANL, FZJ Germany
- http://www.cs.uoregon.edu/research/paracomp/tau
6
6 TAU Performance System Architecture
[Figure: architecture diagram; surviving label: event selection]
7
7 Performance Evaluation Alternatives
In order of increasing volume of performance data:
- Flat profile
- Depthlimit profile
- Parameter profile
- Callpath/callgraph profile
- Phase profile
- Trace
Each alternative has:
- one metric/counter
- multiple counters
8
8 Enhancements in TAU to support CQoS
Instrumentation
- Runtime MPI wrapper interposition for CCA framework instrumentation
- Automatic proxy component creation for classic and SIDL components
- PDT v3.10 (coming, beta released) supports EDG v3.8 for better C/C++ parsing (GNU extensions, BOOST, ASM statements)
Profile measurement
- Parameter-based profiling to capture application data
- Context events to capture the callpath together with user event information
- Support for memory profiling and memory leak detection
- Timestamped profile snapshots (coming)
Analysis
- Extensions to PerfDMF to support model storage
- Application-specific metadata
- ParaProf extensions to display profile snapshots and parameter-based profiles
- PerfExplorer data mining framework
- Web-based access to the performance database via a TAU portal
- Ability to store images and to share data and metadata
9
9 TAU’s CCA Performance Component: Core API
Measurement port and interfaces:
- Timer: set name/type/group, start/stop
- Phase: set name/type/group, start/stop
- Control: enable/disable groups
- Query: get timer names, get metric names, get user-defined event names, get timer data, get user-defined event data, dump data to disk
- Event: set name, trigger event
- Context Event (callpath of routines + user event information): set name, trigger event
- MemoryTracker and MemoryHeadroomTracker: enable interrupt tracking, track memory/headroom here, set interrupt interval, enable/disable tracking memory/headroom
10
10 TAU’s CCA Interfaces
Performance evaluation using Performance component
- Uses underlying TAU library for measurement
- Timer, Phase, Event/ContextEvent, Control, Query, MemoryTracker/MemoryHeadroomTracker interfaces
- Lightweight instrumentation option
Performance modeling using Mastermind component
- Tracks per-invocation performance data
- Associates performance data with application data
  - Method arguments logged with performance data
  - Callpath information
- Helps us build performance models
Updated performance component 1.7.2 released Jan. ’07
11
11 Timer and Phase Interfaces
Timer interface:
    interface Timer {
        /* Start/stop the Timer */
        void start();
        void stop();
        /* Set/get the Timer name */
        void setName(in string name);
        string getName();
        /* Set/get Timer type information (e.g., signature of the routine) */
        void setType(in string name);
        string getType();
        /* Set/get the group name associated with the Timer */
        void setGroupName(in string name);
        string getGroupName();
        /* Set/get the group id associated with the Timer */
        void setGroupId(in long group);
        long getGroupId();
    }
    interface Measurement extends gov.cca.Port {
        /* Create a Timer */
        Timer createTimer();
        Timer createTimerWithName(in string name);
        Timer createTimerWithNameType(in string name, in string type);
        Timer createTimerWithNameTypeGroup(in string name, in string type, in string group);
        ...
    }
Phase interface:
    interface Phase {
        /* Start/stop the Phase */
        void start();
        void stop();
        /* Set/get the Phase name */
        void setName(in string name);
        string getName();
        /* Set/get Phase type information (e.g., signature of the routine) */
        void setType(in string name);
        string getType();
        /* Set/get the group name associated with the Phase */
        void setGroupName(in string name);
        string getGroupName();
        /* Set/get the group id associated with the Phase */
        void setGroupId(in long group);
        long getGroupId();
    }
    interface Measurement extends gov.cca.Port {
        /* Create a Phase */
        Phase createPhase();
        Phase createPhaseWithName(in string name);
        Phase createPhaseWithNameType(in string name, in string type);
        Phase createPhaseWithNameTypeGroup(in string name, in string type, in string group);
        ...
    }
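A minimal C++ sketch of how a timer with start/stop and name/group accessors like those listed on this slide could behave. `SketchTimer`, `calls()` and `totalMicroseconds()` are hypothetical names built on `std::chrono`; this is neither TAU's implementation nor the generated SIDL binding.

```cpp
#include <chrono>
#include <string>
#include <utility>

// Hypothetical stand-in for the Timer interface; not TAU's implementation.
class SketchTimer {
public:
    explicit SketchTimer(std::string name = "") : name_(std::move(name)) {}

    void setName(const std::string& name) { name_ = name; }
    std::string getName() const { return name_; }
    void setGroupName(const std::string& group) { group_ = group; }
    std::string getGroupName() const { return group_; }

    // Begin an interval; a second start() without a stop() is ignored.
    void start() {
        if (running_) return;
        running_ = true;
        begin_ = Clock::now();
    }

    // End the interval and accumulate its duration and the call count.
    void stop() {
        if (!running_) return;
        total_ += Clock::now() - begin_;
        ++calls_;
        running_ = false;
    }

    long calls() const { return calls_; }
    double totalMicroseconds() const {
        return std::chrono::duration<double, std::micro>(total_).count();
    }

private:
    using Clock = std::chrono::steady_clock;
    std::string name_;
    std::string group_;
    Clock::time_point begin_{};
    Clock::duration total_{0};
    long calls_ = 0;
    bool running_ = false;
};
```

A caller would bracket the region of interest with `start()` and `stop()` and read the accumulated totals afterwards.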
12
12 Measurement Proxy Component
- Interpose a proxy component for each port
- Inside the proxy, make calls to the Performance component for each invocation
[Diagram labels: MidpointIntegrator (IntegratorPort); Driver (Go, IntegratorPort); IntegratorProxy Component (IntegratorPort uses, IntegratorPort provides, MeasurementPort); Performance (MeasurementPort)]
13
13 MasterMind Component
Idea: Create a performance model for the component by tracking performance per invocation
Uses Monitor Port
Outputs:
- Times per invocation, e.g.
    # integ_proxy::integrate(double, double, int)
    # MPI_TIME Time count lowBound upBound
    72420 336 10000 0 1
    407 449 1000 0 1
    364 540 100 0 1
    64838 844 10000 0 1
    381 945 1000 0 1
    332 1027 100 0 1
- Component call path
- Regular performance data (uses performance component)
14
14 Monitor Proxy Component
Same idea (from the user’s point of view)
[Diagram labels: MidpointIntegrator (IntegratorPort); Driver (Go, IntegratorPort); Integrator Monitor Proxy (IntegratorPort uses, IntegratorPort provides, MonitorPort); MasterMind (MeasurementPort); Performance]
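Where the measurement proxy aggregates, a monitor proxy keeps one record per invocation together with the method arguments. The following is a hedged sketch of that idea only: `MonitorSketch` and `InvocationRecord` are hypothetical names, and the record layout merely mirrors the count/lowBound/upBound columns of the sample MasterMind output.

```cpp
#include <chrono>
#include <functional>
#include <vector>

// One row per invocation: elapsed time plus the logged method arguments.
struct InvocationRecord {
    double microseconds;
    int count;
    double lowBound;
    double upBound;
};

// Hypothetical stand-in for a MasterMind-style monitor; not TAU's code.
class MonitorSketch {
public:
    using Integrate = std::function<double(double, double, int)>;

    // Wrap a callee so that every call appends one InvocationRecord.
    // The returned wrapper must not outlive this MonitorSketch.
    Integrate wrap(Integrate callee) {
        return [this, callee](double lowBound, double upBound, int count) {
            const auto begin = std::chrono::steady_clock::now();
            const double result = callee(lowBound, upBound, count);
            const auto end = std::chrono::steady_clock::now();
            records_.push_back(
                {std::chrono::duration<double, std::micro>(end - begin).count(),
                 count, lowBound, upBound});
            return result;
        };
    }

    const std::vector<InvocationRecord>& records() const { return records_; }

private:
    std::vector<InvocationRecord> records_;
};
```

Keeping the arguments next to each timing is what later allows a model of time as a function of, for example, `count`.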
15
15 Tools Included with MasterMind Component
Tree pruner
- Input: callgraph generated by Mastermind component, user-specified rules
- Output: pruned callgraph with insignificant nodes removed
Performance modeling library (brute force)
- Tries all possible permutations of component instances
- Input: performance model of each component
- Selects optimal component assembly for the ensemble
Optimizer
- Swaps one component instance with another
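The brute-force selection step can be illustrated with a short sketch. It assumes each component slot has a fixed number of candidate implementations and that a caller-supplied cost model scores a complete assembly; `bruteForceSelect`, `CostModel` and `Selection` are hypothetical names, not part of the MasterMind tools.

```cpp
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Cost of a full assembly, given the chosen implementation index per slot.
using CostModel = std::function<double(const std::vector<std::size_t>&)>;

struct Selection {
    std::vector<std::size_t> choice;  // chosen implementation index per slot
    double cost = std::numeric_limits<double>::infinity();
};

// Enumerate every combination with an odometer-style index vector and keep
// the cheapest one. Returns an empty choice if any slot has no candidates.
Selection bruteForceSelect(const std::vector<std::size_t>& optionsPerSlot,
                           const CostModel& model) {
    Selection best;
    if (optionsPerSlot.empty()) return best;
    for (std::size_t n : optionsPerSlot) {
        if (n == 0) return best;
    }

    std::vector<std::size_t> index(optionsPerSlot.size(), 0);
    while (true) {
        const double cost = model(index);
        if (cost < best.cost) {
            best.cost = cost;
            best.choice = index;
        }
        // Advance the odometer; stop once every slot has wrapped around.
        std::size_t pos = 0;
        while (pos < index.size() && ++index[pos] == optionsPerSlot[pos]) {
            index[pos] = 0;
            ++pos;
        }
        if (pos == index.size()) break;
    }
    return best;
}
```

The number of evaluations is the product of the per-slot option counts, which is why exhaustive search of this kind is only practical for small ensembles.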
16
16 TAU’s Proxy Generator for SIDL/Classic CCA
Generate regular measurement proxy or monitor (MasterMind) proxy
Arguments:
    -c  Full name of the component
    -t  Type of component
    -p  Name of port to generate proxy for
    -d  Name of pdb file created from cxxparse
    -h  Header file for this port
Options:
    -n  Name of the proxy component (default: base of component name + Proxy)
    -o  Name of output file (default: proxy.cc)
    -f  Use pre-generated selective instrumentation file
    -x  Namespace tag
    -m  Generate MasterMind component proxy
17
17 TAU’s Proxy Generator for Classic C++ Interface
Creating PDB files:
    cxxparse -I -D
Merging PDB files:
    pdbmerge -o merged.pdb file1.pdb file2.pdb …
Invoking tau_pg (example):
    tau_pg -c integrators::ccaports::Integrator -t integrators.ccaports.Integrator -p IntegratorPort -d ParallelIntegrator_CCA.pdb -o Proxy.cc -h ports/Integrator_CCA.h -f select.dat
18
18 What’s Going On Here?
[Diagram labels: Application Component (x4); Performance Component; TAU API; other API; runtime TAU performance data; alternative implementations of performance component]
19
19 Multi-Level Instrumentation
Inter-component
- Proxy components created automatically
- Proxy interposed between caller and callee
Intra-component
- PDT-based source instrumentation
- Compiler scripts: mpif90 => tau_f90.sh, mpicxx => tau_cxx.sh, mpicc => tau_cc.sh
Framework level
- MPI instrumentation
- Shared-library, MPI-based CCAFFEINE framework
- LD_PRELOAD-based interposition of MPI wrapper
    Uninstrumented: mpirun -np 4 ./ccafe-batch
    Instrumented:   mpirun -np 4 tau_load.sh ./ccafe-batch
21
21 Parameter Based Profiling for CQoS
Idea: partition performance data for individual functions based on runtime parameters
Enable by configuring with -PROFILEPARAM
TAU call: TAU_PROFILE_PARAM1L(value, "name")
Simple example:
    void foo(long input) {
        TAU_PROFILE("foo", "", TAU_DEFAULT);
        TAU_PROFILE_PARAM1L(input, "input");
        ...
    }
22
22 Parameter Based Profiling
5 seconds spent in function “foo” becomes
- 2 seconds for “foo [ = ]”
- 1 second for “foo [ = ]”
- …
Demonstrated in MPI wrapper library
- Allows for partitioning of time spent in MPI routines based on parameters (message size, message tag, destination node)
- Can be extrapolated to infer specifics about the MPI subsystem and the system as a whole
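The partitioning described on this slide can be sketched with a small stand-in profile. `ParamProfileSketch` is hypothetical and the bucket label format (`routine [name=value]`) is an assumption chosen for the example; TAU's own data structures and label format may differ.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a parameter-aware profile; not TAU's code.
class ParamProfileSketch {
public:
    // Charge `seconds` both to the routine as a whole and to the bucket
    // identified by the runtime value of the named parameter.
    void record(const std::string& routine, const std::string& paramName,
                long value, double seconds) {
        flat_[routine] += seconds;
        byParam_[label(routine, paramName, value)] += seconds;
    }

    // Assumed label format for a (routine, parameter, value) bucket.
    static std::string label(const std::string& routine,
                             const std::string& paramName, long value) {
        return routine + " [" + paramName + "=" + std::to_string(value) + "]";
    }

    double flat(const std::string& routine) const {
        return lookup(flat_, routine);
    }
    double partition(const std::string& key) const {
        return lookup(byParam_, key);
    }

private:
    static double lookup(const std::map<std::string, double>& m,
                         const std::string& key) {
        const auto it = m.find(key);
        return it == m.end() ? 0.0 : it->second;
    }

    std::map<std::string, double> flat_;
    std::map<std::string, double> byParam_;
};
```

The flat total for a routine always equals the sum of its per-value buckets, so the partitioned view refines the flat profile without changing it.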
23
23 Workload Characterization
Simple example: send/receive power-of-two message sizes (up to 32MB)
    #include <mpi.h>

    int buffer[8*1024*1024];

    int main(int argc, char **argv) {
        int rank, size, i, j;
        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        /* Run with exactly two ranks: rank 0 sends, rank 1 receives. */
        for (i = 0; i < 1000; i++) {
            /* Message length doubles from 1 int up to 8M ints. */
            for (j = 1; j <= 8*1024*1024; j *= 2) {
                if (rank == 0) {
                    MPI_Send(buffer, j, MPI_INT, 1, 42, MPI_COMM_WORLD);
                } else {
                    MPI_Status status;
                    MPI_Recv(buffer, j, MPI_INT, 0, 42, MPI_COMM_WORLD, &status);
                }
            }
        }
        MPI_Finalize();
        return 0;
    }
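The inner loop bounds determine which message sizes can appear in a parameter-based profile of this program. A small sketch that enumerates them, assuming a caller-supplied element size (4 bytes for `MPI_INT` on common platforms, which is platform dependent); `messageSizesInBytes` is a hypothetical helper, not part of TAU or MPI.

```cpp
#include <cstddef>
#include <vector>

// Byte sizes of the messages sent by one pass of the doubling loop
// (j = 1, 2, 4, ... while j <= maxElements), for a given element size.
std::vector<std::size_t> messageSizesInBytes(std::size_t maxElements,
                                             std::size_t elementBytes) {
    std::vector<std::size_t> sizes;
    for (std::size_t j = 1; j <= maxElements; j *= 2) {
        sizes.push_back(j * elementBytes);
    }
    return sizes;
}
```

With `maxElements = 8*1024*1024` and 4-byte elements this yields 24 distinct sizes, from 4 bytes up to 32 MB, each sent 1000 times by the outer loop.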
24
24 Workload Characterization
Use tau_load.sh to instrument MPI routines (SGI Altix):
    % icc mpi.c -lmpi
    % mpirun -np 2 tau_load.sh -XrunTAU-icpc-mpi-pdt.so a.out
[Figure panels: SGI MPI (SGI Altix); Intel MPI (SGI Altix)]
25
25 Workload Characterization Two different message sizes (~3.3MB and ~4K)
26
26 Parameter Based Profiling: SIDL Interface
    package Performance version 1.7.2 {
        interface Timer {
            /* Start/stop the Timer */
            void start();
            void stop();
            /* Set Profile Parameter */
            void setParam1L(in long value, in string name);
            ...
        }
27
27 PerfDMF: Performance Data Mgmt. Framework
28
28 TAU Portal
29
29 TAU Portal https://tau.nic.uoregon.edu
30
30 TAU Portal: Application Specific Metadata Storage
31
31 Performance Data Mining (PerfExplorer)
- Performance knowledge discovery framework
  - Data mining analysis applied to parallel performance data: comparative, clustering, correlation, dimension reduction, …
  - Use the existing TAU infrastructure: TAU performance profiles, PerfDMF
  - Client-server based system architecture
- Technology integration
  - Java API and toolkit for portability
  - PerfDMF
  - R-project/Omegahat, Octave/Matlab statistical analysis
  - WEKA data mining package
  - JFreeChart for visualization, vector output (EPS, SVG)
32
32 Performance Data Mining (PerfExplorer)
33
33 PerfExplorer - Interface Select analysis
34
34 PerfExplorer - Relative Efficiency Plots
35
35 PerfExplorer - Relative Efficiency by Routine
36
36 PerfExplorer - Relative Speedup
37
37 PerfExplorer - Timesteps Per Second
38
38 Summary
- Create component version of GAMESS, identify interfaces
- Work with GAMESS and other application teams to apply TAU for inter- and intra-component instrumentation
- Gather requirements for swapping components
- Generate proxy components for applications, gather performance data, store results in performance data
- Cross-experiment application performance characterization
- Develop prototype for CQoS
- http://www.cs.uoregon.edu/research/paracomp/tau/cca
39
39 Support Acknowledgements
- Department of Energy (DOE)
  - Office of Science contracts
  - University of Utah DOE ASCI Level 1 sub-contract
  - DOE ASC/NNSA Level 3 contract
  - LLNL, LANL, ANL contracts
- NSF Software and Tools for High-End Computing Grant
- Research Centre Juelich
  - John von Neumann Institute for Computing
  - Dr. Bernd Mohr