“Performance Modeling of Component Assemblies with TAU”
Nick Trebon, Alan Morris, Jaideep Ray, Sameer Shende, Allen Malony {ntrebon, amorris,
Performance Research Laboratory, Department of Computer and Information Science, University of Oregon
Sandia National Laboratories
Component Performance Modeling with TAU, Compframe, Jun. 23

Outline
- Motivation
- Introduction and Background
- Performance Measurement in HPC Component Environments
- Performance Measuring and Modeling Infrastructure
  - Proxies
  - TAU component
  - Mastermind
- Component Assembly Optimization
- Conclusions

Motivation
Given a set of components, each with multiple implementations, what is the optimal subset of implementations for solving a given problem?
- How do we model a single component?
- How do we create a global model from a set of component models?
- How do we select the optimal subset of implementations?
From a performance perspective, a component by itself has no meaning; it needs a context. The context is affected by:
- the problem being solved
- parameters (e.g., the size of an array)
- mismatched data structures

Performance in HPC Component Environments
Traditional role of performance measurement and modeling:
- an analysis-and-optimization phase, e.g., porting a stable code base to a new architecture
- a performance model is used to predict scalability
In a component environment:
- applications are dynamically composed at runtime
- application developers typically do not implement all of their own components
- performance measurements need to be non-intrusive
- users are interested in coarse-grained performance

What does performance mean?
Given a problem characterized by a tuple $P$, what time $T_e$ does a component $C$ need to solve it? That is, $T_e = f(P)$; what is $f$?
To create a performance model $f(P)$, we need:
- $T_e$: execution time of a method call
- $T_m$: execution time of the message-passing calls within the method
- $T_c$: compute time of the method, $T_c = T_e - T_m$
- the input parameters that affect performance (e.g., the size of an array)
For our purposes, we start with simplifying assumptions:
- blocking communication, with no overlap of communication and computation
- disk I/O is ignored
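
The decomposition $T_c = T_e - T_m$ can be illustrated with a small per-invocation record. This is a minimal sketch, assuming a hypothetical `Invocation` layout with a single performance-relevant parameter (`array_size`); the timings are illustrative numbers, not measurements. The subtraction is only meaningful under the blocking, no-overlap assumptions above.

```python
from dataclasses import dataclass


@dataclass
class Invocation:
    """One measured method call (hypothetical record layout)."""
    array_size: int   # an input parameter from P that affects performance
    t_exec: float     # T_e: execution time of the method call, in seconds
    t_comm: float     # T_m: time spent in message-passing calls, in seconds

    @property
    def t_compute(self) -> float:
        # T_c = T_e - T_m holds only under the stated assumptions:
        # blocking communication, no comm/compute overlap, no disk I/O.
        if self.t_comm > self.t_exec:
            raise ValueError("T_m cannot exceed T_e")
        return self.t_exec - self.t_comm


calls = [Invocation(1000, 0.50, 0.20), Invocation(2000, 0.90, 0.30)]  # illustrative
for call in calls:
    print(call.array_size, round(call.t_compute, 3))
```

With overlapping (non-blocking) communication, $T_e - T_m$ would under-count compute time, which is why the assumption is stated up front.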

How to measure performance?
- The code needs to be “instrumented”, but the instrumentation has to be non-intrusive.
- What kind of performance infrastructure can achieve this?
- Previous research suggests proxies, which intercept and forward method calls.
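
The intercept-and-forward idea can be sketched generically in Python with attribute delegation. This is an illustration only, not the CCA/TAU proxy: `InterceptingProxy` and `Solver` are hypothetical names. The caller keeps calling `solve` exactly as before, while the proxy records the elapsed time of each call.

```python
import time
from collections import defaultdict


class InterceptingProxy:
    """Generic sketch: intercept every method call on `target`, time it,
    and forward it unchanged. The caller's code does not change."""

    def __init__(self, target):
        self._target = target
        self.timings = defaultdict(list)   # method name -> elapsed seconds per call

    def __getattr__(self, name):
        # Only invoked for attributes not found on the proxy itself.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def forward(*args, **kwargs):
            start = time.perf_counter()
            try:
                return attr(*args, **kwargs)   # forward to the real object
            finally:
                self.timings[name].append(time.perf_counter() - start)

        return forward


class Solver:
    def solve(self, n):
        return sum(range(n))


solver = InterceptingProxy(Solver())
print(solver.solve(10), len(solver.timings["solve"]))
```

Dynamic delegation keeps the sketch short; a component framework would instead generate a proxy with the same typed interface as the component it wraps.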

CCA Performance Infrastructure
The proxy-based measurement infrastructure consists of:
- Proxy
  - lightweight: essentially a switch that turns measurement on and off
  - one proxy per component
- Tuning and Analysis Utilities (TAU) component
  - uses the TAU measurement library
  - provides a measurement port
  - responsible for making the measurements
- Mastermind component
  - responsible for gathering, storing, and reporting measurement data (timing data from TAU as well as input parameters from proxies)
  - queries the TAU component for method-level measurements

Proxy
- A proxy uses and provides the same ports that the actual component provides.
- It also uses a MonitorPort.
- It identifies performance-dependent parameters.
[Figure: Before: C1 connected directly to C2. After: proxy P2 inserted between C1 and C2, with P2 also connected to the Mastermind (MM).]
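
The before/after wiring can be sketched with abstract classes standing in for ports. All names here (`IntegratorPort`, `MonitorPort.record`, `C1`, `C2`, `P2`, `MM`) are hypothetical and are not the CCA or TAU APIs. Because `P2` provides the same port as `C2`, `C1` is unchanged when the proxy is inserted. For brevity the proxy reads the clock itself; in the infrastructure described here, timing would come from the TAU component.

```python
from __future__ import annotations

import time
from abc import ABC, abstractmethod


class IntegratorPort(ABC):
    """Hypothetical port provided by C2 and, identically, by its proxy P2."""

    @abstractmethod
    def integrate(self, values: list[float], dt: float) -> float: ...


class MonitorPort(ABC):
    """Hypothetical port the proxy uses to report to the Mastermind (MM)."""

    @abstractmethod
    def record(self, routine: str, params: dict, seconds: float) -> None: ...


class C2(IntegratorPort):
    """The real component."""

    def integrate(self, values, dt):
        return sum(values) * dt


class MM(MonitorPort):
    """Minimal Mastermind stand-in: just stores what proxies report."""

    def __init__(self):
        self.records = []

    def record(self, routine, params, seconds):
        self.records.append((routine, params, seconds))


class P2(IntegratorPort):
    """Proxy: provides IntegratorPort, uses C2's IntegratorPort and a MonitorPort."""

    def __init__(self, target: IntegratorPort, monitor: MonitorPort):
        self.target, self.monitor = target, monitor

    def integrate(self, values, dt):
        params = {"n": len(values)}        # performance-dependent parameter
        start = time.perf_counter()        # simplification: TAU would time this
        result = self.target.integrate(values, dt)
        self.monitor.record("integrate", params, time.perf_counter() - start)
        return result


class C1:
    """Caller: depends only on IntegratorPort, so C2 or P2 can be plugged in."""

    def __init__(self, port: IntegratorPort):
        self.port = port

    def run(self):
        return self.port.integrate([1.0, 2.0, 3.0], 0.5)


before = C1(C2()).run()                 # Before: C1 -> C2
mm = MM()
after = C1(P2(C2(), mm)).run()          # After:  C1 -> P2 -> C2, P2 -> MM
print(before, after, mm.records[0][:2])
```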

Automatic Proxy Generation
- A tool based on the Program Database Toolkit (PDT, University of Oregon)
- One proxy is created per port
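
The idea of generating one proxy per port from the port's declaration can be imitated in Python with `inspect`. This is a rough analogue for illustration, not the PDT-based tool: it reads method signatures from a hypothetical abstract `SolverPort` rather than parsing C++ declarations, and the `monitor.start`/`monitor.stop` calls are an assumed monitoring interface. Only plain positional-or-keyword parameters are handled.

```python
import inspect
from abc import ABC, abstractmethod


def generate_proxy_source(port_cls: type, proxy_name: str) -> str:
    """Emit source for a proxy class that provides `port_cls` and forwards
    each abstract method to a target, bracketed by monitor.start/stop.
    Sketch limitation: no *args, **kwargs, or default values."""
    lines = [
        f"class {proxy_name}({port_cls.__name__}):",
        "    def __init__(self, target, monitor):",
        "        self.target, self.monitor = target, monitor",
    ]
    for name in sorted(port_cls.__abstractmethods__):
        sig = inspect.signature(getattr(port_cls, name))
        params = list(sig.parameters)[1:]          # drop 'self'
        args = ", ".join(params)
        head = ", ".join(["self", *params])
        lines += [
            f"    def {name}({head}):",
            f"        self.monitor.start({name!r})",
            "        try:",
            f"            return self.target.{name}({args})",
            "        finally:",
            f"            self.monitor.stop({name!r})",
        ]
    return "\n".join(lines)


class SolverPort(ABC):
    """Hypothetical port; one proxy class is generated per port."""

    @abstractmethod
    def solve(self, matrix, rhs): ...


class Recorder:
    """Hypothetical monitor that just logs start/stop events."""

    def __init__(self):
        self.events = []

    def start(self, name):
        self.events.append(("start", name))

    def stop(self, name):
        self.events.append(("stop", name))


class DirectSolver(SolverPort):
    def solve(self, matrix, rhs):
        return rhs                               # placeholder "solve"


source = generate_proxy_source(SolverPort, "SolverPortProxy")
print(source)
namespace = {"SolverPort": SolverPort}
exec(source, namespace)                          # compile the generated proxy
SolverPortProxy = namespace["SolverPortProxy"]

rec = Recorder()
proxy = SolverPortProxy(DirectSolver(), rec)
result = proxy.solve([[1.0]], 7.0)
```

Generating source text, rather than wrapping at runtime, mirrors the ahead-of-time nature of a proxy generator: the emitted class can be inspected, compiled, and deployed like any hand-written component.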

Mastermind
- A record is created for each instrumented routine; for each invocation it stores:
  - measurement data (e.g., execution time, communication time, cache hits)
  - input parameters
- Currently, the Mastermind outputs all records at application completion.
- In the future, the Mastermind could perhaps output a performance model for a given component (e.g., based on linear regression).
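
A minimal sketch of the per-routine record store, plus the suggested regression step, might look like the following. `MastermindSketch`, its methods, and the data are hypothetical and illustrative. `fit_linear` is ordinary least squares for $T_e \approx a + b \cdot n$ on a single parameter; real models may need more parameters or a nonlinear form.

```python
from collections import defaultdict


class MastermindSketch:
    """Hypothetical Mastermind: one record per instrumented routine, holding one
    entry (measurements + input parameters) per invocation."""

    def __init__(self):
        self.records = defaultdict(list)

    def add(self, routine, params, t_exec, t_comm=0.0):
        self.records[routine].append(
            {"params": dict(params), "t_exec": t_exec, "t_comm": t_comm})

    def dump(self):
        # Current behaviour: output every record at application completion.
        for routine, entries in self.records.items():
            for entry in entries:
                print(routine, entry)

    def fit_linear(self, routine, param):
        """Possible extension: least squares for t_exec = a + b * param."""
        xs = [e["params"][param] for e in self.records[routine]]
        ys = [e["t_exec"] for e in self.records[routine]]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        if sxx == 0:
            raise ValueError("need at least two distinct parameter values")
        b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
        return my - b * mx, b


mm = MastermindSketch()
for n, t in [(100, 0.12), (200, 0.22), (300, 0.32)]:   # illustrative numbers only
    mm.add("solve", {"n": n}, t_exec=t)
mm.dump()
a, b = mm.fit_linear("solve", "n")
```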

TAU Component
- The TAU component is a wrapper around the TAU library.
- It provides access to timers that measure execution time and communication time.
- It also provides access to hardware metrics (e.g., cache hits) via external libraries such as PAPI or PCL.
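
The role of the timers can be sketched with nested, accumulating wall-clock timers: an outer timer yields $T_e$ for the method, and an inner timer around the communication call yields $T_m$. `TimerService`, `fake_send`, and `method` are hypothetical stand-ins, not the TAU API; hardware metrics through PAPI or PCL are not modeled here.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class TimerService:
    """Hypothetical measurement-port stand-in: named, accumulating timers."""

    def __init__(self):
        self.totals = defaultdict(float)   # timer name -> accumulated seconds

    @contextmanager
    def timer(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start


timers = TimerService()


def fake_send(n):
    # Stand-in for a blocking message-passing call.
    time.sleep(0.01)


def method(n):
    with timers.timer("method"):           # T_e
        total = sum(i * i for i in range(n))
        with timers.timer("comm"):         # T_m, nested inside T_e
            fake_send(n)
    return total


method(10_000)
t_e, t_m = timers.totals["method"], timers.totals["comm"]
print(f"T_e={t_e:.4f}s  T_m={t_m:.4f}s  T_c={t_e - t_m:.4f}s")
```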

[Figure: TAU Performance System Architecture]

Using performance timings to select optimal components
- To find the optimal solution, we need to reduce the solution space by eliminating “insignificant” components.
- 2-step heuristic:
  1. Are the children, as a group, insignificant to their parent?
  2. Is an individual node insignificant relative to its siblings?
- Optimize the reduced core to obtain an approximately optimal solution.
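
One way to read the 2-step heuristic is as a recursive pruning of a call graph annotated with inclusive times. This is a sketch under stated assumptions, not the authors' exact criteria: step 1 prunes all children when their combined time is below `group_thr` of the parent's time, and step 2 (an assumed interpretation of “relative to its siblings”) prunes a child whose time is below `sibling_thr` of the children's total. The call graph and its times are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    inclusive: float                       # inclusive time of this call-graph node
    children: list[Node] = field(default_factory=list)


def core(node: Node, group_thr: float = 0.10, sibling_thr: float = 0.10) -> list[str]:
    """Return the names of the nodes kept in the reduced core."""
    kept = [node.name]
    child_total = sum(c.inclusive for c in node.children)
    # Step 1: are the children, as a group, insignificant to the parent?
    if child_total < group_thr * node.inclusive:
        return kept                        # prune the entire set of children
    # Step 2: is an individual child insignificant relative to its siblings?
    for child in node.children:
        if child.inclusive >= sibling_thr * child_total:
            kept.extend(core(child, group_thr, sibling_thr))
    return kept


# Hypothetical call graph (times are illustrative, not measured).
graph = Node("root", 100.0, [
    Node("A", 60.0, [Node("A1", 3.0), Node("A2", 2.0)]),
    Node("B", 35.0, [Node("B1", 20.0), Node("B2", 10.0)]),
    Node("C", 2.0),
])
print(core(graph))  # ['root', 'A', 'B', 'B1', 'B2']
```

Here the 8-node graph shrinks to 5: A's children are dropped as a group (5 of 60), and C is dropped relative to its siblings (2 of 97). The remaining core is what would then be optimized over implementation choices.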

Case Study Example
- Core identification was run on a hydro shock simulation developed at Sandia National Laboratories, with 10% thresholds.
- The original call graph of 18 nodes was reduced to 8 nodes.

Conclusions
- The proxy-based measurement system allows non-intrusive measurement of components.
- A single component may have multiple performance models, depending on its context.
- Eliminating “insignificant” components can ease the identification of an approximately optimal solution.

Future Work
- Synthesize a composite performance model from individual component models
- Generalize performance models (e.g., parameterize models by processor speed and a cache model to make them architecture-independent)
- Model representation: XML?
- Quality of service
- Dynamic implementation selection