Parallel Performance Mapping, Diagnosis, and Data Mining

Allen D. Malony, Sameer Shende, Li Li, Kevin Huck
{malony,sameer,lili,khuck}@cs.uoregon.edu
Performance Research Laboratory
Department of Computer and Information Science
University of Oregon

Research Motivation
- Tools for performance problem solving
  - Empirical-based performance optimization process
  - Performance technology concerns
[Figure: the empirical optimization cycle - characterization, hypotheses, and properties flow between Performance Tuning, Performance Diagnosis, Performance Experimentation, and Performance Observation, supported by performance technology: instrumentation, measurement, analysis, visualization, experiment management, and performance storage.]

Challenges in Performance Problem Solving
- How to make the process more effective (productive)?
  - Process may depend on scale of parallel system
- What are the important events and performance metrics?
  - Tied to application structure and computational model
  - Tied to application domain and algorithms
- Process and tools can/must be more application-aware
  - Tools have poor support for application-specific aspects
- What are the significant issues that will affect the technology used to support the process?
  - Enhance application development and benchmarking
  - New paradigm in performance process and technology

Large-Scale Performance Problem Solving
- How does our view of this process change when we consider very large-scale parallel systems?
- What are the significant issues that will affect the technology used to support the process?
- Parallel performance observation is clearly needed
  - In general, there is the concern for intrusion
  - Seen as a tradeoff with performance diagnosis accuracy
- Scaling complicates observation and analysis
  - Performance data size becomes a concern
  - Analysis complexity increases
- Nature of application development may change

Role of Intelligence, Automation, and Knowledge
- Scale forces the process to become more intelligent
  - Even with intelligent and application-specific tools, deciding what to analyze is difficult and intractable
- More automation and knowledge-based decision making
  - Build automatic/autonomic capabilities into the tools
  - Support broader experimentation methods and refinement
  - Access and correlate data from several sources
  - Automate performance data analysis / mining / learning
  - Include predictive features and experiment refinement
- Knowledge-driven adaptation and optimization guidance
  - Will allow scalability issues to be addressed in context

Outline of Talk
- Performance problem solving
  - Scalability, productivity, and performance technology
  - Application-specific and autonomic performance tools
- TAU parallel performance system (Bernd said "No!")
- Parallel performance mapping
- Performance data management and data mining
  - Performance Data Management Framework (PerfDMF)
  - PerfExplorer
- Model-based parallel performance diagnosis
  - Poirot and Hercule
- Conclusions

TAU Performance System
[Architecture diagram: TAU performance system; event selection.]

Semantics-Based Performance Mapping
- Associate performance measurements with high-level semantic abstractions
- Need mapping support in the performance measurement system to assign data correctly

Hypothetical Mapping Example
- Particles distributed on the surfaces of a cube

    Particle* P[MAX];  /* Array of particles */

    int GenerateParticles() {
      /* distribute particles over all faces of the cube */
      for (int face = 0, last = 0; face < 6; face++) {
        /* particles on this face */
        int particles_on_this_face = num(face);
        for (int i = last; i < last + particles_on_this_face; i++) {
          /* particle properties are a function of face */
          P[i] = ... f(face); ...
        }
        last += particles_on_this_face;
      }
    }

Hypothetical Mapping Example (continued)
- How much time (flops) is spent processing face i particles?
- What is the distribution of performance among faces?
- How is this determined if execution is parallel?

    int ProcessParticle(Particle *p) {
      /* perform some computation on p */
    }

    int main() {
      GenerateParticles();          /* create a list of particles */
      for (int i = 0; i < N; i++)   /* iterates over the list */
        ProcessParticle(P[i]);
    }

[Image: engine simulation decomposed into work packets.]

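To make the questions above concrete, here is a minimal sketch of how per-face timers might be attached with TAU's mapping API (the same calls used in the Uintah example later in this talk). It is not from the original slides: the face field on Particle, the FaceName() helper, and the use of the face id as the association key are all assumptions.

    /* Hypothetical per-face instrumentation; mirrors the TAU mapping calls
       shown later for Uintah. Particle::face and FaceName() are assumptions. */
    const char *FaceName(int face);   /* e.g. returns "face 0" .. "face 5" */

    int GenerateParticles() {
      for (int face = 0; face < 6; face++) {
        /* create one mapped timer per face, keyed by the face id */
        TAU_MAPPING_CREATE(FaceName(face), "[GenerateParticles()]",
                           (TauGroup_t)face, FaceName(face), 0);
        /* ... create this face's particles, recording p->face = face ... */
      }
      return 0;
    }

    int ProcessParticle(Particle *p) {
      TAU_MAPPING_OBJECT(facetimer)
      TAU_MAPPING_LINK(facetimer, (TauGroup_t)p->face);  /* external association */
      TAU_MAPPING_PROFILE_TIMER(timer, facetimer, 0)
      TAU_MAPPING_PROFILE_START(timer, 0);
      /* ... computation on p; its cost is attributed to p's face ... */
      TAU_MAPPING_PROFILE_STOP(0);
      return 0;
    }

With this association in place, the profile reports time per face rather than a single ProcessParticle total, answering the distribution question above.
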
No Performance Mapping versus Mapping
- Typical performance tools report performance with respect to routines
  - Does not provide support for mapping
- TAU's performance mapping can observe performance with respect to the scientist's programming and problem abstractions
[Screenshots: TAU profile without mapping vs. TAU profile with mapping.]

Performance Mapping Approaches
- ParaMap (Miller and Irvin)
  - Maps low-level performance to high-level source constructs
  - Noun-Verb (NV) model to describe the mapping
    - a noun is a program entity
    - a verb represents an action performed on a noun
    - sentences (noun plus verb) map to other sentences
  - Mappings: static, dynamic, set of active sentences (SAS)
- Semantic Entities / Abstractions / Associations (SEAA)
  - Entities defined at any level of abstraction (user-level)
  - Attribute entities with semantic information
  - Entity-to-entity associations
  - Target measurement layer and asynchronous operation

SEAA Implementation
- Two association types (implemented in the TAU API, illustrated below)
  - Embedded – extends the associated object to store the performance measurement entity
  - External – creates an external look-up table, using the address of the object as the key to locate the performance measurement entity
- Applied to performance measurement problems
  - callpath/phase profiling, C++ templates, …

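As illustration only (not TAU source), the two association types boil down to the following pattern; Timer and WorkPacket are stand-in types.

    // Illustrative sketch of the two SEAA association types; Timer and
    // WorkPacket are stand-in types, not TAU internals.
    #include <map>

    struct Timer { };                  // the performance measurement entity

    // Embedded: the associated object is extended to hold its entity directly.
    struct WorkPacket {
      int type;
      Timer *timer;                    // embedded association
    };

    // External: a look-up table keyed by the object's address.
    static std::map<const void *, Timer *> timer_table;

    Timer *LookupTimer(const void *object) {
      auto it = timer_table.find(object);              // external association
      return it == timer_table.end() ? nullptr : it->second;
    }

Embedded association avoids the look-up cost but requires modifying the object's type; external association works when the type cannot be changed, at the price of a table search.
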
Uintah Problem Solving Environment (PSE)
- Uintah component architecture for the Utah C-SAFE project
  - Application programmers provide:
    - a description of the computation (tasks and variables)
    - code to perform a task on a single "patch" (sub-region of space)
  - Components for scheduling, partitioning, load balance, …
- Uintah Computational Framework (UCF)
  - Execution model based on software (macro) dataflow (a task-declaration sketch follows this slide)
    - computations expressed as directed acyclic graphs of tasks
    - inputs/outputs specified for each patch in a structured grid
  - Abstraction of global single-assignment memory
  - Task graph gets mapped to processing resources
  - Communication schedule approximates the global optimum

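As a rough sketch of what a programmer-visible task declaration could look like under this execution model; every name below is invented for exposition and none of it is actual Uintah code.

    // Invented sketch of a UCF-style macro-dataflow task; not Uintah source.
    #include <string>
    #include <vector>

    struct Patch { int id; };             // sub-region of the structured grid

    struct Task {
      std::string name;                   // e.g. "interpolateParticlesToGrid"
      std::vector<std::string> inputs;    // variables the task requires
      std::vector<std::string> outputs;   // variables the task computes
      void (*doit)(const Patch &);        // code to run on a single patch
    };

The scheduler can build the task graph by matching each task's inputs against other tasks' outputs (single assignment makes this well defined), then map the graph onto processors and derive the communication schedule.
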
Uintah Task Graph (Material Point Method)
- Diagram of named tasks (ovals) and data (edges)
- Imminent computation, dataflow-constrained
- MPM: Newtonian material point motion time step
- Legend:
  - Solid: values defined at a material point (particle)
  - Dashed: values defined at a vertex (grid)
  - Prime ('): values updated during the time step

Task Execution in Uintah Parallel Scheduler
- Profile methods and functions in the scheduler and in the MPI library
- Need to map performance data!
  - Task execution time dominates (what task?)
  - MPI communication overheads (where?)
[Chart: task execution time distribution.]

Mapping Instrumentation in UCF (example)
- Use the TAU performance mapping API

    void MPIScheduler::execute(const ProcessorGroup *pc,
                               DataWarehouseP &old_dw, DataWarehouseP &dw) {
      ...
      TAU_MAPPING_CREATE(task->getName(), "[MPIScheduler::execute()]",
                         (TauGroup_t)(void*)task->getName(), task->getName(), 0);
      ...
      TAU_MAPPING_OBJECT(tautimer)
      TAU_MAPPING_LINK(tautimer,
                       (TauGroup_t)(void*)task->getName());  // EXTERNAL ASSOCIATION
      ...
      TAU_MAPPING_PROFILE_TIMER(doitprofiler, tautimer, 0)
      TAU_MAPPING_PROFILE_START(doitprofiler, 0);
      task->doit(pc);
      TAU_MAPPING_PROFILE_STOP(0);
      ...
    }

Task Performance Mapping (Profile)
- Performance mapping for different tasks
[Screenshot: mapped task performance across processes.]

Work Packet – to – Task Mapping (Trace)
- Work packet computation events colored by task type
- Distinct phases of computation can be identified based on task

Comparing Uintah Traces for Scalability Analysis
[Trace views: 8 processes vs. 32 processes.]

Important Questions for Application Developers
- How does performance vary with different compilers?
- Is poor performance correlated with certain OS features?
- Has a recent change caused unanticipated performance?
- How does performance vary with MPI variants?
- Why is one application version faster than another?
- What is the reason for the observed scaling behavior?
- Did two runs exhibit similar performance?
- How are performance data related to application events?
- Which machines will run my code the fastest, and why?
- Which benchmarks predict my code's performance best?

Performance Problem Solving Goals
- Answer questions at multiple levels of interest
  - Data from low-level measurements and simulations, used to predict application performance
  - High-level performance data spanning dimensions (machine, applications, code revisions, data sets), used to examine broad performance trends
- Discover general correlations between application performance and features of the external environment
- Develop methods to predict application performance from lower-level metrics
- Discover performance correlations between a small set of benchmarks and a collection of applications that represent a typical workload for a given system (a minimal correlation sketch follows)

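As a concrete (and deliberately simple) example of the correlation step, assume paired observations such as a benchmark metric and an application runtime measured across the same set of machines; this is a generic Pearson correlation, not a tool from the talk.

    // Minimal Pearson correlation sketch for relating a benchmark metric to
    // application performance across machines; illustrative only.
    #include <cmath>
    #include <vector>

    double pearson(const std::vector<double> &x, const std::vector<double> &y) {
      const std::size_t n = x.size();   // assumes x.size() == y.size(), n > 1
      double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
      for (std::size_t i = 0; i < n; i++) {
        sx += x[i];  sy += y[i];
        sxx += x[i] * x[i];  syy += y[i] * y[i];  sxy += x[i] * y[i];
      }
      const double cov = sxy - sx * sy / n;   // n * covariance
      const double vx = sxx - sx * sx / n;    // n * variance of x
      const double vy = syy - sy * sy / n;    // n * variance of y
      return cov / std::sqrt(vx * vy);        // r in [-1, 1]
    }

A benchmark whose metric correlates strongly (|r| near 1) with application runtimes is a good predictor candidate for that workload.
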
Empirical-Based Performance Optimization
[Diagram: the process cycle - characterization, hypotheses, properties, and observability requirements connect Performance Tuning, Performance Diagnosis, Performance Experimentation, and Performance Observation; experiment management supplies experiment schemas and experiment trials.]

Performance Data Management Framework (PerfDMF)
[Architecture diagram; see the ICPP 2005 paper.]

PerfExplorer (K. Huck, Ph.D. student, UO)
- Performance knowledge discovery framework
  - Uses the existing TAU infrastructure: TAU instrumentation data, PerfDMF
  - Client-server based system architecture
  - Data mining analysis applied to parallel performance data: comparative, clustering, correlation, dimension reduction, ...
- Technology integration
  - Relational Database Management Systems (RDBMS)
  - Java API and toolkit
  - R-project / Omegahat statistical analysis
  - WEKA data mining package
  - Web-based client

PerfExplorer Architecture
[Architecture diagram; see the SC'05 paper.]

PerfExplorer Client GUI
[Screenshot.]

Hierarchical and K-means Clustering (sPPM)
[Screenshots: hierarchical and k-means clustering results for the sPPM benchmark.]

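PerfExplorer delegates this analysis to WEKA and R; purely to show the idea, a plain k-means pass over per-thread profile vectors (one vector of per-event metric values per thread) might look like the sketch below. It is not PerfExplorer's code.

    // Plain k-means over per-thread profile vectors; illustrative only.
    #include <limits>
    #include <vector>

    using Profile = std::vector<double>;  // one vector of event metrics per thread

    static double dist2(const Profile &a, const Profile &b) {
      double d = 0;
      for (std::size_t i = 0; i < a.size(); i++)
        d += (a[i] - b[i]) * (a[i] - b[i]);
      return d;
    }

    std::vector<int> kmeans(const std::vector<Profile> &rows, int k, int iters) {
      std::vector<Profile> centers(rows.begin(), rows.begin() + k);  // naive seeding
      std::vector<int> label(rows.size(), 0);
      for (int it = 0; it < iters; it++) {
        for (std::size_t r = 0; r < rows.size(); r++) {  // assignment step
          double best = std::numeric_limits<double>::max();
          for (int c = 0; c < k; c++) {
            const double d = dist2(rows[r], centers[c]);
            if (d < best) { best = d; label[r] = c; }
          }
        }
        std::vector<Profile> sum(k, Profile(rows[0].size(), 0.0));   // update step
        std::vector<int> count(k, 0);
        for (std::size_t r = 0; r < rows.size(); r++) {
          count[label[r]]++;
          for (std::size_t i = 0; i < rows[r].size(); i++)
            sum[label[r]][i] += rows[r][i];
        }
        for (int c = 0; c < k; c++)
          if (count[c] > 0)
            for (std::size_t i = 0; i < sum[c].size(); i++)
              centers[c][i] = sum[c][i] / count[c];
      }
      return label;   // cluster id per thread
    }

Threads that cluster together behave alike; at 16K processors this reduces tens of thousands of profiles to a handful of representative behaviors.
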
Miranda Clustering on 16K Processors
[Screenshot: clustering results for Miranda on 16K processors.]

Parallel Performance Diagnosis
- Performance tuning process
  - Process to find and report performance problems
  - Performance diagnosis: detect and explain problems
  - Performance optimization: performance problem repair
- Experts approach diagnosis systematically and draw on experience
  - Such expertise is hard to formulate and automate
  - Performance optimization is fundamentally hard
- Focus on the performance diagnosis problem
  - Characterize diagnosis processes
  - How diagnosis integrates with performance experimentation
  - Understand the knowledge engineering

Parallel Performance Diagnosis Architecture
[Diagram.]

Performance Diagnosis System Architecture
[Diagram.]

Problems in Existing Diagnosis Approaches
- Low-level abstraction of properties/metrics
  - Independent of program semantics
  - Related to component structure, not algorithmic structure or the parallelism model
- Insufficient explanatory power
  - Hard to interpret in the context of program semantics
  - Performance behavior not tied to operational parallelism
- Low applicability and adaptability
  - Difficult to apply in different contexts
  - Hard to adapt to new requirements

Poirot Project
- Lack of a formal theory of diagnosis processes
  - Compare and analyze performance diagnosis systems
  - Use theory to create a system that is automated and adaptable
- Poirot performance diagnosis (theory, architecture)
  - Survey of diagnosis methods / strategies in tools
  - Heuristic classification approach (match to characteristics)
  - Heuristic search approach (based on problem knowledge)
- Problems
  - Descriptive results do not explain with respect to context; users must reason about high-level causes
  - Performance experimentation not guided by diagnosis
  - Lacks automation

Model-Based Approach
- Knowledge-based performance diagnosis
  - Capture knowledge about performance problems
  - Capture knowledge about how to detect and explain them
- Where does the knowledge come from?
  - Extract from parallel computational models
    - Structural and operational characteristics
  - Associate computational models with performance
- Do parallel computational models help in diagnosis?
  - Enable better understanding of problems
  - Enable more specific experimentation
  - Enable more effective hypothesis testing and search

Implications for Performance Diagnosis
- Models benefit performance diagnosis
  - Base instrumentation on program semantics
  - Capture performance-critical features
  - Enable explanations close to the user's understanding of the computation's operation and performance behavior
  - Reuse performance analysis expertise on the commonly-used models
- Model examples
  - Master-worker
  - Pipeline
  - Divide-and-conquer
  - Domain decomposition
  - Phase-based
  - Compositional

Hercule Project
- Goals of automation, adaptability, and validation
[Diagram.]

Approach
- Make use of model knowledge to diagnose performance
  - Start with commonly-used computational models
- Engineer model knowledge
  - Integrate model knowledge with the performance measurement system
- Build a cause inference system
  - define "causes" at the parallelism level
  - build causality relations between the low-level "effects" and the "causes"

Master-Worker Parallel Computation Model
[Diagram: master assigns tasks to workers on request; workers compute and return results.]

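Since the model diagram does not survive transcription, here is a minimal MPI skeleton of the computation the model describes; the task count, message tags, and the work() stand-in are illustrative, not from the talk.

    /* Minimal MPI master-worker skeleton; tags, ntasks, and work() are
       illustrative stand-ins. */
    #include <mpi.h>

    enum { TAG_TASK = 1, TAG_RESULT = 2, TAG_STOP = 3 };

    static double work(int task) { return (double)task * task; }  /* stand-in */

    int main(int argc, char **argv) {
      int rank, size;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      if (rank == 0) {               /* master: assign tasks as results arrive */
        const int ntasks = 100;      /* assumes ntasks >= size - 1 */
        int next = 0, active = 0;
        double result;
        MPI_Status st;
        for (int w = 1; w < size; w++) {         /* prime each worker */
          MPI_Send(&next, 1, MPI_INT, w, TAG_TASK, MPI_COMM_WORLD);
          next++; active++;
        }
        while (active > 0) {
          MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, TAG_RESULT,
                   MPI_COMM_WORLD, &st);
          if (next < ntasks) {       /* master-assign-task */
            MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_TASK, MPI_COMM_WORLD);
            next++;
          } else {
            MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_STOP, MPI_COMM_WORLD);
            active--;
          }
        }
      } else {                       /* worker: receive-compute-return loop */
        int task;
        MPI_Status st;
        for (;;) {
          MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
          if (st.MPI_TAG == TAG_STOP) break;
          double r = work(task);
          MPI_Send(&r, 1, MPI_DOUBLE, 0, TAG_RESULT, MPI_COMM_WORLD);
        }
      }
      MPI_Finalize();
      return 0;
    }

The inference tree on the next slide reasons about exactly the waits visible here: time workers spend blocked waiting for an assignment, and time the master spends with requests queued.
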
Performance Diagnosis Inference Tree (MW)
- Observation: low speedup. Hypotheses are numbered by priority; Ki denote thresholds; "+" marks evidence that must coexist.
  1. Insufficient parallelism – initialization or finalization time is significant.
  2. Fine granularity – master-assign-task time is significant, and a large amount of messages is exchanged every time.
  3. Master is the bottleneck (worker-number saturation, worker starvation) – workers wait quite a while in the master queue in some time intervals; the number of requests in the master queue exceeds K1 in those intervals, the number of such intervals exceeds K2, and workers wait a long time for the master to assign each individual task.
  4. Some workers noticeably inefficient – time imbalance; the number of such intervals is below K2.

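Hercule encodes this knowledge as CLIPS rules (next slide); purely to illustrate the tree's logic, a hard-coded fragment could read as follows, with every metric name and threshold a placeholder for measured values.

    // Hard-coded fragment of the master-worker inference tree; all metric
    // names and thresholds are placeholders, not Hercule's rule base.
    #include <cstdio>

    struct MWMetrics {
      double init_final_frac;      // fraction of time in init/finalization
      double assign_time_frac;     // fraction of time in master-assign-task
      long   msgs_per_assignment;  // messages exchanged per assignment
      long   congested_intervals;  // intervals with master queue length > K1
      double imbalance;            // worker time imbalance
    };

    void diagnose(const MWMetrics &m, long K2) {
      if (m.init_final_frac > 0.10)                      // placeholder threshold
        std::puts("1: insufficient parallelism (init/final time significant)");
      if (m.assign_time_frac > 0.10 && m.msgs_per_assignment > 100)
        std::puts("2: fine granularity (frequent small task assignments)");
      if (m.congested_intervals > K2)
        std::puts("3: master is the bottleneck (worker saturation)");
      else if (m.congested_intervals > 0 && m.imbalance > 0.25)
        std::puts("4: some workers noticeably inefficient (time imbalance)");
    }
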
Knowledge Engineering – Abstract Event (MW)
- Uses the CLIPS expert system building tool
[Diagram: abstract event definition.]

Diagnosis Results Output (MW)
[Screenshot: diagnosis output.]

Experimental Diagnosis Results (MW)
[Screenshot: experimental results.]

Concluding Discussion
- Performance tools must be used effectively
  - More intelligent performance systems for productive use
  - Evolve toward application-specific performance technology
  - Deal with scale by "full range" performance exploration
  - Autonomic and integrated tools
  - Knowledge-based and knowledge-driven process
- Performance observation methods do not necessarily need to change in a fundamental sense
  - They must be more automatically controlled and more efficiently used
- Support model-driven performance diagnosis
- Develop next-generation tools and deliver them to the community

Support Acknowledgements
- Department of Energy (DOE), Office of Science contracts
- University of Utah ASCI Level 1 sub-contract
- ASC/NNSA Level 3 contract
- NSF High-End Computing grant
- Research Centre Juelich, John von Neumann Institute (Dr. Bernd Mohr)
- Los Alamos National Laboratory