Allen D. Malony, Sameer Shende Department of Computer and Information Science Computational Science Institute University.

Presentation transcript:

Integrating Performance Analysis in the Uintah Software Development Cycle

Allen D. Malony, Sameer Shende
Department of Computer and Information Science, Computational Science Institute, University of Oregon

J. Davison de St. Germain, Allan Morris, Steven G. Parker
Department of Computer Science, School of Computing, University of Utah

ISHPC May 16, 2002

Outline
- Scientific software engineering
- C-SAFE and Uintah Computational Framework (UCF)
  - Goals and design
  - Challenges for performance technology integration
- TAU performance system
  - Role of performance mapping
- Performance analysis integration in UCF
  - TAU performance mapping
  - XPARE
- Concluding remarks

Scientific Software (Performance) Engineering
- Modern scientific simulation software is complex
  - Large development teams of diverse expertise
  - Simultaneous development on different system parts
  - Iterative, multi-stage, long-term software development
- Need support for managing complex software process
  - Software engineering tools for revision control, automated testing, and bug tracking are commonplace
  - In contrast, tools for performance engineering are not
    - evaluation (measurement, analysis, benchmarking)
    - optimization (diagnosis, tracking, prediction, tuning)
- Incorporate performance engineering methodology and support by flexible and robust performance tools

Utah ASCI/ASAP Level 1 Center (C-SAFE)
- C-SAFE was established to build a problem-solving environment (PSE) for the numerical simulation of accidental fires and explosions
  - Combine fundamental chemistry and engineering physics
  - Integrate non-linear solvers, optimization, computational steering, visualization, and experimental data verification
  - Support very large-scale coupled simulations
- Computer science problems:
  - Coupling multiple scientific simulation codes with different numerical and software properties
  - Software engineering across diverse expert teams
  - Achieving high performance on large-scale systems

Example C-SAFE Simulation Problems
[Figures: heptane fire simulation; material stress simulation]
Typical C-SAFE simulation with a billion degrees of freedom and non-linear time dynamics

Uintah Problem Solving Environment (PSE)
- Enhanced SCIRun PSE
  - Pure dataflow → component-based
  - Shared memory → scalable multi-/mixed-mode parallelism
  - Interactive only → interactive plus standalone
- Design and implement Uintah component architecture
  - Application programmers provide
    - description of computation (tasks and variables)
    - code to perform task on single “patch” (sub-region of space)
  - Components for scheduling, partitioning, load balance, …
  - Follow Common Component Architecture (CCA) model
- Design and implement Uintah Computational Framework (UCF) on top of the component architecture

Uintah High-Level Component View

High Level Architecture (C-SAFE)
[Figure: Uintah Parallel Component Architecture, showing PSE components and non-PSE components. Legend: implicitly connected to all components; UCF; data; control / light data. Component labels: Problem Specification, Simulation Controller, Scheduler, Data Manager, MPM, Fluid Model, Mixing Model, Subgrid Model, Numerical Solvers, Chemistry Database Controller, Chemistry Databases, Material Properties Database, High Energy Simulations, Blazer Database, Checkpointing, Visualization, Post Processing and Analysis, Performance Analysis, Parallel Services, Resource Management]

Uintah Computational Framework (UCF)
- Execution model based on software (macro) dataflow
  - Exposes parallelism and hides data transport latency
- Computations expressed as directed acyclic graphs of tasks
  - each task consumes input and produces output (input to future task)
  - inputs/outputs specified for each patch in a structured grid
- Abstraction of global single-assignment memory
  - DataWarehouse
  - Directory mapping names to values (array structured)
  - Write value once then communicate to awaiting tasks
- Task graph gets mapped to processing resources
  - Communications schedule approximates global optimal
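
The single-assignment DataWarehouse idea above can be illustrated with a small sketch. This is not Uintah's actual DataWarehouse interface; the class layout, the `put`/`get`/`exists` names, and the use of a `(name, patch)` key are assumptions made only to show "write value once, then read".

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical single-assignment store (not the Uintah API): each
// (variable name, patch id) key may be written exactly once and read
// any number of times afterwards.
class DataWarehouse {
public:
    using Key = std::pair<std::string, int>;  // (variable name, patch id)

    // Store a value; a second write to the same key is an error.
    void put(const std::string& name, int patch, std::vector<double> value) {
        auto inserted = values_.emplace(Key{name, patch}, std::move(value));
        if (!inserted.second) {
            throw std::logic_error("variable already written: " + name);
        }
    }

    // True once a producer task has written the value.
    bool exists(const std::string& name, int patch) const {
        return values_.count(Key{name, patch}) != 0;
    }

    // Read a value that must already have been produced.
    const std::vector<double>& get(const std::string& name, int patch) const {
        auto it = values_.find(Key{name, patch});
        assert(it != values_.end() && "read before write");
        return it->second;
    }

private:
    std::map<Key, std::vector<double>> values_;
};
```

Because a key can never be overwritten, a consumer task that finds the value present can read it without worrying that a later producer will change it, which is what lets the scheduler forward values to awaiting tasks.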

Uintah Task Graph (Material Point Method)
- Diagram of named tasks (ovals) and data (edges)
- Imminent computation
  - Dataflow-constrained
- MPM
  - Newtonian material point motion time step
  - Solid: values defined at material point (particle)
  - Dashed: values defined at vertex (grid)
  - Prime (’): values updated during time step
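
To make "dataflow-constrained" concrete, the following sketch orders tasks so that each one runs only after all of its inputs exist. It is a simplified serial illustration, not the Uintah scheduler; the `Task` struct, its `inputs`/`outputs` fields, and `dataflowOrder` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical task description: names of the data it consumes and produces.
struct Task {
    std::string name;
    std::vector<std::string> inputs;
    std::vector<std::string> outputs;
};

// Return task names in an order that respects dataflow constraints:
// a task becomes eligible only once every one of its inputs is available.
// 'available' holds the data that exists before any task runs.
std::vector<std::string> dataflowOrder(const std::vector<Task>& tasks,
                                       std::set<std::string> available) {
    std::vector<std::string> order;
    std::vector<bool> done(tasks.size(), false);
    bool progressed = true;
    while (progressed) {
        progressed = false;
        for (std::size_t i = 0; i < tasks.size(); ++i) {
            if (done[i]) continue;
            bool ready = true;
            for (const auto& in : tasks[i].inputs) {
                if (available.count(in) == 0) { ready = false; break; }
            }
            if (!ready) continue;
            for (const auto& out : tasks[i].outputs) available.insert(out);
            done[i] = true;
            order.push_back(tasks[i].name);
            progressed = true;
        }
    }
    // A leftover task has an unsatisfiable input (missing data or a cycle).
    assert(order.size() == tasks.size() && "task graph is not schedulable");
    return order;
}
```

A parallel scheduler would run all currently eligible tasks concurrently instead of one at a time, but the eligibility rule is the same.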

Example Taskgraphs (MPM and Coupled)

Uintah PSE
- UCF automatically sets up:
  - Domain decomposition
  - Inter-processor communication with aggregation/reduction
  - Parallel I/O
  - Checkpoint and restart
  - Performance measurement and analysis (stay tuned)
- Software engineering
  - Coding standards
  - CVS (Commits: Y files/day, Y files/day)
  - Correctness regression testing with Bugzilla bug tracking
  - Nightly build (parallel compiles)
  - 170,000 lines of code (Fortran and C++ tasks supported)

Performance Technology Integration
- Uintah presents challenges to performance integration
  - Software diversity and structure
    - UCF middleware, simulation code modules
    - component-based hierarchy
  - Portability objectives
    - cross-language and cross-platform
    - multi-parallelism: thread, message passing, mixed
  - Scalability objectives
  - High-level programming and execution abstractions
- Requires flexible and robust performance technology
- Requires support for performance mapping

TAU Performance System Framework
- Tuning and Analysis Utilities
- Performance system framework for scalable parallel and distributed high-performance computing
- Targets a general complex system computation model
  - nodes / contexts / threads
  - Multi-level: system / software / parallelism
  - Measurement and analysis abstraction
- Integrated toolkit for performance instrumentation, measurement, analysis, and visualization
  - Portable performance profiling/tracing facility
  - Open software approach

TAU Performance System Architecture
[Figure: architecture diagram; surviving labels: EPILOG, Paraver]

Performance Analysis Objectives for Uintah
- Micro tuning
  - Optimization of simulation code (task) kernels for maximum serial performance
- Scalability tuning
  - Identification of parallel execution bottlenecks
    - overheads: scheduler, data warehouse, communication
    - load imbalance
  - Adjustment of task graph decomposition and scheduling
- Performance tracking
  - Understand performance impacts of code modifications
  - Throughout course of software development
  - C-SAFE application and UCF software

Uintah Performance Engineering Approach
- Contemporary performance methodology focuses on control flow (function) level measurement and analysis
- C-SAFE application involves coupled models with task-based parallelism and dataflow control constraints
- Performance engineering on algorithmic (task) basis
  - Observe performance based on algorithm (task) semantics
  - Analyze task performance characteristics in relation to other simulation tasks and UCF components
    - scientific component developers can concentrate on performance improvement at algorithmic level
    - UCF developers can concentrate on bottlenecks not directly associated with simulation module code

Task Execution in Uintah Parallel Scheduler
- Profile methods and functions in scheduler and in MPI library
- Need to map performance data!
[Figure annotations: task execution time dominates (what task?); MPI communication overheads (where?); task execution time distribution per process]

Semantics-Based Performance Mapping
- Associate performance measurements with high-level semantic abstractions
- Need mapping support in the performance measurement system to assign data correctly

Hypothetical Mapping Example
- Particles distributed on surfaces of a cube

    Particle* P[MAX]; /* Array of particles */

    int GenerateParticles() {
      /* distribute particles over all faces of the cube */
      for (int face = 0, last = 0; face < 6; face++) {
        /* number of particles on this face */
        int particles_on_this_face = num(face);
        /* indices [last, last + particles_on_this_face) belong to this face */
        for (int i = last; i < last + particles_on_this_face; i++) {
          /* particle properties are a function of face */
          P[i] = ... f(face);
          ...
        }
        last += particles_on_this_face;
      }
    }

Hypothetical Mapping Example (continued)
- How much time (flops) spent processing face i particles?
- What is the distribution of performance among faces?
- How is this determined if execution is parallel?

    int ProcessParticle(Particle *p) {
      /* perform some computation on p */
    }

    int main() {
      GenerateParticles(); /* create a list of particles */
      for (int i = 0; i < N; i++) /* iterates over the list */
        ProcessParticle(P[i]);
    }
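
One way to answer the per-face questions above is to carry the face with each particle and charge the processing time to a per-face accumulator. The sketch below uses plain `std::chrono` and is not TAU's mapping API; the `Particle` fields, `FaceTimer`, `faceTimers`, and `ProcessAll` are hypothetical names, and the body of `ProcessParticle` is a stand-in because the slides elide the real computation.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical particle that remembers which cube face it was generated on.
struct Particle {
    int face;      // 0..5, the semantic attribute used for mapping
    double value;  // stand-in for the particle's physical properties
};

// Per-face accumulator standing in for a mapped timer object.
struct FaceTimer {
    std::size_t calls = 0;
    std::chrono::steady_clock::duration elapsed{};
};

std::array<FaceTimer, 6> faceTimers;  // one timer per face of the cube

// Stand-in computation; the real work is elided in the slides.
void ProcessParticle(Particle& p) { p.value = p.value * 0.5 + 1.0; }

// Time each call and charge it to the timer of the particle's face,
// rather than to the ProcessParticle routine as a whole.
void ProcessAll(std::vector<Particle>& particles) {
    for (Particle& p : particles) {
        assert(p.face >= 0 && p.face < 6);
        FaceTimer& timer = faceTimers[static_cast<std::size_t>(p.face)];
        auto start = std::chrono::steady_clock::now();
        ProcessParticle(p);
        timer.elapsed += std::chrono::steady_clock::now() - start;
        ++timer.calls;
    }
}
```

A routine-level profile would show only one entry for `ProcessParticle`; here the same measurements are split six ways by the attribute the scientist cares about. In a parallel run each process or thread would need its own set of accumulators, merged afterwards.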

No Performance Mapping versus Mapping
- Typical performance tools report performance with respect to routines
  - They do not provide support for mapping
- TAU’s performance mapping can observe performance with respect to the scientist’s programming and problem abstractions
[Figures: TAU (no mapping); TAU (w/ mapping)]

Uintah Task Performance Mapping
- Uintah partitions individual particles across processing elements (processes or threads)
- Simulation tasks in task graph work on particles
  - Tasks have domain-specific character in the computation
    - “interpolate particles to grid” in Material Point Method
  - Task instances generated for each partitioned particle set
  - Execution scheduled with respect to task dependencies
- How to attribute execution time among different tasks?
  - Assign semantic name (task type) to a task instance
    - SerialMPM::interpolateParticleToGrid
  - Map TAU timer object to (abstract) task (semantic entity)
  - Look up timer object using task type (semantic attribute)
  - Further partition along different domain-specific axes
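
The lookup step described above (find a timer by task type, then charge every instance of that type to it) can be sketched as follows. This is an illustration with `std::chrono` and `std::map`, not TAU's mapping interface or the Uintah scheduler; `TaskTimer`, `TaskTimerRegistry`, and its member names are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <map>
#include <string>

// Accumulated measurements for one task type (the semantic entity).
struct TaskTimer {
    std::size_t instances = 0;
    std::chrono::steady_clock::duration inclusive{};
};

// Hypothetical registry keyed by task type name (the semantic attribute).
class TaskTimerRegistry {
public:
    // Look up the timer for a task type, creating it on first use.
    TaskTimer& lookup(const std::string& taskType) { return timers_[taskType]; }

    // Run one task instance and charge its time to its task type.
    void run(const std::string& taskType, const std::function<void()>& body) {
        assert(body && "task instance has no body");
        TaskTimer& timer = lookup(taskType);
        auto start = std::chrono::steady_clock::now();
        body();
        timer.inclusive += std::chrono::steady_clock::now() - start;
        ++timer.instances;
    }

    std::size_t typeCount() const { return timers_.size(); }

private:
    std::map<std::string, TaskTimer> timers_;
};
```

A scheduler built this way would call `run("SerialMPM::interpolateParticleToGrid", ...)` once per patch, so the profile reports one line per task type instead of a single line for the scheduler's generic execute routine.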

Task Performance Mapping (Profile)
[Figures: performance mapping for different tasks; mapped task performance across processes]

Task Performance Mapping (Trace)
[Figure: work packet computation events colored by task type]
Distinct phases of computation can be identified based on task

Task Performance Mapping (Trace - Zoom)
[Figure: startup communication imbalance]

Task Performance Mapping (Trace - Parallelism)
[Figure: communication / load imbalance]

Comparing Uintah Traces for Scalability Analysis
[Figures: 8 processes; 32 processes]

Performance Tracking and Reporting
- Integrated performance measurement allows performance analysis throughout development lifetime
- Applied performance engineering in software design and development (software engineering) process
  - Create “performance portfolio” from regular performance experimentation (couple with software testing)
  - Use performance knowledge in making key software design decisions, prior to major development stages
  - Use performance benchmarking and regression testing to identify irregularities
  - Support automatic reporting of “performance bugs”
  - Enable cross-platform (cross-generation) evaluation

XPARE - eXPeriment Alerting and REporting
- Experiment launcher automates measurement / analysis
  - Configuration and compilation of performance tools
  - Instrumentation control for Uintah experiment type
  - Execution of multiple performance experiments
  - Performance data collection, analysis, and storage
  - Integrated in Uintah software testing harness
- Reporting system conducts performance regression tests
  - Apply performance difference thresholds (alert ruleset)
  - Alerts users via email if thresholds have been exceeded
- Web alerting setup and full performance data reporting
  - Historical performance data analysis
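
The threshold step of a performance regression test can be sketched as a comparison of a new profile against a stored baseline. The slides do not describe XPARE's ruleset format, so everything here is an assumption: `AlertRule` with a single relative-slowdown threshold, the `Profile` map from task name to seconds, and `findRegressions` are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical alert rule: flag a task that slows down by more than
// this fraction relative to the baseline (0.10 means "over 10% slower").
struct AlertRule {
    double maxRelativeSlowdown;
};

using Profile = std::map<std::string, double>;  // task name -> seconds

// Compare a new profile against a baseline and return the names of the
// tasks whose relative slowdown exceeds the rule's threshold.
std::vector<std::string> findRegressions(const Profile& baseline,
                                         const Profile& current,
                                         const AlertRule& rule) {
    assert(rule.maxRelativeSlowdown >= 0.0);
    std::vector<std::string> alerts;
    for (const auto& entry : current) {
        auto base = baseline.find(entry.first);
        // Skip tasks with no usable baseline measurement.
        if (base == baseline.end() || base->second <= 0.0) continue;
        double relative = (entry.second - base->second) / base->second;
        if (relative > rule.maxRelativeSlowdown) alerts.push_back(entry.first);
    }
    return alerts;
}
```

The returned list is what a reporting step would hand to the alerting mechanism; a fuller ruleset might also flag unexpected speedups or tasks that are new since the baseline.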

XPARE System Architecture
[Figure: components: Experiment Launch, Mail server, Performance Database, Performance Reporter, Comparison Tool, Regression Analyzer, Alerting Setup]

Scaling Performance Optimizations (Past)
- Last year: initial “correct” scheduler
- Reduce communication by 10x
- Reduce task graph overhead by 20x
[Figure: results on ASCI Nirvana, SGI Origin 2000, Los Alamos National Laboratory]

Scalability to 2000 Processors (Current)
[Figure: results on ASCI Nirvana, SGI Origin 2000, Los Alamos National Laboratory]

Concluding Remarks
- Modern scientific simulation environments involve a complex (scientific) software engineering process
  - Iterative, diverse expertise, multiple teams, concurrent
- Complex parallel software and systems pose challenging performance analysis problems that require flexible and robust performance technology and methods
  - Cross-platform, cross-language, large-scale
  - Fully-integrated performance analysis system
  - Performance mapping
- Need to support performance engineering methodology within scientific software design and development
  - Performance comparison and tracking

Acknowledgements
- Department of Energy (DOE), ASCI Academic Strategic Alliances Program (ASAP)
- Center for the Simulation of Accidental Fires and Explosions (C-SAFE), ASCI/ASAP Level 1 center, University of Utah
- Computational Science Institute, ASCI/ASAP Level 3 projects with LLNL / LANL, University of Oregon
- ftp://ftp.cs.uoregon.edu/pub/malony/Talks/ishpc2002.ppt