Published by Marjorie Bishop. Modified over 9 years ago.
1
Integrated Performance Analysis in the Uintah Computational Framework
Steven G. Parker
Allen Morris, Scott Bardenhagen, Biswajit Banerjee, James Bigler, Jared Campbell, Curtis Larsen, Dav De St. Germain, Dawen Li, Divya Ramachandran, David Walter, Jim Guilkey, Todd Harman, John Schmidt, Jesse Hall, Jun Wang, Kurt Zimmerman, John McCorquodale, Misha Ovchinnikov, Jason Morgan, Nick Benson, Phil Sutton, Rajesh Rawat, Scott Morris, Seshadri Kumar, Steven Parker, Jennifer Spinti, Honglai Tan, Wing Yee, Wayne Witzel, Xiaodong Chen, Runing Zhang
2
The Beginning
C-SAFE funded in September 1997
SCIRun PSE existed: shared memory only
Combustion code existed: NOT parallel; steady state, NOT transient
C-SAFE MPM code did not exist
C-SAFE ICE code did not exist
3
ASCI
4
Example situation
5
C-SAFE Goal
6
Now: Scalable Simulations
September 2001
SCIRun Uintah: distributed memory, CCA-based component model
  Shared-memory visualization tools
Arches: modular, parallel, transient
C-SAFE MPM: modular, parallel, transient
C-SAFE ICE: modular, parallel, transient
  Coupled with MPM
7
How did we get here?
Designed and implemented a parallel component architecture (Uintah)
Designed and implemented the Uintah Computational Framework (UCF) on top of the component architecture
[Figure: C-SAFE High Level Architecture. Box labels: UCF (implicitly connected to all components); Data; Control / Light Data; Checkpointing; Mixing Model; Fluid Model; Subgrid Model; Chemistry Database Controller; Chemistry Databases; High Energy Simulations; Numerical Solvers; Non-PSE Components; Performance Analysis; Simulation Controller; Problem Specification; MPM; Material Properties Database; Blazer; Database; Visualization; Data Manager; Post Processing And Analysis; Parallel Services; Resource Management; PSE Components; Scheduler]
8
Introduction to Components
9
Good Fences make Good Neighbors
A component architecture is all about building (and sometimes enforcing) the fences
Popular in the software industry (Microsoft COM, CORBA, Enterprise Java Beans)
Commercial component architectures not suitable for Scientific Computing (CCA Forum organized to address this point)
Visual programming sometimes used to connect components together
10
Parallel Components
[Figure: component diagram. Box labels: Fluid Model; Simulation Controller; MPM; Data Manager]
Two ways to split up work:
  Task based
  Data based
  (Or a combination)
Which is right?
Key point: Components, by definition, make local decisions
However, parallelism (scalable) is a global decision
11
Uintah Scalability Challenges
Wide range of computational loads, due to:
  AMR
  Particles in subset of space
  Cost of ODE solvers can vary spatially
  Radiation models
Architectural communication limitations
12
UCF Architecture Overview
Application programmers provide:
  A description of the computation (tasks and variables)
  Code to perform each task on a single Patch (subregion of space)
  C++ or Fortran supported
UCF uses this information to create a scalable parallel simulation
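A minimal sketch of that division of labor, written in Python for brevity (the slide states that the UCF itself supports C++ or Fortran). The names `Patch`, `Task`, and `run_on_patches` are hypothetical illustrations, not the UCF API; the point is only that the application supplies per-patch code plus a declaration of what each task reads and writes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Patch:
    """A subregion of the domain, here a 1-D range of cells (assumed layout)."""
    patch_id: int
    low: int   # first cell index, inclusive
    high: int  # last cell index, exclusive


@dataclass
class Task:
    """Hypothetical task description: name, variables read/written, per-patch code."""
    name: str
    requires: List[str]
    computes: List[str]
    callback: Callable[[Patch, Dict[str, list]], None]


def init_temperature(patch, data):
    # Application code: touches only the cells owned by this patch.
    for cell in range(patch.low, patch.high):
        data["temperature"][cell] = 300.0 + cell


def scale_temperature(patch, data):
    for cell in range(patch.low, patch.high):
        data["scaled"][cell] = 2.0 * data["temperature"][cell]


def run_on_patches(tasks, patches, data):
    """Serial stand-in for the framework: each task runs once per patch."""
    for task in tasks:
        for patch in patches:
            task.callback(patch, data)


patches = [Patch(0, 0, 4), Patch(1, 4, 8)]
data = {"temperature": [0.0] * 8, "scaled": [0.0] * 8}
tasks = [
    Task("init", [], ["temperature"], init_temperature),
    Task("scale", ["temperature"], ["scaled"], scale_temperature),
]
run_on_patches(tasks, patches, data)
```

Because each callback only sees one patch, a framework is free to decide where and when each (task, patch) pair runs; this sketch simply runs them in declaration order on one process.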
13
How Does It Work?
[Figure: component diagram. Labels: Simulation Controller; Problem Specification (XML); Simulation (One of Arches, ICE, MPM, MPMICE, MPMArches, …); Scheduler; Tasks; Data Archiver; Callbacks; MPI; Assignments; Load Balancer; Configuration]
14
How does the scheduler work?
Scheduler component uses description of computation to create a taskgraph
Taskgraph gets mapped to processing resources using the Load Balancer component
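One way to picture these two steps, as a hedged Python sketch: dependency edges are derived by matching the variables a task requires against the variables another task computes, and patches are then assigned to processors. The task and variable names, `build_taskgraph`, and the round-robin `assign_round_robin` policy are all assumptions for illustration; the slides do not specify how the Uintah scheduler or load balancer are implemented.

```python
def build_taskgraph(tasks):
    """tasks: name -> (requires, computes). Returns name -> set of prerequisite tasks."""
    producer = {}
    for name, (_, computes) in tasks.items():
        for var in computes:
            producer[var] = name  # remember which task produces each variable

    deps = {name: set() for name in tasks}
    for name, (requires, _) in tasks.items():
        for var in requires:
            # A variable with no producer is treated as external input.
            if var in producer and producer[var] != name:
                deps[name].add(producer[var])
    return deps


def assign_round_robin(patch_ids, num_procs):
    """Placeholder load balancer: patch i goes to processor i mod num_procs."""
    return {patch: index % num_procs for index, patch in enumerate(patch_ids)}


# Hypothetical MPM-like task description (names are illustrative only).
tasks = {
    "interpolate_to_grid": (["particle_mass"], ["grid_mass"]),
    "solve_grid": (["grid_mass"], ["grid_velocity"]),
    "update_particles": (["grid_velocity", "particle_mass"], ["particle_position"]),
}
deps = build_taskgraph(tasks)
assignment = assign_round_robin([10, 11, 12, 13, 14], num_procs=2)
```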
15
What is a graph?
16
CS Graphs
[Figure: a graph with vertices A, B, C, and D, with one vertex (or node) and one edge labeled]
Taskgraph: A graph where the nodes are tasks (jobs) to be performed, and the edges are dependencies between those tasks
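To make the definition concrete, the sketch below orders the tasks of a small taskgraph so that every task runs after the tasks it depends on (Kahn's topological sort). The vertex names A to D come from the slide, but the particular edges are an assumption, since the figure itself is not recoverable from the transcript.

```python
from collections import deque


def topological_order(nodes, edges):
    """Return the nodes in an order that respects every (before, after) edge."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for before, after in edges:
        successors[before].append(after)
        indegree[after] += 1

    # Tasks with no unmet dependencies are ready to run.
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in sorted(successors[node]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(nodes):
        raise ValueError("taskgraph contains a cycle")
    return order


nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]  # assumed edges
order = topological_order(nodes, edges)
```

In this assumed graph, B and C do not depend on each other, so a parallel scheduler could run them at the same time once A has finished.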
17
Example Taskgraphs
20
Taskgraph advantages
Can accommodate flexible integration needs
Can accommodate a wide range of unforeseen work loads
Can accommodate a mix of static and dynamic load balance
Helps manage complexity of a mixed threads/MPI programming model
Allows pieces (including the scheduler) to evolve independently
21
Looking forward to AMR
Entire UCF infrastructure is designed around complex meshes
Able to achieve scalability like a structured grid code
Some codes can currently handle irregular boundaries
22
Achieving scalability
Parallel Taskgraph implementation
Use 125 (of 128) processors per box
  Remaining 3 perform O/S functions
125 processors organized into 5x5x5 cube
Multiple boxes by abutting cubes
Nirvana load balancer performs this mapping for regular grid problems
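The 5x5x5 arrangement can be sketched as an index mapping. This is an illustration only: the x-fastest rank ordering and the choice to abut boxes along the x axis are assumptions, and the function names are hypothetical rather than taken from the Nirvana load balancer.

```python
SIDE = 5  # processors per cube edge, from the slide (5 x 5 x 5 = 125)


def cube_coords(rank, side=SIDE):
    """Map a processor rank within one box (0..124) to (i, j, k) in the cube."""
    if not 0 <= rank < side ** 3:
        raise ValueError("rank outside the cube")
    return (rank % side, (rank // side) % side, rank // (side * side))


def global_coords(box, rank, side=SIDE):
    """Position in the combined grid when boxes abut along x (assumed direction)."""
    i, j, k = cube_coords(rank, side)
    return (box * side + i, j, k)


def owner(gx, j, k, side=SIDE):
    """Inverse mapping: which (box, rank) owns grid position (gx, j, k)."""
    box, i = divmod(gx, side)
    return box, i + side * j + side * side * k


first = cube_coords(0)
last = cube_coords(124)
second_box = global_coords(1, 7)
```

With a mapping like this, neighboring patches of a regular grid land on neighboring processors, including across the boundary between two abutting boxes.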
23
Performance Analysis Tools
Integrated Tools
TAU calls describe costs for each Task
Post-processing tools for:
  Average/Standard Deviation Timings
  Critical path/Near-critical path analysis
  Performance regression testing
  Load imbalance
TAU/VAMPIR Analysis
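Two of the post-processing analyses listed above can be sketched in a few lines of Python. The timings here are made-up numbers standing in for per-task costs that, in the framework, would come from the TAU instrumentation; the function names and the max/mean imbalance metric are assumptions, not the actual Uintah tools.

```python
from statistics import mean, pstdev


def timing_summary(times):
    """Average, standard deviation, and max/mean imbalance of per-processor times."""
    avg = mean(times)
    return {"mean": avg, "stdev": pstdev(times), "imbalance": max(times) / avg}


def critical_path(costs, deps):
    """Longest-cost chain through a taskgraph.

    costs: task -> seconds; deps: task -> list of prerequisite tasks.
    Returns (total_seconds, [tasks on the path]).
    """
    best = {}

    def finish(task):
        if task not in best:
            # The slowest prerequisite chain determines when this task can start.
            prev = max((finish(d) for d in deps.get(task, [])),
                       key=lambda entry: entry[0], default=(0.0, []))
            best[task] = (prev[0] + costs[task], prev[1] + [task])
        return best[task]

    return max((finish(t) for t in costs), key=lambda entry: entry[0])


summary = timing_summary([2.0, 2.0, 2.0, 6.0])       # illustrative timings
costs = {"A": 1.0, "B": 4.0, "C": 2.0, "D": 1.0}      # illustrative costs
deps = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
total, path = critical_path(costs, deps)
```

Here the path through B dominates, so speeding up C would not shorten the run; that is the kind of conclusion a critical-path report is meant to support.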
24
Integration of TAU from Oregon
Working with Allen Malony and friends to help with the integration
Have identified bottlenecks, and this influenced the design of the new scalable scheduler
Have identified numerous ways in which to collaborate in the future
Tuning and Analysis Utilities (TAU)
25
MPM Simulation 27 processors
26
Arches Simulation 40 of 125 processors
27
XPARE
Performance Tuning typically done only for final products
  Or sometimes just once or twice during development
Performance Analysis throughout development process
  Retrospective analysis possible
  Understanding impact of design decisions
  More informed optimization later
28
XPARE
Regression Analyzer: alerts interested parties to violations of the thresholds
Comparison tool: used by the automation system to report violations; can also be run manually
Integrated in a weekly testing harness for Uintah / C-SAFE
Performance comparisons:
  Compiler flags
  O/S upgrades
  Platforms
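The threshold idea can be illustrated with a short Python sketch. It is not XPARE's implementation or file format: the function name, the dictionary inputs, and the 10% default threshold are assumptions chosen for the example.

```python
def find_violations(baseline, current, threshold=0.10):
    """Report tasks whose time grew by more than `threshold` relative to baseline.

    baseline, current: task name -> seconds. Returns a list of (task, relative_change).
    """
    violations = []
    for task, base in sorted(baseline.items()):
        if task not in current or base <= 0:
            continue  # nothing to compare against
        change = (current[task] - base) / base
        if change > threshold:
            violations.append((task, change))
    return violations


# Illustrative numbers only, e.g. last week's run versus this week's run.
baseline = {"interpolate": 10.0, "solve": 20.0}
current = {"interpolate": 10.5, "solve": 25.0}
violations = find_violations(baseline, current)
```

Running the same comparison against runs built with different compiler flags, O/S versions, or platforms gives the performance comparisons listed on the slide.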
31
XPARE
Alan Morris - Utah
Allen D. Malony - Oregon
Sameer S. Shende - Oregon
J. Davison de St. Germain - Utah
Steven G. Parker - Utah
XPARE - eXPeriment Alerting and REporting
http://www.acl.lanl.gov/tau/xpare
33
Load balancing
Taskgraph provides a nice mechanism for flexible load-balancing algorithms
To date: simple, static mechanisms have sufficed
But, we are outgrowing those
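As an example of what a simple static mechanism might look like, the sketch below uses the greedy longest-processing-time heuristic: patches are sorted by estimated cost and each is given to the currently least-loaded processor. This is a generic textbook technique chosen for illustration, not the algorithm Uintah uses, and the patch costs are invented.

```python
import heapq


def static_balance(patch_costs, num_procs):
    """Assign patches to processors once, before the run, using estimated costs.

    patch_costs: patch id -> estimated cost. Returns (assignment, per-processor loads).
    """
    heap = [(0.0, proc) for proc in range(num_procs)]  # (current load, processor)
    heapq.heapify(heap)
    assignment = {}
    # Most expensive patches first; ties broken by patch id for determinism.
    for patch, cost in sorted(patch_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, proc = heapq.heappop(heap)
        assignment[patch] = proc
        heapq.heappush(heap, (load + cost, proc))

    loads = [0.0] * num_procs
    for patch, proc in assignment.items():
        loads[proc] += patch_costs[patch]
    return assignment, loads


patch_costs = {0: 5.0, 1: 3.0, 2: 3.0, 3: 1.0}  # illustrative estimates
assignment, loads = static_balance(patch_costs, num_procs=2)
```

A static assignment like this cannot react when costs change during the run (for example with AMR or moving particles), which is one reason a framework might outgrow it.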
34
Real-world scalability
Parallel I/O
Parallel compiles
Production run obtained speedup of 1.95 going from 500 to 1000 processors
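To put the quoted speedup in context: doubling the processor count would ideally double the speed, so a speedup of 1.95 corresponds to a relative parallel efficiency of 1.95 / 2 = 0.975. The small helper below (a hypothetical name, not from the slides) just carries out that arithmetic.

```python
def relative_efficiency(speedup, procs_before, procs_after):
    """Observed speedup divided by the ideal speedup for the processor increase."""
    ideal = procs_after / procs_before
    return speedup / ideal


efficiency = relative_efficiency(1.95, 500, 1000)
```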
35
New scalability - MPM
36
Breakdown
39
Mixed MPI/Thread scheduler
Most ASCI platforms have SMP nodes
Multi-threading and asynchronous MPI could give us ~2X speed improvement
SGI MPI implementation is supposedly thread-safe, but…
40
Network traffic into Utah Visual Supercomputing Center
[Figure: network traffic plots. Panels: 2 hour average; 1 day average]
41
Volume Rendering
43
MPM Simulation - 500 processors
6.8 million particles, 22 timesteps interactively visualized using the real-time ray tracer (6-10 fps)
44
RTRT with MPM Data
45
Other SCIRun Applications
46
Geo Sciences
47
Conclusions
Holistic performance approach
  Architecture
  Tools
Scalability achieved, now we can keep it