Performance Technology for Productive, High-End Parallel Computing: the TAU Parallel Performance System

Allen D. Malony (malony@cs.uoregon.edu)
http://www.cs.uoregon.edu/research/tau
Performance Research Laboratory (PRL)
Neuroinformatics Center (NIC)
Department of Computer and Information Science, University of Oregon
Outline
- Research interests and motivation
- TAU performance system
  - Instrumentation
  - Measurement
  - Analysis tools
- Parallel profile analysis (ParaProf)
- Performance data management (PerfDMF)
- Performance data mining (PerfExplorer)
- TAU on Solaris 10
- ZeptoOS and KTAU
Research Motivation
- Tools for performance problem solving
- Empirical-based performance optimization process
- Performance technology concerns
[Diagram: a cycle of performance experimentation, observation (characterization, properties), diagnosis (hypotheses), and tuning, supported by performance technology: instrumentation, measurement, analysis, visualization, experiment management, and performance data storage]
Challenges in Performance Problem Solving
- How to make the process more effective (productive)?
  - Process likely to change as parallel systems evolve
- What are the important events and performance metrics?
  - Tied to application structure and computational model
  - Tied to application domain and algorithms
- What are the significant issues that will affect the technology used to support the process?
- Enhance application development and optimization
  - Process and tools can/must be more application-aware
  - Tools have poor support for application-specific aspects
- Integrate performance technology and process
Performance Process, Technology, and Scale
- How does our view of this process change when we consider very large-scale parallel systems?
- Scaling complicates observation and analysis
  - Performance data size: standard approaches deliver a lot of data with little value
  - Measurement overhead and intrusion: tradeoff with analysis accuracy; "noise" in the system
  - Analysis complexity increases
- What will enhance productive application development?
  - Process and technology evolution
  - Nature of application development may change
Role of Intelligence, Automation, and Knowledge
- Scale forces the process to become more intelligent
  - Even with intelligent, application-specific tools, deciding what to analyze is difficult and can become intractable
- More automation and knowledge-based decision making
  - Build automatic/autonomic capabilities into the tools
  - Support broader experimentation methods and refinement
  - Access and correlate data from several sources
  - Automate performance data analysis / mining / learning
  - Include predictive features and experiment refinement
- Knowledge-driven adaptation and optimization guidance
- Address scale issues through increased expertise
TAU Performance System
- Tuning and Analysis Utilities (14+ year project effort)
- Performance system framework for HPC systems
  - Integrated, scalable, flexible, and parallel
- Targets a general complex system computation model
  - Entities: nodes / contexts / threads
  - Multi-level: system / software / parallelism
  - Measurement and analysis abstraction
- Integrated toolkit for performance problem solving
  - Instrumentation, measurement, analysis, and visualization
  - Portable performance profiling and tracing facility
  - Performance data management and data mining
- Partners: LLNL, ANL, Research Center Jülich, LANL
TAU Parallel Performance System Goals
- Portable (open source) parallel performance system
  - Computer system architectures and operating systems
  - Different programming languages and compilers
- Multi-level, multi-language performance instrumentation
- Flexible and configurable performance measurement
- Support for multiple parallel programming paradigms
  - Multi-threading, message passing, mixed-mode, hybrid, object-oriented (generic), component
- Support for performance mapping
- Integration of leading performance technology
- Scalable (very large) parallel performance analysis
General Complex System Computation Model
- Node: physically distinct shared memory machine
  - Message-passing node interconnection network
- Context: distinct virtual memory space within a node
- Thread: execution threads (user/system) within a context
[Diagram: physical view (SMP nodes with memory, connected by an interconnection network for inter-node message communication) versus model view (nodes containing contexts containing threads)]
TAU Performance System Architecture
[Two architecture diagrams, not reproduced here]
TAU Instrumentation Approach
- Support for standard program events
  - Routines, classes, and templates
  - Statement-level blocks
- Support for user-defined events
  - Begin/end events ("user-defined timers")
  - Atomic events (e.g., size of memory allocated/freed)
  - Selection of event statistics
- Support definition of "semantic" entities for mapping
- Support for event groups (aggregation, selection)
- Instrumentation optimization
  - Eliminate instrumentation in lightweight routines
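The two user-defined event kinds above can be sketched in a few lines. This is an illustrative model of the concepts, not the actual TAU API; all names here (ScopedTimer, record_atomic, EventStats) are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Per-event statistics, in the spirit of a profile entry.
struct EventStats {
    long   calls = 0;          // number of begin/end pairs seen
    double inclusive_sec = 0;  // total wall time inside the event
};

static std::map<std::string, EventStats> profile;  // interval timers
static std::map<std::string, double>     atomic;   // atomic event totals

// "User-defined timer": begin on construction, end on destruction.
class ScopedTimer {
public:
    explicit ScopedTimer(const std::string& name)
        : name_(name), start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        std::chrono::duration<double> d =
            std::chrono::steady_clock::now() - start_;
        EventStats& s = profile[name_];
        s.calls++;
        s.inclusive_sec += d.count();
    }
private:
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};

// "Atomic event": records a value, e.g. the size of an allocation.
void record_atomic(const std::string& name, double value) {
    atomic[name] += value;
}

void compute() {
    ScopedTimer t("compute");            // timer covers the whole routine
    record_atomic("memory allocated", 1024);
}
```

Calling compute() twice would leave profile["compute"].calls at 2 and atomic["memory allocated"] at 2048.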
TAU Instrumentation Mechanisms
- Source code
  - Manual (TAU API, TAU component API)
  - Automatic (robust)
    - C, C++, F77/90/95 (Program Database Toolkit (PDT))
    - OpenMP (directive rewriting (Opari), POMP2 spec)
- Object code
  - Pre-instrumented libraries (e.g., MPI using PMPI)
  - Statically linked and dynamically linked
- Executable code
  - Dynamic instrumentation (pre-execution) (DynInstAPI)
  - Virtual machine instrumentation (e.g., Java using JVMPI)
- TAU_COMPILER to automate the instrumentation process
Multi-Level Instrumentation and Mapping
- Multiple interfaces
- Information sharing between interfaces
- Event selection within and between levels
- Mapping: associate performance data with high-level semantic abstractions
[Diagram: instrumentation levels from user-level abstractions and the problem domain through source code (preprocessor), object code and libraries (compiler, linker), executable, and runtime image (OS, VM), all feeding performance data]
Program Database Toolkit (PDT)
[Diagram: application/library sources pass through C/C++ and Fortran parsers, then C/C++ and Fortran IL analyzers, producing program database (PDB) files accessed via DUCTAPE by tools such as PDBhtml (program documentation), SILOON (application component glue), CHASM (C++ / F90/95 interoperability), and tau_instrumentor (automatic source instrumentation)]
Program Database Toolkit (PDT)
- Program code analysis framework
  - Develop source-based tools
- High-level interface to source code information
- Integrated toolkit for source code parsing, database creation, and database query
  - Commercial-grade front-end parsers
  - Portable IL analyzer, database format, and access API
  - Open software approach for tool development
  - Multiple source languages
- Implements automatic performance instrumentation tools (tau_instrumentor)
TAU Measurement Approach
- Portable and scalable parallel profiling solution
  - Multiple profiling types and options
  - Event selection and control (enabling/disabling, throttling)
  - Online profile access and sampling
  - Online performance profile overhead compensation
- Portable and scalable parallel tracing solution
  - Trace translation to Open Trace Format (OTF)
  - Trace streams and hierarchical trace merging
- Robust timing and hardware performance support
  - Multiple counters (hardware, user-defined, system)
- Performance measurement for CCA component software
TAU Measurement Mechanisms
- Parallel profiling
  - Function-level, block-level, statement-level
  - Supports user-defined events and mapping events
  - TAU parallel profile stored (dumped) during execution
  - Support for flat, callgraph/callpath, and phase profiling
  - Support for memory profiling
- Tracing
  - All profile-level events
  - Inter-process communication events
  - Inclusion of multiple counter data in traced events
Types of Parallel Performance Profiling
- Flat profiles
  - Metric (e.g., time) spent in an event (callgraph nodes)
  - Exclusive/inclusive, number of calls, child calls
- Callpath profiles (calldepth profiles)
  - Time spent along a calling path (edges in the callgraph)
  - "main => f1 => f2 => MPI_Send" (event name)
  - TAU_CALLPATH_LENGTH environment variable
- Phase profiles
  - Flat profiles under a phase (nested phases are allowed)
  - Default "main" phase
  - Supports static or dynamic (per-iteration) phases
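The inclusive/exclusive distinction above can be made concrete with a small callgraph walk. A sketch under assumed names (Node, accumulate are illustrative, not TAU code): inclusive time counts an event plus everything it calls, exclusive time counts only the event itself.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One callgraph node: an event with its direct ("self") time and children.
struct Node {
    std::string name;
    double self_sec;              // time spent directly in this event
    std::vector<Node> children;   // events it calls
};

// Walk the callgraph, accumulating inclusive and exclusive time per event.
// Returns the inclusive time of the subtree rooted at n.
double accumulate(const Node& n,
                  std::map<std::string, double>& incl,
                  std::map<std::string, double>& excl) {
    double total = n.self_sec;
    for (const Node& c : n.children)
        total += accumulate(c, incl, excl);  // add each child's inclusive time
    incl[n.name] += total;       // inclusive = self + all descendants
    excl[n.name] += n.self_sec;  // exclusive = self only
    return total;
}
```

For the slide's callpath "main => f1 => MPI_Send" with self times 1, 2, and 3 seconds, main's inclusive time is 6 s while its exclusive time is 1 s.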
Performance Analysis and Visualization
- Analysis of parallel profile and trace measurement
- Parallel profile analysis
  - ParaProf: parallel profile analysis and presentation
  - ParaVis: parallel performance visualization package
  - Profile generation from trace data (tau2pprof)
- Performance data management framework (PerfDMF)
- Parallel trace analysis
  - Translation to VTF (v3.0), EPILOG, and OTF formats
  - Integration with VNG (Technical University of Dresden)
- Online parallel analysis and visualization
- Integration with CUBE browser (KOJAK, UTK, FZJ)
ParaProf Parallel Performance Profile Analysis
[Diagram: profile data from HPMToolkit, MpiP, and TAU enters ParaProf either as raw files or as a PerfDMF-managed database, with metadata organized by application, experiment, and trial]
Example Applications
- sPPM: ASCI benchmark; Fortran, C, MPI, OpenMP or pthreads
- Miranda: research hydrodynamics code; Fortran, MPI
- GYRO: tokamak turbulence simulation; Fortran, MPI
- FLASH: physics simulation; Fortran, MPI
- WRF: weather research and forecasting; Fortran, MPI
- S3D: 3D combustion; Fortran, MPI
ParaProf – Flat Profile (Miranda, BG/L)
- 8K processors (node, context, thread)
- Miranda: hydrodynamics, Fortran + MPI, LLNL
- Run to 64K processors
ParaProf – Stacked View (Miranda)
ParaProf – Callpath Profile (Flash)
- Flash: thermonuclear flashes, Fortran + MPI, Argonne
ParaProf – Histogram View (Miranda)
- 8K processors
- 16K processors
NAS BT – Flat Profile
- How is MPI_Wait() distributed relative to solver direction?
- Application routine names reflect phase semantics
NAS BT – Phase Profile (Main and X, Y, Z)
- Main phase shows nested phases and immediate events
ParaProf – 3D Full Profile (Miranda)
- 16K processors
ParaProf – 3D Full Profile (Flash)
- 128 processors
ParaProf – 3D Scatterplot (Miranda)
- Each point is a "thread" of execution
- A total of four metrics shown in relation
- ParaVis 3D profile visualization library (JOGL)
ParaProf – Callgraph Zoom (Flash)
- Zoom in (+) / zoom out (-)
Performance Tracing on Miranda
- Use TAU to generate VTF3 traces for Vampir analysis
- MPI calls with hardware counter information (not shown)
- Detailed code behavior to focus optimization efforts
S3D on Lemieux (TAU-to-VTF3, Vampir)
- S3D: 3D combustion, Fortran + MPI, PSC
S3D on Lemieux (Zoomed)
Hypothetical Mapping Example
- Particles distributed on the surfaces of a cube

    Particle* P[MAX];   /* Array of particles */

    int GenerateParticles() {
        /* distribute particles over all faces of the cube */
        for (int face = 0, last = 0; face < 6; face++) {
            /* particles on this face */
            int particles_on_this_face = num(face);
            for (int i = last; i < last + particles_on_this_face; i++) {
                /* particle properties are a function of face */
                P[i] = ... f(face); ...
            }
            last += particles_on_this_face;
        }
        return 0;
    }
Hypothetical Mapping Example (continued)
- How much time (flops) is spent processing the particles of face i?
- What is the distribution of performance among faces?
- How is this determined if execution is parallel?

    int ProcessParticle(Particle* p) {
        /* perform some computation on p */
    }

    int main() {
        GenerateParticles();              /* create a list of particles */
        for (int i = 0; i < N; i++)       /* iterate over the list */
            ProcessParticle(P[i]);
    }
No Performance Mapping versus Mapping
- Typical performance tools report performance with respect to routines
  - This does not provide support for mapping
- TAU's performance mapping can observe performance with respect to the scientist's programming and problem abstractions
[Screenshots: TAU (no mapping) versus TAU (with mapping)]
Component-Based Scientific Applications
- How to support a performance analysis and tuning process consistent with the application development methodology?
- Common Component Architecture (CCA) applications
- Performance tools should integrate with the software
- Design a performance observation component
  - Measurement port and measurement interfaces
- Build support for application component instrumentation
  - Interpose a proxy component for each port
  - Inside the proxy, track caller/callee invocations and timings
- Automate the process of proxy component creation
  - Using PDT for static analysis of components
  - Include support for selective instrumentation
Flame Reaction-Diffusion (Sandia)
- CCAFFEINE framework
Earth System Modeling Framework (ESMF)
- Coupled modeling with a modular software framework
- Instrumentation for the ESMF framework and applications
  - PDT automatic instrumentation
    - Fortran 95 code modules
    - C/C++ code modules
  - MPI wrapper library for MPI calls
- ESMF component instrumentation (using CCA)
  - CCA measurement port manual instrumentation
  - Proxy generation using PDT and runtime interposition
- Significant callpath profiling used by the ESMF team
Using the TAU Component in ESMF/CCA
Important Questions for Application Developers
- How does performance vary with different compilers?
- Is poor performance correlated with certain OS features?
- Has a recent change caused unanticipated performance?
- How does performance vary with MPI variants?
- Why is one application version faster than another?
- What is the reason for the observed scaling behavior?
- Did two runs exhibit similar performance?
- How are performance data related to application events?
- Which machines will run my code the fastest, and why?
- Which benchmarks predict my code's performance best?
Performance Problem Solving Goals
- Answer questions at multiple levels of interest
- Use data from low-level measurements and simulations to predict application performance
- Examine broad performance trends with high-level performance data spanning dimensions: machine, application, code revision, data set
- Discover general correlations between application performance and features of its external environment
- Develop methods to predict application performance from lower-level metrics
- Discover performance correlations between a small set of benchmarks and a collection of applications that represent a typical workload for a given system
Automatic Performance Analysis Tool (Concept)
[Diagram: build and execute the application; build information and environment/performance data flow into a performance database; offline analysis produces simple feedback such as "105% faster!" or "72% faster!"]
Performance Data Management (PerfDMF)
K. Huck, A. Malony, R. Bell, and A. Morris, "Design and Implementation of a Parallel Performance Data Management Framework," ICPP 2005 (awarded best paper).
Performance Data Mining (Objectives)
- Conduct parallel performance analysis in a systematic, collaborative, and reusable manner
  - Manage performance complexity
  - Discover performance relationships and properties
  - Automate the process
- Multi-experiment performance analysis
- Large-scale performance data reduction
  - Summarize characteristics of large processor runs
- Implement an extensible analysis framework
  - Abstraction / automation of data mining operations
  - Interface to existing analysis and data mining tools
Performance Data Mining (PerfExplorer)
- Performance knowledge discovery framework
  - Data mining analysis applied to parallel performance data: comparative, clustering, correlation, dimension reduction, …
  - Uses the existing TAU infrastructure: TAU performance profiles, PerfDMF
  - Client-server based system architecture
- Technology integration
  - Java API and toolkit for portability
  - PerfDMF
  - R-project/Omegahat and Octave/Matlab statistical analysis
  - WEKA data mining package
  - JFreeChart for visualization, vector output (EPS, SVG)
Performance Data Mining (PerfExplorer)
K. Huck and A. Malony, "PerfExplorer: A Performance Data Mining Framework For Large-Scale Parallel Computing," SC 2005.
PerfExplorer Analysis Methods
- Data summaries, distributions, scatterplots
- Clustering
  - k-means
  - Hierarchical
- Correlation analysis
- Dimension reduction
  - PCA
  - Random linear projection
  - Thresholds
- Comparative analysis
- Data management views
Cluster Analysis
- Performance data represented as vectors; each dimension is the cumulative time for an event
- k-means: k random centers are selected, and instances are grouped with the "closest" (Euclidean distance) center
- New centers are calculated and the process is repeated until stabilization or a maximum number of iterations
- Dimension reduction is necessary for meaningful results
- Virtual topology and summaries are constructed
sPPM Cluster Analysis
Hierarchical and k-means Clustering (sPPM)
Miranda Clusters, Average Values (16K CPUs)
- Two primary clusters due to MPI_Alltoall behavior
- Also an inverse relationship between MPI_Barrier and MPI_Group_translate_ranks
Miranda Modified
- After code modifications, work distribution is even
- MPI_Barrier and MPI_Group_translate_ranks are no longer significant contributors to run time
Flash Clustering on 16K BG/L Processors
- Four significant events automatically selected
- Clusters and correlations are visible
Correlation Analysis
- Describes the strength and direction of a linear relationship between two variables (events) in the data
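The standard measure of that strength and direction is the Pearson correlation coefficient; a short sketch (the function name is illustrative, not PerfExplorer code):

```cpp
#include <cmath>
#include <vector>

// Pearson correlation coefficient r in [-1, 1] for two event-time series:
// +1 is a perfect increasing linear relationship, -1 a perfect decreasing one.
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    size_t n = x.size();
    double mx = 0, my = 0;
    for (size_t i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
    mx /= n; my /= n;                          // means of x and y
    double sxy = 0, sxx = 0, syy = 0;
    for (size_t i = 0; i < n; i++) {           // centered second moments
        sxy += (x[i] - mx) * (y[i] - my);
        sxx += (x[i] - mx) * (x[i] - mx);
        syy += (y[i] - my) * (y[i] - my);
    }
    return sxy / std::sqrt(sxx * syy);
}
```

For example, the inverse relationship between MPI_Barrier and MPI_Group_translate_ranks noted on the Miranda slides would show up as r close to -1.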
Comparative Analysis
- Relative speedup, efficiency
  - Total runtime, by event, one event, by phase
- Breakdown of total runtime
- Group fraction of total runtime
- Correlating events to total runtime
- Timesteps per second
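The first two comparative metrics above have textbook definitions: relative speedup S(p) = T(1) / T(p) and parallel efficiency E(p) = S(p) / p. As a sketch:

```cpp
#include <cassert>

// Relative speedup: serial runtime over runtime on p processors.
double speedup(double t1, double tp) { return t1 / tp; }

// Parallel efficiency: speedup divided by the processor count.
double efficiency(double t1, double tp, int p) { return speedup(t1, tp) / p; }
```

A run taking 100 s serially and 25 s on 8 processors has a speedup of 4 and an efficiency of 0.5.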
User-Defined Views
- Reorganization of data for multiple parametric studies
- Construction of views / sub-views with simple operators
- Simple "wizard"-like interface for creating views
- Example view hierarchies: application → processors → problem size; application → problem type → processors
PerfExplorer Future Work
- Extensions to the PerfExplorer framework
  - Examine properties of performance data
  - Automated guidance of analysis
  - Workflow scripting for repeatable analysis
- Dependency modeling (go beyond correlation)
- Time-series analysis of phase-based data
TAU Eclipse Integration
- Eclipse GUI integration of existing TAU tools
- New Eclipse plug-in for code instrumentation
- Integration with CDT and FDT
  - Java, C/C++, and Fortran projects can be instrumented and run from within Eclipse
- Each project can be given multiple build configurations corresponding to available TAU makefiles
  - All TAU configuration options are available
- The ParaProf tool can be launched automatically
TAU Eclipse Integration
[Screenshots: TAU configuration and TAU experimentation]
TAU Eclipse Future Work
- Development of the TAU Eclipse plug-ins for Java and the CDT/FDT is ongoing
- Planned features include:
  - Full integration with the Eclipse Parallel Tools project
  - Database storage of project performance data
  - Refinement of the plug-in settings interface to allow easier selection of TAU runtime and compile-time options
  - Accessibility of TAU configuration and command-line tools via the Eclipse UI
Porting TAU to Sun Solaris 10 / Opteron
- Already supported earlier versions of Solaris
- Already supported Linux and Windows
- Will work directly with Sun Opteron systems
- Now have full support for Solaris 10
  - Intel-based systems
  - Opteron systems
  - Profiling and tracing with all options
- Compilers supported: Sun Studio 10 compilers (C, C++, and F90)
- Parallel models supported: MPI, OpenMP, hybrid
Porting Challenge – Nested OpenMP
- TAU did not support nested parallelism at all before
  - Opari could instrument, but does not distinguish nesting
- OpenMP lacks support for performance tools to determine thread information for nested threads
- Want to build a portable mechanism for OpenMP
  - OpenMP may provide a runtime call in the future
  - Explore what can be implemented otherwise now
- Now supports nested OpenMP parallelism
  - Independent of the OpenMP runtime system
  - Depends on a language feature
Nested Parallelism Implementation in TAU
- TAU normally uses omp_get_thread_num()
  - Identifies threads with a globally unique identifier (tid)
  - Used to create measurement structures for thread events: callstack, profile objects, …
- This approach breaks down for nested parallelism
  - omp_get_thread_num() returns the id within the current team, which is not globally unique
- Use the #pragma omp threadprivate() directive
  - Gives thread-local storage (TLS) for identifying threads
  - TAU generates a tid for each thread, first come, first served
  - Supported in Intel and Sun compilers (probably IBM)
  - Tested on Sun (SPARC, x64) and SGI Prism (Itanium 2)
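The threadprivate scheme can be sketched as follows. The names (tau_tid, get_global_tid) are illustrative, not TAU's internals: a threadprivate variable gives every OpenMP thread its own copy of the id, and ids are handed out from a shared counter first come, first served, so they remain unique even in nested teams where omp_get_thread_num() repeats. Compiled without OpenMP, the pragmas are ignored and the single thread gets id 0.

```cpp
#include <cassert>

static int tau_tid = -1;          // -1 means "no id assigned yet";
#pragma omp threadprivate(tau_tid) // each thread gets its own copy

static int next_tid = 0;          // shared first-come, first-served counter

// Return this thread's globally unique id, allocating one on first use.
int get_global_tid() {
    if (tau_tid == -1) {          // first event seen on this thread
        #pragma omp critical(tid_alloc)
        { tau_tid = next_tid++; } // serialize id allocation
    }
    return tau_tid;
}
```

Once assigned, the id is stable for the lifetime of the thread: repeated calls on the same thread return the same value regardless of which (possibly nested) parallel region the thread is currently executing in.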
ZeptoOS and TAU
- DOE OS/RTS for Extreme Scale Scientific Computation
  - ZeptoOS: scalable components for petascale architectures
  - Argonne National Laboratory and University of Oregon
- University of Oregon role
  - Kernel-level performance monitoring
  - OS component performance assessment and tuning
- KTAU (Kernel Tuning and Analysis Utilities)
  - Integration of the TAU infrastructure into the Linux kernel
  - Integration with ZeptoOS; installation on BG/L
  - Port to 32-bit and 64-bit Linux platforms
Linux Kernel Profiling using TAU – Goals
- Fine-grained kernel-level performance measurement
  - For parallel applications
- Support both profiling and tracing
- Both process-centric and system-wide views
- Merge user-space performance with kernel-space
  - User-space: (TAU) profile/trace
  - Kernel-space: (KTAU) profile/trace
- Detailed program-OS interaction data
  - Including interrupts (IRQ)
- Analysis and visualization compatible with TAU
KTAU System Architecture
KTAU on the BG/L I/O Node
KTAU Usage Models
- Daemon-based monitoring (KTAU-D)
  - Use KTAU-D to monitor (profile/trace) a single process (e.g., CIOD) or the entire I/O-node kernel
  - No access to the source code of the user-space program is required: CIOD kernel activity is available even though the CIOD source is not
- "Self" monitoring
  - A user-space program can be instrumented (e.g., with TAU) to access its own kernel-level trace/profile data
  - ZIOD (ZeptoOS I/O daemon) source, when available, can be instrumented
  - Can produce a merged user-kernel trace/profile
KTAU-D Profile Data
- KTAU-D can be used to access profile data (system-wide and per-process) of the BG/L I/O node
- Data is obtained at the start and stop of KTAU-D, and the resulting profile is then generated (work in progress)
- Currently, flat profiles with inclusive/exclusive times and function call counts are produced (future work: callgraph profiles)
- Profile data is viewed using the ParaProf visualization tool
KTAU-D Trace
- KTAU-D can be used to access system-wide and per-process trace data of the BG/L I/O node
- Traces from KTAU-D are converted into the TAU trace format, which can then be converted into other formats (Vampir, Jumpshot)
- Traces from KTAU-D can be merged with traces from TAU to monitor both user- and kernel-space activities (work in progress)
Experiment to Show KTAU in Use
- Workload: "iotest" benchmark on BG/L
  - 2, 4, 16, 32, 48, and 64 compute nodes
- Use KTAU in the I/O-node ZeptoOS kernel
- Collect trace data
  - KTAU-D on the I/O node periodically monitors system activities and dumps out trace data
- Visualize the activities in the trace using Vampir
Experiment Setup (Parameters)
- KTAU
  - Enable all instrumentation points
  - Number of kernel trace entries per process = 10K
- KTAU-D
  - System-wide tracing
  - Access the trace every 1 second and dump trace output to a file in the user's home directory through NFS
- IOTEST
  - An MPI-based benchmark (open/write/read/close)
  - Run with default parameters (blocksize = 16 MB)
SYS_WRITE in the KTAU Trace of CIOD
- Runs with 2, 4, 8, 16, and 32 compute nodes
- As the number of compute nodes increases, CIOD has to handle a larger number of forwarded system calls
- Observed counts: 1,769; 3,142; 5,838; 10,980; and 37,985 sys_write calls
Zoomed View of the CIOD Trace (8 compute nodes)
TAU Performance System Status
- Computing platforms: IBM, SGI, Cray, HP, Sun, Hitachi, NEC, Linux clusters, Apple, Windows, …
- Programming languages: C, C++, Fortran 90/95, UPC, HPF, Java, OpenMP, Python
- Thread libraries: pthreads, SGI sproc, Java, Windows, OpenMP
- Communication libraries: MPI-1/2, PVM, shmem, …
- Compilers: IBM, Intel, PGI, GNU, Fujitsu, Sun, NAG, Microsoft, SGI, Cray, HP, NEC, Absoft, Lahey, PathScale, Open64
Project Affiliations (selected)
- Lawrence Livermore National Lab
  - Hydrodynamics (Miranda), radiation diffusion (KULL)
  - Open Trace Format (OTF) implementation on BG/L
- Argonne National Lab
  - ZeptoOS project and KTAU
  - Astrophysical thermonuclear flashes (Flash)
- Center for Simulation of Accidental Fires and Explosions (C-SAFE)
  - University of Utah, ASCI ASAP Center
  - Uintah Computational Framework (UCF)
- Oak Ridge National Lab
  - Contribution to the Joule Report (S3D, AORSA3D)
Project Affiliations (continued)
- Sandia National Lab
  - Simulation of turbulent reactive flows (S3D)
  - Combustion code (CFRFS)
- Los Alamos National Lab
  - Monte Carlo transport (MCNP)
  - SAIC's Adaptive Grid Eulerian (SAGE)
- CCSM / ESMF / WRF climate/earth/weather simulation
  - NSF, NOAA, DOE, NASA, …
- Common Component Architecture (CCA) integration
- Performance Evaluation Research Center (PERC)
  - DOE SciDAC center
Support Acknowledgements
- Department of Energy (DOE)
  - Office of Science (MICS), Argonne National Lab
  - ASC/NNSA, University of Utah ASC/NNSA Level 1
  - ASC/NNSA, Lawrence Livermore National Lab
- Department of Defense (DoD)
  - HPC Modernization Office (HPCMO)
  - Programming Environment and Training (PET)
- NSF Software and Tools for High-End Computing
- Research Centre Jülich
- Los Alamos National Laboratory
- ParaTools
Acknowledgements
- Dr. Sameer Shende, Senior Scientist
- Alan Morris, Senior Software Engineer
- Wyatt Spear, PRL staff
- Scott Biersdorff, PRL staff
- Kevin Huck, Ph.D. student
- Aroon Nataraj, Ph.D. student
- Kai Li, Ph.D. student
- Li Li, Ph.D. student
- Adnan Salman, Ph.D. student
- Suravee Suthikulpanit, M.S. student