New Approaches to Solving Large Kinetic Networks in Scientific Applications

This presentation was adapted from explicitGPU_INT_drive.odp. See the folder drive_png for whole-page PNG images saved from a PDF version of a Google Drive presentation; some of these images are pasted as whole pages into the present document. See /home/guidry/mwg/research/workshopsConf/2014/shanghaiSchoolMay2014/presentation/images/ for many of the original images.

See pg-1.png

See pg-4.png

Coupling Realistic Thermonuclear Networks to Hydrodynamics
To incorporate realistic networks in astrophysical simulations we must improve (substantially) the speed and efficiency of computing kinetic networks coupled to fluid dynamics. There are two general approaches that we might take:
- Improve the algorithms used to solve the kinetic networks.
- Improve the hardware on which the algorithms are executed.
This presentation is about using both to effect a dramatic improvement in the speed and efficiency of solving this problem. See pg-5.png

Integrating Stiff Equations Numerically
Explicit numerical integration: to advance the solution from time t_n to t_{n+1}, only information already available at t_n is required.
Implicit numerical integration: to advance the solution from time t_n to t_{n+1}, information at the new point t_{n+1} is required, implying an iterative solution.
Thus, for numerical integration:
- Explicit methods are inherently simple, but potentially unstable.
- Implicit methods are inherently complicated, but stable.

Methods to Integrate Stiff Equations
There are two general approaches that we might use to deal with stiffness:
- The traditional way: integrate the equations implicitly, which is stable but requires an iterative solution with matrix inversions at each step (expensive for large networks).
- A new way: replace the equations with ones that are more stable and integrate them explicitly.
If we could stabilize explicit integration, we could do each timestep more quickly in large networks.

Fundamental Sources of Stiffness
See eq1.png. The key to stabilizing explicit integration is to understand the three basic sources of stiffness for a typical reaction network:
- Negative populations
- Macroscopic equilibration
- Microscopic equilibration

Example: Explicit Integration for a Nova Simulation
Network: 134 isotopes, 1531 couplings, rates from REACLIB.

Method     Steps   Speed (relative)
Implicit   1332    1
Asy        935     10
QSS        777     12

Summary of Results: Explicit vs Implicit Speedup for a Single Network
See scalingNetworkSizeNovaSimul.png. Thus our new algorithms can give a speed increase of about an order of magnitude for networks with several hundred species. Now let us consider the role of modern hardware in this problem.

Computing Power for Astrophysics
Titan: a total of 299,008 CPU cores and 18,688 GPUs, capable of 27 × 10^15 floating-point operations per second (27 petaflops).

GPU Acceleration for the Network
See CUDAnetworkFlow.png. Implemented with CUDA C/C++.

Scaling for a Single Network
See scaling_isotopes2.png. This is an impressive speedup, but a single network utilizes only a small fraction of the available GPU threads. Greater efficiency requires that we give the GPU more work.

Stacking Multiple Networks on a GPU
See multipleNetworks.png. Thus, not only might it be possible to run one network of realistic size faster than is now feasible; it may be possible to run many such networks faster than it is now possible to run one such network.

Timing: Concurrent Network Launches
See fern_newsum.png. References: Brock et al (2015); Haidar et al (2015).

Timing: Concurrent Network Launches See fern_newsum2.png

References
- Algebraic Stabilization of Explicit Numerical Integration for Extremely Stiff Reaction Networks, M. Guidry, J. Comp. Phys. 231, 5266-5288 (2012) [arXiv:1112.4778]
- Explicit Integration of Extremely Stiff Reaction Networks: Quasi-Steady-State Methods, M. W. Guidry and J. A. Harris, Comput. Sci. Disc. 6, 015002 (2013) [arXiv:1112.4750]
- Explicit Integration of Extremely Stiff Reaction Networks: Partial Equilibrium Methods, M. W. Guidry, J. J. Billings, and W. R. Hix, Comput. Sci. Disc. 6, 015003 (2013) [arXiv:1112.4738]
- Explicit Integration of Extremely Stiff Reaction Networks: Asymptotic Methods, M. W. Guidry, R. Budiardja, E. Feger, J. J. Billings, W. R. Hix, O. E. B. Messer, K. J. Roche, E. McMahon, and M. He, Comput. Sci. Disc. 6, 015001 (2013) [arXiv:1112.4716]
- Explicit Integration with GPU Acceleration for Large Kinetic Networks, B. Brock, A. Belt, J. J. Billings, and M. Guidry, submitted to J. Comp. Phys. [arXiv:1409.5826]
(arXiv: http://arxiv.org/)

Collaborators
Ben Brock, University of Tennessee
Daniel Shyles, University of Tennessee
Azzam Haidar, University of Tennessee
Stan Tomov, University of Tennessee
Jay Billings, Oak Ridge National Laboratory
Andrew Belt, University of Tennessee