New Approaches to Solving Large Kinetic Networks in Scientific Applications

This presentation was adapted from explicitGPU_INT_drive.odp. See the folder drive_png for whole-page PNG images saved from a PDF version of a Google Drive presentation; some of these images are pasted as whole pages into the present document. See /home/guidry/mwg/research/workshopsConf/2014/shanghaiSchoolMay2014/presentation/images/ for many of the original images.

See pg-1.png

See pg-4.png

Coupling Realistic Thermonuclear Networks to Hydrodynamics
To incorporate realistic networks in astrophysical simulations we must improve (substantially) the speed and efficiency of computing kinetic networks coupled to fluid dynamics. There are two general approaches that we might take:
- Improve the algorithms used to solve the kinetic networks.
- Improve the hardware on which the algorithms are executed.
This presentation is about using both to effect a dramatic improvement in the speed and efficiency of solving this problem. See pg-5.png

Integrating Stiff Equations Numerically
Explicit numerical integration: to advance the solution from time t_n to t_{n+1}, only information already available at t_n is required.
Implicit numerical integration: to advance the solution from time t_n to t_{n+1}, information at the new point t_{n+1} is required, implying an iterative solution.
Thus, for numerical integration:
- Explicit methods are inherently simple, but potentially unstable.
- Implicit methods are inherently complicated, but stable.

Methods to Integrate Stiff Equations
There are two general approaches that we might use to deal with stiffness:
- The traditional way: integrate the equations implicitly, which is stable but requires an iterative solution with matrix inversions at each step (expensive for large networks).
- A new way: replace the equations with ones that are more stable and integrate them explicitly.
If we could stabilize explicit integration, we could do each timestep more quickly in large networks.

Fundamental Sources of Stiffness
See eq1.png. The key to stabilizing explicit integration is to understand the three basic sources of stiffness for a typical reaction network:
- Negative populations
- Macroscopic equilibration
- Microscopic equilibration

Example: Explicit Integration for a Nova Simulation
Network: 134 isotopes, 1531 couplings, rates from REACLIB.

Method     Steps   Speed (relative)
Implicit   1332    1
Asy        935     10
QSS        777     12

Summary of Results: Explicit vs Implicit Speedup for a Single Network
See scalingNetworkSizeNovaSimul.png. Thus our new algorithms can give a speed increase of about an order of magnitude for networks with several hundred species. Now let us consider the role of modern hardware in this problem.

Computing Power for Astrophysics
Titan: a total of 299,008 CPU cores and 18,688 GPUs, capable of 27 × 10^15 floating-point operations per second (27 petaflops).

GPU Acceleration for the Network
See CUDAnetworkFlow.png. Implemented with CUDA C/C++.

Scaling for a Single Network
See scaling_isotopes2.png. This is an impressive speedup, but a single network utilizes only a small fraction of the available GPU threads. Greater efficiency requires that we give the GPU more work.

Stacking Multiple Networks on a GPU
See multipleNetworks.png. Thus, not only might it be possible to run one network of realistic size faster than is now feasible; it may be possible to run many such networks faster than it is now possible to run one such network.

Timing: Concurrent Network Launches
See fern_newsum.png. References: Brock et al (2015); Haidar et al (2015).

Timing: Concurrent Network Launches See fern_newsum2.png

References
- Algebraic Stabilization of Explicit Numerical Integration for Extremely Stiff Reaction Networks, M. Guidry, J. Comp. Phys. 231, 5266-5288 (2012) [arXiv:1112.4778]
- Explicit Integration of Extremely Stiff Reaction Networks: Quasi-Steady-State Methods, M. W. Guidry and J. A. Harris, Comput. Sci. Disc. 6, 015002 (2013) [arXiv:1112.4750]
- Explicit Integration of Extremely Stiff Reaction Networks: Partial Equilibrium Methods, M. W. Guidry, J. J. Billings, and W. R. Hix, Comput. Sci. Disc. 6, 015003 (2013) [arXiv:1112.4738]
- Explicit Integration of Extremely Stiff Reaction Networks: Asymptotic Methods, M. W. Guidry, R. Budiardja, E. Feger, J. J. Billings, W. R. Hix, O. E. B. Messer, K. J. Roche, E. McMahon, and M. He, Comput. Sci. Disc. 6, 015001 (2013) [arXiv:1112.4716]
- Explicit Integration with GPU Acceleration for Large Kinetic Networks, B. Brock, A. Belt, J. J. Billings, and M. Guidry, submitted to J. Comp. Phys. [arXiv:1409.5826]
(arXiv: http://arxiv.org/)

Collaborators
Ben Brock, University of Tennessee
Daniel Shyles, University of Tennessee
Azzam Haidar, University of Tennessee
Stan Tomov, University of Tennessee
Jay Billings, Oak Ridge National Laboratory
Andrew Belt, University of Tennessee