A GPU Accelerated Explicit Finite-volume Euler Equation Solver with Ghost-cell Approach
F.-A. Kuo 1,2, M.R. Smith 3, and J.-S. Wu 1*
1 Department of Mechanical Engineering, National Chiao Tung University, Hsinchu, Taiwan
2 National Center for High-Performance Computing, NARL, Hsinchu, Taiwan
3 Department of Mechanical Engineering, National Cheng Kung University, Tainan, Taiwan
IWCSE, Taipei, Taiwan, October 14-17, 2013
Session: Supercomputer/GPU and Algorithms (GPU-2)

Outline
Background & Motivation
Objectives
Split HLL (SHLL) Scheme
Cubic-Spline Immersed Boundary Method (IBM)
Results & Discussion: Parallel Performance, Demonstrations
Conclusion and Future Work

Background & Motivation

Parallel CFD
Computational fluid dynamics (CFD) has played an important role in accelerating progress in aerospace and other technologies.
For several challenging 3D flow problems, parallel computing becomes necessary to greatly shorten the otherwise very lengthy computational time.
Over the past two decades, parallel CFD has evolved from SIMD-type vectorized processing to SPMD-type distributed-memory processing, mainly because of the latter's much lower hardware cost and easier programming.

SIMD vs. SPMD
SIMD (single instruction, multiple data) is a class of parallel computing in which the same operation is performed on multiple data points simultaneously at the instruction level, e.g., SSE/AVX instructions on CPUs and GPU computation such as CUDA.
SPMD (single program, multiple data) is a higher-level abstraction in which copies of the same program run across multiple processors and operate on different subsets of the data, e.g., message-passing programming (MPI) on distributed-memory architectures.

MPI vs. CUDA
Most well-known parallel CFD codes adopt SPMD parallelism using MPI, e.g., Fluent (ANSYS) and CFL3D (NASA), to name a few.
Recently, because of the potentially very high cost-performance (C/P) ratio of graphics processing units (GPUs), GPU parallelization of CFD codes has become an active research area, based largely on CUDA, developed by Nvidia.
However, the numerical scheme may need to be redesigned to take full advantage of the GPU architecture.

Split HLL Scheme on GPUs
Split Harten-Lax-van Leer (SHLL) scheme (Kuo et al., 2011):
a highly local numerical scheme, modified from the original HLL scheme
Cartesian grid
~60x speedup (Nvidia Tesla C1060 GPU vs. one Intel Xeon X5472 CPU core) with an explicit implementation
However, it is difficult to treat objects with complex geometry accurately, especially for high-speed gas flow; an example is given on the next slide.
Thus, how to retain the easy implementation of a Cartesian grid on GPUs while improving the treatment of objects with complex geometry becomes important for further extending the applicability of the SHLL scheme in CFD simulations.

Staircase-like vs. IBM
Spurious waves are often generated by a staircase-like representation of the solid surface in high-speed gas flows.
[Figure: comparison of the staircase-like boundary and the IBM; shock direction indicated.]

Immersed Boundary Method
Immersed boundary method (IBM) (Peskin, 1972; Mittal & Iaccarino, 2005):
easy treatment of objects with complex geometry on a Cartesian grid
grid computation near the objects becomes automatic or very easy
easy treatment of moving objects in the computational domain without remeshing
The main idea of the IBM is simply to enforce the boundary conditions at computational grid points through interpolation between the fluid grid and the boundary conditions at the solid boundaries.
The stencil of the IBM operation is local in general, enabling efficient reuse of the original numerical scheme (e.g., SHLL) and easy parallel implementation.

Objectives

Goals
To develop and validate an explicit cell-centered finite-volume solver for the Euler equations, based on the SHLL scheme, on a Cartesian grid with the cubic-spline IBM on multiple GPUs
To study the parallel performance of the code on single and multiple GPUs
To demonstrate the capability of the code with several applications

Split HLL Scheme

SHLL Scheme
[Figure: SIMD model for the 2D flux computation; cells i-1, i, i+1 with +Flux/-Flux contributions. The slide shows the derivation: original HLL flux, introduction of local approximations, final SHLL form.]
Starting from the original HLL flux, local approximations are introduced so that the final form (SHLL) is a highly local scheme: the new S_R and S_L wave-speed terms are approximated without involving neighbor-cell data.
A highly local flux computation scheme: great for GPUs!
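The equations on this slide were images that did not survive the transcript. As a reconstruction (a sketch under the assumption of locally evaluated wave speeds, not a quotation of Kuo et al., 2011), the original HLL flux at the interface between cells i and i+1 is, for S_L < 0 < S_R,

$$
\mathbf{F}^{\mathrm{HLL}}_{i+1/2}
= \frac{S_R\,\mathbf{F}_L - S_L\,\mathbf{F}_R + S_L S_R\,(\mathbf{U}_R - \mathbf{U}_L)}{S_R - S_L}.
$$

Because the wave speeds S_L and S_R normally depend on both neighboring states, this flux couples the two cells. If each cell instead estimates the speeds from its own data only, e.g. S_L ≈ u − a and S_R ≈ u + a, the flux separates into two single-cell contributions,

$$
\mathbf{F}_{i+1/2} \approx \mathbf{F}^{+}(\mathbf{U}_i) + \mathbf{F}^{-}(\mathbf{U}_{i+1}),
\qquad
\mathbf{F}^{\pm}(\mathbf{U}) = \frac{a \pm u}{2a}\,\mathbf{F}(\mathbf{U}) \mp \frac{u^{2}-a^{2}}{2a}\,\mathbf{U},
$$

which is the locality property that makes the scheme map so well onto a GPU; the exact SHLL coefficients are those given in Kuo et al. (2011).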

SHLL Scheme - 2
Final form (SHLL): the flux computation is perfect for GPU application, almost the same as the vector-addition case.
A speedup of more than 60x is possible using a single Tesla C1060 GPU, compared with a single thread of a high-performance CPU (Intel Xeon X5472).
[Figure: SIMD model for the 2D flux computation; cells i-1, i, i+1 with +Flux/-Flux contributions.]
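To make the "almost the same as vector addition" point concrete, below is a minimal CUDA sketch of a per-interface flux kernel. It is not the authors' code: the struct and kernel names are illustrative, and the split coefficients follow the generic locally-split HLL form sketched above rather than the exact SHLL expressions of Kuo et al. (2011). Each thread touches only the two adjacent cell states, so no neighbor synchronization is needed.

```cuda
struct Cons { float rho, rhou, rhov, E; };   // conserved variables of one cell

// Single-cell half-flux F+ (s = +1) or F- (s = -1), using the locally split
// HLL form above. Illustrative only; see Kuo et al. (2011) for the exact
// SHLL coefficients.
__device__ Cons split_flux(const Cons U, float gamma, float s) {
    float u = U.rhou / U.rho;
    float p = (gamma - 1.0f) *
              (U.E - 0.5f * (U.rhou * U.rhou + U.rhov * U.rhov) / U.rho);
    float a = sqrtf(gamma * p / U.rho);                              // speed of sound
    Cons F = { U.rhou, U.rhou * u + p, U.rhov * u, (U.E + p) * u };  // physical x-flux
    float c1 = (a + s * u) / (2.0f * a);
    float c2 = s * (u * u - a * a) / (2.0f * a);
    Cons out = { c1 * F.rho  - c2 * U.rho,
                 c1 * F.rhou - c2 * U.rhou,
                 c1 * F.rhov - c2 * U.rhov,
                 c1 * F.E    - c2 * U.E };
    return out;
}

// One thread per x-interface: the flux between cells i and i+1 is the sum of
// two single-cell terms, so the kernel is as data-parallel as vector addition.
__global__ void interface_flux_x(const Cons *U, Cons *Fface, int n, float gamma) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n - 1) return;
    Cons Fp = split_flux(U[i],     gamma, +1.0f);   // left cell only
    Cons Fm = split_flux(U[i + 1], gamma, -1.0f);   // right cell only
    Fface[i] = Cons{ Fp.rho  + Fm.rho,  Fp.rhou + Fm.rhou,
                     Fp.rhov + Fm.rhov, Fp.E    + Fm.E };
}
```

A y-direction kernel would be identical with the roles of rhou and rhov exchanged.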

Cubic-spline IBM

Two Critical Issues of IBM
How to approximate solid boundaries?
Local cubic splines are used to reconstruct solid boundaries with far fewer points, and they make calculation of surface normals/tangents easier.
How to apply the IBM in a cell-centered FVM framework?
A ghost-cell approach is used: ghost-cell properties are obtained by interpolation of data from neighboring fluid cells, and the boundary conditions at the solid boundaries are enforced on the ghost cells through data mapping from image points.

Cell Identification
1. Define a cubic-spline function for each segment of boundary data to best fit the solid boundary geometry.
2. Identify all the solid cells, fluid cells and ghost cells.
3. Locate the image points corresponding to the ghost cells.
A minimal sketch of this classification step is given after the list.
[Figure: Cartesian grid with the solid boundary curve, showing solid cells, fluid cells and ghost cells.]
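The host-side sketch below illustrates steps 2-3. It assumes (my assumption, not stated on the slide) that a signed-distance value phi to the spline boundary is available at each cell center, negative inside the solid; the function and variable names are illustrative only.

```cuda
enum CellType { FLUID, SOLID, GHOST };

// Host-side cell classification (illustrative, not the authors' code).
void classify_cells(const float *phi, int nx, int ny, CellType *type) {
    // Step 2a: solid vs. fluid from the sign of the distance function.
    for (int j = 0; j < ny; ++j)
        for (int i = 0; i < nx; ++i)
            type[j * nx + i] = (phi[j * nx + i] < 0.0f) ? SOLID : FLUID;

    // Step 2b: a ghost cell is a solid cell with at least one fluid neighbor;
    // these are the cells whose state the IBM must supply.
    for (int j = 1; j < ny - 1; ++j)
        for (int i = 1; i < nx - 1; ++i) {
            int c = j * nx + i;
            if (type[c] != SOLID) continue;
            if (type[c - 1] == FLUID || type[c + 1] == FLUID ||
                type[c - nx] == FLUID || type[c + nx] == FLUID)
                type[c] = GHOST;
        }
    // Step 3 (not shown): for each ghost cell, the image point is the cell
    // center mirrored across the boundary, e.g. x_image = x_ghost + 2*|phi|*n,
    // with n the outward unit normal obtained from the cubic spline.
}
```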

Cubic-Spline Reconstruction (Solid Boundary)
The cubic-spline method provides several advantages:
1. A high-order curve fit of the boundary.
2. Ghost cells are identified easily.
3. The normal vector to the body surface is computed directly from the curve.
A small sketch of the segment evaluation and normal computation follows.
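As a hedged illustration of points 1 and 3, the snippet below evaluates one parametric boundary segment and its unit normal. A Catmull-Rom-style local cubic is used here as a stand-in for the presentation's cubic-spline fit, so the coefficients are not the authors'; the point is that the surface normal comes directly from the tangent of the reconstructed curve.

```cuda
#include <cmath>

struct Vec2 { float x, y; };

// Point on the boundary segment between p1 and p2 for t in [0, 1]; the
// neighboring boundary points p0 and p3 set the end slopes.
Vec2 cubic_point(Vec2 p0, Vec2 p1, Vec2 p2, Vec2 p3, float t) {
    float t2 = t * t, t3 = t2 * t;
    auto blend = [&](float a, float b, float c, float d) {
        return 0.5f * (2.0f * b + (-a + c) * t +
                       (2.0f * a - 5.0f * b + 4.0f * c - d) * t2 +
                       (-a + 3.0f * b - 3.0f * c + d) * t3);
    };
    return { blend(p0.x, p1.x, p2.x, p3.x), blend(p0.y, p1.y, p2.y, p3.y) };
}

// Unit normal of the segment: the tangent dP/dt rotated by 90 degrees.
// Whether (dy, -dx) or (-dy, dx) points out of the solid depends on how the
// boundary points are ordered.
Vec2 cubic_normal(Vec2 p0, Vec2 p1, Vec2 p2, Vec2 p3, float t) {
    float t2 = t * t;
    auto dblend = [&](float a, float b, float c, float d) {
        return 0.5f * ((-a + c) +
                       2.0f * (2.0f * a - 5.0f * b + 4.0f * c - d) * t +
                       3.0f * (-a + 3.0f * b - 3.0f * c + d) * t2);
    };
    float dx = dblend(p0.x, p1.x, p2.x, p3.x);
    float dy = dblend(p0.y, p1.y, p2.y, p3.y);
    float len = sqrtf(dx * dx + dy * dy);
    return { dy / len, -dx / len };
}
```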

BCs of the Euler Eqns.
[Equations on the slide: the inviscid wall boundary conditions written in terms of the unit normal of the body surface, together with their approximated (ghost-cell) form.]
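The equations on this slide were images; what follows is the standard form such slip-wall conditions usually take, written out as a sketch rather than a copy of the slide. At the solid boundary,

$$
\mathbf{u}\cdot\hat{n} = 0, \qquad
\frac{\partial p}{\partial n} \approx 0, \qquad
\frac{\partial \rho}{\partial n} \approx 0,
$$

where $\hat{n}$ is the unit normal of the body surface. In the ghost-cell approximation these become, with $I$ the image point and $G$ the ghost cell,

$$
\rho_G = \rho_I, \qquad p_G = p_I, \qquad
\mathbf{u}_G = \mathbf{u}_I - 2\,(\mathbf{u}_I\cdot\hat{n})\,\hat{n},
$$

so the tangential velocity is preserved and the normal velocity is reflected, enforcing zero normal velocity midway between the two points, i.e. at the wall.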

IBM Procedures
Approximate the properties of the image points using bilinear interpolation among the neighboring fluid cells, then map them to the ghost cells; a small sketch of this step is given below.
[Figure: image point, ghost point, interpolation stencil, fluid cells and solid cells.]
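The sketch below shows the bilinear interpolation at the image point and the mapping to the ghost cell. Names are illustrative, and the mapping uses the standard slip-wall reflection from the previous slide rather than the authors' exact formulas.

```cuda
struct Prim { float rho, u, v, p; };   // primitive variables

// Bilinear interpolation of the four fluid cells surrounding the image point;
// wx, wy in [0, 1] are the fractional position inside that 2x2 stencil.
Prim bilinear(Prim q00, Prim q10, Prim q01, Prim q11, float wx, float wy) {
    auto mix = [&](float a, float b, float c, float d) {
        return (1.0f - wx) * (1.0f - wy) * a + wx * (1.0f - wy) * b +
               (1.0f - wx) * wy * c + wx * wy * d;
    };
    return { mix(q00.rho, q10.rho, q01.rho, q11.rho),
             mix(q00.u,   q10.u,   q01.u,   q11.u),
             mix(q00.v,   q10.v,   q01.v,   q11.v),
             mix(q00.p,   q10.p,   q01.p,   q11.p) };
}

// Map the interpolated image-point state to the ghost cell: density and
// pressure are copied (zero normal gradient) and the velocity is reflected
// about the wall normal (nx, ny), so the normal velocity vanishes at the wall.
Prim ghost_from_image(Prim qi, float nx, float ny) {
    float un = qi.u * nx + qi.v * ny;      // normal velocity component
    return { qi.rho, qi.u - 2.0f * un * nx, qi.v - 2.0f * un * ny, qi.p };
}
```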

SHLL/IBM Scheme on GPU

Nearly All-Device Computation
[Flow chart: the host only sets the GPU device ID and the target flow time and, at the end, outputs the result. Everything else runs on the device: initialization, then a time loop of flux calculation, IBM (ghost-cell) update, state calculation and CFL calculation, advancing flowtime += dt until flowtime exceeds T.]
A host-side sketch of this driver loop is given below.
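The sketch below is a hedged reconstruction of the driver implied by the flow chart; the kernel names in the comments are placeholders, not the solver's actual API. The host merely selects the device, launches kernels, and checks the stopping criterion.

```cuda
#include <cuda_runtime.h>

// d_dt points to a single float on the device holding the current time step,
// assumed to be written by a CFL reduction kernel (placeholder in comments).
void run_solver(int device_id, float t_final, const float *d_dt) {
    cudaSetDevice(device_id);               // "Set GPU device ID and flow time"
    float flowtime = 0.0f;
    // initialize_kernel<<<grid, block>>>(...);        // initial condition (device)
    while (flowtime < t_final) {
        // flux_kernel<<<grid, block>>>(...);          // SHLL interface fluxes
        // ibm_kernel<<<grid, block>>>(...);           // refresh ghost cells (IBM)
        // state_kernel<<<grid, block>>>(...);         // explicit state update
        // cfl_kernel<<<grid, block>>>(...);           // reduce to new dt (CFL_max = 0.2)
        float dt = 0.0f;
        cudaMemcpy(&dt, d_dt, sizeof(float), cudaMemcpyDeviceToHost);  // read back dt
        if (dt <= 0.0f) break;              // guard for this sketch only
        flowtime += dt;
    }
    cudaDeviceSynchronize();
    // copy the solution back to the host and output the result
}
```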

Results & Discussion (Parallel Performance)

Parallel Performance - 1
Also known as "Schardin's problem".
Test conditions:
– Moving shock with Mach number 1.5
– Resolution: 2000x2000 cells
– CFL_max = 0.2
– Physical time: 0.35 s, requiring 9843 time steps on one GPU
[Figure: problem setup with L = 1, H = 1 and the incident moving shock at t = 0.]

Parallel Performance - 2
Resolution: 2000x2000 cells
GPU cluster: GeForce GTX 590 GPUs (2x 512 cores, 1.2 GHz, 3 GB GDDR5); CPU: Intel Xeon X5472
Overhead with IBM: only ~3%
Speedup: GPU/CPU ~60x; multi-GPU speedup up to 3.6x on 4 GPUs (see Summary)
[Table: wall-clock time in seconds and speedup for different numbers of GPUs.]

Results & Discussion (Demonstrations)

Shock over a finite wedge - 1
In the 400x400-cell case without the IBM, the staircase solid boundary generates spurious waves, which destroy the accuracy of the surface properties. By comparison, the case with the IBM gives much better surface properties.
[Figure: results with and without the IBM.]

Shock over a finite wedge - 2
All important physical phenomena are well captured by the solver with the IBM, without spurious wave generation.
[Figure: density contour comparison at t = 0.35 s, with and without the IBM.]

Transonic Flow past a NACA Airfoil
With the staircase boundary (no IBM), spurious waves appear near the solid boundary; when the boundary is treated with the IBM, these spurious waves disappear.
[Figure: pressure fields, staircase boundary without IBM vs. IBM result.]

Transonic Flow past a NACA Airfoil
Distribution of pressure around the surface of the airfoil (upper and lower surfaces): the present cubic-spline IBM compared with the ghost-cell method of J. Liu et al. (2009).
The two results are in very close agreement.
[Figure: surface pressure distributions, present approach (left) and Liu et al. (2009) (right).]

Transonic Flow past a NACA Airfoil
Top-side (upper-surface) shock wave comparison: present approach vs. Furmánek* (2008).
* Petr Furmánek, "Numerical Solution of Steady and Unsteady Compressible Flow", Czech Technical University in Prague, 2008.
[Figure: upper-surface shock comparison.]

Transonic Flow past a NACA Airfoil
Bottom-side (lower-surface) shock wave comparison: present approach vs. Furmánek* (2008).
* Petr Furmánek, "Numerical Solution of Steady and Unsteady Compressible Flow", Czech Technical University in Prague, 2008.
[Figure: lower-surface shock comparison.]

Conclusion & Future Work

Summary
A cell-centered 2-D finite-volume solver for the inviscid Euler equations, which can easily treat objects with complex geometry on a Cartesian grid by using the cubic-spline IBM on multiple GPUs, has been completed and validated.
The addition of the cubic-spline IBM increases the computational time by only about 3%, which is negligible.
The GPU/CPU speedup generally exceeds 60x on a single GPU (Nvidia Tesla C1060) compared with a single thread of an Intel Xeon X5472 CPU.
The multi-GPU speedup reaches 3.6x on 4 GPUs (GeForce) for a simulation with 2000x2000 cells.

Future Work
To extend the Cartesian grid to an adaptive mesh.
To simulate moving-boundary problems and real-life problems with this immersed boundary method.
To replace the SHLL solver with a true-direction finite-volume solver, such as QDS.

Thank you for your patience. Questions?