Two-Phase Flow Using Two Levels of Preconditioning on the GPU
Prof. Kees Vuik and Rohit Gupta, Delft Institute of Applied Mathematics

Problem Description

Computational Model: Boundary Conditions

Graphics Processing Unit: a SIMD-based architecture with an army of smaller, simpler processors, larger memory bandwidth than a CPU, and programmer-managed caches.

Preconditioning: $M^{-1} = (I - L D^{-1})(I - D^{-1} L^{T})$, where $L$ is the strictly lower triangular part of $A$ and $D$ its diagonal.
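A minimal sketch of applying this preconditioner to a residual $r$, assuming a small dense symmetric matrix stored as Python lists. The helper names `ip_apply` and `poisson_1d` and the 1D Poisson test matrix are hypothetical, not taken from the slides. The code first forms $y = (I - D^{-1}L^T)r$ and then $z = (I - LD^{-1})y$. Both steps are plain products with triangular parts of $A$, so within each step every entry can be computed independently. Unlike incomplete Cholesky, no sequential triangular solve is needed, which is what makes this form attractive on a GPU.

```python
def poisson_1d(n):
    """Hypothetical test matrix: the 1D Poisson matrix tridiag(-1, 2, -1)."""
    return [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
             for j in range(n)] for i in range(n)]


def ip_apply(A, r):
    """Return z = (I - L D^{-1})(I - D^{-1} L^T) r, where A = L + D + L^T."""
    n = len(A)
    # Step 1: y = r - D^{-1} (L^T r), with (L^T r)_i = sum_{j > i} A[j][i] * r[j]
    y = [r[i] - sum(A[j][i] * r[j] for j in range(i + 1, n)) / A[i][i]
         for i in range(n)]
    # Step 2: z = y - L (D^{-1} y), with (L v)_i = sum_{j < i} A[i][j] * v[j]
    v = [y[j] / A[j][j] for j in range(n)]
    return [y[i] - sum(A[i][j] * v[j] for j in range(i)) for i in range(n)]


print(ip_apply(poisson_1d(3), [1.0, 1.0, 1.0]))  # [1.5, 2.25, 1.75]
```

Because $M^{-1} = KK^T$ with $K = I - LD^{-1}$ unit lower triangular, the preconditioner is symmetric positive definite, as CG requires.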

Deflation: optimized storage of $AZ$; a stripe-wise domain splitting is chosen.
$x = (I - P^T)x + P^T x$, with $P = I - AQ$, $Q = Z E^{-1} Z^T$, $E = Z^T A Z$, where the columns of $Z$ are the deflation vectors.
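Since $A$ and $Q$ are symmetric, $I - P^T = QA$, so $(I - P^T)x = QAx = Qb$ can be computed directly and only $P^T x$ has to come from the iterative solver. The following is a minimal sketch of the deflation operator $P$ with stripe-wise deflation vectors. It assumes a small dense SPD matrix, $k$ equal stripes of consecutive unknowns, and hypothetical helper names. With lexicographic ordering of a 2D grid, a stripe of grid lines is also a contiguous block of unknowns. Storing $AZ$ once means that applying $Pv = v - (AZ)E^{-1}Z^T v$ costs only a restriction, a small $k \times k$ solve, and a product with $AZ$.

```python
def poisson_1d(n):
    """Hypothetical test matrix: tridiag(-1, 2, -1)."""
    return [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
             for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Dense product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def stripe_deflation_vectors(n, k):
    """Z[i][s] = 1.0 if unknown i lies in stripe s (assumes k divides n)."""
    size = n // k
    return [[1.0 if i // size == s else 0.0 for s in range(k)] for i in range(n)]


def solve_spd(E, rhs):
    """Gaussian elimination without pivoting; adequate since E = Z^T A Z is SPD."""
    k = len(E)
    M = [row[:] + [val] for row, val in zip(E, rhs)]
    for c in range(k):
        for i in range(c + 1, k):
            f = M[i][c] / M[c][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    x = [0.0] * k
    for i in reversed(range(k)):
        x[i] = (M[i][k] - sum(M[i][j] * x[j] for j in range(i + 1, k))) / M[i][i]
    return x


def apply_P(AZ, Z, E, v):
    """P v = v - A Z E^{-1} Z^T v, using the stored product AZ."""
    n, k = len(v), len(E)
    y = solve_spd(E, [sum(Z[i][s] * v[i] for i in range(n)) for s in range(k)])
    return [v[i] - sum(AZ[i][s] * y[s] for s in range(k)) for i in range(n)]


A = poisson_1d(8)
Z = stripe_deflation_vectors(8, 2)
AZ = matmul(A, Z)                # stored once and reused in every application of P
E = matmul(transpose(Z), AZ)     # coarse matrix E = Z^T A Z
print(E)  # [[2.0, -1.0], [-1.0, 2.0]]
for s in range(2):
    # P removes the deflation subspace: P (A z_s) = 0 for every column z_s of Z
    print(max(abs(x) for x in apply_P(AZ, Z, E, [row[s] for row in AZ])))
```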

Factors affecting speed-up: coalesced memory access, more deflation vectors, and more preconditioning blocks.
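To see why coalesced memory access matters, here is a simplified model. It assumes 32-thread warps, 128-byte memory segments, 4-byte single-precision values, an aligned base address, and one matrix row per thread. Actual coalescing rules vary between GPU generations. The model counts how many segments a warp touches when thread t reads the k-th stored entry of row t, comparing a row-major layout with a column-major (ELLPACK-style) layout. The width of 5 entries per row corresponds to a 5-point stencil, and the function name is hypothetical.

```python
WARP_SIZE = 32        # threads per warp (assumed)
SEGMENT_BYTES = 128   # size of one memory transaction (assumed)
ELEM_BYTES = 4        # single-precision float


def segments_touched(layout, n_rows, width, k):
    """Count distinct segments hit when thread t of the first warp loads the
    k-th stored entry of row t (one row per thread, aligned base address)."""
    segments = set()
    for t in range(WARP_SIZE):
        if layout == "row-major":
            index = t * width + k      # the entries of one row are contiguous
        elif layout == "column-major":
            index = k * n_rows + t     # entry k of consecutive rows is contiguous
        else:
            raise ValueError(layout)
        segments.add(index * ELEM_BYTES // SEGMENT_BYTES)
    return len(segments)


print(segments_touched("row-major", 1024, 5, 0))     # 5
print(segments_touched("column-major", 1024, 5, 0))  # 1
```

In this model the column-major layout serves a whole warp with one transaction per load, while the row-major layout needs five, so the same kernel moves several times more memory traffic.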

Results: Deflated Preconditioned CG
Poisson-type matrix solved in single precision, about 1 million unknowns (1024 x 1024). Precision criterion: 10e-04. Number of blocks: 512. Deflation vectors: 4096.
Number of iterations (host / device): 34 / 37
Execution time (seconds): …
Relative error norm of the solution (host / device): 5.25e… / …e-03
Speed-up: …
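A 1024 x 1024 grid gives 1024 x 1024 = 1,048,576 unknowns, the roughly one million quoted above. Below is a minimal matrix-free sketch of a Poisson-type matrix-vector product on such a grid. It assumes the standard 5-point stencil, lexicographic ordering, and homogeneous Dirichlet boundaries; the slide does not specify any of these, and the function name is hypothetical. Each output entry depends only on its own value and its four neighbours, which maps naturally to one GPU thread per unknown.

```python
def poisson_2d_matvec(u, nx, ny):
    """y = A u for the 5-point Laplacian on an nx-by-ny grid, assuming
    homogeneous Dirichlet boundaries (neighbours outside the grid are zero).
    u is stored row by row: index = j * nx + i."""
    y = [0.0] * (nx * ny)
    for j in range(ny):
        for i in range(nx):
            k = j * nx + i
            s = 4.0 * u[k]
            if i > 0:
                s -= u[k - 1]       # west neighbour
            if i < nx - 1:
                s -= u[k + 1]       # east neighbour
            if j > 0:
                s -= u[k - nx]      # south neighbour
            if j < ny - 1:
                s -= u[k + nx]      # north neighbour
            y[k] = s
    return y


print(poisson_2d_matvec([1.0] * 9, 3, 3))
# [2.0, 1.0, 2.0, 1.0, 0.0, 1.0, 2.0, 1.0, 2.0]
print(1024 * 1024)  # 1048576
```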

Two-Phase Flow (Double Precision): Preliminary Results
Deflated preconditioned (IP) conjugate gradient. Precision criterion: 10e-05. Deflation vectors: 4096.
Number of iterations: 394
Execution time (seconds): …
Relative error norm of the solution (host / device): 7.80e… / …e-02
Speed-up: 16.8
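Below is a minimal sketch of deflated preconditioned CG with the IP preconditioner and the deflation operator $P$ defined on the earlier slides. It follows the standard DPCG recurrence: run CG on $PAx = Pb$, then recover $x = Qb + P^T x$. The test problem is a hypothetical small 1D Poisson system in double precision. The dense storage, tolerance, stripe count, and helper names are assumptions, and this is not the authors' GPU implementation.

```python
import math


def poisson_1d(n):
    """Hypothetical test matrix: tridiag(-1, 2, -1)."""
    return [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
             for j in range(n)] for i in range(n)]


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ip_apply(A, r):
    """IP preconditioner: z = (I - L D^{-1})(I - D^{-1} L^T) r."""
    n = len(A)
    y = [r[i] - sum(A[j][i] * r[j] for j in range(i + 1, n)) / A[i][i]
         for i in range(n)]
    v = [y[j] / A[j][j] for j in range(n)]
    return [y[i] - sum(A[i][j] * v[j] for j in range(i)) for i in range(n)]


def solve_spd(E, rhs):
    """Gaussian elimination without pivoting (E = Z^T A Z is small and SPD)."""
    k = len(E)
    M = [row[:] + [val] for row, val in zip(E, rhs)]
    for c in range(k):
        for i in range(c + 1, k):
            f = M[i][c] / M[c][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    x = [0.0] * k
    for i in reversed(range(k)):
        x[i] = (M[i][k] - sum(M[i][j] * x[j] for j in range(i + 1, k))) / M[i][i]
    return x


def dpcg(A, b, k, tol=1e-10, maxit=500):
    """Deflated PCG with IP preconditioning and k stripe deflation vectors."""
    n = len(A)
    size = n // k  # assumes k divides n
    Z = [[1.0 if i // size == s else 0.0 for s in range(k)] for i in range(n)]
    AZ = [[sum(A[i][m] * Z[m][s] for m in range(n)) for s in range(k)]
          for i in range(n)]
    E = [[sum(Z[i][s] * AZ[i][t] for i in range(n)) for t in range(k)]
         for s in range(k)]

    def coarse(v):  # E^{-1} Z^T v
        return solve_spd(E, [sum(Z[i][s] * v[i] for i in range(n)) for s in range(k)])

    def P(v):  # P v = v - AZ E^{-1} Z^T v
        y = coarse(v)
        return [v[i] - sum(AZ[i][s] * y[s] for s in range(k)) for i in range(n)]

    def Q(v):  # Q v = Z E^{-1} Z^T v
        y = coarse(v)
        return [sum(Z[i][s] * y[s] for s in range(k)) for i in range(n)]

    x = [0.0] * n
    r = P(b)  # r0 = P(b - A x0) with x0 = 0
    z = ip_apply(A, r)
    p = z[:]
    rz = dot(r, z)
    bnorm = math.sqrt(dot(b, b))
    for _ in range(maxit):
        if math.sqrt(dot(r, r)) <= tol * bnorm:
            break
        w = P(matvec(A, p))
        alpha = rz / dot(p, w)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * wi for ri, wi in zip(r, w)]
        z = ip_apply(A, r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    # Recover x = Q b + P^T x, where P^T x = x - Q A x.
    qb, qax = Q(b), Q(matvec(A, x))
    return [qbi + xi - qaxi for qbi, xi, qaxi in zip(qb, x, qax)]


A = poisson_1d(16)
b = [1.0] * 16
x = dpcg(A, b, k=4)
print(max(abs(bi - ai) for bi, ai in zip(b, matvec(A, x))))
```

Because $AP^T = PA$, the true residual $b - Ax$ of the recovered solution equals the deflated residual, so the stopping test on $r$ also bounds the residual of the final solution.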

Conclusion: Deflation suits the many-core platform. Two-phase flow requires double precision. Deflation combined with IP preconditioning performs best.

References
1. J. M. Tang and C. Vuik. Acceleration of preconditioned Krylov solvers for bubbly flow problems. Lecture Notes in Computer Science, Parallel Processing and Applied Mathematics, 4967(1).
2. S. P. Van der Pijl, A. Segal, C. Vuik, and P. Wesseling. A mass conserving level-set method for modelling of multi-phase flows. International Journal for Numerical Methods in Fluids, 47.
3. M. Ament, G. Knittel, D. Weiskopf, and W. Straßer. A parallel preconditioned conjugate gradient solver for the Poisson problem on a multi-GPU platform. amentmo/docs/ament-pcgip-PDP-2010.pdf.
4. R. Gupta. Implementation of the Deflated Preconditioned Conjugate Gradient Method for Bubbly Flow on the Graphical Processing Unit (GPU). Master's thesis, Delft University of Technology, Delft.