University of Virginia


A Multigrid Solver for Boundary Value Problems Using Programmable Graphics Hardware. Nolan Goodnight, Cliff Woolley, Gregory Lewin, David Luebke, Greg Humphreys. University of Virginia. Graphics Hardware 2003, July 26-27, San Diego, CA.

General-Purpose GPU Programming. Why do we port algorithms to the GPU? How much faster can we expect it to be, really? What is the challenge in porting?

Case Study. Problem: implement a boundary value problem (BVP) solver on the GPU. This could benefit an entire class of scientific and engineering applications, e.g. heat transfer and fluid flow.

Related Work. Krüger and Westermann: Linear Algebra Operators for GPU Implementation of Numerical Algorithms. Bolz et al.: Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid (very similar to our system; developed concurrently; a complementary approach).

Driving problem: fluid mechanics simulation. The problem domain is a warped disc. [Figure: the warped-disc domain alongside a regular grid]

BVPs: Background. Boundary value problems are sometimes governed by PDEs of the form $L\phi = f$, where $L$ is some operator, $\phi$ is the unknown function on the problem domain, and $f$ is a forcing function (source term). Given $L$ and $f$, solve for $\phi$.

BVPs: Example, Heat Transfer. Find a steady-state temperature distribution $T$ in a solid of thermal conductivity $k$ with thermal source $S$. This requires solving a Poisson equation of the form $k\nabla^2 T = -S$. This is a BVP where $L$ is the Laplacian operator $\nabla^2$. All our applications require a Poisson solver.

BVPs: Solving. Most such problems cannot be solved analytically. Instead, discretize onto a grid to form a set of linear equations, then solve with one of: direct elimination, Gauss-Seidel iteration, conjugate gradient, strongly implicit procedures, or the multigrid method.
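The discretize-then-solve step can be illustrated with the heat-transfer example. The following is a minimal sketch, not the authors' code: it assumes a square grid with spacing `h`, a fixed temperature of zero on the boundary, and the standard five-point stencil for the Laplacian, so that $k\nabla^2 T = -S$ becomes $T_{i,j} = \tfrac{1}{4}\,(T_{i-1,j} + T_{i+1,j} + T_{i,j-1} + T_{i,j+1} + h^2 S_{i,j}/k)$. The function names are hypothetical.

```python
import numpy as np

def gauss_seidel_poisson(S, k=1.0, h=1.0, sweeps=600):
    """Approximately solve k * laplacian(T) = -S with T = 0 on the boundary.

    Assumption: five-point finite-difference stencil on a uniform grid.
    """
    T = np.zeros_like(S, dtype=float)
    n, m = S.shape
    for _ in range(sweeps):
        # Gauss-Seidel: each update immediately uses the newest neighbor values.
        for i in range(1, n - 1):
            for j in range(1, m - 1):
                T[i, j] = 0.25 * (T[i - 1, j] + T[i + 1, j] +
                                  T[i, j - 1] + T[i, j + 1] +
                                  h * h * S[i, j] / k)
    return T

def poisson_residual(T, S, k=1.0, h=1.0):
    """Largest |k * laplacian(T) + S| over the interior grid points."""
    lap = (T[:-2, 1:-1] + T[2:, 1:-1] + T[1:-1, :-2] + T[1:-1, 2:]
           - 4.0 * T[1:-1, 1:-1]) / (h * h)
    return np.abs(k * lap + S[1:-1, 1:-1]).max()

# Demo: a single heat source in the middle of a 17 x 17 grid.
n = 17
S = np.zeros((n, n))
S[n // 2, n // 2] = 1.0
T = gauss_seidel_poisson(S, k=1.0, h=1.0 / (n - 1))
print("max residual:", poisson_residual(T, S, k=1.0, h=1.0 / (n - 1)))
```

Plain Gauss-Seidel needs many sweeps even on this small grid, which is the motivation for the multigrid method listed last.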

Multigrid method. Iteratively corrects an approximation to the solution. Operates at multiple grid resolutions: low-resolution grids are used to correct higher-resolution grids recursively. Very fast, especially for large grids: O(n), where n is the number of grid points.
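One way to see where a linear cost can come from is to count the work in a single V-cycle. This is a sketch under stated assumptions, not a derivation from the slides: a 2D grid of $n$ points, each coarser level holding one quarter as many points, and a constant cost $c$ per grid point for each level visited.

```latex
W_{\text{V-cycle}} \;=\; c\,n \sum_{k=0}^{K} \left(\tfrac{1}{4}\right)^{k}
\;<\; c\,n \sum_{k=0}^{\infty} \left(\tfrac{1}{4}\right)^{k}
\;=\; \tfrac{4}{3}\,c\,n
```

The total stays O(n) only if the number of V-cycles needed does not grow with $n$, which is assumed here rather than shown.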

Multigrid method. Use coarser grid levels to recursively correct an approximation $\phi_i$ to the solution, using the residual $L\phi_i - f$. Algorithm: smooth, residual, restrict, recurse, interpolate. [Figure: stencil weights shown for the kernels: 1, -4, 1/8, 1/4, 1/16, 1/2, 1, 1/4]
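The five algorithm steps can be sketched as a recursive V-cycle. This is an illustrative CPU version in NumPy, not the authors' GPU implementation. It assumes a square grid of size $2^m + 1$ with zero boundary values, damped Jacobi smoothing, full-weighting restriction (weights 1/4, 1/8, 1/16) and bilinear interpolation (weights 1, 1/2, 1/4). All function names are hypothetical.

```python
import numpy as np

def smooth(phi, f, h, sweeps=2, omega=0.8):
    """Damped Jacobi sweeps for the five-point Laplacian L(phi) = f."""
    for _ in range(sweeps):
        nbrs = phi[:-2, 1:-1] + phi[2:, 1:-1] + phi[1:-1, :-2] + phi[1:-1, 2:]
        jac = 0.25 * (nbrs - h * h * f[1:-1, 1:-1])
        phi[1:-1, 1:-1] = (1 - omega) * phi[1:-1, 1:-1] + omega * jac
    return phi

def residual(phi, f, h):
    """Residual L(phi) - f on interior points, zero on the boundary."""
    d = np.zeros_like(phi)
    d[1:-1, 1:-1] = (phi[:-2, 1:-1] + phi[2:, 1:-1] + phi[1:-1, :-2] +
                     phi[1:-1, 2:] - 4 * phi[1:-1, 1:-1]) / (h * h) - f[1:-1, 1:-1]
    return d

def restrict(fine):
    """Full weighting: center 1/4, edge neighbors 1/8, corner neighbors 1/16."""
    c = fine[::2, ::2].copy()
    c[1:-1, 1:-1] = (4 * fine[2:-2:2, 2:-2:2]
                     + 2 * (fine[1:-3:2, 2:-2:2] + fine[3:-1:2, 2:-2:2]
                            + fine[2:-2:2, 1:-3:2] + fine[2:-2:2, 3:-1:2])
                     + fine[1:-3:2, 1:-3:2] + fine[1:-3:2, 3:-1:2]
                     + fine[3:-1:2, 1:-3:2] + fine[3:-1:2, 3:-1:2]) / 16.0
    return c

def interpolate(coarse):
    """Bilinear interpolation: weights 1, 1/2 and 1/4."""
    nc = coarse.shape[0]
    fine = np.zeros((2 * nc - 1, 2 * nc - 1))
    fine[::2, ::2] = coarse
    fine[1::2, ::2] = 0.5 * (coarse[:-1, :] + coarse[1:, :])
    fine[::2, 1::2] = 0.5 * (coarse[:, :-1] + coarse[:, 1:])
    fine[1::2, 1::2] = 0.25 * (coarse[:-1, :-1] + coarse[1:, :-1]
                               + coarse[:-1, 1:] + coarse[1:, 1:])
    return fine

def v_cycle(phi, f, h):
    """One V-cycle: smooth, residual, restrict, recurse, interpolate, smooth."""
    if phi.shape[0] == 3:
        phi[1, 1] = -0.25 * h * h * f[1, 1]   # one unknown: solve exactly
        return phi
    smooth(phi, f, h)
    d = residual(phi, f, h)
    # The error e = exact - phi satisfies L(e) = -(L(phi) - f).
    coarse_rhs = -restrict(d)
    e = v_cycle(np.zeros_like(coarse_rhs), coarse_rhs, 2 * h)
    phi += interpolate(e)
    return smooth(phi, f, h)

# Demo: unit forcing on a 33 x 33 grid.
n = 33
h = 1.0 / (n - 1)
f = np.zeros((n, n))
f[1:-1, 1:-1] = 1.0
phi = np.zeros((n, n))
for cycle in range(10):
    v_cycle(phi, f, h)
print("max residual after 10 V-cycles:", np.abs(residual(phi, f, h)).max())
```

The sign on the coarse right-hand side follows from defining the residual as $L\phi_i - f$; with the opposite convention the minus sign would disappear.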

Implementation. For each step of the algorithm: (1) bind as texture maps the buffers that contain the necessary data; (2) set the target buffer for rendering; (3) activate a fragment program that performs the necessary kernel computation; (4) render a grid-sized quad with multitexturing. [Figure: source buffer textures feed a fragment program that writes to the render target buffer]
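The four steps of a pass can be mimicked on the CPU to make the data flow concrete. This is only an emulation in NumPy under stated assumptions: arrays stand in for textures and render targets, a Python function stands in for the fragment program, and `render_pass` and `jacobi_kernel` are hypothetical names, not graphics API calls.

```python
import numpy as np

def render_pass(fragment_program, source_textures, target):
    """Emulate one pass: bind sources, set the target, run the kernel over the quad."""
    # A pass reads its bound textures and writes a different surface.
    if any(src is target for src in source_textures):
        raise ValueError("a pass must not read the surface it writes")
    target[...] = fragment_program(*source_textures)
    return target

def jacobi_kernel(phi, f):
    """Stand-in 'fragment program': one Jacobi step with grid spacing 1."""
    out = phi.copy()
    out[1:-1, 1:-1] = 0.25 * (phi[:-2, 1:-1] + phi[2:, 1:-1] +
                              phi[1:-1, :-2] + phi[1:-1, 2:] - f[1:-1, 1:-1])
    return out

# Two buffers are swapped after every pass so the result of one pass
# becomes a source texture of the next.
front = np.zeros((9, 9))
back = np.zeros((9, 9))
f = -np.ones((9, 9))
for _ in range(4):
    render_pass(jacobi_kernel, [front, f], back)
    front, back = back, front
```

Swapping two buffers is one common way to keep reads and writes on separate surfaces; the slides do not say which scheme the authors used.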

Optimizing the Solver. Detect steady state natively on the GPU; minimize shader length; special-case whenever possible; avoid context switching.

Optimizing the Solver: Steady-state. How to detect convergence? L1 norm: average error. L2 norm: RMS error (common in visual simulation). L∞ norm: max error (common in scientific and engineering applications). Can use occlusion query! [Figure: seconds to steady state vs. grid size]
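The three error measures are easy to state in code. The sketch below also includes a CPU analogue of one plausible way to use an occlusion query, stated as an assumption: count the grid points whose error exceeds a tolerance and declare convergence when the count is zero, which is a test on the max error. The function names are hypothetical.

```python
import numpy as np

def error_norms(err):
    """Average (L1), RMS (L2) and max (L-infinity) error of a grid."""
    a = np.abs(err)
    return {"L1": a.mean(), "L2": np.sqrt((a ** 2).mean()), "Linf": a.max()}

def count_exceeding(err, tol):
    """Analogue of an occlusion query: how many points would 'pass' a test
    that keeps only points with |error| > tol."""
    return int(np.count_nonzero(np.abs(err) > tol))

def converged(err, tol):
    """Converged in the max norm exactly when no point exceeds the tolerance."""
    return count_exceeding(err, tol) == 0

err = np.array([[3.0, -4.0], [0.0, 0.0]])
print(error_norms(err), count_exceeding(err, 3.5), converged(err, 5.0))
```

The count never has to leave the GPU as a full grid, only as a single integer, which is the appeal of testing convergence this way.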

Optimizing the Solver: Shader length. Minimize the number of registers used. Vectorize as much as possible. Use the rasterizer to perform computations of linearly-varying values. Pre-compute invariants on the CPU.

| shader | original fp | fastpath fp | fastpath vp |
| --- | --- | --- | --- |
| smooth | 79-6-1 | 20-4-1 | 12-2 |
| residual | 45-7-0 | 16-4-0 | 11-1 |
| restrict | 66-6-1 | 21-3-0 | (not listed) |
| interpolate | 93-6-1 | 25-3-0 | 13-2 |

Optimizing the Solver: Special-case. Fast-path vs. slow-path: write several variants of each fragment program to handle boundary cases. This eliminates conditionals in the fragment program, equivalent to avoiding CPU inner-loop branching. [Figure: fast path, no boundaries; slow path, with boundaries]
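The fast-path and slow-path split has a direct CPU analogue, sketched here in NumPy. The boundary rule (clamping a missing neighbor to the point itself) and the function names are assumptions chosen for illustration. What matters is that the interior is computed with no conditionals, while the branching variant runs only on the border.

```python
import numpy as np

def slow_path_texel(phi, i, j):
    """Neighbor average with per-point boundary checks (the slow path)."""
    n, m = phi.shape
    up = phi[i - 1, j] if i > 0 else phi[i, j]
    down = phi[i + 1, j] if i < n - 1 else phi[i, j]
    left = phi[i, j - 1] if j > 0 else phi[i, j]
    right = phi[i, j + 1] if j < m - 1 else phi[i, j]
    return 0.25 * (up + down + left + right)

def smooth_all_slow(phi):
    """Reference: run the branching variant on every point."""
    out = np.empty_like(phi)
    for i in range(phi.shape[0]):
        for j in range(phi.shape[1]):
            out[i, j] = slow_path_texel(phi, i, j)
    return out

def smooth_split(phi):
    """Fast path on the interior, slow path only on the one-point border."""
    n, m = phi.shape
    out = np.empty_like(phi)
    # Fast path: no conditionals, every interior point has all four neighbors.
    out[1:-1, 1:-1] = 0.25 * (phi[:-2, 1:-1] + phi[2:, 1:-1] +
                              phi[1:-1, :-2] + phi[1:-1, 2:])
    # Slow path: first and last rows in full, first and last columns elsewhere.
    for i in range(n):
        cols = range(m) if i in (0, n - 1) else (0, m - 1)
        for j in cols:
            out[i, j] = slow_path_texel(phi, i, j)
    return out

phi = np.random.default_rng(0).random((8, 8))
print(np.allclose(smooth_split(phi), smooth_all_slow(phi)))
```

On a large grid the border is a small fraction of the points, so almost all of the work goes through the branch-free variant.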

Optimizing the Solver: Special-case (continued). [Figure: seconds per V-cycle vs. grid size]

Optimizing the Solver: Context-switching. Find the best packing of data from multiple grid levels into the pbuffer surfaces.
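To make the packing idea concrete, here is one possible layout, chosen purely for illustration; the slides do not state which packing the authors settled on. The finest level sits at the left of a single surface and the coarser levels are stacked in a column to its right. The function name and the power-of-two level sizes are assumptions.

```python
def pack_levels(finest):
    """Place grid levels of size finest, finest/2, ..., 1 in one surface.

    Returns ({level: (x, y, size)}, (surface_width, surface_height)).
    Hypothetical layout: level 0 at the origin, coarser levels stacked
    top to bottom in a column immediately to its right.
    """
    rects = {0: (0, 0, finest)}
    size, y, level = finest // 2, 0, 1
    while size > 0:
        rects[level] = (finest, y, size)
        y += size          # the next level goes directly below this one
        size //= 2
        level += 1
    width = finest + (finest // 2 if level > 1 else 0)
    return rects, (width, finest)

rects, surface = pack_levels(256)
for level, (x, y, size) in rects.items():
    print(f"level {level}: {size} x {size} at ({x}, {y})")
print("surface:", surface)
```

With every level in one surface, moving between grid levels changes only the texture coordinates and viewport, not the rendering context.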

Optimizing the Solver: Context-switching. Remove context switching. This can introduce operations with undefined results: reading and writing the same surface. Why do we need to do this? Can we get away with it? What about superbuffers?

Data Layout. Performance: [Figure: seconds to steady state vs. grid size]

Data Layout. Possible additional vectorization: compute 4 values at a time. This requires source, residual, and solution values to be in different buffers, complicates boundary calculations, and adds setup and teardown overhead. [Figure: stacked domain]
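One reading of "stacked domain" is to cut the grid into four strips and store them in the four channels of one texture, so each texel carries four values. The sketch below assumes that layout (it is not confirmed by the slides) and shows why boundary calculations get harder: at the top row of a strip, the "up" neighbor lives in the last row of a different channel. The function names are hypothetical.

```python
import numpy as np

def stack_domain(grid):
    """Split a (4*h, w) grid into 4 strips stored as channels of an (h, w, 4) array."""
    h4, w = grid.shape
    assert h4 % 4 == 0, "height must be divisible by 4"
    return grid.reshape(4, h4 // 4, w).transpose(1, 2, 0).copy()

def unstack_domain(tex):
    """Inverse of stack_domain."""
    h, w, c = tex.shape
    return tex.transpose(2, 0, 1).reshape(c * h, w).copy()

def up_neighbors(tex):
    """For every value, fetch the value one row above it in the original grid."""
    up = np.empty_like(tex)
    up[1:] = tex[:-1]                 # same strip, previous row
    up[0, :, 1:] = tex[-1, :, :-1]    # seam: last row of the previous strip
    up[0, :, 0] = tex[0, :, 0]        # true domain boundary: clamp to itself
    return up

grid = np.arange(8 * 3, dtype=float).reshape(8, 3)
tex = stack_domain(grid)
reference = np.vstack([grid[:1], grid[:-1]])   # the same fetch on the flat grid
print(np.array_equal(unstack_domain(up_neighbors(tex)), reference))
```

The seam and boundary cases are the extra bookkeeping; the interior rows of each strip are handled four values at a time with a single shifted read.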

Results: CPU vs. GPU. Performance: [Figure: seconds to steady state vs. grid size]

Conclusions. What we need going forward: superbuffers (or universal support for multiple-surface pbuffers, or cheap context switching); developer tools (debugging tools, documentation); a global accumulator; ever-increasing amounts of precision and memory; textures bigger than 2048 on a side.

Acknowledgements. Hardware: David Kirk, Matt Papakipos. Driver support: Nick Triantos, Pat Brown, Stephen Ehmann. Fragment programming: James Percy, Matt Pharr. General-purpose GPU: Mark Harris, Aaron Lefohn, Ian Buck. Funding: NSF Award #0092793.