A Multigrid Solver for Boundary Value Problems Using Programmable Graphics Hardware Nolan Goodnight Cliff Woolley Gregory Lewin David Luebke Greg Humphreys.

A Multigrid Solver for Boundary Value Problems Using Programmable Graphics Hardware
Nolan Goodnight, Cliff Woolley, Gregory Lewin, David Luebke, Greg Humphreys
University of Virginia
Graphics Hardware 2003, July – San Diego, CA

General-Purpose GPU Programming
- Why do we port algorithms to the GPU?
- How much faster can we expect it to be, really?
- What is the challenge in porting?

Case Study
Problem: Implement a Boundary Value Problem (BVP) solver using the GPU
Could benefit an entire class of scientific and engineering applications, e.g.:
- Heat transfer
- Fluid flow

Related Work
- Krüger and Westermann: Linear Algebra Operators for GPU Implementation of Numerical Algorithms
- Bolz et al.: Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid
  - Very similar to our system
  - Developed concurrently
  - Complementary approach

Driving problem: Fluid mechanics sim
Problem domain is a warped disc: regular grid

BVPs: Background
- Boundary value problems are sometimes governed by PDEs of the form: $L\phi = f$
  - $L$ is some operator
  - $\phi$ is the problem domain
  - $f$ is a forcing function (source term)
- Given $L$ and $f$, solve for $\phi$.

BVPs: Example
Heat Transfer
- Find a steady-state temperature distribution $T$ in a solid of thermal conductivity $k$ with thermal source $S$
- This requires solving a Poisson equation of the form: $k \nabla^2 T = -S$
- This is a BVP where $L$ is the Laplacian operator $\nabla^2$
All our applications require a Poisson solver.
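
As a worked step, the Poisson equation on this slide can be turned into linear equations on a grid. The form below is a sketch that assumes a uniform grid spacing $h$ and the standard five-point central-difference stencil; the slides do not state which stencil the authors used, and the indices $i, j$ are introduced here for illustration.

```latex
% Five-point discretization of k \nabla^2 T = -S on a uniform grid (assumed spacing h)
k \, \frac{T_{i+1,j} + T_{i-1,j} + T_{i,j+1} + T_{i,j-1} - 4\,T_{i,j}}{h^2} = -S_{i,j}

% Solving for the center value gives the update used by relaxation methods
T_{i,j} = \frac{1}{4}\left( T_{i+1,j} + T_{i-1,j} + T_{i,j+1} + T_{i,j-1} + \frac{h^2}{k}\, S_{i,j} \right)
```

Each interior grid point contributes one such equation, so the whole grid forms a sparse linear system.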

BVPs: Solving
- Most such problems cannot be solved analytically
- Instead, discretize onto a grid to form a set of linear equations, then solve:
  - Direct elimination
  - Gauss-Seidel iteration
  - Conjugate-gradient
  - Strongly implicit procedures
  - Multigrid method
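
To make one of the listed options concrete, here is a minimal CPU sketch of Gauss-Seidel iteration for the heat-transfer Poisson problem. It assumes a square grid with uniform spacing, a five-point stencil, and a temperature fixed at zero on the boundary; the function name `gauss_seidel_poisson` and all parameters are hypothetical and not taken from the slides.

```python
def gauss_seidel_poisson(source, h, k=1.0, sweeps=500):
    """Relax k * laplacian(T) = -S on a square grid.

    Assumptions (not from the slides): uniform spacing h, five-point
    stencil, and T = 0 on the boundary rows and columns.
    """
    n = len(source)
    T = [[0.0] * n for _ in range(n)]
    for _ in range(sweeps):
        # Sweep the interior in place, so each update already sees the
        # new values of neighbors visited earlier in the same sweep.
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                T[i][j] = 0.25 * (T[i - 1][j] + T[i + 1][j]
                                  + T[i][j - 1] + T[i][j + 1]
                                  + h * h * source[i][j] / k)
    return T


# Usage: uniform heat source on a 9x9 grid covering the unit square.
n = 9
S = [[1.0] * n for _ in range(n)]
T = gauss_seidel_poisson(S, h=1.0 / (n - 1))
```

Plain relaxation like this needs many sweeps on large grids, which is the cost that multigrid is meant to reduce.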

Multigrid method
- Iteratively corrects an approximation to the solution
- Operates at multiple grid resolutions
- Low-resolution grids are used to correct higher-resolution grids recursively
- Very fast, especially for large grids: O(n)
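
The O(n) figure can be motivated with a short cost estimate. This sketch assumes a 2D grid with $n$ points at the finest level, coarsening by a factor of two in each dimension, and a constant cost $c$ per grid point per level; it also assumes that the number of cycles needed to reach a given tolerance does not grow with $n$, which is typical for Poisson problems but is not stated on the slide.

```latex
% Work for one V-cycle: each coarser level has one quarter of the points
W_{\text{cycle}} = c\,n \left( 1 + \frac{1}{4} + \frac{1}{16} + \cdots \right)
                 < c\,n \sum_{l=0}^{\infty} 4^{-l}
                 = \frac{4}{3}\, c\, n
```

So all the coarse levels together add at most one third to the cost of working on the finest grid alone.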

Multigrid method
- Use coarser grid levels to recursively correct an approximation to the solution
- Algorithm:
  - smooth
  - residual
  - restrict
  - recurse
  - interpolate
[Figure: stencil weight diagrams; residual computed as $L\phi_i - f$]
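
The five steps can be sketched as a recursive V-cycle. This is a minimal 1D CPU illustration, not the authors' GPU implementation: it assumes the operator is the second derivative with zero boundary values, Gauss-Seidel smoothing, full-weighting restriction, and linear interpolation. All function names are hypothetical. The residual follows the sign convention on the slide (operator applied to the current guess, minus f), so the interpolated correction is subtracted.

```python
def smooth(u, f, h, sweeps):
    # Gauss-Seidel relaxation for (u[i-1] - 2u[i] + u[i+1]) / h^2 = f[i].
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] - h * h * f[i])


def residual(u, f, h):
    # Residual = L(u) - f at interior points, zero on the boundary.
    d = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        d[i] = (u[i - 1] - 2.0 * u[i] + u[i + 1]) / (h * h) - f[i]
    return d


def restrict(d):
    # Full weighting (1/4, 1/2, 1/4) onto a grid with half as many intervals.
    nc = (len(d) - 1) // 2 + 1
    dc = [0.0] * nc
    for I in range(1, nc - 1):
        dc[I] = 0.25 * d[2 * I - 1] + 0.5 * d[2 * I] + 0.25 * d[2 * I + 1]
    return dc


def interpolate(ec, nf):
    # Linear interpolation from the coarse grid to nf fine points.
    ef = [0.0] * nf
    for I in range(len(ec)):
        ef[2 * I] = ec[I]
    for I in range(len(ec) - 1):
        ef[2 * I + 1] = 0.5 * (ec[I] + ec[I + 1])
    return ef


def v_cycle(u, f, h):
    if len(u) == 3:
        # Coarsest grid: one unknown, so one relaxation is an exact solve.
        u[1] = 0.5 * (u[0] + u[2] - h * h * f[1])
        return u
    smooth(u, f, h, 2)                            # smooth
    d = residual(u, f, h)                         # residual
    dc = restrict(d)                              # restrict
    ec = v_cycle([0.0] * len(dc), dc, 2.0 * h)    # recurse: solve L e = d
    ef = interpolate(ec, len(u))                  # interpolate
    for i in range(1, len(u) - 1):
        u[i] -= ef[i]                             # e estimates (u - exact)
    smooth(u, f, h, 2)
    return u
```

The grid size must be of the form 2^m + 1 so that every level can be halved down to three points.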

Implementation
For each step of the algorithm:
- Bind as texture maps the buffers that contain the necessary data
- Set the target buffer for rendering
- Activate a fragment program that performs the necessary kernel computation
- Render a grid-sized quad with multitexturing
[Figure: diagram labeled source buffer, texture, fragment program, render, target buffer]
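
The render-pass pattern above can be mimicked on the CPU to show the data flow. This is only an analogy under stated assumptions: `render_pass` and `add_program` are hypothetical names, textures are plain nested lists, and no real graphics API is involved.

```python
def render_pass(fragment_program, textures, width, height):
    """CPU stand-in for one GPU pass over a grid-sized quad.

    `textures` plays the role of the bound source buffers, the returned
    grid plays the role of the render target, and `fragment_program` is
    called once per output cell, like a fragment program per pixel.
    """
    target = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            target[y][x] = fragment_program(textures, x, y)
    return target


def add_program(textures, x, y):
    # Example kernel: read the same cell from two sources and add them.
    return textures["a"][y][x] + textures["b"][y][x]


# Usage: one pass that combines two source buffers into a new target.
sources = {"a": [[1.0, 2.0], [3.0, 4.0]], "b": [[10.0, 20.0], [30.0, 40.0]]}
result = render_pass(add_program, sources, width=2, height=2)
```

Because the target is a separate buffer from the sources, each output cell can be computed independently, which is what makes the pass parallel on the GPU.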

Optimizing the Solver
- Detect steady-state natively on GPU
- Minimize shader length
- Special-case whenever possible
- Avoid context-switching

Optimizing the Solver: Steady-state
- How to detect convergence?
  - $L_1$ norm: average error
  - $L_2$ norm: RMS error (common in visual sim)
  - $L_\infty$ norm: max error (common in sci/eng apps)
- Can use occlusion query!
[Figure: secs to steady state vs. grid size]
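
The three convergence measures can be written out as follows. This is a CPU sketch with hypothetical function names. The second function illustrates one way a max-error test could map onto an occlusion query, on the assumption that the query reports how many fragments survive a pass that discards cells whose error is within tolerance; the slides do not spell out this mechanism.

```python
import math


def error_norms(error):
    """Return (average, RMS, max) of the absolute error over a 2D grid."""
    values = [abs(v) for row in error for v in row]
    l1 = sum(values) / len(values)                           # average error
    l2 = math.sqrt(sum(v * v for v in values) / len(values)) # RMS error
    linf = max(values)                                       # max error
    return l1, l2, linf


def count_cells_above(error, tolerance):
    # Analogue of an occlusion query: count the cells that would survive
    # a pass discarding every cell whose error is within tolerance.
    return sum(1 for row in error for v in row if abs(v) > tolerance)


def has_converged(error, tolerance):
    # Max error <= tolerance exactly when no cell survives the test.
    return count_cells_above(error, tolerance) == 0
```

The count-based test never needs the actual maximum, only whether any cell exceeds the tolerance, so no reduction over the grid has to be read back.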

Optimizing the Solver: Shader length
- Minimize number of registers used
- Vectorize as much as possible
- Use the rasterizer to perform computations of linearly-varying values
- Pre-compute invariants on CPU
[Table: shaders (smooth, residual, restrict, interpolate) vs. original fp, fastpath fp, fastpath vp; values not recovered]

Optimizing the Solver: Special-case
- Fast-path vs. slow-path
  - write several variants of each fragment program to handle boundary cases
  - eliminates conditionals in the fragment program
  - equivalent to avoiding CPU inner-loop branching
[Figure: slow path with boundaries; fast path, no boundaries]
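
The fast-path and slow-path split can be illustrated on the CPU. In this sketch the general kernel checks for grid edges on every neighbor read, while the fast kernel has no conditionals and is only valid for interior cells. The neighbor-averaging kernel, the clamp-at-the-edge boundary rule, and all names are assumptions chosen for illustration, not the authors' shaders.

```python
def slow_kernel(u, i, j):
    # General variant: every neighbor read checks for the grid edge and
    # falls back to the center value there (an assumed boundary rule).
    n = len(u)
    up = u[i - 1][j] if i > 0 else u[i][j]
    down = u[i + 1][j] if i < n - 1 else u[i][j]
    left = u[i][j - 1] if j > 0 else u[i][j]
    right = u[i][j + 1] if j < n - 1 else u[i][j]
    return 0.25 * (up + down + left + right)


def fast_kernel(u, i, j):
    # Interior-only variant: no conditionals at all.
    return 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])


def smooth_two_path(u):
    n = len(u)
    out = [[0.0] * n for _ in range(n)]
    # Fast path: the branch-free kernel covers the whole interior.
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            out[i][j] = fast_kernel(u, i, j)
    # Slow path: only the one-cell border pays for the edge checks.
    for j in range(n):
        out[0][j] = slow_kernel(u, 0, j)
        out[n - 1][j] = slow_kernel(u, n - 1, j)
    for i in range(1, n - 1):
        out[i][0] = slow_kernel(u, i, 0)
        out[i][n - 1] = slow_kernel(u, i, n - 1)
    return out
```

The choice of kernel is made by which region is being processed, not by a test inside the kernel, so the interior, which holds most of the cells, runs without any branching.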

Optimizing the Solver: Special-case
[Figure: secs per v-cycle vs. grid size]

Optimizing the Solver: Context-switching
- Find the best packing of data from multiple grid levels into the pbuffer surfaces
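
As an illustration of what packing several grid levels into one surface can look like, the sketch below places the finest level at the origin and stacks the coarser levels in a column beside it. This particular layout is an assumption made for the example; the slides show their packings only as figures, and the names `pack_levels` and `overlaps` are hypothetical.

```python
def pack_levels(finest, levels):
    """Return one (x, y, size) square per grid level on a shared surface.

    Assumed layout: level 0 occupies [0, finest) x [0, finest); each
    coarser level is half the size of the previous one and is stacked
    top to bottom in a column starting at x = finest.
    """
    rects = [(0, 0, finest)]
    y = 0
    size = finest // 2
    for _ in range(1, levels):
        if size < 1:
            raise ValueError("too many levels for this grid size")
        rects.append((finest, y, size))
        y += size
        size //= 2
    return rects


def overlaps(a, b):
    # True if two (x, y, size) squares share any cell.
    ax, ay, asz = a
    bx, by, bsz = b
    return ax < bx + bsz and bx < ax + asz and ay < by + bsz and by < ay + asz


# Usage: four levels of an 8x8 grid fit in a surface 12 wide and 8 tall.
layout = pack_levels(8, 4)
```

With all levels on one surface, moving between levels only changes which sub-rectangle is read and written, rather than which surface is active.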

Optimizing the Solver: Context-switching
- Remove context switching
  - Can introduce operations with undefined results: reading/writing same surface
  - Why do we need to do this?
  - Can we get away with it?
  - What about superbuffers?

Data Layout
- Performance:
[Figure: secs to steady state vs. grid size]

Data Layout
- Possible additional vectorization:
  - Compute 4 values at a time
  - Requires source, residual, solution values to be in different buffers
  - Complicates boundary calculations
  - Adds setup and teardown overhead
[Figure: stacked domain]

Results: CPU vs. GPU
- Performance:
[Figure: secs to steady state vs. grid size]

Conclusions
What we need going forward:
- Superbuffers
  - or: Universal support for multiple-surface pbuffers
  - or: Cheap context switching
- Developer tools
  - Debugging tools
  - Documentation
- Global accumulator
- Ever increasing amounts of precision, memory
  - Textures bigger than 2048 on a side

Acknowledgements
- Hardware
  - David Kirk
  - Matt Papakipos
- Driver Support
  - Nick Triantos
  - Pat Brown
  - Stephen Ehmann
- Fragment Programming
  - James Percy
  - Matt Pharr
- General-purpose GPU
  - Mark Harris
  - Aaron Lefohn
  - Ian Buck
- Funding
  - NSF Award #