1
A Multigrid Solver for Boundary Value Problems Using Programmable Graphics Hardware
Nolan Goodnight, Cliff Woolley, Gregory Lewin, David Luebke, Greg Humphreys
University of Virginia
Graphics Hardware 2003, July, San Diego, CA
2
General-Purpose GPU Programming
- Why do we port algorithms to the GPU?
- How much faster can we expect it to be, really?
- What is the challenge in porting?
3
Case Study
Problem: Implement a Boundary Value Problem (BVP) solver using the GPU
Could benefit an entire class of scientific and engineering applications, e.g.:
- Heat transfer
- Fluid flow
4
Related Work
- Krüger and Westermann: Linear Algebra Operators for GPU Implementation of Numerical Algorithms
- Bolz et al.: Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid
  - Very similar to our system
  - Developed concurrently
  - Complementary approach
5
Driving problem: Fluid mechanics sim
Problem domain is a warped disc
[Figure: the warped-disc domain shown alongside a regular grid]
6
BVPs: Background
Boundary value problems are sometimes governed by PDEs of the form: $L\phi = f$
- $L$ is some operator
- $\phi$ is the unknown, defined over the problem domain
- $f$ is a forcing function (source term)
Given $L$ and $f$, solve for $\phi$.
7
BVPs: Example Heat Transfer
Find a steady-state temperature distribution $T$ in a solid of thermal conductivity $k$ with thermal source $S$
This requires solving a Poisson equation of the form: $k\nabla^2 T = -S$
This is a BVP where $L$ is the Laplacian operator $\nabla^2$
All our applications require a Poisson solver.
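To make the Poisson equation above concrete, here is one common way to discretize it. The slides do not state which scheme the solver uses, so the second-order five-point stencil on a uniform grid with spacing $h$ is an assumption made purely for illustration:

```latex
k\,\nabla^2 T = -S
\quad\Longrightarrow\quad
k\,\frac{T_{i+1,j} + T_{i-1,j} + T_{i,j+1} + T_{i,j-1} - 4\,T_{i,j}}{h^2} = -S_{i,j}
\quad\Longrightarrow\quad
T_{i,j} = \frac{1}{4}\left(T_{i+1,j} + T_{i-1,j} + T_{i,j+1} + T_{i,j-1} + \frac{h^2}{k}\,S_{i,j}\right)
```

The last form is the per-grid-point update that a relaxation method would apply repeatedly.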
8
BVPs: Solving
Most such problems cannot be solved analytically
Instead, discretize onto a grid to form a set of linear equations, then solve:
- Direct elimination
- Gauss-Seidel iteration
- Conjugate-gradient
- Strongly implicit procedures
- Multigrid method
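As an illustration of one method from the list above, here is a minimal CPU sketch of Gauss-Seidel iteration for the 2D Poisson problem. It is not the talk's GPU code. The function names, the five-point stencil, and the zero-valued boundary are all assumptions chosen to keep the example small.

```python
def gauss_seidel_poisson(f, h, sweeps):
    """Approximately solve laplacian(u) = f on a square grid, with u = 0 on the boundary.

    f is an n-by-n list of lists, h is the grid spacing (both hypothetical inputs).
    """
    n = len(f)
    u = [[0.0] * n for _ in range(n)]
    for _ in range(sweeps):
        # Update interior points in place, so newer values are reused immediately.
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                  + u[i][j - 1] + u[i][j + 1]
                                  - h * h * f[i][j])
    return u


def max_residual(u, f, h):
    """Largest absolute value of laplacian(u) - f over the interior points."""
    n = len(u)
    worst = 0.0
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            lap = (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                   - 4.0 * u[i][j]) / (h * h)
            worst = max(worst, abs(lap - f[i][j]))
    return worst
```

On a small grid a few hundred sweeps are enough, but the number of sweeps needed grows quickly with grid size. That is the cost multigrid is designed to avoid.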
9
Multigrid method
- Iteratively corrects an approximation to the solution
- Operates at multiple grid resolutions
- Low-resolution grids are used to correct higher-resolution grids recursively
- Very fast, especially for large grids: O(n)
10
Multigrid method
Use coarser grid levels to recursively correct an approximation to the solution
Algorithm:
- smooth
- residual: $\text{residual} = L\phi_i - f$
- restrict
- recurse
- interpolate
[Figure: kernel stencils for the steps; weights shown: 1, -4; 1/16, 1/8, 1/4; 1/4, 1/2, 1]
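The five steps above can be sketched as a recursive V-cycle. This is a minimal 1D CPU illustration under stated assumptions, not the talk's 2D GPU implementation. It solves u'' = f with zero end values, and it uses Gauss-Seidel smoothing, 1/4-1/2-1/4 restriction, and linear interpolation (the 1D analogues of common 2D kernels). All function names are hypothetical.

```python
import math


def smooth(u, f, h, sweeps=2):
    """A few Gauss-Seidel sweeps for u'' = f; end values stay fixed."""
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] - h * h * f[i])


def residual(u, f, h):
    """r = L u - f, using the same sign convention as the slide."""
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = (u[i - 1] - 2.0 * u[i] + u[i + 1]) / (h * h) - f[i]
    return r


def restrict(r):
    """Full weighting onto a grid with half as many intervals."""
    nc = (len(r) - 1) // 2 + 1
    rc = [0.0] * nc
    for i in range(1, nc - 1):
        rc[i] = 0.25 * r[2 * i - 1] + 0.5 * r[2 * i] + 0.25 * r[2 * i + 1]
    return rc


def interpolate(ec):
    """Linear interpolation onto a grid with twice as many intervals."""
    ef = [0.0] * (2 * (len(ec) - 1) + 1)
    for i in range(len(ec)):
        ef[2 * i] = ec[i]
    for i in range(1, len(ef), 2):
        ef[i] = 0.5 * (ef[i - 1] + ef[i + 1])
    return ef


def v_cycle(u, f, h):
    """One V-cycle, updating u in place. len(u) must be 2**k + 1."""
    if len(u) <= 3:
        # Coarsest grid: one interior point, solved exactly.
        u[1] = 0.5 * (u[0] + u[2] - h * h * f[1])
        return
    smooth(u, f, h)
    rc = restrict(residual(u, f, h))
    ec = [0.0] * len(rc)
    v_cycle(ec, rc, 2.0 * h)  # recurse: solve L e = r on the coarser grid
    ef = interpolate(ec)
    for i in range(1, len(u) - 1):
        u[i] -= ef[i]  # r = L u - f, so L (u - e) = f
    smooth(u, f, h)


def solve_sine_demo(n=65, cycles=10):
    """Solve u'' = -pi^2 sin(pi x); return the max error against sin(pi x)."""
    h = 1.0 / (n - 1)
    f = [-(math.pi ** 2) * math.sin(math.pi * i * h) for i in range(n)]
    u = [0.0] * n
    for _ in range(cycles):
        v_cycle(u, f, h)
    return max(abs(u[i] - math.sin(math.pi * i * h)) for i in range(n))
```

Each level does work proportional to its size, and the sizes halve at every level, so one cycle costs O(n) in total.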
11
Implementation
For each step of the algorithm:
- Bind as texture maps the buffers that contain the necessary data
- Set the target buffer for rendering
- Activate a fragment program that performs the necessary kernel computation
- Render a grid-sized quad with multitexturing
[Figure: source buffer textures feed a fragment program, which writes to the render target buffer]
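The pattern above can be emulated on the CPU to show the data flow. The sketch makes no graphics API calls. `render_pass` stands in for drawing the quad, the dictionary of source grids stands in for bound textures, and a Jacobi-style smoothing kernel stands in for the fragment program. The kernel choice, the names, and the pre-scaling of the source term by h² are all assumptions made for illustration.

```python
def render_pass(fragment_program, sources, target):
    """Emulate drawing a grid-sized quad: run the kernel once per output texel."""
    for y in range(len(target)):
        for x in range(len(target[y])):
            target[y][x] = fragment_program(sources, x, y)


def jacobi_smooth(sources, x, y):
    """Hypothetical kernel: reads only from source buffers, returns one texel."""
    u = sources["solution"]
    f = sources["source"]  # assumed to be pre-scaled by h*h
    last = len(u) - 1
    if x == 0 or y == 0 or x == last or y == last:
        return u[y][x]  # boundary texels pass through unchanged
    return 0.25 * (u[y - 1][x] + u[y + 1][x]
                   + u[y][x - 1] + u[y][x + 1] - f[y][x])


def run_smoothing(source, passes):
    """Ping-pong between two buffers so a pass never reads its own output."""
    n = len(source)
    front = [[0.0] * n for _ in range(n)]
    back = [[0.0] * n for _ in range(n)]
    for _ in range(passes):
        render_pass(jacobi_smooth, {"solution": front, "source": source}, back)
        front, back = back, front
    return front
```

Because each pass writes to a different buffer from the ones it reads, every texel can be computed independently. That independence is what lets the work map onto fragment hardware.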
12
Optimizing the Solver
- Detect steady-state natively on GPU
- Minimize shader length
- Special-case whenever possible
- Avoid context-switching
13
Optimizing the Solver: Steady-state
How to detect convergence?
- $L_1$ norm: average error
- $L_2$ norm: RMS error (common in visual sim)
- $L_\infty$ norm: max error (common in sci/eng apps)
Can use occlusion query!
[Figure: seconds to steady state vs. grid size]
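The three norms above are easy to compare on the CPU. The sketch below also emulates one way an occlusion query could serve the max-error test: count the grid points whose error exceeds a tolerance, and treat a count of zero as convergence. The slide does not spell out how the query is used, so that reading and the function names are assumptions.

```python
import math


def error_norms(residual):
    """Return (average error, RMS error, max error) for a 2D grid of residuals."""
    values = [abs(v) for row in residual for v in row]
    l1 = sum(values) / len(values)
    l2 = math.sqrt(sum(v * v for v in values) / len(values))
    linf = max(values)
    return l1, l2, linf


def count_unconverged(residual, tolerance):
    """Emulated occlusion query: how many texels still exceed the tolerance."""
    return sum(1 for row in residual for v in row if abs(v) > tolerance)
```

A count of zero is equivalent to the max error being within tolerance, and it can be read back as a single integer rather than the whole grid.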
14
Optimizing the Solver: Shader length
- Minimize number of registers used
- Vectorize as much as possible
- Use the rasterizer to perform computations of linearly-varying values
- Pre-compute invariants on CPU

| shader | original fp | fastpath fp | fastpath vp |
|---|---|---|---|
| smooth | 79-6-1 | 20-4-1 | 12-2 |
| residual | 45-7-0 | 16-4-0 | 11-1 |
| restrict | 66-6-1 | 21-3-0 | |
| interpolate | 93-6-1 | 25-3-0 | 13-2 |
15
Optimizing the Solver: Special-case
Fast-path vs. slow-path
- write several variants of each fragment program to handle boundary cases
- eliminates conditionals in the fragment program
- equivalent to avoiding CPU inner-loop branching
[Figure: fast path, no boundaries; slow path with boundaries]
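The CPU analogy in the last bullet can be sketched directly. Below, the slow path tests for the boundary at every grid point, while the fast path runs a branch-free loop over the interior and handles the edges in separate loops. The smoothing formula, the pass-through boundary handling, and the function names are assumptions made for illustration.

```python
def relax(u, f, x, y):
    """Interior update shared by both variants (source term pre-scaled by h*h)."""
    return 0.25 * (u[y - 1][x] + u[y + 1][x]
                   + u[y][x - 1] + u[y][x + 1] - f[y][x])


def smooth_slow_path(u, f):
    """One kernel for every texel, with a boundary test in the inner loop."""
    n = len(u)
    out = [[0.0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            if x == 0 or y == 0 or x == n - 1 or y == n - 1:
                out[y][x] = u[y][x]
            else:
                out[y][x] = relax(u, f, x, y)
    return out


def smooth_fast_path(u, f):
    """Branch-free interior loop, plus separate variants for the four edges."""
    n = len(u)
    out = [[0.0] * n for _ in range(n)]
    for y in range(1, n - 1):
        for x in range(1, n - 1):
            out[y][x] = relax(u, f, x, y)
    for x in range(n):  # top and bottom edges
        out[0][x] = u[0][x]
        out[n - 1][x] = u[n - 1][x]
    for y in range(n):  # left and right edges
        out[y][0] = u[y][0]
        out[y][n - 1] = u[y][n - 1]
    return out
```

Both variants produce the same grid. The fast path simply moves the boundary decision out of the per-texel code and into the choice of which region to draw.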
16
Optimizing the Solver: Special-case
Fast-path vs. slow-path
- write several variants of each fragment program to handle boundary cases
- eliminates conditionals in the fragment program
- equivalent to avoiding CPU inner-loop branching
[Figure: seconds per V-cycle vs. grid size]
17
Optimizing the Solver: Context-switching
Find the best packing of data from multiple grid levels into the pbuffer surfaces
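The slide's packing diagrams did not survive in this transcript, so the layout below is purely hypothetical. It shows one simple way several grid levels could share a single surface: the finest level sits at the origin and the coarser levels are stacked in a column to its right. The function name and the power-of-two sizes are also assumptions.

```python
def pack_levels(finest, levels):
    """Return ([(x, y, size), ...], (surface_width, surface_height)).

    Hypothetical layout: finest level at (0, 0); each coarser level is half
    the size of the previous one and is placed below it in a column at x = finest.
    """
    rects = [(0, 0, finest)]
    y = 0
    size = finest // 2
    for _ in range(1, levels):
        rects.append((finest, y, size))
        y += size
        size //= 2
    width = finest + (finest // 2 if levels > 1 else 0)
    return rects, (width, finest)
```

For a 256-wide finest grid with four levels, this places every level inside a single 384 by 256 surface, so moving between levels changes only texture coordinates, not the render context.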
20
Optimizing the Solver: Context-switching
Remove context switching
- Can introduce operations with undefined results: reading/writing same surface
- Why do we need to do this?
- Can we get away with it?
- What about superbuffers?
21
Data Layout
Performance:
[Figure: seconds to steady state vs. grid size]
22
Data Layout
Possible additional vectorization:
- Compute 4 values at a time
- Requires source, residual, solution values to be in different buffers
- Complicates boundary calculations
- Adds setup and teardown overhead
[Figure: stacked domain]
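One reading of "stacked domain", offered here only as an assumption, is that the grid is cut into four horizontal strips and each strip is stored in one channel of an RGBA texel, so a single kernel invocation updates four grid points. The hypothetical helpers below sketch that packing and its inverse on the CPU.

```python
def stack_domain(grid):
    """Fold a grid with 4*k rows into k rows of 4-tuples, one strip per channel."""
    rows = len(grid)
    if rows % 4 != 0:
        raise ValueError("row count must be a multiple of 4")
    k = rows // 4
    width = len(grid[0])
    return [[tuple(grid[c * k + y][x] for c in range(4)) for x in range(width)]
            for y in range(k)]


def unstack_domain(stacked):
    """Inverse of stack_domain: spread the four channels back into strips."""
    k = len(stacked)
    width = len(stacked[0])
    return [[stacked[y % k][x][y // k] for x in range(width)]
            for y in range(4 * k)]
```

In this layout, a grid point on the last row of one strip has its lower neighbour in the first row of the next channel rather than in the adjacent texel. That is one way to see why the slide says vectorization complicates boundary calculations.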
23
Results: CPU vs. GPU
Performance:
[Figure: seconds to steady state vs. grid size]
24
Conclusions
What we need going forward:
- Superbuffers
  - or: Universal support for multiple-surface pbuffers
  - or: Cheap context switching
- Developer tools
  - Debugging tools
  - Documentation
- Global accumulator
- Ever increasing amounts of precision, memory
- Textures bigger than 2048 on a side
25
Acknowledgements
- Hardware: David Kirk, Matt Papakipos
- Driver Support: Nick Triantos, Pat Brown, Stephen Ehmann
- Fragment Programming: James Percy, Matt Pharr
- General-purpose GPU: Mark Harris, Aaron Lefohn, Ian Buck
- Funding: NSF Award #