Published by Frederica French. Modified over 9 years ago.
1
A Multigrid Solver for Boundary Value Problems Using Programmable Graphics Hardware
Nolan Goodnight, Cliff Woolley, Gregory Lewin, David Luebke, Greg Humphreys
University of Virginia
Graphics Hardware 2003, July 26-27, San Diego, CA
2
General-Purpose GPU Programming
- Why do we port algorithms to the GPU?
- How much faster can we expect it to be, really?
- What is the challenge in porting?
3
Case Study
Problem: Implement a Boundary Value Problem (BVP) solver using the GPU
Could benefit an entire class of scientific and engineering applications, e.g.:
- Heat transfer
- Fluid flow
4
Related Work
- Krüger and Westermann: Linear Algebra Operators for GPU Implementation of Numerical Algorithms
- Bolz et al.: Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid
  - Very similar to our system
  - Developed concurrently
  - Complementary approach
5
Driving problem: Fluid mechanics sim
Problem domain is a warped disc: regular grid
6
BVPs: Background
- Boundary value problems are sometimes governed by PDEs of the form: $L\phi = f$
  - $L$ is some operator
  - $\phi$ is the unknown, defined over the problem domain
  - $f$ is a forcing function (source term)
- Given $L$ and $f$, solve for $\phi$.
7
BVPs: Example Heat Transfer
- Find a steady-state temperature distribution $T$ in a solid of thermal conductivity $k$ with thermal source $S$
- This requires solving a Poisson equation of the form: $k \nabla^2 T = -S$
- This is a BVP where $L$ is the Laplacian operator $\nabla^2$
All our applications require a Poisson solver.
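As a worked illustration of how this Poisson equation turns into grid arithmetic, the standard 5-point finite-difference form can be written out. The uniform grid spacing $h$ and the cell indices $i, j$ are assumptions introduced here; the slides do not state which discretization was used.

```latex
\nabla^2 T \Big|_{i,j} \approx
  \frac{T_{i+1,j} + T_{i-1,j} + T_{i,j+1} + T_{i,j-1} - 4\,T_{i,j}}{h^2}
\quad\Longrightarrow\quad
T_{i,j} = \frac{1}{4}\left( T_{i+1,j} + T_{i-1,j} + T_{i,j+1} + T_{i,j-1}
  + \frac{h^2}{k}\, S_{i,j} \right)
```

The right-hand form follows from substituting the stencil into $k \nabla^2 T = -S$ and solving for $T_{i,j}$; it is the per-cell update that relaxation methods apply repeatedly.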
8
BVPs: Solving
- Most such problems cannot be solved analytically
- Instead, discretize onto a grid to form a set of linear equations, then solve:
  - Direct elimination
  - Gauss-Seidel iteration
  - Conjugate-gradient
  - Strongly implicit procedures
  - Multigrid method
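To make the "discretize, then solve" step concrete, here is a minimal CPU sketch of Gauss-Seidel iteration for the heat-transfer Poisson equation $k \nabla^2 T = -S$. It assumes a square grid with uniform spacing `h`, a 5-point stencil, and a boundary held at zero; the function name and parameters are hypothetical and this is not the authors' GPU code.

```python
def gauss_seidel_poisson(source, k=1.0, h=1.0, sweeps=100):
    """Relax k * laplacian(T) = -S on a square grid with T = 0 on the boundary.

    Assumed discretization: 5-point stencil with uniform spacing h.
    """
    n = len(source)
    T = [[0.0] * n for _ in range(n)]
    for _ in range(sweeps):
        # Gauss-Seidel: each update immediately reuses neighbours already
        # updated during the current sweep.
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                neighbours = T[i - 1][j] + T[i + 1][j] + T[i][j - 1] + T[i][j + 1]
                T[i][j] = 0.25 * (neighbours + h * h * source[i][j] / k)
    return T
```

On its own, this kind of relaxation removes high-frequency error quickly but converges slowly on large grids, which is the motivation for the multigrid method listed last.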
9
Multigrid method
- Iteratively corrects an approximation to the solution
- Operates at multiple grid resolutions
- Low-resolution grids are used to correct higher-resolution grids recursively
- Very fast, especially for large grids: O(n)
10
Multigrid method
- Use coarser grid levels to recursively correct an approximation to the solution
- Algorithm:
  - smooth
  - residual: $L\phi_i - f$
  - restrict
  - recurse
  - interpolate
[Figure: kernel stencil weights: 1, 1, 1, 1, -4; 1/8, 1/4, 1/16; 1/2, 1, 1/4]
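The five steps above can be sketched as a recursive V-cycle. To keep it short, this is a 1D CPU illustration for $-u'' = f$ with zero boundary values, not the authors' 2D GPU implementation. The Gauss-Seidel smoother, the full-weighting restriction (1/4, 1/2, 1/4), the linear interpolation, the residual sign convention $f - Lu$, and all function names are assumptions made for the example. Grids must have $2^m + 1$ points.

```python
import math

def smooth(u, f, h, sweeps=2):
    """Gauss-Seidel sweeps for -u'' = f; end values stay fixed."""
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return u

def residual(u, f, h):
    """r = f - L u, with L the 3-point discrete operator for -u''."""
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r

def restrict(fine):
    """Full weighting onto a grid with half as many intervals."""
    nc = (len(fine) - 1) // 2 + 1
    coarse = [0.0] * nc
    for i in range(1, nc - 1):
        coarse[i] = 0.25 * fine[2 * i - 1] + 0.5 * fine[2 * i] + 0.25 * fine[2 * i + 1]
    return coarse

def interpolate(coarse):
    """Linear interpolation onto a grid with twice as many intervals."""
    fine = [0.0] * (2 * (len(coarse) - 1) + 1)
    for i in range(len(coarse)):
        fine[2 * i] = coarse[i]
    for i in range(len(coarse) - 1):
        fine[2 * i + 1] = 0.5 * (coarse[i] + coarse[i + 1])
    return fine

def v_cycle(u, f, h):
    """One V-cycle: smooth, residual, restrict, recurse, interpolate, smooth."""
    if len(u) <= 3:
        # Coarsest grid has a single interior point: solve it exactly.
        u[1] = 0.5 * (u[0] + u[2] + h * h * f[1])
        return u
    smooth(u, f, h)
    coarse_rhs = restrict(residual(u, f, h))
    # Recurse on the error equation L e = r, starting from a zero guess.
    coarse_err = v_cycle([0.0] * len(coarse_rhs), coarse_rhs, 2.0 * h)
    correction = interpolate(coarse_err)
    for i in range(1, len(u) - 1):
        u[i] += correction[i]
    return smooth(u, f, h)

def solve_demo(n=65, cycles=10):
    """Max error against the exact solution sin(pi x) of -u'' = pi^2 sin(pi x)."""
    h = 1.0 / (n - 1)
    f = [math.pi ** 2 * math.sin(math.pi * i * h) for i in range(n)]
    u = [0.0] * n
    for _ in range(cycles):
        v_cycle(u, f, h)
    return max(abs(u[i] - math.sin(math.pi * i * h)) for i in range(n))
```

Calling `solve_demo()` runs ten V-cycles on a 65-point grid and returns the remaining error against the analytic solution, which should be dominated by the discretization error rather than by the iteration.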
11
Implementation
For each step of the algorithm:
- Bind as texture maps the buffers that contain the necessary data
- Set the target buffer for rendering
- Activate a fragment program that performs the necessary kernel computation
- Render a grid-sized quad with multitexturing
[Figure: source buffer texture, fragment program, render target buffer]
12
Optimizing the Solver
- Detect steady-state natively on GPU
- Minimize shader length
- Special-case whenever possible
- Avoid context-switching
13
Optimizing the Solver: Steady-state
- How to detect convergence?
  - $L_1$ norm: average error
  - $L_2$ norm: RMS error (common in visual sim)
  - $L_\infty$ norm: max error (common in sci/eng apps)
- Can use occlusion query!
[Figure: secs to steady state vs. grid size]
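The three norms, as described on the slide (average, RMS, max), can be computed on the CPU as in the sketch below. The `has_converged` helper is one plausible reading of the occlusion-query idea, stated here as an assumption: count the cells whose error still exceeds a tolerance and declare convergence when that count is zero, which amounts to an $L_\infty$ test. Both function names are hypothetical.

```python
import math

def error_norms(residual):
    """Return (average, RMS, max) of the absolute values in a 2D grid."""
    values = [abs(v) for row in residual for v in row]
    l1 = sum(values) / len(values)                            # average error
    l2 = math.sqrt(sum(v * v for v in values) / len(values))  # RMS error
    linf = max(values)                                        # max error
    return l1, l2, linf

def has_converged(residual, tolerance):
    """L-infinity test phrased as a count, in the spirit of an occlusion query.

    Assumption: on the GPU, cells within tolerance would be discarded and the
    query would report how many remain; here the same count is done in a loop.
    """
    failing = sum(1 for row in residual for v in row if abs(v) > tolerance)
    return failing == 0
```

Phrasing the test as a count matters because a count is a single scalar, so the convergence check does not require reading the whole grid back to the CPU.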
14
Optimizing the Solver: Shader length
- Minimize number of registers used
- Vectorize as much as possible
- Use the rasterizer to perform computations of linearly-varying values
- Pre-compute invariants on CPU

shader        original fp   fastpath fp   fastpath vp
smooth        79-6-1        20-4-1        12-2
residual      45-7-0        16-4-0        11-1
restrict      66-6-1        21-3-0        11-1
interpolate   93-6-1        25-3-0        13-2
15
Optimizing the Solver: Special-case
- Fast-path vs. slow-path
  - write several variants of each fragment program to handle boundary cases
  - eliminates conditionals in the fragment program
  - equivalent to avoiding CPU inner-loop branching
[Figure: slow path with boundaries; fast path, no boundaries]
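The CPU analogy can be sketched directly. In the example below, a general "slow path" cell kernel clamps every neighbour index so it is valid anywhere, while a "fast path" kernel assumes an interior cell and skips all checks. The clamped lookup is only a stand-in boundary rule chosen for illustration, and the Jacobi update and all names are assumptions rather than the authors' fragment programs.

```python
def cell_slow(u, f, h, i, j):
    """Slow path: clamp each neighbour index, so any cell is safe."""
    n = len(u)
    up, down = u[max(i - 1, 0)][j], u[min(i + 1, n - 1)][j]
    left, right = u[i][max(j - 1, 0)], u[i][min(j + 1, n - 1)]
    return 0.25 * (up + down + left + right + h * h * f[i][j])

def cell_fast(u, f, h, i, j):
    """Fast path: no index checks; valid only for interior cells."""
    return 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                   + h * h * f[i][j])

def jacobi_pass_branching(u, f, h):
    """Single kernel everywhere: every cell pays for the boundary handling."""
    n = len(u)
    return [[cell_slow(u, f, h, i, j) for j in range(n)] for i in range(n)]

def jacobi_pass_split(u, f, h):
    """Two kernels: fast path on the interior, slow path on the border ring."""
    n = len(u)
    out = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            out[i][j] = cell_fast(u, f, h, i, j)
    for j in range(n):                      # top and bottom rows
        out[0][j] = cell_slow(u, f, h, 0, j)
        out[n - 1][j] = cell_slow(u, f, h, n - 1, j)
    for i in range(1, n - 1):               # left and right columns
        out[i][0] = cell_slow(u, f, h, i, 0)
        out[i][n - 1] = cell_slow(u, f, h, i, n - 1)
    return out
```

Both passes produce the same grid; the split version simply moves the boundary logic out of the loop that covers most of the cells, which on the GPU corresponds to drawing the interior and the border with different fragment programs.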
16
Optimizing the Solver: Special-case
[Figure: secs per v-cycle vs. grid size]
17
Optimizing the Solver: Context-switching
- Find the best packing of data from multiple grid levels into the pbuffer surfaces
20
Optimizing the Solver: Context-switching
- Remove context switching
  - Can introduce operations with undefined results: reading/writing same surface
  - Why do we need to do this?
  - Can we get away with it?
  - What about superbuffers?
21
Data Layout
- Performance:
[Figure: secs to steady state vs. grid size]
22
Data Layout
- Possible additional vectorization:
  - Compute 4 values at a time
  - Requires source, residual, solution values to be in different buffers
  - Complicates boundary calculations
  - Adds setup and teardown overhead
[Figure: stacked domain]
23
Results: CPU vs. GPU
- Performance:
[Figure: secs to steady state vs. grid size]
24
Conclusions
What we need going forward:
- Superbuffers
  - or: Universal support for multiple-surface pbuffers
  - or: Cheap context switching
- Developer tools
- Debugging tools
- Documentation
- Global accumulator
- Ever increasing amounts of precision, memory
- Textures bigger than 2048 on a side
25
Acknowledgements
- Hardware
  - David Kirk
  - Matt Papakipos
- Driver Support
  - Nick Triantos
  - Pat Brown
  - Stephen Ehmann
- Fragment Programming
  - James Percy
  - Matt Pharr
- General-purpose GPU
  - Mark Harris
  - Aaron Lefohn
  - Ian Buck
- Funding
  - NSF Award #0092793