Published by Jeremiah Stain. Modified over 10 years ago.
1
Photon Mapping on Programmable Graphics Hardware
Timothy J. Purcell, Mike Cammarano, Pat Hanrahan (Stanford University)
Craig Donner, Henrik Wann Jensen (University of California, San Diego)
2
Motivation
3
Motivation
- Interactive global illumination on the GPU
- Nearly have sufficient compute power and flexibility
- Explore GPU-based computation algorithms
4
Related Work
- CPU-based interactive global illumination
  - Supercomputers [Parker et al.]
  - Clusters [Tole et al., Wald et al.]
- Global illumination on programmable GPUs
  - Ray tracing [Carr et al., Purcell et al.]
  - Photon mapping [Ma et al.]
  - Radiosity [Carr et al., Coombe et al.]
  - Translucency [Carr et al., Stamminger et al.]
5
Photon Mapping Algorithm Review
- Photon tracing
  - Emission, scattering, storing into kd-tree
  - Similar to ray tracing
- Rendering
  - Ray tracing for direct illumination
  - Photon map visualization
  - Indirect bounce
6
Computational Challenge for GPUs #1
- Constructing an irregular or sparse data structure
7
Computational Challenge for GPUs #2
- Adaptive nearest neighbor search
- Noise vs. blur
8
Computational Challenge for GPUs #2
- Adaptive nearest neighbor search
- Noise vs. blur
9
Photon Mapping on the CPU
- Balanced kd-tree
  - Compact storage of photons
  - Efficient O(log n) search
- Priority queue
  - Nearest neighbor search
  - Incremental insertion and removal of photons
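For reference, the CPU approach can be sketched in Python: a balanced kd-tree built by median splits, queried with a bounded max-heap that plays the role of the priority queue. This is a minimal illustration assuming photons are plain (x, y, z) tuples; `build_kdtree` and `knn` are hypothetical names, and a production photon map would store the tree more compactly than nested dicts.

```python
import heapq


def build_kdtree(points, depth=0):
    """Build a balanced kd-tree by splitting at the median, cycling x, y, z."""
    if not points:
        return None
    axis = depth % 3
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "point": pts[mid],
        "axis": axis,
        "left": build_kdtree(pts[:mid], depth + 1),
        "right": build_kdtree(pts[mid + 1:], depth + 1),
    }


def knn(tree, query, k):
    """Return the k photons nearest to query, closest first."""
    # Bounded max-heap keyed on negated squared distance: heap[0] is the farthest kept photon.
    heap = []

    def visit(node):
        if node is None:
            return
        p = node["point"]
        d2 = sum((a - b) ** 2 for a, b in zip(p, query))
        if len(heap) < k:
            heapq.heappush(heap, (-d2, p))       # incremental insertion
        elif d2 < -heap[0][0]:
            heapq.heapreplace(heap, (-d2, p))    # remove the farthest, insert the closer one
        axis = node["axis"]
        diff = query[axis] - p[axis]
        near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
        visit(near)
        # Cross the splitting plane only if it could hold a closer photon.
        if len(heap) < k or diff * diff < -heap[0][0]:
            visit(far)

    visit(tree)
    return [p for _, p in sorted(heap, reverse=True)]
```

The heap is what the GPU version lacks: it needs per-query mutable state that grows and shrinks during the traversal.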
10
Algorithmic Changes for the GPU
- Direct visualization of photon map
  - Keeps rendering costs low
- Use grid instead of kd-tree
  - Tried kd-tree…
  - Kd-tree construction is difficult
  - Radiance estimate
    - Fixed radius search works fine
    - Adaptive search needs priority queue
- No priority queue
  - Can’t build on GPU
  - Too much state
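Both search strategies feed the standard photon map radiance estimate, shown here in its usual density-estimation form for context (it is not reproduced from the slides): the flux of the k photons found is weighted by the BRDF and divided by the area of the disc of radius r that encloses them.

```latex
L_r(x, \vec{\omega}) \approx \frac{1}{\pi r^2} \sum_{p=1}^{k} f_r(x, \vec{\omega}_p, \vec{\omega})\, \Delta\Phi_p(x, \vec{\omega}_p)
```

A fixed-radius search holds r constant, so k varies from one sample point to the next; an adaptive search grows r until k photons are enclosed, which is the step that needs a priority queue on the CPU.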
11
Contributions
- Mapped complete grid-based photon mapping algorithm onto the GPU
  - Including photon tracing, ray tracing, etc.
- Implemented an adaptive k-nearest neighbor search
  - kNN-grid
- Show how to construct a sparse data structure on the GPU
  - Bitonic merge sort with binary search
  - Stencil routing
12
Configuring the GPU for Computing
- GPU as data parallel compute engine
  - Fragment programs execute compute kernels
  - Screen-sized quad initializes computation
  - SIMD execution
- Floating point texture memory
  - Render-to-texture for intermediate results
  - Data structure storage
  - Pointer dereferencing via dependent fetches
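To make this compute model concrete, here is a small CPU emulation in Python. It assumes nested lists stand in for floating point textures, and the helper names (`run_pass`, `dereference`, `double`) are hypothetical; it only illustrates the gather-only structure (each kernel invocation reads anything, writes only its own pixel), not real GPU code.

```python
def run_pass(width, height, kernel, *textures):
    """Emulate one rendering pass: a screen-sized quad runs `kernel` once per output pixel.

    A kernel may read (gather) from any input texture but writes only its own pixel;
    the returned 2D list plays the role of a render-to-texture target.
    """
    return [[kernel(x, y, *textures) for x in range(width)] for y in range(height)]


def dereference(x, y, pointers, data):
    """Pointer dereferencing via a dependent fetch: the first read yields an address."""
    px, py = pointers[y][x]      # first texture fetch: a texel holding coordinates
    return data[py][px]          # second fetch depends on the first one's result


def double(x, y, tex):
    """A trivial kernel that consumes the previous pass's output."""
    return 2 * tex[y][x]


data = [[10, 20],
        [30, 40]]
pointers = [[(1, 1), (0, 0)],
            [(1, 0), (0, 1)]]
gathered = run_pass(2, 2, dereference, pointers, data)   # [[40, 10], [20, 30]]
result = run_pass(2, 2, double, gathered)                # [[80, 20], [40, 60]]
```

Chaining `run_pass` calls is the emulated equivalent of render-to-texture: each pass's output becomes a read-only input to the next.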
13
Computational Challenge #1
Building a Sparse Data Structure
14
- Requires scatter
  - Dependent texture write
- Why don’t we have fragment scatter?
  - Fragment processing has highly coherent blocked memory writes
  - Extra hardware support would be needed
    - Write hazards
    - Memory latencies
15
Scatter on the GPU
- Sort photons into grid cells
  - Grid cell is sort key
- Simulate scatter with fragment programs
  - Bitonic merge sort followed by binary search
  - Compact grid
  - O(log² n) rendering passes
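Sorting needs one key per photon. A minimal Python sketch of computing that key, assuming a uniform grid with a given `origin`, cubic cells of edge `cell_size`, and `res` cells per axis flattened x-fastest; clamping photons that fall outside the grid into border cells is an assumption of this sketch, not something the slides specify, and Python's built-in sort stands in for the GPU sort.

```python
import math


def grid_key(pos, origin, cell_size, res):
    """Return the flattened index of the uniform-grid cell containing pos.

    Positions outside the grid bounds are clamped into the nearest boundary cell
    (an assumption of this sketch).
    """
    idx = []
    for axis in range(3):
        i = math.floor((pos[axis] - origin[axis]) / cell_size)
        idx.append(min(res[axis] - 1, max(0, i)))
    ix, iy, iz = idx
    return ix + res[0] * (iy + res[1] * iz)


def sort_photons_by_cell(photons, origin, cell_size, res):
    """Pair each photon id with its cell key and sort, so a cell's photons become contiguous.

    Python's sort stands in here for the GPU's bitonic merge sort.
    """
    keyed = [(grid_key(p, origin, cell_size, res), i) for i, p in enumerate(photons)]
    return sorted(keyed)
```

After sorting, each cell's photons occupy one contiguous run of the list, so the grid only needs to store where each run starts.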
16
Bitonic Merge Sort
[Figure: eight values taken through successive compare-and-swap stages into sorted order 1 2 3 4 5 6 7 8]
- O(log² n) rendering passes
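The bitonic network maps onto fragment programs because every stage is a fixed compare-exchange pattern: each output element reads its partner at index i XOR j and keeps either the min or the max. A Python sketch under that framing, with one loop iteration standing in for one rendering pass (function names are hypothetical):

```python
def bitonic_sort_passes(values):
    """Sort a power-of-two-length list with a bitonic merge network.

    Each (k, j) stage is written as a gather: every output slot i reads its
    partner i ^ j and keeps the min or the max, the way one rendering pass would.
    Returns (sorted list, number of passes).
    """
    n = len(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    data = list(values)
    passes = 0
    k = 2
    while k <= n:                       # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:                   # compare distance within this merge
            out = []
            for i in range(n):          # one "fragment" per element
                partner = i ^ j
                ascending = (i & k) == 0
                keep_min = (i < partner) == ascending
                a, b = data[i], data[partner]
                out.append(min(a, b) if keep_min else max(a, b))
            data = out
            passes += 1
            j //= 2
        k *= 2
    return data, passes


def bitonic_pass_count(n):
    """Passes needed for n = 2**m elements: m * (m + 1) / 2, i.e. O(log^2 n)."""
    m = n.bit_length() - 1
    return m * (m + 1) // 2
```

For example, `bitonic_sort_passes([3, 7, 4, 8, 6, 2, 1, 5])` returns `([1, 2, 3, 4, 5, 6, 7, 8], 6)`. To sort photons, the elements would be (cell key, photon id) pairs, which compare by key first. Taking 65K photons as 2^16, `bitonic_pass_count(65536)` gives 136 passes.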
17
Binary Search
- Grid cell searches for self in photon list
  - If none, find first element in next cell
- Empty grid cells waste compute
- Log(n) + 1 steps
18
Binary Search
[Figure: sorted photon list; a grid cell searching for the first v5 photon. Stage shown: initialize]
19
Binary Search
[Figure: sorted photon list; a grid cell searching for the first v5 photon. Stages shown: initialize, step 1]
20
Binary Search
[Figure: sorted photon list; a grid cell searching for the first v5 photon. Stages shown: initialize, step 1, step 2]
21
Binary Search
[Figure: sorted photon list; a grid cell searching for the first v5 photon. Stages shown: initialize, step 1, step 2, step 3]
22
Binary Search
[Figure: sorted photon list; a grid cell searching for the first v5 photon. Stages shown: initialize, step 1, step 2, step 3, step 4]
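The per-cell search is a lower-bound binary search. A Python sketch, assuming cells are numbered 0 … num_cells - 1 and the photon list is already sorted by cell key (function names are hypothetical); each inner iteration corresponds to one of the log(n) + 1 steps, and every cell runs all of them, which is why empty cells still cost compute.

```python
def first_index_per_cell(sorted_keys, num_cells):
    """For each grid cell c, find the index of the first photon whose key is >= c.

    An empty cell therefore lands on the first photon of the next non-empty cell.
    Every cell runs the same fixed number of halving steps, as one fragment
    program per cell would, instead of stopping early.
    """
    n = len(sorted_keys)
    steps = max(1, n.bit_length())     # floor(log2 n) + 1 halvings always suffice
    starts = []
    for cell in range(num_cells):      # one "fragment" per grid cell
        lo, hi = 0, n                  # the answer always lies in [lo, hi]
        for _ in range(steps):
            if lo < hi:
                mid = (lo + hi) // 2
                if sorted_keys[mid] < cell:
                    lo = mid + 1
                else:
                    hi = mid
        starts.append(lo)
    return starts


def compact_grid(sorted_keys, num_cells):
    """Return (first photon index, photon count) for every grid cell."""
    starts = first_index_per_cell(sorted_keys, num_cells)
    ends = starts[1:] + [len(sorted_keys)]
    return [(s, e - s) for s, e in zip(starts, ends)]
```

For sorted keys `[0, 0, 2, 2, 2, 5, 5]` and six cells, `compact_grid` returns `[(0, 2), (2, 0), (2, 3), (5, 0), (5, 0), (5, 2)]`: cells 1, 3 and 4 are empty and point at the first photon of the next occupied cell.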
23
Scatter on the GPU
- Vertex programs can scatter
  - Draw point to buffer
  - Collisions?
24
Scatter on the GPU
- Vertex programs can scatter
  - Draw point to buffer
  - Collisions?
- Stencil routing
  - Limit photon count per grid cell
    - Pre-allocate grid cell space
  - Draw photons as points
    - Vertex program computes grid cell
  - Stencil buffer controls location within cell
  - Single rendering pass
25
Stencil Routing
- Fix each grid cell size to n² pixels
- Draw fat points that cover an entire grid cell
  - glPointSize(n)
[Figure: Vertex(photon_pos) → vertex program → flattened grid; each grid cell is 4 pixels]
26
Stencil Routing
- Control location written to with stencil
  - Pass when stencil is n² - 1
  - Stencil always increments
  - Location written depends on draw order
[Figure: Vertex(photon_pos) → vertex program → flattened grid of 4-pixel cells, one stencil value per pixel. Stencil values in a cell start at 0 1 / 2 3 and become 1 2 / 3 4 after one photon is drawn; untouched cells stay at 0 1 / 2 3]
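Stencil routing can be simulated on the CPU to show why exactly one photon lands in each pixel. This Python sketch assumes depth testing is off, each cell's stencil values start at 0 … n² - 1 in row-major order (as in the figure's 0 1 / 2 3), and the increment saturates the way OpenGL's GL_INCR does on an 8-bit stencil buffer; `stencil_route` and its arguments are hypothetical names.

```python
def stencil_route(photon_cells, num_cells, n):
    """Simulate stencil routing into a grid whose cells are n x n pixels.

    photon_cells[i] is the grid cell of photon i, and list order is draw order.
    Returns, per cell, the photon id written to each of its n*n pixels (None if unused).
    """
    target = n * n - 1
    # Within every cell the stencil starts at 0 .. n*n - 1 (row-major, like 0 1 / 2 3 for n = 2).
    stencil = [list(range(n * n)) for _ in range(num_cells)]
    slots = [[None] * (n * n) for _ in range(num_cells)]
    for photon_id, cell in enumerate(photon_cells):
        # The fat point (glPointSize(n)) covers every pixel of the photon's cell.
        for px in range(n * n):
            if stencil[cell][px] == target:          # stencil test passes only at n*n - 1
                slots[cell][px] = photon_id          # the photon is written to this pixel
            stencil[cell][px] = min(stencil[cell][px] + 1, 255)  # always increment, saturating
    return slots
```

With n = 2, the first four photons drawn into a cell land in the pixels whose stencil started at 3, 2, 1 and 0; a fifth photon finds no pixel at 3 and is dropped, which is how the fixed per-cell photon limit is enforced in a single pass.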
27
Computational Challenge #2
Adaptive Nearest Neighbor Search
28
- Iterative algorithm
  - Accept or reject photons in cell visit order
29
kNN-grid Algorithm
[Figure legend: sample point, photons in estimate, candidate photon. Want a 4-photon estimate.]
30
kNN-grid Algorithm
- Candidate photons must be within max search radius
- Visit voxels in order of distance to sample point
[Figure legend: sample point, photons in estimate, candidate photon. Want a 4-photon estimate.]
31
kNN-grid Algorithm
- If current number of photons in estimate is less than number requested, grow search radius
[Figure legend: sample point, photons in estimate, candidate photon. Want a 4-photon estimate. Photons in estimate: 1]
32
kNN-grid Algorithm
- If current number of photons in estimate is less than number requested, grow search radius
[Figure legend: sample point, photons in estimate, candidate photon. Want a 4-photon estimate. Photons in estimate: 2]
33
kNN-grid Algorithm
- Don’t add photons outside maximum search radius
- Don’t grow search radius when photon is outside maximum radius
[Figure legend: sample point, photons in estimate, candidate photon. Want a 4-photon estimate. Photons in estimate: 2]
34
kNN-grid Algorithm
- Add photons within search radius
[Figure legend: sample point, photons in estimate, candidate photon. Want a 4-photon estimate. Photons in estimate: 3]
35
kNN-grid Algorithm
- Add photons within search radius
[Figure legend: sample point, photons in estimate, candidate photon. Want a 4-photon estimate. Photons in estimate: 4]
36
kNN-grid Algorithm
- Don’t expand search radius if enough photons already found
[Figure legend: sample point, photons in estimate, candidate photon. Want a 4-photon estimate. Photons in estimate: 4]
37
kNN-grid Algorithm
- Add photons within search radius
[Figure legend: sample point, photons in estimate, candidate photon. Want a 4-photon estimate. Photons in estimate: 5]
38
kNN-grid Algorithm
- Visit all other voxels accessible within determined search radius
- Add photons within search radius
[Figure legend: sample point, photons in estimate, candidate photon. Want a 4-photon estimate. Photons in estimate: 6]
39
kNN-grid Algorithm
- Finds all photons within a sphere centered about sample point
- May locate more than requested k-nearest neighbors
[Figure legend: sample point, photons in estimate, candidate photon. Want a 4-photon estimate. Photons in estimate: 6]
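The kNN-grid steps above can be sketched sequentially in Python for a single sample point. Assumptions: uniform cubic voxels of edge `cell`, photons as (x, y, z) tuples, and a dict of lists standing in for the texture-based grid; `build_grid`, `voxel_distance` and `knn_grid` are hypothetical names, and on the GPU this loop would run per sample point inside a fragment program.

```python
import itertools
import math
from collections import defaultdict


def build_grid(photons, cell):
    """Bucket photon positions into cubic voxels of edge length `cell`."""
    grid = defaultdict(list)
    for p in photons:
        grid[tuple(math.floor(c / cell) for c in p)].append(p)
    return grid


def voxel_distance(v, cell, x):
    """Distance from point x to the nearest point of voxel v (0 if x is inside)."""
    d2 = 0.0
    for i in range(3):
        lo = v[i] * cell
        gap = max(lo - x[i], 0.0, x[i] - (lo + cell))
        d2 += gap * gap
    return math.sqrt(d2)


def knn_grid(grid, cell, x, k, max_r):
    """Adaptive search around x; returns (photons in estimate, search radius)."""
    ranges = [range(math.floor((x[i] - max_r) / cell), math.floor((x[i] + max_r) / cell) + 1)
              for i in range(3)]
    # Visit voxels in order of distance to the sample point.
    voxels = sorted((voxel_distance(v, cell, x), v) for v in itertools.product(*ranges))
    found, r = [], 0.0
    for d, v in voxels:
        if d > max_r:
            break                       # every remaining voxel is beyond the max radius
        if len(found) >= k and d > r:
            break                       # enough photons, and no remaining voxel reaches inside r
        for p in grid.get(v, ()):
            dist = math.dist(p, x)
            if dist > max_r:
                continue                # never add, never grow past the max radius
            if len(found) < k:
                found.append(p)         # too few photons: accept and grow the radius
                r = max(r, dist)
            elif dist <= r:
                found.append(p)         # enough photons: accept only inside the current radius
    return found, r
```

Because acceptance depends on visit order, the result can exceed k: with photons at x = 0.3, 0.1, 0.2 stored in that order in one voxel and k = 2, the first photon sets r = 0.3 and all three are returned, i.e. every photon inside the final sphere.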
40
System Implementation
- NVIDIA GeForce FX 5900 Ultra (NV35)
- Cg compiler 1.1
[Pipeline: Trace Photons → Build Photon Map → Ray Trace Scene → Compute Radiance Estimate → Compute Lighting → Render Image]
41
Demos
42
Glass Ball – Bitonic Sort 18s @ 512x384, 5K photons
43
Glass Ball – Stencil Routing 11s @ 512x384, 5K photons
44
Ring – Bitonic Sort 9s @ 512x384, 16K photons
45
Ring – Stencil Routing 8s @ 512x384, 16K photons
46
Cornell Box – Bitonic Sort 64s @ 512x512, 65K photons
47
Cornell Box – Stencil Routing 47s @ 512x512, 65K photons
48
Cornell Box – Increased Search Radius
49
Open Issues (1)
- How to prevent program execution over a subset of pixels?
  - Non-uniform pixel computation distribution
    - Radiance estimate
  - KILL is only a write mask
  - Early-z occlusion culling
    - No pixel level control
- Compute mask, branching, or stream buffer?
  - Improve radiance estimate speed by 30-70% over tiling
50
Open Issues (2)
- Scatter
  - Makes (a programmer’s) life easier
  - Is it worth implementing?
  - Gain factor of log² n avoiding sort
51
Future Work
- Kd-trees
- Photon power redistribution
- Adaptive sampling
- Progressive refinement
52
Conclusions
- The GPU can compute an entire global illumination solution
  - Nearly interactive
- Implemented an adaptive k-nearest neighbor query for the GPU
  - kNN-grid
- Showed how to construct sparse data structures on the GPU
  - Bitonic merge sort and binary search
  - Stencil routing
- Sorting and searching algorithms applicable to other computations
53
Acknowledgments
- Stanford FlashG
  - Ian Buck, Mike Houston, Kekoa Proudfoot
- Stencil routing
  - Kurt Akeley, Matt Papakipos
- Hardware and drivers
  - David Kirk, Nick Triantos
- Funding
  - NVIDIA, DARPA, NSF, 3Com