Ray Tracing and Photon Mapping on GPUs
Tim Purcell, Stanford / NVIDIA
A Small Sampling of GI on GPUs
Much more detail is available in the included papers
Lots of other "global illumination on GPUs" work in the literature
– The Ray Engine [Carr et al. 2002]
– GPU Algorithms for Radiosity and Subsurface Scattering [Carr et al. 2003]
– Radiosity on Graphics Hardware [Coombe et al. 2004]
– Lots and lots of shadow papers…
Radiosity: Radiosity on Graphics Hardware [Coombe et al. 2004]
Subsurface Scattering: GPU Algorithms for Radiosity and Subsurface Scattering [Carr et al. 2003]
Ray Tracing
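[Figure: ray tracing example with a camera, a point light, an occluder, and specular and diffuse materials; ray segments are labeled P (primary), R (reflected), S (shadow), and T (transmitted)]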
Implementation Options
GPU as a ray-triangle intersection engine [Carr et al. 2002]
– Rays and geometry are streamed to the GPU
– Intersection results are read back to the host
– Acceleration structure traversal is done on the host CPU
GPU as a ray tracing engine [Purcell et al. 2002]
– Scene geometry and acceleration structure are stored on the GPU
– GPU performs ray generation, acceleration structure traversal, intersection, and shading
– Host provides camera info
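To make the first option concrete, here is a minimal Python sketch of the per-ray, per-triangle test that a GPU would evaluate when used purely as an intersection engine, plus the host-side step of keeping the nearest hit from the read-back results. We assume the Möller–Trumbore test for illustration; neither cited system is claimed to use exactly this formulation, and all function names are hypothetical.

```python
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def intersect_triangle(orig: Vec3, d: Vec3, v0: Vec3, v1: Vec3, v2: Vec3,
                       eps: float = 1e-9) -> Optional[Tuple[float, float, float]]:
    """Möller–Trumbore ray-triangle test; returns (t, u, v) on a hit, else None."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(d, e2)
    det = dot(e1, p)
    if abs(det) < eps:                 # ray is parallel to the triangle's plane
        return None
    inv_det = 1.0 / det
    s = sub(orig, v0)
    u = dot(s, p) * inv_det            # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(d, q) * inv_det            # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv_det           # distance along the ray
    return (t, u, v) if t > eps else None


def nearest_hits(rays, triangles) -> List[Optional[Tuple[float, int]]]:
    """Host side: for each (origin, direction) ray keep the closest (t, triangle index)."""
    results = []
    for orig, d in rays:
        best = None
        for i, (v0, v1, v2) in enumerate(triangles):
            hit = intersect_triangle(orig, d, v0, v1, v2)
            if hit is not None and (best is None or hit[0] < best[0]):
                best = (hit[0], i)
        results.append(best)
    return results
```

Roughly, in the intersection-engine split the calls to `intersect_triangle` are the work streamed to the GPU, while the nearest-hit bookkeeping (and the traversal that chooses which triangles to test) stays on the CPU.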
Streaming Ray Tracer
Kernels: Generate Eye Rays → Traverse Acceleration Structure → Intersect Triangles → Shade Hits and Generate Shading Rays
Inputs: Camera (eye rays), Grid (traversal), Triangles (intersection), Materials (shading)
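The Traverse Acceleration Structure kernel steps each ray through the uniform grid one voxel at a time. A common way to do this is the 3D-DDA of Amanatides and Woo; the Python sketch below assumes that method, a grid covering [0, res·cell_size)³, and a ray origin inside the grid. `grid_traverse` is a hypothetical name, and in a multipass GPU version each step might be one kernel pass rather than an iteration of a host loop.

```python
import math
from typing import List, Tuple


def grid_traverse(orig, d, res: int, cell_size: float = 1.0,
                  max_steps: int = 10_000) -> List[Tuple[int, int, int]]:
    """Return the voxels pierced by the ray, in front-to-back order (3D-DDA)."""
    cell = [int(math.floor(orig[i] / cell_size)) for i in range(3)]
    step = [0, 0, 0]
    t_max = [math.inf] * 3     # ray parameter t at the next voxel boundary, per axis
    t_delta = [math.inf] * 3   # t needed to cross one whole voxel, per axis
    for i in range(3):
        if d[i] > 0:
            step[i] = 1
            t_max[i] = ((cell[i] + 1) * cell_size - orig[i]) / d[i]
            t_delta[i] = cell_size / d[i]
        elif d[i] < 0:
            step[i] = -1
            t_max[i] = (cell[i] * cell_size - orig[i]) / d[i]
            t_delta[i] = -cell_size / d[i]
    visited = []
    for _ in range(max_steps):
        if not all(0 <= c < res for c in cell):
            break                      # the ray has left the grid
        visited.append((cell[0], cell[1], cell[2]))
        axis = min(range(3), key=lambda i: t_max[i])
        if math.isinf(t_max[axis]):
            break                      # zero direction vector: nothing to step
        cell[axis] += step[axis]       # cross the nearest voxel boundary
        t_max[axis] += t_delta[axis]
    return visited
```

In the streaming design, each visited voxel whose triangle list is non-empty would hand the ray to the Intersect Triangles kernel; a hit ends traversal, a miss resumes it.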
Techniques Used
Data structure navigation
– Texture memory stores data structures
– Dependent texture fetches walk through the data
Flow control
– Kernel binding based on occlusion query results
– Efficient selective execution of kernels using early-z occlusion culling
– Flow control difficulties are disappearing with the newest graphics cards (PS 3.0)
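Kernel binding based on occlusion queries can be pictured as a host loop: ask how many rays are waiting in each state, then bind and run the next kernel. The Python sketch below is hypothetical: the state names, the toy kernels, and the "run the kernel with the most waiting rays" policy are assumptions, and `occlusion_query` merely counts what a hardware query would report (the number of fragments passing a test that selects one state).

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical per-ray states, one per kernel in the streaming ray tracer.
TRAVERSE, INTERSECT, SHADE, DONE = "traverse", "intersect", "shade", "done"

# Toy kernels (assumption): each simply advances a ray to the next stage.
TOY_KERNELS: Dict[str, Callable[[str], str]] = {
    TRAVERSE: lambda s: INTERSECT,
    INTERSECT: lambda s: SHADE,
    SHADE: lambda s: DONE,
}


def occlusion_query(states: List[str], state: str) -> int:
    """Stand-in for a hardware occlusion query: how many rays are in `state`."""
    return sum(1 for s in states if s == state)


def run_pipeline(states: List[str], kernels: Dict[str, Callable[[str], str]],
                 max_passes: int = 100) -> Tuple[List[str], List[Tuple[str, int]]]:
    log = []
    for _ in range(max_passes):
        counts = {s: occlusion_query(states, s) for s in kernels}
        waiting = {s: c for s, c in counts.items() if c > 0}
        if not waiting:
            break                                   # every ray has finished
        chosen = max(waiting, key=waiting.get)      # assumed scheduling policy
        kernel = kernels[chosen]
        # One rendering pass: only rays in the chosen state run the kernel.
        states = [kernel(s) if s == chosen else s for s in states]
        log.append((chosen, waiting[chosen]))
    return states, log
```

Starting from three rays waiting to traverse and two waiting to intersect, `run_pipeline(..., TOY_KERNELS)` runs traverse (3 rays), then intersect (5), then shade (5), and stops once the counts are all zero. A real traversal kernel would also loop rays back to itself or retire rays that leave the grid.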
Texture Memory Organization
Uniform Grid (3D luminance texture): one entry per voxel (vox0 … voxM), pointing to that voxel's run in the triangle list
Triangle List (1D luminance texture): per-voxel runs of triangle indices
Triangles (3x 1D RGB textures): vertices v0, v1, v2 of tri0 … triN, each stored as xyz
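This layout can be mimicked with flat Python lists standing in for textures, which makes the chain of dependent fetches explicit: grid entry, then triangle-list entries, then vertex data. The encoding details are assumptions for illustration: -1 marks an empty voxel in the grid and also terminates each voxel's run in the triangle list, and voxels are linearized as ix + res·(iy + res·iz). Function names are hypothetical.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def build_grid_textures(voxel_tris: Dict[Tuple[int, int, int], List[int]],
                        res: int) -> Tuple[List[int], List[int]]:
    """Flatten per-voxel triangle lists into a grid 'texture' and a triangle-list 'texture'."""
    grid = [-1] * (res ** 3)             # -1 = empty voxel (assumed encoding)
    tri_list: List[int] = []
    for iz in range(res):
        for iy in range(res):
            for ix in range(res):
                tris = voxel_tris.get((ix, iy, iz), [])
                if tris:
                    grid[ix + res * (iy + res * iz)] = len(tri_list)
                    tri_list.extend(tris)
                    tri_list.append(-1)  # sentinel ends this voxel's run
    return grid, tri_list


def triangles_in_voxel(grid: List[int], tri_list: List[int], v0: List[Vec3],
                       v1: List[Vec3], v2: List[Vec3], res: int,
                       cell: Tuple[int, int, int]) -> List[Tuple[Vec3, Vec3, Vec3]]:
    ix, iy, iz = cell
    offset = grid[ix + res * (iy + res * iz)]   # fetch 1: grid texture
    out = []
    if offset < 0:
        return out
    while tri_list[offset] != -1:               # fetch 2: triangle list texture
        t = tri_list[offset]
        out.append((v0[t], v1[t], v2[t]))       # fetch 3: the three vertex textures
        offset += 1
    return out
```

Each fetch's address depends on the value returned by the previous one, which is exactly the pattern the slide calls dependent texture fetches.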
Efficient Selective Execution
Rendering a giant screen-filling quad is not ideal
– Not all pixels need to be processed in every rendering pass
Proposed low-overhead early fragment kill
– Computation mask
– Controllable early-z occlusion culling
Trade computation for bandwidth
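The computation mask can be simulated on the CPU. In this hypothetical sketch, a cheap mask pass writes a near depth (0.0) wherever a ray has finished, and each compute pass draws a screen-filling quad at depth 0.5 with a less-than depth test, so early-z culls finished pixels before the shader would run. The depth values, the toy shader (which counts down each ray's remaining steps), and the host-side `any(...)` loop condition (standing in for an occlusion query) are all assumptions.

```python
from typing import Callable, List, Tuple


def masked_pass(depth: List[float], quad_z: float,
                shader: Callable[[int], int], data: List[int]) -> int:
    """Draw a screen-filling quad at quad_z with a LESS depth test.
    Fragments that fail the test are culled before the shader runs (early-z)."""
    shaded = 0
    for i, z in enumerate(depth):
        if quad_z < z:                 # depth test passes, so the shader executes
            data[i] = shader(data[i])
            shaded += 1
    return shaded


def run_with_mask(steps: List[int]) -> Tuple[int, int]:
    """Return (fragments shaded, passes) for rays needing steps[i] passes each."""
    data = list(steps)
    depth = [1.0] * len(data)          # cleared depth buffer: every pixel active
    shaded_total, passes = 0, 0
    while any(r > 0 for r in data):    # stand-in for an occlusion query
        for i, r in enumerate(data):   # mask pass: depth writes only, no shading
            if r == 0:
                depth[i] = 0.0
        shaded_total += masked_pass(depth, 0.5, lambda r: r - 1, data)
        passes += 1
    return shaded_total, passes
```

For `steps = [1, 3, 0, 2, 1, 0, 4, 1]` this takes 4 passes but shades only 12 fragments, versus 4 × 8 = 32 for an unmasked quad; the price is the extra depth writes in the mask passes, which is the computation-for-bandwidth trade on the slide.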
Original System Implementation: ATI Radeon 9700 Pro (R300), ATI Fragment Program
Cornell Box – Ray Traced Shadows: rendered using a Radeon 9700 Pro
Teapotahedron: rendered using a Radeon 9700 Pro
Quake 3 – Ray Traced Shadows: rendered using a Radeon 9700 Pro
Performance Results
Radeon 9700 Pro
– 100M ray-triangle intersections/s
– 300K to 4.0M rays/s
– Between 3 – x256 pixels
CPU implementation
– 20M intersections/s on an 800 MHz P3 [Wald et al. 2001]
– 800K to 7.1M rays/s on a 2.5 GHz P4 [Wald et al. 2003]
– With simple shading: 1.8M to 2.3M rays/s
Photon Mapping
Photon Mapping Algorithm Review
Photon tracing
– Emission, scattering, and storage in a k-d tree
– Similar to ray tracing
Rendering
– Ray tracing for direct illumination
– Photon map visualization for the indirect bounce
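As a minimal sketch of the emission step of photon tracing, the Python below emits photons from an isotropic point light, each carrying an equal share of the light's power. Scalar power (instead of RGB), the point light, and the function name are assumptions; scattering and storage are omitted.

```python
import math
import random
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def emit_photons(light_pos: Vec3, light_power: float, n: int,
                 rng: random.Random) -> List[Tuple[Vec3, Vec3, float]]:
    """Return n photons (origin, unit direction, power) from an isotropic point light."""
    photons = []
    for _ in range(n):
        z = rng.uniform(-1.0, 1.0)              # uniform z gives uniform directions on the sphere
        phi = rng.uniform(0.0, 2.0 * math.pi)   # azimuth
        r = math.sqrt(max(0.0, 1.0 - z * z))
        direction = (r * math.cos(phi), r * math.sin(phi), z)
        photons.append((light_pos, direction, light_power / n))  # equal power share
    return photons
```

Each emitted photon is then traced much like a ray (the same traversal and intersection work applies), which is why photon tracing is described as similar to ray tracing.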
Computational Challenge for GPUs #1: constructing an irregular or sparse data structure
Computational Challenge for GPUs #2: adaptive nearest neighbor search (noise vs. blur)
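The noise-versus-blur trade-off comes from the standard photon map radiance estimate, which treats the k photons nearest to a point x as if their power were spread over a disc of radius r:

```latex
L_r(x, \vec{\omega}) \approx \frac{1}{\pi r^2} \sum_{p=1}^{k} f_r(x, \vec{\omega}_p, \vec{\omega})\, \Delta\Phi_p(x, \vec{\omega}_p)
```

Here f_r is the BRDF, ΔΦ_p is the power carried by photon p arriving from direction ω_p, and r is the radius that encloses the k photons. Too few photons gives a high-variance (noisy) estimate, while gathering over a large radius averages illumination that actually varies (blur); an adaptive search chooses r per query point.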
Scatter on the GPU
Sort photons into grid cells
– Grid cell is the sort key
Two solutions
– Simulate scatter with fragment programs: bitonic merge sort followed by binary search, using multiple rendering passes
– Vertex program with stencil buffer: fixed number of photons per grid cell, using a single rendering pass
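Both scatter strategies can be sketched in Python. `bitonic_sort` is a bitonic sorting network written as data-parallel passes: every element reads its partner `i ^ j` from the previous array and writes into a new one, roughly as a fragment program reads one texture and renders into another. `lower_bound` is the binary search that locates where a grid cell's photons begin in the sorted array. `stencil_route` models stencil routing with a per-cell counter standing in for the stencil buffer, a simplification of the real per-pixel stencil test; photons beyond `capacity` in a cell are dropped. All names are hypothetical, and the network assumes a power-of-two input length.

```python
from typing import List, Optional, Tuple

Key = Tuple[int, int]   # (grid cell, photon id); the cell is the sort key


def bitonic_sort(keys: List[Key]) -> List[Key]:
    n = len(keys)
    assert n & (n - 1) == 0, "length must be a power of two"
    a = list(keys)
    k = 2
    while k <= n:
        j = k // 2
        while j > 0:
            b = a[:]                           # one rendering pass writes a new array
            for i in range(n):
                partner = i ^ j
                ascending = (i & k) == 0
                lo, hi = min(a[i], a[partner]), max(a[i], a[partner])
                # The lower index keeps the min in ascending blocks, the max in descending ones.
                b[i] = lo if (i < partner) == ascending else hi
            a = b
            j //= 2
        k *= 2
    return a


def lower_bound(sorted_keys: List[Key], cell: int) -> int:
    """First index whose cell is >= `cell` (about log2(n) iterations per grid cell)."""
    lo, hi = 0, len(sorted_keys)
    while lo < hi:
        mid = (lo + hi) // 2
        if sorted_keys[mid][0] < cell:
            lo = mid + 1
        else:
            hi = mid
    return lo


def cell_range(sorted_keys: List[Key], cell: int) -> Tuple[int, int]:
    return lower_bound(sorted_keys, cell), lower_bound(sorted_keys, cell + 1)


def stencil_route(photon_cells: List[int], num_cells: int,
                  capacity: int) -> List[List[Optional[int]]]:
    slots: List[List[Optional[int]]] = [[None] * capacity for _ in range(num_cells)]
    stencil = [0] * num_cells                  # photons already routed into each cell
    for pid, c in enumerate(photon_cells):
        if stencil[c] < capacity:              # stencil test passes at the next free slot
            slots[c][stencil[c]] = pid
            stencil[c] += 1
        # otherwise every slot is full and the photon is dropped
    return slots
```

The sort path keeps every photon but costs many passes; stencil routing needs one pass but caps each cell at `capacity` photons, which matches the fixed-count limitation on the slide.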
Adaptive Nearest Neighbor Search
Iterative algorithm
Accept or reject photons in cell visit order
– No priority queue!
– kNN-grid
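The accept-or-reject idea can be illustrated with a simplified Python sketch in the spirit of the kNN-grid; it is not the published algorithm. Assumptions: cells are visited in cubic shells of increasing size around the query's cell, a photon is accepted if it lies within `max_radius`, and the search stops at the k-th accepted photon. No priority queue is used, so results are approximate, and `knn_grid` is a hypothetical name.

```python
import math
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def knn_grid(photons_by_cell: Dict[Tuple[int, int, int], List[Vec3]],
             cell_size: float, query: Vec3, k: int,
             max_radius: float) -> Tuple[List[Tuple[float, Vec3]], float]:
    """Return (accepted (distance, photon) pairs, radius of the farthest accepted photon)."""
    qc = tuple(int(math.floor(query[i] / cell_size)) for i in range(3))
    rings = int(math.ceil(max_radius / cell_size))
    accepted: List[Tuple[float, Vec3]] = []
    for ring in range(rings + 1):
        for dx in range(-ring, ring + 1):
            for dy in range(-ring, ring + 1):
                for dz in range(-ring, ring + 1):
                    if max(abs(dx), abs(dy), abs(dz)) != ring:
                        continue                    # visit only this ring's shell
                    cell = (qc[0] + dx, qc[1] + dy, qc[2] + dz)
                    for p in photons_by_cell.get(cell, []):
                        dist = math.dist(p, query)
                        if dist <= max_radius:      # accept; otherwise reject
                            accepted.append((dist, p))
                            if len(accepted) == k:
                                return accepted, max(d for d, _ in accepted)
    return accepted, max((d for d, _ in accepted), default=0.0)
```

Because the search ends at the k-th accepted photon, a photon in a later cell may be closer than one already accepted; the returned radius is what a radiance estimate would divide by as πr².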
Original System Implementation
NVIDIA GeForce FX 5900 Ultra (NV35), Cg compiler 1.1
Stages: Trace Photons, Build Photon Map, Ray Trace Scene, Compute Radiance Estimate, Compute Lighting, Render Image
Glass Ball – Bitonic Sort: 512x384 pixels, 5K photons
Glass Ball – Stencil Routing: 512x384 pixels, 5K photons
Ring – Bitonic Sort: 512x384 pixels, 16K photons
Ring – Stencil Routing: 512x384 pixels, 16K photons
Cornell Box – Bitonic Sort: 512x512 pixels, 65K photons
Cornell Box – Stencil Routing: 512x512 pixels, 65K photons
Cornell Box – Increased Search Radius
Summary
GPUs can perform global illumination calculations
– Lots of options for splitting computation between CPU and GPU
Global illumination calculations require many techniques useful to GPGPU computations
– Data structure navigation
– Sort, search
– Data-dependent looping and branching