Ray Tracing on Programmable Graphics Hardware SIGGRAPH 2002 Timothy J. Purcell Ian Buck William R. Mark Pat Hanrahan Sungbae Kim Master's Student in CS
Contributions Abstract streaming graphics processor Implement ray tracing using GPU Help guide the design of future architectures
Why ray tracing? Ray tracing for the movie “Cars”
Ray tracing pros and cons Good quality Simple algorithm Computationally expensive How to accelerate ray tracing?
Basic Ray tracing algorithm Shoot eye rays Find the first intersection Calculate the illumination Reflection? Calls itself Transmission? Calls itself Combine the resulting illumination
Basic Ray tracing ? = RayTrace( ) = Shade( ) + refl1*RayTrace( ) RayTrace( ) = Shade( ) + refl2*RayTrace( ) + refr2*RayTrace( ) … Note that each pixel is independent
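The recursion on this slide can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's implementation: the two-sphere scene, the single fixed light direction, the grayscale intensity, and every name (`SPHERES`, `first_hit`, `shade`, `ray_trace`, `MAX_DEPTH`) are assumptions made for the example, and the transmission term is left out to keep it short.

```python
import math

# Hypothetical toy scene: (center, radius, diffuse intensity, reflectivity).
SPHERES = [
    ((0.0, 0.0, 5.0), 1.0, 0.8, 0.5),
    ((0.0, -101.0, 5.0), 100.0, 0.4, 0.0),
]
MAX_DEPTH = 3  # assumed recursion cutoff


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def scale(a, s):
    return tuple(x * s for x in a)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))


def first_hit(origin, direction):
    """Return (t, sphere) for the nearest sphere in front of the ray, or None."""
    best = None
    for sphere in SPHERES:
        oc = sub(origin, sphere[0])
        b = dot(oc, direction)              # direction is assumed unit length
        c = dot(oc, oc) - sphere[1] ** 2
        disc = b * b - c
        if disc < 0.0:
            continue                        # the ray misses this sphere
        t = -b - math.sqrt(disc)            # nearer of the two roots
        if t > 1e-4 and (best is None or t < best[0]):
            best = (t, sphere)
    return best


def shade(normal, sphere):
    """Local illumination: one diffuse term from a fixed light direction."""
    light_dir = normalize((1.0, 1.0, -1.0))
    return sphere[2] * max(0.0, dot(normal, light_dir))


def ray_trace(origin, direction, depth=0):
    """RayTrace = Shade + refl * RayTrace(reflected ray)."""
    hit = first_hit(origin, direction)
    if hit is None:
        return 0.0                          # background intensity
    t, sphere = hit
    point = add(origin, scale(direction, t))
    normal = normalize(sub(point, sphere[0]))
    color = shade(normal, sphere)
    refl = sphere[3]
    if refl > 0.0 and depth < MAX_DEPTH:
        reflected = sub(direction, scale(normal, 2.0 * dot(direction, normal)))
        color += refl * ray_trace(point, reflected, depth + 1)
    return color
```

Because `ray_trace` reads only the scene and its own ray, it can be called for every pixel in any order, which is the independence the slide points out.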
Find intersection
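The slide does not say which intersection test is used. As one common choice, here is a sketch of the Möller-Trumbore ray-triangle test; the function name `ray_triangle` and the tolerance `eps` are assumptions made for this example.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle(orig, direc, v0, v1, v2, eps=1e-9):
    """Return the distance t along the ray to the triangle, or None on a miss."""
    e1 = sub(v1, v0)
    e2 = sub(v2, v0)
    p = cross(direc, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None                 # ray is parallel to the triangle plane
    inv = 1.0 / det
    s = sub(orig, v0)
    u = dot(s, p) * inv             # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direc, q) * inv         # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None   # ignore hits behind the ray origin
```

Finding the first intersection then means running this test against the candidate triangles and keeping the smallest `t`.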
Uniform Grid
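A uniform grid lets a ray visit only the voxels it passes through, in front-to-back order. The sketch below uses a 3D-DDA style walk (in the manner of Amanatides and Woo); it is an illustration, not the paper's kernel. It assumes the grid's minimum corner is at the origin, the ray starts inside the grid, and the direction is not the zero vector; `grid_traverse`, `grid_res`, and `cell_size` are hypothetical names.

```python
import math


def grid_traverse(origin, direction, grid_res, cell_size=1.0):
    """Yield the integer voxel coordinates a ray visits, in order.

    Assumes the grid spans [0, grid_res * cell_size) on each axis, the
    origin lies inside it, and direction has at least one non-zero component.
    """
    cell = [int(math.floor(o / cell_size)) for o in origin]
    step, t_max, t_delta = [], [], []
    for axis in range(3):
        d = direction[axis]
        if d > 0.0:
            step.append(1)
            boundary = (cell[axis] + 1) * cell_size
            t_max.append((boundary - origin[axis]) / d)
            t_delta.append(cell_size / d)
        elif d < 0.0:
            step.append(-1)
            boundary = cell[axis] * cell_size
            t_max.append((boundary - origin[axis]) / d)
            t_delta.append(-cell_size / d)
        else:
            step.append(0)              # never crosses a boundary on this axis
            t_max.append(math.inf)
            t_delta.append(math.inf)

    while all(0 <= cell[a] < grid_res[a] for a in range(3)):
        yield tuple(cell)
        axis = t_max.index(min(t_max))  # axis whose boundary is crossed next
        cell[axis] += step[axis]
        t_max[axis] += t_delta[axis]
```

For example, `list(grid_traverse((0.5, 0.5, 0.5), (1.0, 0.0, 0.0), (3, 3, 3)))` walks the three voxels along the x axis. Only the triangles stored in each visited voxel need to be tested, and the walk can stop at the first voxel that yields a hit.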
How to accelerate Many algorithms Use parallel systems Massively parallel shared memory supercomputer : *-ray Cluster of commodity PCs : RTRT Special hardware for ray tracing Streaming Programming Model : this paper’s approach
Streaming Programming Model A stream is a set of data records Kernels operate on each element of an input stream independently Kernels can read from global memory Streams connect kernels
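The four bullets above can be mimicked in plain Python. This is a sketch of the programming model only, with no real GPU involved; `run_kernel`, `scale_kernel`, `offset_kernel`, and the contents of `global_memory` are all hypothetical.

```python
def run_kernel(kernel, stream, global_memory):
    """Apply a kernel to each record independently and return the output stream."""
    # No record sees any other record, so the iterations could run in parallel.
    return [kernel(record, global_memory) for record in stream]


def scale_kernel(record, global_memory):
    # Kernels may read (but in this sketch never write) global memory.
    return record * global_memory["gain"]


def offset_kernel(record, global_memory):
    return record + global_memory["bias"]


def pipeline(stream, global_memory):
    """Streams connect kernels: the output of one is the input of the next."""
    scaled = run_kernel(scale_kernel, stream, global_memory)
    return run_kernel(offset_kernel, scaled, global_memory)
```

Calling `pipeline([1, 2, 3], {"gain": 2, "bias": 1})` pushes each record through both kernels in turn.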
Streaming Ray Tracer Four kernels: Eye Ray Generator, Traverser, Intersector, Shader. Eye Ray Generator: takes Screen Pixels and the Camera, emits Rays. Traverser: reads the Grid, emits (Ray, Voxel) pairs. Intersector: reads Triangles, emits Hits. Shader: reads Materials, emits Pixel color updates. This decomposition is a result of our desire to eventually run the ray tracer on a GPU, which lacks branching within a fragment program.
Programmable Graphics Processor Abstraction We can use the GPU as a streaming processor, but it is a limited one
GPU Ray Tracer The GPU is a limited streaming processor How to map: Global memory Kernel-to-kernel connections No loops, no branches
Texture Memory Mapping We can use textures as global memory Store non-RGB values in textures For example, the second triangle of the second grid cell Also rays and materials
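The lookup in the example ("the second triangle of the second grid cell") can be sketched as a chain of dependent reads through flat arrays standing in for textures. The layout here is an assumption made for illustration: a grid texture holding per-cell offsets, a triangle-list texture terminated by -1, and three vertex textures. None of the names or values come from the slides.

```python
# Flat arrays standing in for textures (hypothetical layout and data).
GRID = [0, 3, 6]                       # per-cell start offset into TRI_LIST
TRI_LIST = [0, 2, -1, 1, 3, -1, -1]    # triangle ids per cell, -1 ends a list
V0 = [(float(i), 0.0, 0.0) for i in range(4)]   # first vertex of triangle i
V1 = [(float(i), 1.0, 0.0) for i in range(4)]   # second vertex of triangle i
V2 = [(float(i), 0.0, 1.0) for i in range(4)]   # third vertex of triangle i


def triangle_in_cell(cell, k):
    """Fetch the k-th triangle (0-based) of a grid cell, or None if absent."""
    start = GRID[cell]                 # lookup 1: grid texture
    for i in range(start, start + k + 1):
        if TRI_LIST[i] == -1:          # the cell's list ends before entry k
            return None
    tri = TRI_LIST[start + k]          # lookup 2: triangle-list texture
    return V0[tri], V1[tri], V2[tri]   # lookup 3: vertex textures
```

With 0-based indices, the slide's example corresponds to `triangle_in_cell(1, 1)`. Ray and material data could be packed into arrays of the same kind.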
Kernel to Kernel Copy output fragments into a texture that the next kernel reads
No loop No Branch Multipass: run the fragment program again until every ray is done do { bodyfp(); } while ( donefp() ); State: z-value, stencil bit For performance: early z-culling, stencil mask
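The multipass idea can be sketched on the host side as follows. This is a simplified illustration rather than the paper's code: each "ray" is reduced to a count of remaining steps plus a done flag (standing in for the z-value or stencil bit), `body_pass` plays the role of `bodyfp()`, and `count_active` plays the role of the done check. All names are hypothetical, and the loop is written as a plain `while` rather than the slide's do-while.

```python
def body_pass(state_in):
    """One pass of the loop body over every ray, writing a new state buffer."""
    state_out = []
    for steps_left, done in state_in:
        if done:
            # Finished rays are masked out, as a stencil or early-z test would do.
            state_out.append((steps_left, True))
        else:
            steps_left -= 1
            state_out.append((steps_left, steps_left == 0))
    return state_out


def count_active(state):
    """Number of rays that still need another pass."""
    return sum(1 for _, done in state if not done)


def run_multipass(steps_per_ray):
    """Repeat the body pass until no ray is active; return (passes, final state)."""
    state = [(n, n == 0) for n in steps_per_ray]
    passes = 0
    while count_active(state) > 0:
        state = body_pass(state)   # output of one pass is the input of the next
        passes += 1
    return passes, state
```

The number of passes is set by the slowest ray: `run_multipass([3, 1, 2])` needs three passes even though two of the rays finish earlier, which is why masking out finished rays matters for performance.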
Simulation Result SIMD vs. MIMD The horizontal line represents the performance of a graphics card at the time of the research. As you can see, the MIMD architecture performs better. This suggests that if graphics cards supported a MIMD architecture, ray tracing would be considerably faster. The charts on the right show that compute bandwidth matters more in the MIMD architecture.
GPU vs. CPU Comparable to CPU Radeon 9700 Pro vs. CPU Ray-triangle intersections/s: 100M; 20M in P3 800 MHz Rays/s: 300K to 4.0M; 800K to 7.1M; 1.8M to 2.3M (with simple shading) in P4 2.5 GHz
GPU vs. CPU Graphics processors evolve faster than CPUs
Demo Shadow rays 10.5 fps Eye rays, Shadow rays, Reflection rays
Appendix: General Purpose GPU (GPGPU) GPU, Shader, GLSL, HLSL, Cg, OpenGL, Direct3D… BrookGPU (Stanford) CUDA (NVIDIA) GPGPU is the research area of using GPUs for non-graphics computation.
Appendix: Cell Processor IBM, SONY, TOSHIBA Interactive ray tracer: iRT Faster than GPU