Some Things
Jeremy Sugerman
22 February 2005

Jeremy Sugerman, FLASHG 22 February 2005

Topics
- Quick GPU Topics
- Conditional Execution
- GPU Ray Tracing

PCI-Express
- PCI-Express solves data transfer problems…
[Chart: Down and Up transfer rates for NV AGP, NV PCI, ATI AGP, ATI PCI, and DL AGP]

3Dlabs Realizm 100 (AGP)
- Mediocre Fill Rate (about half that of a Radeon 9800 XT)
- Reasonable Texture Bandwidth
- Variable-Cost Instructions: 6 GFLOPS for ADD, but only 0.5 GFLOPS for LG2
- Remarkable Readback
- But, No GL_TEXTURE_RECTANGLE_EXT
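
To make "variable cost" concrete, the C sketch below turns the slide's peak rates into a per-pass time estimate for a given instruction mix. It is a back-of-the-envelope model, assuming every operation runs at its class's peak rate and that ADD and LG2 work never overlaps; the function names are hypothetical.

```c
#include <assert.h>

/* Peak rates from the slide, used as operations per second.
   Assumption: each operation class always runs at its peak rate and the
   classes never overlap in time. */
static const double ADD_OPS_PER_SEC = 6.0e9; /* 6 GFLOPS ADD   */
static const double LG2_OPS_PER_SEC = 0.5e9; /* 0.5 GFLOPS LG2 */

/* Estimated seconds for one pass over `fragments` fragments, where each
   fragment performs `adds` ADD operations and `lg2s` LG2 operations. */
double estimate_kernel_seconds(double fragments, double adds, double lg2s)
{
    assert(fragments >= 0.0 && adds >= 0.0 && lg2s >= 0.0);
    return fragments * (adds / ADD_OPS_PER_SEC + lg2s / LG2_OPS_PER_SEC);
}

/* Number of ADDs that take as long as a single LG2 under this model. */
double lg2_cost_in_adds(void)
{
    return ADD_OPS_PER_SEC / LG2_OPS_PER_SEC; /* 6.0e9 / 0.5e9 */
}
```

Under these assumptions one LG2 costs as much as 12 ADDs, so estimate_kernel_seconds(1e6, 10, 1) comes to roughly 3.7 ms, more than half of it spent on the single LG2.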

Conditional Execution
- Depth and Stencil are classic tools
  - Only effective if the test happens early, before the shader runs
- All shaders support predication and KIL
  - No savings in execution time
  - KIL does gruesome things to the pipeline
- Pixel Shader 3.0 has true branching
  - If-Then-Else, data-dependent loops
  - NV4x currently; no ATI support until R500

Compute Mask – Z Buffer
- Clear Z to 1.0
- Draw Depth-Only at Z = 0.3
  - KIL where computation will happen
- Draw Color at Z = 0.7
- Very Effective When It Works
- Fragile, Easily Disabled
- Stays Disabled Until glClear!
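
The trick works because the depth test rejects the color pass everywhere except the pixels that were killed in the depth-only pass. The C sketch below models those three steps on the CPU, assuming a less-than depth test (GL_LESS) and depth writes disabled during the color pass so the mask survives for reuse. `build_mask`, `compute_pass`, and `expensive_kernel` are hypothetical names; on the GPU each step is a draw call.

```c
#include <assert.h>
#include <stddef.h>

/* Depth values from the slide. */
#define Z_CLEAR   1.0f  /* depth buffer clear value */
#define Z_MASK    0.3f  /* depth-only mask pass */
#define Z_COMPUTE 0.7f  /* color (compute) pass */

/* Hypothetical stand-in for the real per-fragment computation. */
static float expensive_kernel(size_t i) { return (float)i * 2.0f; }

/* Clear depth, then draw depth-only at Z_MASK. Fragments that will compute
   are killed (KIL), so their depth stays at Z_CLEAR. */
void build_mask(float *depth, const int *live, size_t n)
{
    assert(depth != NULL && live != NULL);
    for (size_t i = 0; i < n; ++i) {
        depth[i] = Z_CLEAR;
        if (!live[i] && Z_MASK < depth[i])  /* passes GL_LESS, writes depth */
            depth[i] = Z_MASK;
    }
}

/* Draw color at Z_COMPUTE with GL_LESS and depth writes off. Returns how
   many fragments were actually shaded. */
size_t compute_pass(const float *depth, float *color, size_t n)
{
    size_t shaded = 0;
    for (size_t i = 0; i < n; ++i) {
        if (Z_COMPUTE < depth[i]) {  /* true only where depth == Z_CLEAR */
            color[i] = expensive_kernel(i);
            ++shaded;
        }                            /* otherwise rejected, never shaded */
    }
    return shaded;
}
```

Where the mask pass wrote 0.3, the color pass fails 0.7 < 0.3 and the fragment is dropped; where the mask pass was killed, depth is still 1.0 and 0.7 < 1.0 passes. The hardware only saves time if that rejection happens before shading, so once Early-Z is disabled the mask pass becomes mostly overhead.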

Compute Mask – EarlyZ
[Chart: Early-Z compute mask results on NV41 and X800 for Random, 2x2 Blocks, 3x3 Blocks, 4x4 Blocks, and Wavefront mask patterns]

Compute Mask – PS3.0
- Rasterize Normally with a shader like:

    if (pixel is live) {
        …
        MOV result.color,
    } else {
        MOV result.color,    // Or KIL
    }

- Easy to Write
- Must shade all fragments
- Must write a value or KIL for all fragments

Compute Mask – PS 3.0
[Chart: PS 3.0 compute mask results for Random, 64x64 Blocks, 32x32 Blocks, 16x16 Blocks, and Wavefront mask patterns]

Pixel Shader 3.0
- Not (yet?) a replacement for Early-Z
- What about loops?
- What about state machines?

    if (fragment is in state a) {
        // Computation 1
    } else {
        // Computation 2
    }

- If a and b are the costs of the two computations, will execution time be MAX(a, b) or a + b?
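
One way to frame the MAX(a, b) versus a + b question is a lockstep batch model: if groups of fragments execute together, a group whose fragments disagree on the branch pays for both sides. The C sketch below is a toy model under that assumption only; the batch size is a hypothetical parameter, not a documented NV4x figure, and predication is modeled as always paying a + b.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: fragments run in lockstep batches of `batch` fragments
   (hypothetical parameter). in_state_a[i] != 0 means fragment i takes
   Computation 1 (cost a); otherwise it takes Computation 2 (cost b). */
double branch_cost(const int *in_state_a, size_t n, size_t batch,
                   double cost_a, double cost_b)
{
    assert(batch > 0);
    double total = 0.0;
    for (size_t start = 0; start < n; start += batch) {
        size_t end = (n - start < batch) ? n : start + batch;
        int any_a = 0, any_b = 0;
        for (size_t i = start; i < end; ++i) {
            if (in_state_a[i]) any_a = 1;
            else any_b = 1;
        }
        /* A coherent batch runs one side; a divergent batch runs both. */
        total += (any_a ? cost_a : 0.0) + (any_b ? cost_b : 0.0);
    }
    return total;
}

/* Predication evaluates both sides for every batch: always a + b. */
double predicated_cost(size_t n, size_t batch, double cost_a, double cost_b)
{
    assert(batch > 0);
    size_t batches = (n + batch - 1) / batch;
    return (double)batches * (cost_a + cost_b);
}
```

With 8 fragments, batches of 4, a = 10, and b = 20, grouping the states into two coherent halves costs 30 under branching, while alternating states costs 60, the same as predication. In this model the answer depends on how spatially coherent the states are, not just on how many fragments are in each.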

GPU Ray Tracing
- Tim Purcell left us a Brook raycaster
- Tim (Foley) et al. beat on it for DARPA Line-of-Sight
  - Early-Z, 2D Addressing
- Tim and I have forked it again
  - Explore new hardware features
  - Explore new algorithm options
  - Mature, maintainable source base

Demo
- Break for demo…

GPU Ray Tracing – Brute Force
- Initialize Scene Parameters, Geometry (CPU)
- Generate Eye Rays
- Foreach (triangle in the scene)
  - Intersect with all rays
  - Record if it hits closer than any prior triangle
- Shade Hits
- Ray-Triangle kernel is 39 instructions
- Over 100 million intersections per second
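
The brute-force loop maps directly onto two nested loops: one pass per triangle, one fragment per ray. The C sketch below is a CPU stand-in that uses the Möller–Trumbore ray-triangle test, swapped in for illustration; it is not necessarily the 39-instruction kernel from the talk, and all type and function names are hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;
typedef struct { Vec3 org, dir; } Ray;
typedef struct { Vec3 v0, v1, v2; } Tri;
typedef struct { float t; int tri; } Hit;  /* tri == -1 means no hit */

static Vec3 sub(Vec3 a, Vec3 b) { Vec3 r = {a.x - b.x, a.y - b.y, a.z - b.z}; return r; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(Vec3 a, Vec3 b)
{
    Vec3 r = {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
    return r;
}

/* Moller-Trumbore ray/triangle test (illustrative stand-in).
   Returns 1 and sets *t when the ray hits in front of its origin. */
static int intersect(const Ray *r, const Tri *tri, float *t)
{
    const float eps = 1e-7f;
    Vec3 e1 = sub(tri->v1, tri->v0), e2 = sub(tri->v2, tri->v0);
    Vec3 p = cross(r->dir, e2);
    float det = dot(e1, p);
    if (det > -eps && det < eps) return 0;  /* ray parallel to triangle */
    float inv = 1.0f / det;
    Vec3 s = sub(r->org, tri->v0);
    float u = dot(s, p) * inv;               /* first barycentric coord */
    if (u < 0.0f || u > 1.0f) return 0;
    Vec3 q = cross(s, e1);
    float v = dot(r->dir, q) * inv;          /* second barycentric coord */
    if (v < 0.0f || u + v > 1.0f) return 0;
    *t = dot(e2, q) * inv;                   /* distance along the ray */
    return *t > eps;
}

/* Outer loop: one "pass" per triangle. Inner loop: every ray in that pass. */
void trace_brute_force(const Ray *rays, Hit *hits, size_t nrays,
                       const Tri *tris, size_t ntris)
{
    assert(hits != NULL);
    for (size_t i = 0; i < nrays; ++i) { hits[i].t = FLT_MAX; hits[i].tri = -1; }
    for (size_t k = 0; k < ntris; ++k) {
        for (size_t i = 0; i < nrays; ++i) {
            float t;
            /* Record the hit only if it is closer than any prior triangle. */
            if (intersect(&rays[i], &tris[k], &t) && t < hits[i].t) {
                hits[i].t = t;
                hits[i].tri = (int)k;
            }
        }
    }
}
```

The `t < hits[i].t` update reads and writes the same hit record in place, which is exactly the read-modify-write that the GPU passes cannot do.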

GPU Ray Tracing – Uniform Grid
- Initialize Scene Parameters, Geometry (CPU)
- Generate Eye Rays
- While (Any Rays Are Live)
  - Traverse the traversing rays
  - Intersect the intersecting rays
- Shade Hits
- Equivalent to ~14 million ray-triangle intersections per second on our scenes
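
The traverse step walks each ray through the grid cells. A common way to do that is a 3D-DDA walk in the style of Amanatides and Woo, sketched below in C. Assumptions: unit cells covering [0, res)^3, a ray origin already inside the grid, and no intersection testing. The sketch runs one ray to completion, whereas the GPU version interleaves traverse and intersect passes across all rays; the talk's Brook code may step differently, and `grid_walk` is a hypothetical name.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* 3D-DDA walk through a res^3 grid of unit cells. Writes visited cells to
   out[] (3 ints per cell) and returns how many were visited. */
size_t grid_walk(const float org[3], const float dir[3], int res,
                 int *out, size_t max_cells)
{
    int cell[3], step[3];
    float t_max[3];    /* ray parameter at the next cell boundary per axis */
    float t_delta[3];  /* ray parameter between boundaries per axis */
    assert(res > 0);
    for (int a = 0; a < 3; ++a) {
        /* Truncation equals floor because the origin is assumed >= 0. */
        cell[a] = (int)org[a];
        assert(cell[a] >= 0 && cell[a] < res);
        if (dir[a] > 0.0f) {
            step[a] = 1;
            t_delta[a] = 1.0f / dir[a];
            t_max[a] = ((float)(cell[a] + 1) - org[a]) / dir[a];
        } else if (dir[a] < 0.0f) {
            step[a] = -1;
            t_delta[a] = -1.0f / dir[a];
            t_max[a] = ((float)cell[a] - org[a]) / dir[a];
        } else {
            step[a] = 0;          /* never crosses a boundary on this axis */
            t_delta[a] = FLT_MAX;
            t_max[a] = FLT_MAX;
        }
    }
    size_t n = 0;
    while (n < max_cells) {
        out[3 * n] = cell[0];
        out[3 * n + 1] = cell[1];
        out[3 * n + 2] = cell[2];
        ++n;
        /* Advance along the axis whose next boundary is closest. */
        int a = 0;
        if (t_max[1] < t_max[a]) a = 1;
        if (t_max[2] < t_max[a]) a = 2;
        cell[a] += step[a];
        if (cell[a] < 0 || cell[a] >= res) break;  /* ray left the grid */
        t_max[a] += t_delta[a];
    }
    return n;
}
```

A ray starting at (0.5, 0.25, 0.5) heading along (1, 1, 0) in a 4^3 grid visits seven cells, alternating x and y steps, before leaving through the x = 4 face.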

“Any Live Rays?”
- Fundamentally a reduction
  - Sum across all rays
  - Readback to CPU
- Many passes to do a GPU reduction
- Could try an occlusion query
  - Kernel that just KILs on dead rays
  - Still an extra pass
- GPU global counter registers would be cool
- Equivalent to 24 million ray-triangle intersections per second when the check is skipped
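
The "many passes" come from each pass combining only a few texels. The C sketch below models one common scheme, assumed here rather than taken from the talk: each pass sums 2x2 blocks into a buffer half as wide, alternating between two buffers, so an N x N buffer (N a power of two) needs log2(N) passes before the single value is read back. `reduce_sum` is a hypothetical name. An occlusion query avoids these passes by counting the fragments that survive a kernel that KILs dead rays.

```c
#include <assert.h>
#include <stddef.h>

/* Sums a size x size buffer of live-ray flags with repeated 2x2 passes.
   a holds the input and b is scratch of the same size; each pass reads one
   buffer and writes the other. *passes receives the number of passes. */
int reduce_sum(int *a, int *b, size_t size, int *passes)
{
    assert(size > 0 && (size & (size - 1)) == 0);  /* power of two */
    int *src = a, *dst = b;
    *passes = 0;
    while (size > 1) {
        size_t half = size / 2;
        for (size_t y = 0; y < half; ++y)
            for (size_t x = 0; x < half; ++x)
                /* One output texel = sum of a 2x2 block of the input. */
                dst[y * half + x] =
                    src[(2 * y) * size + 2 * x] + src[(2 * y) * size + 2 * x + 1] +
                    src[(2 * y + 1) * size + 2 * x] + src[(2 * y + 1) * size + 2 * x + 1];
        int *tmp = src; src = dst; dst = tmp;  /* swap read/write buffers */
        size = half;
        ++*passes;
    }
    return src[0];  /* the single value the CPU reads back */
}
```

For a 512 x 512 ray buffer this scheme takes 9 passes, each reading the previous level in full.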

Ping-Ponging Buffers
- The lack of read-modify-write forces copies:

    intersectTriangle(in ray, in oldHit, in tri, out hit) {
        if (ray hits tri closer than oldHit) {
            hit = ;
        } else {
            hit = oldHit;   // No RMW
        }
    }

- Memory and Bandwidth Hungry
- Add conditionals / predication to kernels
- Complicates Early-Z compute masking
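
The copy in the else branch is what makes ping-ponging necessary. The C sketch below models it with two hit buffers that swap roles after every triangle pass. To keep it self-contained, each ray's hit distance for the current triangle is supplied precomputed in `tri_t`, laid out triangle by triangle with FLT_MAX for a miss; that input and the function names are hypothetical stand-ins for the real intersection test. Both full-size buffers must exist at once, and every pass writes every hit, copies included, which is where the memory and bandwidth cost comes from.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* One pass: read old_hit, write hit. Every fragment must write something,
   so rays that miss (or hit farther away) copy their old value through. */
static void intersect_pass(const float *old_hit, const float *t_this_tri,
                           float *hit, size_t nrays)
{
    for (size_t i = 0; i < nrays; ++i)
        hit[i] = (t_this_tri[i] < old_hit[i]) ? t_this_tri[i] : old_hit[i];
}

/* Runs one pass per triangle, swapping the two buffers after each pass.
   Returns whichever buffer holds the final closest-hit distances. */
float *trace_ping_pong(float *buf_a, float *buf_b, const float *tri_t,
                       size_t ntris, size_t nrays)
{
    assert(buf_a != buf_b);
    float *src = buf_a, *dst = buf_b;
    for (size_t i = 0; i < nrays; ++i) src[i] = FLT_MAX;  /* no hit yet */
    for (size_t k = 0; k < ntris; ++k) {
        intersect_pass(src, tri_t + k * nrays, dst, nrays);
        float *tmp = src; src = dst; dst = tmp;            /* ping-pong */
    }
    return src;
}
```

With an odd number of triangles the result ends up in the second buffer, so callers have to track which buffer is current rather than assume a fixed one.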

Render to Texture
- DirectX has it, OpenGL does not
  - DirectX raytracer bluescreens NV4x drivers
- Every shader draws its results to a pbuffer
  - Copied back to a texture each time
- Superbuffers offered a fix
  - ATI supported them (broken now)
  - ARB killed them
- Framebuffer Objects made it through the ARB
  - The only drivers so far are preliminary NV4x drivers

GPU Ray Tracer Enhancements
- 2D Addressing (duh)
- kD-Tree Accelerator
- Early-Z and/or PS3.0 for the Accelerators
- Tuning Traverse vs. Intersect vs. Shade
- Occlusion Queries / Fast Reductions
- Shadows
- Tuning Bandwidth
- Shading…