Interactive Ray Tracing: From bad joke to old news David Luebke University of Virginia.


Besides Parallelization
• Besides parallelizing the algorithm, what else can we do to accelerate ray tracing?
  – Amortize the cost of shooting rays
  – Use ray tracing selectively

Amortize the cost of rays
• The Render Cache
  – Work by Bruce Walter, currently at Cornell; also by Reinhard et al. (Utah)
  – Basic idea:
    ▪ Cache ray "hits" as shaded 3D points
    ▪ Reproject points for new viewpoint
    ▪ Now many pixels already have color!
    ▪ Shoot rays for newly uncovered pixels
    ▪ Shoot rays to update stale pixels
  – Show demo(?)
  – Web page w/ good examples, source:
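The "basic idea" bullets above can be sketched in a few lines. This is only an illustration, not the Render Cache implementation: the function name `reproject`, the unrotated pinhole camera looking down -z, the square field of view, and the per-pixel nearest-point test are all assumptions made for the example.

```python
import math

def reproject(points, eye, width, height, fov_deg=90.0):
    """Project cached shaded 3D points into a new view.

    points: list of ((x, y, z), color) in world space.
    eye:    new camera position (camera assumed unrotated, looking down -z).
    Returns (image, missing): image maps (px, py) -> color; missing lists the
    pixels no cached point landed on, i.e. the pixels that need new rays.
    """
    f = 1.0 / math.tan(math.radians(fov_deg) / 2.0)
    depth = {}
    image = {}
    for (x, y, z), color in points:
        cx, cy, cz = x - eye[0], y - eye[1], z - eye[2]
        if cz >= 0.0:
            continue  # point is behind (or level with) the camera
        dist = -cz
        px = math.floor((f * cx / dist + 1.0) * 0.5 * width)
        py = math.floor((1.0 - f * cy / dist) * 0.5 * height)
        if not (0 <= px < width and 0 <= py < height):
            continue  # point falls outside the new view
        # Keep only the nearest cached point per pixel.
        if (px, py) not in depth or dist < depth[(px, py)]:
            depth[(px, py)] = dist
            image[(px, py)] = color
    missing = [(i, j) for j in range(height) for i in range(width)
               if (i, j) not in image]
    return image, missing
```

Pixels returned in `missing` correspond to the "newly uncovered pixels" bullet; a real system would also age cached points so that stale pixels get resampled.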

Amortize the cost: Tole et al.
• Tole et al. extend these ideas to path tracing
  – Cache ray hits in object space as Gouraud-shaded vertices
  – Designed for very slow sampling schemes (full bidirectional path tracing)
    ▪ Pick pixels to sample carefully
    ▪ Use OpenGL hardware to display current solution as it is gradually updated
  – Show the Tole video

Amortize The Cost: Frameless Rendering
• Eliminate frames altogether
  – If you can render 1/3 of the pixels in a vertical retrace period:
    ▪ Double buffering displays a new frame after 3 vertical refreshes
    ▪ Single buffering causes horizontal tearing artifacts
    ▪ Frameless rendering updates pixels as soon as they are computed, but computes them in a randomized order to avoid coherent tearing artifacts
  – Show the Utah video
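A minimal sketch of the randomized update order described above. The function name `frameless_update`, the list-of-lists framebuffer, and the `shade` callback are hypothetical; the point is only that each refresh touches a random subset of pixels rather than a contiguous band.

```python
import random

def frameless_update(framebuffer, shade, budget, rng):
    """Recompute `budget` pixels in place, chosen in random order.

    framebuffer: list of rows (lists) of pixel values, modified in place.
    shade:       callable (x, y) -> new pixel value (stands in for tracing a ray).
    budget:      how many pixels can be computed in this refresh period.
    rng:         a random.Random instance, passed in for reproducibility.
    Returns the list of (x, y) pixels that were updated.
    """
    height = len(framebuffer)
    width = len(framebuffer[0])
    order = [(x, y) for y in range(height) for x in range(width)]
    rng.shuffle(order)  # random order avoids coherent tearing edges
    updated = order[:budget]
    for x, y in updated:
        framebuffer[y][x] = shade(x, y)
    return updated
```

With a 6x6 framebuffer and a budget of 12, each call refreshes one third of the pixels, matching the "1/3 of the pixels per retrace" scenario on the slide.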

Shoot Rays Selectively
• Use ray tracing selectively to augment a traditional interactive pipeline
  – Ex: use rays for shadows only
  – Ex: use ray tracing to calculate corrective textures where necessary (e.g., shiny objects)

Summary So Far
• Interactive ray tracing is a reality
  – Parker et al. (SGI supercomputer)
  – Wald et al. (cluster of PCs)
• Why IRT?
  – Complex/realistic shading
  – Big data
  – Decoupled sampling

Summary So Far
• How IRT?
  – Ray tracing is embarrassingly parallel
    ▪ Field of VAX/Cray joke
    ▪ But memory coherence is a problem
  – Brute force: shared-memory supercomputer
  – Slightly smarter: distributed cluster
    ▪ Fan-in, latency, model sharing are issues
  – Amortize cost
    ▪ Cache/reuse samples, frameless rendering
  – Use selectively
    ▪ Shadows only, corrective textures

Moving to Hardware
• Next topic: moving ray tracing to the GPU
  – Why do this?
• Two papers:
  – Ray Engine (Carr et al., U. Illinois)
  – Ray Tracing On Programmable Graphics Hardware (Purcell et al., Stanford)
    ▪ I stole most of the following slides from this talk

Related Work: The Ray Engine
• Nathan Carr, Jesse Hall, John Hart (University of Illinois)
• Basic idea: use the fragment hardware!
  – Ray intersection is a crossbar:
    ▪ Intersect a bunch of rays with a bunch of triangles, keep closest hit on each ray
  – Triangle rasterization is a crossbar:
    ▪ Intersect a bunch of pixels with a bunch of triangles, keep closest hit at each pixel

(a) Naive: intersect all rays w/ all polys
(b) Acceleration structures break crossbar grid up into a sparse block structure, but blocks are still dense crossbars
(c) Result: a series of points on the crossbar, max 1 per ray (closest wins)
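Panel (a), the dense crossbar, amounts to a double loop that keeps the nearest hit per ray. The sketch below is generic on purpose: `closest_hits` and the `intersect` callback are hypothetical names, and the usage example substitutes a toy 1D "ray vs. plane" test for real ray-triangle intersection.

```python
def closest_hits(rays, prims, intersect):
    """Naive crossbar: test every ray against every primitive.

    intersect(ray, prim) must return the hit distance t, or None for a miss.
    Returns one entry per ray: (t, prim_index) for the nearest hit in front
    of the ray origin, or None if the ray hits nothing.
    """
    hits = []
    for ray in rays:
        best = None
        for idx, prim in enumerate(prims):
            t = intersect(ray, prim)
            if t is not None and t > 0.0 and (best is None or t < best[0]):
                best = (t, idx)  # closest wins: at most one result per ray
        hits.append(best)
    return hits


def plane_hit(ray, plane_z):
    """Toy intersection: ray = (origin_z, direction_z), primitive = plane z = c."""
    origin_z, dir_z = ray
    if dir_z == 0.0:
        return None
    return (plane_z - origin_z) / dir_z
```

For example, `closest_hits([(0.0, 1.0), (0.0, -1.0)], [5.0, 2.0, -3.0], plane_hit)` pairs the forward ray with the plane at z = 2 and the backward ray with the plane at z = -3.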

(a) Each pixel potentially intersected with each poly
(b) Modern hardware

Ray Engine
• Map ray casting crossbar to rasterization crossbar
  – Distribute rays across pixels
    ▪ Ray-origins texture
    ▪ Ray-directions texture
  – Broadcast a stream of triangles as the vertex data interpolated across screen-filling quads
    ▪ Quad color = triangle id
    ▪ Quad multi-texture coords: triangle vertices a, b, normal n, edges ab, ac, bc
  – Output:
    ▪ Color = triangle id, alpha = intersect, z = t value

Ray Engine
• Bulk of ray tracing computation is intersection
• CPU handles bounding volume traversal, recursion, etc.
• GPU does ray intersection on bundles of rays and triangles handed to it by CPU
  – NV_FENCE to keep both humming
• Sometimes the CPU should intersect rays!

Why Ray Tracing?
• Global illumination
• Good shadows!
  – Doom 3 will be using shadow volumes
    ▪ Expensive!
  – Shadow maps are hard to use and prone to artifacts
• Efficient ray tracing based shadows could be the next killer feature for GPUs
[Image: Doom 3, id Software]

Why Ray Tracing?
• Output-sensitive algorithm
  – Sublinear in depth complexity
• Selective sampling
  – Frameless rendering [Bishop et al. 1994]
  – Render Cache [Walter et al. 1999]
  – Shading Cache [Tole et al. 2002]
• Interactive on clusters of PCs [Wald et al. 2001] and supercomputers [Parker et al.]
[Image: Power Plant, Wald et al. 2001]

Beyond Moore's Law
Yearly growth well above Moore's Law (1.5)
[Table: NVIDIA Historicals, courtesy of Kurt Akeley. Columns: Season, Product, MT/s, Yr rate, MF/s, Yr rate. Rows run from the Riva line (2H97) through the GeForce line (1H02). The numeric values did not survive extraction.]

Graphics Pipeline
[Diagram, two panels]
• Traditional Pipeline: Application, Geometry, Rasterization, Texture, Fragment, Display
• Programmable Fragment Pipeline: Fragment Program with Fragment Input, Registers, Textures, and Fragment Output; Command

Contributions
• Map complete ray tracer onto GPU
  – Ray tracing generally thought to be incompatible with the traditional graphics pipeline
• Abstract programmable fragment processor as a stream processor
• Map ray tracing to streaming computation
• Show that streaming GPU-based ray tracer is competitive with CPU-based ray tracer

Assumptions
• Static scenes
• Triangle primitives only
• Uniform grid acceleration structure

Stream Programming Model
Programmable fragment processor is essentially a stream processor
• Kernels and streams
  – Stream is a set of data records
  – Kernels operate on records
  – Streams connect kernels together
  – Kernels can read global memory
[Diagram: two kernels connected by record streams (input record stream, output record stream); each kernel reads globals]
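The kernel/stream vocabulary above can be mimicked with plain generators. This is a conceptual sketch only: `run_kernel`, `scale`, and `offset` are hypothetical names, and the one-record-in, one-record-out restriction is an assumption chosen to mirror a fragment program writing one output per fragment.

```python
def run_kernel(kernel, stream, globals_):
    """Apply a kernel to every record of an input stream.

    kernel:   callable (record, globals_) -> output record.
    stream:   any iterable of records.
    globals_: read-only data shared by all kernel invocations
              (plays the role of texture memory).
    Yields the output stream lazily, so kernels can be chained.
    """
    for record in stream:
        yield kernel(record, globals_)


def scale(record, g):
    """Example kernel: multiply each record by a global factor."""
    return record * g["factor"]


def offset(record, g):
    """Example kernel: add a global bias to each record."""
    return record + g["bias"]
```

Chaining is just nesting: `list(run_kernel(offset, run_kernel(scale, [1, 2, 3], {"factor": 2}), {"bias": 1}))` feeds the output stream of `scale` into `offset`.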

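Streaming Ray Tracer (Simplified)
[Diagram: chain of four kernels]
• Generate Eye Rays (reads Camera), outputs rays
• Traverse Acceleration Structure (reads Grid), outputs ray-voxel pairs
• Intersect Triangles (reads Triangles), outputs hits
• Shade Hits and Generate Shading Rays (reads Materials), outputs pixels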

Eye Ray Generator
[Diagram: camera, screen, scene. Kernel: Generate Eye Rays; reads Camera; outputs rays]

Traverser
[Diagram: camera, screen, scene. Kernel: Traverse Acceleration Structure; reads Grid; input rays; outputs ray-voxel pairs]
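The slides do not say how the uniform grid is walked. A common choice is an incremental 3D DDA, sketched below as an assumption rather than as the method used in the talk. The function name `traverse_grid`, unit-sized voxels, and a ray origin that already lies inside the grid are all simplifications for illustration.

```python
import math

def traverse_grid(origin, direction, grid_size, max_steps=1000):
    """List the voxels a ray pierces, in order, in a grid of unit voxels.

    origin:    ray origin, assumed to lie inside the grid.
    direction: ray direction (need not be normalized).
    grid_size: (nx, ny, nz) voxel counts; voxel i spans [i, i + 1) per axis.
    """
    voxel = [int(math.floor(c)) for c in origin]
    step, t_max, t_delta = [], [], []
    for i in range(3):
        d = direction[i]
        if d > 0:
            step.append(1)
            t_max.append((voxel[i] + 1 - origin[i]) / d)  # t at next boundary
            t_delta.append(1.0 / d)                       # t per voxel crossed
        elif d < 0:
            step.append(-1)
            t_max.append((voxel[i] - origin[i]) / d)
            t_delta.append(-1.0 / d)
        else:
            step.append(0)
            t_max.append(math.inf)
            t_delta.append(math.inf)
    visited = []
    while (all(0 <= voxel[i] < grid_size[i] for i in range(3))
           and len(visited) < max_steps):
        visited.append(tuple(voxel))
        axis = t_max.index(min(t_max))  # axis whose boundary is hit first
        if t_max[axis] == math.inf:
            break  # zero direction vector: the ray never leaves this voxel
        voxel[axis] += step[axis]
        t_max[axis] += t_delta[axis]
    return visited
```

Each visited voxel would become one ray-voxel pair handed to the intersection kernel.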

Intersector
[Diagram: camera, screen, scene. Kernel: Intersect Triangles; reads Triangles; input ray-voxel pairs; outputs hits]

Intersection Code

    float4 Intersect( float3 ro, float3 rd, int listpos, float4 h )
    {
        float tri_id = texture( listpos, trilist );
        float3 v0 = texture( tri_id, v0 );
        float3 v1 = texture( tri_id, v1 );
        float3 v2 = texture( tri_id, v2 );
        float3 edge1 = v1 - v0;
        float3 edge2 = v2 - v0;
        float3 pvec = Cross( rd, edge2 );
        float det = Dot( edge1, pvec );
        float inv_det = 1/det;
        float3 tvec = ro - v0;
        float u = Dot( tvec, pvec ) * inv_det;
        float3 qvec = Cross( tvec, edge1 );
        float v = Dot( rd, qvec ) * inv_det;
        float t = Dot( edge2, qvec ) * inv_det;
        // determine if valid hit by checking
        // u,v > 0 and u+v < 1
        // set hit data into h based on valid hit
        return float4( {t,u,v,id} );
    }

[Diagram: Intersect Triangles kernel; reads Triangles; input ray-voxel pairs; outputs hits]
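The slide code follows the Möller-Trumbore ray-triangle test but elides the validity check. Below is a runnable sketch of the same arithmetic with that check filled in. The function names, the tuple-based vectors, the `eps` tolerance for near-parallel rays, and the `t > 0` requirement are assumptions added for the example; texture fetches are replaced by passing the vertices directly.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def intersect(ro, rd, v0, v1, v2, eps=1e-9):
    """Ray-triangle intersection following the slide's variable names.

    Returns (t, u, v) for a valid hit, where t is the distance along the ray
    and (u, v) are barycentric coordinates, or None for a miss.
    """
    edge1 = sub(v1, v0)
    edge2 = sub(v2, v0)
    pvec = cross(rd, edge2)
    det = dot(edge1, pvec)
    if abs(det) < eps:
        return None  # ray is (nearly) parallel to the triangle plane
    inv_det = 1.0 / det
    tvec = sub(ro, v0)
    u = dot(tvec, pvec) * inv_det
    qvec = cross(tvec, edge1)
    v = dot(rd, qvec) * inv_det
    t = dot(edge2, qvec) * inv_det
    # Valid hit: inside the triangle (u, v >= 0 and u + v <= 1), in front of the origin.
    if u < 0.0 or v < 0.0 or u + v > 1.0 or t <= 0.0:
        return None
    return (t, u, v)
```

On the GPU the slide version cannot return early, so the validity test would instead select between the new hit and the previous hit record `h`.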

Ray Tracing on a GPU
• Store scene data in texture memory
  – Dependent texturing is key
• Multipass rendering for flow control
  – Branching would eliminate this need

Scene in Texture Memory
[Diagram: three levels of textures]
• Uniform Grid: 3D luminance texture (vox0, vox1, … voxM)
• Triangle List: 1D luminance texture (per-voxel lists of triangle indices)
• Triangles: 3x 1D RGB textures (v0, v1, v2), each storing xyz for tri0, tri1, … triN

Texture As Memory
• Currently limited in size: 128MB
  – About 3M triangles at 36 bytes per triangle
• Uniform grid
  – Maps naturally to 3D textures
  – Requires 4 levels of dependent texture lookups
• 1D textures limited in length
  – Emulate larger address space with 2D textures
• Want integer addressing, not floating point
  – Efficient access without interpolation
• Integer arithmetic
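Two of the points above reduce to simple arithmetic: how many 36-byte triangles fit in 128MB, and how a long 1D address can be folded into a 2D texture coordinate. The sketch below is illustrative; the function names and the row width of 2048 texels in the usage example are assumptions, not values from the talk. (36 bytes is consistent with three vertices of three 4-byte floats each.)

```python
def triangle_capacity(memory_bytes, bytes_per_triangle=36):
    """Number of whole triangles that fit in the given amount of memory."""
    return memory_bytes // bytes_per_triangle

def to_2d(index, width):
    """Fold a linear address into (x, y) coordinates of a 2D texture."""
    return (index % width, index // width)

def to_1d(x, y, width):
    """Inverse of to_2d: recover the linear address."""
    return y * width + x
```

`triangle_capacity(128 * 2**20)` gives 3,728,270, in line with the slide's "about 3M" figure once some memory is reserved for the grid and triangle lists. With a hypothetical row width of 2048, linear address 5000 lands at texel (904, 2).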

Streaming Flow Control
[Diagram: graphics pipeline stages labeled with their stream-model roles]
• Application and Geometry Stages
• Rasterization
• Fragments (Input Stream)
• Fragment Program (Kernel)
• Texture (Globals)
• Fragment Program Output (Output Stream)

Multiple Rendering Passes
• Pass 1: Generate Eye Rays
  – Draw quad, rasterize
  – Run fragment program
  – Save to offscreen buffer (rays)
• Pass 2: Traverse
  – Draw quad, rasterize
  – Restore (rays), run fragment program
  – Save to offscreen buffer (ray-voxel pairs)

Streaming Ray Tracer
[Diagram repeated: Generate Eye Rays (Camera), Traverse Acceleration Structure (Grid), Intersect Triangles (Triangles), Shade Hits and Generate Shading Rays (Materials)]

Multipass Optimization
• Reduce the number of passes
  – Choose to traverse or intersect based on work to be done for each type of pass
    ▪ Connection Machine ray tracer [Delany 1988]
    ▪ Intersect once 20% of active rays need intersecting
• Make each pass less expensive
  – Most passes involve only a few rays
  – Early fragment kill based on fragment mask
    ▪ Saves compute and bandwidth
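The 20% rule above can be written as a small scheduling function. This is a sketch under assumptions: the name `choose_pass`, the string state labels, and the treatment of rays in other states (shading or finished) as inactive are all made up for illustration.

```python
def choose_pass(ray_states, threshold=0.2):
    """Pick the next rendering pass from per-ray state labels.

    ray_states: iterable of strings; "traverse" and "intersect" mark active
                rays waiting for that kind of pass, anything else is ignored.
    Returns "intersect" once at least `threshold` of the active rays are
    waiting to intersect, otherwise "traverse"; "done" if no ray is active.
    """
    active = [s for s in ray_states if s in ("traverse", "intersect")]
    if not active:
        return "done"
    waiting = sum(1 for s in active if s == "intersect")
    return "intersect" if waiting / len(active) >= threshold else "traverse"
```

With nine rays traversing and one waiting to intersect (10%), the next pass is another traversal; at one in five (20%), an intersection pass is issued.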

Scene Statistics
• v: average number of voxels a ray pierces
• t: average triangles a ray intersects
• s: average number of shading evaluations per ray
• P: number of rendering passes
C = R*(Cr + v*Cv + t*Ct + s*Cs) + R*P*Cmask
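The cost formula above can be evaluated directly. The slide does not define R or the C terms; the sketch assumes R is the number of rays, that Cr, Cv, Ct, and Cs are per-ray costs of ray generation, one voxel step, one triangle test, and one shading evaluation, and that Cmask is the per-ray, per-pass cost of the fragment mask. The function name `total_cost` is hypothetical.

```python
def total_cost(R, P, v, t, s, Cr, Cv, Ct, Cs, Cmask):
    """Evaluate C = R*(Cr + v*Cv + t*Ct + s*Cs) + R*P*Cmask.

    The first term is useful work per ray; the second is multipass overhead,
    paid by every ray on every pass whether or not it does any work.
    """
    per_ray_work = Cr + v * Cv + t * Ct + s * Cs
    return R * per_ray_work + R * P * Cmask
```

Because the second term grows with P, reducing the number of passes lowers total cost even when the per-ray work is unchanged.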

Performance Estimates
• Pentium III 800 MHz CPU implementation
  – 20M intersections/s [Wald et al. 2001]
• Simulated performance
  – 2G instructions/s and 8GB/s bandwidth
  – Instruction limited
    ▪ 56M intersections/s
  – Nearly bandwidth limited
    ▪ 222M intersections/s
• Streaming ray tracing is compute limited!

Demo Analysis
• Prototype Performance (ATI R300)
  – 500K to 1.4M raycasts/s
  – 94M intersections/s
  – Only three weeks of coding effort
• ATI Radeon 8500 GPU (R200)
  – 114M intersections/s [Carr et al. 2002]
  – Fixed point operations
  – Only ray-triangle intersection kernel

Summary
• Programmable GPU is a stream processor
• Ray tracing can be a streaming computation
• Complete ray tracer can map onto the GPU
  – Ray tracing generally thought to be incompatible with the traditional graphics pipeline
• Streaming GPU-based ray tracer is competitive with CPU-based ray tracer

Architectural Results
• Fragment mask proposed for efficient multipass
  – Stream buffer eliminates this need
• Stream data should not go through standard texture cache
• Triangles cache well for primary rays, secondary less so
• Branching architecture
  – More cache coherence than the multipass architecture for scene data
  – Reduces memory bandwidth for stream data
  – But has its own costs…

Final Thoughts
• Ray tracing maps into current GPU architecture
  – Does not require fundamentally different hardware
  – Hybrid algorithms possible
• What else can the GPU do?
  – Given you can do ray tracing, you can do anything
  – Fluid flow, molecular dynamics, etc.
• GPU performance increase will continue to outpace CPU performance increase