Ray Tracing, Part 1 Dinesh Manocha COMP 575/770.

Slides:



Advertisements
Similar presentations
GR2 Advanced Computer Graphics AGR
Advertisements

Sven Woop Computer Graphics Lab Saarland University
Christian Lauterbach COMP 770, 2/16/2009. Overview  Acceleration structures  Spatial hierarchies  Object hierarchies  Interactive Ray Tracing techniques.
Restart Trail for Stackless BVH Traversal Samuli Laine NVIDIA Research.
Collision Detection CSCE /60 What is Collision Detection?  Given two geometric objects, determine if they overlap.  Typically, at least one of.
Ray Tracing Ray Tracing 1 Basic algorithm Overview of pbrt Ray-surface intersection (triangles, …) Ray Tracing 2 Brute force: Acceleration data structures.
Computer graphics & visualization Collisions. computer graphics & visualization Simulation and Animation – SS07 Jens Krüger – Computer Graphics and Visualization.
Ray Tracing CMSC 635. Basic idea How many intersections?  Pixels  ~10 3 to ~10 7  Rays per Pixel  1 to ~10  Primitives  ~10 to ~10 7  Every ray.
Latency considerations of depth-first GPU ray tracing
CAP4730: Computational Structures in Computer Graphics Visible Surface Determination.
Visibility Culling. Back face culling View-frustrum culling Detail culling Occlusion culling.
Week 14 - Monday.  What did we talk about last time?  Bounding volume/bounding volume intersections.
Experiences with Streaming Construction of SAH KD Trees Stefan Popov, Johannes Günther, Hans-Peter Seidel, Philipp Slusallek.
Ray Tracing Acceleration Structures Solomon Boulos 4/16/2004.
Tomas Mőller © 2000 Speeding up your game The scene graph Culling techniques Level-of-detail rendering (LODs) Collision detection Resources and pointers.
Ray Tracing Performance
B + -Trees (Part 1) COMP171. Slide 2 Main and secondary memories  Secondary storage device is much, much slower than the main RAM  Pages and blocks.
Collision Detection David Johnson Cs6360 – Virtual Reality.
10/11/2001CS 638, Fall 2001 Today Kd-trees BSP Trees.
Computer Graphics 2 Lecture x: Acceleration Techniques for Ray-Tracing Benjamin Mora 1 University of Wales Swansea Dr. Benjamin Mora.
Data Structures for Computer Graphics Point Based Representations and Data Structures Lectured by Vlastimil Havran.
Spatial Data Structures Jason Goffeney, 4/26/2006 from Real Time Rendering.
1 Speeding Up Ray Tracing Images from Virtual Light Field Project ©Slides Anthony Steed 1999 & Mel Slater 2004.
SPATIAL DATA STRUCTURES Jon McCaffrey CIS 565. Goals  Spatial Data Structures (Construction esp.)  Why  What  How  Designing Algorithms for the GPU.
Stefan PopovHigh Performance GPU Ray Tracing Real-time Ray Tracing on GPU with BVH-based Packet Traversal Stefan Popov, Johannes Günther, Hans- Peter Seidel,
On a Few Ray Tracing like Algorithms and Structures. -Ravi Prakash Kammaje -Swansea University.
Ray Tracing Performance Zero to Millions in 45 Minutes Gordon Stoll, Intel.
Real-Time Rendering SPEEDING UP RENDERING Lecture 04 Marina Gavrilova.
12/4/2001CS 638, Fall 2001 Today Managing large numbers of objects Some special cases.
PRESENTED BY – GAURANGI TILAK SHASHANK AGARWAL Collision Detection.
Saarland University, Germany B-KD Trees for Hardware Accelerated Ray Tracing of Dynamic Scenes Sven Woop Gerd Marmitt Philipp Slusallek.
CIS 350 – I Game Programming Instructor: Rolf Lakaemper.
Interactive Rendering With Coherent Ray Tracing Eurogaphics 2001 Wald, Slusallek, Benthin, Wagner Comp 238, UNC-CH, September 10, 2001 Joshua Stough.
Fast BVH Construction on GPUs (Eurographics 2009) Park, Soonchan KAIST (Korea Advanced Institute of Science and Technology)
David Luebke11/26/2015 CS 551 / 645: Introductory Computer Graphics David Luebke
Maths & Technologies for Games Spatial Partitioning 2
01/28/09Dinesh Manocha, COMP770 Visibility Computations Visible Surface Determination Visibility Culling.
Ray Tracing Optimizations
David Luebke 3/5/2016 Advanced Computer Graphics Lecture 4: Faster Ray Tracing David Luebke
Ray Tracing Acceleration (5). Ray Tracing Acceleration Techniques Too Slow! Uniform grids Spatial hierarchies K-D Octtree BSP Hierarchical grids Hierarchical.
David Luebke3/12/2016 Advanced Computer Graphics Lecture 3: More Ray Tracing David Luebke
Path/Ray Tracing Examples. Path/Ray Tracing Rendering algorithms that trace photon rays Trace from eye – Where does this photon come from? Trace from.
Ray Tracing Acceleration (3)
Layout by orngjce223, CC-BY Compact BVH StorageFabianowski ∙ Dingliana Compact BVH Storage for Ray Tracing and Photon Mapping Bartosz Fabianowski ∙ John.
Spatial Data Management
CSE 554 Lecture 5: Contouring (faster)
Bounding Volume Hierarchies and Spatial Partitioning
STBVH: A Spatial-Temporal BVH for Efficient Multi-Segment Motion Blur
Distance Computation “Efficient Distance Computation Between Non-Convex Objects” Sean Quinlan Stanford, 1994 Presentation by Julie Letchner.
Collision Detection Spring 2004.
Bounding Volume Hierarchies and Spatial Partitioning
Deformable Collision Detection
Modeliranje kompleksnih modelov
Real-Time Ray Tracing Stefan Popov.
Ray Tracing Acceleration Techniques
3D Object Representations
Parts of these slides are based on
Vector Processing => Multimedia
Accelerated Single Ray Tracing for Wide Vector Units
Other time considerations
Database Design and Programming
Collision Detection.
Deformable Collision Detection
Anti-aliased and accelerated ray tracing
Memory System Performance Chapter 3
Ray Tracing Sung-Eui Yoon (윤성의) CS580: Course URL:
David Johnson Cs6360 – Virtual Reality
Modeliranje kompleksnih modelov
B-Trees.
An Analysis of Region Clustered BVH Volume Rendering on GPU
Presentation transcript:

Ray Tracing, Part 1 Dinesh Manocha COMP 575/770

Overview Acceleration structures Interactive Ray Tracing techniques Spatial hierarchies Object hierarchies Interactive Ray Tracing techniques Optimized hierarchy construction Optimized hierarchy traversal - Ray packets

Last lecture Ray Tracing For each pixel: generate ray For each primitive: Does ray hit primitive? Yes: Test depth / write color That means linear time complexity in number of primitives!

Acceleration structures Goal: fast search operation for ray that returns all intersecting primitives Then test only against those Operation should take sub-linear time In practice: conservative superset

Acceleration structures Broad classification: Spatial hierarchies Grids Octrees Kd-trees, BSP trees Object hierarchies Bounding volume hierarchies

Spatial Hierarchies: Grids Building a grid structure Start with bounding box Resolution: often ~ 3n Overlap or intersection test Traversal 3D-DDA Stop if intersection found in current voxel

Grids Grid traversal Objects in multiple voxels Requires enumeration of voxel along ray  3D-DDA Simple and hardware-friendly Objects in multiple voxels Store only references Need to avoid multiple intersection computations Store (ray, object)-tuple in small cache (e.g. with hashing) Do not intersect if found in cache

Grids Grid resolution Strongly scene dependent Cannot adapt to local density of objects Problem: „Teapot in a stadium“ Possible solution: hierarchical grids

Hierarchical Grids Simple building algorithm Advanced algorithm Recursively create grids in high-density voxels Problem: What is the right resolution for each level? Advanced algorithm Separate grids for object clusters Problem: What are good clusters?

Spatial Hierarchies: Octree Hierarchical space partitioning Adaptively subdivide voxels into 8 equal sub-voxels recursively Result in subdivision Problems Rather complex traversal algorithms Slow to refine complexregions

Spatial Hierarchies: kd-trees Binary tree of space subdivisions Each is axis-aligned plane x y y

kD-Trees

kD-Trees

kD-Trees

kD-Trees

Spatial Hierarchies: kd-trees Traversing a kd-tree: recursive Start at root node For current node: If inner node: Find intersection of ray with plane If ray intersects both children, recurse on near side, then far side Otherwise, recurse on side it intersects If leaf node: Intersect with all object. If hit, terminate.

BSP-Trees Recursive space partitioning with half-spaces Binary Space Partition (BSP): Splitting with half-spaces in arbitrary position Kd-Tree Splitting with axis-aligned half-spaces 1.1.2 1.1.2.1 1 1.2 1.1 1.1.1 1.1.1.1

Object Hierarchies: BVHs Different approach: subdivide objects, not space Hierarchical clustering of objects Each cluster represented by bounding volume Binary tree Each parent node fully contains children

Bounding volumes Practically anything can be bounding volume Just need ray intersection method Typical choices: Spheres Axis-aligned bounding boxes (AABBs) Oriented bounding boxes (OBBs) k-DOPs Trade-off between intersection speed and how closely the BV encloses the geometry

BVH traversal Recursive algorithm: Start with root node For current node: Does ray intersect node’s BV? If no, return Is inner node? Yes, recurse on children Is leaf node? Intersect with object(s) in node, store intersection results Note: can’t return after first intersection!

Overview Acceleration structures Interactive Ray Tracing techniques Spatial hierarchies Object hierarchies Interactive Ray Tracing techniques Optimized hierarchy construction Optimized hierarchy traversal - Ray packets

Optimized hierarchy construction The way the hierarchy is constructed has high impact on traversal performance

Building kD-trees Given: Core operation: axis-aligned bounding box (“cell”) list of geometric primitives (triangles?) touching cell Core operation: pick an axis-aligned plane to split the cell into two parts sift geometry into two batches (some redundancy) recurse

Building kD-trees Given: Core operation: axis-aligned bounding box (“cell”) list of geometric primitives (triangles?) touching cell Core operation: pick an axis-aligned plane to split the cell into two parts sift geometry into two batches (some redundancy) recurse termination criteria!

“Intuitive” kD-Tree Building Split Axis Round-robin; largest extent Split Location Middle of extent; median of geometry (balanced tree) Termination Target # of primitives, limited tree depth

“Hack” kD-Tree Building Split Axis Round-robin; largest extent Split Location Middle of extent; median of geometry (balanced tree) Termination Target # of primitives, limited tree depth All of these techniques stink.

“Hack” kD-Tree Building Split Axis Round-robin; largest extent Split Location Middle of extent; median of geometry (balanced tree) Termination Target # of primitives, limited tree depth All of these techniques stink. Don’t use them.

“Hack” kD-Tree Building Split Axis Round-robin; largest extent Split Location Middle of extent; median of geometry (balanced tree) Termination Target # of primitives, limited tree depth All of these techniques stink. Don’t use them. I mean it.

Building good kD-trees What split do we really want? Clever Idea: The one that makes ray tracing cheap Write down an expression of cost and minimize it Cost Optimization What is the cost of tracing a ray through a cell? Cost(cell) = C_trav + Prob(hit L) * Cost(L) + Prob(hit R) * Cost(R)

Splitting with Cost in Mind

Split in the middle Makes the L & R probabilities equal Pays no attention to the L & R costs

Split at the Median Makes the L & R costs equal Pays no attention to the L & R probabilities

Cost-Optimized Split Automatically and rapidly isolates complexity Produces large chunks of empty space

Building good kD-trees Need the probabilities Turns out to be proportional to surface area Need the child cell costs Simple triangle count works great (very rough approx.) Cost(cell) = C_trav + Prob(hit L) * Cost(L) + Prob(hit R) * Cost(R) = C_trav + SA(L) * TriCount(L) + SA(R) * TriCount(R)

Termination Criteria Threshold of cost improvement When should we stop splitting? Another Clever idea: When splitting isn’t helping any more. Use the cost estimates in your termination criteria Threshold of cost improvement

Building good kD-trees Basic build algorithm Pick an axis, or optimize across all three Build a set of “candidates” (split locations) BBox edges or exact triangle intersections Sort them or bin them Walk through candidates or bins to find minimum cost split Characteristics you’re looking for “stringy”, depth 50-100, ~2 triangle leaves, big empty cells

Just Do It Benefits of a good tree are not small not 10%, 20%, 30%... several times faster than a mediocre tree

Overview Acceleration structures Interactive Ray Tracing techniques Spatial hierarchies Object hierarchies Interactive Ray Tracing techniques Optimized hierarchy construction Optimized hierarchy traversal - Ray packets

Acceleration of kD-Tree Traversal Compact node structure Relatively little memory overhead Fast traversal step One FP subtract, one FP multiply

What’s in a node? A kD-tree internal node needs: Am I a leaf? Split axis Split location Pointers to children

Compact (8-byte) nodes kD-Tree node can be packed into 8 bytes Leaf flag + Split axis 2 bits Split location 32 bit float Always two children, put them side-by-side One 32-bit pointer

Compact (8-byte) nodes So close! Sweep those 2 bits under the rug… kD-Tree node can be packed into 8 bytes Leaf flag + Split axis 2 bits Split location 32 bit float Always two children, put them side-by-side One 32-bit pointer So close! Sweep those 2 bits under the rug…

No Bounding Box! kD-Tree node corresponds to an AABB Doesn’t mean it has to *contain* one 24 bytes 4X explosion (!)

Acceleration of kD-Tree Traversal Compact node structure Relatively little memory overhead Fast traversal step One FP subtract, one FP multiply

kD-Tree Traversal Step t_split t_max t_min split

kD-Tree Traversal Step t_max t_split t_min split

kD-Tree Traversal Step t_split t_max t_min split

kD-Tree Traversal Step Given: ray P & iV (1/V), t_min, t_max, split_location, split_axis t_at_split = ( split_location - ray->P[split_axis] ) * ray_iV[split_axis] if t_at_split > t_min need to test against near child If t_at_split < t_max need to test against far child

Optimize Your Inner Loop kD-Tree traversal is the most critical kernel It happens about a zillion times It’s tiny Sloppy coding will show up Optimize, Optimize, Optimize Remove recursion and minimize stack operations Other standard tuning & tweaking

Can it go faster? How do you make fast code go faster? Parallelize it!

Ray Tracing and Parallelism Classic Answer: Ray-Tree parallelism independent tasks # of tasks = millions (at least) size of tasks = thousands of instructions (at least) So this is wonderful, right?

Parallelism in CPUs Instruction-Level Parallelism (ILP) pipelining, superscalar, OOO, SIMD fine granularity (~100 instruction “window” tops) easily confounded by unpredictable control easily confounded by unpredictable latencies So…what does ray tracing look like to a CPU?

No joy in ILP-ville At <1000 instruction granularity, ray tracing is anything but “embarrassingly parallel” kD-Tree traversal (CPU view): 1) fetch a tiny fraction of a cache line from who knows where 2) do two piddling floating-point operations 3) do a completely unpredictable branch, or two, or three 4) repeat until frustrated PS: Each operation is dependent on the one before it. PPS: No SIMD for you! Ha!

Split Personality Coarse-Grained parallelism (TLP) is perfect millions of independent tasks thousands of instructions per task Fine-Grained parallelism (ILP) is awful look at a scale <1000 of instructions sequential dependencies unpredictable control paths unpredictable latencies no SIMD

Options Option #3: Improve the ILP situation directly Option #4: … Option #1: Forget about ILP, go with TLP improve low-ILP efficiency and use multiple CPU cores Option #2: Let TLP stand in for ILP run multiple independent threads (ray trees) on one core Option #3: Improve the ILP situation directly how? Option #4: …

…All of the above! multi-core CPUs are already here (more coming) better performance, better low-ILP performance on the right performance curve multi-threaded CPUs are already here improve well-written ray tracer by ~20-30% packet tracing trace multiple rays together in a packet bulk up the inner loop with ILP-friendly operations

Ray Packets Often have large set of rays close together Idea: trace rays in coherent groups (ray packets) Viewer Screen

Ray coherence How does this change tracing? General idea: Traversal and object intersection work on group of rays at a time Also generate secondary (shadow, …) rays in packets General idea: If one ray intersects, all are intersected

BVH traversal Recursive algorithm: Start with root node For current node: Does ray intersect node’s BV? If no, return Is inner node? Yes, recurse on children Is leaf node? Intersect ray with all object(s) in node, store intersection results

BVH packet traversal Recursive algorithm: Start with root node For current node: Does any ray intersect node’s BV? If no, return Is inner node? Yes, recurse on children Is leaf node? Intersect all rays with all object(s) in node, store intersection results

Ray packet advantages Why does this make things faster? Disadvantage: Less memory bandwidth: nodes/objects only loaded once for rays in packet Allows data parallel processing! Current CPUs: e.g. Intel SSE All GPUs Disadvantage: Rays can be intersected with objects/nodes they would never hit!

Data parallelism + = Essentially vector operations (SIMD) Operations work on all elements in parallel v1 v2 v3 v4 v5 v6 v7 v8 v1 v2 v3 v4 v5 v6 v7 v8 + w1 w2 w3 w4 w5 w6 w7 w8 = v1+w1 v2+w2 v3+w3 v4+w4 v5+w5 v6+w6 v7+w7 v8+w8

Data parallelism Vector operations usually as fast as a single scalar operation Intel SSE: 4-wide vectors GPUs: 8-32 wide Cell: 4-wide Vector processors: up to 64,000!

SIMD ray processing Can use SIMD units to parallelize for rays Each vector has one component from one ray Thus, can process small group at same speed as one ray! trii ray1 ray2 ray3 ray4 ray5 ray6 ray7 ray8  trii

Packet Tracing Old way New way Very, very old idea from vector/SIMD machines Vector masks Old way if the ray wants to go left, go left if the ray wants to go right, go right New way if any ray wants to go left, go left with mask if any ray wants to go right, go right with mask

Key Observations Doesn’t add “bad” stuff What it does add Traverses the same nodes Adds no global fetches Adds no unpredictable branches What it does add SIMD-friendly floating-point operations Some messing around with masks Result: Very robust in relation to single rays

How many rays in a packet? Packet tracing gives us a “knob” with which to adjust computational intensity. Do natural SIMD width first Real answer is potentially much more complex diminishing returns due to per-ray costs lack of coherence to support big packets register pressure, L1 pressure

Why are BVHs slower? Intersection test more costly Up to 6 ray-plane intersections for AABB (slabs test) Just 1 for kd-tree No front-to-back ordering Cannot stop after finding first hit Nodes take more space 32 bytes vs. 8 bytes

On the other hand… AABBs can provide tighter fit automatically No empty leafs, tree does not need to be as deep Primitives only referenced once less nodes in hierarchy #nodes known in advance (2n-1) (if 1 primitive/leaf) Can borrow many tricks we know from kd-Trees SAH build, SIMD, frustum triangle test, interval arithmetic, …

On the other hand… AABBs can provide tighter fit automatically No empty leafs, tree does not need to be as deep Primitives only referenced once less nodes in hierarchy #nodes known in advance (2n-1) (if 1 primitive/leaf) Can borrow many tricks we know from kd-Trees SAH build, SIMD, frustum triangle test, interval arithmetic, … Most important: can be updated for dynamic scenes

Hierarchy updates What does updating mean? Underlying geometry changes Update will ensure correctness of hierarchy without rebuilding it Should be faster than rebuild

Ray Tracing Dynamic Scenes Alternative 1: Handle dynamic objects separately Keep dynamic objects out of ADS, intersect separately Only works for very few ‘dynamic’ objects Alternative 2: Rebuild ADS every frame (Super-)linear in #objects Only applicable to tiny scenes Note: This might change for future HW…

Ray Tracing Dynamic Scenes Alternative 3: Update ADS for dynamic objects Problem 1: How to avoid degradation of ADS ? Problem 2: How to efficiently update ADS ? Kd-trees: Quite problematic Original cost estimates (SAH) wrong/useless after geom. changes Updating often not (easily) possible E.g., can’t move root split: would need to rebuild all sibling nodes But: works reasonably well for BVH

Choosing a structure There is no ‘best’ acceleration structure All have pros and cons Grid: + fast construction - bad for high local detail (teapot/stadium)

Choosing a structure There is no ‘best’ acceleration structure All have pros and cons kd-tree: + fast traversal - expensive build, only static scenes

Choosing a structure There is no ‘best’ acceleration structure All have pros and cons BVH: + can be updated for dynamic scenes - traversal more expensive than kd-tree

Summary Use acceleration structure Spatial hierarchies: grid, kd-trees Object hierarchies: BVH Fast construction and traversal algorithms Other considerations: memory layout, vector processing, …