Real-Time Ray Tracing
Stefan Popov


Visibility Queries
Ray tracing: find the closest intersected primitive along a ray.
Binary visibility: is X visible from Y? Same as ray casting.
Visibility queries are the basis of most rendering algorithms.
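To make the two query types concrete, here is a minimal Python sketch built on the Moller-Trumbore ray/triangle test. That test is a standard technique, not necessarily the one used in this talk, and the helper names `intersect`, `closest_hit` and `visible` are hypothetical. The ray tracing query has to keep the nearest hit, while the binary visibility query may stop at the first blocker it finds.

```python
from typing import List, Optional, Tuple

# Sketch only: helper names and tuple-based vectors are hypothetical choices.
Vec3 = Tuple[float, float, float]
Triangle = Tuple[Vec3, Vec3, Vec3]
EPS = 1e-9


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def intersect(orig: Vec3, dirn: Vec3, tri: Triangle) -> Optional[float]:
    """Moller-Trumbore ray/triangle test: hit distance t > EPS, or None."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(dirn, e2)
    det = dot(e1, p)
    if abs(det) < EPS:            # ray parallel to the triangle's plane
        return None
    inv = 1.0 / det
    s = sub(orig, v0)
    u = dot(s, p) * inv           # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(dirn, q) * inv        # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > EPS else None


def closest_hit(orig: Vec3, dirn: Vec3, tris: List[Triangle]) -> Optional[Tuple[float, int]]:
    """Ray tracing query: (t, index) of the closest intersected triangle."""
    best = None
    for i, tri in enumerate(tris):
        t = intersect(orig, dirn, tri)
        if t is not None and (best is None or t < best[0]):
            best = (t, i)
    return best


def visible(x: Vec3, y: Vec3, tris: List[Triangle]) -> bool:
    """Binary visibility: is x visible from y? Any blocker between them answers 'no'."""
    d = sub(x, y)                 # unnormalized, so x is reached at t = 1
    for tri in tris:
        t = intersect(y, d, tri)
        if t is not None and t < 1.0 - EPS:
            return False          # early exit: the closest blocker is not needed
    return True
```

For example, a ray from (0, 0, -1) along +z, tested against one triangle in the plane z = 0 and one in z = 2, reports the z = 0 triangle at t = 1.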

Visibility Queries: Naïve Algorithm
Naïve: intersect each ray with each primitive. Complexity: O(RN) for R rays and N primitives.
How slow is that?
Ray tracing: 1024x1024 pixels, 16x antialiasing, ~8K triangles, ~112 queries per pixel, ~112 × 10^6 queries in total.
Path tracing: 1024x1024 pixels, ~64K triangles, ~120K queries per pixel, ~120 × 10^9 queries in total.

Example        Total queries   Intersections / ray   Total operations
Ray tracing    112 × 10^6      8,000                 0.896 × 10^12
Path tracing   120 × 10^9      64,000                7.68 × 10^15
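The operation counts above can be reproduced with a short sketch. Like the slide, it assumes about 10^6 pixels and one intersection test per query per primitive. The hypothetical `naive_closest_hit` shows where the O(RN) comes from: a double loop that tests every ray against every primitive. Spheres are used only to keep the example short.

```python
import math
from typing import List, Optional, Tuple

# Sketch only: function names and the sphere primitive are hypothetical choices.
Vec3 = Tuple[float, float, float]
Sphere = Tuple[Vec3, float]          # (center, radius)
Ray = Tuple[Vec3, Vec3]              # (origin, direction)


def naive_operations(total_queries: int, primitives: int) -> int:
    """Intersection tests for the naive algorithm: every query tests every primitive."""
    return total_queries * primitives


def ray_sphere(o: Vec3, d: Vec3, c: Vec3, r: float) -> Optional[float]:
    """Smallest t > 0 with |o + t*d - c| = r, or None."""
    oc = [o[k] - c[k] for k in range(3)]
    a = sum(x * x for x in d)
    b = 2.0 * sum(oc[k] * d[k] for k in range(3))
    cc = sum(x * x for x in oc) - r * r
    disc = b * b - 4.0 * a * cc
    if disc < 0.0:
        return None
    sq = math.sqrt(disc)
    for t in ((-b - sq) / (2.0 * a), (-b + sq) / (2.0 * a)):
        if t > 1e-9:
            return t
    return None


def naive_closest_hit(rays: List[Ray], spheres: List[Sphere]):
    """O(R*N): each of the R rays is tested against all N primitives."""
    tests = 0
    hits: List[Optional[Tuple[float, int]]] = []
    for o, d in rays:
        best = None
        for i, (c, r) in enumerate(spheres):
            tests += 1
            t = ray_sphere(o, d, c, r)
            if t is not None and (best is None or t < best[0]):
                best = (t, i)
        hits.append(best)
    return hits, tests
```

With these inputs, `naive_operations(112 * 10**6, 8000)` is 896 × 10^9 and `naive_operations(120 * 10**9, 64000)` is 768 × 10^13, matching the table.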

Acceleration Structures
Idea: only touch potential candidates.
Regular grid: O(N^(1/3)) traversal, since a ray crosses O(N^(1/3)) cells of a grid with about N cells.
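A ray can be walked through a regular grid by stepping from each cell to the neighbouring cell whose boundary it crosses first. The sketch below is a 3D-DDA in the style of Amanatides and Woo. It assumes a cubic grid of n × n × n cells over [lo, hi]^3, and `grid_traverse` is a hypothetical name. It returns only the visited cells; the candidate primitives would then be looked up per cell.

```python
import math
from typing import List, Sequence, Tuple

# Sketch only: a hypothetical helper assuming a cubic grid over [lo, hi]^3.
Cell = Tuple[int, int, int]


def grid_traverse(o: Sequence[float], d: Sequence[float], n: int,
                  lo: float = 0.0, hi: float = 1.0) -> List[Cell]:
    """Cells of an n*n*n grid pierced by the ray o + t*d (t >= 0), in ray order."""
    if all(c == 0.0 for c in d):
        return []                                   # degenerate direction
    # 1. Clip the ray against the grid's bounding box (slab test).
    t0, t1 = 0.0, math.inf
    for a in range(3):
        if d[a] == 0.0:
            if not lo <= o[a] <= hi:
                return []
        else:
            ta, tb = (lo - o[a]) / d[a], (hi - o[a]) / d[a]
            t0, t1 = max(t0, min(ta, tb)), min(t1, max(ta, tb))
    if t0 > t1:
        return []
    cell = (hi - lo) / n
    idx, step = [0, 0, 0], [0, 0, 0]
    t_max = [math.inf] * 3      # ray parameter at the next cell boundary, per axis
    t_delta = [math.inf] * 3    # ray parameter needed to cross one whole cell, per axis
    # 2. Locate the entry cell and initialise the per-axis stepping state.
    for a in range(3):
        p = o[a] + t0 * d[a]
        idx[a] = min(max(int((p - lo) / cell), 0), n - 1)
        if d[a] > 0.0:
            step[a] = 1
            t_max[a] = (lo + (idx[a] + 1) * cell - o[a]) / d[a]
            t_delta[a] = cell / d[a]
        elif d[a] < 0.0:
            step[a] = -1
            t_max[a] = (lo + idx[a] * cell - o[a]) / d[a]
            t_delta[a] = -cell / d[a]
    # 3. Repeatedly step into the neighbour whose boundary the ray crosses first.
    cells: List[Cell] = []
    while True:
        cells.append((idx[0], idx[1], idx[2]))
        a = min(range(3), key=lambda k: t_max[k])
        if t_max[a] > t1:
            break                                   # ray leaves the grid first
        idx[a] += step[a]
        if not 0 <= idx[a] < n:
            break
        t_max[a] += t_delta[a]
    return cells
```

Each step costs a constant amount of work, so the traversal cost is proportional to the number of cells crossed, which is where the O(N^(1/3)) comes from.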

Hierarchical Structures
Idea: divide space using a (binary) tree, similar to search trees.
BVH: groups primitives.
KD-tree: partitions space.
[Figure: example BVH and KD-tree with inner nodes N0, N1, N2 and leaves L1 to L4]

Traversal
Complexity: O(log N) per ray in typical cases.
At a node: order the children by entry distance (near child first, then far) and search for an intersection recursively.
At a leaf: intersect the ray with the contained primitives.
Traversal is similar for KD-trees and BVHs.
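A minimal recursive sketch of this traversal for a BVH follows. It assumes axis-aligned boxes and sphere primitives for brevity, and the names `Node`, `box_entry`, `traverse` and `closest_hit` are hypothetical. Children are sorted by entry distance, and a child whose entry distance lies beyond the closest hit found so far is skipped.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Sketch only: node layout and helper names are hypothetical.
Vec3 = Tuple[float, float, float]
Sphere = Tuple[Vec3, float]              # (center, radius)
Hit = Optional[Tuple[float, Sphere]]     # (distance, primitive)


@dataclass
class Node:
    lo: Vec3                                                  # AABB min corner
    hi: Vec3                                                  # AABB max corner
    children: List["Node"] = field(default_factory=list)     # inner node
    spheres: List[Sphere] = field(default_factory=list)      # leaf primitives


def box_entry(o: Vec3, d: Vec3, lo: Vec3, hi: Vec3, t_max: float) -> Optional[float]:
    """Slab test: entry distance into the box within [0, t_max], or None."""
    t0, t1 = 0.0, t_max
    for a in range(3):
        if d[a] == 0.0:                       # ray parallel to this slab
            if not lo[a] <= o[a] <= hi[a]:
                return None
            continue
        ta, tb = (lo[a] - o[a]) / d[a], (hi[a] - o[a]) / d[a]
        t0, t1 = max(t0, min(ta, tb)), min(t1, max(ta, tb))
    return t0 if t0 <= t1 else None


def ray_sphere(o: Vec3, d: Vec3, c: Vec3, r: float) -> Optional[float]:
    """Smallest t > 0 with |o + t*d - c| = r, or None."""
    oc = [o[k] - c[k] for k in range(3)]
    a = sum(x * x for x in d)
    b = 2.0 * sum(oc[k] * d[k] for k in range(3))
    cc = sum(x * x for x in oc) - r * r
    disc = b * b - 4.0 * a * cc
    if disc < 0.0:
        return None
    sq = math.sqrt(disc)
    for t in ((-b - sq) / (2.0 * a), (-b + sq) / (2.0 * a)):
        if t > 1e-9:
            return t
    return None


def traverse(node: Node, o: Vec3, d: Vec3, best: Hit) -> Hit:
    if not node.children:                                     # leaf
        for s in node.spheres:
            t = ray_sphere(o, d, s[0], s[1])
            if t is not None and (best is None or t < best[0]):
                best = (t, s)
        return best
    limit = best[0] if best else math.inf
    entries = []
    for child in node.children:
        t = box_entry(o, d, child.lo, child.hi, limit)
        if t is not None:
            entries.append((t, child))
    entries.sort(key=lambda e: e[0])                          # near child first
    for t_enter, child in entries:
        if best is not None and t_enter > best[0]:
            break                                             # far child lies behind the hit
        best = traverse(child, o, d, best)
    return best


def closest_hit(root: Node, o: Vec3, d: Vec3) -> Hit:
    if box_entry(o, d, root.lo, root.hi, math.inf) is None:
        return None
    return traverse(root, o, d, None)
```

A KD-tree traversal follows the same near/far pattern, but it computes the child intervals from the split plane instead of from child boxes.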

Construction of Acceleration Structures
Bottom-up: O(N). Works for BVHs only and builds suboptimal trees.
Top-down recursive: O(N log N).
Start with an AABB around the scene and the set of all primitives.
Split the set of primitives into two (BVH) or split space into two (KD-tree).
Construct recursively.
But how to split?
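The top-down scheme can be sketched for a BVH as follows. The sketch assumes the primitives are given as AABBs and uses the longest-axis object-median split as a simple placeholder answer to "how to split?". `BVHNode` and `build` are hypothetical names. Because it sorts at every level, this sketch runs in O(N log² N); a linear-time median selection would be needed to reach the O(N log N) bound.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Sketch only: node layout and split rule are hypothetical placeholders.
Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]                 # (min corner, max corner)


@dataclass
class BVHNode:
    bounds: Box
    left: Optional["BVHNode"] = None
    right: Optional["BVHNode"] = None
    prims: Optional[List[int]] = None   # primitive indices (leaves only)


def union(boxes: List[Box]) -> Box:
    return (tuple(min(b[0][a] for b in boxes) for a in range(3)),
            tuple(max(b[1][a] for b in boxes) for a in range(3)))


def centroid(b: Box) -> Vec3:
    return tuple(0.5 * (b[0][a] + b[1][a]) for a in range(3))


def build(boxes: List[Box], ids: Optional[List[int]] = None, max_leaf: int = 2) -> BVHNode:
    if ids is None:
        ids = list(range(len(boxes)))              # start with all primitives
    bounds = union([boxes[i] for i in ids])        # AABB around this node's primitives
    if len(ids) <= max_leaf:
        return BVHNode(bounds, prims=ids)
    extent = [bounds[1][a] - bounds[0][a] for a in range(3)]
    axis = extent.index(max(extent))               # split along the longest axis
    ids = sorted(ids, key=lambda i: centroid(boxes[i])[axis])
    mid = len(ids) // 2                            # object median: two equal halves
    return BVHNode(bounds,
                   left=build(boxes, ids[:mid], max_leaf),
                   right=build(boxes, ids[mid:], max_leaf))
```

For four unit boxes placed along the x axis, the root is split into the two leftmost and the two rightmost boxes, whatever order the input list is in.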

Construction Details
Choose a split plane.
KD-trees: partition the space of the node using the split plane, then form the two sets of objects (the sets might overlap).
BVHs: partition the set of objects according to their centroids, then compute tight bounds for each side.
[Figure: a split plane dividing a node into left (L) and right (R) children]
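The two partitioning rules differ in where objects go and in what bounds the children get. The sketch below makes simple assumptions: objects are given as AABBs, and the split plane is given by an axis and a position. `kd_partition` and `bvh_partition` are hypothetical names. The KD-tree variant splits the node's space and may put a straddling object into both children. The BVH variant assigns each object to exactly one side by its centroid and then recomputes tight child bounds, which may overlap.

```python
from typing import List, Optional, Tuple

# Sketch only: hypothetical helpers; objects are represented by their AABBs.
Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]


def union(boxes: List[Box]) -> Optional[Box]:
    if not boxes:
        return None
    return (tuple(min(b[0][a] for b in boxes) for a in range(3)),
            tuple(max(b[1][a] for b in boxes) for a in range(3)))


def kd_partition(boxes: List[Box], node_box: Box, axis: int, pos: float):
    """KD-tree: split the node's space; objects straddling the plane go to both sides."""
    lo_n, hi_n = node_box
    left_cell = (lo_n, tuple(pos if a == axis else hi_n[a] for a in range(3)))
    right_cell = (tuple(pos if a == axis else lo_n[a] for a in range(3)), hi_n)
    left, right = [], []
    for i, (lo, hi) in enumerate(boxes):
        in_left, in_right = lo[axis] < pos, hi[axis] > pos
        if not (in_left or in_right):
            in_left = True                    # flat object lying in the plane
        if in_left:
            left.append(i)
        if in_right:
            right.append(i)
    return (left, left_cell), (right, right_cell)


def bvh_partition(boxes: List[Box], axis: int, pos: float):
    """BVH: each object goes to exactly one side by its centroid; bounds are recomputed tightly."""
    left = [i for i, (lo, hi) in enumerate(boxes) if 0.5 * (lo[axis] + hi[axis]) < pos]
    right = [i for i, (lo, hi) in enumerate(boxes) if 0.5 * (lo[axis] + hi[axis]) >= pos]
    return ((left, union([boxes[i] for i in left])),
            (right, union([boxes[i] for i in right])))
```

With a box straddling the plane x = 2, the KD-tree variant lists it in both children, whereas the BVH variant places it on one side and enlarges that side's bounds past the plane.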

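Construction: SAH
Split plane choice: split in the middle, median split, …
Cost model: the surface area heuristic (SAH) gives the expected cost E of traversing node N.
N_L and N_R: the children of N. SA(N): the surface area of the AABB of N.
Derived from geometric probability: for uniformly distributed rays that hit the box of N, the probability of also hitting the box of a child C is SA(C)/SA(N).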
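A commonly used form of the SAH cost is E(N) = C_trav + (SA(N_L)/SA(N)) · |N_L| · C_isect + (SA(N_R)/SA(N)) · |N_R| · C_isect. Here the constants C_trav (one traversal step) and C_isect (one primitive intersection) are assumptions, and |N_L| and |N_R| are the primitive counts of the children. The sketch below evaluates this cost for every object split along one axis by sweeping over centroid-sorted primitives. `sah_best_split` is a hypothetical name, and the split is rejected when a leaf (cost |N| · C_isect) is cheaper.

```python
import math
from typing import List, Tuple

# Sketch only: hypothetical helper; cost constants are assumed, not from the talk.
Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]


def surface_area(b: Box) -> float:
    dx, dy, dz = (b[1][a] - b[0][a] for a in range(3))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def union2(a: Box, b: Box) -> Box:
    return (tuple(min(a[0][k], b[0][k]) for k in range(3)),
            tuple(max(a[1][k], b[1][k]) for k in range(3)))


def sah_best_split(boxes: List[Box], axis: int, c_trav: float = 1.0, c_isect: float = 1.0):
    """Return (cost, k, order): the first k primitives of `order` go left.
    k is None when a leaf (cost n * c_isect) is cheaper than the best split."""
    n = len(boxes)
    order = sorted(range(n), key=lambda i: boxes[i][0][axis] + boxes[i][1][axis])
    right = [boxes[order[-1]]] * n              # right[k] bounds order[k:]
    for k in range(n - 2, -1, -1):
        right[k] = union2(right[k + 1], boxes[order[k]])
    sa_parent = surface_area(right[0])          # assumes a non-degenerate node box
    best_cost, best_k = math.inf, None
    left = None
    for k in range(1, n):                       # k primitives left, n - k right
        b = boxes[order[k - 1]]
        left = b if left is None else union2(left, b)
        cost = c_trav + c_isect * (surface_area(left) * k
                                   + surface_area(right[k]) * (n - k)) / sa_parent
        if cost < best_cost:
            best_cost, best_k = cost, k
    leaf_cost = c_isect * n
    if best_cost >= leaf_cost:
        return leaf_cost, None, order
    return best_cost, best_k, order
```

For two adjacent unit cubes plus a distant third one, the cheapest split separates the distant cube. A median split would instead cut the adjacent pair and leave a large, mostly empty box on the right.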

Real-time Rendering
Real-time: at least 30 FPS (about 33 ms per frame); at 60 FPS the budget is 1 s / 60 ≈ 16 ms per frame.
Most algorithms build on visibility queries: we need to build trees and ray trace in well under 16 ms.
Bottleneck: tree construction (dynamic scenes only). Relatively large complexity, O(N log N), and complex implementations lead to low speeds. Topic of the next talk.
Bottleneck: traversal. Millions to billions of rays need to be processed per frame.
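The frame-time budget and the required ray throughput are simple to check. In this small sketch the function names are hypothetical, and the 16 rays per pixel in the usage note are an assumed workload rather than a figure from the talk.

```python
# Sketch only: hypothetical helpers for frame-budget arithmetic.

def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps


def rays_per_second(rays_per_frame: int, fps: int) -> int:
    """Sustained ray throughput needed to reach the target frame rate."""
    return rays_per_frame * fps
```

For example, 1024x1024 pixels with an assumed 16 rays per pixel is 16,777,216 rays per frame, which at 30 FPS requires 503,316,480 rays per second.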

Making it fast
Faster hardware: but today's hardware only gets more parallel.
Fast traversal: better single-ray traversal algorithms, parallelism, amortizing work (packets), and structures optimized for better ray tracing performance.
Dynamic scenes: rigid-body animations, faster builds.

Parallelism
Rendering algorithms are embarrassingly parallel: per-pixel computations are usually independent.
Speed up by using multi-core machines and clusters. Linear speedup (in theory), but good load-balancing algorithms are needed.
However, speedup is limited by latencies and bandwidth, and efficient parallel construction of acceleration structures is hard.

Packet Traversal
The slowest operation in modern processors is reading memory: it can be 1000x slower than an arithmetic instruction, and it blocks the processor.
Idea: process a packet of rays together. If any ray wants to visit a node, visit it with all rays.
Relies on coherence. The cost of loading a node from memory is amortized over the packet, and other costs can be amortized as well.
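Here is a minimal sketch of the idea. It assumes a tiny BVH described only by boxes, with a packet represented as a list of rays; `packet_traverse`, `hits_box` and `run` are hypothetical names. Each node is fetched once per packet and tested against all still-active rays, and the recursion continues as long as any ray in the packet wants to visit the node. Counting node fetches shows the amortization for a coherent packet.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Sketch only: hypothetical names; leaves carry a label instead of primitives.
Vec3 = Tuple[float, float, float]
Ray = Tuple[Vec3, Vec3]


@dataclass
class Node:
    lo: Vec3
    hi: Vec3
    children: List["Node"] = field(default_factory=list)
    leaf: Optional[str] = None


def hits_box(o: Vec3, d: Vec3, lo: Vec3, hi: Vec3) -> bool:
    """Slab test for t >= 0."""
    t0, t1 = 0.0, math.inf
    for a in range(3):
        if d[a] == 0.0:
            if not lo[a] <= o[a] <= hi[a]:
                return False
            continue
        ta, tb = (lo[a] - o[a]) / d[a], (hi[a] - o[a]) / d[a]
        t0, t1 = max(t0, min(ta, tb)), min(t1, max(ta, tb))
    return t0 <= t1


def packet_traverse(node: Node, rays: List[Ray], active: List[bool],
                    leaves: List[List[str]], stats: dict) -> None:
    stats["node_loads"] += 1                      # node fetched once for the whole packet
    mask = [act and hits_box(o, d, node.lo, node.hi)
            for act, (o, d) in zip(active, rays)]
    if not any(mask):                             # no ray wants this node
        return
    if node.leaf is not None:
        for r, m in enumerate(mask):
            if m:
                leaves[r].append(node.leaf)       # primitives would be intersected here
        return
    for child in node.children:                   # visit with all rays, masked
        packet_traverse(child, rays, mask, leaves, stats)


def run(root: Node, rays: List[Ray]):
    stats = {"node_loads": 0}
    leaves: List[List[str]] = [[] for _ in rays]
    packet_traverse(root, rays, [True] * len(rays), leaves, stats)
    return leaves, stats["node_loads"]
```

For four parallel rays that share one root and two leaves, the packet fetches 3 nodes in total, while tracing the rays one by one fetches 12.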

SIMD Packets
Modern CPUs have SIMD vector units (SSE2 to SSE5, AltiVec, …) that execute one operation on all components of a vector.
Idea: use SIMD-wide packets on the vector units. Same idea as packet traversal; a mask specifies which components are active.
In the worst case, about as fast as a single ray.
Rather hard to program by hand; there is research into auto-vectorizing compilers.
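Masked SIMD execution can be emulated in plain Python to show the idea. The sketch assumes a 4-wide vector (as with 32-bit floats in an SSE register) and uses a ray/plane test in place of a real primitive test; all helper names are hypothetical. Every operation runs on all lanes, and the mask decides which lanes may commit a result. In this emulation, division by zero is mapped to infinity, whereas real hardware would produce ±inf or NaN without raising an exception.

```python
import math
from typing import List

# Sketch only: an emulation of 4-wide SIMD with hypothetical helper names.
WIDTH = 4   # assumed lane count, e.g. four 32-bit floats in one SSE register


def vsub(a: List[float], b: List[float]) -> List[float]:
    return [x - y for x, y in zip(a, b)]


def vdiv(a: List[float], b: List[float]) -> List[float]:
    # Emulation only: a zero divisor yields inf here.
    return [x / y if y != 0.0 else math.inf for x, y in zip(a, b)]


def vlt(a: List[float], b: List[float]) -> List[bool]:
    return [x < y for x, y in zip(a, b)]


def vand(a: List[bool], b: List[bool]) -> List[bool]:
    return [x and y for x, y in zip(a, b)]


def vselect(mask: List[bool], a: List[float], b: List[float]) -> List[float]:
    """Blend: lanes with the mask set take a, the others keep b."""
    return [x if m else y for m, x, y in zip(mask, a, b)]


def intersect_plane_packet(oz, dz, plane_z, t_best, active):
    """Intersect WIDTH rays with the plane z = plane_z; only active lanes commit."""
    t = vdiv(vsub([plane_z] * WIDTH, oz), dz)          # computed in every lane
    hit = vand(active, vand(vlt([0.0] * WIDTH, t), vlt(t, t_best)))
    return vselect(hit, t, t_best), hit
```

When only one lane is active, each vector operation still costs about as much as one scalar operation, which is the worst case mentioned above.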

Rasterization
Efficiently solves visibility queries for primary rays, for projective cameras only.
Does not require an acceleration structure.
Idea: project the primitives and keep per-pixel depth data.
The basic operation in all current GPUs.
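Here is a minimal sketch of rasterization with a depth buffer. It assumes triangles are already projected to screen space and carry their depth after the perspective divide, so depth can be interpolated linearly in screen space. `rasterize` and `edge` are hypothetical names. Coverage is tested at pixel centers with edge functions, and no acceleration structure is involved.

```python
import math
from typing import List, Optional, Tuple

# Sketch only: hypothetical names; vertices are (screen x, screen y, depth).
Vertex = Tuple[float, float, float]
Tri = Tuple[Vertex, Vertex, Vertex]


def edge(a: Vertex, b: Vertex, px: float, py: float) -> float:
    """Twice the signed area of (a, b, p); its sign tells which side of a->b p is on."""
    return (b[0] - a[0]) * (py - a[1]) - (b[1] - a[1]) * (px - a[0])


def rasterize(tris: List[Tri], w: int, h: int):
    depth = [[math.inf] * w for _ in range(h)]
    ids: List[List[Optional[int]]] = [[None] * w for _ in range(h)]
    for tid, (v0, v1, v2) in enumerate(tris):
        area = edge(v0, v1, v2[0], v2[1])
        if area == 0.0:
            continue                              # degenerate triangle
        # Only visit pixels inside the triangle's screen-space bounding box.
        x_min = max(0, math.floor(min(v0[0], v1[0], v2[0])))
        x_max = min(w - 1, math.ceil(max(v0[0], v1[0], v2[0])))
        y_min = max(0, math.floor(min(v0[1], v1[1], v2[1])))
        y_max = min(h - 1, math.ceil(max(v0[1], v1[1], v2[1])))
        for y in range(y_min, y_max + 1):
            for x in range(x_min, x_max + 1):
                px, py = x + 0.5, y + 0.5          # sample at the pixel center
                b0 = edge(v1, v2, px, py) / area  # barycentric weights
                b1 = edge(v2, v0, px, py) / area
                b2 = edge(v0, v1, px, py) / area
                if b0 < 0.0 or b1 < 0.0 or b2 < 0.0:
                    continue                      # center outside the triangle
                z = b0 * v0[2] + b1 * v1[2] + b2 * v2[2]
                if z < depth[y][x]:               # depth test: keep the nearest
                    depth[y][x], ids[y][x] = z, tid
    return ids, depth
```

Because of the depth test, the final image does not depend on the order in which the triangles are drawn.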

GPUs I
GPUs are not so much about graphics anymore: they are evolving into general-purpose supercomputers.
Not every problem can benefit from the GPU.

GPUs II
Multi-core, multi-threaded, wide SIMD machines:
SIMD with automatic masking (aka SIMT).
Many cores (up to 30 currently) on the same chip.
Latencies are covered by hardware multi-threading; thousands of threads run simultaneously.
Very large memory bandwidth, but also a very wide memory interface.
CUDA: C for HPC. Write programs for a single thread (similar to PRAM); the hardware takes care of the rest.

Ray tracing on the GPU
Mostly impossible before DX9 hardware.
DX9 hardware: limited programming model. Biggest challenge: no memory writes, so stackless traversal is required.
CUDA and DX10 hardware: simply program your ray tracer and optimize a bit. Biggest challenge: coherence.
Fastest ray tracing platform at the moment. NVIDIA has its own ray tracer: OptiX.
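One way to traverse a tree without a stack is to store it in depth-first order with a skip pointer per node. On a box hit, traversal continues with the next node in the array; on a miss, it jumps to the skip pointer. The sketch below shows this scheme. It is one of several stackless approaches (ropes and restart traversal for KD-trees are others) and is not necessarily the one meant here. All names are hypothetical, and children are visited in a fixed order, so unlike stack-based traversal the near child is not guaranteed to come first.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Sketch only: hypothetical layout; spheres stand in for real primitives.
Vec3 = Tuple[float, float, float]
Sphere = Tuple[Vec3, float]


@dataclass
class Node:
    lo: Vec3
    hi: Vec3
    children: List["Node"] = field(default_factory=list)
    spheres: List[Sphere] = field(default_factory=list)


def flatten(node: Node, out: list) -> list:
    """Depth-first layout; each entry is [lo, hi, spheres or None, skip]."""
    idx = len(out)
    out.append([node.lo, node.hi, None if node.children else node.spheres, None])
    for child in node.children:
        flatten(child, out)
    out[idx][3] = len(out)            # skip: first node after this subtree
    return out


def box_hit(o: Vec3, d: Vec3, lo: Vec3, hi: Vec3, t_max: float) -> bool:
    t0, t1 = 0.0, t_max
    for a in range(3):
        if d[a] == 0.0:
            if not lo[a] <= o[a] <= hi[a]:
                return False
            continue
        ta, tb = (lo[a] - o[a]) / d[a], (hi[a] - o[a]) / d[a]
        t0, t1 = max(t0, min(ta, tb)), min(t1, max(ta, tb))
    return t0 <= t1


def ray_sphere(o: Vec3, d: Vec3, c: Vec3, r: float) -> Optional[float]:
    oc = [o[k] - c[k] for k in range(3)]
    a = sum(x * x for x in d)
    b = 2.0 * sum(oc[k] * d[k] for k in range(3))
    cc = sum(x * x for x in oc) - r * r
    disc = b * b - 4.0 * a * cc
    if disc < 0.0:
        return None
    sq = math.sqrt(disc)
    for t in ((-b - sq) / (2.0 * a), (-b + sq) / (2.0 * a)):
        if t > 1e-9:
            return t
    return None


def stackless_closest(flat: list, o: Vec3, d: Vec3):
    best = None
    i = 0
    while i < len(flat):
        lo, hi, spheres, skip = flat[i]
        limit = best[0] if best else math.inf
        if box_hit(o, d, lo, hi, limit):
            for s in spheres or []:       # leaf: test its primitives
                t = ray_sphere(o, d, s[0], s[1])
                if t is not None and t < limit:
                    best, limit = (t, s), t
            i += 1                        # descend (or, after a leaf, move on)
        else:
            i = skip                      # skip the whole subtree; no stack needed
    return best
```

The only traversal state is the index `i` and the current best hit, so nothing has to be written to memory during traversal.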

Better Acceleration Structures
Optimize better for ray tracing:
SAH is not perfect, and trees are optimized for the general case.
BVH construction only looks at a limited set of splits.
More optimizations mean slower construction.
Not much work recently.
Spatial split BVHs: combine KD-tree construction (splitting space) with BVHs. Speedup: 2-6x.

Thank you!
