Saarland University, Germany B-KD Trees for Hardware Accelerated Ray Tracing of Dynamic Scenes Sven Woop Gerd Marmitt Philipp Slusallek.

Slides:



Advertisements
Similar presentations
A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.
Advertisements

Sven Woop Jörg Schmittler Philipp Slusallek Computer Graphics Lab Saarland University, Germany RPU: A Programmable Ray Processing Unit for Realtime Ray.
Sven Woop Computer Graphics Lab Saarland University
† Saarland University, Germany ‡ University of Utah, USA Estimating Performance of a Ray-Tracing ASIC Design Sven Woop † Erik Brunvand ‡ Philipp Slusallek.
Christian Lauterbach COMP 770, 2/16/2009. Overview  Acceleration structures  Spatial hierarchies  Object hierarchies  Interactive Ray Tracing techniques.
COMPUTER GRAPHICS CS 482 – FALL 2014 NOVEMBER 10, 2014 GRAPHICS HARDWARE GRAPHICS PROCESSING UNITS PARALLELISM.
Spatial Join Queries. Spatial Queries Given a collection of geometric objects (points, lines, polygons,...) organize them on disk, to answer point queries.
Restart Trail for Stackless BVH Traversal Samuli Laine NVIDIA Research.
Ray Tracing Ray Tracing 1 Basic algorithm Overview of pbrt Ray-surface intersection (triangles, …) Ray Tracing 2 Brute force: Acceleration data structures.
Computer graphics & visualization Collisions. computer graphics & visualization Simulation and Animation – SS07 Jens Krüger – Computer Graphics and Visualization.
Ray Tracing CMSC 635. Basic idea How many intersections?  Pixels  ~10 3 to ~10 7  Rays per Pixel  1 to ~10  Primitives  ~10 to ~10 7  Every ray.
A Coherent Grid Traversal Algorithm for Volume Rendering Ioannis Makris Supervisors: Philipp Slusallek*, Céline Loscos *Computer Graphics Lab, Universität.
Two Methods for Fast Ray-Cast Ambient Occlusion Samuli Laine and Tero Karras NVIDIA Research.
Latency considerations of depth-first GPU ray tracing
Two-Level Grids for Ray Tracing on GPUs
RT06 conferenceVlastimil Havran On the Fast Construction of Spatial Hierarchies for Ray Tracing Vlastimil Havran 1,2 Robert Herzog 1 Hans-Peter Seidel.
Experiences with Streaming Construction of SAH KD Trees Stefan Popov, Johannes Günther, Hans-Peter Seidel, Philipp Slusallek.
Afrigraph 2004 Interactive Ray-Tracing of Free-Form Surfaces Carsten Benthin Ingo Wald Philipp Slusallek Computer Graphics Lab Saarland University, Germany.
Ray Tracing Acceleration Structures Solomon Boulos 4/16/2004.
Computer Graphics (Fall 2005) COMS 4160, Lecture 21: Ray Tracing
GH05 KD-Tree Acceleration Structures for a GPU Raytracer Tim Foley, Jeremy Sugerman Stanford University.
1 View Coherence Acceleration for Ray Traced Animation University of Colorado at Colorado Springs Master’s Thesis Defense by Philip Glen Gage April 19,
Status – Week 243 Victor Moya. Summary Current status. Current status. Tests. Tests. XBox documentation. XBox documentation. Post Vertex Shader geometry.
3D Graphics Processor Architecture Victor Moya. PhD Project Research on architecture improvements for future Graphic Processor Units (GPUs). Research.
Ray Tracing Dynamic Scenes using Selective Restructuring Sung-eui Yoon Sean Curtis Dinesh Manocha Univ. of North Carolina at Chapel Hill Lawrence Livermore.
1 Advanced Scene Management System. 2 A tree-based or graph-based representation is good for 3D data management A tree-based or graph-based representation.
Interactive k-D Tree GPU Raytracing Daniel Reiter Horn, Jeremy Sugerman, Mike Houston and Pat Hanrahan.
RT08, August ‘08 Large Ray Packets for Real-time Whitted Ray Tracing Ryan Overbeck Columbia University Ravi Ramamoorthi Columbia University William R.
RAY TRACING ON GPU By: Nitish Jain. Introduction Ray Tracing is one of the most researched fields in Computer Graphics A great technique to produce optical.
Computer Graphics 2 Lecture x: Acceleration Techniques for Ray-Tracing Benjamin Mora 1 University of Wales Swansea Dr. Benjamin Mora.
Interactive Ray Tracing: From bad joke to old news David Luebke University of Virginia.
Ray Tracing and Photon Mapping on GPUs Tim PurcellStanford / NVIDIA.
Accelerating Ray Tracing using Constrained Tetrahedralizations Ares Lagae & Philip Dutré 19 th Eurographics Symposium on Rendering EGSR 2008Wednesday,
Stefan PopovHigh Performance GPU Ray Tracing Real-time Ray Tracing on GPU with BVH-based Packet Traversal Stefan Popov, Johannes Günther, Hans- Peter Seidel,
On a Few Ray Tracing like Algorithms and Structures. -Ravi Prakash Kammaje -Swansea University.
Real-Time Rendering SPEEDING UP RENDERING Lecture 04 Marina Gavrilova.
Introduction to Realtime Ray Tracing Course 41 Philipp Slusallek Peter Shirley Bill Mark Gordon Stoll Ingo Wald.
Stream Processing Main References: “Comparing Reyes and OpenGL on a Stream Architecture”, 2002 “Polygon Rendering on a Stream Architecture”, 2000 Department.
GPU Computation Strategies & Tricks Ian Buck NVIDIA.
Interactive Rendering With Coherent Ray Tracing Eurogaphics 2001 Wald, Slusallek, Benthin, Wagner Comp 238, UNC-CH, September 10, 2001 Joshua Stough.
Fast BVH Construction on GPUs (Eurographics 2009) Park, Soonchan KAIST (Korea Advanced Institute of Science and Technology)
Stefan Popov Space Subdivision for BVHs Stefan Popov, Iliyan Georgiev, Rossen Dimov, and Philipp Slusallek Object Partitioning Considered Harmful: Space.
Supporting Animation and Interaction Ingo Wald SCI Institute, University of Utah
Introduction to Realtime Ray Tracing Course 41 Philipp Slusallek Peter Shirley Bill Mark Gordon Stoll Ingo Wald.
Hierarchical Penumbra Casting Samuli Laine Timo Aila Helsinki University of Technology Hybrid Graphics, Ltd.
- Laboratoire d'InfoRmatique en Image et Systèmes d'information
Dynamic Scenes Paul Arthur Navrátil ParallelismJustIsn’tEnough.
Memory Management and Parallelization Paul Arthur Navrátil The University of Texas at Austin.
Ray Tracing Animated Scenes using Motion Decomposition Johannes Günther, Heiko Friedrich, Ingo Wald, Hans-Peter Seidel, and Philipp Slusallek.
Interactive Ray Tracing of Dynamic Scenes Tomáš DAVIDOVIČ Czech Technical University.
Ray Tracing II. HW1 Part A due October 10 Camera module Object module –Read from a file –Sphere and Light only Ray tracer module: –No shading. No reflection.
Compact, Fast and Robust Grids for Ray Tracing Ares Lagae & Philip Dutré 19 th Eurographics Symposium on Rendering EGSR 2008Wednesday, June 25th.
Advanced topics Advanced Multimedia Technology: Computer Graphics Yung-Yu Chuang 2006/01/04 with slides by Brian Curless, Zoran Popovic, Mario Costa Sousa.
COMPUTER GRAPHICS CS 482 – FALL 2015 SEPTEMBER 29, 2015 RENDERING RASTERIZATION RAY CASTING PROGRAMMABLE SHADERS.
Ray Tracing using Programmable Graphics Hardware
Virtual Light Field Group University College London Ray Tracing with the VLF (VLF-RT) Jesper Mortensen
Ray Tracing by GPU Ming Ouhyoung. Outline Introduction Graphics Hardware Streaming Ray Tracing Discussion.
Ray Tracing Optimizations
Ray Tracing Acceleration (5). Ray Tracing Acceleration Techniques Too Slow! Uniform grids Spatial hierarchies K-D Octtree BSP Hierarchical grids Hierarchical.
1 Advanced Scene Management. 2 This is a game-type-oriented issue Bounding Volume Hierarchies (BVHs) Binary space partition trees (BSP Trees) “Quake”
Path/Ray Tracing Examples. Path/Ray Tracing Rendering algorithms that trace photon rays Trace from eye – Where does this photon come from? Trace from.
1 The Method of Precomputing Triangle Clusters for Quick BVH Builder and Accelerated Ray Tracing Kirill Garanzha Department of Software for Computers Bauman.
COMPUTER GRAPHICS CHAPTER 38 CS 482 – Fall 2017 GRAPHICS HARDWARE
STBVH: A Spatial-Temporal BVH for Efficient Multi-Segment Motion Blur
Modeliranje kompleksnih modelov
Real-Time Ray Tracing Stefan Popov.
Ray Tracing Acceleration Techniques
Christian Lauterbach GPGPU presentation 3/5/2007
Accelerated Single Ray Tracing for Wide Vector Units
Modeliranje kompleksnih modelov
Presentation transcript:

Saarland University, Germany B-KD Trees for Hardware Accelerated Ray Tracing of Dynamic Scenes Sven Woop Gerd Marmitt Philipp Slusallek

Outline Previous Work B-KD Tree as new Spatial Index Structure DynRT Architecture Traveral Processing Unit Update Processor Prototype Implementation Live Demo Conclusion

Previous Work Ray Tracers for Static Scenes CPU based: [OpenRT], [MLRT SIGGRAPH05] GPU based: Purcell (Grids) [SIGGRAPH02], Foley et al. (KD Trees) [GH05] Custom Hardware: Commercial Hardware (ART-VPS) Schmittler (KD Trees) [GH04] RPU (KD Trees) [SIGGRAPH05] Ray Tracers for Dynamic Scenes CPU based: Wald (Grids) [SIGGRAPH06] Wald (AABVHs) [TOG / Tech. Rep. 2006] Custom Hardware: Woop (B-KD Trees) [GH06]

Definition of B-KD Trees B-KD Tree (Bounded KD-Tree) Binary Tree 1D bounding intervalls for each child Leaf nodes point to a single primitive

B-KD Tree Subdivision Bounding Volume Hierarchy (partially unbounded) Each node can be associated with a full bounding box Bounds may overlap  Primitives in single leaf nodes  More traversal steps as for KD Tree  Support for dynamic scenes

B-KD Tree Subdivision Bounding Volume Hierarchy (partially unbounded) Each node can be associated with a full bounding box Bounds may overlap  Primitives in single leaf nodes  More traversal steps as for KD Tree  Support for dynamic scenes

B-KD Tree Subdivision Bounding Volume Hierarchy (partially unbounded) Each node can be associated with a full bounding box Bounds may overlap  Primitives in single leaf nodes  More traversal steps as for KD Tree  Support for dynamic scenes

B-KD Tree Subdivision Bounding Volume Hierarchy (partially unbounded) Each node can be associated with a full bounding box Bounds may overlap  Primitives in single leaf nodes  More traversal steps as for KD Tree  Support for dynamic scenes

B-KD Tree Subdivision Bounding Volume Hierarchy (partially unbounded) Each node can be associated with a full bounding box Bounds may overlap  Primitives in single leaf nodes  More traversal steps as for KD Tree  Support for dynamic scenes

B-KD Tree Construction If #primitives > 1 then

B-KD Tree Construction If #primitives > 1 then Compute center of mass

B-KD Tree Construction If #primitives > 1 then Compute center of mass Spatial Median Object Median

B-KD Tree Construction If #primitives > 1 then Compute center of mass Spatial Median Object Median

B-KD Tree Construction If #primitives > 1 then Compute center of mass Sort geometry along all three dimensions

B-KD Tree Construction If #primitives > 1 then Compute center of mass Sort geometry along all three dimensions Partitionings can be determined by splitting a list at a position

B-KD Tree Construction If #primitives > 1 then Compute center of mass Sort geometry along all three dimensions Partitionings can be determined by splitting a list at a position Build all possible partitionings in all three dimensions

B-KD Tree Construction If #primitives > 1 then Compute center of mass Sort geometry along all three dimensions Partitionings can be determined by splitting a list at a position Build all possible partitionings in all three dimensions Find the partitioning with smallest SAH cost

B-KD Tree Construction If #primitives > 1 then Compute center of mass Sort geometry along all three dimensions Partitionings can be determined by splitting a list at a position Build all possible partitionings in all three dimensions Find the partitioning with smallest SAH cost Create node and recurse

B-KD Tree Construction If #primitives > 1 then Compute center of mass Sort geometry along all three dimensions Partitionings can be determined by splitting a list at a position Build all possible partitionings in all three dimensions Find the partitioning with smallest SAH cost Create node and recurse Else if #primitives = 1 then Create leaf node

B-KD Tree Construction Rendering Performance 20% to 100% better than center splitting approaches Two-level B-KD Trees Top-level B-KD tree over object instances Bottom-level B-KD tree for each object

B-KD Trees for Dynamic Scenes On changed object geometry B-KD tree bounds are updated from bottom up B-KD tree structure remains constant  Linear updating complexity

Examples Bounding approaches perform well for Continous motion Structure of motion must match tree structure E.g. skinned meshes, characters, water surfaces,...

Examples Bounding approaches perform well for Continous motion Structure of motion must match tree structure E.g. skinned meshes, characters, water surfaces,...

Examples Bounding approaches perform well for Continous motion Structure of motion must match tree structure E.g. skinned meshes, characters, water surfaces,...

Examples Bounding approaches perform well for Continous motion Structure of motion must match tree structure E.g. skinned meshes, characters, water surfaces,...

Examples Bounding approaches perform well for Continous motion Structure of motion must match tree structure E.g. skinned meshes, characters, water surfaces,...

Examples Bounding approaches perform well for Continous motion Structure of motion must match tree structure E.g. skinned meshes, characters, water surfaces,...

Examples Bounding volume approaches are less efficient for Non-continous motion Structure of motion does not match tree structure High traversal cost due to large overlapping boxes

Examples Bounding volume approaches fail for Non-continous motion Structure of motion does not match tree structure High traversal cost due to large overlapping boxes

Examples Bounding volume approaches fail for Non-continous motion Structure of motion does not match tree structure High traversal cost due to large overlapping boxes

Examples Bounding volume approaches fail for Non-continous motion Structure of motion does not match tree structure High traversal cost due to large overlapping boxes

Examples Bounding volume approaches fail for Non-continous motion Structure of motion does not match tree structure High traversal cost due to large overlapping boxes

Comparison for Gael Scene 52k triangles Index typeIndex size# trav-cost# tri-ints KD1.4 MB314.8 B-KD1.1 MB AABVH2.2 MB KD treeB-KD tree AABVH

DynRT Architecture Extension of RPU approach vertices from memory

DynRT Architecture Rendering Units Highly multi-threaded Higher hardware usage Synchronous execution of packets of 4 rays Memory bandwidth reduction First level caches Memory bandwidth reduction vertices from memory

DynRT Architecture Programmable Shading Unit Similar to RPU shading processor Ray generation tasks Material shading Calls Ray Casting Units to cast rays vertices from memory

DynRT Architecture Programmable Shading Unit Ray Casting Units vertices from memory

DynRT Architecture Programmable Shading Unit Ray Casting Units Traversal Processing Unit Efficient traversal of B-KD trees Two level B-KD trees supported vertices from memory

DynRT Architecture Programmable Shading Unit Ray Casting Units Traversal Processing Unit Efficient traversal of B-KD trees Two level B-KD trees supported Geometry Unit Ray transformations Vertex-based ray/triangle intersection [Möller Trumbore] Shared vertices save memory 6x vertices from memory

DynRT Architecture Programmable Shading Unit Ray Casting Units Scene Changes Skinning Processor (see paper) Skeleton Subspace Deformation Re-uses Geometry Unit Pure stream architecture vertices from memory

DynRT Architecture Programmable Shading Unit Ray Casting Units Scene Changes Skinning Processor (see paper) Skeleton Subspace Deformation Re-uses Geometry Unit Pure stream architecture Update Processor Stream-like architecture Partial breadth-first execution One B-KD node update per clock cycle peak vertices from memory

Traversal of B-KD Trees Early ray termination Clipping of near/far interval against both bounding intervalls Take closer child, push farther child to stack Traversal order does not affect correctness Complexity 4x computational cost of KD tree traversal step 2x stack memory

Traversal Processing Unit Stack control computes next address

Traversal Processing Unit Stack control computes next address Next node is fetched from cache

Traversal Processing Unit Stack control computes next address Next node is fetched from cache 4 traversal slices compute 4x4 distances to bounding planes

Traversal Processing Unit Stack control computes next address Next node is fetched from cache 4 traversal slices compute 4x4 distances to bounding planes 4 Decision Units compute per ray traversal decision

Traversal Processing Unit Stack control computes next address Next node is fetched from cache 4 traversal slices compute 4x4 distances to bounding planes 4 Decision Units compute per ray traversal decision Packet Decision Unit computes packet traversal decision Packet goes left if exists a that ray goes left Packet goes right if exists a ray that goes right Packet goes from left to right if exists a ray that goes into both children from left to right

Traversal Processing Unit Stack control computes next address Next node is fetched from cache 4 traversal slices compute 4x4 distances to bounding planes 4 Decision Units compute per ray traversal decision Packet Decision Unit computes packet traversal decision Packet goes left if exists a that ray goes left Packet goes right if exists a ray that goes right Packet goes from left to right if exists a ray that goes into both children from left to right  Incoherent packets possible

vertices from memory

Update of B-KD Trees Leaf Node x y leaf

Update of B-KD Trees Leaf Node Fetch vertices x y leaf

Update of B-KD Trees Leaf Node Fetch vertices Compute leaf boxes x y leaf

Update of B-KD Trees Leaf Node Fetch vertices Compute leaf boxes Inner Node Update 1D node bounds x y leaf

Update of B-KD Trees Leaf Node Fetch vertices Compute leaf boxes Inner Node Update 1D node bounds Merge boxes of both children x y leaf

Update of B-KD Trees Leaf Node Fetch vertices Compute leaf boxes Inner Node Update 1D node bounds Merge boxes of both children y leaf x

Update of B-KD Trees Leaf Node Fetch vertices Compute leaf boxes Inner Node Update 1D node bounds Merge boxes of both children y leaf x

Update Processor ¼ more memory for instructions Optimized Instruction Set Load vertex Merge 3 vertices to a box Merge 2 boxes (plus update node) 64 Vertex and 64 Box Registers Optimal re-use of data Stream Based Reads one instruction stream Writes a sequential node stream Vertices are accessed as sequential as possible

Update Processor ¼ more memory for instructions Optimized Instruction Set Load vertex Merge 3 vertices to a box Merge 2 boxes (plus update node) 64 Vertex and 64 Box Registers Optimal re-use of data Stream Based Reads one instruction stream Writes a sequential node stream Vertices are accessed as sequential as possible

Update Processor ¼ more memory for instructions Optimized Instruction Set Load vertex Merge 3 vertices to a box Merge 2 boxes (plus update node) 64 Vertex and 64 Box Registers Optimal re-use of data Stream Based Reads one instruction stream Writes a sequential node stream Vertices are accessed as sequential as possible

Update Processor ¼ more memory for instructions Optimized Instruction Set Load vertex Merge 3 vertices to a box Merge 2 boxes (plus update node) 64 Vertex and 64 Box Registers Optimal re-use of data Stream Based Reads one instruction stream Writes a sequential node stream Vertices are accessed as sequential as possible

Update Processor ¼ more memory for instructions Optimized Instruction Set Load vertex Merge 3 vertices to a box Merge 2 boxes (plus update node) 64 Vertex and 64 Box Registers Optimal re-use of data Stream Based Reads one instruction stream Writes a sequential node stream Vertices are accessed as sequential as possible

Update Processor ¼ more memory for instructions Optimized Instruction Set Load vertex Merge 3 vertices to a box Merge 2 boxes (plus update node) 64 Vertex and 64 Box Registers Optimal re-use of data Stream Based Reads one instruction stream Writes a sequential node stream Vertices are accessed as sequential as possible

Update Processor ¼ more memory for instructions Optimized Instruction Set Load vertex Merge 3 vertices to a box Merge 2 boxes (plus update node) 64 Vertex and 64 Box Registers Optimal re-use of data Stream Based Reads one instruction stream Writes a sequential node stream Vertices are accessed as sequential as possible

Update Processor ¼ more memory for instructions Optimized Instruction Set Load vertex Merge 3 vertices to a box Merge 2 boxes (plus update node) 64 Vertex and 64 Box Registers Optimal re-use of data Stream Based Reads one instruction stream Writes a sequential node stream Vertices are accessed as sequential as possible

Update Processor ¼ more memory for instructions Optimized Instruction Set Load vertex Merge 3 vertices to a box Merge 2 boxes (plus update node) 64 Vertex and 64 Box Registers Optimal re-use of data Stream Based Reads one instruction stream Writes a sequential node stream Vertices are accessed as sequential as possible

Update Processor ¼ more memory for instructions Optimized Instruction Set Load vertex Merge 3 vertices to a box Merge 2 boxes (plus update node) 64 Vertex and 64 Box Registers Optimal re-use of data Stream Based Reads one instruction stream Writes a sequential node stream Vertices are accessed as sequential as possible

Prototype Implementation Hardware FPGA board from Alpha Data Xilinx Virtex4 LX MB DDR Memory Implementation Packets of 4 rays 32 packets of rays 24 bit floating point 66 MHz Virtex4 Board

Results Update Performance 66 million B-KD tree node updates 200 updates per second for characters with 80k triangles 1 to 15.0 % of rendering time Ray Casting Performance 2 to 8 million rays per second 10 to 40 fps at 512x386

Conclusions and Future Work Ray Tracing Hardware Design Efficient for coherent dynamic scenes Less efficient for non-continous scene changes Working Prototype Implementation Even FPGA achieves high performance 2x - 3x OpenRT on Pentium 4 2,6 GHz Post layout ASIC Results [RT06] 90nm, 400 MHz, 200mm^2, 19.5 GB/s Performs up to 40x faster ( fps at 1024x768)

Live Demo

Questions? Project Homepage: Computer Graphics Lab at Saarland University: