Maximizing Parallelism in the Construction of BVHs, Octrees, and k-d Trees Tero Karras NVIDIA Research.

Presentation transcript:

Maximizing Parallelism in the Construction of BVHs, Octrees, and k-d Trees Tero Karras NVIDIA Research

Trees
- Path tracing (Pharr & Humphreys)
- Real-time ray tracing (NVIDIA)
- Collision detection (Ageia)
- Particle simulation (NVIDIA)
- Voxel-based global illumination (Crassin et al.)
- Surface reconstruction (Amenta et al.)
- Photon mapping (Uchida)
[Slide labels: Better, Faster]

Outline
- Fastest existing methods are sequential
- Parallelize within each hierarchy level
- But not between levels
- Lack of parallelism
- Small workloads bottlenecked by top levels
- Sub-linear scaling of performance

Outline
- Novel way to build the entire tree in parallel
- Two algorithmic "building blocks"
- Fast, scalable
- Main focus: BVHs
- Point-based octrees and k-d trees also covered in the paper

Bounding volume hierarchy

LBVH - Lauterbach et al. [2009]
1. Assign Morton codes
2. Sort primitives
3. Generate hierarchy
4. Fit bounding boxes
[Figure: a point p whose binary coordinates p_x, p_y, p_z are combined into a single Morton code; the digit values were lost in extraction.]
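Step 1 of the list above can be illustrated with a minimal sketch. It assumes each primitive is represented by a point whose coordinates are already normalized to [0, 1], and it uses 10 bits per axis to match the 30-bit codes mentioned in the results. The function names `expand_bits` and `morton3d` are hypothetical and not taken from the slides, and the loop is written for readability rather than speed.

```python
def expand_bits(v: int) -> int:
    """Move bit b of a 10-bit integer to bit 3*b, leaving two zero bits in between."""
    out = 0
    for b in range(10):
        out |= ((v >> b) & 1) << (3 * b)
    return out


def morton3d(x: float, y: float, z: float) -> int:
    """30-bit Morton code for a point with coordinates in [0, 1] (assumed normalized)."""
    def quantize(c: float) -> int:
        # Scale to 10 bits and clamp so that c == 1.0 still fits.
        return min(max(int(c * 1024.0), 0), 1023)

    xx = expand_bits(quantize(x))
    yy = expand_bits(quantize(y))
    zz = expand_bits(quantize(z))
    # Interleave as x, y, z from the most significant bit downward.
    return (xx << 2) | (yy << 1) | zz


if __name__ == "__main__":
    points = [(0.9, 0.1, 0.5), (0.1, 0.1, 0.1), (0.5, 0.5, 0.5)]
    # Step 2 of the list: sort primitives by their Morton codes.
    for p in sorted(points, key=lambda q: morton3d(*q)):
        print(p, format(morton3d(*p), "030b"))
```

Sorting by these codes places points that are close in space near each other in the sorted order, which is what the hierarchy generation step relies on.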

Binary radix tree
[Figure: a binary radix tree built over the sorted keys. With n leaf nodes there are n-1 internal nodes.]

Longest common prefix
δ(i,j) denotes the length of the longest common prefix between keys i and j.
[Figure: examples for δ(5,6) and δ(0,3); the values were lost in extraction.]
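As a rough illustration of δ(i,j), the sketch below computes the common prefix length of two fixed-width integer keys with an XOR followed by a bit-length count. The helper name `delta`, the 5-bit key width, and the key values are assumptions made for this example. The keys were chosen so that the results agree with the δ values that appear on the later Algorithm slides.

```python
def delta(keys, i, j, key_bits=5):
    """Length of the longest common prefix of keys[i] and keys[j].

    Returns -1 when j is outside the key array, so that comparisons against
    out-of-range neighbours always lose.
    """
    if j < 0 or j >= len(keys):
        return -1
    differing = keys[i] ^ keys[j]  # set bits mark positions where the keys differ
    return key_bits - differing.bit_length()


if __name__ == "__main__":
    # Assumed example: eight sorted, unique 5-bit keys.
    keys = [0b00001, 0b00010, 0b00100, 0b00101,
            0b10011, 0b11000, 0b11001, 0b11110]
    print(delta(keys, 2, 3))  # keys 00100 and 00101 share 4 leading bits
    print(delta(keys, 3, 4))  # keys 00101 and 10011 share none
```

On a GPU the same quantity is typically obtained with a count-leading-zeros instruction applied to the XOR of the two keys.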

Garanzha et al. [2011]
- Level 0: 1 node
- Level 1: 2 nodes
- Level 2: 3 nodes
- Level 3: 1 node

Our method
- Define a numbering scheme for the nodes
- Gain some knowledge of their identity
- Establish a connection with the keys
- Find the children of a given node
- Only look at node index and nearby keys
- Do this for all nodes in parallel

Numbering scheme

Algorithm
[Figure sequence: worked example. δ values shown on the slides:]
- δ(2,3) = 4, δ(3,4) = 0
- δ(1,3) = 2, δ(0,3) = 2

Algorithm
For each node i = 0..n-2 in parallel:
1. Determine direction of the range
2. Expand the range as far as possible
3. Find where to split the range
4. Identify children
[Slide annotation: Binary search]
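The four steps above can be sketched as a sequential Python loop in which each iteration does the work one GPU thread would do for one internal node. This is a sketch under stated assumptions, not the presenter's implementation: the names `delta` and `build_radix_tree` are hypothetical, the keys are assumed to be sorted and unique, and the way the binary searches are organized (doubling the step to find an upper bound, then halving it) is one plausible reading of the "Binary search" annotation. Children are returned as `('I', k)` for internal node k or `('L', k)` for leaf k.

```python
def delta(keys, i, j, key_bits):
    """Longest common prefix length of keys[i] and keys[j]; -1 if j is out of range."""
    if j < 0 or j >= len(keys):
        return -1
    return key_bits - (keys[i] ^ keys[j]).bit_length()


def build_radix_tree(keys, key_bits):
    """Return children[i] = (left, right) for every internal node i in 0..n-2.

    Assumes keys are sorted, unique, and len(keys) >= 2.
    """
    n = len(keys)

    def dl(i, j):
        return delta(keys, i, j, key_bits)

    children = []
    for i in range(n - 1):  # on a GPU: one thread per internal node
        # 1. Determine direction of the range (+1 or -1).
        d = 1 if dl(i, i + 1) > dl(i, i - 1) else -1

        # 2. Expand the range as far as possible.
        d_min = dl(i, i - d)  # prefix shared with the neighbour outside the range
        l_max = 2
        while dl(i, i + l_max * d) > d_min:  # find an upper bound by doubling
            l_max *= 2
        length = 0
        t = l_max // 2
        while t >= 1:  # binary search for the exact range length
            if dl(i, i + (length + t) * d) > d_min:
                length += t
            t //= 2
        j = i + length * d  # other end of the range

        # 3. Find where to split the range.
        d_node = dl(i, j)  # prefix shared by all keys in the range
        s = 0
        t = length
        while t > 1:  # binary search with step ceil(length/2), ceil(length/4), ..., 1
            t = (t + 1) // 2
            if dl(i, i + (s + t) * d) > d_node:
                s += t
        gamma = i + s * d + min(d, 0)  # split lies between gamma and gamma + 1

        # 4. Identify children: a child covering a single key is a leaf.
        left = ('L', gamma) if min(i, j) == gamma else ('I', gamma)
        right = ('L', gamma + 1) if max(i, j) == gamma + 1 else ('I', gamma + 1)
        children.append((left, right))
    return children


if __name__ == "__main__":
    # Assumed example keys, chosen to agree with the delta values on the slides.
    keys = [0b00001, 0b00010, 0b00100, 0b00101,
            0b10011, 0b11000, 0b11001, 0b11110]
    for node, pair in enumerate(build_radix_tree(keys, key_bits=5)):
        print(node, pair)
```

Because each iteration reads only the node index and nearby keys and writes only its own entry, the iterations are independent of each other, which is the property that lets all nodes be processed in parallel.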

Duplicate keys
- The algorithm only works with unique keys
- Duplicates are common in practice
- Trick: Augment each key with its index
- Distinguishes between duplicates
- Keys are still in lexicographical order
- Tie-break when evaluating δ(i,j)
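One way to realize the tie-break described above is shown below as a hedged sketch. Instead of physically concatenating the index to every key, the function falls back to comparing the indices only when the two keys are identical. The name `delta_with_tiebreak` and the 30-bit key and 32-bit index widths are assumptions for illustration. The helper `delta_augmented` builds the concatenated key explicitly, only to show that both approaches yield the same value.

```python
def delta_with_tiebreak(keys, i, j, key_bits=30, index_bits=32):
    """Common prefix length of (key, index) pairs without building the pairs."""
    if j < 0 or j >= len(keys):
        return -1
    differing = keys[i] ^ keys[j]
    if differing != 0:
        return key_bits - differing.bit_length()
    # Identical keys: continue the comparison into the index bits.
    return key_bits + index_bits - (i ^ j).bit_length()


def delta_augmented(keys, i, j, key_bits=30, index_bits=32):
    """Reference version: concatenate each key with its index, then compare."""
    if j < 0 or j >= len(keys):
        return -1
    a = (keys[i] << index_bits) | i
    b = (keys[j] << index_bits) | j
    return key_bits + index_bits - (a ^ b).bit_length()


if __name__ == "__main__":
    keys = [5, 5, 5, 9]  # sorted, with duplicates
    for i, j in [(0, 1), (0, 2), (1, 2), (2, 3)]:
        print((i, j), delta_with_tiebreak(keys, i, j), delta_augmented(keys, i, j))
```

Because the index only contributes low-order bits, the augmented keys keep the same sorted order as the original keys, so no re-sorting is needed.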

LBVH
1. Assign Morton codes
2. Sort primitives
3. Generate hierarchy
4. Fit bounding boxes

Lauterbach et al. [2009]

Our method
- Need a different approach
- How many levels are there?
- Which nodes are located on a given level?
- Traverse paths in the tree in parallel
- Start from leaves, advance toward the root
- Terminate threads using per-node atomic flags
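The bottom-up traversal above can be emulated sequentially to show why it yields correct bounding boxes. In this sketch each leaf plays the role of one thread, and a per-node visit counter stands in for the atomic flag: the first arrival at an internal node stops, and the second arrival knows both children are finished and continues upward. The names `union` and `fit_boxes`, the `('I', k)` / `('L', k)` child encoding, and the box representation as a pair of corner tuples are all assumptions for illustration. On a GPU the counter update would need to be an atomic operation.

```python
def union(a, b):
    """Smallest axis-aligned box containing boxes a and b, each given as (min, max)."""
    return (tuple(map(min, a[0], b[0])), tuple(map(max, a[1], b[1])))


def fit_boxes(children, leaf_boxes):
    """Compute a bounding box for every internal node by walking up from the leaves.

    children[p] = (left, right), where a child is ('I', k) or ('L', k).
    Node 0 is assumed to be the root.
    """
    parent = {}
    for p, pair in enumerate(children):
        for child in pair:
            parent[child] = p

    visits = [0] * len(children)   # per-node flag; atomic counter on a GPU
    boxes = [None] * len(children)

    def box_of(node):
        kind, idx = node
        return leaf_boxes[idx] if kind == 'L' else boxes[idx]

    for leaf in range(len(leaf_boxes)):  # on a GPU: one thread per leaf
        node = ('L', leaf)
        while node in parent:            # the root has no parent, so the walk ends there
            p = parent[node]
            visits[p] += 1
            if visits[p] == 1:
                break                    # first arrival: sibling subtree not done yet
            left, right = children[p]
            boxes[p] = union(box_of(left), box_of(right))
            node = ('I', p)
    return boxes


if __name__ == "__main__":
    # Assumed example hierarchy with 8 leaves and 7 internal nodes.
    I, L = 'I', 'L'
    children = [((I, 3), (I, 4)), ((L, 0), (L, 1)), ((L, 2), (L, 3)),
                ((I, 1), (I, 2)), ((L, 4), (I, 5)), ((I, 6), (L, 7)),
                ((L, 5), (L, 6))]
    leaf_boxes = [((k, 0, 0), (k + 1, 1, 1)) for k in range(8)]
    for node, box in enumerate(fit_boxes(children, leaf_boxes)):
        print(node, box)
```

Each internal node is processed exactly once, by whichever thread arrives second, so the scheme does not need to know how many levels the tree has or which nodes lie on a given level.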

Our method

Results
- Evaluate performance on GTX 480 (Fermi)
- CUDA, 30-bit Morton codes
- Compare against Garanzha et al. [2011]
- Identical tree (top-level SAH splits disabled)
- Simulate large GPUs
- N times as many cores
- N times the memory bandwidth

Results
[Chart: Fairy Forest, 174K triangles. Axis: milliseconds. Series: our method, Garanzha et al.]
[Chart: Turbine Blade, 1.77M triangles. Axis: milliseconds. Series: our method, Garanzha et al.]
[Chart: Stanford Dragon, 871K triangles. Axis: milliseconds. Series: our method.]

Acknowledgements
- Timo Aila
- Samuli Laine
- David Luebke
- Jacopo Pantaleoni
- Jaakko Lehtinen
For helpful suggestions and proofreading.

Thank You
- Questions

Backup slides

Pseudocode

Results

Algorithm
[Figure sequence: worked example. δ values shown on the slides:]
- δ(1,2) = 2, δ(2,3) = 4
- δ(2,4) = 0