Experiences with Streaming Construction of SAH KD Trees
Stefan Popov, Johannes Günther, Hans-Peter Seidel, Philipp Slusallek

Streaming Construction of KD Trees, Stefan Popov

Motivation
- Ray tracing has become much faster recently, thanks to:
  - Better algorithms (packet tracing [Wald04, Reshetov05])
  - Optimized spatial index structures; the best known are KD trees [Havran00]
  - Faster hardware
- Research has concentrated mainly on static scenes
- Dynamic scenes:
  - Building SAH-based KD trees is slow
  - It is therefore done in a pre-processing step

Dynamic Scenes Approaches
- Embed the dynamics in the index structure
  - Use a two-level approach [Wald03]
  - Fuzzy KD trees [Günther06]
- Update the index structure
  - Grids, BVHs and KD tree hybrids
  - Faster build/update, but lower traversal performance
  - No efficient approach exists for KD trees
- Rebuild the entire KD tree
  - Needs to be fast
  - Lazy build

SAH Algorithm
- Extract and sort events in advance
  - Abstract objects by their axis-aligned bounding boxes (AABBs)
  - Events are given by the AABB boundaries
- Recursive top-down construction
  - Find the split plane using the SAH: compute the minimum cost
  - Distribute the objects to the children by distributing their events, keeping the event lists sorted
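
As a minimal illustration of the event setup, the sketch below reduces each object's AABB to its interval `(lo, hi)` along one axis and turns it into one opening and one closing event. The tuple layout `(pos, type, obj_id)`, the function name and the tie rule are assumptions made for this example and are not the authors' data structures. Under the tie rule, closing events sort before opening events at the same position.

```python
OPEN, CLOSE = 1, 0  # assumed encoding: CLOSE sorts before OPEN at equal positions


def extract_events(boxes):
    """Turn per-object extents (lo, hi) along one axis into a sorted event list.

    Each event is a tuple (pos, type, obj_id). Sorting the tuples orders the
    events by position, then by type (closing first), then by object id.
    """
    events = []
    for obj_id, (lo, hi) in enumerate(boxes):
        events.append((lo, OPEN, obj_id))   # the AABB starts here
        events.append((hi, CLOSE, obj_id))  # the AABB ends here
    events.sort()
    return events
```

For example, `extract_events([(0.0, 2.0), (1.0, 3.0)])` yields the opening events at 0 and 1 followed by the closing events at 2 and 3.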

SAH Cost Function
- Piecewise linear in the split position
- Discontinuities at object boundaries
- Evaluate it only just before opening events and just after closing events
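
To see why, assume the usual form of the SAH split cost. Here $C_T$ (cost of a traversal step) and $C_I$ (cost of an intersection test) are assumed constants that the slides do not specify. $N_L(x)$ and $N_R(x)$ are the object counts of the two children for a split plane at $x$, $SA_L(x)$ and $SA_R(x)$ are their surface areas, and $SA$ is the surface area of the node:

$$
C(x) = C_T + C_I \, \frac{SA_L(x)\, N_L(x) + SA_R(x)\, N_R(x)}{SA}
$$

Between two consecutive events, $N_L$ and $N_R$ are constant. For a node box spanning $[x_0, x_1]$ with extents $d_y$, $d_z$ on the other axes, $SA_L(x) = 2\big((x - x_0)(d_y + d_z) + d_y d_z\big)$ is linear in $x$, and likewise $SA_R(x)$. So $C$ is linear on each interval and its minimum lies at an interval end. $N_L$ jumps up just after an opening event and $N_R$ drops just after a closing event. The cheapest candidates are therefore the positions just before opening events and just after closing events.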

Distribution Along the Split Axis
- Given: the event list and the split position
- Sweep the event list and classify the objects:
  - Open event before the split → label the object “both”
  - Open event after the split → label the object “right”
  - Close event before the split → re-label the object “left”
- Copy each event to the corresponding child's list
  - New events might have to be inserted
- Random memory access
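
A hedged sketch of this sweep follows. It classifies the objects first and copies the events in a second pass, because an object's label can still change after its opening event has been seen. The sketch assumes every object has non-zero extent along the split axis. It also interprets "new events" as clipped events at the split plane for straddling (“both”) objects, so that each child receives a closed interval. The function name and the label strings are illustrative only.

```python
OPEN, CLOSE = 1, 0  # assumed event encoding, as in a (pos, type, obj_id) tuple


def distribute_split_axis(events, split):
    """Classify objects against `split` and build both children's event lists.

    events: (pos, type, obj_id) tuples sorted by position; every object is
    assumed to have lo < hi, so its opening event precedes its closing event.
    Returns (labels, left_events, right_events), each list sorted by position.
    """
    labels = {}
    # Pass 1: the classification rules from the slide.
    for pos, etype, obj in events:
        if etype == OPEN:
            labels[obj] = "both" if pos < split else "right"
        elif pos <= split:
            labels[obj] = "left"  # closed at or before the split: re-label
    # Pass 2: copy events to the children, in order, so the lists stay sorted.
    left, right = [], []
    for pos, etype, obj in events:
        lab = labels[obj]
        if lab == "left" or (lab == "both" and etype == OPEN):
            left.append((pos, etype, obj))
        if lab == "right" or (lab == "both" and etype == CLOSE):
            right.append((pos, etype, obj))
    # New events at the split plane for straddling objects. Every left event is
    # at or before the split and every right event at or after it, so
    # appending/prepending keeps both lists in (position, type) order.
    both = sorted(obj for obj, lab in labels.items() if lab == "both")
    left += [(split, CLOSE, obj) for obj in both]
    right = [(split, OPEN, obj) for obj in both] + right
    return labels, left, right
```

The sketch does not clip the straddling objects' extents along the other two axes, which a full implementation may need to do.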

Distribution Along the Other Axes
- Sweep the event lists and copy each event to:
  - the left child, if the corresponding object is labeled “left” or “both”
  - the right child, if the corresponding object is labeled “right” or “both”
- Requires a look-up in the object array → random memory access
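
The copy step on the remaining axes reduces to a stable filter, sketched below under the same assumed event encoding. A plain dictionary of labels stands in for the object array. The `labels[obj]` look-up inside the loop is the per-event indirection that causes the random memory access mentioned on the slide. As before, the sketch does not clip straddling objects, so their events are copied unchanged.

```python
def distribute_other_axis(events, labels):
    """Copy the sorted events of a non-split axis into the two children.

    labels maps obj_id -> "left" | "right" | "both" (hypothetical stand-in for
    the object array). Filtering in order keeps both output lists sorted.
    """
    left, right = [], []
    for pos, etype, obj in events:
        lab = labels[obj]  # indirect look-up: the random access on the slide
        if lab in ("left", "both"):
            left.append((pos, etype, obj))
        if lab in ("right", "both"):
            right.append((pos, etype, obj))
    return left, right
```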

Problems of KD Tree Construction
- Random memory accesses
- Expensive cost function evaluation
- Initial sorting is inefficient for lazy builds

Streaming Algorithm Overview
- Work with unsorted lists of AABBs
  - Avoids the initial sorting
- Sweep the list once to locate the initial split plane
- In a single sweep:
  - Distribute the objects (straightforward)
  - Determine the split positions of the children
- Once the data fits in the caches, switch to the conventional build

SAH Cost Estimation
- The cost function typically varies only slowly
  - No need to evaluate the SAH at every event
  - Use sampling!
- Naïve approach: for every event, check all samples → O(kN) for k samples and N events
- How can we sample efficiently?
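
For reference, the naïve approach can be sketched as a double loop over the unsorted AABB extents and the sample positions. Each object contributes one opening event (`lo`) and one closing event (`hi`). Following the counting rule on the next slide, an object counts toward the left of a sample if it opens before it, and toward the right if it closes after it. The function name is hypothetical.

```python
def naive_sample_counts(boxes, samples):
    """Object counts left/right of each sample, checking every event against
    every sample: O(k * N) for k samples and N events.

    boxes: unsorted (lo, hi) extents along the split axis.
    """
    n_left = [0] * len(samples)
    n_right = [0] * len(samples)
    for lo, hi in boxes:
        for j, s in enumerate(samples):
            if lo < s:  # opening event lies to the left of the sample
                n_left[j] += 1
            if hi > s:  # closing event lies to the right of the sample
                n_right[j] += 1
    return n_left, n_right
```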

Efficient Sampling
- Two-step approach:
  - #objects to the left of a sample = #opening events to its left
  - #objects to the right of a sample = #closing events to its right
- Count the opening/closing events between samples
  - Regular sampling → the bin index is computed in O(1)
- Reconstruct the left/right object counts at the samples
  - Using two partial sums, one from the left and one from the right
- O(k + N)
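
The two-step approach can be sketched as follows, under an assumed sample layout: k samples spaced evenly so that they cut the node extent `[x0, x1]` into k + 1 equal bins. Opening events go into the bin found by rounding down and closing events into the bin found by rounding up, so a position exactly on a sample is counted on the correct side. Positions outside the node extent are clamped into the first or last bin. The result matches the naïve O(kN) count, apart from floating-point effects at bin boundaries.

```python
import math


def sample_counts(boxes, x0, x1, k):
    """Object counts at k regularly spaced samples in O(k + N).

    Samples sit at s_j = x0 + (j + 1) * w with w = (x1 - x0) / (k + 1) (an
    assumed layout). n_left[j] counts objects with lo < s_j, n_right[j] counts
    objects with hi > s_j.
    """
    w = (x1 - x0) / (k + 1)
    open_bins = [0] * (k + 1)
    close_bins = [0] * (k + 1)
    # Step 1: one streaming pass over unsorted boxes; O(1) bin index per event.
    for lo, hi in boxes:
        b_open = min(max(math.floor((lo - x0) / w), 0), k)
        b_close = min(max(math.ceil((hi - x0) / w) - 1, 0), k)
        open_bins[b_open] += 1
        close_bins[b_close] += 1
    # Step 2: partial sums from the left (openings) and from the right (closings).
    n_left, n_right = [0] * k, [0] * k
    acc = 0
    for j in range(k):
        acc += open_bins[j]       # openings in bins 0..j lie left of s_j
        n_left[j] = acc
    acc = 0
    for j in range(k - 1, -1, -1):
        acc += close_bins[j + 1]  # closings in bins j+1..k lie right of s_j
        n_right[j] = acc
    samples = [x0 + (j + 1) * w for j in range(k)]
    return samples, n_left, n_right
```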

Refining of Samples
- The SAH cost is the sum of two monotone functions, $C_l$ (non-decreasing) and $C_r$ (non-increasing)
- Between two samples $a < b$ the cost is bounded from below:
  $C(x) \ge C_{\min} = \min_{[a,b]} C_l + \min_{[a,b]} C_r = C_l(a) + C_r(b)$ for $a \le x \le b$
- Re-sample only the intervals where $C_{\min}$ is below the current minimum
- Typically only a few intervals need to be re-sampled (< 5%)
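
A small sketch of the refinement test, assuming the counts at the samples are already known (for example from binning as on the previous slide). It uses $C_l(x) = SA_L(x)\,N_L(x)$ and $C_r(x) = SA_R(x)\,N_R(x)$ and drops the constant terms and the $1/SA$ factor, which do not change the comparisons. Only the intervals between consecutive samples are tested. The two end intervals next to the node boundary would need counts at $x_0$ and $x_1$ and are left out for brevity. All names are hypothetical.

```python
def box_area(extent, dy, dz):
    """Surface area of a box with `extent` along the split axis and dy, dz
    along the other two axes."""
    return 2.0 * (extent * dy + extent * dz + dy * dz)


def intervals_to_refine(samples, n_left, n_right, x0, x1, dy, dz):
    """Indices j of intervals (samples[j], samples[j + 1]) that need re-sampling."""
    # C_l grows and C_r shrinks as the split plane moves right.
    c_l = [box_area(s - x0, dy, dz) * nl for s, nl in zip(samples, n_left)]
    c_r = [box_area(x1 - s, dy, dz) * nr for s, nr in zip(samples, n_right)]
    current_min = min(a + b for a, b in zip(c_l, c_r))
    refine = []
    for j in range(len(samples) - 1):
        # Lower bound for any split between samples a < b: C_l(a) + C_r(b).
        if c_l[j] + c_r[j + 1] < current_min:
            refine.append(j)
    return refine
```

With the sample values used in the check below, only the first interval can beat the best sampled cost. The other two are skipped without further evaluation.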

Algorithm Properties
- Streaming memory accesses
- SAH cost function estimated by sampling
- No initial sorting required
- Refining of samples

Improvements
- Conventional algorithm:
  - Use radix sort → O(N)
  - Fastest algorithm if the data set fits into the caches
  - No need to order events at the same position; count the opening/closing events instead
    - Removes one radix sort pass
- Multiple cores → parallelize the build
  - Most time is spent in the lower tree levels
  - One sub-tree → one core
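
The sketch below shows one reading of the radix-sort improvement. Events are radix-sorted by position only, using four stable 8-bit counting passes over a 32-bit key. Events that share a position are then counted rather than ordered by type, so no extra pass over a type digit is needed. The float-to-key mapping is the standard sign-flip trick for IEEE 754 single precision. Positions are assumed to be representable as float32, and all names are illustrative.

```python
import struct

OPEN = 1  # assumed event encoding: (pos, type, obj_id), type 1 = open, 0 = close


def float_key(x):
    """Map a float32 to an unsigned 32-bit key whose integer order matches float order."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return (~bits & 0xFFFFFFFF) if bits & 0x80000000 else (bits | 0x80000000)


def radix_sort_events(events):
    """LSD radix sort by position only: four stable counting-sort passes, O(N)."""
    keyed = [(float_key(e[0]), e) for e in events]
    for shift in (0, 8, 16, 24):
        counts = [0] * 256
        for key, _ in keyed:
            counts[(key >> shift) & 0xFF] += 1
        starts, total = [0] * 256, 0
        for d in range(256):  # exclusive prefix sum gives each digit's start slot
            starts[d], total = total, total + counts[d]
        out = [None] * len(keyed)
        for item in keyed:    # stable scatter keeps earlier passes' order
            d = (item[0] >> shift) & 0xFF
            out[starts[d]] = item
            starts[d] += 1
        keyed = out
    return [e for _, e in keyed]


def count_at_positions(sorted_events):
    """Group events sharing a position into (pos, #open, #close)."""
    groups = []
    for pos, etype, _ in sorted_events:
        if not groups or groups[-1][0] != pos:
            groups.append([pos, 0, 0])
        groups[-1][1 if etype == OPEN else 2] += 1
    return [tuple(g) for g in groups]
```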

Results
- Speed-up of up to 50%
  - Only effective in the upper levels
  - Limited by the copying of objects/events
  - The larger the scene, the higher the speed-up
- Performance independent of triangle order
- Small decrease in traversal performance (< 2%) with 1024 samples
- Multi-threading: 4 cores (no local memory management)

Future Work
- Fully multi-threaded implementation
- Careful memory management on NUMA architectures
- Extend to other spatial index structures: BVHs, BKD trees, SKD trees, …

Conclusion
- Streaming construction algorithm: up to 50% speed-up
- Cost function sampling: very low quality degradation
- Refining of samples

Thank you!

Advantages
- Sequential memory access in the upper levels
- Small data footprint in the conventional build
  - Fits in the caches
  - Radix sort is efficient
- Fewer computations needed to estimate the split plane position
- But what about the tree cost?

Memory Management
- Use two arrays and alternate between them
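
One way to read "two arrays, alternated" is level-by-level double buffering. Each tree level reads its nodes' data from one array and writes both children of every node contiguously into the other, and then the two arrays swap roles. The sketch below is hypothetical. It partitions plain numbers instead of events, uses a midpoint split instead of the SAH, and assumes no object is duplicated into both children, so both arrays keep the same size.

```python
def ping_pong_levels(values, depth):
    """Partition `values` for `depth` levels using only two arrays.

    Returns (array, ranges): the partitioned values and the (begin, end)
    ranges of the nodes on the last level.
    """
    src = list(values)
    dst = [0.0] * len(src)
    ranges = [(0, len(src))]
    for _ in range(depth):
        next_ranges = []
        for begin, end in ranges:
            node = src[begin:end]
            if len(node) < 2:  # too small to split: copy through unchanged
                dst[begin:end] = node
                next_ranges.append((begin, end))
                continue
            split = 0.5 * (min(node) + max(node))  # hypothetical split choice
            out = begin
            for v in node:  # left child first...
                if v < split:
                    dst[out] = v
                    out += 1
            mid = out
            for v in node:  # ...then the right child, directly after it
                if v >= split:
                    dst[out] = v
                    out += 1
            next_ranges += [(begin, mid), (mid, end)]
        ranges = next_ranges
        src, dst = dst, src  # alternate the two arrays
    return src, ranges
```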

SAH Tree Cost
- Optimal KD tree for ray tracing: SAH-based
- Minimize the average expected traversal cost of an arbitrary ray
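
A commonly used form of this expected cost is given below. It assumes rays are distributed uniformly, so that a ray hitting the root also hits node $n$ with probability $SA(n)/SA(\text{root})$. $C_T$ and $C_I$ are assumed per-traversal-step and per-intersection costs, and $N(l)$ is the number of objects referenced by leaf $l$. This notation is introduced here and is not taken from the slide.

$$
C(T) = \sum_{n \in \text{inner}(T)} \frac{SA(n)}{SA(\text{root})}\, C_T \;+\; \sum_{l \in \text{leaves}(T)} \frac{SA(l)}{SA(\text{root})}\, N(l)\, C_I
$$

Choosing each split greedily with the local cost $C(x)$ from the SAH cost function slide is the usual way to approximate a tree that minimizes this sum.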

SAH Computation
- Efficient computation: extract and sort the events in advance
- Compute incrementally, keeping track of the number of objects on the left/right
- Evaluate the cost just after close events and just before open events
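
The incremental evaluation can be sketched as a single sweep over sorted events with running counts. The cost form and the default constants `c_trav` and `c_isect` are assumptions, as is the rule that candidate planes lying on the node boundary are skipped. When several events share a position, some evaluations there over-count one side. That only raises the cost, so the minimum found is unaffected.

```python
import math

OPEN, CLOSE = 1, 0  # assumed encoding; CLOSE sorts before OPEN at equal positions


def box_area(extent, dy, dz):
    return 2.0 * (extent * dy + extent * dz + dy * dz)


def best_split_sweep(events, x0, x1, dy, dz, c_trav=1.0, c_isect=1.0):
    """Incremental SAH sweep along one axis.

    events: (pos, type, obj_id) tuples sorted by (pos, type).
    Returns (cost, position) of the cheapest split strictly inside (x0, x1).
    """
    sa = box_area(x1 - x0, dy, dz)
    n_left = 0
    n_right = sum(1 for _, etype, _ in events if etype == OPEN)
    best = (math.inf, None)

    def evaluate(pos):
        nonlocal best
        if x0 < pos < x1:  # planes on the node boundary split nothing
            cost = c_trav + c_isect * (
                box_area(pos - x0, dy, dz) * n_left
                + box_area(x1 - pos, dy, dz) * n_right) / sa
            best = min(best, (cost, pos))

    for pos, etype, _ in events:
        if etype == OPEN:
            evaluate(pos)  # just before opening: object not yet counted left
            n_left += 1
        else:
            n_right -= 1
            evaluate(pos)  # just after closing: object no longer counted right
    return best
```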

Alternative Multi-Threading (required on NUMA architectures)
- Sub-tree → core is not suitable for the first log(#cores) levels
- Also unsuitable for some architectures (Cell)
- Alternative:
  - Bring data to the cores from sequential pages
  - Gather event counts in bins at each core
  - Merge the counts before the actual cost evaluation
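
The gather-and-merge idea can be sketched with a thread pool. The input is cut into contiguous chunks, standing in for sequential pages. Each task bins its chunk's opening and closing events into private counters, and the partial histograms are summed before any cost evaluation. Because of Python's global interpreter lock, this shows the structure rather than a real speed-up. Per-chunk rather than per-core counters, the chunk size and the function names are all assumptions.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def bin_chunk(chunk, x0, w, nbins):
    """Private open/close histograms for one contiguous chunk of (lo, hi) extents."""
    opens, closes = [0] * nbins, [0] * nbins
    for lo, hi in chunk:
        opens[min(max(math.floor((lo - x0) / w), 0), nbins - 1)] += 1       # lo in [x0+bw, x0+(b+1)w)
        closes[min(max(math.ceil((hi - x0) / w) - 1, 0), nbins - 1)] += 1  # hi in (x0+bw, x0+(b+1)w]
    return opens, closes


def parallel_bin_counts(boxes, x0, x1, nbins, n_workers=4, chunk_size=1024):
    """Bin events chunk by chunk on a thread pool, then merge the counts."""
    w = (x1 - x0) / nbins
    chunks = [boxes[i:i + chunk_size] for i in range(0, len(boxes), chunk_size)]
    opens, closes = [0] * nbins, [0] * nbins
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for part_open, part_close in pool.map(lambda ch: bin_chunk(ch, x0, w, nbins), chunks):
            for b in range(nbins):  # merge step before cost evaluation
                opens[b] += part_open[b]
                closes[b] += part_close[b]
    return opens, closes
```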

Extension: Multi-Threading
- Multiple cores → parallelize the build
  - Most time is spent in the lower tree levels
  - One sub-tree → one core