Experiences with Streaming Construction of SAH KD Trees Stefan Popov, Johannes Günther, Hans-Peter Seidel, Philipp Slusallek
Streaming Construction of KD Trees, Stefan Popov
Motivation: Ray tracing has sped up considerably of late, thanks to better algorithms (packet tracing [Wald04, Reshetov05]), optimized spatial index structures (best known: KD trees [Havran00]), and faster hardware. Research has concentrated mainly on static scenes. For dynamic scenes, building SAH-based KD trees is slow and is therefore done in a pre-processing step.
Dynamic Scenes, Approaches: (1) Embed the dynamics in the index structure: use a two-level approach [Wald03] or fuzzy KD trees [Günther06]. (2) Update the index structure: grids, BVHs, and KD tree hybrids offer faster build/update but lower traversal performance; there is no efficient approach for KD trees. (3) Rebuild the entire KD tree: this needs to be made fast; lazy build.
SAH Algorithm: Extract and sort events in advance: objects are abstracted by their AABBs, and the events are given by the AABB boundaries. Recursive top-down construction: find the split plane using the SAH (compute the minimum cost), then distribute the objects to the children by distributing the events while keeping them sorted.
SAH Cost Function: Piecewise linear, with discontinuities at object boundaries. Evaluate only before an opening event and after a closing event.
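The slides do not spell the cost function out. A commonly used greedy form of the SAH, given here as an assumption about what is meant, is written below. The symbols are not from the original: V is the node's box, split at position x into V_L(x) and V_R(x); N_L(x) and N_R(x) are the object counts on each side; C_trav and C_isect are constant traversal and intersection costs.

```latex
C(x) = C_{\mathrm{trav}} + C_{\mathrm{isect}}
       \left( \frac{SA(V_L(x))}{SA(V)}\, N_L(x)
            + \frac{SA(V_R(x))}{SA(V)}\, N_R(x) \right)
```

For an axis-aligned box, SA(V_L(x)) and SA(V_R(x)) are linear in x, while N_L and N_R change only at object boundaries. This is where the piecewise linear shape and the discontinuities come from.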
Distribution Along the Split Axis: Given an event list and a split position, sweep the event list and classify. Open event: before the split, label the object “both”; after the split, label the object “right”. Close event: before the split, re-label the object “left”. Copy each event to the corresponding child’s list; new events might have to be inserted. This causes random memory accesses.
Distribution Along the Other Axes: Sweep the event lists. Copy each event to the left child if the corresponding object is labeled “left” or “both”, and to the right child if it is labeled “right” or “both”. The label is looked up in the object array, which causes random memory accesses.
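The two distribution steps above can be sketched as follows. This is a minimal illustration, not the authors' code: the event layout `(position, kind, object_id)`, the function names, and the tie-breaking at the split position are assumptions, and the insertion of new events for clipped “both” objects is left out.

```python
OPEN, CLOSE = 0, 1  # hypothetical event kinds

def classify_on_split_axis(events, split, num_objects):
    """Label every object by sweeping the sorted split-axis events once."""
    labels = [None] * num_objects
    for pos, kind, obj in events:
        if kind == OPEN:
            # Opened before the split: may still straddle it.
            labels[obj] = "both" if pos < split else "right"
        elif pos <= split:
            # Closed before (or at) the split: entirely on the left.
            labels[obj] = "left"
    return labels

def distribute_other_axis(events, labels):
    """Copy events of another axis to the children, preserving their order."""
    left, right = [], []
    for ev in events:
        label = labels[ev[2]]  # lookup in the object array (random access)
        if label != "right":
            left.append(ev)
        if label != "left":
            right.append(ev)
    return left, right

# Three objects with x-extents (0, 1), (1, 3), (2, 4), split at x = 2.
x_events = sorted([(0, OPEN, 0), (1, CLOSE, 0), (1, OPEN, 1),
                   (3, CLOSE, 1), (2, OPEN, 2), (4, CLOSE, 2)])
labels = classify_on_split_axis(x_events, 2, 3)
y_events = [(0, OPEN, 0), (0, OPEN, 1), (0, OPEN, 2),
            (1, CLOSE, 0), (1, CLOSE, 1), (1, CLOSE, 2)]
y_left, y_right = distribute_other_axis(y_events, labels)
```

Because the sweep only appends to the child lists, the children's events stay sorted without any re-sorting; the cost is the per-event label lookup.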
Problems of KD Tree Construction: Random memory accesses. Expensive cost function evaluation. Initial sorting, which is inefficient for lazy builds.
Streaming Algorithm Overview: Work with unsorted lists of AABBs, which avoids the initial sorting. Sweep the list once to locate the initial split plane. Then, in a single sweep, distribute the objects (straightforward) and determine the split positions of the children. Once the data fits in the caches, switch to the conventional build.
SAH Cost Estimation: The cost function typically varies only slowly, so there is no need to evaluate the SAH at every event. Use sampling! Naïve approach: for every event, check all samples, which is O(kN) for k samples and N events. How to sample efficiently?
Efficient Sampling: Two-step approach. The number of objects to the left of a sample equals the number of opening events to its left; the number of objects to the right of a sample equals the number of closing events to its right. Step 1: count the opening/closing events between samples; with regular sampling, the index computation is O(1). Step 2: reconstruct the left/right object counts at the samples using two partial sums, one from the left and one from the right. Total: O(k+N).
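A minimal sketch of the two-step approach for one axis, assuming objects are given as unsorted `(min, max)` extents and that k regularly spaced samples lie strictly inside the node interval `(lo, hi)`. The function name and data layout are hypothetical, and the handling of events that fall exactly on a sample is one possible choice.

```python
import math

def sampled_counts(intervals, lo, hi, k):
    """Return (samples, n_left, n_right) for k regular sample planes."""
    bins = k + 1                      # k samples separate k + 1 bins
    width = (hi - lo) / bins
    open_bins = [0] * bins
    close_bins = [0] * bins

    # Step 1: one streaming pass, O(1) bin index per event, O(N) overall.
    for a, b in intervals:
        i = min(bins - 1, max(0, int((a - lo) / width)))
        # A closing event exactly on a sample counts for the bin to its left.
        j = min(bins - 1, max(0, math.ceil((b - lo) / width) - 1))
        open_bins[i] += 1
        close_bins[j] += 1

    # Step 2: two partial sums over the bins, O(k).
    samples = [lo + (s + 1) * width for s in range(k)]
    n_left, acc = [], 0
    for s in range(k):                # opening events left of sample s
        acc += open_bins[s]
        n_left.append(acc)
    n_right, acc = [0] * k, 0
    for s in range(k - 1, -1, -1):    # closing events right of sample s
        acc += close_bins[s + 1]
        n_right[s] = acc
    return samples, n_left, n_right

samples, n_left, n_right = sampled_counts(
    [(0, 1), (1, 3), (2, 4), (5, 8)], lo=0, hi=8, k=3)
```

With the counts available at every sample, the cost can be evaluated at all k samples without ever sorting the events.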
Refining of Samples: The SAH is a sum of two monotone functions, C_l and C_r. The cost C between two samples a < b is therefore bounded from below: C ≥ C_min = min(C_l) + min(C_r) = C_l(a) + C_r(b). Resample the areas where C_min < current minimum. Typically only a few intervals need to be re-sampled (< 5%).
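The selection of intervals to re-sample can be sketched as follows, assuming `c_left` and `c_right` hold the values of the increasing part C_l and the decreasing part C_r at consecutive samples. The function name and the list-based interface are hypothetical.

```python
def intervals_to_refine(c_left, c_right):
    """Return (best sampled cost, indices j of intervals [j, j+1] to re-sample)."""
    costs = [l + r for l, r in zip(c_left, c_right)]
    best = min(costs)
    refine = []
    for j in range(len(costs) - 1):
        # C_l is smallest at the left end, C_r at the right end of the interval.
        lower_bound = c_left[j] + c_right[j + 1]
        if lower_bound < best:
            refine.append(j)
    return best, refine

best, refine = intervals_to_refine([0, 1, 6, 9], [10, 4, 3.5, 0])
```

Intervals whose lower bound is not below the best sampled cost cannot contain a better split and are skipped.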
Algorithm Properties: Streaming memory accesses. SAH cost function estimated by sampling. No initial sorting required. Refining of samples.
Improvements, Conventional Algorithm: Use radix sort, O(N), the fastest algorithm if the data set fits into the caches. There is no need to order events at the same position: count the opening/closing events instead, which removes one radix sort pass. Multiple cores parallelize the build: most time is spent in the lower tree levels, so assign one sub-tree to one core.
Results: Speed-up of up to 50%. Only effective in the upper levels; limited by the copying of objects/events. The larger the scene, the higher the speed-up. Performance is independent of the triangle order. Small decrease in traversal performance (< 2%) with 1024 samples. Multi-threading: 4 cores (no local memory management).
Future Work: Fully multi-threaded implementation. Careful memory management on NUMA architectures. Extension to other spatial index structures: BVHs, BKD trees, SKD trees, …
Conclusion: Streaming construction algorithm with a speed-up of up to 50%. Cost function sampling with very low quality degradation. Refining of samples.
Thank you!
Advantages: Sequential memory access in the upper levels. Small data footprint in the conventional build: the data fits in the caches, so radix sort is efficient. Fewer computations are needed for the split plane position estimation. But what about the tree cost?
Memory Management: Use two arrays and alternate them.
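The slide gives no further detail, so the following is only one way to read "use two arrays and alternate them". It is a simplified sketch that partitions 1D points level by level, reading from one array and writing to the other, then swapping their roles. The function name, the midpoint split, and the restriction to points (no objects duplicated into both children) are assumptions made for illustration.

```python
def build_levels(values, lo, hi, depth):
    """Partition values level by level, alternating between two buffers."""
    n = len(values)
    src, dst = list(values), [None] * n
    segments = [(0, n, lo, hi)]        # (begin, end, range_lo, range_hi)
    for _ in range(depth):
        next_segments = []
        for begin, end, a, b in segments:
            mid = 0.5 * (a + b)        # placeholder for the chosen split
            left, right = begin, end
            for i in range(begin, end):    # sequential read from src
                v = src[i]
                if v < mid:
                    dst[left] = v          # left child grows from the front
                    left += 1
                else:
                    right -= 1
                    dst[right] = v         # right child grows from the back
            next_segments += [(begin, left, a, mid), (left, end, mid, b)]
        src, dst = dst, src            # alternate the two arrays
        segments = next_segments
    return src, segments

data = [5, 1, 7, 3, 6, 2]
result, segments = build_levels(data, 0, 8, depth=2)
```

No allocation happens inside the loop: each level reuses the array that the previous level read from.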
SAH Tree Cost: Optimal KD trees for ray tracing are SAH based. Minimize the average expected traversal cost of an arbitrary ray.
SAH Computation: Efficient computation: extract and sort events in advance. Compute incrementally, keeping track of the objects on the left/right. Evaluate after a close event and before an open event.
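The incremental evaluation can be sketched as a sweep over the sorted events of one axis. The cost formula is the commonly used greedy SAH form, which the slides do not state explicitly; the function names, the constants `c_trav` and `c_isect`, and the assumption that every object has a positive extent along the axis are all illustrative choices.

```python
def box_area(extent):
    dx, dy, dz = extent
    return 2.0 * (dx * dy + dy * dz + dz * dx)

def best_split_by_sweep(intervals, node_min, node_max, axis,
                        c_trav=1.0, c_isect=1.0):
    """Return (cost, position) of the cheapest split plane along one axis."""
    n = len(intervals)
    opens = sorted(a for a, _ in intervals)     # sorted opening events
    closes = sorted(b for _, b in intervals)    # sorted closing events
    extent = [node_max[i] - node_min[i] for i in range(3)]
    area = box_area(extent)
    best = (float("inf"), None)
    i_open = i_close = 0
    for x in sorted(set(opens) | set(closes)):
        while i_close < n and closes[i_close] <= x:
            i_close += 1                        # process close events at x first
        n_left, n_right = i_open, n - i_close   # after close, before open
        if node_min[axis] < x < node_max[axis]:
            left_ext, right_ext = list(extent), list(extent)
            left_ext[axis] = x - node_min[axis]
            right_ext[axis] = node_max[axis] - x
            cost = c_trav + c_isect * (box_area(left_ext) * n_left +
                                       box_area(right_ext) * n_right) / area
            if cost < best[0]:
                best = (cost, x)
        while i_open < n and opens[i_open] <= x:
            i_open += 1                         # then process open events at x
    return best

cost, position = best_split_by_sweep(
    [(0, 1), (1, 4), (2, 4)], (0, 0, 0), (4, 1, 1), axis=0)
```

The left and right counts are updated as events are passed, so each candidate plane costs O(1) once the events are sorted.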
Alternative Multi-Threading: (… required on NUMA architectures). One sub-tree per core is not suitable for the first log(#cores) levels, and it is also unsuitable for some architectures (Cell). Alternative: bring the data to the cores from sequential pages, gather the event counts in bins at each core, and merge the counts before the actual cost evaluation.
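The gather-and-merge idea can be sketched as follows. Python threads stand in for cores here purely for illustration, and the function names, the chunking scheme, and the simple floor-based bin index are assumptions rather than the authors' implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def count_chunk(chunk, lo, width, bins):
    """Count opening/closing events of one chunk in worker-local bins."""
    opens, closes = [0] * bins, [0] * bins
    for a, b in chunk:
        opens[min(bins - 1, max(0, int((a - lo) / width)))] += 1
        closes[min(bins - 1, max(0, int((b - lo) / width)))] += 1
    return opens, closes

def merged_counts(intervals, lo, hi, bins, workers=4):
    """Split the input into contiguous chunks, bin locally, then merge."""
    width = (hi - lo) / bins
    size = max(1, (len(intervals) + workers - 1) // workers)
    chunks = [intervals[i:i + size] for i in range(0, len(intervals), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(
            lambda c: count_chunk(c, lo, width, bins), chunks))
    # Merge step: element-wise sum of the per-worker bins.
    opens = [sum(p[0][j] for p in partial) for j in range(bins)]
    closes = [sum(p[1][j] for p in partial) for j in range(bins)]
    return opens, closes

opens, closes = merged_counts(
    [(0, 1), (1, 3), (2, 4), (5, 8)], lo=0, hi=8, bins=4, workers=2)
```

Each worker only touches its own contiguous chunk and its own bins, so no synchronization is needed until the merge, whose cost depends on the number of bins and workers rather than on the number of objects.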
Extension, Multi-Threading: Multiple cores parallelize the build. Most time is spent in the lower tree levels. One sub-tree per core.