1 The Method of Precomputing Triangle Clusters for Quick BVH Builder and Accelerated Ray Tracing. Kirill Garanzha, Department of Software for Computers, Bauman Moscow State Technical University. Tech. talk at University of Nizhniy Novgorod.
2 Current Ray Tracing challenges. Faster rendering in large dynamic scenes with complex motion (this work takes a step here). Better algorithmic time complexity for tracing incoherent GI rays (currently O(n), where n is the number of rays). Shading computation should be coherent and fast. Important note: all proposed algorithms should map efficiently onto current or upcoming hardware.
3 Problem with dynamic scenes. The acceleration structure (BVH, kd-tree, grid, ...) for dynamic scene primitives (triangles, ...) should be rebuilt in every frame of animation. The time complexity of the acceleration structure builder is O(N log N) for hierarchies and O(N) for grids, where N is the number of objects being repartitioned, usually the number of triangles. A high-quality O(N log N) acceleration structure builder for N > 10^5 is not yet real-time on desktop PCs.
4 Solution: triangle pre-clustering. Triangle clusters can be used for efficiency purposes [Garland et al 2001], [Sander et al 2001, 2007], [Lauterbach et al 2008]. We assume that groups of connected triangles remain connected throughout the course of animation. We precompute densely packed triangle clusters once, considering the geometry of the first animation frame (every cluster contains ~10 connected triangles). We use the clusters’ AABBs to build the BVH quickly in every frame of animation (~10x faster than the base ‘brute-force’ builder). Exploding simulations can be supported. Packet ray tracing performance is not sacrificed (we utilize shallow SAH-BVHs, SIMD, Vertex Culling [Reshetov 2007] and constant connectivity within a cluster).
5 Clustering heuristic. To build a high-quality SAH-based BVH in every frame of animation, three requirements are considered during cluster precomputation: (1) the shape of the cluster should be similar to a sphere or disk; (2) the density of triangle connectivity within each cluster should be high; (3) the geometric size of a cluster should be limited.
6 Sphere/disk shape requirement. Geometric probability P(Y | X) that an arbitrary ray intersects the convex spatial region Y, assuming it intersects the convex region X that contains Y: P(Y | X) = SA(Y) / SA(X), where SA(X) is the surface area of X. In a BVH, region X corresponds to the AABB of the leaf node and region Y corresponds to a primitive within the leaf. A higher value of P(Y | X) corresponds to a better ray-hit probability within a leaf and earlier termination of ray traversal. For an arbitrarily oriented triangle in 3D space: [equation not recovered].
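The probability on this slide can be evaluated directly from surface areas. The following is a minimal sketch, assuming the standard geometric-probability result P(Y | X) = SA(Y) / SA(X) for a convex region Y contained in a convex region X, and assuming both regions are approximated by axis-aligned boxes. The names `Aabb`, `surfaceArea` and `hitProbability` are illustrative and do not come from the talk.

```cpp
#include <array>

// Axis-aligned bounding box given by its min and max corners (illustrative type).
struct Aabb {
    std::array<double, 3> lo;
    std::array<double, 3> hi;
};

// Surface area of an AABB: 2 * (dx*dy + dy*dz + dz*dx).
double surfaceArea(const Aabb& b) {
    const double dx = b.hi[0] - b.lo[0];
    const double dy = b.hi[1] - b.lo[1];
    const double dz = b.hi[2] - b.lo[2];
    return 2.0 * (dx * dy + dy * dz + dz * dx);
}

// P(Y | X) = SA(Y) / SA(X), assuming `inner` (Y) lies inside `outer` (X).
// Returns 0 for a degenerate outer box to avoid dividing by zero.
double hitProbability(const Aabb& inner, const Aabb& outer) {
    const double saOuter = surfaceArea(outer);
    return saOuter > 0.0 ? surfaceArea(inner) / saOuter : 0.0;
}
```

For a unit cube inside a cube of side 2, the areas are 6 and 24, so the conditional hit probability is 0.25.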
7 Sphere/disk shape requirement. It is beneficial to precompute clusters with a sphere-like or disk-like shape, as the value of P(Y | X) is higher for arbitrarily oriented spheres or disks than for triangles or rectangles. For an arbitrarily oriented sphere or disk in 3D space: [equation not recovered].
8 Density of connectivity requirement. The density of triangle connectivity within a cluster should be as high as possible (the value of VerticesCount / TrianglesCount should be as low as possible). This reduces the probability of cluster shape disruption during the course of vertex animation. [Figure: a ‘bad’ cluster and a ‘good’ cluster.]
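The VerticesCount / TrianglesCount measure is easy to compute for an indexed triangle list. The sketch below assumes triangles are stored as triples of global vertex indices; the helper names `countDistinctVertices` and `connectivityRatio` are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <unordered_set>
#include <vector>

// One triangle as three global vertex indices (illustrative layout).
using Triangle = std::array<unsigned int, 3>;

// Number of distinct vertex indices referenced by the triangles.
std::size_t countDistinctVertices(const std::vector<Triangle>& tris) {
    std::unordered_set<unsigned int> seen;
    for (const Triangle& t : tris)
        seen.insert(t.begin(), t.end());
    return seen.size();
}

// VerticesCount / TrianglesCount; lower means denser connectivity.
// Returns 0 for an empty cluster.
double connectivityRatio(const std::vector<Triangle>& tris) {
    if (tris.empty()) return 0.0;
    return static_cast<double>(countDistinctVertices(tris)) /
           static_cast<double>(tris.size());
}
```

Two triangles that share no vertices give the worst ratio of 3, while a fan of six triangles around a common vertex uses seven vertices and gives a ratio of about 1.17.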
9 Geometric size limitation requirement. If the geometric size of a cluster is not limited during cluster generation, then the AABBs of ‘big’ and ‘small’ clusters may overlap significantly (ray tracing slowdown). If the geometric size of a cluster is limited, then the probability of such overlaps is reduced.
10 Clustering heuristic formula. A set of k triangles is accepted to form a cluster if Acc(k) > 0: [equation not recovered]. S(k) is the bounding sphere for k triangles; SA(X) is the surface area of X; CountDistinctVertices(k) is the number of distinct vertex indices within the cluster; AvgSA is the surface area of the average triangle within the 3D model; n_i is the normal of the i-th triangle. Heuristic parameters: MaxSize / MaxCount are the rough desired cluster size / the number of vertices within the cluster.
11 Clustering: iterative contraction. A dual graph for the mesh of triangles is created [Garland et al 2001]. For every dual-graph edge, Acc(k1 + k2) is assigned; Acc evaluates the possible merging of clusters k1 and k2. At every iteration step, the dual-graph edge with max(Acc) is contracted. Iterative contraction continues while max(Acc) > 0.
12 Clustering: iterative growing. At every iteration step, the triangle that gives max(Acc(k)) is added to the cluster of k-1 triangles. Cluster growing continues while Acc(k) > 0. For some clusters there may be no ‘building material’ left, because it was occupied earlier by other clusters. This method consumes less memory than iterative contraction with the dual graph.
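The growing loop can be sketched as follows. This is not the author's implementation: the acceptance function is passed in by the caller as `acc` because its formula is not reproduced here, adjacency is assumed to mean "shares at least one vertex index", and candidates are found with a brute-force scan chosen for brevity rather than speed. All names (`growCluster`, `growAllClusters`, `AccFn`) are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <vector>

using Triangle = std::array<unsigned int, 3>;      // three global vertex indices
using Cluster = std::vector<std::size_t>;          // indices into the triangle list
using AccFn = std::function<double(const Cluster&)>; // caller-supplied acceptance score

// True if two triangles share at least one vertex index (assumed adjacency rule).
bool sharesVertex(const Triangle& a, const Triangle& b) {
    for (unsigned int va : a)
        for (unsigned int vb : b)
            if (va == vb) return true;
    return false;
}

// Grows one cluster from `seed`: repeatedly adds the unused adjacent triangle
// with the highest acc value, and stops when no candidate has acc > 0.
Cluster growCluster(const std::vector<Triangle>& tris, std::size_t seed,
                    std::vector<bool>& used, const AccFn& acc) {
    Cluster cluster{seed};
    used[seed] = true;
    for (;;) {
        double bestAcc = 0.0;
        std::size_t best = tris.size();  // sentinel: no acceptable candidate yet
        for (std::size_t t = 0; t < tris.size(); ++t) {
            if (used[t]) continue;       // 'building material' already occupied
            bool adjacent = false;
            for (std::size_t c : cluster)
                if (sharesVertex(tris[c], tris[t])) { adjacent = true; break; }
            if (!adjacent) continue;
            cluster.push_back(t);        // tentatively add, score, then undo
            const double a = acc(cluster);
            cluster.pop_back();
            if (a > bestAcc) { bestAcc = a; best = t; }
        }
        if (best == tris.size()) break;
        cluster.push_back(best);
        used[best] = true;
    }
    return cluster;
}

// Seeds a new cluster from every triangle that is still unused.
std::vector<Cluster> growAllClusters(const std::vector<Triangle>& tris,
                                     const AccFn& acc) {
    std::vector<bool> used(tris.size(), false);
    std::vector<Cluster> clusters;
    for (std::size_t seed = 0; seed < tris.size(); ++seed)
        if (!used[seed]) clusters.push_back(growCluster(tris, seed, used, acc));
    return clusters;
}
```

With a toy score such as `3 - cluster.size()`, a strip of five triangles splits into clusters of 2, 2 and 1 triangles; the last cluster stays small because its neighbours were occupied earlier.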
13 Clustering example. [Figure panels: triangles; iterative growing result; iterative contraction result.]
14 Clustering example. [Figure panels: ‘easy’ model for clustering; ‘hard’ model (the sizes of triangles vary significantly).]
15 Acceleration structure builder. Every cluster contains the list of densely packed triangles and the list of distinct vertex indices. In every frame of animation, the AABB of each cluster is computed from the new vertex 3D positions. In every frame of animation, the AABBs of all clusters are used as the input set of the acceleration structure builder (BVH, [Wald 2007]). A leaf may contain a few clusters if they are close together or if their total triangle count is below some threshold (e.g. 32 triangles). There are no branches within the clusters. For such shallow trees the packet ray tracer, SIMD instructions and Vertex Culling [Reshetov 2007] are used.
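The per-frame step that feeds the builder can be sketched as below. It assumes each cluster stores its distinct global vertex indices and that animated positions live in one global array; the types `Vec3`, `Box3f` and `VertexCluster` and the function names are illustrative only, and the BVH build itself is out of scope.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

struct Vec3 { float x, y, z; };
struct Box3f { Vec3 lo, hi; };  // axis-aligned box: min and max corners

// Precomputed once: the distinct global vertex indices used by one cluster.
struct VertexCluster {
    std::vector<unsigned int> vertexIndices;
};

// Recomputes one cluster's AABB from the current frame's vertex positions.
// An empty cluster yields an inverted box (lo = +inf, hi = -inf).
Box3f clusterBounds(const VertexCluster& c, const std::vector<Vec3>& positions) {
    const float inf = std::numeric_limits<float>::infinity();
    Box3f box{Vec3{inf, inf, inf}, Vec3{-inf, -inf, -inf}};
    for (unsigned int vi : c.vertexIndices) {
        const Vec3& p = positions[vi];
        box.lo = Vec3{std::min(box.lo.x, p.x), std::min(box.lo.y, p.y), std::min(box.lo.z, p.z)};
        box.hi = Vec3{std::max(box.hi.x, p.x), std::max(box.hi.y, p.y), std::max(box.hi.z, p.z)};
    }
    return box;
}

// Per-frame input for the BVH builder: one AABB per cluster instead of one per triangle.
std::vector<Box3f> allClusterBounds(const std::vector<VertexCluster>& clusters,
                                    const std::vector<Vec3>& positions) {
    std::vector<Box3f> boxes;
    boxes.reserve(clusters.size());
    for (const VertexCluster& c : clusters)
        boxes.push_back(clusterBounds(c, positions));
    return boxes;
}
```

Because each distinct vertex is visited once per cluster, the cost of this pass depends on the vertex count rather than on three reads per triangle, and the builder then sorts clusters rather than individual triangles.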
16 Useful constant connectivity. A cluster of triangles with no more than 256 vertices. Example cluster: vertices V0 ... V6 and triangles T0 = (V0, V1, V2), T1 = (V1, V2, V3), T2 = (V2, V3, V5), T3 = (V2, V4, V5), T4 = (V2, V4, V6), T5 = (V0, V2, V6). Old triangles storage: unsigned int * Triangles → global vertex indices for T0 ... T5. New triangles storage: unsigned char * TriCompressed → references to the Vertex Cache (T0 = 0 1 2, T1 = 1 2 3, T2 = 2 3 5, T3 = 2 4 5, T4 = 2 4 6, T5 = 0 2 6). Vertex prefetching: unsigned int * ClusterVertexGlobalIndices → the global indices of V0 ... V6; VERTEX * VCache → VP0 ... VP6. Vertex 3D positions are gathered from the global array to the vertex cache based on vertex indices within the cluster. int * VCullCodes → CullC0 ... CullC6: bit-codes for amortized Vertex Frustum culling tests.
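The compressed storage can be illustrated with a small sketch. It assumes a cluster arrives as a flat list of global vertex indices (three per triangle) and assigns local 8-bit indices in first-use order, which may differ from the numbering in the slide's figure. The struct and function names are hypothetical; only the array roles mirror `TriCompressed`, `ClusterVertexGlobalIndices` and `VCache`.

```cpp
#include <cstdint>
#include <stdexcept>
#include <unordered_map>
#include <vector>

struct Vertex { float x, y, z; };

struct CompressedCluster {
    std::vector<std::uint32_t> vertexGlobalIndices;  // role of ClusterVertexGlobalIndices
    std::vector<std::uint8_t> triCompressed;         // role of TriCompressed: 3 local indices per triangle
};

// Precomputation: remap global vertex indices to 8-bit cluster-local indices.
// Throws if the cluster references more than 256 distinct vertices.
CompressedCluster compressCluster(const std::vector<std::uint32_t>& triangles) {
    CompressedCluster out;
    std::unordered_map<std::uint32_t, std::uint8_t> localOf;
    for (std::uint32_t g : triangles) {
        auto it = localOf.find(g);
        if (it == localOf.end()) {
            if (out.vertexGlobalIndices.size() >= 256)
                throw std::length_error("cluster exceeds 256 distinct vertices");
            const auto local = static_cast<std::uint8_t>(out.vertexGlobalIndices.size());
            it = localOf.emplace(g, local).first;
            out.vertexGlobalIndices.push_back(g);
        }
        out.triCompressed.push_back(it->second);
    }
    return out;
}

// Per frame: gather the cluster's vertex positions from the global array
// into a small contiguous cache (role of VCache).
std::vector<Vertex> gatherVertexCache(const CompressedCluster& c,
                                      const std::vector<Vertex>& globalVertices) {
    std::vector<Vertex> cache;
    cache.reserve(c.vertexGlobalIndices.size());
    for (std::uint32_t g : c.vertexGlobalIndices)
        cache.push_back(globalVertices[g]);
    return cache;
}
```

Since connectivity is constant, `compressCluster` runs once; only the gather is repeated per frame, and any per-vertex work such as culling codes can then be done once per cached vertex instead of once per triangle corner.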
17 UNC Exploding Dragon (252K triangles, 192K vertices). Image: 1024 × 1024, Core 2 Duo. [Chart: ray tracing time / the clusters produced, for heuristic parameters (each cluster size) MaxSize = MaxCount = X.]
18 Utah Fairy Forest (174K triangles, 97K vertices). Image: 1024 × 1024, Core 2 Duo. [Chart: ray tracing time / the clusters produced, for heuristic parameters (each cluster size) MaxSize = MaxCount = X.]
19 BVH-quality evaluation for animation frames. Detailed comparison factors for the BVHs produced by using Acc(k)_{0,0} (no clustering) and Acc(k)_{50,50} (MaxSize = MaxCount = 50). Factor = (RT time for Acc(k)_{0,0}) / (RT time for Acc(k)_{50,50}). RT: 2-bounce reflections (everything is reflective). Factor > 1 denotes a higher quality of the BVH produced by Acc(k)_{50,50}.
20 Method advantages. The method is applicable to scenes where triangles maintain constant connectivity; even exploding simulations can be programmed with it. It is possible to precompute the ‘best’ possible set of clusters for accelerated ray tracing in dynamic scenes. For clusters of reasonable size, ray tracing timings are not affected and the BVH builder is accelerated. Constant connectivity within a cluster is useful for vertex prefetching and for reducing ray-triangle intersection computations.
21 Method limitations. Explicit: this method is an overhead for 3D models without connected triangles (where VerticesCount / TrianglesCount = 3). Implicit: the size and regularity of triangles within a produced cluster should remain reasonably coherent throughout the course of animation. Ray tracing performance is likely to be affected if all the clusters undergo severe stretching.
22 Plans for future. Probably: implementation of Oriented Bounding Boxes for BVH leaf nodes and AABBs for inner nodes. Probably: implementation of asynchronous repair for disrupted clusters. Support for smooth surface primitives. R&D in the direction of packet-based Path Tracing and other GI algorithms for dynamic scenes.
23 Demo on Core 2 Quad 2.4GHz