Parallel Occlusion Culling for Interactive Walkthrough using Multiple GPUs Naga K Govindaraju, Avneesh Sud, Sun-Eui Yoon, Dinesh Manocha University of North Carolina-Chapel Hill
UNC Chapel Hill Avneesh Sud Goal: Interactive walkthrough of complex 3D environments at high fidelity. Models from CAD, VR; high primitive count; heterogeneous geometry; irregular distribution; no large occluders.
DoubleEagle Tanker Model: 82 million triangles, 127,000 objects.
SWITCH: A parallel algorithm and system for interactive rendering of large complex environments. Integrates hierarchical LODs and conservative occlusion culling. Generic models (no assumptions on model or distribution). Computation done on GPUs.
Previous Work: Geometric Simplification; Occlusion Culling; Parallel Approaches; Hybrid Approaches.
Previous Work: UNC MMR System [Aliaga99] (used image-based impostors, occlusion culling, LODs). UNC GigaWalk [Baxter02] (uses 2 graphics pipelines and multiple CPUs).
Outline: SWITCH Overview; Scene Representation; Parallel Algorithm and Architecture; Implementation & Results; Conclusions & Future Work.
Overview: A parallel algorithm and system for interactive rendering of large complex environments. Integrates hierarchical LODs and conservative occlusion culling. Parallel occlusion culling on separate GPUs. Graphics hardware optimizations. Low network bandwidth requirements. General and automatic preprocessing algorithm.
Overview: Parallel Occlusion Culling. Two-pass version of the Hierarchical Z-Buffer [Greene93]. Exploits temporal coherence. Works on generic models; conservative to image precision. Avoids readback by 'switching' between 2 GPUs.
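For readers unfamiliar with the Hierarchical Z-Buffer idea cited above, the following is a minimal software sketch of a depth pyramid in which each coarser texel stores the farthest depth of the 2x2 block beneath it. It only illustrates the general technique from [Greene93]; the slides describe a GPU-based two-pass variant whose details are not shown here. The function names, the square power-of-two buffer, and the "larger value is farther" depth convention are all assumptions made for this example.

```python
def build_z_pyramid(depth):
    """Build a depth pyramid from a square 2^k x 2^k depth buffer.

    Assumption: larger depth values are farther from the viewer.
    Each coarser level keeps the FARTHEST depth of every 2x2 block,
    which is what makes tests against it conservative.
    """
    levels = [depth]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        n = len(prev) // 2
        levels.append([
            [max(prev[2 * y][2 * x], prev[2 * y][2 * x + 1],
                 prev[2 * y + 1][2 * x], prev[2 * y + 1][2 * x + 1])
             for x in range(n)]
            for y in range(n)
        ])
    return levels


def is_hidden(levels, level, x, y, nearest_z):
    """A box covering texel (x, y) at this level is certainly hidden only if
    its nearest depth lies behind the farthest depth stored there."""
    return nearest_z > levels[level][y][x]
```

Because the coarse level stores the farthest depth, a box is rejected only when it is behind everything in that region, so the test never culls visible geometry (up to the resolution of the buffer).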
Scene Representation: Computing an appropriate spatial representation from a functional representation is non-trivial. An object varies from a small bolt to a large pipe structure. Redefine objects by partitioning and clustering.
Criteria for Hierarchy: Good spatial localization. Object size. Balanced trees. Minimal bounding box overlap of sibling nodes. SWITCH uses a hybrid approach combining top-down partitioning with bottom-up clustering.
Partitioning and Clustering: Partitioning splits large objects into multiple objects (polygons are not split). Clustering groups objects with low polygon counts based on spatial proximity. The combination redistributes geometry with good localization and object size.
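The partitioning step can be sketched as a recursive split that assigns whole triangles to one side or the other, so no polygon is ever cut. This is a minimal illustration, not the preprocessing code used in SWITCH: the median split along the axis of largest centroid extent and the `max_tris` threshold are assumptions chosen for the example.

```python
def centroid(tri):
    """Centroid of a triangle given as three (x, y, z) vertices."""
    return tuple(sum(v[a] for v in tri) / 3.0 for a in range(3))


def partition(triangles, max_tris):
    """Split an object into sub-objects of at most max_tris triangles.

    Triangles are assigned whole to one side of the split, so polygons
    are never split. max_tris must be at least 1.
    """
    if len(triangles) <= max_tris:
        return [triangles]
    cents = [centroid(t) for t in triangles]
    # Hypothetical heuristic: split along the axis where centroids spread most.
    extents = [max(c[a] for c in cents) - min(c[a] for c in cents)
               for a in range(3)]
    axis = extents.index(max(extents))
    order = sorted(range(len(triangles)), key=lambda i: cents[i][axis])
    mid = len(order) // 2
    left = [triangles[i] for i in order[:mid]]
    right = [triangles[i] for i in order[mid:]]
    return partition(left, max_tris) + partition(right, max_tris)
```

Calling `partition(tris, 3)` on a ten-triangle object returns several sub-objects whose triangle counts sum to ten, each holding at most three triangles.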
Partitioning & Clustering: Results (figure). Powerplant: original objects vs. clustered objects.
Partitioning & Clustering: Results (figure). DoubleEagle: original objects vs. clustered objects.
Unified Hierarchy: Objects are organized into a scene graph hierarchy. A single unified hierarchy is used for occlusion culling and LOD-based rendering (low storage overhead; simple conservative occlusion culling algorithm). SWITCH: A top-down AABB bounding volume hierarchy is constructed from the redefined objects.
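A top-down AABB hierarchy over the redefined objects might be built along the following lines. This is a sketch under stated assumptions: the `Node` class, the median split on the longest axis of the enclosing box, and the `leaf_size` parameter are hypothetical choices for illustration, and the slides do not say which split rule SWITCH uses.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    lo: tuple                                      # minimum corner of the AABB
    hi: tuple                                      # maximum corner of the AABB
    children: list = field(default_factory=list)
    objects: list = field(default_factory=list)    # object names at a leaf


def union_box(boxes):
    """Smallest axis-aligned box enclosing all (lo, hi) boxes."""
    lo = tuple(min(b[0][a] for b in boxes) for a in range(3))
    hi = tuple(max(b[1][a] for b in boxes) for a in range(3))
    return lo, hi


def build_aabb_tree(objects, leaf_size=1):
    """objects: list of (name, (lo, hi)) pairs. leaf_size must be >= 1."""
    lo, hi = union_box([box for _, box in objects])
    if len(objects) <= leaf_size:
        return Node(lo, hi, objects=[name for name, _ in objects])
    # Assumed heuristic: median split along the longest axis of the node box,
    # ordering objects by the centre of their own boxes.
    axis = max(range(3), key=lambda a: hi[a] - lo[a])
    ordered = sorted(objects, key=lambda o: o[1][0][axis] + o[1][1][axis])
    mid = len(ordered) // 2
    return Node(lo, hi, children=[
        build_aabb_tree(ordered[:mid], leaf_size),
        build_aabb_tree(ordered[mid:], leaf_size),
    ])
```

The same tree can then carry both the bounding boxes used for culling and the LODs used for rendering, which is the point of a unified hierarchy.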
HLOD Generation: Construct hierarchical LODs (HLODs) of the AABB scene graph as in [Erikson01]. Use the GAPS simplification algorithm [Erikson99]. HLOD generation is done out-of-core (only the LODs of the current node and its immediate children are stored in main memory).
Hierarchical Occluders: A hierarchical occluder associated with a node is an approximation of the group of occluders in its subtree. HLODs provide a lower polygon count approximation of a group of occluders, so they serve as hierarchical occluders. Perform object-space occluder fusion. Conservative occlusion culling.
Parallel Algorithm and Architecture: Three processes run in parallel. 1. Occluder Rendering (OR): renders the occluder set to the depth buffer on GPU1. 2. Hardware Culling (HC): computes visible geometry using hardware occlusion queries on GPU2. 3. Render Visible Geometry (RVG): renders visible geometry on GPU3.
System Timing/Data Flow (figure). Labels: GPU 1, GPU 2, GPU 3; Frame i, Frame i+1, Frame i+2. Tasks shown: Hardware Cull (HC) for frames i, i+1, i+2; Render Occluders (OR) for frames i+1, i+2, i+3; Display Geometry (RVG) for frames i, i+1, i+2. Z-Buffer SWITCH.
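One way to read the timing diagram is that the two culling GPUs swap roles every frame: the GPU that rendered the occluders for a frame already holds that frame's Z-buffer, so it also runs the culling queries for it, while the other GPU starts on the occluders of the next frame. The sketch below encodes that reading. It is an interpretation rather than the system's scheduler: which GPU starts in which role, and the helper names `gpu_roles` and `timeline`, are assumptions.

```python
def gpu_roles(frame):
    """GPU assigned to each stage of one frame.

    Assumption: even frames keep their Z-buffer on GPU1 and odd frames on
    GPU2, so occluder rendering (OR) and hardware culling (HC) for the same
    frame share a GPU and no depth readback is needed. GPU3 always renders
    the visible geometry (RVG).
    """
    zbuffer_gpu = "GPU1" if frame % 2 == 0 else "GPU2"
    return {"OR": zbuffer_gpu, "HC": zbuffer_gpu, "RVG": "GPU3"}


def timeline(first, count):
    """Per time slot, the work done by the two switching GPUs:
    one culls frame k while the other renders occluders for frame k+1."""
    slots = []
    for k in range(first, first + count):
        slots.append({
            gpu_roles(k)["HC"]: ("HC", k),
            gpu_roles(k + 1)["OR"]: ("OR", k + 1),
        })
    return slots
```

With this assignment, `timeline(0, 2)` has GPU1 culling frame 0 while GPU2 renders occluders for frame 1, after which the two swap.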
Conservative Occlusion Culling: The underlying HZB algorithm used for occlusion culling is conservative to image precision. Exactly the same set of LODs is used for both the OR and STC stages (the Z-buffer used for culling is consistent with the geometry).
Hardware Culling: Use GL_NV_OCCLUSION_QUERY to determine visible pixels. Traverse the scene hierarchy, rendering the bounding boxes of nodes.
LOD Selection: Pixel error metric: max normal deviation of silhouette in image. Traverse down the scene graph until the error bound is satisfied. Upper bound: highly conservative. Figures: DE Engine Room at 1K x 1K, 20 PE; Error Image.
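The top-down LOD selection can be sketched as a recursive descent that stops at the first node whose HLOD is accurate enough. In this simplified example each node carries a precomputed `pixel_error`, which is an assumption: in a real walkthrough the projected error depends on the viewpoint and would be evaluated per frame. The `LodNode` class and `select_lods` function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class LodNode:
    name: str
    pixel_error: float            # assumed: projected error of this node's HLOD
    children: list = field(default_factory=list)


def select_lods(node, max_pixel_error):
    """Return the names of the nodes whose HLODs should be rendered.

    Descend until a node's error is within the bound; a leaf is always
    accepted because there is nothing finer to descend to.
    """
    if node.pixel_error <= max_pixel_error or not node.children:
        return [node.name]
    selected = []
    for child in node.children:
        selected.extend(select_lods(child, max_pixel_error))
    return selected
```

Raising `max_pixel_error` makes the traversal stop higher in the tree, which trades fidelity for a lower polygon count.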
GPU Optimizations: Multiple occlusion tests. Occlusion query 'counter' for each node; traverse the scene graph breadth first; batch the queries for all nodes at a level; 40% faster than testing one node at a time with GL_HP_OCCLUSION_TEST.
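The level-by-level batching can be illustrated without a GPU by passing in a callback that stands in for issuing a batch of occlusion queries and reading back their pixel counts. The `query_batch` callback and the dictionary-based node layout are hypothetical; on real hardware the callback would render each node's bounding box inside its own query and collect the counters afterwards.

```python
def cull_breadth_first(root, query_batch):
    """Return the names of visible leaf nodes.

    query_batch(nodes) is an assumed stand-in for the GPU: it takes all
    nodes of one level and returns one visible-pixel count per node.
    Nodes are dicts with "name" and "children" keys.
    """
    visible_leaves = []
    level = [root]
    while level:
        counts = query_batch(level)       # one batch of queries per level
        next_level = []
        for node, pixels in zip(level, counts):
            if pixels == 0:
                continue                  # bounding box fully occluded: prune subtree
            if node["children"]:
                next_level.extend(node["children"])
            else:
                visible_leaves.append(node["name"])
        level = next_level
    return visible_leaves
```

Issuing all queries of a level before reading any result gives the GPU a full batch of work and avoids stalling on each individual test.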
GPU Optimizations: Visibility for LOD selection. Visible pixels of bounding box > visible pixels of geometry; number of visible pixels less than error metric => early termination condition; provides looser bounds and reduces polygon count.
Bandwidth Requirements
Load Balancing: Trade-off between cluster size and culling efficiency. Smaller clusters lead to a deeper scene graph but improve culling performance. Balances load between culling and rendering.
Implementation: 3 Dell Precision workstations with dual 2GHz Pentium4 CPUs, GeForce4 GPU, and 2GB main memory. Network: Implementation 1, TCP/IP over 100Mbps Fast Ethernet; Implementation 2, TCP/IP over Myrinet.
Test Model: Powerplant. Original: 0.5 Gigabyte dataset, 13 million polygons, 1200 objects. After preprocessing (7 hours): 1.2 Gigabytes, 13 million polygons, 38,000 objects.
Test Model: DoubleEagle Tanker. Original: 4 Gigabyte dataset, 82 million polygons, 127,000 objects. After preprocessing (34 hours): 8 Gigabytes, 82 million polygons, 61,000 objects.
Video
Results: Frame Rate (figure). Powerplant model, 1024 x 1024 with 10 pixels of error, using Ethernet.
Results: Frame Rate (figure). DoubleEagle model, 1024 x 1024 with 20 pixels of error, using Ethernet.
Results: Culling Performance (figure). DoubleEagle Tanker model: object count.
Results: Culling Performance (figure). DoubleEagle Tanker model: polygon count.
Conclusions: Able to interactively render large complex environments with good fidelity. Integrates LODs and occlusion culling in a general, automatic parallel rendering algorithm. A parallel architecture to balance load between 3 GPUs. Efficient use of graphics hardware to solve geometric queries. A unified scene hierarchy and automatic preprocessing for a generic model. Introduces an end-to-end latency of 1 frame.
Lessons Learned: Parallelism: 2 pipelines provide a speedup greater than a factor of 2 for complex scenes. Load times: asynchronous on-demand loading of geometry vastly improves system development and testing.
Limitations and Future Work: Static LODs lead to popping; extend to a view-dependent framework. An out-of-core algorithm to reduce main memory overhead as in [Varadhan02]. Improve performance by reducing network latencies. Make more novel uses of graphics hardware. Target frame-rate rendering mode. Drive large immersive displays.
Wish List: Multiple graphics cards on one motherboard. NV_OCCLUSION_QUERY to also return completely visible / partially visible / completely occluded.
Association with NVIDIA: Obtained pre-release versions of drivers with NV_OCCLUSION_QUERY. An NV_OCCLUSION_QUERY bug in the Linux drivers was addressed quickly.
Acknowledgements: US ONR; US ARO; US DOE; US NSF; NVIDIA Corporation; Intel Corporation; NNS for the DoubleEagle model; UNC Walkthrough group.
The End
Hierarchy Generation (figure): (a) Original; (b) Partitioned-I; (c) Clustered; (d) Partitioned-II; (e) Compute a top-down AABB tree hierarchy on redefined objects.
Performance Tuning: Using visible geometry from 2 frames previous avoids bubbles in the pipeline. Trade-off between fidelity and frame rate by adjusting pixels of error. Asynchronous rendering pipeline. Nth farthest Z-buffer values. Lower HZB resolution for occluder rendering.
Video
Results: Frame Rate (figure). Powerplant model, 640 x 480 with 10 pixels of error, on SGI.
Results: Culling Performance (figure). Powerplant model: object count.
Results: Culling Performance (figure). Powerplant model: polygon count.
Outline: Previous Work; SWITCH Overview; Scene Representation; Parallel Algorithm and Architecture; Implementation & Results; Conclusions & Future Work.
Previous Work: Geometric Simplification. Surveyed in [Luebke01]. Static vs. view-dependent. Trouble with high depth complexity.
Previous Work: Occlusion Culling. Surveyed in [Cohen-Or01]. Specific environments: cells and portals [Airey90]; urban datasets [Wonka00, Coorg97]; large occluders [Schaufler00]. General algorithms: HZB [Greene93], HOM [Zhang97]. Performing exact visibility on large general datasets in real time is difficult. Trouble with highly tessellated scenes.
Previous Work: Parallel Approaches. Object-parallel, screen-parallel, frame-parallel. Interactive ray tracing [Wald01]. Perform culling in parallel with rendering: VFC in [Garlick90]; occlusion culling by occluder shrinking in [Wonka01]. Scalable clusters, WireGL [Humphreys01].
Previous Work: Hybrid Approaches. Combine LOD and occlusion culling techniques: UC Berkeley Walkthrough [Funkhouser96]; synthetic convex occluders [Andujar01]; approximate visibility using prioritized layer projections with view-dependent rendering [ElSana01]; UNC MMR system [Aliaga99]. Not demonstrated in high fidelity on complex CAD models.
Criteria for Hierarchy: Good spatial localization. Object size (too large: loose bounding boxes, poor culling performance; too small: very deep trees). Balanced trees. Minimal bounding box overlap of sibling nodes.
Clustering: Clustering algorithm adapted from an image segmentation technique [FH98]. MSTs represent clusters. Similar to Kruskal's algorithm: Euclidean distance between clusters denotes edge weights; edge weights represent variation in a cluster; 2 clusters combined based on Hausdorff metric.
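A Kruskal-style clustering pass in the spirit of the cited segmentation technique can be sketched with a union-find structure. This example is an illustration under assumptions, not the SWITCH preprocessing code: it clusters points (standing in for small objects), uses the merge test from Felzenszwalb and Huttenlocher's graph-based segmentation (an edge joins two clusters only if its weight is small relative to each cluster's internal variation plus `k / size`), and does not implement the Hausdorff-based test mentioned on the slide. The parameter `k` is a hypothetical tuning constant.

```python
import math


def cluster_points(points, k):
    """Group points into clusters; returns one cluster label per point."""
    n = len(points)
    parent = list(range(n))
    size = [1] * n
    internal = [0.0] * n      # largest MST edge weight inside each cluster

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    # All pairwise edges, weighted by Euclidean distance, in ascending order
    # (as in Kruskal's algorithm).
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    for w, i, j in edges:
        a, b = find(i), find(j)
        if a == b:
            continue
        # Merge only if the edge is not much longer than the variation
        # already present inside both clusters.
        if w <= min(internal[a] + k / size[a], internal[b] + k / size[b]):
            parent[b] = a
            size[a] += size[b]
            internal[a] = w   # edges arrive sorted, so w is the new maximum
    return [find(i) for i in range(n)]
```

Larger values of `k` favor bigger clusters; two points share a cluster exactly when they receive the same label.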