Parallel Occlusion Culling for Interactive Walkthrough using Multiple GPUs
Naga K Govindaraju, Avneesh Sud, Sung-Eui Yoon, Dinesh Manocha
University of North Carolina-Chapel Hill

UNC Chapel Hill Avneesh Sud
Goal
Interactive Walkthrough of complex 3D environments at high fidelity
–Models from CAD, VR
–High primitive count
–Heterogeneous geometry
–Irregular distribution
–No large occluders

DoubleEagle Tanker Model
82 million triangles
127,000 objects

SWITCH
A parallel algorithm and system for interactive rendering of large complex environments
Integrates Hierarchical LODs and conservative Occlusion Culling
Generic models
–No assumptions on model, distribution
Computation done on GPUs

Previous Work
Geometric Simplification
Occlusion Culling
Parallel Approaches
Hybrid Approaches

Previous Work
UNC MMR System [Aliaga99]
–Used image-based impostors, occlusion culling, LODs
UNC GigaWalk [Baxter02]
–Uses 2 graphics pipelines and multiple CPUs

Outline
SWITCH Overview
Scene Representation
Parallel Algorithm and Architecture
Implementation & Results
Conclusions & Future Work

Overview
A parallel algorithm and system for interactive rendering of large complex environments
Integrates Hierarchical LODs and conservative Occlusion Culling
Parallel Occlusion Culling on separate GPUs
Graphics hardware optimizations
Low network bandwidth requirements
General and automatic preprocessing algorithm

Overview: Parallel Occlusion Culling
Two-pass version of Hierarchical Z-Buffer [Greene93]
Exploits temporal coherence
Works on generic models, conservative to image precision
Avoids readback by ‘switching’ between 2 GPUs
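
The conservative test behind a hierarchical Z-buffer can be sketched on the CPU. This is a minimal illustration, not the SWITCH implementation, which runs the test on the GPU with occlusion queries. The function names, the square power-of-two buffer, and the "smaller depth is nearer" convention are assumptions made for the example.

```python
def build_depth_pyramid(depth):
    """Build a max-depth pyramid from a square, power-of-two depth buffer.

    Level 0 is the full-resolution buffer. Each coarser texel stores the
    FARTHEST depth of the 2x2 block below it, which keeps the test conservative.
    """
    levels = [depth]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        half = len(prev) // 2
        levels.append([
            [max(prev[2 * y][2 * x], prev[2 * y][2 * x + 1],
                 prev[2 * y + 1][2 * x], prev[2 * y + 1][2 * x + 1])
             for x in range(half)]
            for y in range(half)
        ])
    return levels


def is_occluded(levels, x0, y0, x1, y1, nearest_z):
    """Return True only if the rectangle [x0, x1] x [y0, y1] (inclusive,
    in level-0 pixels) is certainly hidden behind the occluders."""
    # Climb to the finest level where the rectangle covers at most 2x2 texels.
    level = 0
    while level + 1 < len(levels) and (
            (x1 >> level) - (x0 >> level) > 1 or
            (y1 >> level) - (y0 >> level) > 1):
        level += 1
    buf = levels[level]
    for y in range(y0 >> level, (y1 >> level) + 1):
        for x in range(x0 >> level, (x1 >> level) + 1):
            if nearest_z <= buf[y][x]:
                return False  # may be visible, so it must be kept
    return True
```

In a two-pass scheme, the first pass would fill `depth` by rendering the occluders and the second pass would call `is_occluded` on the screen-space bounds of each node's bounding box.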

Scene Representation
Computing appropriate spatial representation from a functional representation is non-trivial
An object varies from a small bolt to a large pipe structure
Redefine objects by partitioning and clustering

Criteria for Hierarchy
Good spatial localization
Object size
Balanced trees
Minimal bounding box overlap of sibling nodes
SWITCH: A hybrid approach combining top-down partitioning with bottom-up clustering is used

Partitioning and Clustering
Partitioning splits large objects into multiple objects
–Do not split polygons
Clustering groups objects with low polygon counts based on spatial proximity
The combination redistributes geometry with good localization and object size
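
The partitioning step can be illustrated with a small sketch. The slides do not say how the split planes are chosen, so the median split along the widest centroid axis, the `max_tris` threshold, and the function names below are all assumptions. The only property taken from the slide is that triangles are assigned whole to one side and never cut.

```python
def centroid(tri):
    """Centroid of a triangle given as three (x, y, z) vertices."""
    return tuple(sum(v[i] for v in tri) / 3.0 for i in range(3))


def partition(triangles, max_tris):
    """Split an object into pieces of at most max_tris triangles each.

    Every triangle goes to exactly one piece, chosen by its centroid,
    so no polygon is split.
    """
    if max_tris < 1:
        raise ValueError("max_tris must be at least 1")
    if len(triangles) <= max_tris:
        return [triangles]
    cents = [centroid(t) for t in triangles]
    # Split along the axis on which the centroids are spread the most.
    axis = max(range(3),
               key=lambda i: max(c[i] for c in cents) - min(c[i] for c in cents))
    order = sorted(range(len(triangles)), key=lambda k: cents[k][axis])
    mid = len(order) // 2
    left = [triangles[k] for k in order[:mid]]
    right = [triangles[k] for k in order[mid:]]
    return partition(left, max_tris) + partition(right, max_tris)
```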

Partitioning & Clustering: Results
Powerplant: Original Objects
Powerplant: Clustered Objects

Partitioning & Clustering: Results
DoubleEagle: Original Objects
DoubleEagle: Clustered Objects

Unified Hierarchy
Objects are organized into a scene graph hierarchy
Single unified hierarchy used for occlusion culling and LOD-based rendering
–Low storage overhead
–Simple conservative occlusion culling algorithm
SWITCH: A top-down AABB bounding volume hierarchy is constructed from redefined objects
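
A top-down AABB hierarchy over the redefined objects might be built as in the sketch below. The median split on the longest axis and the dictionary node layout are assumptions for illustration; the slides only state that the tree is an AABB hierarchy built top-down.

```python
def union_box(boxes):
    """Smallest axis-aligned box containing all (lo, hi) boxes."""
    lo = tuple(min(b[0][i] for b in boxes) for i in range(3))
    hi = tuple(max(b[1][i] for b in boxes) for i in range(3))
    return (lo, hi)


def build_aabb_tree(objects):
    """Build a binary AABB tree from a list of (name, (lo, hi)) objects."""
    if not objects:
        raise ValueError("need at least one object")
    box = union_box([b for _, b in objects])
    if len(objects) == 1:
        return {"box": box, "object": objects[0][0], "children": []}
    # Split on the longest axis of the node's box, at the median box centre.
    axis = max(range(3), key=lambda i: box[1][i] - box[0][i])
    ordered = sorted(objects, key=lambda o: o[1][0][axis] + o[1][1][axis])
    mid = len(ordered) // 2
    return {
        "box": box,
        "object": None,
        "children": [build_aabb_tree(ordered[:mid]),
                     build_aabb_tree(ordered[mid:])],
    }
```

Because the same tree would carry both the bounding boxes used for culling and the LODs used for rendering, one traversal can serve both purposes.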

HLOD Generation
Construct Hierarchical LODs of the AABB scene graph as in [Erikson01]
Use GAPS simplification algorithm [Erikson99]
HLOD generation is done out-of-core
–Store only the LODs of current node and immediate children in main memory

Hierarchical Occluders
A hierarchical occluder associated with a node is an approximation of the group of occluders in its subtree
HLODs provide a lower-polygon-count approximation of a group of occluders and so serve as hierarchical occluders
Perform object space occluder fusion
Conservative occlusion culling

Parallel Algorithm and Architecture
Three processes in parallel
1. Occluder Rendering (OR): Renders occluder set to depth buffer on GPU1
2. Hardware Culling (HC): Computes visible geometry using hardware occlusion query on GPU2
3. Render Visible Geometry (RVG): Renders visible geometry on GPU3

System Timing/Data Flow (Z-Buffer SWITCH)
Frame i: GPU 1 renders occluders for frame i+1 (OR); GPU 2 hardware culls for frame i (HC); GPU 3 displays geometry for frame i (RVG)
Frame i+1: GPU 1 hardware culls for frame i+1 (HC); GPU 2 renders occluders for frame i+2 (OR); GPU 3 displays geometry for frame i+1 (RVG)
Frame i+2: GPU 1 renders occluders for frame i+3 (OR); GPU 2 hardware culls for frame i+2 (HC); GPU 3 displays geometry for frame i+2 (RVG)
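
The role alternation in the timing diagram can be written as a small schedule function. This is a sketch under assumptions: the function name and return layout are invented for illustration, and which of GPU 1 and GPU 2 starts in the OR role is inferred from the slide labels. The property it encodes is that the GPU holding the occluder depth buffer for a frame is the one that culls that frame, so the Z-buffer never has to be read back.

```python
def switch_schedule(step):
    """Roles of the three GPUs at pipeline step `step` (step 0 is 'Frame i').

    Each role is a (process, frame_offset) pair, where frame_offset is
    relative to frame i.
    """
    occluders = ("OR", step + 1)  # render occluders for the next frame
    culling = ("HC", step)        # cull the current frame
    if step % 2 == 0:
        gpu1, gpu2 = occluders, culling
    else:
        gpu1, gpu2 = culling, occluders  # the two GPUs have switched roles
    return {"GPU1": gpu1, "GPU2": gpu2, "GPU3": ("RVG", step)}
```

For example, `switch_schedule(0)` has GPU 1 rendering occluders for frame i+1, and `switch_schedule(1)` has the same GPU culling frame i+1 against the depth buffer it just produced.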

Conservative Occlusion Culling
Underlying HZB algorithm used for occlusion culling is conservative to image precision
Exactly the same set of LODs is used for both OR and STC stages
–Z buffer used for culling is consistent with the geometry

Hardware Culling
Use GL_NV_OCCLUSION_QUERY to determine visible pixels
Traverse scene hierarchy rendering bounding boxes of nodes

LOD Selection
Pixel Error Metric: Max normal deviation of silhouette in image
Traverse down scene graph till error satisfied
Upper Bound: Highly conservative
[Figures: DE Engine Room, 1K x 1K, 20 PE; Error Image]
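
The "traverse down till error satisfied" rule can be sketched as a recursive cut through the scene graph. The node layout and the `projected_error` callback are hypothetical; in the slides the error is the maximum deviation of the silhouette in the image, measured in pixels.

```python
def select_lods(node, max_pixel_error, projected_error):
    """Return the nodes whose HLODs should be drawn for the current view.

    projected_error(node) is a hypothetical callback giving the node's
    screen-space deviation in pixels. Descent stops at the first node
    whose error is within the bound, or at a leaf.
    """
    if not node["children"] or projected_error(node) <= max_pixel_error:
        return [node]
    selected = []
    for child in node["children"]:
        selected.extend(select_lods(child, max_pixel_error, projected_error))
    return selected
```

Raising `max_pixel_error` moves the cut toward the root, which trades fidelity for a lower polygon count.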

GPU Optimizations
Multiple Occlusion Tests
–Occlusion Query ‘counter’ for each node
–Traverse scene graph breadth first
–Bunch queries for all nodes at a level
–40% faster than testing one node with GL_HP_OCCLUSION_TEST

GPU Optimizations
Visibility for LOD selection
–Visible pixels of bounding box > visible pixels of geometry
–No. of visible pixels less than error metric => early termination condition
–Provides looser bounds – reduces polygon count
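
The two optimizations, batching queries per tree level and stopping early on small pixel counts, can be combined in one traversal sketch. The `count_visible_pixels` callback is a hypothetical stand-in for issuing one occlusion query per bounding box and reading all counters back together; no real OpenGL call is made here, and the node layout is assumed.

```python
def cull_breadth_first(root, count_visible_pixels, error_pixels=0):
    """Level-by-level culling pass over a scene graph.

    count_visible_pixels(nodes) returns one visible-pixel count per node,
    modelling a batch of occlusion queries for a whole tree level.
    """
    to_render = []
    level = [root]
    while level:
        counts = count_visible_pixels(level)  # one batch per level
        next_level = []
        for node, pixels in zip(level, counts):
            if pixels == 0:
                continue  # bounding box fully hidden: cull the whole subtree
            if pixels < error_pixels or not node["children"]:
                # The box bounds the geometry's visible pixels from above,
                # so refining further cannot change more pixels than this.
                to_render.append(node)
            else:
                next_level.extend(node["children"])
        level = next_level
    return to_render
```

Batching matters because each readback stalls the pipeline; grouping all nodes of a level lets the queries overlap before any result is needed.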

Bandwidth Requirements

Load Balancing
Trade off between cluster size and culling efficiency
Smaller clusters lead to deeper scene graph but improve culling performance
Balances load between culling and rendering

Implementation
3 Dell Precision Workstations with dual 2GHz Pentium4 CPUs, GeForce4 GPU, and 2GB main memory
Network:
–Implementation 1: TCP/IP over 100Mbps Fast Ethernet
–Implementation 2: TCP/IP over Myrinet

Test Model: Powerplant
Original: 0.5 Gigabyte dataset, 13 Million Polygons, 1200 objects
Preprocessing: 7 hours, 1.2 Gigabytes, 13 Million Polygons, 38,000 objects

Test Model: DoubleEagle Tanker
Original: 4 Gigabyte dataset, 82 Million polygons, 127,000 objects
Preprocessing: 34 hours, 8 Gigabytes, 82 Million polygons, 61,000 objects

Video

Video

Results: Frame Rate
Powerplant Model, 1024 x 1024 with 10 pixels of error using Ethernet

Results: Frame Rate
DoubleEagle Model, 1024 x 1024 with 20 pixels of error using Ethernet

Results: Culling Performance
DoubleEagle Tanker Model: Object Count

Results: Culling Performance
DoubleEagle Tanker Model: Polygon Count

Conclusions
Able to interactively render large complex environments with good fidelity
Integrates LODs and Occlusion Culling in a general, automatic parallel rendering algorithm
A parallel architecture to balance load between 3 GPUs
Efficient use of graphics hardware to solve geometric queries
A unified scene hierarchy and automatic preprocessing for a generic model
Introduces an end-to-end latency of 1 frame

Lessons Learned
Parallelism
–2 pipelines provide a speedup greater than factor of 2 for complex scenes
Load Times
–Asynchronous on-demand loading of geometry vastly improves system development and testing

Limitations and Future Work
Static LODs lead to popping. Extend to a view-dependent framework
An out-of-core algorithm to reduce main memory overhead as in [Varadhan02]
Improve performance by reducing network latencies
Make more novel uses of graphics hardware
Target frame-rate rendering mode
Drive large immersive displays

Wish List
Multiple graphics cards on one motherboard
NV_OCCLUSION_QUERY to also return completely visible / partially visible / completely occluded

Association with NVIDIA
Obtained pre-release versions of drivers with NV_OCCLUSION_QUERY
NVIDIA quickly addressed an NV_OCCLUSION_QUERY bug in the Linux drivers

Acknowledgements
US ONR
US ARO
US DOE
US NSF
NVIDIA Corporation
Intel Corporation
NNS for the DoubleEagle model
UNC Walkthrough group

The End

Hierarchy Generation
(a) Original
(b) Partitioned-I
(c) Clustered
(d) Partitioned-II
(e) Compute a top-down AABB tree hierarchy on redefined objects

Performance Tuning
Using visible geometry from 2 frames previous avoids bubbles in pipeline
Tradeoff between fidelity and frame rate by adjusting pixels of error
Asynchronous rendering pipeline
Nth farthest Z buffer values
Lower HZB resolution for occluder rendering

Video

Results: Frame Rate
Powerplant Model, 640 x 480 with 10 pixels of error on SGI

Results: Culling Performance
Powerplant Model: Object Count

Results: Culling Performance
Powerplant Model: Polygon Count

Outline
Previous Work
SWITCH Overview
Scene Representation
Parallel Algorithm and Architecture
Implementation & Results
Conclusions & Future Work

Previous Work: Geometric Simplification
Surveyed in [Luebke01]
Static vs. View-Dependent
Trouble with high depth complexity

Previous Work: Occlusion Culling
Surveyed in [Cohen-Or01]
Specific Environments
–Cells and Portals [Airey90]
–Urban Datasets [Wonka00, Coorg97]
–Large Occluders [Schaufler00]
General Algorithms
–HZB [Greene93], HOM [Zhang97]
Performing exact visibility on large general datasets in real time is difficult
Trouble with highly tessellated scenes

Previous Work: Parallel Approaches
Object-Parallel, Screen-Parallel, Frame-Parallel
Interactive ray tracing [Wald01]
Perform culling in parallel with rendering
–VFC in [Garlick90]
–Occlusion Culling by occluder shrinking in [Wonka01]
Scalable clusters, WireGL [Humphreys01]

Previous Work: Hybrid Approaches
Combine LOD and Occlusion Culling techniques
–UC Berkeley Walkthrough [Funkhouser96]
–Synthetic convex occluders [Andujar01]
–Approximate visibility using prioritized layer projections with view dependent rendering [ElSana01]
–UNC MMR system [Aliaga99]
Not demonstrated in high fidelity on complex CAD models

Criteria for Hierarchy
Good spatial localization
Object size
–Too large: loose bounding boxes, poor culling performance
–Too small: very deep trees
Balanced trees
Minimal bounding box overlap of sibling nodes

Clustering
Clustering algorithm adapted from an image segmentation technique [FH98]
MSTs to represent clusters
Similar to Kruskal’s algorithm
–Euclidean distance between clusters denotes edge weights
–Edge weights represent variation in a cluster
–2 clusters combined based on Hausdorff metric
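
A Kruskal-style clustering pass in the spirit of [FH98] can be sketched as follows. This is a simplification under stated assumptions: each object is reduced to a single point (for example its centroid), edge weights are plain Euclidean distances between those points rather than the Hausdorff-based measure the slide mentions, and the merge test with tuning constant `k` is the one from Felzenszwalb and Huttenlocher's segmentation algorithm. The function name is hypothetical.

```python
import math


def cluster_objects(points, k):
    """Group points by scanning edges in increasing length, Kruskal-style.

    Two clusters merge when the edge joining them is no longer than the
    internal variation of each cluster plus k / cluster size.
    Returns one cluster label per input point.
    """
    n = len(points)
    parent = list(range(n))
    size = [1] * n
    internal = [0.0] * n  # longest spanning-tree edge inside each cluster

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    for w, i, j in edges:
        a, b = find(i), find(j)
        if a == b:
            continue
        if w <= min(internal[a] + k / size[a], internal[b] + k / size[b]):
            parent[b] = a
            size[a] += size[b]
            internal[a] = w  # edges arrive in increasing order
    return [find(i) for i in range(n)]
```

A larger `k` favours bigger clusters, which connects to the load-balancing trade-off between cluster size and culling efficiency.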