Roger A. Crawfis CIS 781 The Ohio State University


Visibility Culling Roger A. Crawfis CIS 781 The Ohio State University

Interactive Frame Rates Are Difficult To Achieve

The Problem Two keys for an interactive system Interactive rendering speed: too many polygons – difficult!! Uniform frame rate: varied scene complexity – difficult!!

Possible Solutions Visibility Culling: back-face culling, frustum culling, occlusion culling (might not be sufficient). Levels of Detail (LOD): build hierarchical structures and choose the level that satisfies the frame rate requirement.

LOD Selections How to pick the Optimal ones??!!

Occlusion Culling Hidden Surface Removal methods are not fast enough for massive models on current hardware. Occlusion Culling avoids rendering primitives that are occluded by another part of the scene. Occlusion Culling techniques are ideally output sensitive: runtime is proportional to the size of the exact visible set.

Related Work: Hierarchical Z-Buffer and Hierarchical LODs. Hierarchical Z-Buffer: image-space occlusion culling method [Greene’93]; builds a layered Z-pyramid with a different resolution of the Z-buffer at each level; allows quick accept/reject. Hierarchical LODs: simplification culling approximates an entire branch of the scene graph by an HLOD. Can we use HLODs as occluders/occludees?

Visibility in Games What do we need it for? Increase of rendering speed by removing unseen scene data from the rendering pipeline as early as possible Reduction of data transfers to the graphics hardware Current games would not be possible without visibility calculations

Visibility methods 2 very different categories. Visibility from a region (portals, PVS): Quake, Unreal, Severance and co. Visibility from a point (Z-buffer, BFC, ...): racing games, outdoor scenes, sports games, etc.

Point-Visibility Occlusion Traditionally used: back-face culling, Z-buffering, view frustum culling (octree, quadtree).

A PSX Example Iron Soldier 3 on PSX: view frustum culling based on a quad-tree; back-face culling; Painter’s algorithm. Only culling to the left and right sides of the viewing frustum.

New Occlusion Methods Image-space occlusion culling: Hierarchical Z-Buffering, Hierarchical Occlusion Maps. Object-space occlusion culling: Hierarchical View Frustum culling, Hierarchical Back-Face culling.

Visibility Culling We will look at these: hierarchical back-face culling, view-frustum culling, occlusion culling, detail culling.

Hierarchical Back-Face Culling Partitions each model into clusters. Primitives in one cluster face in similar directions and lie close to each other. If the cluster fails the visibility test, all primitives in this cluster are culled.
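
The cluster test can be sketched as follows. This is a minimal illustration rather than the method from the slides: it assumes each cluster stores a normal cone (a unit axis and a half-angle of at most 90 degrees, both hypothetical names) and a direction-only view vector, so it ignores perspective and the spatial extent of the cluster.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def cluster_is_backfacing(cone_axis, cone_half_angle_rad, view_dir) -> bool:
    """True if every normal inside the cluster's normal cone faces away from the viewer.

    Assumptions: cone_axis and view_dir are unit vectors, view_dir points from
    the eye into the scene, and a face with normal n is back-facing when
    dot(n, view_dir) >= 0.
    """
    # The worst-case normal is tilted by the half-angle toward the viewer, so the
    # whole cone is back-facing when angle(axis, view_dir) + half_angle <= 90 degrees,
    # which is equivalent to cos(angle) >= sin(half_angle).
    return dot(cone_axis, view_dir) >= math.sin(cone_half_angle_rad)
```

A cluster that passes this test can be skipped without looking at its individual primitives.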

Hierarchical Back-Face Culling

Normal Maps Create a data structure that places each polygon in the space according to its normal direction. Partition this space and then simply look at those partitions that might have visible polygons. (Figure axes: phi, theta.)

View-Frustum Culling Remove objects that are outside the viewing frustum Construct bounding volumes (BVs) Create hierarchy BV/V-F intersection tests Mostly done in “Application Stage”

View-Frustum Culling Culling against bounding volumes to save time. Bounding volumes (AABB, OBB, spheres, etc.) should be easy to compute and as tight as possible. (Figure: AABB, sphere, OBB.)
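
A bounding-volume test against the frustum might look like the sketch below. It assumes the frustum is given as six planes `(nx, ny, nz, d)` with unit normals pointing into the frustum, so a point `p` is inside a plane when `dot(n, p) + d >= 0`. The function name and the cube-shaped test frustum are illustrative only.

```python
from typing import Sequence, Tuple

Plane = Tuple[float, float, float, float]  # (nx, ny, nz, d), unit normal pointing inward


def sphere_outside_frustum(center: Sequence[float], radius: float,
                           planes: Sequence[Plane]) -> bool:
    """True if the bounding sphere lies completely outside at least one plane."""
    for nx, ny, nz, d in planes:
        signed_dist = nx * center[0] + ny * center[1] + nz * center[2] + d
        if signed_dist < -radius:
            return True  # entirely on the outer side of this plane: cull
    return False  # inside or intersecting: keep (conservative)


# Hypothetical stand-in for a frustum: the cube [-1, 1]^3 as six inward-facing planes.
UNIT_CUBE_PLANES = [
    (1, 0, 0, 1), (-1, 0, 0, 1),
    (0, 1, 0, 1), (0, -1, 0, 1),
    (0, 0, 1, 1), (0, 0, -1, 1),
]
```

The test is conservative: a sphere near a frustum corner can be reported as not outside even though it misses the frustum, which costs some rendering time but never removes a visible object.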

View-Frustum Culling Often done hierarchically to save time In-order, top-down traversal and test

View-Frustum Culling Two popular hierarchical data structures – BSP Tree and Octree Axis-Aligned BSP Polygon-Aligned BSP Intersecting?

View-Frustum Culling Octree: a parent has 8 children. Subdivide the space until the number of primitives within each leaf node is less than a threshold. In-order, top-down traversal.
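
A top-down traversal of such a hierarchy can be sketched as below. The `Node` class, the use of bounding spheres instead of octree cubes, and the cube-shaped test frustum are all assumptions made to keep the example short. Planes are `(nx, ny, nz, d)` with unit normals pointing into the frustum.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

OUTSIDE, INTERSECT, INSIDE = 0, 1, 2


@dataclass
class Node:
    """Hypothetical hierarchy node with a bounding sphere around its subtree."""
    center: Tuple[float, float, float]
    radius: float
    objects: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def classify(center, radius, planes) -> int:
    """Classify a sphere as OUTSIDE, INTERSECT or INSIDE the frustum."""
    result = INSIDE
    for nx, ny, nz, d in planes:
        dist = nx * center[0] + ny * center[1] + nz * center[2] + d
        if dist < -radius:
            return OUTSIDE
        if dist < radius:
            result = INTERSECT
    return result


def collect_visible(node: Node, planes, out: List[str], parent_inside: bool = False) -> None:
    """Top-down traversal: skip culled subtrees, stop testing below fully inside nodes."""
    state = INSIDE if parent_inside else classify(node.center, node.radius, planes)
    if state == OUTSIDE:
        return  # the whole subtree is culled with one test
    out.extend(node.objects)
    for child in node.children:
        collect_visible(child, planes, out, state == INSIDE)


# Hypothetical stand-in for a frustum: the cube [-1, 1]^3 as six inward-facing planes.
UNIT_CUBE_PLANES = [
    (1, 0, 0, 1), (-1, 0, 0, 1),
    (0, 1, 0, 1), (0, -1, 0, 1),
    (0, 0, 1, 1), (0, 0, -1, 1),
]
```

For example, a root that straddles the frustum with one child inside and one child outside yields only the inside child's objects; the outside child and everything below it are never visited.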

Hierarchical Z-Buffer Z-Buffer is arranged in an image pyramid. Scene is partitioned in an octree. Octree nodes are tested against the Z-pyramid level whose pixels match the node’s projected size. Visible nodes serve as input for the next frame. Relies on HW visibility query.

HZB/Hierarchical occlusion maps

Hierarchical occlusion maps Potential occluders are pre-selected. These occluders are rendered to the occlusion map. The hierarchy can be built with MIP-mapping HW. Depth test after occlusion test. Separate depth estimation buffer.

Hierarchical View Frustum Culling Speeds up VFC by testing only 2 box corners of a bounding box first. Plane coherency during frame advancing Test against VF-octants. BB-Child masking
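
The two-corner idea can be sketched for a single plane as follows. This is an illustrative version under stated assumptions: the plane is `(nx, ny, nz, d)` with its normal pointing into the frustum, and the function name and string results are made up for the example. Plane coherency, octant tests and child masking are not shown.

```python
from typing import Sequence


def aabb_vs_plane(box_min: Sequence[float], box_max: Sequence[float],
                  plane: Sequence[float]) -> str:
    """Classify an axis-aligned box against one plane using only two corners."""
    n, d = plane[:3], plane[3]
    # p-vertex: the corner farthest along the plane normal.
    p = [box_max[i] if n[i] >= 0 else box_min[i] for i in range(3)]
    # n-vertex: the opposite corner, farthest against the normal.
    q = [box_min[i] if n[i] >= 0 else box_max[i] for i in range(3)]
    if sum(n[i] * p[i] for i in range(3)) + d < 0:
        return "outside"    # even the most-inside corner is outside
    if sum(n[i] * q[i] for i in range(3)) + d < 0:
        return "intersect"  # the corners straddle the plane
    return "inside"
```

Only when the two corners disagree does the box straddle the plane, so the remaining six corners never need to be tested.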

Detail Culling A technique that sacrifices quality for speed. Based on the size of the projected BV: if it is too small, discard it. Also often done hierarchically. It always helps to create a hierarchical structure, or scene graph.
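
One way to estimate the projected size is sketched below. It assumes a bounding sphere, a symmetric perspective projection with a vertical field of view, and a pixel threshold chosen by the application; the function names and the default threshold are hypothetical, and the formula is the usual small-angle approximation rather than an exact projection.

```python
import math


def projected_diameter_px(radius: float, distance: float,
                          fov_y_rad: float, viewport_height_px: int) -> float:
    """Approximate on-screen diameter, in pixels, of a bounding sphere."""
    if distance <= radius:
        return math.inf  # the viewer is inside or touching the sphere: never cull
    # World-space height visible at this distance is 2 * distance * tan(fov / 2),
    # and the sphere covers 2 * radius of it.
    return radius * viewport_height_px / (distance * math.tan(fov_y_rad / 2.0))


def detail_cull(radius: float, distance: float, fov_y_rad: float,
                viewport_height_px: int, threshold_px: float = 2.0) -> bool:
    """True if the object is too small on screen to be worth rendering."""
    return projected_diameter_px(radius, distance, fov_y_rad, viewport_height_px) < threshold_px
```

With a 90 degree field of view and a 1000 pixel viewport, a sphere of radius 1 at distance 100 covers roughly 10 pixels and is kept, while a sphere of radius 0.1 at the same distance covers about 1 pixel and is discarded.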

Occlusion Culling Discard objects that are occluded. Z-buffer is not the smartest algorithm in the world (particularly for high depth-complexity scenes). We want to avoid the processing of invisible objects.

Occlusion Culling
G: input graphics data; Or: occlusion representation
OcclusionCulling (G)
    Or = empty
    For each object g in G
        if (isOccluded(g, Or)) skip g
        else
            render (g)
            update (Or)
        end
    End
The problem: algorithms for isOccluded(); fast update of Or
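
The generic loop can be made concrete with a toy occlusion representation. In the sketch below, which is an illustration and not an algorithm from the slides, the screen is one-dimensional, each object is a hypothetical tuple `(name, x0, x1, z)` covering pixels `x0..x1-1` at constant depth `z`, and `Or` is a per-pixel list of the nearest depth rendered so far.

```python
import math
from typing import List, Tuple

Obj = Tuple[str, int, int, float]  # (name, x0, x1 exclusive, depth)


def occlusion_culling(objects: List[Obj], width: int) -> List[str]:
    """Return the names of the objects that get rendered, in input order."""
    depth = [math.inf] * width  # Or: starts empty, i.e. nothing occludes anything
    rendered = []
    for name, x0, x1, z in objects:
        # isOccluded(g, Or): every covered pixel already holds nearer geometry.
        if all(depth[x] <= z for x in range(x0, x1)):
            continue  # skip g
        rendered.append(name)  # render(g)
        for x in range(x0, x1):  # update(Or)
            depth[x] = min(depth[x], z)
    return rendered
```

The loop only culls well when objects arrive roughly front to back, because an object can only be hidden by what has already been rendered.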

Hierarchical Visibility Object-space octree. Primitives in an octree node are hidden if the octree node (cube) is hidden. An octree cube is hidden if all 6 of its faces, treated as polygons, are hidden. Hierarchical visibility test:

Hierarchical Visibility (obj-sp.) From the root of the octree: view-frustum culling. Scan-convert each of the 6 faces and perform z-buffering. If all 6 faces are hidden, discard the entire node and its sub-branches. Otherwise, render the primitives here and traverse the children recursively in front-to-back order. A conservative algorithm: why?

Hierarchical Visibility (obj-sp.) Scan-converting the octree faces can be expensive: they cover a large number of pixels (overhead). How can we reduce the overhead? Goal: quickly conclude that a large polygon is hidden. Method: use a hierarchical z-buffer!

Hierarchical Z-buffer An image-space approach. Create a Z-pyramid. (Figure levels: original Z-buffer, ½ resolution, ¼ resolution, ..., 1 value.)

Hierarchical Z-buffer (2) (Figure: a Z-buffer reduced level by level, with numeric depth values.) Keep the maximum value.
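
Building the pyramid with the keep-the-maximum rule can be sketched as below. The sketch assumes a square Z-buffer whose side is a power of two, stored as a list of rows, with larger values meaning farther away; the function name is made up for the example.

```python
from typing import List

Grid = List[List[float]]


def build_z_pyramid(zbuf: Grid) -> List[Grid]:
    """Return pyramid levels, finest first; each coarser pixel keeps the farthest depth."""
    levels = [zbuf]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        half = len(prev) // 2
        levels.append([
            [max(prev[2 * y][2 * x], prev[2 * y][2 * x + 1],
                 prev[2 * y + 1][2 * x], prev[2 * y + 1][2 * x + 1])
             for x in range(half)]
            for y in range(half)
        ])
    return levels
```

For an illustrative 4x4 buffer `[[1,0,6,0],[3,1,2,3],[9,1,2,9],[1,2,2,6]]`, the next level is `[[3,6],[9,9]]` and the top level is `[[9]]`. Keeping the maximum is what makes the test conservative: anything farther than a coarse pixel's value is farther than every pixel underneath it.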

Hierarchical Z-buffer update
Visibility (OctreeNode N)
    if (isOccluded (N, Zp)) then return;
    for each primitive p in N
        render and update Zp
    end
    for each child node C of N in front-to-back order
        Visibility (C)
    end

Some Practical Issues A fast software algorithm. Lack of hardware support: scan conversion; efficient query of whether a polygon is visible (without rendering it); Z feedback.

Combining with hardware Utilizing frame-to-frame coherence. First frame: regular HZ algorithm (software); remember the visible octree nodes. Second frame (view changes slightly): render the previously visible nodes using OpenGL; read back the Z-buffer and construct the Z-pyramid; perform regular HZ (software). What about the third frame? Utilizing hardware to perform rendering and Z-buffering is considerably faster.

Hierarchical Occlusion Map Zhang et al SIGGRAPH 98

Basic Ideas Choose a set of graphics objects from the scene as Occluders Use the occluders to define an Occlusion Map (hierarchically) Compare the rest of scene against the occlusion map

Example Blue: Occluders Red: Occludees

Algorithm Pipeline Occluder path: Occluder Database -> Viewing Frustum Culling -> Occluder Selection -> Rendering -> Build Occlusion Map Hierarchy. Scene path: Real Scene -> Viewing Frustum Culling -> Occlusion Test.

2-Step Occlusion Test Overlap Test Overlap + Depth = Occlusion

Why decomposition? The occlusion test is done approximately (conservatively) We can afford to be more conservative in depth test than overlap test

Why Decomposition?

Overlap Test – Occlusion Map Representation of projection for overlap test: occlusion map A gray scale image – each pixel represents one block of screen region Generate by rendering occluders

Occlusion Map (OM) Each pixel of the occlusion map has an opacity, which represents the ratio of the sum of the opaque areas in the block to the total area. (If fully covered, p = 1; if an anti-aliased pixel, p < 1.) Occlusion map: the alpha channel of an image.

Overlap Test using OM For each potential occludee, we can scan-convert it and compare against the opacity of the pixels it overlaps Expensive!! Conservative Approximation: use the screen-space bounding box of the occludee (a superset of the actual covered pixels) If all the pixels inside the bounding box are opaque, the object is occluded.

Hierarchical Occlusion Map Like the hierarchical Z-buffer, we can create a hierarchy to speed up the comparison (for large objects). Each low-resolution pixel is an average of the high-resolution pixels it covers.

Overlap Test using HOM Basic algorithm: start from the lowest resolution. If the pixel covering the bounding rectangle has a value of 1, the object is occluded. Otherwise traverse down the hierarchy: if all children = 1, occluded; if all children = 0, not occluded; otherwise, traverse down further.

Approximate Overlap Test Instead of concluding an object is occluded only when the bounding box is within pixels with opacity 1, we can use a threshold in [0,1]. Early termination in the high levels of the hierarchy. What does it mean when a block has high opacity but not one? This is the unique feature of HOM!
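
The hierarchy and the thresholded overlap test can be sketched together as follows. This is a simplified illustration, not the published implementation: it assumes a square opacity map whose side is a power of two, always starts at the coarsest level, and takes the occludee's screen-space bounding rectangle as `(x0, y0, x1, y1)` with exclusive upper bounds. All names are hypothetical.

```python
from typing import List, Optional, Tuple

Grid = List[List[float]]


def build_hom(opacity: Grid) -> List[Grid]:
    """Return map levels, finest first; each coarser pixel averages its four children."""
    levels = [opacity]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        half = len(prev) // 2
        levels.append([
            [(prev[2 * y][2 * x] + prev[2 * y][2 * x + 1]
              + prev[2 * y + 1][2 * x] + prev[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(half)]
            for y in range(half)
        ])
    return levels


def rect_covered(levels: List[Grid], rect: Tuple[int, int, int, int],
                 threshold: float = 1.0, level: Optional[int] = None,
                 px: int = 0, py: int = 0) -> bool:
    """True if every block overlapping rect reaches the opacity threshold."""
    if level is None:
        level = len(levels) - 1  # start at the coarsest level
    size = 1 << level  # side of this block in finest-level pixels
    x0, y0, x1, y1 = rect
    if px * size >= x1 or (px + 1) * size <= x0 or py * size >= y1 or (py + 1) * size <= y0:
        return True  # this block does not touch the rectangle
    value = levels[level][py][px]
    if value >= threshold:
        return True  # opaque enough: terminate early at this level
    if value == 0.0 or level == 0:
        return False  # a hole, or not opaque enough at the finest level
    return all(rect_covered(levels, rect, threshold, level - 1, 2 * px + dx, 2 * py + dy)
               for dy in (0, 1) for dx in (0, 1))
```

With `threshold=1.0` the test is exact with respect to the map; lowering it lets a block with small holes count as covered, which is the approximate culling described above. The overlap result still has to be combined with a depth test before an object is declared occluded.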

Depth Test Approximate Z (depth) test: A single Z Plane to separate the occluders from occludees.

Depth Test Break the screen into small regions. Built at each frame. Instead of using the Z-buffer, use the farthest Z of the occluders’ bounding volumes. Compare against each potential occludee’s nearest Z (conservative test).
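
A depth estimation buffer along these lines might be sketched as below. The data layout is an assumption: each occluder is a hypothetical tuple `(x0, y0, x1, y1, z_far)` giving the screen regions its bounding volume covers (exclusive upper bounds) and the farthest depth of that volume, with larger values farther from the viewer.

```python
import math
from typing import List, Tuple

Occluder = Tuple[int, int, int, int, float]  # (x0, y0, x1, y1, z_far) in region units


def build_depth_estimation_buffer(occluders: List[Occluder],
                                  grid_w: int, grid_h: int) -> List[List[float]]:
    """Per screen region, store the farthest depth of any occluder bounding volume."""
    buf = [[-math.inf] * grid_w for _ in range(grid_h)]
    for x0, y0, x1, y1, z_far in occluders:
        for y in range(y0, y1):
            for x in range(x0, x1):
                buf[y][x] = max(buf[y][x], z_far)
    # Regions that no occluder touches cannot hide anything.
    return [[math.inf if z == -math.inf else z for z in row] for row in buf]


def behind_occluders(buf: List[List[float]], rect: Tuple[int, int, int, int],
                     z_near: float) -> bool:
    """True if the occludee's nearest depth is beyond the stored depth in every region."""
    x0, y0, x1, y1 = rect
    return all(z_near > buf[y][x] for y in range(y0, y1) for x in range(x0, x1))
```

The test is conservative in both directions that matter: occluders are pushed back to their farthest depth and the occludee is pulled forward to its nearest depth, so an object is only reported as behind when it truly is.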

Occluder Selection Ideal occluder: the visible objects (it’s a joke). View-dependent occluder: too expensive. Solution: estimate and build an occluder database. Discard objects that do not serve as good occluders.

Occluder Selection Size: not too small. Redundancy: detail polygons (clock on the wall). Complexity: complex polygons are not preferred (why?). Done at run time: sort the occluders by depth and add them in order until the polygon budget is reached.
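
The run-time step can be sketched as follows. The candidate tuples `(name, distance, polygon_count)` and the choice to stop at the first occluder that would exceed the budget, rather than skipping it and trying smaller ones, are assumptions made for the illustration.

```python
from typing import List, Tuple

Candidate = Tuple[str, float, int]  # (name, distance from viewer, polygon count)


def select_occluders(candidates: List[Candidate], polygon_budget: int) -> List[str]:
    """Pick the nearest occluders until the polygon budget would be exceeded."""
    chosen: List[str] = []
    used = 0
    for name, _distance, polygons in sorted(candidates, key=lambda c: c[1]):
        if used + polygons > polygon_budget:
            break  # budget reached: stop adding occluders
        chosen.append(name)
        used += polygons
    return chosen
```

For candidates `[("far", 50, 100), ("near", 5, 300), ("mid", 20, 400)]`, a budget of 800 polygons selects all three in near-to-far order, while a budget of 600 selects only `"near"`.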

OPS View-independent Occluders X Z

OPS View-dependent Occluders

Occluders In practice, use traditional, static LODs. More restrictive view-independent OPS. Well-studied and available. Low run-time overhead. Shared with final rendering, no extra memory. Area-preserving [Erikson 98].

Occluder selection At run time: distance-based selection with a polygon budget; temporal coherence. Visibility sampling: pre-compute visible objects on a 3-D grid; facilitates run-time selection.

Implementation A two-pass framework. (Figure: pipeline diagram running from the scene database through view frustum culling, occluder selection and rendering, and building the occlusion representation, to occlusion culling, with LODs.)

Results The city model

Results The city model 312,524 polygons Single CPU 5,000 occluder polygons Depth estimation buffer Opacity thresholds 1.0 Lighting; display lists; no triangle strips



Results Auxiliary Machine Room (AMR)

Results AMR 632,252 polygons 3 CPUs 25,000 occluder polygons No-background z-buffer Approximate culling (0.85 for level 64x64) LOD Lighting; display lists; no triangle strips




Results The power plant model

Results The power plant model 15 million triangles 3 CPUs Visibility pre-processing on a 20x20 grid (~15min) No-background z-buffer 18,000 occluder polygons opacity thresholds from 0.85 and up LOD


Conclusion Goals achieved: generality, speed-up, ease of implementation. Generality: any model, any occluder; occluder fusion. Speed-up: accelerate interactive graphics. Ease of implementation: configurability, robustness.

HP hardware occlusion Extend OpenGL: add an OCCLUSION_MODE. The bounding box of an object is scan converted. A flag is set if any pixel of the BB faces is visible. Only need to read back one flag, instead of the entire frame buffer. Tradeoff: valuable rendering time is spent rendering useless BB faces (needs to be used wisely). Speedups of 25%-100% were reportedly observed.

The Real World Scientific approaches often too complicated Science often uses models with hundreds of thousands of vertices, games don’t. (LOD) Game developers “pick” ideas from different algorithms Research has impact on hardware design!

Gaming Industry Parts of the Hierarchical Z-Buffer (HZB) are used sometimes Runtime-LOD is used as input for a simple HZB View Frustum Culling (VFC) is almost always used. Hierarchical Occlusion Maps introduce too much overhead for games, and the z-buffer is there anyway

The Real World (3) PSX-One doesn’t even have a z-buffer. ATI’s Radeon has parts of a HZB (called Hyper-Z). GeForce2 only has a z-buffer. GeForce3 is similar to Radeon, but supports HZB visibility query. Dreamcast’s PowerVR2 works quite differently (infinite planes).

Conclusions Visibility algorithms are used in many different applications Occlusion culling Shadow calculations Radiosity Volumetric lights All these fields benefit from advances in visibility techniques

Recap Visibility culling: don’t render what can’t be seen. Off-screen: view-frustum culling. Z-buffered away: occlusion culling. Cells and portals: works well for architectural models. Teller: accurate, complex, a bit slow. pfPortals: fast, cheap, easy.

Hierarchical Z-Buffer Q: What do you think this is? Replace Z-buffer with a Z-pyramid Lowest level: full-resolution Z-buffer Higher levels: each pixel represents what? A: Maximum distance of geometry visible to the four pixels “underneath” it Q: How is this going to help?

Hierarchical Z-Buffer Idea: test polygon against highest level first If polygon is further than distance recorded in pixel, stop--it’s occluded If polygon is closer, recursively check against next lower level Amounts to hierarchical rasterization of the polygon, with early termination Must update higher levels as we go
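
The coarse-to-fine test can be sketched as below. To keep the example short it tests a screen-space bounding rectangle with a single nearest depth instead of rasterizing the polygon hierarchically, and it leaves out the pyramid update. It assumes a square, power-of-two Z-buffer with larger values farther away and a "less than" depth test; a small pyramid builder is included so the sketch runs on its own. All names are hypothetical.

```python
from typing import List, Optional, Tuple

Grid = List[List[float]]


def build_z_pyramid(zbuf: Grid) -> List[Grid]:
    """Return pyramid levels, finest first; each coarser pixel keeps the farthest depth."""
    levels = [zbuf]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        half = len(prev) // 2
        levels.append([
            [max(prev[2 * y][2 * x], prev[2 * y][2 * x + 1],
                 prev[2 * y + 1][2 * x], prev[2 * y + 1][2 * x + 1])
             for x in range(half)]
            for y in range(half)
        ])
    return levels


def rect_occluded(levels: List[Grid], rect: Tuple[int, int, int, int], z_near: float,
                  level: Optional[int] = None, px: int = 0, py: int = 0) -> bool:
    """True if a rectangle (x0, y0, x1, y1; exclusive upper bounds) at depth z_near is hidden."""
    if level is None:
        level = len(levels) - 1  # test against the highest (coarsest) level first
    size = 1 << level
    x0, y0, x1, y1 = rect
    if px * size >= x1 or (px + 1) * size <= x0 or py * size >= y1 or (py + 1) * size <= y0:
        return True  # this pyramid pixel does not overlap the rectangle
    if z_near >= levels[level][py][px]:
        return True  # farther than everything visible here: stop, it is occluded
    if level == 0:
        return False  # nearer than the stored depth at full resolution: visible
    return all(rect_occluded(levels, rect, z_near, level - 1, 2 * px + dx, 2 * py + dy)
               for dy in (0, 1) for dx in (0, 1))
```

A rectangle that is farther than the single top-level value is rejected after one comparison, which is where the speedup for large hidden polygons comes from.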

Hierarchical Z-Buffer Z-pyramid exploits image-space coherence: polygon occluded in one pixel is probably occluded nearby HZB also exploits object-space coherence: polygons near an occluded polygon are probably occluded Q: How might you use object-space coherence?

Hierarchical Z-Buffer Subdivide scene with an octree All geometry in an octree node is contained by a cube Before rendering the contents of a node, “render” the faces of its cube If cube faces are occluded, ignore the entire node Query Z-pyramid to “render” cubes

Hierarchical Z-Buffer Exploit temporal coherence (What?) HZB operates at max efficiency when Z-pyramid is already built Idea: most polygons affecting Z-buffer (“nearest polygons”) are the same from frame to frame So start by rendering the polygons (octree nodes) visible last frame

Early Splat Elimination Need: a splat visibility test. A voxel is only visible if the volume material in front of it is not opaque. Occlusion map = opacity image. (Figure labels: screen; wall of occluding voxels; occluded voxel, which does not pass the visibility test.)

Visibility Test - Naive Check the opacity of every pixel within the footprint; the number of pixels to be checked is large. (Figure labels: voxel footprint, opaque area, voxel kernel, opacity buffer.)

Visibility Test - Efficient (IEEE Trans. Vis. and Comp. Graph. ’99) Compute the occlusion map after each sheet-buffer compositing. Opacity < threshold: project. Opacity ≥ threshold: do not project. (Figure labels: occlusion map, opacity = 0.)

Early Splat Elimination - Results