1
Practical logarithmic rasterization for low error shadow maps
Brandon Lloyd (UNC-CH), Naga Govindaraju (Microsoft), Steve Molnar (NVIDIA), Dinesh Manocha (UNC-CH)
2
Shadows Shadows are important
aid spatial reasoning, enhance realism, and can be used for dramatic effect. High-quality shadows for real-time applications remain a challenge. Shadows are an important part of a user's experience in real-time applications such as games and simulators because they aid in spatial reasoning, enhance realism, and can be used for dramatic effect. Generating high-quality shadows for real-time applications, however, remains a challenge.
3
Shadow approaches Raytracing [Whitted 1980] Shadow volumes [Crow 1977]
not yet real-time for complex, dynamic scenes at high resolutions. Shadow volumes [Crow 1977] can exhibit poor performance on complex scenes. There are several algorithms that are commonly used to generate shadows. Raytracing can produce high-quality shadows, but despite many recent improvements, it is not yet real-time for complex, dynamic scenes at high resolutions. Shadow volumes can also produce high-quality shadows, but can exhibit poor performance on complex scenes. Shadow maps, discussed next, are image-based and therefore less sensitive to geometric complexity; unfortunately, they suffer from aliasing artifacts.
4
Shadow maps [Williams 1978]
(Figure: light and eye frusta.) Shadow maps are another popular algorithm. A shadow map is a depth buffer rendered from the viewpoint of the light. To determine whether an image fragment lies in shadow, the fragment is transformed back into the light's view and its depth is compared to the depth stored in the shadow map. Shadow maps are quite flexible and easy to implement, but they can suffer from aliasing artifacts.
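The depth-comparison step described above is easy to sketch in code. Below is a minimal C++ illustration; the struct and function names are ours, not from the paper, and the fragment is assumed to be already transformed into the light's normalized device coordinates.

```cpp
#include <algorithm>
#include <vector>

// Minimal illustration of the shadow-map depth test (assumed names, not the
// authors' implementation). The shadow map stores light-space depth in [0,1].
struct Vec3 { float x, y, z; };

struct ShadowMap {
    int width, height;
    std::vector<float> depth;                 // row-major light-space depths
    float at(int x, int y) const { return depth[y * width + x]; }
};

// lightSpace: fragment position in the light's normalized device coordinates,
// i.e. x,y in [-1,1] and z in [0,1].
bool inShadow(const ShadowMap& sm, const Vec3& lightSpace, float bias = 1e-3f)
{
    // Map NDC x,y to texel coordinates.
    int tx = std::clamp(int((lightSpace.x * 0.5f + 0.5f) * sm.width),  0, sm.width  - 1);
    int ty = std::clamp(int((lightSpace.y * 0.5f + 0.5f) * sm.height), 0, sm.height - 1);

    // The fragment is shadowed if something closer to the light was rendered
    // into the shadow map at this texel.
    return lightSpace.z - bias > sm.at(tx, ty);
}
```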
5
Logarithmic perspective shadow maps (LogPSMs) [Lloyd et al. 2007]
The standard shadow map on the left exhibits severe aliasing. On the right we use an algorithm that we recently proposed, called logarithmic perspective shadow maps, or LogPSMs. Standard shadow map | LogPSM
6
Logarithmic perspective shadow maps (LogPSMs) [Lloyd et al. 2007]
LogPSMs use logarithmic rasterization to warp the shadow map to obtain higher quality. Standard shadow map LogPSM
7
Goal. Here are shadow maps from a street in a town scene. We can see that logarithmic rasterization causes straight lines and planar primitives to become curved. The goal of this paper is to perform logarithmic rasterization at rates comparable to linear rasterization. linear rasterization | logarithmic rasterization. Perform logarithmic rasterization at rates comparable to linear rasterization
8
Outline Background Hardware enhancements Conclusion and Future work
Background: handling aliasing error, LogPSMs. Hardware enhancements. Conclusion and Future work.
9
High resolution shadow maps
Requires more bandwidth, which decreases shadow map rendering performance. Requires more storage, which increases contention for limited GPU memory. Decreases cache coherence due to poor shadow map query locality, which decreases image rendering performance. Here is the view frustum as seen from an overhead directional light. With a standard shadow map, areas near the viewer are undersampled, leading to aliasing. One solution to this problem is to increase the shadow map resolution, but this has several drawbacks. In particular, the poor spatial locality of shadow map queries in oversampled regions can decrease cache coherence and affect image rendering performance. So rather than relying on high resolution alone, it is better to place samples more carefully.
10
[Aila and Laine 2004; Johnson et al. 2004]
Irregular z-buffer [Aila and Laine 2004; Johnson et al. 2004]: sample at the shadow map query positions, so there is no aliasing; uses irregular data structures, which require fundamental changes to graphics hardware [Johnson et al. 2005]. The irregular z-buffer uses the locations of the shadow map queries as the sample positions, thus eliminating aliasing completely. The main drawback of this approach is that it uses irregular data structures that require fundamental changes to graphics hardware.
11
Adaptive partitioning
Adaptive shadow maps [Fernando et al. 2001]; queried virtual shadow maps [Giegl and Wimmer 2007]; fitted virtual shadow maps [Giegl and Wimmer 2007]; resolution-matched shadow maps [Lefohn et al. 2007]; multiple shadow frusta [Forsyth 2006]. There are a number of adaptive techniques that can locally refine the shadow map where needed.
12
Adaptive partitioning
Requires scene analysis. Uses many rendering passes. But these usually require a scene analysis that is potentially expensive, and they typically require many rendering passes.
13
Scene-independent schemes
Match the spacing between eye samples. Faster than adaptive partitioning: no scene analysis, few render passes. A number of approaches ignore the scene and try to match the spacing between eye samples, which generally increases linearly along the length of the view frustum. These methods are typically faster than adaptive partitioning because they do not require a scene analysis and they use fewer render passes. eye sample spacing
14
Cascade shadow maps Cascaded shadow maps [Engel 2007]
Parallel-split shadow maps [Zhang et al. 2006]. These methods include algorithms like cascaded shadow maps, which use multiple shadow maps of increasing spatial extent to better adapt to the shape of the view frustum. To reduce the error to acceptable levels, many partitions can be required; a common way of choosing the split positions is sketched below.
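As context (this is the practical split scheme commonly associated with parallel-split shadow maps, not something detailed in this talk), the split positions between the near plane n and far plane f are often chosen by blending a logarithmic split with a uniform split using a weight λ:

```latex
C_i \;=\; \lambda\, n\left(\frac{f}{n}\right)^{i/m} \;+\; (1-\lambda)\left(n + (f-n)\frac{i}{m}\right), \qquad i = 0,\dots,m
```

where m is the number of partitions. The logarithmic term places splits where the eye-sample spacing grows fastest, while the uniform term keeps the near partitions from becoming vanishingly thin.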
15
Projective warping Perspective shadow maps (PSMs) [Stamminger and Drettakis 2002] Light-space perspective shadow maps (LiSPSMs) [Wimmer et al. 2004] Trapezoidal shadow maps (TSMs) [Martin and Tan 2004] Lixel for every pixel [Chong and Gortler 2004] Perspective shadow maps and their variants use a projective matrix to warp a single shadow map to the view frustum.
16
Projective warping Not necessarily the best spacing distribution
PSM | LiSPSM. Unfortunately, projective warping does not necessarily produce the best spacing distribution. A PSM can produce a good fit to the view frustum, but the spacing in y increases quadratically instead of linearly, leading to high error. LiSPSMs reduce the error in y, but at the expense of the error in x. (Figure annotations: PSM — low error in x, high error in y; LiSPSM — moderate error in both x and y.)
17
Logarithmic+perspective parameterization
Perspective projection + logarithmic transform. LogPSMs start with a perspective projection and then apply a logarithmic transformation, which we denote as F, to correct the spacing along the length of the view frustum, making it linear. This gives LogPSMs a good sample distribution in both directions. We can equalize the spacing in both directions by redistributing the shadow map resolution. Resolution redistribution
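For intuition, the logarithmic term behaves like the generic logarithmic depth parameterization from the shadow-map literature; the exact form and constants of F in LogPSMs are given in the paper, but a representative mapping of eye-space depth z in [n, f] is:

```latex
s \;=\; \frac{\ln(z/n)}{\ln(f/n)}, \qquad z \in [n, f] \;\mapsto\; s \in [0, 1],
\qquad \frac{ds}{dz} \;=\; \frac{1}{z\,\ln(f/n)} .
```

Because ds/dz is proportional to 1/z, one shadow-map texel covers a depth range that grows linearly with z, which is exactly the behavior needed to match the roughly linear growth of eye-sample spacing along the frustum.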
18
Bandwidth/storage savings
Size of the shadow map* required to remove aliasing error (ignoring surface orientation): uniform, perspective, and logarithmic + perspective parameterizations. This table shows the size of the shadow map required to remove aliasing error (ignoring the effects of surface orientation). The size depends mostly on the far-to-near plane depth ratio f/n. For high depth ratios, the LogPSM parameterization can lead to significant bandwidth and storage savings. n, f – near and far plane distances of the view frustum. *shadow map texels / image pixels
19
Single shadow map LogPSM
LogPSMs have lower maximum error and more uniform error. Image resolution: 512². Shadow map resolution: f/n = 300. Grid lines for every 10 shadow map texels. Color coding for maximum texel extent in image. These images compare LogPSMs to a perspective warping technique. The bottom row is a color coding of aliasing error. LogPSMs have lower maximum error and a more uniform error distribution. LiSPSM | LogPSM | LiSPSM | LogPSM
20
Comparisons This video highlights the difference between the methods
21
Logarithmic perspective shadow maps
More details: Logarithmic perspective shadow maps, UNC TR07-005.
22
Outline Background Hardware enhancements Conclusion and Future work
Hardware enhancements: rasterization to a nonuniform grid, generalized polygon offset, depth compression. Conclusion and Future work. We will now discuss the enhancements needed to support logarithmic rasterization.
23
Graphics pipeline: vertex processor → clipping → setup → rasterizer → fragment processor → alpha, stencil, & depth tests → blending, with depth compression and color compression in the memory interface. One of the advantages of formulating LogPSMs the way we did is that we only need to modify the rasterizer and depth compression. Let's briefly review standard rasterization.
24
Rasterization Coverage determination Attribute interpolation
coarse stage – compute covered tiles fine stage – compute covered pixels Attribute interpolation interpolate from vertices depth, color, texture coordinates, etc. Rasterization performs two main functions. The first is to determine which pixels are covered by a primitive. This usually proceeds in two stages, a coarse stage that computes covered tiles and a fine stage that computes covered pixels within a tile. The second function is to interpolate attribute values from the vertices such as depth or color.
25
Edge equations Signs used to compute coverage
Water-tight rasterization: use fixed point; fixed point "snaps" sample locations to an underlying uniform grid. Rasterization is usually performed using linear edge equations of the form E(x, y) = a·x + b·y + c. The signs of the edge equations are used to determine whether a point lies inside a primitive. To obtain a water-tight rasterization of a mesh, without holes or double hitting, the edge equations are typically computed with fixed-point precision, using enough bits for intermediate values to avoid truncation error. Fixed point can be viewed as snapping sample locations to an underlying uniform grid.
26
Attribute interpolation
Same form as the edge equations: A(x, y) = a·x + b·y + c. The equations for attribute interpolation have the same form as the edge equations, but can be computed using floating point or lower precision.
27
Logarithmic rasterization
So how can we extend an existing rasterizer to perform logarithmic rasterization? Here we show a single edge in the view frustum. Under a perspective projection the frustum is mapped to a square and the edge remains linear. Under the subsequent logarithmic transformation F the line becomes curved. If we take the uniform grid in this warped space and map it back into the linear space using G, the inverse of the logarithmic transformation, the grid becomes nonuniform in the y direction. We can achieve water-tight logarithmic rasterization simply by performing linear rasterization with these nonuniform grid locations snapped to an underlying uniform grid of sufficiently fine resolution. light space | linear shadow map space | warped shadow map space. Linear rasterization with nonuniform grid locations.
28
Edge and interpolation equations
Algebraically, we are just plugging in G(y') for y in the linear edge equations computed by the existing setup hardware, giving E(x, y') = a·x + b·G(y') + c. We do the same for the interpolation equations. Because the equations are monotonic, existing tile traversal algorithms and optimizations like z-min and z-max culling can be used without modification. We'll now consider how to evaluate these equations for a 4×4 tile of pixels. Monotonic: existing tile traversal algorithms still work; optimizations like z-min/z-max culling still work.
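A minimal sketch of the substitution in C++ follows. The exponential form and the constants c0, c1 used for G here are placeholders for illustration; the paper derives the exact inverse warp from the frustum's near and far planes.

```cpp
#include <cmath>

// Placeholder inverse warp G(y'): maps warped shadow-map space back to the
// linear (post-perspective) space. c0 and c1 are assumed constants, not the
// paper's exact parameterization.
struct InverseWarp {
    float c0, c1;
    float operator()(float yWarped) const { return c0 * (std::exp(c1 * yWarped) - 1.0f); }
};

// Linear edge equation E(x, y) = a*x + b*y + c, evaluated at a warped sample.
struct EdgeEq {
    float a, b, c;
    float evalWarped(float x, float yWarped, const InverseWarp& G) const {
        return a * x + b * G(yWarped) + c;   // plug G(y') in for y
    }
};

// A sample is inside the triangle if all three edge equations are >= 0
// (consistent with the sign convention chosen in setup).
bool inside(const EdgeEq e[3], float x, float yWarped, const InverseWarp& G) {
    return e[0].evalWarped(x, yWarped, G) >= 0.0f &&
           e[1].evalWarped(x, yWarped, G) >= 0.0f &&
           e[2].evalWarped(x, yWarped, G) >= 0.0f;
}
```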
29
Coverage determination for a tile
Full parallel implementation. The brute-force approach is to use an array of edge equation evaluators to compute the samples in parallel. To conserve die area, we exploit the linearity in the x direction. Full evaluation
30
Coverage determination for a tile
Incremental in x. We break the evaluation into two steps: (1) we perform a full evaluation along the first column, and (2) we propagate the values in parallel to the rest of the samples with simple additions. These calculations can be pipelined so that, like linear rasterization, a sustained rate of one tile of samples can be computed per clock. Full evaluation | Incremental x | Per-triangle constants
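The two-step evaluation could look roughly like the following C++ sketch for a 4×4 tile. The inverse warp G is passed in as an assumed function; in hardware the inner additions would run in parallel rather than in a loop.

```cpp
#include <array>

// Sketch of coverage for one edge over a 4x4 tile whose warped-space origin
// is (tileX, tileYWarped). edge = {a, b, c}; G is the inverse warp applied
// to the y' coordinate (assumed, see previous sketch).
std::array<std::array<bool, 4>, 4>
tileCoverage(float a, float b, float c,
             float tileX, float tileYWarped,
             float (*G)(float))
{
    std::array<std::array<bool, 4>, 4> covered{};

    // Step 1: full evaluation down the first column (one exp-based G per row).
    float firstColumn[4];
    for (int row = 0; row < 4; ++row) {
        float y = G(tileYWarped + float(row));        // nonuniform row position
        firstColumn[row] = a * tileX + b * y + c;
    }

    // Step 2: propagate across the tile incrementally; the edge equation is
    // still linear in x, so each step to the right just adds the constant a.
    for (int row = 0; row < 4; ++row) {
        float e = firstColumn[row];
        for (int col = 0; col < 4; ++col) {
            covered[row][col] = (e >= 0.0f);
            e += a;
        }
    }
    return covered;
}
```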
31
Generalized polygon offset
light; m – depth slope; r – smallest representable depth difference. Another part of the rasterizer that needs to be modified is polygon offset. Polygon offset uses the depth slope and a constant bias to displace the depth values enough so that locations on the surface are not considered shadowed (in the usual formulation, offset = m · factor + r · units). The depth slope is commonly computed with the approximation m ≈ max(|∂z/∂x|, |∂z/∂y|), the maximum of the depth slopes in either direction. Since both of these values are constant over a planar primitive, so is the offset, and it can simply be baked into the depth interpolation equation. constant texel width
32
Generalized polygon offset
light; m – depth slope; r – smallest representable depth difference. With logarithmic rasterization, primitives are no longer planar, and the depth slope is no longer constant in y. constant texel width | not constant
33
Generalized polygon offset
The depth slope in x is always constant along the length of the polygon. It turns out that the depth slope decreases linearly in y. When the depth slope in either direction dominates over the entire polygon, we can just bake the offset into the depth equation as before. But what if there is a transition, as shown here? One approach is to perform a max per pixel, but this adds complexity to the rasterizer. Another possibility is to split the polygon during setup at the transition point. We use a slightly conservative approximation and linearly interpolate the offset using the maximum depth slopes at the endpoints, as sketched below; this can be implemented simply by changing the depth equation constants. (There is some error associated with this approximation; see the measurements on the backup slide.) That completes the changes needed for the rasterizer. Let's move on to depth compression. Do max per pixel | Split polygon | Interpolate max at end points
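A hedged way to write the interpolated-offset approximation (our notation; the paper may formulate it differently): evaluate the conventional offset at the polygon's two y extents y0 and y1 using the maximum depth slope at each extent, then interpolate linearly between them, which only changes the constants of the depth interpolation equation.

```latex
o(y) \;\approx\; \frac{y_1 - y}{y_1 - y_0}\, o(y_0) \;+\; \frac{y - y_0}{y_1 - y_0}\, o(y_1),
\qquad
o(y_i) \;=\; \max\!\left(\left|\tfrac{\partial z}{\partial x}\right|,\ \left|\tfrac{\partial z}{\partial y}\right|\right)\!\Big|_{y_i}\cdot \mathrm{factor} \;+\; r\cdot \mathrm{units}
```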
34
Depth compression Important for reducing memory bandwidth requirements
Flow: depth compressor → fits in bit budget? → yes: store compressed / no: store uncompressed (tracked in the tile table). Depth compression is an important optimization for reducing memory bandwidth. Here is a simplified schematic of how depth compression typically works: the compressor exploits the planarity of depth values to represent a tile with fewer bits; if the tile fits within the allotted bit budget it is stored in compressed form, otherwise it is stored uncompressed. More details can be found in the depth compression survey by Hasselgren and Möller [2006], presented at Graphics Hardware. Important for reducing memory bandwidth requirements. Exploits planarity of depth values. Depth compression survey [Hasselgren and Möller 2006]
35
Depth compression - Standard
compressed untouched clamped Resolution: 512x512 Here is a visual comparison of a linear depth compression scheme with ours for a standard shadow map. Compressed tiles are tinted red Linear depth compression Our depth compression
36
Depth compression - LiSPSM
compressed untouched clamped Resolution: 512x512 Here we apply a perspective warping method. Both methods show good compression Linear depth compression Our depth compression
37
Depth compression - LogPSM
compressed untouched clamped Resolution: 512x512 But with a logarithmic warping, existing depth compression schemes fail because the primitives are no longer planar. Linear depth compression Our depth compression
38
Depth compression – LogPSM Higher resolution
compressed low curvature untouched clamped Resolution: 1024x1024 At higher resolution, the linear depth compression starts to kick in for some areas with low curvature, but it is still not very effective. Our depth compression is designed to handle this curvature better Linear depth compression Our depth compression
39
Our compression scheme
(Figure: 4×4 tile layout with anchor z0, Δx and Δy differentials, anchors a0 and a1, and per-value bit counts.) First we store the depth at the corner with full 24-bit precision. We compute differentials for the first column using the value of the previous row, and differentials in x using the values of the previous column. We then compute an anchor encoding for these two regions separately. We store the differential here as an anchor, or base value, and compute a second differential; these two values form a line, and we store the offsets from this line for the remaining values. In the other section we store the anchor and compute a differential in x and y; these values form a plane, and we compute the offsets from this plane for the first row. In contrast to other anchor encoding schemes, we compute a different y differential for each row to better capture the curvature. We encode the tile with 128 bits, allocating more bits to the anchor points and y-differentials. Differential encoding | Anchor encoding | 128-bit allocation table
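To make the flavor of the scheme concrete, here is a much-simplified C++ sketch of anchor-plus-differential encoding for one 4×4 tile. It is illustrative only: the real scheme's two anchor regions, per-row y-differentials, and exact 128-bit allocation are not reproduced, and the residual budget here is a toy value.

```cpp
#include <array>
#include <cstdint>
#include <optional>

// Toy anchor + differential depth compression for a 4x4 tile.
struct CompressedTile {
    uint32_t anchor;                  // full-precision corner depth (24 bits used)
    std::array<int32_t, 4> rowStart;  // per-row differential in y (captures curvature)
    std::array<int32_t, 4> rowStepX;  // per-row differential in x
    std::array<int8_t, 16> offset;    // small residuals from the linear prediction
};

std::optional<CompressedTile> compress(const uint32_t d[4][4])
{
    CompressedTile t{};
    t.anchor = d[0][0];
    for (int row = 0; row < 4; ++row) {
        // Differential from the row above, plus a step in x for this row.
        t.rowStart[row] = (row == 0) ? 0 : int32_t(d[row][0]) - int32_t(d[row - 1][0]);
        t.rowStepX[row] = int32_t(d[row][1]) - int32_t(d[row][0]);

        int32_t base = int32_t(d[row][0]);
        for (int col = 0; col < 4; ++col) {
            int32_t predicted = base + col * t.rowStepX[row];
            int32_t residual  = int32_t(d[row][col]) - predicted;
            if (residual > 127 || residual < -128)   // exceeds the (toy) bit budget
                return std::nullopt;                 // fall back to uncompressed storage
            t.offset[row * 4 + col] = int8_t(residual);
        }
    }
    return t;
}
```

The key idea carried over from the slide is that allowing a different differential per row tolerates curvature in y, whereas a single plane prediction does not.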
40
Test scenes Town model Robots model Power plant
We tested our algorithm over paths through several models of varying complexity. Town model: 58K triangles. Robots model: 95K triangles. Power plant: 13M triangles (2M rendered).
41
Compression methods tested
Anchor encoding [Van Dyke and Margeson 2005]; differential differential pulse code modulation (DDPCM) [DeRoo et al. 2002]; plane and offset [Ornstein et al. 2005]; Hasselgren and Möller [2006]. We compared our scheme against a number of existing techniques.
42
Compression results Average compression over paths through all models
Varying light and view direction. We first tested linear rasterization using a perspective warping method. This graph shows the average compressed size of the shadow map over all the paths for varying resolutions. These results are similar to what Hasselgren and Möller report in their tests, except for the anchor encoding, which performed worse. The main reason for this is that anchor encoding fails completely when the depth slope gets too steep, which occurs for the parts of our paths where the light is at a low angle. Compared to the other algorithms, ours falls somewhere in the middle.
43
Compression results Anchor encoding best linear method
higher tolerance for curvature With logarithmic rasterization, however, our algorithm clearly outperforms the others. Interestingly, of all the linear methods the anchor encoding performs the best for logarithmic rasterization. This is because it has a higher tolerance for curvature.
44
Summary of hardware enhancements
Apply F(y) to vertices in setup: log and multiply-add operations. Evaluators for G(y'): exponential and multiply-add operations. Possible increase in bit width for the rasterizer. Generalized polygon offset. New depth compression unit, which can be used for both linear and logarithmic rasterization. Here is a summary of the hardware enhancements required for logarithmic rasterization. We apply the logarithmic transformation to vertices in setup to get a starting position for tile traversal, which involves log and multiply-add operations. We need evaluators for the G(y') terms in the rasterizer, which require an exponential and a multiply-add. We may need additional bit width to accommodate the fine-resolution grid. The generalized polygon offset can be handled in setup, and we need a new depth compression unit, which incidentally works well for both linear and logarithmic rasterization.
45
Feasibility Leverages existing designs
Trades computation for bandwidth. Aligns well with current hardware trends: computation is cheap, bandwidth is expensive. One of the main advantages of these enhancements is that they leverage existing hardware designs. Modern GPUs are typically the result of years of careful tuning and optimization; these enhancements can be integrated without major changes. Logarithmic rasterization requires a modest increase in computation but enables LogPSMs, which, for the same error as other algorithms, require less bandwidth and storage. Therefore these enhancements align well with current hardware trends, where computational power is increasingly cheap while bandwidth remains expensive.
46
Conclusion Shadow maps Logarithmic rasterization
Handling errors requires high resolution. Logarithmic rasterization: significant savings in bandwidth and storage. Incremental hardware enhancements: rasterization to a nonuniform grid, generalized polygon offset, depth compression. Feasible: leverages existing designs, aligns well with hardware trends. In conclusion, we have discussed the importance of shadows for real-time applications and of shadow maps as one of the principal algorithms used today. The problem with shadow maps is that high resolution is required to hide the errors. Using logarithmic rasterization it is possible to reduce these errors, leading to significant savings in bandwidth and storage. We have presented incremental enhancements to support logarithmic rasterization in hardware, which we believe to be quite feasible.
47
Future work Prototype and more detailed analysis
Greater generalization for the rasterizer: reflections, refraction, caustics, multi-perspective rendering [Hou et al. 2006; Liu et al. 2007]; paraboloid shadow maps for omnidirectional light sources [Brabec et al. 2002]; programmable rasterizer? It would be good to build a prototype and perform a more detailed analysis. Logarithmic rasterization is part of a larger trend toward greater generalization of the rasterizer; a number of important lighting and rendering effects can be implemented with nonlinear rasterization. Since we have seen increased programmability in just about every other part of the GPU, it would be interesting to look at the possibility of a programmable rasterizer that could handle these effects in a more general way.
48
Acknowledgements Jon Hasselgren and Thomas Akenine-Möller for depth compression code Corey Quammen for help with video Ben Cloward for robots model Aaron Lefohn and Taylor Holiday for town model
49
Acknowledgements Funding agencies NVIDIA University Fellowship
NSF Graduate Fellowship ARO Contracts DAAD and W911NF NSF awards , and DARPA/RDECOM Contract N C-0043 Disruptive Technology Office.
50
Questions
51
Precision depends on the resolution in y and the far-to-near plane depth ratio. So just how fine does the underlying uniform grid have to be? It has to be fine enough to represent the smallest delta between the nonuniform scanlines. That depends on two things: the resolution of the shadow map and the far-to-near plane depth ratio. The minimum number of bits needed to represent this delta can be tightly bounded over the typical range of depth ratios (the reasoning is sketched below). The 32-bit floating-point vertex locations are limited to 24 bits of precision, which puts an upper bound on b_min. If we plug in 24 bits for b_min and solve for the depth ratio, we get the maximum depth ratio that we can handle, which for 4K resolution is quite large. Larger depth ratios can be handled by splitting the shadow map; this effectively squares the depth ratio that can be handled. (If we don't split, the quality degrades gracefully at the far end of the view frustum.) Inputs limited to 24 bits of precision (floating point mantissa)
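The exact expressions from this slide are not reproduced here, but the reasoning can be sketched in our own notation: with r_y shadow-map rows, adjacent warped rows are 1/r_y apart, the corresponding nonuniform scanlines in linear space are roughly G'(y')/r_y apart, and the underlying fixed-point grid must resolve the smallest such spacing.

```latex
\delta_{\min} \;\approx\; \min_{y' \in [0,1]} \frac{G'(y')}{r_y},
\qquad
b_{\min} \;\approx\; \left\lceil \log_2 \frac{1}{\delta_{\min}} \right\rceil .
```

Since G grows exponentially, G' is smallest at the near end of the frustum, and δ_min shrinks as both r_y and the depth ratio f/n increase; with inputs limited to 24 bits of precision, this is what bounds the maximum depth ratio a single shadow map can handle.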
52
Generalized polygon offset
The maximum relative difference is 1 for a polygon that covers the entire y range. Even this is not bad, but for most polygons it is much less. We performed measurements for flythroughs in our test scenes and found that the split situation occurred for only 1% of polygons; for these polygons, the average maximum relative difference was 0.001. For flythroughs in our test scenes: split situation occurred for 1% of polygons; average was 0.001.
53
Coverage determination for a tile
Incremental in y. Per-frame constants. Another alternative is to compute the G(y') values along the first column incrementally. The main issue here is the possibility of accumulating error; with enough precision, though, this technique could be made to work. Full evaluation | Incremental x | Incremental y
54
Coverage determination for a tile
Look-up table. Per-frame constants. Because the G(y') values are shared across an entire scan line, it is also possible to precompute them and store them in a look-up table. The main disadvantage here is the size of the table. Full evaluation | Incremental x | Incremental y | LUT
55
Coverage determination for a tile
Look-up table. Per-frame constants. One possible way to reduce the size of the table could be to store every fourth value and compute the intermediate values incrementally. Full evaluation | Incremental x | Incremental y | LUT