Efficient PCF shadowmap filtering
Kees van Kooten, Virtual Proteins
Aliasing
PCF: Instead of tracing one ray to the light source, PCF traces multiple rays and calculates the percentage of rays hitting shadow casters. The percentage of hits determines the amount of shadow (40% in the illustrated example).
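As an illustration, a minimal CPU-side sketch of this averaging (the 10-tap layout and all names are illustrative; a real implementation would run in a pixel shader against a shadowmap texture):

    #include <cstdio>

    // Depth comparison for one shadowmap texel: 1.0 = lit, 0.0 = in shadow.
    static float depthTest(float shadowmapDepth, float receiverDepth) {
        return receiverDepth <= shadowmapDepth ? 1.0f : 0.0f;
    }

    // Basic PCF: average the results of several depth comparisons around the
    // projected receiver position. 'samples' holds the shadowmap depths fetched
    // for each tap of the PCF mask.
    static float pcf(const float* samples, int count, float receiverDepth) {
        float lit = 0.0f;
        for (int i = 0; i < count; ++i)
            lit += depthTest(samples[i], receiverDepth);
        return lit / count;   // fraction of taps that are lit
    }

    int main() {
        // 6 of 10 taps pass the depth test -> 60% lit, i.e. 40% of rays hit casters.
        float taps[10] = {0.3f, 0.3f, 0.9f, 0.9f, 0.9f, 0.9f, 0.3f, 0.9f, 0.3f, 0.9f};
        std::printf("lit fraction: %.2f\n", pcf(taps, 10, 0.5f));
    }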
PCF: In realtime graphics, the PCF trick is employed to create the illusion of soft shadows (it is not physically correct) by choosing a PCF area that exceeds the size of a pixel when projected onto the camera image.
PCF: Usually, the positions of the additional PCF samples are offset by one shadowmap texel from their neighbours, so that the PCF mask covers a contiguous area in the shadowmap.
PCF nearest: Viewed from the perspective of the shadowmap, one way to perform PCF is to perform the depth comparison at the shadowmap texel nearest to each sample position, and average the results.
PCF nearest: This corresponds to looking at the shadowmap as a function with values 0 or 1 at every shadowmap texel, and a weighting of 1 for every texel covered by the PCF mask (note that if the PCF mask's sample weights differ from 1, the nearest shadowmap texels receive weights different from 1 as well). (Figure: the covered texels, each with weight 1.)
PCF bilinear: Hardware-accelerated bilinear PCF has existed for quite a while now. Instead of a nearest-texel depth comparison, the four nearest shadowmap texels are compared with the sample depth, and the four results are bilinearly interpolated based on the sample position. (Figure: a single sample spreads weights of 0.56, 0.19, 0.19 and 0.06 over its four surrounding texels.)
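For reference, a sketch of what a single bilinear PCF lookup computes, assuming the usual semantics (four depth comparisons, then bilinear interpolation of the binary results by the fractional sample position):

    #include <cstdio>

    static float depthTest(float d, float z) { return z <= d ? 1.0f : 0.0f; }

    // d00..d11: depths of the four nearest shadowmap texels,
    // fx, fy: fractional sample position inside the texel quad, z: receiver depth.
    static float bilinearPcfTap(float d00, float d10, float d01, float d11,
                                float fx, float fy, float z) {
        float t00 = depthTest(d00, z), t10 = depthTest(d10, z);
        float t01 = depthTest(d01, z), t11 = depthTest(d11, z);
        float top = t00 * (1 - fx) + t10 * fx;
        float bot = t01 * (1 - fx) + t11 * fx;
        return top * (1 - fy) + bot * fy;   // texel weights like 0.56/0.19/0.19/0.06
    }

    int main() {
        // fx = fy = 0.25 gives texel weights 0.5625, 0.1875, 0.1875 and 0.0625.
        std::printf("%.4f\n", bilinearPcfTap(0.9f, 0.9f, 0.3f, 0.3f, 0.25f, 0.25f, 0.5f));
    }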
PCF bilinear: This happens for all shadowmap samples. In fact, since the depth comparisons are the same between neighbouring samples, the weights can be summed. (Figure: the bilinear weights of neighbouring samples accumulate per texel, e.g. 0.56 + 0.19 = 0.75.)
PCF bilinear: (Figure: the accumulation continues as further samples are added.)
PCF bilinear: The end result is a much smoother shadowmap boundary, since per rendered pixel the samples tend to move gradually from one shadowmap texel to the next, without sudden jumps between shadowmap texel comparisons (in case of shadowmap magnification). However, the texels with a weight of 1 are compared to the same depth in 4 bilinear PCF lookups. (Figure: the full accumulated weight mask, with interior weights of 1, edge weights of 0.75 and 0.25, and corner weights of 0.56, 0.19 and 0.06.)
PCF bilinear: A naïve way to reduce the number of lookups (and with it, texture bandwidth) is to sample texels using the bilinear depth test with an offset of 2 times the texel size between samples (as proposed in GPU Gems).
(Figure: the per-texel weights resulting from the spread-out samples, next to the weights of the original bilinear PCF mask.) When comparing the weights, the result is not the same.
PCF efficient bilinear
However, we can have the best of both worlds, as long as we accept some small restrictions: using only about a fourth of the number of samples of the original bilinear PCF mask, each offset slightly and then scaled individually, we can achieve the same result.
(Figure: a one-dimensional perspective of the shadowmap with weights, e.g. 0.75, 1, 0.25 per texel.)
S = a∙F1 + b∙F2 + c∙F3 + d∙F4 + e∙F5 + f∙F6
In general, we have a number of depth comparisons with shadowmap texels (the functions F1..F6) and the corresponding weights a..f, resulting from the type of shadowmap mask and filter used.
aF1 + bF2: Isolating two depth comparisons with their weights, we get the function aF1 + bF2.
Now we can always choose a linear interpolation lerp(F1,F2,o) of F1 and F2 with a certain offset o...
...scaled by a certain factor s, such that the result s∙lerp(F1,F2,o) is equivalent to aF1 + bF2.
s = a+b, o = b/s, so that aF1 + bF2 = s∙lerp(F1,F2,o)
(Figures: the two comparison results F1 and F2; their weighted values aF1 and bF2; and the target sum aF1 + bF2, plotted over the texel interval.)
aF1+bF2: The line through F1 and F2 yields a linear interpolation lerp(F1,F2,o) with arbitrary offset o.
A certain scale s puts our line through aF1+bF2. So we are looking for the exact combination of o and s for which s∙lerp(F1,F2,o) yields aF1 + bF2.
aF1 + bF2 = s∙(1-o)∙F1 + s∙o∙F2
s = a+b
o = b/(a+b)
S = (a+b)∙lerp(F1,F2, b/(a+b)) + (c+d)∙lerp(F3,F4, d/(c+d)) + (e+f)∙lerp(F5,F6, f/(e+f))
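A small numeric check of this pairing (illustrative values; lerp is the usual linear interpolation that a hardware linear fetch would perform):

    #include <cassert>
    #include <cmath>
    #include <cstdio>

    static float lerp(float f1, float f2, float o) { return f1 * (1 - o) + f2 * o; }

    int main() {
        // Arbitrary comparison results and weights for one texel pair.
        float F1 = 1.0f, F2 = 0.0f;
        float a = 0.75f, b = 1.0f;

        float s = a + b;          // scale
        float o = b / (a + b);    // offset inside the pair

        float direct = a * F1 + b * F2;
        float merged = s * lerp(F1, F2, o);
        std::printf("direct %.4f, merged %.4f\n", direct, merged);
        assert(std::fabs(direct - merged) < 1e-6f);
    }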
~1/2 the number of lookups: ceil(m/2)∙(m+1) = ceil(m/2)∙m + ceil(m/2) <= m^2/2 + m/2 + ceil(m/2) <= ceil(m^2/2) + m
PCF in 2D: four depth comparisons F1..F4 with weights a, b, c, d (aF1, bF2, cF3, dF4).
In 2D, each pair of texels can again be merged: aF1 + bF2 = s1∙lerp(F1,F2,o1) and cF3 + dF4 = s2∙lerp(F3,F4,o2). Call the two interpolations G1 and G2, so the total is s1∙G1 + s2∙G2.
s1∙G1 + s2∙G2 = (s1+s2)∙lerp(G1, G2, s2/(s1+s2)) = (s1+s2)∙lerp(lerp(F1,F2,o1), lerp(F3,F4,o2), s2/(s1+s2))
A single hardware bilinear lookup, however, has the form s∙lerp(lerp(F1,F2,x), lerp(F3,F4,x), y), with the same offset x in both rows. For arbitrary weights a, b, c, d the offsets o1 and o2 differ, so the sum cannot be evaluated as one bilinear lookup.
Doomed? Are we therefore constrained to uniform weights? (Figure: a mask with all weights equal to 1.)
Separability
W horizontal: a single row of weights, [0.75 1 1 1 1 0.25].
W horizontal: (Figure: this row replicated for every row of the mask.)
W vertical: each replicated row is additionally weighted by a vertical factor from the same 1D kernel, [0.75 1 1 1 1 0.25]. (Figures: the vertical weights applied row by row, ending in the full 2D mask with interior weights of 1, edge weights of 0.75 and 0.25, and corner weights of 0.56, 0.19 and 0.06.)
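A small sketch of what separability means for the weight mask, assuming the 1D row above: the 2D mask is the outer product of the row with itself, which reproduces the corner values 0.56, 0.19 and 0.06.

    #include <cstdio>

    int main() {
        const int N = 6;
        float w[N] = {0.75f, 1, 1, 1, 1, 0.25f};   // 1D weight row from the slides

        // Separable 2D mask: weight(x, y) = w[x] * w[y].
        for (int y = 0; y < N; ++y) {
            for (int x = 0; x < N; ++x)
                std::printf("%5.2f ", w[x] * w[y]);
            std::printf("\n");
        }
        // Corners print as 0.56 (0.75*0.75), 0.19 (0.75*0.25) and 0.06 (0.25*0.25).
    }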
Examples: Uniform, Gaussian, Higher Order (Bicubic, Biquintic)
With a separable kernel, after the horizontal pass every sample across the y dimension has the same offset and scale: the two rows become s∙lerp(F1,F2,o) and s∙lerp(F3,F4,o), with identical s and o.
A vertical weighting factor is added in the vertical pass: the two rows become w1y∙sx∙lerp(F1,F2,ox) and w2y∙sx∙lerp(F3,F4,ox).
Writing G1 = sx∙lerp(F1,F2,ox) and G2 = sx∙lerp(F3,F4,ox), the total is:
w1y∙G1 + w2y∙G2
= (w1y+w2y)∙lerp(G1, G2, w2y/(w1y+w2y))
= sy∙lerp(G1, G2, oy), with sy = w1y+w2y and oy = w2y/sy
= sy∙lerp(sx∙lerp(F1,F2,ox), sx∙lerp(F3,F4,ox), oy)
= sy∙sx∙lerp(lerp(F1,F2,ox), lerp(F3,F4,ox), oy)
As the offsets in the x direction are the same in both rows as well, the expression now matches a bilinear interpolation of the form s∙lerp(lerp(F1,F2,x), lerp(F3,F4,x), y).
c∙(aF1 + bF2) + d∙(aF3 + bF4) = sx∙sy∙bilerp(F1,F2,F3,F4,ox,oy)
with sx = a+b, sy = c+d, ox = b/sx, oy = d/sy
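A numeric sketch of this 2x2 grouping with illustrative weights; note that the combined scale is the product sx∙sy:

    #include <cassert>
    #include <cmath>
    #include <cstdio>

    static float lerp(float p, float q, float t) { return p * (1 - t) + q * t; }

    // One hardware-style bilinear lookup over four comparison results.
    static float bilerp(float F1, float F2, float F3, float F4, float ox, float oy) {
        return lerp(lerp(F1, F2, ox), lerp(F3, F4, ox), oy);
    }

    int main() {
        // Comparison results of a 2x2 texel group and separable weights:
        // horizontal weights a, b and vertical weights c, d.
        float F1 = 1, F2 = 0, F3 = 1, F4 = 1;
        float a = 0.75f, b = 1.0f, c = 1.0f, d = 0.25f;

        float direct = c * (a * F1 + b * F2) + d * (a * F3 + b * F4);

        float sx = a + b, sy = c + d;
        float ox = b / sx, oy = d / sy;
        float merged = sx * sy * bilerp(F1, F2, F3, F4, ox, oy);

        std::printf("direct %.4f, merged %.4f\n", direct, merged);
        assert(std::fabs(direct - merged) < 1e-6f);
    }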
Examples: what are the actual values of a, b, c, d in c∙(aF1 + bF2) + d∙(aF3 + bF4)?
Uniform Grid: all sample weights are 1 (note that a, b, c, d are not equivalent to the sample weights!).
Uniform Grid: (Figure: the six texels F1..F6 with a sample at fractional position x.)
Uniform Grid: Assume bilinear depth comparisons; the first sample contributes weights 1-x and x to F1 and F2.
Uniform Grid: Every sample contributes 1-x and x to its pair of texels; summing the contributions of all five samples gives per-texel weights of 1-x, 1, 1, 1, 1, x for F1..F6.
Uniform Grid: For the pair (F1, F2), with weights 1-x and 1: s1x = (1-x)+1 = 2-x, o1x = 1/s1x = 1/(2-x).
Uniform Grid: For the pair (F3, F4), with weights 1 and 1: s2x = 1+1 = 2, o2x = 1/s2x = 1/2.
Uniform Grid: For the pair (F5, F6), with weights 1 and x: s3x = 1+x, o3x = x/s3x = x/(1+x).
Uniform Grid: The y direction follows the same concept as the x direction, using the y offset instead, giving (s1y, o1y), (s2y, o2y), (s3y, o3y).
Uniform Grid: This yields 9 combinations of (sx, ox) and (sy, oy), i.e. 9 scaled bilinear lookups of the form sx∙sy∙bilerp(F1,F2,F3,F4,ox,oy).
Notice that we have reduced the number of samples from 25 to 9, but as we are calculating exactly the same result, a division by 25 is still in order.
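A verification sketch for the uniform grid, assuming the per-texel weights 1-x, 1, 1, 1, 1, x derived above: it compares the brute-force 5x5 bilinear PCF average against the 9 scaled bilinear lookups (all names and the test pattern are illustrative).

    #include <cassert>
    #include <cmath>
    #include <cstdio>

    static float lerp(float p, float q, float t) { return p * (1 - t) + q * t; }

    int main() {
        // T[j][i]: depth comparison result (0 or 1) at shadowmap texel (i, j).
        float T[6][6];
        for (int j = 0; j < 6; ++j)
            for (int i = 0; i < 6; ++i)
                T[j][i] = ((i + 2 * j) % 3) ? 1.0f : 0.0f;   // arbitrary pattern

        float x = 0.3f, y = 0.7f;   // fractional sample position inside a texel

        // Brute force: 5x5 bilinear PCF taps, spaced one texel apart, averaged.
        float brute = 0.0f;
        for (int v = 0; v < 5; ++v)
            for (int u = 0; u < 5; ++u)
                brute += lerp(lerp(T[v][u],     T[v][u + 1],     x),
                              lerp(T[v + 1][u], T[v + 1][u + 1], x), y);
        brute /= 25.0f;

        // Optimized: 3 (scale, offset) pairs per dimension -> 9 scaled bilerps.
        float sX[3] = {2 - x, 2, 1 + x}, oX[3] = {1 / (2 - x), 0.5f, x / (1 + x)};
        float sY[3] = {2 - y, 2, 1 + y}, oY[3] = {1 / (2 - y), 0.5f, y / (1 + y)};
        float fast = 0.0f;
        for (int j = 0; j < 3; ++j)
            for (int i = 0; i < 3; ++i) {
                int i0 = 2 * i, j0 = 2 * j;
                float g = lerp(lerp(T[j0][i0],     T[j0][i0 + 1],     oX[i]),
                               lerp(T[j0 + 1][i0], T[j0 + 1][i0 + 1], oX[i]), oY[j]);
                fast += sX[i] * sY[j] * g;
            }
        fast /= 25.0f;

        std::printf("brute %.5f  fast %.5f\n", brute, fast);
        assert(std::fabs(brute - fast) < 1e-4f);
    }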
Gaussian Grid: five samples with Gaussian weights g1..g5.
Gaussian Grid: Each sample spreads its weight over two texels, giving per-texel weights g1(1-x), g1x + g2(1-x), g2x + g3(1-x), g3x + g4(1-x), g4x + g5(1-x), g5x for F1..F6. When calculating offsets and scales, many terms collapse into simpler ones.
Bicubic Interpolation
Up until now, every evaluation sample was a (bi)linear interpolation, which means that the influence of a certain depth comparison is weighted linearly into lookups in its surrounding area. This weighting is not continuous in its derivative, with the result that the shadow border's staircase transition from light to dark does not look smooth. A higher-order interpolation function eliminates this lack of smoothness.
Bicubic Interpolation
Take two linear interpolations and multiply each with another linear interpolation across its domain. Adding the results together gives a quadratic function.
Bicubic Interpolation
Repeat this procedure once more to get a bicubic interpolation function, interpolating the influence of one depth comparison across a 4-texel-wide area.
Bicubic Interpolation
Because every cubic spline covers a 4-texel-wide area, a single lookup involves the weights of 4 surrounding cubic splines, centered at 4 depth comparison values.
a = -x^3 + 3x^2 - 3x + 1
b = 3x^3 - 6x^2 + 4
c = -3x^3 + 3x^2 + 3x + 1
d = x^3
F1 to F4 are the four depth comparisons whose cubic spline weights have a nonzero influence at the evaluation position x. Every depth comparison contributes a different 'part' of the cubic spline to the texel area where the sampling is performed. As we don't have higher-order texture lookups, we have to calculate a, b, c, d ourselves after establishing the texel offset.
s1x = a+b, o1x = b/s1x; s2x = c+d, o2x = d/s2x
After establishing the depth comparison weightings a, b, c, d, the scales and offsets can be calculated in the standard fashion using linear lookups.
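A sketch that evaluates the cubic weights from the polynomials above and folds them into two (scale, offset) pairs; since the weights sum to 6, the final result is divided by 6 (this mirrors the third-order filtering trick cited in the references, applied here to comparison results):

    #include <cassert>
    #include <cmath>
    #include <cstdio>

    int main() {
        float x = 0.4f;   // fractional position between the two middle texels

        // Cubic (B-spline) weights as given on the slide; they sum to 6.
        float a = -x*x*x + 3*x*x - 3*x + 1;   // equals (1-x)^3
        float b =  3*x*x*x - 6*x*x + 4;
        float c = -3*x*x*x + 3*x*x + 3*x + 1;
        float d =  x*x*x;

        // Fold the four weights into two linear lookups.
        float s1 = a + b, o1 = b / s1;
        float s2 = c + d, o2 = d / s2;

        // Check against the direct weighted sum for arbitrary comparison results.
        float F1 = 1, F2 = 0, F3 = 1, F4 = 0;
        float direct = (a*F1 + b*F2 + c*F3 + d*F4) / 6.0f;
        float merged = (s1 * ((1 - o1)*F1 + o1*F2) +
                        s2 * ((1 - o2)*F3 + o2*F4)) / 6.0f;

        std::printf("weights sum %.3f, direct %.4f, merged %.4f\n",
                    a + b + c + d, direct, merged);
        assert(std::fabs(direct - merged) < 1e-6f);
    }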
Bicubic Interpolation
In two dimensions, the two linear lookups in each dimension can be combined in four ways, (o,s)x1y1, (o,s)x2y1, (o,s)x1y2 and (o,s)x2y2, giving us the same result as a single theoretical bicubic interpolation.
Biquintic Interpolation
As bicubic interpolation is not smooth enough to eliminate jaggies, one might be tempted to construct a higher-order interpolation. This is indeed possible, for example a biquintic interpolation covering a 6x6 texel area.
Biquintic Interpolation
Predictably, this results in 3 texture lookups per dimension, but requires evaluating a quintic function for the weightings a, b, c, d, e, f.
Biquintic Interpolation
However, while the result is smoother, the edges are still preserved.
4x4 Quadratic Interpolation
Instead, consider a 2x2 area of samples, each using quadratic interpolation, which is smoother than linear interpolation but keeps the blurriness intact. (Caution should be exercised while calculating the weights; every quadratic kernel covers a 3-texel-wide area, with each part bounded at the positions in between texels.)
4x4 Quadratic Interpolation
Adding another sample scales certain splines and introduces new texels of influence at the border.
4x4 Quadratic Interpolation
This can be repeated as often as desired; in this case, a 4x4 biquadratic filter is constructed. Still, this only results in a 3 by 3 bilinear lookup filter after offsets and scales.
Problems
- Only applicable when samples overlap
- Only separable kernels fully optimized
- Not orthogonal to all PCF extensions
Gradient-based depth offset
Shadow kernel slope
(Figure: per-sample comparison depths d1..d5 across the kernel.)
With per-sample depths, the first sample's comparison becomes (1-x)∙F1(d1) + x∙F2(d1), where F(d) denotes the depth test against depth d.
(1-x)∙F1(d1) + x∙F2(d1) + (1-x)∙F1(d2) + x∙F2(d2)
Cap slope gradient: An unbounded gradient gives rise to aliasing problems at steep surface angles with respect to the light, which can be avoided by clamping the maximum gradient. The artifacts introduced are hidden by the ndotl lighting term.
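The slides do not spell out the exact formulation, but a hedged sketch of the idea could look as follows: estimate the receiver's depth gradient per shadowmap texel, clamp it, and offset each tap's comparison depth by the clamped gradient times the tap's texel offset (all names and the clamp threshold are illustrative).

    #include <algorithm>
    #include <cstdio>

    // Offset the comparison depth of a PCF tap at texel offset (du, dv) using a
    // clamped depth gradient (dz/du, dz/dv), expressed per shadowmap texel.
    // maxGradient caps the slope to avoid aliasing at steep angles to the light.
    static float offsetDepth(float centerDepth, float dzdu, float dzdv,
                             float du, float dv, float maxGradient) {
        float gu = std::clamp(dzdu, -maxGradient, maxGradient);
        float gv = std::clamp(dzdv, -maxGradient, maxGradient);
        return centerDepth + gu * du + gv * dv;
    }

    int main() {
        // A steep slope (0.3 depth units per texel) clamped to 0.05.
        float z = offsetDepth(0.5f, 0.3f, -0.3f, 2.0f, 1.0f, 0.05f);
        std::printf("comparison depth for tap (+2,+1): %.3f\n", z);   // 0.5 + 0.1 - 0.05
    }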
Most of the efficient PCF gradient artifacts are present at steep angles as well, which is convenient.
Higher quality shadows
Use fetch4 (DX10.1) or textureGather (GLSL) and create your own comparison function.
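A sketch of the kind of custom comparison this enables, written as plain C++ over already-gathered depths (the gather ordering and the per-texel bias are assumptions for illustration; a shader version would use textureGather/fetch4 and the fragment's fractional shadowmap coordinates):

    #include <cstdio>

    // depths[4]: the four texel depths a gather would return (order assumed here
    // as {top-left, top-right, bottom-left, bottom-right} for clarity; the real
    // textureGather component order differs and must be remapped).
    static float customCompare(const float depths[4], const float bias[4],
                               float receiverDepth, float fx, float fy) {
        float t[4];
        for (int i = 0; i < 4; ++i)
            t[i] = (receiverDepth <= depths[i] + bias[i]) ? 1.0f : 0.0f;
        float top = t[0] * (1 - fx) + t[1] * fx;
        float bot = t[2] * (1 - fx) + t[3] * fx;
        return top * (1 - fy) + bot * fy;
    }

    int main() {
        float d[4]    = {0.9f, 0.9f, 0.3f, 0.9f};
        float bias[4] = {0, 0, 0.25f, 0};   // e.g. gradient-based per-texel offsets
        std::printf("%.3f\n", customCompare(d, bias, 0.5f, 0.5f, 0.5f));
    }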
Thanks!
References
- J. Story, H. Gruen: High Quality Direct3D 10.0 & 10.1 Accelerated Techniques
- M. Bunnell, F. Pellacini: GPU Gems – Shadow Map Antialiasing
- C. Sigg, M. Hadwiger: GPU Gems 2 – Fast Third-Order Texture Filtering