Efficient PCF shadowmap filtering
Kees van Kooten, Virtual Proteins
Aliasing
PCF: Instead of tracing one ray to the light source, PCF traces multiple rays and calculates the percentage of rays hitting shadow casters. The percentage of hits determines the amount of shadow (40% in the illustrated example).
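As an illustration, a minimal CPU-side sketch of this averaging (the 10-tap layout and all names are illustrative; a real implementation would run in a pixel shader against a shadowmap texture):

    #include <cstdio>

    // Depth comparison for one shadowmap texel: 1.0 = lit, 0.0 = in shadow.
    static float depthTest(float shadowmapDepth, float receiverDepth) {
        return receiverDepth <= shadowmapDepth ? 1.0f : 0.0f;
    }

    // Basic PCF: average the results of several depth comparisons around the
    // projected receiver position. 'samples' holds the shadowmap depths fetched
    // for each tap of the PCF mask.
    static float pcf(const float* samples, int count, float receiverDepth) {
        float lit = 0.0f;
        for (int i = 0; i < count; ++i)
            lit += depthTest(samples[i], receiverDepth);
        return lit / count;   // fraction of taps that are lit
    }

    int main() {
        // 6 of 10 taps pass the depth test -> 60% lit, i.e. 40% of rays hit casters.
        float taps[10] = {0.3f, 0.3f, 0.9f, 0.9f, 0.9f, 0.9f, 0.3f, 0.9f, 0.3f, 0.9f};
        std::printf("lit fraction: %.2f\n", pcf(taps, 10, 0.5f));
    }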
PCF: In realtime graphics, the PCF trick is employed to create the illusion of soft shadows (it is not physically correct) by choosing a PCF area that exceeds the size of a pixel when projected onto the camera image.
PCF: Usually, the positions of the additional PCF samples are offset by one shadowmap texel from their neighbours, so that the PCF mask covers a contiguous area in the shadowmap.
PCF nearest: Viewed from the perspective of the shadowmap, one way to perform PCF is to perform the depth comparison at the shadowmap texel nearest to each sample position, and average the results.
PCF nearest: This corresponds to looking at the shadowmap as a function with values 0 or 1 at every shadowmap texel, and a weighting of 1 for every texel covered by the PCF mask (note that if the PCF mask's sample weights differ from 1, the nearest shadowmap texels receive weights different from 1 as well). (Figure: the covered texels, each with weight 1.)
PCF bilinear: Hardware-accelerated bilinear PCF has existed for quite a while now. Instead of a nearest-texel depth comparison, the four nearest shadowmap texels are compared with the sample depth, and the four results are bilinearly interpolated based on the sample position. (Figure: a single sample spreads weights of 0.56, 0.19, 0.19 and 0.06 over its four surrounding texels.)
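For reference, a sketch of what a single bilinear PCF lookup computes, assuming the usual semantics (four depth comparisons, then bilinear interpolation of the binary results by the fractional sample position):

    #include <cstdio>

    static float depthTest(float d, float z) { return z <= d ? 1.0f : 0.0f; }

    // d00..d11: depths of the four nearest shadowmap texels,
    // fx, fy: fractional sample position inside the texel quad, z: receiver depth.
    static float bilinearPcfTap(float d00, float d10, float d01, float d11,
                                float fx, float fy, float z) {
        float t00 = depthTest(d00, z), t10 = depthTest(d10, z);
        float t01 = depthTest(d01, z), t11 = depthTest(d11, z);
        float top = t00 * (1 - fx) + t10 * fx;
        float bot = t01 * (1 - fx) + t11 * fx;
        return top * (1 - fy) + bot * fy;   // texel weights like 0.56/0.19/0.19/0.06
    }

    int main() {
        // fx = fy = 0.25 gives texel weights 0.5625, 0.1875, 0.1875 and 0.0625.
        std::printf("%.4f\n", bilinearPcfTap(0.9f, 0.9f, 0.3f, 0.3f, 0.25f, 0.25f, 0.5f));
    }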
PCF bilinear: This happens for all shadowmap samples. In fact, since the depth comparisons are the same between neighbouring samples, the weights can be summed. (Figure: the bilinear weights of neighbouring samples accumulate per texel, e.g. 0.56 + 0.19 = 0.75.)
PCF bilinear: (Figure: the accumulation continues as further samples are added.)
PCF bilinear: The end result is a much smoother shadowmap boundary, since per rendered pixel the samples tend to move gradually from one shadowmap texel to the next, without sudden jumps between shadowmap texel comparisons (in case of shadowmap magnification). However, the texels with a weight of 1 are compared to the same depth in 4 bilinear PCF lookups. (Figure: the full accumulated weight mask, with interior weights of 1, edge weights of 0.75 and 0.25, and corner weights of 0.56, 0.19 and 0.06.)
PCF bilinear: A naïve way to reduce the number of lookups (and with it, texture bandwidth) is to sample texels using the bilinear depth test with an offset of 2 times the texel size between samples (as proposed in GPU Gems).
(Figure: the per-texel weights resulting from the spread-out samples, next to the weights of the original bilinear PCF mask.) When comparing the weights, the result is not the same.
PCF efficient bilinear
However, we can have the best of both worlds, as long as we accept some small restrictions: using only about a fourth of the number of samples of the original bilinear PCF mask, each offset slightly and then scaled individually, we can achieve the same result.
(Figure: a one-dimensional perspective of the shadowmap with weights, e.g. 0.75, 1, 0.25 per texel.)
S = a∙F1 + b∙F2 + c∙F3 + d∙F4 + e∙F5 + f∙F6
In general, we have a number of depth comparisons with shadowmap texels (the functions F1..F6) and the corresponding weights a..f, resulting from the type of shadowmap mask and filter used.
aF1 + bF2: Isolating two depth comparisons with their weights, we get the function aF1 + bF2.
Now we can always choose a linear interpolation lerp(F1,F2,o) of F1 and F2 with a certain offset o...
...scaled by a certain factor s, such that the result s∙lerp(F1,F2,o) is equivalent to aF1 + bF2.
s = a+b, o = b/s, so that aF1 + bF2 = s∙lerp(F1,F2,o)
(Figures: the two comparison results F1 and F2; their weighted values aF1 and bF2; and the target sum aF1 + bF2, plotted over the texel interval.)
aF1+bF2: The line through F1 and F2 yields a linear interpolation lerp(F1,F2,o) with arbitrary offset o.
A certain scale s puts our line through aF1+bF2. So we are looking for the exact combination of o and s for which s∙lerp(F1,F2,o) yields aF1 + bF2.
aF1 + bF2 = s∙(1-o)∙F1 + s∙o∙F2
s = a+b
o = b/(a+b)
S = (a+b)∙lerp(F1,F2, b/(a+b)) + (c+d)∙lerp(F3,F4, d/(c+d)) + (e+f)∙lerp(F5,F6, f/(e+f))
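A small numeric check of this pairing (illustrative values; lerp is the usual linear interpolation that a hardware linear fetch would perform):

    #include <cassert>
    #include <cmath>
    #include <cstdio>

    static float lerp(float f1, float f2, float o) { return f1 * (1 - o) + f2 * o; }

    int main() {
        // Arbitrary comparison results and weights for one texel pair.
        float F1 = 1.0f, F2 = 0.0f;
        float a = 0.75f, b = 1.0f;

        float s = a + b;          // scale
        float o = b / (a + b);    // offset inside the pair

        float direct = a * F1 + b * F2;
        float merged = s * lerp(F1, F2, o);
        std::printf("direct %.4f, merged %.4f\n", direct, merged);
        assert(std::fabs(direct - merged) < 1e-6f);
    }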
~1/2 the number of lookups: ceil(m/2)∙(m+1) = ceil(m/2)∙m + ceil(m/2) <= m^2/2 + m/2 + ceil(m/2) <= ceil(m^2/2) + m
PCF in 2D: four depth comparisons F1..F4 with weights a, b, c, d (aF1, bF2, cF3, dF4).
In 2D, each pair of texels can again be merged: aF1 + bF2 = s1∙lerp(F1,F2,o1) and cF3 + dF4 = s2∙lerp(F3,F4,o2). Call the two interpolations G1 and G2, so the total is s1∙G1 + s2∙G2.
s1∙G1 + s2∙G2 = (s1+s2)∙lerp(G1, G2, s2/(s1+s2)) = (s1+s2)∙lerp(lerp(F1,F2,o1), lerp(F3,F4,o2), s2/(s1+s2))
A single hardware bilinear lookup, however, has the form s∙lerp(lerp(F1,F2,x), lerp(F3,F4,x), y), with the same offset x in both rows. For arbitrary weights a, b, c, d the offsets o1 and o2 differ, so the sum cannot be evaluated as one bilinear lookup.
Doomed? Are we therefore constrained to uniform weights? (Figure: a mask with all weights equal to 1.)
Separability
W horizontal: a single row of weights, [0.75 1 1 1 1 0.25].
W horizontal: (Figure: this row replicated for every row of the mask.)
W vertical: each replicated row is additionally weighted by a vertical factor from the same 1D kernel, [0.75 1 1 1 1 0.25]. (Figures: the vertical weights applied row by row, ending in the full 2D mask with interior weights of 1, edge weights of 0.75 and 0.25, and corner weights of 0.56, 0.19 and 0.06.)
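A small sketch of what separability means for the weight mask, assuming the 1D row above: the 2D mask is the outer product of the row with itself, which reproduces the corner values 0.56, 0.19 and 0.06.

    #include <cstdio>

    int main() {
        const int N = 6;
        float w[N] = {0.75f, 1, 1, 1, 1, 0.25f};   // 1D weight row from the slides

        // Separable 2D mask: weight(x, y) = w[x] * w[y].
        for (int y = 0; y < N; ++y) {
            for (int x = 0; x < N; ++x)
                std::printf("%5.2f ", w[x] * w[y]);
            std::printf("\n");
        }
        // Corners print as 0.56 (0.75*0.75), 0.19 (0.75*0.25) and 0.06 (0.25*0.25).
    }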
Examples: Uniform, Gaussian, Higher Order (Bicubic, Biquintic)
With a separable kernel, after the horizontal pass every sample across the y dimension has the same offset and scale: the two rows become s∙lerp(F1,F2,o) and s∙lerp(F3,F4,o), with identical s and o.
A vertical weighting factor is added in the vertical pass: the two rows become w1y∙sx∙lerp(F1,F2,ox) and w2y∙sx∙lerp(F3,F4,ox).
Writing G1 = sx∙lerp(F1,F2,ox) and G2 = sx∙lerp(F3,F4,ox), the total is:
w1y∙G1 + w2y∙G2
= (w1y+w2y)∙lerp(G1, G2, w2y/(w1y+w2y))
= sy∙lerp(G1, G2, oy), with sy = w1y+w2y and oy = w2y/sy
= sy∙lerp(sx∙lerp(F1,F2,ox), sx∙lerp(F3,F4,ox), oy)
= sy∙sx∙lerp(lerp(F1,F2,ox), lerp(F3,F4,ox), oy)
As the offsets in the x direction are the same in both rows as well, the expression now matches a bilinear interpolation of the form s∙lerp(lerp(F1,F2,x), lerp(F3,F4,x), y).
c∙(aF1 + bF2) + d∙(aF3 + bF4) = sx∙sy∙bilerp(F1,F2,F3,F4,ox,oy)
with sx = a+b, sy = c+d, ox = b/sx, oy = d/sy
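A numeric sketch of this 2x2 grouping with illustrative weights; note that the combined scale is the product sx∙sy:

    #include <cassert>
    #include <cmath>
    #include <cstdio>

    static float lerp(float p, float q, float t) { return p * (1 - t) + q * t; }

    // One hardware-style bilinear lookup over four comparison results.
    static float bilerp(float F1, float F2, float F3, float F4, float ox, float oy) {
        return lerp(lerp(F1, F2, ox), lerp(F3, F4, ox), oy);
    }

    int main() {
        // Comparison results of a 2x2 texel group and separable weights:
        // horizontal weights a, b and vertical weights c, d.
        float F1 = 1, F2 = 0, F3 = 1, F4 = 1;
        float a = 0.75f, b = 1.0f, c = 1.0f, d = 0.25f;

        float direct = c * (a * F1 + b * F2) + d * (a * F3 + b * F4);

        float sx = a + b, sy = c + d;
        float ox = b / sx, oy = d / sy;
        float merged = sx * sy * bilerp(F1, F2, F3, F4, ox, oy);

        std::printf("direct %.4f, merged %.4f\n", direct, merged);
        assert(std::fabs(direct - merged) < 1e-6f);
    }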
Examples: what are the actual values of a, b, c, d in c∙(aF1 + bF2) + d∙(aF3 + bF4)?
Uniform Grid: all sample weights are 1 (note that a, b, c, d are not equivalent to the sample weights!).
Uniform Grid: (Figure: the six texels F1..F6 with a sample at fractional position x.)
Uniform Grid: Assume bilinear depth comparisons; the first sample contributes weights 1-x and x to F1 and F2.
Uniform Grid: Every sample contributes 1-x and x to its pair of texels; summing the contributions of all five samples gives per-texel weights of 1-x, 1, 1, 1, 1, x for F1..F6.
Uniform Grid: For the pair (F1, F2), with weights 1-x and 1: s1x = (1-x)+1 = 2-x, o1x = 1/s1x = 1/(2-x).
Uniform Grid: For the pair (F3, F4), with weights 1 and 1: s2x = 1+1 = 2, o2x = 1/s2x = 1/2.
Uniform Grid: For the pair (F5, F6), with weights 1 and x: s3x = 1+x, o3x = x/s3x = x/(1+x).
Uniform Grid: The y direction follows the same concept as the x direction, using the y offset instead, giving (s1y, o1y), (s2y, o2y), (s3y, o3y).
Uniform Grid: This yields 9 combinations of (sx, ox) and (sy, oy), i.e. 9 scaled bilinear lookups of the form sx∙sy∙bilerp(F1,F2,F3,F4,ox,oy).
Notice that we have reduced the number of samples from 25 to 9, but as we are calculating exactly the same result, a division by 25 is still in order.
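A verification sketch for the uniform grid, assuming the per-texel weights 1-x, 1, 1, 1, 1, x derived above: it compares the brute-force 5x5 bilinear PCF average against the 9 scaled bilinear lookups (all names and the test pattern are illustrative).

    #include <cassert>
    #include <cmath>
    #include <cstdio>

    static float lerp(float p, float q, float t) { return p * (1 - t) + q * t; }

    int main() {
        // T[j][i]: depth comparison result (0 or 1) at shadowmap texel (i, j).
        float T[6][6];
        for (int j = 0; j < 6; ++j)
            for (int i = 0; i < 6; ++i)
                T[j][i] = ((i + 2 * j) % 3) ? 1.0f : 0.0f;   // arbitrary pattern

        float x = 0.3f, y = 0.7f;   // fractional sample position inside a texel

        // Brute force: 5x5 bilinear PCF taps, spaced one texel apart, averaged.
        float brute = 0.0f;
        for (int v = 0; v < 5; ++v)
            for (int u = 0; u < 5; ++u)
                brute += lerp(lerp(T[v][u],     T[v][u + 1],     x),
                              lerp(T[v + 1][u], T[v + 1][u + 1], x), y);
        brute /= 25.0f;

        // Optimized: 3 (scale, offset) pairs per dimension -> 9 scaled bilerps.
        float sX[3] = {2 - x, 2, 1 + x}, oX[3] = {1 / (2 - x), 0.5f, x / (1 + x)};
        float sY[3] = {2 - y, 2, 1 + y}, oY[3] = {1 / (2 - y), 0.5f, y / (1 + y)};
        float fast = 0.0f;
        for (int j = 0; j < 3; ++j)
            for (int i = 0; i < 3; ++i) {
                int i0 = 2 * i, j0 = 2 * j;
                float g = lerp(lerp(T[j0][i0],     T[j0][i0 + 1],     oX[i]),
                               lerp(T[j0 + 1][i0], T[j0 + 1][i0 + 1], oX[i]), oY[j]);
                fast += sX[i] * sY[j] * g;
            }
        fast /= 25.0f;

        std::printf("brute %.5f  fast %.5f\n", brute, fast);
        assert(std::fabs(brute - fast) < 1e-4f);
    }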
Gaussian Grid: five samples with Gaussian weights g1..g5.
Gaussian Grid: Each sample spreads its weight over two texels, giving per-texel weights g1(1-x), g1x + g2(1-x), g2x + g3(1-x), g3x + g4(1-x), g4x + g5(1-x), g5x for F1..F6. When calculating offsets and scales, many terms collapse into simpler ones.
Bicubic Interpolation
Up until now, every evaluation sample was a (bi)linear interpolation, which means that the influence of a certain depth comparison is weighted linearly into lookups in its surrounding area. This weighting is not continuous in its derivative, with the result that the shadow border's staircase transition from light to dark does not look smooth. A higher-order interpolation function eliminates this lack of smoothness.
Bicubic Interpolation
Take two linear interpolations and multiply each with another linear interpolation across its domain. Adding the results together gives a quadratic function.
Bicubic Interpolation
Repeat this procedure once more to get a bicubic interpolation function, interpolating the influence of one depth comparison across a 4-texel-wide area.
Bicubic Interpolation
Because every cubic spline covers a 4-texel-wide area, a single lookup involves the weights of 4 surrounding cubic splines, centered at 4 depth comparison values.
a = -x^3 + 3x^2 - 3x + 1
b = 3x^3 - 6x^2 + 4
c = -3x^3 + 3x^2 + 3x + 1
d = x^3
F1 to F4 are the four depth comparisons whose cubic spline weights have a nonzero influence at the evaluation position x. Every depth comparison contributes a different 'part' of the cubic spline to the texel area where the sampling is performed. As we don't have higher-order texture lookups, we have to calculate a, b, c, d ourselves after establishing the texel offset.
s1x = a+b, o1x = b/s1x; s2x = c+d, o2x = d/s2x
After establishing the depth comparison weightings a, b, c, d, the scales and offsets can be calculated in the standard fashion using linear lookups.
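A sketch that evaluates the cubic weights from the polynomials above and folds them into two (scale, offset) pairs; since the weights sum to 6, the final result is divided by 6 (this mirrors the third-order filtering trick cited in the references, applied here to comparison results):

    #include <cassert>
    #include <cmath>
    #include <cstdio>

    int main() {
        float x = 0.4f;   // fractional position between the two middle texels

        // Cubic (B-spline) weights as given on the slide; they sum to 6.
        float a = -x*x*x + 3*x*x - 3*x + 1;   // equals (1-x)^3
        float b =  3*x*x*x - 6*x*x + 4;
        float c = -3*x*x*x + 3*x*x + 3*x + 1;
        float d =  x*x*x;

        // Fold the four weights into two linear lookups.
        float s1 = a + b, o1 = b / s1;
        float s2 = c + d, o2 = d / s2;

        // Check against the direct weighted sum for arbitrary comparison results.
        float F1 = 1, F2 = 0, F3 = 1, F4 = 0;
        float direct = (a*F1 + b*F2 + c*F3 + d*F4) / 6.0f;
        float merged = (s1 * ((1 - o1)*F1 + o1*F2) +
                        s2 * ((1 - o2)*F3 + o2*F4)) / 6.0f;

        std::printf("weights sum %.3f, direct %.4f, merged %.4f\n",
                    a + b + c + d, direct, merged);
        assert(std::fabs(direct - merged) < 1e-6f);
    }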
Bicubic Interpolation
In two dimensions, the two linear lookups in each dimension can be combined in four ways, (o,s)x1y1, (o,s)x2y1, (o,s)x1y2 and (o,s)x2y2, giving us the same result as a single theoretical bicubic interpolation.
Biquintic Interpolation
As bicubic interpolation is not smooth enough to eliminate jaggies, one might be tempted to construct a higher-order interpolation. This is indeed possible, for example a biquintic interpolation covering a 6x6 texel area.
Biquintic Interpolation
Predictably, this results in 3 texture lookups per dimension, but requires evaluating a quintic function for the weightings a, b, c, d, e, f.
Biquintic Interpolation
However, while the result is smoother, the edges are still preserved.
4x4 Quadratic Interpolation
Instead, consider a 2x2 area of samples, each using quadratic interpolation, which is smoother than linear interpolation but keeps the blurriness intact. (Caution should be exercised while calculating the weights; every quadratic kernel covers a 3-texel-wide area, with each part bounded at the positions in between texels.)
4x4 Quadratic Interpolation
Adding another sample scales certain splines and introduces new texels of influence at the border.
4x4 Quadratic Interpolation
This can be repeated as often as desired; in this case, a 4x4 biquadratic filter is constructed. Still, this only results in a 3 by 3 bilinear lookup filter after offsets and scales.
Problems
- Only applicable when samples overlap
- Only separable kernels fully optimized
- Not orthogonal to all PCF extensions
Gradient-based depth offset
Shadow kernel slope
(Figure: per-sample comparison depths d1..d5 across the kernel.)
With per-sample depths, the first sample's comparison becomes (1-x)∙F1(d1) + x∙F2(d1), where F(d) denotes the depth test against depth d.
(1-x)∙F1(d1) + x∙F2(d1) + (1-x)∙F1(d2) + x∙F2(d2)
Cap slope gradient: An unbounded gradient gives rise to aliasing problems at steep surface angles with respect to the light, which can be avoided by clamping the maximum gradient. The artifacts introduced are hidden by the ndotl lighting term.
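The slides do not spell out the exact formulation, but a hedged sketch of the idea could look as follows: estimate the receiver's depth gradient per shadowmap texel, clamp it, and offset each tap's comparison depth by the clamped gradient times the tap's texel offset (all names and the clamp threshold are illustrative).

    #include <algorithm>
    #include <cstdio>

    // Offset the comparison depth of a PCF tap at texel offset (du, dv) using a
    // clamped depth gradient (dz/du, dz/dv), expressed per shadowmap texel.
    // maxGradient caps the slope to avoid aliasing at steep angles to the light.
    static float offsetDepth(float centerDepth, float dzdu, float dzdv,
                             float du, float dv, float maxGradient) {
        float gu = std::clamp(dzdu, -maxGradient, maxGradient);
        float gv = std::clamp(dzdv, -maxGradient, maxGradient);
        return centerDepth + gu * du + gv * dv;
    }

    int main() {
        // A steep slope (0.3 depth units per texel) clamped to 0.05.
        float z = offsetDepth(0.5f, 0.3f, -0.3f, 2.0f, 1.0f, 0.05f);
        std::printf("comparison depth for tap (+2,+1): %.3f\n", z);   // 0.5 + 0.1 - 0.05
    }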
Most of the efficient PCF gradient artifacts are present at steep angles as well, which is convenient.
Higher quality shadows
Use fetch4 (DX10.1) or textureGather (GLSL) and create your own comparison function.
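A sketch of the kind of custom comparison this enables, written as plain C++ over already-gathered depths (the gather ordering and the per-texel bias are assumptions for illustration; a shader version would use textureGather/fetch4 and the fragment's fractional shadowmap coordinates):

    #include <cstdio>

    // depths[4]: the four texel depths a gather would return (order assumed here
    // as {top-left, top-right, bottom-left, bottom-right} for clarity; the real
    // textureGather component order differs and must be remapped).
    static float customCompare(const float depths[4], const float bias[4],
                               float receiverDepth, float fx, float fy) {
        float t[4];
        for (int i = 0; i < 4; ++i)
            t[i] = (receiverDepth <= depths[i] + bias[i]) ? 1.0f : 0.0f;
        float top = t[0] * (1 - fx) + t[1] * fx;
        float bot = t[2] * (1 - fx) + t[3] * fx;
        return top * (1 - fy) + bot * fy;
    }

    int main() {
        float d[4]    = {0.9f, 0.9f, 0.3f, 0.9f};
        float bias[4] = {0, 0, 0.25f, 0};   // e.g. gradient-based per-texel offsets
        std::printf("%.3f\n", customCompare(d, bias, 0.5f, 0.5f, 0.5f));
    }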
Thanks!
References
- J. Story, H. Gruen: High Quality Direct3D 10.0 & 10.1 Accelerated Techniques
- M. Bunnell, F. Pellacini: GPU Gems – Shadow Map Antialiasing
- C. Sigg, M. Hadwiger: GPU Gems 2 – Fast Third-Order Texture Filtering