1
Lecture 5 Probability and Statistics
2
Please read Doug Martinson's Chapter 3, 'Statistics', available on Courseworks.
3
Abstraction: a vector of N random variables, $\mathbf{x}$, with joint probability density $p(\mathbf{x})$, expectation $\bar{\mathbf{x}}$, and covariance $C_x$. [Figure: contours of $p(\mathbf{x})$ in the $(x_1, x_2)$ plane.] Shown as 2-D here, but actually N-dimensional.
4
The multivariate normal distribution
$$p(\mathbf{x}) = (2\pi)^{-N/2}\,|C_x|^{-1/2}\exp\{-\tfrac{1}{2}(\mathbf{x}-\bar{\mathbf{x}})^T C_x^{-1}(\mathbf{x}-\bar{\mathbf{x}})\}$$
has expectation $\bar{\mathbf{x}}$, covariance $C_x$, and is normalized to unit area.
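A minimal numerical sketch of this formula, assuming numpy and scipy are available; the 2-D mean and covariance are illustrative choices, and the result is checked against scipy's built-in density:

```python
import numpy as np
from scipy.stats import multivariate_normal

xbar = np.array([1.0, 2.0])            # expectation (illustrative)
Cx = np.array([[2.0, 0.5],
               [0.5, 1.0]])            # covariance, must be positive definite

def p(x, xbar, Cx):
    """Evaluate (2*pi)^(-N/2) |Cx|^(-1/2) exp(-1/2 (x-xbar)^T Cx^-1 (x-xbar))."""
    N = len(xbar)
    r = x - xbar
    norm = (2.0 * np.pi) ** (-N / 2) * np.linalg.det(Cx) ** (-0.5)
    return norm * np.exp(-0.5 * r @ np.linalg.solve(Cx, r))

x = np.array([0.5, 2.5])
print(p(x, xbar, Cx))                                 # the formula above
print(multivariate_normal(mean=xbar, cov=Cx).pdf(x))  # scipy agrees
```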
5
Special case of uncorrelated variables: $C_x = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \sigma_3^2, \ldots, \sigma_N^2)$ in
$$p(\mathbf{x}) = (2\pi)^{-N/2}\,|C_x|^{-1/2}\exp\{-\tfrac{1}{2}(\mathbf{x}-\bar{\mathbf{x}})^T C_x^{-1}(\mathbf{x}-\bar{\mathbf{x}})\}$$
Note that $|C_x| = \sigma_1^2\,\sigma_2^2 \cdots \sigma_N^2$ and $(\mathbf{x}-\bar{\mathbf{x}})^T C_x^{-1}(\mathbf{x}-\bar{\mathbf{x}}) = \sum_i (x_i - \bar{x}_i)^2/\sigma_i^2$, so
$$p(\mathbf{x}) = \prod_i (2\pi)^{-1/2}\,\sigma_i^{-1}\exp\{-(x_i - \bar{x}_i)^2 / 2\sigma_i^2\},$$
which is the product of N individual one-variable normal distributions.
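A quick sketch of this factorization (the means and sigmas are made-up values): with a diagonal $C_x$, the joint density equals the product of N one-variable normals.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

xbar = np.array([0.0, 1.0, -2.0])      # illustrative means
sig = np.array([1.0, 0.5, 2.0])        # illustrative sigma_i
Cx = np.diag(sig ** 2)                 # diagonal (uncorrelated) covariance

x = np.array([0.3, 1.2, -1.0])
joint = multivariate_normal(mean=xbar, cov=Cx).pdf(x)
product = np.prod(norm.pdf(x, loc=xbar, scale=sig))   # product of 1-D normals
print(joint, product)                                 # the two values agree
```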
6
How would you show that this distribution
$$p(\mathbf{x}) = (2\pi)^{-N/2}\,|C_x|^{-1/2}\exp\{-\tfrac{1}{2}(\mathbf{x}-\bar{\mathbf{x}})^T C_x^{-1}(\mathbf{x}-\bar{\mathbf{x}})\}$$
really has expectation $\bar{\mathbf{x}}$ and covariance $C_x$?
7
How would you prove this? Do you remember how to transform an integral from $\mathbf{x}$ to $\mathbf{y}$?
$$\int\cdots\int p(\mathbf{x})\,d^N x = \int\cdots\int \;?\;\,d^N y$$
8
Given $\mathbf{y}(\mathbf{x})$, then
$$\int\cdots\int p(\mathbf{x})\,d^N x = \int\cdots\int p[\mathbf{x}(\mathbf{y})]\,\left|\frac{d\mathbf{x}}{d\mathbf{y}}\right|\,d^N y$$
where $|d\mathbf{x}/d\mathbf{y}|$ is the Jacobian determinant, that is, the determinant of the matrix $J$ whose elements are $J_{ij} = \partial x_i/\partial y_j$. The transformed integrand $p[\mathbf{x}(\mathbf{y})]\,|d\mathbf{x}/d\mathbf{y}|$ is $p(\mathbf{y})$.
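A 1-D numerical check of this rule, assuming scipy is available; the density and the linear substitution $x = 2y + 1$ are arbitrary illustrative choices.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

p = norm(loc=0, scale=1).pdf      # a 1-D normal as the example p(x)

def x_of_y(y):                    # an arbitrary linear substitution, x = 2y + 1
    return 2 * y + 1

jac = 2.0                         # |dx/dy| for this substitution

lhs, _ = quad(p, -np.inf, np.inf)
rhs, _ = quad(lambda y: p(x_of_y(y)) * jac, -np.inf, np.inf)
print(lhs, rhs)                   # both equal 1: the transformed integral is unchanged
```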
9
Here's how you prove the expectation. Insert $p(\mathbf{x})$ into the usual formula for expectation:
$$E(\mathbf{x}) = (2\pi)^{-N/2}\,|C_x|^{-1/2}\int\cdots\int \mathbf{x}\,\exp\{-\tfrac{1}{2}(\mathbf{x}-\bar{\mathbf{x}})^T C_x^{-1}(\mathbf{x}-\bar{\mathbf{x}})\}\,d^N x$$
Now use the transformation $\mathbf{y} = C_x^{-1/2}(\mathbf{x}-\bar{\mathbf{x}})$, noting that the Jacobian determinant is $|C_x|^{1/2}$:
$$E(\mathbf{x}) = (2\pi)^{-N/2}\int\cdots\int (\bar{\mathbf{x}} + C_x^{1/2}\mathbf{y})\,\exp\{-\tfrac{1}{2}\mathbf{y}^T\mathbf{y}\}\,d^N y = \bar{\mathbf{x}}\int\cdots\int (2\pi)^{-N/2}\exp\{-\tfrac{1}{2}\mathbf{y}^T\mathbf{y}\}\,d^N y + (2\pi)^{-N/2}\,C_x^{1/2}\int\cdots\int \mathbf{y}\,\exp\{-\tfrac{1}{2}\mathbf{y}^T\mathbf{y}\}\,d^N y$$
The first integral is the area under an N-dimensional Gaussian, which is just unity. The second integral contains an odd function of $\mathbf{y}$ times an even function, and so is zero. Thus $E(\mathbf{x}) = \bar{\mathbf{x}}\cdot 1 + 0 = \bar{\mathbf{x}}$.
10
I've never tried to prove the covariance… but how much harder could it be?
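Not a proof, but a quick Monte Carlo check of both claims (the mean, covariance, and sample size here are illustrative): draws from $N(\bar{\mathbf{x}}, C_x)$ should have sample mean near $\bar{\mathbf{x}}$ and sample covariance near $C_x$.

```python
import numpy as np

rng = np.random.default_rng(0)
xbar = np.array([1.0, -2.0])           # illustrative expectation
Cx = np.array([[1.0, 0.5],
               [0.5, 2.0]])            # illustrative covariance

samples = rng.multivariate_normal(xbar, Cx, size=100_000)
print(samples.mean(axis=0))            # close to xbar: the expectation result
print(np.cov(samples, rowvar=False))   # close to Cx: the covariance result
```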
11
Examples
12
$\bar{\mathbf{x}} = \begin{pmatrix}2\\1\end{pmatrix}$, $C_x = \begin{pmatrix}1 & 0\\0 & 1\end{pmatrix}$. [Figure: contour plot of $p(x,y)$.]
13
$\bar{\mathbf{x}} = \begin{pmatrix}2\\1\end{pmatrix}$, $C_x = \begin{pmatrix}2 & 0\\0 & 1\end{pmatrix}$. [Figure: contour plot of $p(x,y)$.]
14
$\bar{\mathbf{x}} = \begin{pmatrix}2\\1\end{pmatrix}$, $C_x = \begin{pmatrix}1 & 0\\0 & 2\end{pmatrix}$. [Figure: contour plot of $p(x,y)$.]
15
$\bar{\mathbf{x}} = \begin{pmatrix}2\\1\end{pmatrix}$, $C_x = \begin{pmatrix}1 & 0.5\\0.5 & 1\end{pmatrix}$. [Figure: contour plot of $p(x,y)$.]
16
$\bar{\mathbf{x}} = \begin{pmatrix}2\\1\end{pmatrix}$, $C_x = \begin{pmatrix}1 & -0.5\\-0.5 & 1\end{pmatrix}$. [Figure: contour plot of $p(x,y)$.]
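A sketch that reproduces these five example plots, assuming matplotlib and scipy are available and using the mean and covariance matrices reconstructed above:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

xbar = [2.0, 1.0]                       # the mean used in the examples above
covs = [[[1, 0], [0, 1]],
        [[2, 0], [0, 1]],
        [[1, 0], [0, 2]],
        [[1, 0.5], [0.5, 1]],
        [[1, -0.5], [-0.5, 1]]]         # the five covariance matrices

x1, x2 = np.meshgrid(np.linspace(-2, 6, 200), np.linspace(-3, 5, 200))
grid = np.dstack((x1, x2))
fig, axes = plt.subplots(1, 5, figsize=(15, 3), sharey=True)
for ax, C in zip(axes, covs):
    ax.contour(x1, x2, multivariate_normal(xbar, C).pdf(grid))
    ax.set_title(f"Cx = {C}")
plt.show()
```

Note how scaling a diagonal entry stretches the contours along one axis, while a nonzero off-diagonal entry tilts them.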
17
Remember this from last lecture? [Figure: joint density $p(x_1, x_2)$ with its two marginals.] $p(x_1) = \int p(x_1, x_2)\,dx_2$ is the distribution of $x_1$ (irrespective of $x_2$), and $p(x_2) = \int p(x_1, x_2)\,dx_1$ is the distribution of $x_2$ (irrespective of $x_1$).
18
[Figure: the joint $p(x,y)$ and the marginal $p(y)$.] $p(y) = \int p(x,y)\,dx$
19
[Figure: the joint $p(x,y)$ and the marginal $p(x)$.] $p(x) = \int p(x,y)\,dy$
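A numerical version of these marginalization formulas (the joint density and grid are illustrative): approximate $p(x) = \int p(x,y)\,dy$ by summing a gridded joint density.

```python
import numpy as np
from scipy.stats import multivariate_normal

x = np.linspace(-5, 5, 401)
y = np.linspace(-5, 5, 401)
dx, dy = x[1] - x[0], y[1] - y[0]
X, Y = np.meshgrid(x, y, indexing="ij")      # pxy[i, j] = p(x_i, y_j)
pxy = multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]]).pdf(np.dstack((X, Y)))

px = pxy.sum(axis=1) * dy    # p(x): integrate over y
py = pxy.sum(axis=0) * dx    # p(y): integrate over x
print(px.sum() * dx, py.sum() * dy)   # both ~1: each marginal has unit area
```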
20
Remember $p(x,y) = p(x|y)\,p(y) = p(y|x)\,p(x)$ from the last lecture? We can compute $p(x|y)$ and $p(y|x)$ as follows: $p(x|y) = p(x,y)/p(y)$ and $p(y|x) = p(x,y)/p(x)$.
21
[Figure: the joint $p(x,y)$ and the conditionals $p(x|y)$ and $p(y|x)$.]
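Continuing the gridded sketch above, the conditionals follow by dividing the joint by the appropriate marginal, $p(x|y) = p(x,y)/p(y)$ (same illustrative density and grid):

```python
import numpy as np
from scipy.stats import multivariate_normal

x = np.linspace(-5, 5, 401)
y = np.linspace(-5, 5, 401)
dx, dy = x[1] - x[0], y[1] - y[0]
X, Y = np.meshgrid(x, y, indexing="ij")      # pxy[i, j] = p(x_i, y_j)
pxy = multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]]).pdf(np.dstack((X, Y)))

py = pxy.sum(axis=0) * dx                # marginal p(y)
p_x_given_y = pxy / py[np.newaxis, :]    # p(x|y): each column is a density in x
print(p_x_given_y.sum(axis=0)[200] * dx) # each column integrates to ~1
```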
22
Any linear function of a normal distribution is a normal distribution: if
$$p(\mathbf{x}) = (2\pi)^{-N/2}\,|C_x|^{-1/2}\exp\{-\tfrac{1}{2}(\mathbf{x}-\bar{\mathbf{x}})^T C_x^{-1}(\mathbf{x}-\bar{\mathbf{x}})\}$$
and $\mathbf{y} = M\mathbf{x}$, then
$$p(\mathbf{y}) = (2\pi)^{-N/2}\,|C_y|^{-1/2}\exp\{-\tfrac{1}{2}(\mathbf{y}-\bar{\mathbf{y}})^T C_y^{-1}(\mathbf{y}-\bar{\mathbf{y}})\}$$
with $\bar{\mathbf{y}} = M\bar{\mathbf{x}}$ and $C_y = M C_x M^T$.
23
The proof needs the rules $[AB]^{-1} = B^{-1}A^{-1}$, $|AB| = |A||B|$, and $|A^{-1}| = |A|^{-1}$. Start from
$$p(\mathbf{x}) = (2\pi)^{-N/2}\,|C_x|^{-1/2}\exp\{-\tfrac{1}{2}(\mathbf{x}-\bar{\mathbf{x}})^T C_x^{-1}(\mathbf{x}-\bar{\mathbf{x}})\}$$
The transformation is $p(\mathbf{y}) = p[\mathbf{x}(\mathbf{y})]\,|d\mathbf{x}/d\mathbf{y}|$. Substitute in $\mathbf{x} = M^{-1}\mathbf{y}$ and the Jacobian determinant $|d\mathbf{x}/d\mathbf{y}| = |M^{-1}|$, inserting the identities $M^T M^{-T} = I$ and $M^{-1}M = I$ into the exponent:
$$p[\mathbf{x}(\mathbf{y})]\,|d\mathbf{x}/d\mathbf{y}| = (2\pi)^{-N/2}\,|C_x|^{-1/2}\exp\{-\tfrac{1}{2}(\mathbf{x}-\bar{\mathbf{x}})^T M^T M^{-T} C_x^{-1} M^{-1} M\,(\mathbf{x}-\bar{\mathbf{x}})\}\,|M^{-1}|$$
24
Since $|M^{-1}| = |M|^{-1}$, the prefactor regroups as $|M|^{-1/2}\,|C_x|^{-1/2}\,|M^T|^{-1/2} = |M C_x M^T|^{-1/2} = |C_y|^{-1/2}$, and the exponent as $[M(\mathbf{x}-\bar{\mathbf{x}})]^T [M C_x M^T]^{-1} [M(\mathbf{x}-\bar{\mathbf{x}})] = (\mathbf{y}-\bar{\mathbf{y}})^T C_y^{-1}(\mathbf{y}-\bar{\mathbf{y}})$. So
$$p(\mathbf{y}) = (2\pi)^{-N/2}\,|C_y|^{-1/2}\exp\{-\tfrac{1}{2}(\mathbf{y}-\bar{\mathbf{y}})^T C_y^{-1}(\mathbf{y}-\bar{\mathbf{y}})\}$$
25
Note that these rules work for the multivariate normal distribution: if $\mathbf{y}$ is linearly related to $\mathbf{x}$ by $\mathbf{y} = M\mathbf{x}$, then $\bar{\mathbf{y}} = M\bar{\mathbf{x}}$ (rule for means) and $C_y = M C_x M^T$ (rule for propagating error).
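A Monte Carlo check of both rules ($M$, $\bar{\mathbf{x}}$, and $C_x$ are illustrative): for $\mathbf{y} = M\mathbf{x}$, the sample mean of $\mathbf{y}$ approaches $M\bar{\mathbf{x}}$ and its sample covariance approaches $M C_x M^T$.

```python
import numpy as np

rng = np.random.default_rng(2)
xbar = np.array([1.0, 2.0])            # illustrative mean
Cx = np.array([[1.0, 0.3],
               [0.3, 0.5]])            # illustrative covariance
M = np.array([[2.0, 1.0],
              [0.0, 3.0]])             # illustrative linear map

xs = rng.multivariate_normal(xbar, Cx, size=200_000)
ys = xs @ M.T                          # y = M x, one row per sample
print(ys.mean(axis=0), M @ xbar)       # rule for means
print(np.cov(ys, rowvar=False))        # rule for propagating error ...
print(M @ Cx @ M.T)                    # ... matches M Cx M^T
```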
26
Do you remember this from a previous lecture? If $\mathbf{d} = G\mathbf{m}$, then the standard least-squares solution is $\mathbf{m}^{est} = [G^T G]^{-1} G^T \mathbf{d}$.
27
Let's suppose the data, $\mathbf{d}$, are uncorrelated and that they all have the same variance, $C_d = \sigma_d^2 I$. To compute the variance of $\mathbf{m}^{est}$, note that $\mathbf{m}^{est} = [G^T G]^{-1} G^T \mathbf{d}$ is a linear rule of the form $\mathbf{m} = M\mathbf{d}$, with $M = [G^T G]^{-1} G^T$, so we can apply the rule $C_m = M C_d M^T$.
28
With $M = [G^T G]^{-1} G^T$:
$$C_m = M C_d M^T = \{[G^T G]^{-1} G^T\}\,\sigma_d^2 I\,\{[G^T G]^{-1} G^T\}^T = \sigma_d^2\,[G^T G]^{-1} G^T G\,[G^T G]^{-T} = \sigma_d^2\,[G^T G]^{-T} = \sigma_d^2\,[G^T G]^{-1}$$
since $G^T G$ is a symmetric matrix, so its inverse is symmetric, too. Memorize!
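A numeric sketch of the memorized rule ($G$, $\sigma_d$, and the true model are made up): build $C_m = \sigma_d^2 [G^T G]^{-1}$ and compare it with the scatter of repeated estimates from noisy data.

```python
import numpy as np

rng = np.random.default_rng(3)
N, sigma_d = 50, 0.5
G = np.column_stack((np.ones(N), np.linspace(0, 10, N)))   # an illustrative G
Cm = sigma_d ** 2 * np.linalg.inv(G.T @ G)                 # the rule

m_true = np.array([1.0, 2.0])
ests = np.array([np.linalg.inv(G.T @ G) @ G.T
                 @ (G @ m_true + rng.normal(0, sigma_d, N))
                 for _ in range(5_000)])                   # repeated experiments
print(Cm)
print(np.cov(ests, rowvar=False))    # empirical covariance of m_est agrees
```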
29
Example: all the data are assumed to have the same true value, $m_1$, and each is measured with the same variance, $\sigma_d^2$:
$$\begin{pmatrix} d_1 \\ d_2 \\ d_3 \\ \vdots \\ d_N \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} m_1$$
Here $G$ is the column of ones, so $G^T G = N$, $[G^T G]^{-1} = N^{-1}$, and $G^T \mathbf{d} = \sum_i d_i$. Thus $m^{est} = [G^T G]^{-1} G^T \mathbf{d} = (\sum_i d_i)/N$ and $C_m = \sigma_d^2/N$.
30
$m_1^{est} = (\sum_i d_i)/N$ … the traditional formula for the mean! The estimated mean has variance $C_m = \sigma_d^2/N = \sigma_m^2$; note then that $\sigma_m = \sigma_d/\sqrt{N}$. The estimated mean is a normally distributed random variable, and the width of this distribution, $\sigma_m$, decreases with the square root of the number of measurements.
31
Accuracy grows only slowly with N. [Figure: distributions of the estimated mean for N = 1, 10, 100, 1000.]
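The $\sigma_d/\sqrt{N}$ behavior can be checked directly (the true value and $\sigma_d$ are illustrative): means of N measurements scatter with spread $\sigma_d/\sqrt{N}$.

```python
import numpy as np

rng = np.random.default_rng(4)
m_true, sigma_d = 5.0, 1.0             # illustrative true value and data error
for N in (1, 10, 100, 1000):
    # 10,000 repeated experiments, each averaging N measurements
    means = rng.normal(m_true, sigma_d, size=(10_000, N)).mean(axis=1)
    print(N, means.std(), sigma_d / np.sqrt(N))   # the last two columns agree
```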
32
Another example: fitting a straight line, with all the data assumed to have the same variance, $\sigma_d^2$:
$$\begin{pmatrix} d_1 \\ d_2 \\ d_3 \\ \vdots \\ d_N \end{pmatrix} = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ 1 & x_3 \\ \vdots & \vdots \\ 1 & x_N \end{pmatrix} \begin{pmatrix} m_1 \\ m_2 \end{pmatrix}$$
Here
$$G^T G = \begin{pmatrix} N & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{pmatrix}, \qquad C_m = \sigma_d^2\,[G^T G]^{-1} = \frac{\sigma_d^2}{N\sum_i x_i^2 - [\sum_i x_i]^2}\begin{pmatrix} \sum_i x_i^2 & -\sum_i x_i \\ -\sum_i x_i & N \end{pmatrix}$$
33
From the diagonal of $C_m$:
$$\sigma_{intercept}^2 = \frac{\sigma_d^2\,\sum_i x_i^2}{N\sum_i x_i^2 - [\sum_i x_i]^2}, \qquad \sigma_{slope}^2 = \frac{N\,\sigma_d^2}{N\sum_i x_i^2 - [\sum_i x_i]^2}$$
Here $\sigma_{intercept}$ is the standard error of the intercept, and the 95% confidence intervals are: intercept, $m_1^{est} \pm 2\sigma_{intercept}$; slope, $m_2^{est} \pm 2\sigma_{slope}$.
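The straight-line formulas in code (synthetic x's and data, with $\sigma_d$ assumed known a priori): the diagonal of $C_m$ gives $\sigma_{intercept}^2$ and $\sigma_{slope}^2$ for the 95% error bars.

```python
import numpy as np

rng = np.random.default_rng(5)
N, sigma_d = 30, 0.5
xs = np.linspace(0, 10, N)
G = np.column_stack((np.ones(N), xs))              # straight-line G
d = (1.0 + 2.0 * xs) + rng.normal(0, sigma_d, N)   # true intercept 1, slope 2

m_est = np.linalg.inv(G.T @ G) @ G.T @ d           # least-squares solution
Cm = sigma_d ** 2 * np.linalg.inv(G.T @ G)         # covariance of the estimate
sig_int, sig_slope = np.sqrt(np.diag(Cm))
print(f"intercept: {m_est[0]:.3f} +/- {2 * sig_int:.3f} (95%)")
print(f"slope:     {m_est[1]:.3f} +/- {2 * sig_slope:.3f} (95%)")
```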
34
Beware! The 95% confidence intervals, intercept $m_1 \pm 2\sigma_{intercept}$ and slope $m_2 \pm 2\sigma_{slope}$, are probabilities of $m_1$ irrespective of the value of $m_2$, and of $m_2$ irrespective of the value of $m_1$, not the joint probability of $m_1$ and $m_2$ taken together.
35
[Figure: $p(m_1, m_2)$ with the band $m_2^{est} \pm 2\sigma_2$ shaded.] The probability that $m_2$ is in this box is 95%.
36
[Figure: $p(m_1, m_2)$ with the band $m_1^{est} \pm 2\sigma_1$ shaded.] The probability that $m_1$ is in this box is 95%.
37
[Figure: $p(m_1, m_2)$ with the rectangle $m_1^{est} \pm 2\sigma_1$, $m_2^{est} \pm 2\sigma_2$ shaded.] The probability that both $m_1$ and $m_2$ are in this box is less than 95%.
38
Intercept and slope are uncorrelated only when $\sum_i x_i = 0$, that is, when the mean of the x's is zero, which occurs when the data straddle the origin (remember this discussion from a few lectures ago?). Recall
$$C_m = \sigma_d^2\,[G^T G]^{-1} = \frac{\sigma_d^2}{N\sum_i x_i^2 - [\sum_i x_i]^2}\begin{pmatrix} \sum_i x_i^2 & -\sum_i x_i \\ -\sum_i x_i & N \end{pmatrix}$$
39
What $\sigma_d^2$ do you use in these formulas?
40
Prior estimates of $\sigma_d$, based on knowledge of the limits of your measuring technique. For example: my ruler has only mm tics, so I'm going to assume that $\sigma_d = 0.5$ mm; or, the manufacturer claims that the instrument is accurate to 0.1%, so since my typical measurement is 25, I'll assume $\sigma_d = 0.025$.
41
Posterior estimate of the error, based on the error measured with respect to the best fit: $\sigma_d^2 = (1/N)\sum_i (d_i^{obs} - d_i^{pre})^2 = (1/N)\sum_i e_i^2$.
42
Dangerous… because it assumes that the model ("a straight line") accurately represents the behavior of the data. Maybe the data really followed an exponential curve…
43
One refinement to the formula $\sigma_d^2 = (1/N)\sum_i (d_i^{obs} - d_i^{pre})^2$ has to do with the appearance of N, the number of data. [Figure: two small x-y scatter plots with fitted lines.] If there were only two data, then the best-fitting straight line would have no error at all. If there were only three data, then the best-fitting straight line would likely have just a little error.
44
Therefore the formula $\sigma_d^2 = (1/N)\sum_i (d_i^{obs} - d_i^{pre})^2$ very likely underestimates the error. An improved formula replaces N with N - 2:
$$\sigma_d^2 = \frac{1}{N-2}\sum_i (d_i^{obs} - d_i^{pre})^2$$
where the "2" is chosen because two points exactly define a straight line.
45
More generally, if there are M model parameters, then the formula is
$$\sigma_d^2 = \frac{1}{N-M}\sum_i (d_i^{obs} - d_i^{pre})^2$$
The quantity N - M is often called the number of degrees of freedom.
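The posterior error estimate with the degrees-of-freedom correction in code, continuing the synthetic straight-line fit from earlier (so M = 2 here): divide the summed squared residuals by N - M rather than N.

```python
import numpy as np

rng = np.random.default_rng(6)
N, M, sigma_d = 30, 2, 0.5
xs = np.linspace(0, 10, N)
G = np.column_stack((np.ones(N), xs))
d = (1.0 + 2.0 * xs) + rng.normal(0, sigma_d, N)   # synthetic noisy line

m_est = np.linalg.inv(G.T @ G) @ G.T @ d
e = d - G @ m_est                      # residuals, d_obs - d_pre
var_post = (e @ e) / (N - M)           # sigma_d^2 estimate with N - M dof
print(np.sqrt(var_post), sigma_d)      # posterior estimate vs. the true value
```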