
1 Lecture 5 Probability and Statistics

2 Please Read Doug Martinson’s Chapter 3: ‘Statistics’ Available on Courseworks

3 Abstraction: a vector of N random variables, x, with joint probability density p(x), expectation x̄, and covariance C_x. [Figure: contours of p(x) in the (x_1, x_2) plane.] Shown as 2D here, but actually N-dimensional.

4 The multivariate normal distribution p(x) = (2π)^(-N/2) |C_x|^(-1/2) exp{ -(1/2) (x - x̄)^T C_x^(-1) (x - x̄) } has expectation x̄ and covariance C_x, and is normalized to unit area.

5 Special (uncorrelated) case: C_x = diag( σ_1², σ_2², …, σ_N² ). Note |C_x| = σ_1² σ_2² … σ_N² and (x - x̄)^T C_x^(-1) (x - x̄) = Σ_i (x_i - x̄_i)² / σ_i². So p(x) = Π_i (2π)^(-1/2) σ_i^(-1) exp{ -(x_i - x̄_i)² / (2 σ_i²) }, which is the product of N individual one-variable normal distributions.
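
A quick numerical check of this factorization, as a minimal numpy sketch; the mean vector, standard deviations, and evaluation point below are made-up values, not from the lecture:

```python
import numpy as np

def mvn_pdf(x, xbar, Cx):
    """p(x) = (2*pi)^(-N/2) |Cx|^(-1/2) exp{ -1/2 (x-xbar)^T Cx^(-1) (x-xbar) }."""
    N = len(xbar)
    r = x - xbar
    return (2 * np.pi) ** (-N / 2) * np.linalg.det(Cx) ** (-0.5) * \
        np.exp(-0.5 * r @ np.linalg.inv(Cx) @ r)

# hypothetical example: diagonal (uncorrelated) covariance
xbar = np.array([2.0, 1.0, 0.0])
sigma = np.array([1.0, 2.0, 0.5])
Cx = np.diag(sigma ** 2)
x = np.array([1.5, 2.0, -0.3])

# product of N one-variable normal densities
p_1d = np.prod((2 * np.pi) ** -0.5 / sigma * np.exp(-(x - xbar) ** 2 / (2 * sigma ** 2)))

print(mvn_pdf(x, xbar, Cx), p_1d)   # the two values agree
```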

6 How would you show that this distribution p(x) = (2π)^(-N/2) |C_x|^(-1/2) exp{ -(1/2) (x - x̄)^T C_x^(-1) (x - x̄) } really has expectation x̄ and covariance C_x?

7 How would you prove this? Do you remember how to transform an integral from x to y? ∫…∫ p(x) d^N x = ∫…∫ ? d^N y

8 Given y(x), then ∫…∫ p(x) d^N x = ∫…∫ p[x(y)] |dx/dy| d^N y, where |dx/dy| is the Jacobian determinant, that is, the determinant of the matrix J_ij whose elements are dx_i/dy_j, and p[x(y)] |dx/dy| is the transformed density p(y).

9 Here's how you prove the expectation. Insert p(x) into the usual formula for the expectation:
E(x) = (2π)^(-N/2) |C_x|^(-1/2) ∫…∫ x exp{ -(1/2) (x - x̄)^T C_x^(-1) (x - x̄) } d^N x.
Now use the transformation y = C_x^(-1/2) (x - x̄), noting that the Jacobian determinant is |C_x|^(1/2):
E(x) = (2π)^(-N/2) ∫…∫ (x̄ + C_x^(1/2) y) exp{ -(1/2) y^T y } d^N y
= x̄ ∫…∫ (2π)^(-N/2) exp{ -(1/2) y^T y } d^N y + (2π)^(-N/2) C_x^(1/2) ∫…∫ y exp{ -(1/2) y^T y } d^N y.
The first integral is the area under an N-dimensional Gaussian, which is just unity. The second integral contains an odd function of y times an even function, and so is zero. Thus E(x) = x̄ · 1 + 0 = x̄.
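
For intuition (and as a stand-in for the covariance proof on the next slide), both claims can also be checked by Monte Carlo: draw many samples from p(x) and compare the sample mean and sample covariance with x̄ and C_x. A small numpy sketch with arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)
xbar = np.array([2.0, 1.0])
Cx = np.array([[1.0, 0.5],
               [0.5, 1.0]])

# draw many samples from the multivariate normal p(x)
samples = rng.multivariate_normal(xbar, Cx, size=200_000)

print(samples.mean(axis=0))           # approximately xbar
print(np.cov(samples, rowvar=False))  # approximately Cx
```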

10 I've never tried to prove the covariance … but how much harder could it be?

11 examples

12 Example: x̄ = [2, 1]^T, C_x = [1 0; 0 1]. [Plot of p(x,y).]

13 Example: x̄ = [2, 1]^T, C_x = [2 0; 0 1]. [Plot of p(x,y).]

14 Example: x̄ = [2, 1]^T, C_x = [1 0; 0 2]. [Plot of p(x,y).]

15 Example: x̄ = [2, 1]^T, C_x = [1 0.5; 0.5 1]. [Plot of p(x,y).]

16 Example: x̄ = [2, 1]^T, C_x = [1 -0.5; -0.5 1]. [Plot of p(x,y).]

17 Remember this from last lecture? [Figure: the joint density p(x_1, x_2) and its two marginals.] p(x_1) = ∫ p(x_1, x_2) dx_2 is the distribution of x_1 (irrespective of x_2); p(x_2) = ∫ p(x_1, x_2) dx_1 is the distribution of x_2 (irrespective of x_1).

18 [Figure: the joint density p(x,y) and the marginal p(y).] p(y) = ∫ p(x,y) dx

19 [Figure: the joint density p(x,y) and the marginal p(x).] p(x) = ∫ p(x,y) dy

20 Remember p(x,y) = p(x|y) p(y) = p(y|x) p(x) from the last lecture? We can compute p(x|y) and p(y|x) as follows: p(x|y) = p(x,y) / p(y) and p(y|x) = p(x,y) / p(x).
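
These relations are straightforward to apply numerically when p(x,y) is tabulated on a grid. A sketch, assuming a correlated bivariate normal like the earlier examples (the grid, mean, and correlation are illustrative choices):

```python
import numpy as np

# evaluate a bivariate normal p(x,y) on a grid
x = np.linspace(-2, 6, 201)
y = np.linspace(-3, 5, 201)
dx, dy = x[1] - x[0], y[1] - y[0]
X, Y = np.meshgrid(x, y, indexing="ij")

xbar, ybar, sx, sy, rho = 2.0, 1.0, 1.0, 1.0, 0.5
z = ((X - xbar)**2 / sx**2 - 2*rho*(X - xbar)*(Y - ybar)/(sx*sy) + (Y - ybar)**2 / sy**2)
p_xy = np.exp(-z / (2 * (1 - rho**2))) / (2 * np.pi * sx * sy * np.sqrt(1 - rho**2))

# marginals: integrate out the other variable
p_x = p_xy.sum(axis=1) * dy      # p(x) = ∫ p(x,y) dy
p_y = p_xy.sum(axis=0) * dx      # p(y) = ∫ p(x,y) dx

# conditionals: divide the joint by the appropriate marginal
p_x_given_y = p_xy / p_y[np.newaxis, :]   # p(x|y) = p(x,y) / p(y)
p_y_given_x = p_xy / p_x[:, np.newaxis]   # p(y|x) = p(x,y) / p(x)

print(p_x_given_y[:, 100].sum() * dx)  # each conditional slice integrates to ~1
```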

21 [Figure: the joint density p(x,y) and the conditional distributions p(x|y) and p(y|x).]

22 Any linear function of a normally distributed variable is normally distributed. If p(x) = (2π)^(-N/2) |C_x|^(-1/2) exp{ -(1/2) (x - x̄)^T C_x^(-1) (x - x̄) } and y = Mx, then p(y) = (2π)^(-N/2) |C_y|^(-1/2) exp{ -(1/2) (y - ȳ)^T C_y^(-1) (y - ȳ) } with ȳ = M x̄ and C_y = M C_x M^T.

23 The proof needs the rules [AB]^(-1) = B^(-1) A^(-1), |AB| = |A| |B|, and |A^(-1)| = |A|^(-1).
Start from p(x) = (2π)^(-N/2) |C_x|^(-1/2) exp{ -(1/2) (x - x̄)^T C_x^(-1) (x - x̄) }.
The transformation is p(y) = p[x(y)] |dx/dy|. Substitute in x = M^(-1) y and the Jacobian determinant |dx/dy| = |M^(-1)|:
p[x(y)] |dx/dy| = (2π)^(-N/2) |C_x|^(-1/2) exp{ -(1/2) (x - x̄)^T M^T M^(-T) C_x^(-1) M^(-1) M (x - x̄) } |M^(-1)|,
where the inserted factors M^T M^(-T) and M^(-1) M are each the identity I.

24 Continuing, p[x(y)] |dx/dy| = (2π)^(-N/2) |C_x|^(-1/2) |M^(-1)| exp{ -(1/2) (x - x̄)^T M^T M^(-T) C_x^(-1) M^(-1) M (x - x̄) }.
The determinant factor is |C_x|^(-1/2) |M^(-1)| = |M|^(-1/2) |C_x|^(-1/2) |M^T|^(-1/2) = |M C_x M^T|^(-1/2) = |C_y|^(-1/2), and the exponent is -(1/2) [M(x - x̄)]^T [M C_x M^T]^(-1) [M(x - x̄)] = -(1/2) (y - ȳ)^T C_y^(-1) (y - ȳ).
So p(y) = (2π)^(-N/2) |C_y|^(-1/2) exp{ -(1/2) (y - ȳ)^T C_y^(-1) (y - ȳ) }.

25 Note that these rules work for the multivariate normal distribution: if y is linearly related to x by y = Mx, then ȳ = M x̄ (rule for means) and C_y = M C_x M^T (rule for propagating error).
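
A minimal numerical illustration of both rules, with a Monte Carlo check; M, x̄, and C_x below are made-up values:

```python
import numpy as np

rng = np.random.default_rng(1)
xbar = np.array([2.0, 1.0])
Cx = np.array([[1.0, 0.5],
               [0.5, 2.0]])
M = np.array([[1.0, 1.0],
              [2.0, -1.0],
              [0.0, 3.0]])   # y = Mx maps 2 variables to 3

# rule for means and rule for propagating error
ybar = M @ xbar
Cy = M @ Cx @ M.T

# Monte Carlo check: transform samples of x and compare
x_samples = rng.multivariate_normal(xbar, Cx, size=100_000)
y_samples = x_samples @ M.T
print(ybar, y_samples.mean(axis=0))        # sample mean approximates M xbar
print(Cy)
print(np.cov(y_samples, rowvar=False))     # sample covariance approximates M Cx M^T
```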

26 Do you remember this from a previous lecture? If d = G m, then the standard least-squares solution is m_est = [G^T G]^(-1) G^T d.

27 Let's suppose the data, d, are uncorrelated and that they all have the same variance, C_d = σ_d² I. To compute the variance of m_est, note that m_est = [G^T G]^(-1) G^T d is a linear rule of the form m = Md, with M = [G^T G]^(-1) G^T, so we can apply the rule C_m = M C_d M^T.

28 With M = [G^T G]^(-1) G^T:
C_m = M C_d M^T = {[G^T G]^(-1) G^T} σ_d² I {[G^T G]^(-1) G^T}^T = σ_d² [G^T G]^(-1) G^T G ([G^T G]^(-1))^T = σ_d² ([G^T G]^(-1))^T = σ_d² [G^T G]^(-1).
(G^T G is a symmetric matrix, so its inverse is symmetric, too.)
Memorize: C_m = σ_d² [G^T G]^(-1).
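
A sketch of this rule in numpy: build C_m = σ_d² [G^T G]^(-1) and compare it with the empirical covariance of m_est over many repeated synthetic experiments (G, m_true, and σ_d are illustrative values, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(2)

# hypothetical experiment: d = G m_true + noise, noise uncorrelated, variance sigma_d^2
x = np.linspace(0.0, 10.0, 50)
G = np.column_stack([np.ones_like(x), x])
m_true = np.array([1.0, 0.5])
sigma_d = 0.3

GTG_inv = np.linalg.inv(G.T @ G)
C_m = sigma_d ** 2 * GTG_inv            # the rule to memorize

# check by repeating the experiment many times
ests = []
for _ in range(20_000):
    d = G @ m_true + sigma_d * rng.standard_normal(x.size)
    ests.append(GTG_inv @ G.T @ d)      # m_est = [G^T G]^(-1) G^T d
ests = np.array(ests)

print(C_m)
print(np.cov(ests, rowvar=False))       # empirical covariance, approximately C_m
```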

29 Example: all the data are assumed to have the same true value, m_1, and each is measured with the same variance, σ_d². Then d = G m_1 with G = [1, 1, 1, …, 1]^T. G^T G = N, so [G^T G]^(-1) = N^(-1), and G^T d = Σ_i d_i. Hence m_est = [G^T G]^(-1) G^T d = (Σ_i d_i) / N and C_m = σ_d² / N.

30 m_1_est = (Σ_i d_i) / N … the traditional formula for the mean! The estimated mean has variance C_m = σ_d² / N = σ_m², so σ_m = σ_d / √N. The estimated mean is a normally distributed random variable, and the width of its distribution, σ_m, decreases with the square root of the number of measurements.

31 Accuracy grows only slowly with N. [Figure: distributions of the estimated mean for N = 1, 10, 100, 1000.]
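
The same 1/√N scaling is easy to reproduce by simulation; a small numpy sketch with σ_d = 1 and the values of N from the figure:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma_d = 1.0

for N in (1, 10, 100, 1000):
    # many repeated experiments, each averaging N measurements
    means = rng.normal(0.0, sigma_d, size=(50_000, N)).mean(axis=1)
    print(N, means.std(), sigma_d / np.sqrt(N))   # empirical vs predicted sigma_m
```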

32 Another example: fitting a straight line, with all the data assumed to have the same variance, σ_d². The model is d_i = m_1 + m_2 x_i, so the rows of G are [1, x_i]. Then G^T G = [ N, Σ_i x_i ; Σ_i x_i, Σ_i x_i² ] and C_m = σ_d² [G^T G]^(-1) = ( σ_d² / ( N Σ_i x_i² - [Σ_i x_i]² ) ) [ Σ_i x_i², -Σ_i x_i ; -Σ_i x_i, N ].

33 From C_m = σ_d² [G^T G]^(-1) = ( σ_d² / ( N Σ_i x_i² - [Σ_i x_i]² ) ) [ Σ_i x_i², -Σ_i x_i ; -Σ_i x_i, N ]:
σ²_intercept = σ_d² Σ_i x_i² / ( N Σ_i x_i² - [Σ_i x_i]² ) and σ²_slope = σ_d² N / ( N Σ_i x_i² - [Σ_i x_i]² ).
σ_intercept is the standard error of the intercept. The 95% confidence intervals are intercept: m_1_est ± 2σ_intercept and slope: m_2_est ± 2σ_slope.
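
Putting the straight-line formulas together as a numpy sketch; the data are synthetic and σ_d is assumed known:

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(-5.0, 5.0, 30)
sigma_d = 0.5
d = 2.0 + 0.8 * x + sigma_d * rng.standard_normal(x.size)   # synthetic straight-line data

N = x.size
Sx, Sxx = x.sum(), (x ** 2).sum()
denom = N * Sxx - Sx ** 2

G = np.column_stack([np.ones(N), x])
m_est = np.linalg.solve(G.T @ G, G.T @ d)    # [intercept, slope]

var_intercept = sigma_d ** 2 * Sxx / denom
var_slope = sigma_d ** 2 * N / denom

print(f"intercept: {m_est[0]:.3f} +/- {2 * np.sqrt(var_intercept):.3f}  (95%)")
print(f"slope:     {m_est[1]:.3f} +/- {2 * np.sqrt(var_slope):.3f}  (95%)")
```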

34 Beware! The 95% confidence intervals intercept: m_1_est ± 2σ_intercept and slope: m_2_est ± 2σ_slope are probabilities of m_1 irrespective of the value of m_2, and of m_2 irrespective of the value of m_1, not the joint probability of m_1 and m_2 taken together.

35 [Figure: p(m_1, m_2) with the band m_2_est ± 2σ_2 shaded.] The probability that m_2 is in this box is 95%.

36 [Figure: p(m_1, m_2) with the band m_1_est ± 2σ_1 shaded.] The probability that m_1 is in this box is 95%.

37 [Figure: p(m_1, m_2) with the box m_1_est ± 2σ_1 by m_2_est ± 2σ_2 shaded.] The probability that both m_1 and m_2 are in this box is < 95%.

38 Intercept and slope are uncorrelated only when Σ_i x_i = 0, that is, when the mean of the x's is zero, which occurs when the data straddle the origin. (Remember this discussion from a few lectures ago?) This is seen in the off-diagonal elements of C_m = σ_d² [G^T G]^(-1) = ( σ_d² / ( N Σ_i x_i² - [Σ_i x_i]² ) ) [ Σ_i x_i², -Σ_i x_i ; -Σ_i x_i, N ].

39 What σ_d² do you use in these formulas?

40 Prior estimates of σ_d are based on knowledge of the limits of your measuring technique. For example: my ruler has only mm tick marks, so I'm going to assume that σ_d = 0.5 mm; or, the manufacturer claims that the instrument is accurate to 0.1%, so since my typical measurement is 25, I'll assume σ_d = 0.025.

41 Posterior estimate of the error, based on the error measured with respect to the best fit: σ_d² = (1/N) Σ_i (d_i^obs - d_i^pre)² = (1/N) Σ_i e_i².

42 Dangerous … because it assumes that the model ("a straight line") accurately represents the behavior of the data. Maybe the data really followed an exponential curve …

43 One refinement to the formula σ_d² = (1/N) Σ_i (d_i^obs - d_i^pre)², having to do with the appearance of N, the number of data. [Figures: straight-line fits through two and through three data points.] If there were only two data, then the best-fitting straight line would have no error at all. If there were only three data, then the best-fitting straight line would likely have just a little error.

44 Therefore the formula σ_d² = (1/N) Σ_i (d_i^obs - d_i^pre)² very likely underestimates the error. An improved formula would replace N with N - 2: σ_d² = (1/(N - 2)) Σ_i (d_i^obs - d_i^pre)², where the "2" is chosen because two points exactly define a straight line.

45 More generally, if there are M model parameters, then the formula would be σ_d² = (1/(N - M)) Σ_i (d_i^obs - d_i^pre)². The quantity N - M is often called the number of degrees of freedom.
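
A short sketch of this posterior error estimate in numpy, applied to a synthetic straight-line fit so that M = 2 (all values illustrative):

```python
import numpy as np

def posterior_variance(d_obs, d_pre, M):
    """sigma_d^2 = (1/(N-M)) * sum_i (d_i_obs - d_i_pre)^2, with N-M degrees of freedom."""
    e = d_obs - d_pre
    N = d_obs.size
    return (e @ e) / (N - M)

# hypothetical usage with a straight-line fit (M = 2 model parameters)
rng = np.random.default_rng(5)
x = np.linspace(0.0, 1.0, 20)
d_obs = 3.0 - 1.5 * x + 0.2 * rng.standard_normal(x.size)

G = np.column_stack([np.ones_like(x), x])
m_est = np.linalg.solve(G.T @ G, G.T @ d_obs)
d_pre = G @ m_est

print(posterior_variance(d_obs, d_pre, M=2))   # estimate of sigma_d^2 (true value 0.04 here)
```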

