1
Lecture 7 Advanced Topics in Least Squares
2
The multivariate normal distribution for data d:
p(d) = (2π)^(-N/2) |Cd|^(-1/2) exp{ -½ (d - d̄)^T Cd^-1 (d - d̄) }
Let's assume that the expectation d̄ is given by a general linear model, d̄ = Gm, and that the covariance Cd is known (prior covariance).
3
Then we have a distribution p(d; m) with unknown parameters m:
p(d; m) = (2π)^(-N/2) |Cd|^(-1/2) exp{ -½ (d - Gm)^T Cd^-1 (d - Gm) }
We can now apply the principle of maximum likelihood to estimate the unknown parameters m.
4
Principle of Maximum Likelihood. Last lecture we stated this principle as: maximize L(m) = Σi ln p(di; m) with respect to m. But in this distribution the whole data vector d is being treated as a single quantity, so the principle becomes simply: maximize L(m) = ln p(d; m), with
p(d; m) = (2π)^(-N/2) |Cd|^(-1/2) exp{ -½ (d - Gm)^T Cd^-1 (d - Gm) }
5
L(m) = ln p(d; m) = -½ N ln(2π) - ½ ln|Cd| - ½ (d - Gm)^T Cd^-1 (d - Gm)
The first two terms do not contain m, so the principle of maximum likelihood is:
maximize -½ (d - Gm)^T Cd^-1 (d - Gm), or equivalently, minimize (d - Gm)^T Cd^-1 (d - Gm)
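A minimal MatLab sketch (not from the slides) of evaluating this log-likelihood for a candidate model vector m, assuming the vectors d and m, the matrix G, and the covariance Cd are already defined (variable names are illustrative):
N = length(d);
r = d - G*m;                        % prediction error (residual)
L = -0.5*N*log(2*pi) ...
    - 0.5*log(det(Cd)) ...
    - 0.5*(r' * (Cd \ r));          % log-likelihood L(m) = ln p(d; m)
misfit = r' * (Cd \ r);             % the only m-dependent term, to be minimized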
6
Special case of uncorrelated data with equal variance: Cd = σd^2 I.
Minimize σd^-2 (d - Gm)^T (d - Gm) with respect to m, which is the same as minimizing (d - Gm)^T (d - Gm) with respect to m.
This is the Principle of Least Squares.
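In MatLab this ordinary least-squares estimate can be computed in a few lines; a minimal sketch, assuming G and d already exist (names are illustrative):
mest = (G'*G) \ (G'*d);   % solve the normal equations [G'G] m = G'd
e    = d - G*mest;        % residuals
E    = e'*e;              % total squared error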
7
Minimize E = e^T e = (d - Gm)^T (d - Gm) with respect to m. This follows from the Principle of Maximum Likelihood in the special case of a multivariate Normal distribution with the data uncorrelated and of equal variance.
8
Corollary: if your data are NOT NORMALLY DISTRIBUTED, then least squares is not the right method to use!
9
What if Cd = σd^2 I but σd is unknown? Note that |Cd| = σd^(2N), so
L(m, σd) = -½ N ln(2π) - ½ ln|Cd| - ½ (d - Gm)^T Cd^-1 (d - Gm)
         = -½ N ln(2π) - N ln(σd) - ½ σd^-2 (d - Gm)^T (d - Gm)
The first two terms do not contain m, so the principle of maximum likelihood still implies: minimize (d - Gm)^T (d - Gm) = e^T e = E.
Then ∂L/∂σd = 0 = -N σd^-1 + σd^-3 (d - Gm)^T (d - Gm), or, solving for σd:
σd^2 = N^-1 (d - Gm)^T (d - Gm) = N^-1 e^T e
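A minimal MatLab sketch of this variance estimate (illustrative names; G and d assumed given):
mest   = (G'*G) \ (G'*d);      % least-squares solution
e      = d - G*mest;           % residuals
sd2est = (e'*e) / length(d);   % sigma_d^2 = E/N, the maximum-likelihood estimate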
10
Thus the Principle of Maximum Likelihood implies that
σd^2 = N^-1 (d - Gm)^T (d - Gm) = N^-1 e^T e
is a good posterior estimate of the variance of the data, when the data follow a multivariate normal distribution and are uncorrelated with uniform (but unknown) variance σd^2.
11
But back to the general case … What formula for m does the rule
minimize (d - Gm)^T Cd^-1 (d - Gm)
imply?
12
Trick … Minimize (d - Gm)^T (d - Gm) implies m = [G^T G]^-1 G^T d. Now write:
minimize (d - Gm)^T Cd^-1 (d - Gm)
  = (d - Gm)^T Cd^-1/2 Cd^-1/2 (d - Gm)
  = (Cd^-1/2 d - Cd^-1/2 Gm)^T (Cd^-1/2 d - Cd^-1/2 Gm)
  = (d' - G'm)^T (d' - G'm)   with d' = Cd^-1/2 d and G' = Cd^-1/2 G
This is simple least squares, so m = [G'^T G']^-1 G'^T d', or
m = [G^T Cd^-1/2 Cd^-1/2 G]^-1 G^T Cd^-1/2 Cd^-1/2 d = [G^T Cd^-1 G]^-1 G^T Cd^-1 d
(Cd is symmetric, so its inverse and square root are symmetric, too.)
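A minimal MatLab sketch (not the slides' code) checking that the transformed problem and the direct formula give the same answer; G, d and Cd are assumed given:
W  = inv(sqrtm(Cd));             % Cd^(-1/2); sqrtm returns the symmetric matrix square root
dp = W*d;                        % d' = Cd^(-1/2) d
Gp = W*G;                        % G' = Cd^(-1/2) G
m1 = (Gp'*Gp) \ (Gp'*dp);        % simple least squares on the transformed problem
m2 = (G'*(Cd\G)) \ (G'*(Cd\d));  % direct formula m = [G' Cd^-1 G]^-1 G' Cd^-1 d
% m1 and m2 agree to within round-off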
13
So, minimize (d - Gm)^T Cd^-1 (d - Gm) implies
m = [G^T Cd^-1 G]^-1 G^T Cd^-1 d
and
Cm = { [G^T Cd^-1 G]^-1 G^T Cd^-1 } Cd { [G^T Cd^-1 G]^-1 G^T Cd^-1 }^T
   = [G^T Cd^-1 G]^-1 G^T Cd^-1 G [G^T Cd^-1 G]^-1
   = [G^T Cd^-1 G]^-1
(Remember the formula Cm = M Cd M^T for m = M d.)
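A minimal MatLab sketch (illustrative; G and Cd assumed given) computing the model covariance both ways:
M   = inv(G'*(Cd\G)) * (G'/Cd);  % m = M d with M = [G' Cd^-1 G]^-1 G' Cd^-1
Cm1 = M * Cd * M';               % the general rule Cm = M Cd M'
Cm2 = inv(G'*(Cd\G));            % the simplified form [G' Cd^-1 G]^-1
% Cm1 and Cm2 agree to within round-off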
14
Example with Correlated Noise. [Figure: uncorrelated noise vs. correlated noise]
15
Scatter plots of di vs. di+1 (high correlation), di vs. di+2 (some correlation), and di vs. di+3 (little correlation).
16
Data = straight line + correlated noise: d = a + bx + n
17
Model for Cd: [Cd]ij = σd^2 exp{ -c |i-j| } with c = 0.25 (exponential falloff from the main diagonal).
MatLab Code:
c = 0.25;
[XX, YY] = meshgrid( [1:N], [1:N] );   % matrices of row and column indices
Cd = (sd^2)*exp(-c*abs(XX-YY));        % covariance falls off away from the diagonal
18
Results for d = a + bx + n. Both fits are about the same, but:
Intercept: correlated 10.96 ± 20.6, uncorrelated 8.42 ± 7.9, true 1.0
Slope: correlated 1.92 ± 0.35, uncorrelated 1.97 ± 0.14, true 2.0
Note that the error estimates are larger (more realistic?) for the correlated case.
19
How to make correlated noise: correlated noise is constructed as a weighted average of neighboring uncorrelated noise values.
MatLab Code:
w = [0.1, 0.3, 0.7, 1.0, 0.7, 0.3, 0.1]';   % define the weighting function
w = w/sum(w);
Nw = length(w); Nw2 = (Nw-1)/2;
N = 101; N2 = (N-1)/2;
n1 = random('Normal',0,1,N+Nw,1);           % start with uncorrelated noise
n = zeros(N,1);
for i = [-Nw2:Nw2]                          % weighted average of neighboring
    n = n + w(i+Nw2+1)*n1(i+Nw-Nw2:i+Nw+N-1-Nw2);   % uncorrelated noise values
end
20
Let's look at the transformations d' = Cd^-1/2 d and G' = Cd^-1/2 G in the special case of uncorrelated data with different variances:
Cd = diag( σ1^2, σ2^2, … , σN^2 )
di' = σi^-1 di       multiply each datum by the reciprocal of its error
Gij' = σi^-1 Gij     multiply each row of the data kernel by the same amount
Then solve by ordinary least squares.
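A minimal MatLab sketch (illustrative; G, d and an N-by-1 vector of standard deviations sd are assumed given):
dp   = d ./ sd;                        % d'_i = d_i / sigma_i
Gp   = G ./ repmat(sd, 1, size(G,2));  % divide each row of G by sigma_i
mest = (Gp'*Gp) \ (Gp'*dp);            % then solve by ordinary least squares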
21
[ σ1^-1 G11   σ1^-1 G12   σ1^-1 G13  … ]         [ σ1^-1 d1 ]
[ σ2^-1 G21   σ2^-1 G22   σ2^-1 G23  … ]         [ σ2^-1 d2 ]
[ σ3^-1 G31   σ3^-1 G32   σ3^-1 G33  … ]   m  =  [ σ3^-1 d3 ]
[     …                                ]         [    …     ]
[ σN^-1 GN1   σN^-1 GN2   σN^-1 GN3  … ]         [ σN^-1 dN ]
Rows have been weighted by a factor of σi^-1.
22
So this special case is often called Weighted Least Squares. Note that the total error is
E = e^T Cd^-1 e = Σi σi^-2 ei^2
Each individual error is weighted by the reciprocal of its variance, so errors involving data with SMALL variance get MORE weight.
23
Example: fitting a straight line to 100 data, where the first 50 have a different σd than the last 50.
24
MatLab Code (note that Cd^-1 is explicitly defined as a diagonal matrix):
N = 101; N2 = (N-1)/2;
sd(1:N2-1) = 5;                % standard deviation of the first half of the data
sd(N2:N) = 100;                % standard deviation of the second half
sd2i = sd.^(-2);
Cdi = diag(sd2i);              % Cd^-1, explicitly a diagonal matrix
G(:,1) = ones(N,1);
G(:,2) = x;
GTCdiGI = inv(G'*Cdi*G);
m = GTCdiGI*G'*Cdi*d;          % m = [G' Cd^-1 G]^-1 G' Cd^-1 d
d2 = m(1) + m(2).*x;           % predicted data
25
Equal variance: left 50, σd = 5; right 50, σd = 5.
26
Left has smaller variance: first 50, σd = 5; last 50, σd = 100.
27
Right has smaller variance: first 50, σd = 100; last 50, σd = 5.
28
Finally, two miscellaneous comments about least-squares
29
Comment 1: the case of fitting functions to a dataset,
di = m1 f1(xi) + m2 f2(xi) + m3 f3(xi) + …
e.g. di = m1 sin(xi) + m2 cos(xi) + m3 sin(2xi) + …
30
[ f1(x1)  f2(x1)  f3(x1) … ]        [ d1 ]
[ f1(x2)  f2(x2)  f3(x2) … ]        [ d2 ]
[ f1(x3)  f2(x3)  f3(x3) … ]   m =  [ d3 ]
[   …                      ]        [ …  ]
[ f1(xN)  f2(xN)  f3(xN) … ]        [ dN ]
31
Note that the matrix G^T G has elements
[G^T G]ij = Σk fi(xk) fj(xk) = fi · fj
and thus is diagonal if the functions are orthogonal.
32
If the functions are also normalized so that fi · fi = 1, then G^T G = I and the least-squares solution is
m = G^T d   with   Cm = σd^2 I
A super-simple formula: mi = fi · d, with guaranteed uncorrelated errors!
33
Example: fitting a straight line. yi = a + b xi implies f1(x) = 1 and f2(x) = x, so the condition f1 · f2 = 0 implies Σi xi = 0, i.e. x̄ = 0; this happens when the x's straddle the origin. The choice f1(x) = 1 and f2(x) = x - x̄, i.e. y = a' + b'(x - x̄), leads to uncorrelated errors in (a', b'). [Figure: data points x1 … x5 with the two parameterizations of the fitted line, intercepts a and a']
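A minimal MatLab sketch with made-up x values (everything here is illustrative) showing that centering x makes G^T G diagonal, so the intercept and slope errors become uncorrelated:
x   = (1:10)';  sd = 1;              % hypothetical abscissas and data sigma
G1  = [ones(size(x)), x];            % f1 = 1, f2 = x        (x's do not straddle the origin)
G2  = [ones(size(x)), x - mean(x)];  % f1 = 1, f2 = x - xbar (centered)
Cm1 = sd^2 * inv(G1'*G1);            % off-diagonal terms: correlated (a, b) errors
Cm2 = sd^2 * inv(G2'*G2);            % diagonal: uncorrelated (a', b') errors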
34
Example: wavelet functions, localized oscillations with a characteristic frequency.
35
[Figure: G^T G for the wavelet example; it is "almost" diagonal]
36
Comment 2: sometimes writing least squares as
[G^T G] m = G^T d   or   G^T [G m] = G^T d
is more useful than m = [G^T G]^-1 G^T d, since you can use some method other than a matrix inverse for solving the equation.
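A minimal MatLab sketch (illustrative; G and d assumed given) of solving least squares without forming an explicit inverse:
m1 = (G'*G) \ (G'*d);    % backslash solves [G'G] m = G'd directly
m2 = G \ d;              % backslash on G itself (QR-based least squares)
% for large sparse problems an iterative solver can be used instead, e.g.
% m3 = pcg(G'*G, G'*d);  % conjugate gradients on the normal equations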