1
Lecture 3: Review of Linear Algebra; Simple Least-Squares
2
9 things you need to remember from Linear Algebra
3
Number 1: rule for vector and matrix multiplication. $u = Mv$ means $u_i = \sum_{k=1}^{N} M_{ik} v_k$, and $P = QR$ means $P_{ij} = \sum_{k=1}^{N} Q_{ik} R_{kj}$. The sum is over the adjacent (nearest-neighbor) indices. The name of the index in the sum is irrelevant; you can call it anything, as long as you're consistent.
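A minimal numerical check of these component formulas, assuming NumPy is available; the explicit loops mirror the sum over the shared index $k$, and the result is compared against NumPy's built-in `@` operator.

```python
import numpy as np

N = 3
M = np.random.rand(N, N)
v = np.random.rand(N)
Q = np.random.rand(N, N)
R = np.random.rand(N, N)

# u_i = sum_k M_ik v_k, written as an explicit sum over the shared index k
u = np.array([sum(M[i, k] * v[k] for k in range(N)) for i in range(N)])

# P_ij = sum_k Q_ik R_kj
P = np.array([[sum(Q[i, k] * R[k, j] for k in range(N)) for j in range(N)]
              for i in range(N)])

# The explicit sums agree with NumPy's matrix products
assert np.allclose(u, M @ v)
assert np.allclose(P, Q @ R)
```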
4
Number 2: transposition. Rows become columns and columns become rows: $(A^T)_{ij} = A_{ji}$. The rule for transposition of products is $(AB)^T = B^T A^T$; note the reversal of order.
5
Number 3: rule for the dot product. $a \cdot b = a^T b = \sum_{i=1}^{N} a_i b_i$. Note that $a \cdot a$ is the sum of the squared elements of $a$; its square root is the "length" of $a$.
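A short sketch of the dot product and length, again assuming NumPy:

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([1.0, 2.0])

dot = a @ b                 # a^T b = sum_i a_i b_i
length = np.sqrt(a @ a)     # length of a: square root of the sum of squared elements

print(dot)     # 11.0
print(length)  # 5.0
```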
6
Number 4: the inverse of a matrix. $A^{-1} A = I$ and $A A^{-1} = I$ (the inverse exists only when $A$ is square and non-singular). Here $I$ is the identity matrix, e.g.
$I = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$
7
Number 5: solving $y = Mx$ using the inverse: $x = M^{-1} y$.
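A small sketch, assuming NumPy; in practice `np.linalg.solve` is preferred over forming the inverse explicitly, but both give the same $x$ here.

```python
import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 3.0]])
y = np.array([5.0, 10.0])

x_via_inverse = np.linalg.inv(M) @ y   # x = M^{-1} y
x_via_solve = np.linalg.solve(M, y)    # solves M x = y without forming M^{-1}

assert np.allclose(x_via_inverse, x_via_solve)
assert np.allclose(M @ x_via_solve, y)
```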
8
Number 6: multiplication by the identity matrix. $M = IM = MI$. In component notation, $I_{ij} = \delta_{ij}$ (the Kronecker delta), so $\sum_{k=1}^{N} \delta_{ik} M_{kj} = M_{ij}$: the delta lets you cross out the sum and the factor $\delta_{ik}$, and change $k$ to $i$ in the rest of the equation. ($k$ is just a name.)
9
Number 7: inverse of a 2×2 matrix.
$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$
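A quick numerical check of the 2×2 formula, assuming NumPy and arbitrary illustrative values:

```python
import numpy as np

a, b, c, d = 2.0, 1.0, 4.0, 3.0
A = np.array([[a, b],
              [c, d]])

# Closed-form inverse: (1 / (ad - bc)) * [[d, -b], [-c, a]]
A_inv = np.array([[d, -b],
                  [-c, a]]) / (a * d - b * c)

assert np.allclose(A_inv @ A, np.eye(2))
assert np.allclose(A_inv, np.linalg.inv(A))
```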
10
Number 8: inverse of a diagonal matrix.
$A = \begin{pmatrix} a & 0 & 0 & \cdots & 0 \\ 0 & b & 0 & \cdots & 0 \\ 0 & 0 & c & \cdots & 0 \\ & & & \ddots & \\ 0 & 0 & 0 & \cdots & z \end{pmatrix}, \qquad A^{-1} = \begin{pmatrix} 1/a & 0 & 0 & \cdots & 0 \\ 0 & 1/b & 0 & \cdots & 0 \\ 0 & 0 & 1/c & \cdots & 0 \\ & & & \ddots & \\ 0 & 0 & 0 & \cdots & 1/z \end{pmatrix}$
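A sketch of the diagonal-inverse rule with NumPy; `np.diag` builds a diagonal matrix from a vector, so inverting it just reciprocates the diagonal entries.

```python
import numpy as np

diag_entries = np.array([2.0, 4.0, 5.0])
A = np.diag(diag_entries)              # diagonal matrix
A_inv = np.diag(1.0 / diag_entries)    # inverse: reciprocal of each diagonal entry

assert np.allclose(A_inv @ A, np.eye(3))
```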
11
Number 9: rule for taking a derivative. Use component notation and treat every element as an independent variable. Remember that since the elements are independent, $dx_i / dx_j = \delta_{ij}$ (the identity matrix).
12
Example: suppose $y = Ax$. How does $y_i$ vary as we change $x_j$? (That's the meaning of the derivative $dy_i/dx_j$.) First write the $i$-th component of $y$: $y_i = \sum_{k=1}^{N} A_{ik} x_k$. (We're already using $i$ and $j$, so use a different letter, say $k$, in the summation.) Then
$\frac{d y_i}{d x_j} = \frac{d}{d x_j} \sum_{k=1}^{N} A_{ik} x_k = \sum_{k=1}^{N} A_{ik} \frac{d x_k}{d x_j} = \sum_{k=1}^{N} A_{ik} \delta_{kj} = A_{ij}$
So the derivative $dy_i/dx_j$ is just $A_{ij}$. This is analogous to the case for scalars, where the derivative of the scalar expression $y = ax$ is just $dy/dx = a$.
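A small finite-difference check of this result, assuming NumPy; perturbing $x_j$ by a small step and watching how $y_i$ changes recovers $A_{ij}$ to within the step size.

```python
import numpy as np

N = 4
A = np.random.rand(N, N)
x = np.random.rand(N)
h = 1e-6  # finite-difference step

# Numerical Jacobian: J[i, j] ~ dy_i/dx_j for y = A x
J = np.zeros((N, N))
for j in range(N):
    x_plus = x.copy()
    x_plus[j] += h
    J[:, j] = (A @ x_plus - A @ x) / h

# Should match A itself, since dy_i/dx_j = A_ij
assert np.allclose(J, A, atol=1e-5)
```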
13
Best-fitting line: the combination of $a^{pre}$ and $b^{pre}$ that has the smallest sum of squared errors. Find it by exhaustive search ("grid search").
14
Fitting a line to noisy data: $y^{obs} = a + bx$. Observations: the vector $y^{obs}$.
15
Guess values for $a$, $b$: $y^{pre} = a^{guess} + b^{guess} x$, with $a^{guess} = 2.0$ and $b^{guess} = 2.4$. Prediction error = observed minus predicted: $e = y^{obs} - y^{pre}$. Total error: sum of squared prediction errors, $E = \sum_i e_i^2 = e^T e$.
16
Systematically examine combinations of $(a, b)$ on a 101 × 101 grid. (Figure: error surface over $a^{pre}$ and $b^{pre}$; the minimum total error $E$ is marked, and note that $E$ is not zero.)
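A sketch of the grid search, assuming NumPy and synthetic data (the true intercept, slope, noise level, and grid ranges here are illustrative, not the lecture's actual dataset):

```python
import numpy as np

# Synthetic noisy data (illustrative values, not the lecture's dataset)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 30)
y_obs = 2.0 + 2.4 * x + rng.normal(scale=0.5, size=x.size)

# 101 x 101 grid of candidate (a, b) values
a_grid = np.linspace(0.0, 4.0, 101)
b_grid = np.linspace(1.0, 4.0, 101)

# Total error E(a, b) = sum of squared prediction errors
E = np.zeros((a_grid.size, b_grid.size))
for i, a in enumerate(a_grid):
    for j, b in enumerate(b_grid):
        e = y_obs - (a + b * x)
        E[i, j] = e @ e

# Location of the minimum on the error surface
i_min, j_min = np.unravel_index(np.argmin(E), E.shape)
print("best a:", a_grid[i_min], "best b:", b_grid[j_min], "E_min:", E[i_min, j_min])
```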
17
(Figure: error surface.) Note that $E_{min}$ is not zero. The best-fitting $a$ and $b$ are at the minimum; they define the best-fitting line.
18
Note that there is some range of values where the error is about the same as the minimum value, $E_{min}$. On the error surface, the error is pretty close to $E_{min}$ everywhere in a region around the minimum: all $a$'s in that range and all $b$'s in that range have pretty much the same error.
19
Moral: the shape of the error surface controls the accuracy with which $(a, b)$ can be estimated.
20
What controls the shape of the error surface? Let's examine the effect of increasing the error in the data.
21
Error in data = 0.5 gives $E_{min} = 0.20$; error in data = 5.0 gives $E_{min} = 23.5$. The minimum error increases, but the shape of the error surface is pretty much the same.
22
What controls the shape of the error surface? Let's examine the effect of shifting the x-position of the data.
23
There is a big change from simply shifting the x-values of the data: the region of low error is now tilted. High $b$ with low $a$ has low error, and low $b$ with high $a$ has low error, but (high $b$, high $a$) and (low $a$, low $b$) have high error.
24
Meaning of the tilted region of low error: the errors in $(a^{pre}, b^{pre})$ are correlated.
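A sketch of this correlation effect under assumed synthetic data: repeating the fit over many noise realizations and correlating the resulting intercept and slope estimates, once for x-values that straddle the origin and once for x-values shifted well to the right of it.

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_correlation(x, n_trials=2000):
    """Fit y = a + b x over many noise realizations; return corr(a_est, b_est)."""
    a_est, b_est = [], []
    for _ in range(n_trials):
        y = 2.0 + 2.4 * x + rng.normal(scale=1.0, size=x.size)
        b, a = np.polyfit(x, y, 1)   # polyfit returns [slope, intercept]
        a_est.append(a)
        b_est.append(b)
    return np.corrcoef(a_est, b_est)[0, 1]

x_centered = np.linspace(-5.0, 5.0, 30)   # data straddle the origin
x_shifted = np.linspace(5.0, 15.0, 30)    # data all to the right of the origin

print("corr, centered x:", estimate_correlation(x_centered))   # roughly zero
print("corr, shifted x:", estimate_correlation(x_shifted))     # strongly negative
```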
25
(Figure: best-fit line vs. a line with an erroneous intercept.) When the data straddle the origin, if you tweak the intercept up, you can't compensate by changing the slope: the estimates of intercept and slope are uncorrelated.
26
(Figure: best-fit line vs. a lower-slope line with an erroneous intercept.) When the data are all to the right of the origin, if you tweak the intercept up, you must lower the slope to compensate: negative correlation of intercept and slope.
27
(Figure: best-fit line vs. a steeper line with an erroneous intercept.) When the data are all to the left of the origin, if you tweak the intercept up, you must raise the slope to compensate: positive correlation of intercept and slope.
28
Data near the origin: possibly good control on the intercept, but lousy control on the slope. (Figure: data with x ranging from -5 to 5; lines with slopes from small to big fit comparably.)
29
Data far from the origin: lousy control on the intercept, but possibly good control on the slope. (Figure: data with x ranging from 0 to 100; lines with intercepts from small to big fit comparably.)
30
Set-up for standard least squares: $y_i = a + b x_i$, or in matrix form
$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix} = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_N \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix}$
that is, $d = G m$.
31
Standard least-squares solution: $m^{est} = [G^T G]^{-1} G^T d$.
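A sketch of this solution for the straight-line case, assuming NumPy and illustrative synthetic data; `np.linalg.lstsq` is used only as a cross-check on the normal-equations formula.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 30)
d = 2.0 + 2.4 * x + rng.normal(scale=0.5, size=x.size)   # noisy observations

# Build G: a column of ones (for the intercept a) and a column of x (for the slope b)
G = np.column_stack([np.ones_like(x), x])

# m_est = [G^T G]^{-1} G^T d
m_est = np.linalg.inv(G.T @ G) @ G.T @ d
print("a_est, b_est =", m_est)

# Cross-check against NumPy's least-squares solver
m_lstsq, *_ = np.linalg.lstsq(G, d, rcond=None)
assert np.allclose(m_est, m_lstsq)
```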
32
Derivation: use the fact that the minimum is where $dE/dm_i = 0$.
$E = \sum_k e_k e_k = \sum_k \big(d_k - \sum_p G_{kp} m_p\big)\big(d_k - \sum_q G_{kq} m_q\big) = \sum_k d_k d_k - 2 \sum_k d_k \sum_p G_{kp} m_p + \sum_k \sum_p G_{kp} m_p \sum_q G_{kq} m_q$
$\frac{dE}{dm_i} = 0 - 2 \sum_k d_k \sum_p G_{kp} \frac{dm_p}{dm_i} + \sum_k \sum_p G_{kp} \frac{dm_p}{dm_i} \sum_q G_{kq} m_q + \sum_k \sum_p G_{kp} m_p \sum_q G_{kq} \frac{dm_q}{dm_i}$
$= -2 \sum_k d_k \sum_p G_{kp} \delta_{pi} + \sum_k \sum_p G_{kp} \delta_{pi} \sum_q G_{kq} m_q + \sum_k \sum_p G_{kp} m_p \sum_q G_{kq} \delta_{qi}$
$= -2 \sum_k d_k G_{ki} + \sum_k G_{ki} \sum_q G_{kq} m_q + \sum_k \sum_p G_{kp} m_p G_{ki}$
Setting this to zero: $-2 \sum_k G_{ki} d_k + 2 \sum_q \big[\sum_k G_{ki} G_{kq}\big] m_q = 0$, or $-2 G^T d + 2 [G^T G] m = 0$, or $m = [G^T G]^{-1} G^T d$.
33
Why least-squares? Why not least-absolute length? Or something else?
34
Least-squares fit: $a = 1.00$, $b = 2.02$. Least-absolute-value fit: $a = 0.94$, $b = 2.02$.
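A sketch of how one might compare the two criteria, assuming SciPy's general-purpose minimizer and illustrative synthetic data (the resulting numbers will not match the values on the slide):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 30)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=x.size)

def sum_sq(params):
    a, b = params
    return np.sum((y - (a + b * x)) ** 2)      # least-squares criterion

def sum_abs(params):
    a, b = params
    return np.sum(np.abs(y - (a + b * x)))     # least-absolute-value criterion

ls_fit = minimize(sum_sq, x0=[0.0, 1.0])
l1_fit = minimize(sum_abs, x0=[0.0, 1.0], method="Nelder-Mead")

print("least-squares        a, b =", ls_fit.x)
print("least absolute value a, b =", l1_fit.x)
```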