4.4 Application: OLS Estimation
Background
When we study data and examine the relationship between two variables, and have reason to believe a linear fit is appropriate, we need a way to determine the model that gives the optimal linear fit (the one that best reflects the trend in the data). Ex. Relationship between hits and RBIs. A perfect fit would result if every point in the data exactly satisfied some equation y = a + bx, but this is next to impossible: there is too much variability in real-world data.
So what do we do? Assume y = a + bx = f(x) is the best fit for the data. Then for each point (x_i, y_i) in the data set, we can find the point on the line with the same x-value, (x_i, f(x_i)). (Draw a diagram on the board.) Then we define
d_i = y_i − f(x_i) = the vertical distance from the data point to the point on the line.
d_i is also called the error or residual at x_i: it measures how far the data is from the line. So to measure the fit of the line, we could add up all the errors, d_1 + d_2 + … + d_n. However, note that a best-fit line will have some data above it and some below, so the positive and negative errors cancel and this sum turns out to be 0.
So what do we do? We need to make all of the errors positive, by taking either |d_i| or d_i². |d_i| gives the same weight to large and small errors, whereas d_i² gives more weight to larger errors.
Ex. Compare the error lists {0, 0, 0, 0, 50} and {10, 10, 10, 10, 10}: the average of |d_i| is 10 in both cases, but the averages of d_i² are 500 and 100.
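The contrast between the two error lists in the example can be checked numerically. A minimal sketch in Python, using the residual lists above:

```python
# Two lists of residuals with the same mean absolute error.
d1 = [0, 0, 0, 0, 50]      # one large miss
d2 = [10, 10, 10, 10, 10]  # five moderate misses

mean_abs_1 = sum(abs(d) for d in d1) / len(d1)  # 10.0
mean_abs_2 = sum(abs(d) for d in d2) / len(d2)  # 10.0

mean_sq_1 = sum(d * d for d in d1) / len(d1)    # 500.0 -- the large miss dominates
mean_sq_2 = sum(d * d for d in d2) / len(d2)    # 100.0

print(mean_abs_1, mean_abs_2)  # identical under |d_i|
print(mean_sq_1, mean_sq_2)    # very different under d_i^2
```

The averages of |d_i| agree, so that measure cannot distinguish the two fits; the averages of d_i² differ by a factor of five.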
Which one is a better fit? Which should be considered a better fit?
A graph that goes right through 4 points, but nowhere near the 5th?
A graph that is the same distance from each point? (Yes.)
The |d_i| method will not show this, but the d_i² method will.
Ordinary Least Squares Method
So, we will select the model which minimizes the sum of squared residuals:
S = d_1² + d_2² + … + d_n² = [y_1 − f(x_1)]² + … + [y_n − f(x_n)]²
This line is called the least squares approximating line. We can use vectors to help us choose y = a + bx to minimize S.
Ordinary Least Squares Method
Write
Y = (y_1, y_2, y_3)ᵀ,  M = [[1, x_1], [1, x_2], [1, x_3]],  Z = (a, b)ᵀ.
S, which we will minimize, is just the sum of the squares of the entries of the matrix Y − MZ. If n = 3, then Y − MZ is the vector
Y − MZ = (y_1 − a − b x_1, y_2 − a − b x_2, y_3 − a − b x_3)ᵀ.
Then S = ||Y − MZ||².
Ordinary Least Squares Method
S = ||Y − MZ||². Recall Y and M are given, since we have 3 data points to fit. We simply need to select Z to minimize S. Let P be the set of all vectors MZ as Z varies:
P = { MZ : Z in R² }.
Ordinary Least Squares Method
It turns out that all of the vectors in the set P lie in the same plane through the origin (we discuss why later in the book). Taking a = 0, b = 1, and then a = 1, b = 0, we find that this plane contains
U = (1, 1, 1) and V = (x_1, x_2, x_3),
and the normal vector will be
U × V = (x_3 − x_2, x_1 − x_3, x_2 − x_1),
so the equation of the plane is (x_3 − x_2)x + (x_1 − x_3)y + (x_2 − x_1)z = 0.
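The cross-product formula for the normal can be checked numerically. A sketch with sample x-values (the values 1, 2, 4 are chosen only for illustration):

```python
import numpy as np

# Sample x-values for the three data points (illustrative, not from the slides).
x1, x2, x3 = 1.0, 2.0, 4.0

U = np.array([1.0, 1.0, 1.0])   # MZ with a = 1, b = 0
V = np.array([x1, x2, x3])      # MZ with a = 0, b = 1
n = np.cross(U, V)              # should equal (x3 - x2, x1 - x3, x2 - x1)

# Every vector MZ = a*U + b*V in the plane P is orthogonal to n.
for a, b in [(1.0, 0.0), (0.0, 1.0), (2.5, -3.0)]:
    assert abs(np.dot(a * U + b * V, n)) < 1e-12

print("normal vector:", n)
```

The loop confirms that the normal is orthogonal to arbitrary combinations aU + bV, i.e. to every vector in the plane P.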
Ordinary Least Squares Method
[Diagram: the point Y off the plane P through the origin O; vectors MZ and MA lie in the plane; Y − MA runs from the plane to Y.]
Recall that we are trying to minimize S = ||Y − MZ||². Y = (y_1, y_2, y_3) is a point in space, and MZ is some vector in the set P, which we have illustrated as a plane. S = ||Y − MZ||² is the squared distance from the point to the plane, so if we can find the point MA in the plane closest to Y, we will have our solution.
Ordinary Least Squares Method
Y − MA is orthogonal to all vectors MZ in the plane, so
(MZ) · (Y − MA) = 0.
Note this rule for dot products when vectors are written as column matrices: X · W = Xᵀ W.
Ordinary Least Squares Method
0 = (MZ) · (Y − MA) = (MZ)ᵀ(Y − MA) = ZᵀMᵀ(Y − MA) = Zᵀ(MᵀY − MᵀMA) = Z · (MᵀY − MᵀMA).
The last dot product is in two dimensions and tells us that MᵀY − MᵀMA is orthogonal to every possible Z, which can only happen if MᵀY − MᵀMA = 0. So
MᵀY = MᵀMA,
called the normal equations for A.
Ordinary Least Squares Method
With x_1, x_2, x_3 all distinct, we can show that MᵀM is invertible, so from MᵀY = MᵀMA we get
A = (MᵀM)⁻¹MᵀY.
This gives us A = (a, b), which in turn gives the point (a + b x_1, a + b x_2, a + b x_3) closest to Y. Thus the best-fit line is y = a + bx.
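The normal-equation recipe above translates directly into code. A minimal sketch with numpy (the helper name and the three sample points are assumptions for illustration; solving the linear system is preferred over forming the explicit inverse):

```python
import numpy as np

def least_squares_line(xs, ys):
    """Best-fit line y = a + b*x via the normal equations M^T Y = M^T M A."""
    M = np.column_stack([np.ones(len(xs)), xs])  # n x 2 matrix with rows [1, x_i]
    Y = np.asarray(ys, dtype=float)
    # Solve (M^T M) A = M^T Y instead of computing (M^T M)^{-1} explicitly.
    A = np.linalg.solve(M.T @ M, M.T @ Y)
    return A  # A = (a, b)

# Three hypothetical points that lie near, but not on, a line:
a, b = least_squares_line([0.0, 1.0, 2.0], [1.0, 2.9, 5.1])
print(f"y = {a:.2f} + {b:.2f}x")
```

For these three points the normal equations give a = 0.95 and b = 2.05, which can be verified by hand from 3a + 3b = 9 and 3a + 5b = 13.1.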
Ordinary Least Squares Method
Recall that this argument started by setting n = 3 so that we could use a three-dimensional argument with vectors. The argument becomes more complex, but it does extend to any n.
Theorem 1
Suppose we are given n data points (x_1, y_1), …, (x_n, y_n), of which at least two x's are distinct. Write
Y = (y_1, …, y_n)ᵀ and M = [[1, x_1], [1, x_2], …, [1, x_n]] (an n × 2 matrix).
Then the least squares approximating line has equation y = a_0 + a_1 x, where A = (a_0, a_1)ᵀ is found by Gaussian elimination from the normal equations MᵀY = MᵀMA. Since at least two x's are distinct, MᵀM is invertible, so A = (MᵀM)⁻¹MᵀY.
Example
Find the least squares approximating line for the following data: (1, 0), (2, 2), (4, 5), (6, 9), (8, 12). See what you get with the TI-83+.
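The slides suggest checking the answer on a TI-83+; the same computation can be sketched with numpy's least-squares solver, which minimizes ||Y − MA||² directly:

```python
import numpy as np

# Data from the example.
xs = np.array([1.0, 2.0, 4.0, 6.0, 8.0])
ys = np.array([0.0, 2.0, 5.0, 9.0, 12.0])

M = np.column_stack([np.ones_like(xs), xs])  # columns: 1, x
A, *_ = np.linalg.lstsq(M, ys, rcond=None)   # minimizes ||ys - M @ A||^2
a, b = A
print(f"y = {a:.4f} + {b:.4f}x")  # a = -133/82, b = 141/82
```

Solving the normal equations 5a + 21b = 28 and 21a + 121b = 174 by hand gives the same exact values a = −133/82 ≈ −1.6220 and b = 141/82 ≈ 1.7195.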
Example
Find an equation of the plane through P(1, 3, 2) with normal (2, 0, −1).
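The point-normal form n · (r − P) = 0, equivalently n · r = n · P, gives the answer directly; a quick numeric check:

```python
import numpy as np

P = np.array([1.0, 3.0, 2.0])   # point on the plane
n = np.array([2.0, 0.0, -1.0])  # normal vector

# Point-normal form: n . r = n . P, so the constant term is n . P.
d = np.dot(n, P)  # 2*1 + 0*3 + (-1)*2 = 0
print(f"{n[0]:g}x + {n[1]:g}y + {n[2]:g}z = {d:g}")
```

Here n · P = 0, so the plane has equation 2x − z = 0 (it happens to pass through the origin).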
We extend further…
We can generalize to select the least squares approximating polynomial of degree m:
f(x) = a_0 + a_1 x + a_2 x² + … + a_m x^m,
where we estimate the a's.
Theorem 2 (proof in ch. 6)
Suppose n data points are given with at least m + 1 distinct x's, and write
Y = (y_1, …, y_n)ᵀ and M = [[1, x_1, x_1², …, x_1^m], …, [1, x_n, x_n², …, x_n^m]] (an n × (m + 1) matrix).
Then the least squares approximating polynomial of degree m is
f(x) = a_0 + a_1 x + a_2 x² + … + a_m x^m,
where A = (a_0, a_1, …, a_m)ᵀ is found by Gaussian elimination from the normal equations MᵀY = MᵀMA. Since at least m + 1 x's are distinct, MᵀM is invertible, so A = (MᵀM)⁻¹MᵀY.
Note we need at least one more data point than the degree of the polynomial we are trying to estimate; i.e., with n data points we could not estimate a polynomial of degree n (that would require at least n + 1 distinct x's).
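One way to see the restriction: with n = 3 data points and a degree-3 polynomial, M is 3 × 4, so MᵀM is 4 × 4 but has rank at most 3 and cannot be invertible. A sketch with arbitrary sample points (the x-values are assumptions for illustration):

```python
import numpy as np

# Three data points, attempting a cubic fit (degree m = 3 needs m + 1 = 4 distinct x's).
xs = np.array([1.0, 2.0, 3.0])
M = np.vander(xs, N=4, increasing=True)  # rows [1, x, x^2, x^3], shape 3 x 4

MtM = M.T @ M  # 4 x 4, but rank(M^T M) = rank(M) <= 3
print("rank of M^T M:", np.linalg.matrix_rank(MtM))  # 3 -> singular
```

Since MᵀM is singular, the normal equations MᵀY = MᵀMA have no unique solution: many cubics pass exactly through three points.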
Example Find the least squares approximating quadratic for the following data points: (-2,0),(0,-4),(2,-10),(4,-9),(6,-3)
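Theorem 2 applied to this data (m = 2, so rows of M are [1, x_i, x_i²]) can be checked numerically; a sketch with numpy:

```python
import numpy as np

# Data from the example.
xs = np.array([-2.0, 0.0, 2.0, 4.0, 6.0])
ys = np.array([0.0, -4.0, -10.0, -9.0, -3.0])

M = np.vander(xs, N=3, increasing=True)  # columns: 1, x, x^2
A = np.linalg.solve(M.T @ M, M.T @ ys)   # normal equations: (M^T M) A = M^T Y
a0, a1, a2 = A
print(f"f(x) = {a0:.4f} + {a1:.4f}x + {a2:.4f}x^2")
```

Solving the normal equations 5a_0 + 10a_1 + 60a_2 = −26, 10a_0 + 60a_1 + 280a_2 = −74, and 60a_0 + 280a_1 + 1584a_2 = −292 by hand gives a_0 = −211/35, a_1 = −347/140, a_2 = 27/56; the positive leading coefficient matches the data dipping down and coming back up.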