Part 3: Least Squares Algebra 3-1/58 Econometrics I Professor William Greene Stern School of Business Department of Economics.

Part 3: Least Squares Algebra 3-1/58 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 3: Least Squares Algebra 3-2/58 Econometrics I Part 3 – Least Squares Algebra

Part 3: Least Squares Algebra 3-3/58 Vocabulary  Some terms to be used in the discussion. Population characteristics and entities vs. sample quantities and analogs Residuals and disturbances Population regression line and sample regression  Objective: Learn about the conditional mean function. Estimate  and  2  First step: Mechanics of fitting a line (hyperplane) to a set of data

Part 3: Least Squares Algebra 3-4/58 Fitting Criteria  The set of points in the sample  Fitting criteria - what are they: LAD Least squares and so on  Why least squares?  A fundamental result: Sample moments are “good” estimators of their population counterparts We will spend the next few weeks using this principle and applying it to least squares computation.

Part 3: Least Squares Algebra 3-5/58 An Analogy Principle for Estimating  In the population E[y | X ] = X so E[y - X |X] = 0 Continuing E[x i  i ] = 0 Summing, Σ i E[x i  i ] = Σ i 0 = 0 Exchange Σ i and E[] E[Σ i x i  i ] = E[ X ] = 0 E[X(y - X) ]= 0 Choose b, the estimator of  to mimic this population result: i.e., mimic the population mean with the sample mean Find b such that As we will see, the solution is the least squares coefficient vector.

Part 3: Least Squares Algebra 3-6/58 Population and Sample Moments We showed that E[ i |x i ] = 0 and Cov[x i, i ] = 0. If so, and if E[y|X] = X, then  = (Var[x i ]) -1 Cov[x i,y i ]. This will provide a population analog to the statistics we compute with the data.

Part 3: Least Squares Algebra 3-7/58 U.S. Gasoline Market, 1960-1995

Part 3: Least Squares Algebra 3-8/58 Least Squares  Example will be, G i on x i = [1, PG i, Y i ]  Fitting criterion: Fitted equation will be y i = b 1 x i1 + b 2 x i2 +... + b K x iK.  Criterion is based on residuals: e i = y i - b 1 x i1 + b 2 x i2 +... + b K x iK Make e i as small as possible. Form a criterion and minimize it.

Part 3: Least Squares Algebra 3-9/58 Fitting Criteria  Sum of residuals:  Sum of squares:  Sum of absolute values of residuals:  Absolute value of sum of residuals  We focus on now and later

Part 3: Least Squares Algebra 3-10/58 Least Squares Algebra

Part 3: Least Squares Algebra 3-11/58 Least Squares Normal Equations

Part 3: Least Squares Algebra 3-12/58 Least Squares Solution

Part 3: Least Squares Algebra 3-13/58 Second Order Conditions

Part 3: Least Squares Algebra 3-14/58 Does b Minimize e’e?

Part 3: Least Squares Algebra 3-15/58 Sample Moments - Algebra

Part 3: Least Squares Algebra 3-16/58 Positive Definite Matrix

Part 3: Least Squares Algebra 3-17/58 Algebraic Results - 1

Part 3: Least Squares Algebra 3-18/58 Residuals vs. Disturbances

Part 3: Least Squares Algebra 3-19/58 Algebraic Results - 2  A “residual maker” M = (I - X(X’X) -1 X’)  e = y - Xb= y - X(X’X) -1 X’y = My  My = The residuals that result when y is regressed on X  MX = 0 (This result is fundamental!) How do we interpret this result in terms of residuals? When a column of X is regressed on all of X, we get a perfect fit and zero residuals.  (Therefore) My = MXb + Me = Me = e (You should be able to prove this.  y = Py + My, P = X(X’X) -1 X’ = (I - M). PM = MP = 0.  Py is the projection of y into the column space of X.

Part 3: Least Squares Algebra 3-20/58 The M Matrix  M = I- X(X’X) -1 X’ is an nxn matrix  M is symmetric – M = M’  M is idempotent – MM = M (just multiply it out)  M is singular; M -1 does not exist. (We will prove this later as a side result in another derivation.)

Part 3: Least Squares Algebra 3-21/58 Results when X Contains a Constant Term  X = [1,x 2,…,x K ]  The first column of X is a column of ones  Since X’e = 0, x 1 ’e = 0 – the residuals sum to zero.

Part 3: Least Squares Algebra 3-22/58 Dummy Variable for One Observation A dummy variable that isolates a single observation. What does this do? Define d to be the dummy variable in question. Z = all other regressors. X = [Z,d] Multiple regression of y on X. We know that X'e = 0 where e = the column vector of residuals. That means d'e = 0, which says that e j = 0 for that particular residual. The observation will be predicted perfectly. Fairly important result. Important to know.

Part 3: Least Squares Algebra 3-23/58

Part 3: Least Squares Algebra 3-24/58 Least Squares Algebra

Part 3: Least Squares Algebra 3-25/58 Least Squares

Part 3: Least Squares Algebra 3-26/58 Residuals

Part 3: Least Squares Algebra 3-27/58 Least Squares Residuals. Note the peculiar pattern.

Part 3: Least Squares Algebra 3-28/58 Least Squares Algebra-3 M is NxN potentially huge

Part 3: Least Squares Algebra 3-29/58 Least Squares Algebra-4 MX = Not identically zero in a digital computer. Rounding error.

Part 3: Least Squares Algebra 3-30/58 Econometrics I Part 3.1 – Regression Algebra and Fit

Part 3: Least Squares Algebra 3-31/58 The Fit of the Regression  “Variation:” In the context of the “model” we speak of covariation of a variable as movement of the variable, usually associated with (not necessarily caused by) movement of another variable.  Total variation = = yM 0 y.  M 0 = I – i(i’i) -1 i’ = the M matrix for X = a column of ones.

Part 3: Least Squares Algebra 3-32/58 Decomposing the Variation Recall the decomposition: Var[y] = Var [E[y|x]] + E[Var [ y | x ]] = Variation of the conditional mean around the overall mean + Variation around the conditional mean function.

Part 3: Least Squares Algebra 3-33/58 Decomposing the Variation of Vector y Decomposition: (This all assumes the model contains a constant term. one of the columns in X is i.) y = Xb + e so M 0 y = M 0 Xb + M 0 e = M 0 Xb + e. (Deviations from means. Why is M 0 e = e? ) yM 0 y = b(X’ M 0 )(M 0 X)b + ee = bXM 0 Xb + ee. (M 0 is idempotent and e’ M 0 X = e’X = 0.) Total sum of squares = Regression Sum of Squares (SSR)+ Residual Sum of Squares (SSE)

Part 3: Least Squares Algebra 3-34/58 The Sum of Squared Residuals b minimizes ee = (y - Xb)(y - Xb). Algebraic equivalences, at the solution b = (XX) -1 Xy e’e = ye (why? e’ = y’ – b’X’ and X’e=0 ) (This is the F.O.C. for least squares.) ee = yy - y’Xb = yy - bXy = ey as eX = 0 (or e’y = y’e)

Part 3: Least Squares Algebra 3-35/58 A Fit Measure y = Xb + e M 0 y = M 0 Xb + M 0 e M 0 e = e. y’M 0 y = bXM 0 Xb + e’e 1 = bXM 0 Xb/yM 0 y + e’e/yM 0 y R 2 = bXM 0 Xb/yM 0 y (VIR) R 2 is bounded by zero if and one only if: (a) There is a constant term in X and (b) The line is computed by linear least squares.

Part 3: Least Squares Algebra 3-36/58 Minimizing ee Any other coefficient vector has a larger sum of squares. A quick proof: d = the vector, not equal to b u = y – Xd = y – Xb + Xb – Xd = e - X(d - b). Then, uu = (y - Xd)(y-Xd) = [y - Xb - X(d - b)][y - Xb - X(d - b)] = [e - X(d - b)] [e - X(d - b)] Expand to find uu = ee + (d-b)XX(d-b) = e’e + v’v > ee

Part 3: Least Squares Algebra 3-37/58 Dropping a Variable An important special case. Suppose b X,z = [b,c] = the regression coefficients in a regression of y on [X,z] b X = [d,0] = is the same, but computed to force the coefficient on z to equal 0. This removes z from the regression. We are comparing the results that we get with and without the variable z in the equation. Results which we can show:  Dropping a variable(s) cannot improve the fit - that is, it cannot reduce the sum of squared residuals.  Adding a variable(s) cannot degrade the fit - that is, it cannot increase the sum of squared residuals.

Part 3: Least Squares Algebra 3-38/58 Adding a Variable Never Increases the Sum of Squares Theorem 3.5 on text page 38. u = the residual in the regression of y on [X,z] e = the residual in the regression of y on X alone, uu = ee – c 2 (z*z*)  ee where z* = M X z.

Part 3: Least Squares Algebra 3-39/58 Adding Variables  R 2 never falls when a z is added to the regression.  A useful general result Squared partial correlation between y and z controlling for x.

Part 3: Least Squares Algebra 3-40/58 Adding Variables to a Model What is the effect of adding PN, PD, PS,

Part 3: Least Squares Algebra 3-41/58 Comparing fits of regressions Make sure the denominator in R 2 is the same - i.e., same left hand side variable. Example, linear vs. loglinear. Loglinear will almost always appear to fit better because taking logs reduces variation.

Part 3: Least Squares Algebra 3-43/58 Adjusted R Squared  Adjusted R 2 (for degrees of freedom?) = 1 - [(n-1)/(n-K)](1 - R 2 )  Degrees of freedom” adjustment suggests something about “unbiasedness.” The ratio is not unbiased.  includes a penalty for variables that don’t add much fit. Can fall when a variable is added to the equation.

Part 3: Least Squares Algebra 3-44/58 Adjusted R 2 What is being adjusted? The penalty for using up degrees of freedom. = 1 - [ee/(n – K)]/[yM 0 y/(n-1)] uses the ratio of two ‘unbiased’ estimators. Is the ratio unbiased? = 1 – [(n-1)/(n-K)(1 – R 2 )] Will rise when a variable is added to the regression? is higher with z than without z if and only if the t ratio on z is in the regression when it is added is larger than one in absolute value.

Part 3: Least Squares Algebra 3-45/58 Full Regression (Without PD) ---------------------------------------------------------------------- Ordinary least squares regression............ LHS=G Mean = 226.09444 Standard deviation = 50.59182 Number of observs. = 36 Model size Parameters = 9 Degrees of freedom = 27 Residuals Sum of squares = 596.68995 Standard error of e = 4.70102 Fit R-squared =.99334 <********** Adjusted R-squared =.99137 <********** Model test F[ 8, 27] (prob) = 503.3(.0000) --------+------------------------------------------------------------- Variable| Coefficient Standard Error t-ratio P[|T|>t] Mean of X --------+------------------------------------------------------------- Constant| -8220.38** 3629.309 -2.265.0317 PG| -26.8313*** 5.76403 -4.655.0001 2.31661 Y|.02214***.00711 3.116.0043 9232.86 PNC| 36.2027 21.54563 1.680.1044 1.67078 PUC| -6.23235 5.01098 -1.244.2243 2.34364 PPT| 9.35681 8.94549 1.046.3048 2.74486 PN| 53.5879* 30.61384 1.750.0914 2.08511 PS| -65.4897*** 23.58819 -2.776.0099 2.36898 YEAR| 4.18510** 1.87283 2.235.0339 1977.50 --------+-------------------------------------------------------------

Part 3: Least Squares Algebra 3-46/58 PD added to the model. R 2 rises, Adj. R 2 falls ---------------------------------------------------------------------- Ordinary least squares regression............ LHS=G Mean = 226.09444 Standard deviation = 50.59182 Number of observs. = 36 Model size Parameters = 10 Degrees of freedom = 26 Residuals Sum of squares = 594.54206 Standard error of e = 4.78195 Fit R-squared =.99336 Was 0.99334 Adjusted R-squared =.99107 Was 0.99137 --------+------------------------------------------------------------- Variable| Coefficient Standard Error t-ratio P[|T|>t] Mean of X --------+------------------------------------------------------------- Constant| -7916.51** 3822.602 -2.071.0484 PG| -26.8077*** 5.86376 -4.572.0001 2.31661 Y|.02231***.00725 3.077.0049 9232.86 PNC| 30.0618 29.69543 1.012.3207 1.67078 PUC| -7.44699 6.45668 -1.153.2592 2.34364 PPT| 9.05542 9.15246.989.3316 2.74486 PD| 11.8023 38.50913.306.7617 1.65056 (NOTE LOW t ratio) PN| 47.3306 37.23680 1.271.2150 2.08511 PS| -60.6202** 28.77798 -2.106.0450 2.36898 YEAR| 4.02861* 1.97231 2.043.0514 1977.50 --------+-------------------------------------------------------------

Part 3: Least Squares Algebra 3-47/58 Econometrics I Part 3.2 – Transformed Data

Part 3: Least Squares Algebra 3-48/58 (Linearly) Transformed Data  How does linear transformation affect the results of least squares? Z = XP for KxK nonsingular P  Based on X, b = (XX) -1 X’y.  You can show (just multiply it out), the coefficients when y is regressed on Z are c = P -1 b  “Fitted value” is Zc = XPP -1 b = Xb. The same!!  Residuals from using Z are y - Zc = y - Xb (we just proved this.). The same!!  Sum of squared residuals must be identical, as y-Xb = e = y-Zc.  R 2 must also be identical, as R 2 = 1 - ee/y’M 0 y (!!).

Part 3: Least Squares Algebra 3-49/58 Understandintg Linear Transformation  Xb is the projection of y into the column space of X. Zc is the projection of y into the column space of Z. But, since the columns of Z are just linear combinations of those of X, the column space of Z must be identical to that of X. Therefore, the projection of y into the former must be the same as the latter, which now produces the other results.)  What are the practical implications of this result? Transformation does not affect the fit of a model to a body of data. Transformation does affect the “estimates.” If b is an estimate of something (  ), then c cannot be an estimate of  - it must be an estimate of P -1 , which might have no meaning at all.

Part 3: Least Squares Algebra 3-50/58 I have a simple question for you. Yesterday, I was estimating a regional production function with yearly dummies. The coefficients of the dummies are usually interpreted as a measure of technical change with respect to the base year (excluded dummy variable). However, I felt that it could be more interesting to redefine the dummy variables in such a way that the coefficient could measure technical change from one year to the next. You could get the same result by subtracting two coefficients in the original regression but you would have to compute the standard error of the difference if you want to do inference. Is this a well known procedure?

Part 3: Least Squares Algebra 3-51/58 Production Function for 247 Dairy Farms: 1993-1998.

Part 3: Least Squares Algebra 3-55/58 Reducing the Dimensionality of X Principal Components  Z = XC Fewer columns than X Includes as much ‘variation’ of X as possible Columns of Z are orthogonal  Why do we do this? Collinearity Combine variables of ambiguous identity such as test scores as measures of ‘ability’  How do we do this? Later in the course. Requires some further results from matrix algebra.

Part 3: Least Squares Algebra 3-56/58 What is a Principal Component?  X = a data matrix (deviations from means)  z = Xp = linear combination of the columns of X.  Choose p to maximize the variation of z.  How? p = eigenvector that corresponds to the largest eigenvalue of X’X

Part 3: Least Squares Algebra 3-1/58 Econometrics I Professor William Greene Stern School of Business Department of Economics.

Similar presentations

Presentation on theme: "Part 3: Least Squares Algebra 3-1/58 Econometrics I Professor William Greene Stern School of Business Department of Economics."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Part 3: Least Squares Algebra 3-1/58 Econometrics I Professor William Greene Stern School of Business Department of Economics.

Similar presentations

Presentation on theme: "Part 3: Least Squares Algebra 3-1/58 Econometrics I Professor William Greene Stern School of Business Department of Economics."— Presentation transcript:

Similar presentations

About project

Feedback