Part 7: Estimating the Variance of b 7-1/53 Econometrics I Professor William Greene Stern School of Business Department of Economics

Part 7: Estimating the Variance of b 7-2/53 Econometrics I Part 7 – Estimating the Variance of b

Part 7: Estimating the Variance of b 7-3/53 Context The true variance of b|X is σ²(X′X)⁻¹. We consider how to use the sample data to estimate this matrix. The ultimate objectives are to form interval estimates for the regression slopes and to test hypotheses about them. Both require estimates of the variability of the distribution. We then examine a factor which affects how "large" this variance is: multicollinearity.

Part 7: Estimating the Variance of b 7-4/53 Estimating σ² Using the residuals instead of the disturbances: the natural estimator is e′e/N as a sample surrogate for ε′ε/N. The residuals observe the disturbances imperfectly, e_i = ε_i − (b − β)′x_i, which biases e′e/N downward. The key result is E[e′e|X] = (N − K)σ².

Part 7: Estimating the Variance of b 7-5/53 Expectation of e′e

Part 7: Estimating the Variance of b 7-6/53 Method 1:

Part 7: Estimating the Variance of b 7-7/53 Estimating σ² Since E[e′e|X] = (N − K)σ², the unbiased estimator of σ² is s² = e′e/(N − K). Dividing by N − K rather than N is the "degrees of freedom correction."
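As a minimal illustration, this numpy sketch (simulated data and illustrative names, not the slides' gasoline data) confirms that e′e/N is biased downward while s² = e′e/(N − K) centers on the true σ²:

    import numpy as np

    rng = np.random.default_rng(0)
    N, K, sigma2 = 36, 7, 4.0
    X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
    beta = np.ones(K)

    biased, unbiased = [], []
    for _ in range(5000):
        y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=N)
        b = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS coefficients
        e = y - X @ b                             # residuals
        biased.append(e @ e / N)                  # centers on (N-K)/N * sigma2
        unbiased.append(e @ e / (N - K))          # s^2: centers on sigma2

    print(np.mean(biased), np.mean(unbiased))     # about 3.22 vs. 4.0

With N = 36 and K = 7, the uncorrected estimator centers on (29/36)σ², exactly the E[e′e|X] = (N − K)σ² result divided by N.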

Part 7: Estimating the Variance of b 7-8/53 Method 2: Some Matrix Algebra

Part 7: Estimating the Variance of b 7-9/53 Decomposing M

Part 7: Estimating the Variance of b 7-10/53 Example: Characteristic Roots of a Correlation Matrix

Part 7: Estimating the Variance of b 7-11/53

Part 7: Estimating the Variance of b 7-12/53 Gasoline Data

Part 7: Estimating the Variance of b 7-13/53 X’X and its Roots

Part 7: Estimating the Variance of b 7-14/53 Var[b|X] Estimating the covariance matrix for b|X: the true covariance matrix is σ²(X′X)⁻¹, and the natural estimator is s²(X′X)⁻¹. "Standard errors" of the individual coefficients are the square roots of the diagonal elements of this estimate.
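A minimal numpy sketch of this computation (function and variable names are mine, not the slides'):

    import numpy as np

    def ols_covariance(X, y):
        """Return b, s^2, the estimated Var[b|X], and standard errors."""
        N, K = X.shape
        XtX_inv = np.linalg.inv(X.T @ X)
        b = XtX_inv @ X.T @ y                # least squares coefficients
        e = y - X @ b
        s2 = e @ e / (N - K)                 # degrees-of-freedom corrected
        cov_b = s2 * XtX_inv                 # estimator of Var[b|X]
        se = np.sqrt(np.diag(cov_b))         # "standard errors"
        return b, s2, cov_b, se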

Part 7: Estimating the Variance of b 7-15/53 X′X, (X′X)⁻¹, and s²(X′X)⁻¹

Part 7: Estimating the Variance of b 7-16/53 Standard Regression Results [Ordinary least squares regression of G: 36 observations, 7 parameters, 29 degrees of freedom; standard error of e = sqr[e′e/(36 − 7)]; regressors Constant, PG, Y, TREND, PNC, PUC, PPT, with PG, Y, TREND, and PPT significant. The numeric table did not survive transcription.]

Part 7: Estimating the Variance of b 7-17/53 Bootstrapping Some assumptions underlie it, notably the sampling mechanism. Method: 1. Estimate using the full sample: → b. 2. Repeat R times: draw N observations from the sample, with replacement, and estimate β with b(r). 3. Estimate the variance with V = (1/R) Σ_r [b(r) − b][b(r) − b]′.
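A minimal Python sketch of the same three steps (a pairs bootstrap; names and defaults are illustrative assumptions):

    import numpy as np

    def bootstrap_cov(X, y, R=20, seed=0):
        """Pairs bootstrap of Var[b], following steps 1-3 above."""
        rng = np.random.default_rng(seed)
        N = X.shape[0]
        b = np.linalg.lstsq(X, y, rcond=None)[0]   # step 1: full-sample b
        reps = np.empty((R, b.size))
        for r in range(R):                         # step 2: R resamples
            idx = rng.integers(0, N, size=N)       # N draws with replacement
            reps[r] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
        dev = reps - b                             # deviations from full-sample b
        return dev.T @ dev / R                     # step 3: V

The program on the next slide (LIMDEP/NLOGIT syntax) implements the same loop for the gasoline data.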

Part 7: Estimating the Variance of b 7-18/53 Bootstrap Application
matr;bboot=init(3,21,0.)$        Store results here
name;x=one,y,pg$                 Define X
regr;lhs=g;rhs=x$                Compute b
calc;i=0$                        Counter
Proc                             Define procedure
regr;lhs=g;rhs=x;quietly$        ... Regression
matr;{i=i+1};bboot(*,i)=b$       ... Store b(r)
Endproc                          Ends procedure
exec;n=20;bootstrap=b$           20 bootstrap reps
matr;list;bboot'$                Display results

Part 7: Estimating the Variance of b 7-19/53 Results of Bootstrap Procedure [Full-sample OLS estimates for Constant, Y (.03692***), and PG, followed by the bootstrap summary: 20 bootstrap iterations completed; bootstrap samples have 36 observations; the means shown are the means of the bootstrap estimates, while the coefficients shown (B001–B003) are the original full-sample estimates. The remaining numeric values did not survive transcription.]

Part 7: Estimating the Variance of b 7-20/53 Bootstrap Replications [Figure: the full sample result plotted against the bootstrapped sample results]

Part 7: Estimating the Variance of b 7-21/53 OLS vs. Least Absolute Deviations [Two estimates of the same equation: least absolute deviations, with the covariance matrix based on 50 bootstrap replications (Y coefficient .03784***), and ordinary least squares, with standard errors based on bootstrap replications (Y coefficient .03692***). The remaining numeric values did not survive transcription.]

Part 7: Estimating the Variance of b 7-22/53 Quantile Regression: Application of Bootstrap Estimation

Part 7: Estimating the Variance of b 7-23/53 Quantile Regression Q(y|x,q) = β(q)′x, where q is the quantile; estimated by linear programming. Q(y|x,.50) = β(.50)′x is median regression, estimated by LAD (it estimates the same parameters as mean regression if the conditional distribution is symmetric). Why use quantile (median) regression? It is semiparametric, robust to some extensions (heteroscedasticity?), and provides a complete characterization of the conditional distribution.
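A short sketch using statsmodels' QuantReg (simulated data; the slides use a different dataset):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=500)
    y = 1.0 + 0.5 * x + rng.standard_t(df=3, size=500)  # heavy-tailed noise
    X = sm.add_constant(x)

    for q in (0.25, 0.50, 0.75):
        res = sm.QuantReg(y, X).fit(q=q)   # LP-based quantile fit
        print(q, res.params)               # q = .50 is median (LAD) regression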

Part 7: Estimating the Variance of b 7-24/53 Estimated Variance for Quantile Regression: asymptotic theory, or the bootstrap – an ideal application.

Part 7: Estimating the Variance of b 7-25/53

Part 7: Estimating the Variance of b 7-26/53 [Figure: estimated quantile regressions for q = .25, q = .50, and q = .75]

Part 7: Estimating the Variance of b 7-27/53

Part 7: Estimating the Variance of b 7-28/53

Part 7: Estimating the Variance of b 7-29/53 Multicollinearity Not "short rank," which is a deficiency in the model, but a characteristic of the data set which affects the covariance matrix. Regardless, b is unbiased. Consider one of the unbiased coefficient estimators, b_k: E[b_k] = β_k and Var[b|X] = σ²(X′X)⁻¹, so the variance of b_k is the kth diagonal element of σ²(X′X)⁻¹. We can isolate this with the result in your text. Let [X₁, z] be [other xs, x_k] (a convenient notation for the results in the text). We need the residual maker M₁. The general result is that the diagonal element we seek is [z′M₁z]⁻¹, which is the reciprocal of the sum of squared residuals in the regression of z on X₁.
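A minimal numpy check of this result (simulated, deliberately near-collinear data): the kth diagonal element of (X′X)⁻¹ equals 1/(z′M₁z), the reciprocal of the residual sum of squares from regressing x_k on the other columns.

    import numpy as np

    rng = np.random.default_rng(0)
    N = 200
    X = np.column_stack([np.ones(N), rng.normal(size=(N, 3))])
    X[:, 3] = 0.95 * X[:, 2] + 0.05 * rng.normal(size=N)  # near-collinear pair

    k = 3
    z, X1 = X[:, k], np.delete(X, k, axis=1)
    u = z - X1 @ np.linalg.lstsq(X1, z, rcond=None)[0]    # residuals of z on X1

    print(np.linalg.inv(X.T @ X)[k, k], 1.0 / (u @ u))    # the two agree

The more of z that X₁ explains, the smaller z′M₁z and the larger Var[b_k]: collinearity inflates the variance without biasing the estimate.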

Part 7: Estimating the Variance of b 7-30/53 I have a sample of observations in a logit model. Two predictors are highly collinear (pairwise corr. .96; p <.001); the VIFs are about 12 for each of them; the average VIF is 2.63; the condition number is 10.26; the determinant of the correlation matrix is ; and the two lowest eigenvalues are and. Centering/standardizing the variables does not change the story. Note: most observations are zeros for these two variables; I only have approx. 600 non-zero observations for these two variables out of a total of obs. Both variables' coefficients are significant and must be included in the model (as per specification). -- Do I have a problem of multicollinearity? -- Does the large sample size attenuate this concern, even if I have a correlation of .96? -- What could I look at to ascertain that the consequences of multicollinearity are not a problem? -- Is there any reference I might cite to say that, given the sample size, it is not a problem? I hope you might help, because I am really in trouble!!!

Part 7: Estimating the Variance of b 7-31/53 Variance of Least Squares

Part 7: Estimating the Variance of b 7-32/53 Multicollinearity

Part 7: Estimating the Variance of b 7-33/53 Gasoline Market Regression Analysis: logG versus logIncome, logPG [Minitab output: the regression equation is logG = constant + (coef) logIncome + (coef) logPG; R-Sq = 93.6%, R-Sq(adj) = 93.4%. The coefficients, standard errors, and ANOVA table did not survive transcription.]

Part 7: Estimating the Variance of b 7-34/53 Gasoline Market Regression Analysis: logG versus logIncome, logPG, logPNC, logPUC, logPPT [Minitab output: R-Sq = 96.0%, R-Sq(adj) = 95.6%. The coefficients, standard errors, and ANOVA table did not survive transcription.] The standard error on logIncome doubles when the three additional price variables are added to the equation.

Part 7: Estimating the Variance of b 7-35/53 Condition Number and Variance Inflation Factors A condition number larger than 30 is considered 'large.' What does this mean?
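A minimal sketch of both diagnostics (numpy only; assumes column 0 of X is the constant):

    import numpy as np

    def collinearity_diagnostics(X):
        """Condition number of column-scaled X and VIF_k = 1/(1 - R_k^2)."""
        Xs = X / np.linalg.norm(X, axis=0)        # scale columns to unit length
        sv = np.linalg.svd(Xs, compute_uv=False)
        cond = sv.max() / sv.min()                # condition number

        vifs = []
        for k in range(1, X.shape[1]):            # skip the constant
            xk, others = X[:, k], np.delete(X, k, axis=1)
            fit = others @ np.linalg.lstsq(others, xk, rcond=None)[0]
            r2 = 1 - np.sum((xk - fit) ** 2) / np.sum((xk - xk.mean()) ** 2)
            vifs.append(1.0 / (1.0 - r2))         # variance inflation factor
        return cond, vifs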

Part 7: Estimating the Variance of b 7-36/53

Part 7: Estimating the Variance of b 7-37/53 The Longley Data

Part 7: Estimating the Variance of b 7-38/53 NIST Longley Solution

Part 7: Estimating the Variance of b 7-39/53 Excel Longley Solution

Part 7: Estimating the Variance of b 7-40/53 The NIST Filipelli Problem

Part 7: Estimating the Variance of b 7-41/53 Certified Filipelli Results

Part 7: Estimating the Variance of b 7-42/53 Minitab Filipelli Results

Part 7: Estimating the Variance of b 7-43/53 Stata Filipelli Results

Part 7: Estimating the Variance of b 7-44/53 Even after dropping two (random) columns, the results are only correct to 1 or 2 digits.

Part 7: Estimating the Variance of b 7-45/53 Regression of x2 on all other variables

Part 7: Estimating the Variance of b 7-46/53 Using QR Decomposition
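A hedged sketch of the technique the slide names: factor X = QR and solve Rb = Q′y, avoiding X′X altogether.

    import numpy as np

    def ols_qr(X, y):
        """Least squares via QR; never forms the ill-conditioned X'X."""
        Q, R = np.linalg.qr(X)              # reduced QR; R is K x K upper triangular
        return np.linalg.solve(R, Q.T @ y)  # solve R b = Q'y

Because cond(X′X) = cond(X)², the normal-equations route loses roughly twice as many digits as QR on data like Longley or Filipelli, which is one reason packages that pass the NIST tests rely on orthogonalization methods.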

Part 7: Estimating the Variance of b 7-47/53 Multicollinearity There is no "cure" for collinearity, and estimating something else (principal components, for example) is not helpful. There are "measures" of multicollinearity, such as the condition number of X and the variance inflation factor. The best approach: be cognizant of it and understand its implications for estimation. Which is better: include a variable that causes collinearity, or drop the variable and suffer a biased estimator? Mean squared error would be the basis for comparison. Some generalities: assuming X has full rank, then regardless of its condition, b is still unbiased and Gauss-Markov still holds.

Part 7: Estimating the Variance of b 7-48/53 Specification and Functional Form: Nonlinearity

Part 7: Estimating the Variance of b 7-49/53 Log Income Equation [Ordinary least squares regression, LHS = LOGY, 7 parameters; regressors AGE (.06225***), AGESQ (−.00074***), Constant, MARRIED (.32153***), HHKIDS (***, negative), FEMALE, and EDUC (.05542***); the remaining numeric values, including Estimated Cov[b1,b2], did not survive transcription.] The estimated partial effect of AGE is .06225 − 2(.00074)Age. By the delta method, its estimated variance is Var[b_AGE] + 4 Age² Var[b_AGESQ] + 4 Age Cov[b_AGE, b_AGESQ], on the order of e−08, and the estimated standard error is its square root.
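A minimal sketch of the delta-method arithmetic (the covariance entries below are placeholders carrying only the surviving orders of magnitude; the slide's exact values were lost):

    import numpy as np

    b_age, b_agesq = 0.06225, -0.00074            # from the table above
    age = 43.5                                    # placeholder average age
    v11, v22, v12 = 1e-6, 1e-10, -1e-8            # placeholder (co)variances

    pe = b_age + 2 * b_agesq * age                # partial effect at mean age
    # gradient w.r.t. (b_age, b_agesq) is (1, 2*age), so the delta method gives:
    var_pe = v11 + 4 * age**2 * v22 + 4 * age * v12
    print(pe, np.sqrt(var_pe))                    # variance on the order of e-08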

Part 7: Estimating the Variance of b 7-50/53 Specification and Functional Form: Interaction Effect

Part 7: Estimating the Variance of b 7-51/53 Interaction Effect [Ordinary least squares regression, LHS = LOGY, 4 parameters: Constant (***), AGE (.00227***), FEMALE (.21239***), AGE_FEM (***, negative); the remaining numeric values did not survive transcription.] Do women earn more than men in this sample? The coefficient on FEMALE would suggest so. But the female "difference" is .21239 + (coefficient on AGE_FEM)×Age, so the net effect must be evaluated at a specific Age, such as the average.
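A one-line version of that calculation (the AGE_FEM coefficient and average Age below are placeholders; the slide's values were lost):

    b_fem, b_agefem, avg_age = 0.21239, -0.00450, 43.5  # placeholders except b_fem
    print(b_fem + b_agefem * avg_age)  # net female "difference" at the average Age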

Part 7: Estimating the Variance of b 7-52/53

Part 7: Estimating the Variance of b 7-53/53