3.3 Omitted Variable Bias
-When a relevant variable is excluded, we UNDERSPECIFY THE MODEL and OLS estimates are generally biased
-Consider the true population model:

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u

-Assume this model satisfies Assumptions MLR.1 through MLR.4 and that our main interest is in \beta_1
-if we exclude x_2, our estimated equation becomes:

\tilde{y} = \tilde\beta_0 + \tilde\beta_1 x_1

3.3 Omitted Variable Bias
-From (3.23) we know that:

\tilde\beta_1 = \hat\beta_1 + \hat\beta_2 \tilde\delta_1

-where the \hat\beta_j come from regressing y on ALL of the x's and \tilde\delta_1 is the slope from regressing x_2 on x_1
-since \tilde\delta_1 depends only on the independent variables, it is treated as fixed when taking expectations
-we also know from Theorem 3.1 that the \hat\beta_j are unbiased estimators, therefore:

E(\tilde\beta_1) = \beta_1 + \beta_2 \tilde\delta_1
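A minimal numerical check of relationship (3.23), using simulated data (the coefficients and the link between x_1 and x_2 below are illustrative assumptions, not values from the slides): the identity \tilde\beta_1 = \hat\beta_1 + \hat\beta_2 \tilde\delta_1 holds exactly in any sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Simulated data; the coefficients 1, 2, 3 and the 0.5 link between
# x1 and x2 are illustrative assumptions
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

def ols(y, X):
    """OLS coefficients; the first column of X should be the intercept."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
_, b1_hat, b2_hat = ols(y, np.column_stack([ones, x1, x2]))   # y on ALL x's
_, b1_tilde = ols(y, np.column_stack([ones, x1]))             # x2 omitted
_, delta1_tilde = ols(x2, np.column_stack([ones, x1]))        # x2 on x1

print(b1_tilde, b1_hat + b2_hat * delta1_tilde)   # identical in any sample
```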

3.3 Omitted Variable Bias
-From this we can calculate \tilde\beta_1's bias:

Bias(\tilde\beta_1) = E(\tilde\beta_1) - \beta_1 = \beta_2 \tilde\delta_1

-this bias is often called OMITTED VARIABLE BIAS
-from this equation, \tilde\beta_1 is unbiased in two cases:
1) \beta_2 = 0; x_2 has no effect on y in the true model
2) \tilde\delta_1 = 0
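A small Monte Carlo sketch of the bias formula (all parameter values are assumptions chosen for illustration): averaging the short-regression slope over many samples recovers roughly \beta_1 + \beta_2 \tilde\delta_1 rather than \beta_1.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 200, 5_000
beta0, beta1, beta2 = 1.0, 2.0, 3.0   # assumed true parameters
delta1 = 0.5                          # population slope of x2 on x1

b1_tilde = np.empty(reps)
for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = delta1 * x1 + rng.normal(size=n)
    y = beta0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1])          # x2 omitted
    b1_tilde[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]

print(b1_tilde.mean())   # ≈ beta1 + beta2 * delta1 = 3.5, not 2.0
```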

3.3 The Case \tilde\delta_1 = 0
-\tilde\delta_1 equals the sample covariance of x_1 and x_2 divided by the sample variance of x_1:

\tilde\delta_1 = \widehat{Cov}(x_1, x_2) / \widehat{Var}(x_1)

-\tilde\delta_1 is equal to zero only if x_1 and x_2 are uncorrelated in the sample
-therefore, if they are uncorrelated, \tilde\beta_1 is unbiased
-it is also unbiased in the other case above, namely \beta_2 = 0
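A few lines (simulated data, illustrative numbers) confirming that \tilde\delta_1 computed as sample covariance over sample variance is the same number as the OLS slope from regressing x_2 on x_1:

```python
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.normal(size=500)
x2 = 0.5 * x1 + rng.normal(size=500)

# delta_1 tilde two ways: sample cov/var, and the OLS slope of x2 on x1
delta_cov = np.cov(x1, x2, ddof=1)[0, 1] / np.var(x1, ddof=1)
X = np.column_stack([np.ones(x1.size), x1])
delta_ols = np.linalg.lstsq(X, x2, rcond=None)[0][1]

print(delta_cov, delta_ols)   # identical up to floating-point error
```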

3.3 Omitted Variable Bias
-As \tilde\beta_1's bias depends on \beta_2 and \tilde\delta_1, the following table summarizes the possible biases:

                 Corr(x_1, x_2) > 0    Corr(x_1, x_2) < 0
\beta_2 > 0      Positive bias         Negative bias
\beta_2 < 0      Negative bias         Positive bias

3.3 Omitted Variable Bias
-the SIZE of the bias is also important, as a small bias may not be cause for concern
-therefore the magnitudes of \beta_2 and \tilde\delta_1 also matter
-although \beta_2 is unknown, theory can often give us a good idea about its sign
-likewise, the direction of the correlation between x_1 and x_2 can often be inferred from theory
-a positive (negative) bias means that, given random sampling, your estimates will on average be too large (small)

3.3 Example
Take the true regression:

taste = \beta_0 + \beta_1 experience + \beta_2 love + u

where pasta taste depends on experience making pasta and on love
-While we can measure years of experience, we cannot measure love, so what we actually estimate is:

\widetilde{taste} = \tilde\beta_0 + \tilde\beta_1 experience

What is the bias in \tilde\beta_1?

3.3 Example
We know that the true \beta_2 should be positive; love improves cooking
We can also argue for a positive correlation between experience and love: if you love someone, you spend more time cooking for them
Therefore \tilde\beta_1 will have a positive bias
However, since the correlation between experience and love is small, the bias will likewise be small
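A simulation sketch of this story (the variable names follow the example, but every number here is an illustrative assumption): love has a clearly positive effect on taste but is only weakly correlated with experience, so the bias in the experience coefficient is positive but small.

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps = 500, 2_000
true_b1, true_b2 = 1.0, 2.0       # assumed effects of experience and love
weak_corr_slope = 0.05            # love is only weakly related to experience

b1 = np.empty(reps)
for r in range(reps):
    experience = rng.normal(size=n)
    love = weak_corr_slope * experience + rng.normal(size=n)
    taste = true_b1 * experience + true_b2 * love + rng.normal(size=n)
    X = np.column_stack([np.ones(n), experience])     # love omitted
    b1[r] = np.linalg.lstsq(X, taste, rcond=None)[0][1]

print(b1.mean() - true_b1)   # ≈ true_b2 * weak_corr_slope = 0.10, a small bias
```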

3.3 Bias Notes
-It is important to realize that the direction of the bias holds ON AVERAGE
-an estimator with a positive bias can still underestimate the true parameter in a given sample
-If E(\tilde\beta_1) > \beta_1, there is an UPWARD BIAS
-If E(\tilde\beta_1) < \beta_1, there is a DOWNWARD BIAS
-\tilde\beta_1 is BIASED TOWARD ZERO if E(\tilde\beta_1) is closer to zero than \beta_1 (for example, a downward bias when \beta_1 > 0, or an upward bias when \beta_1 < 0)

3.3 General Omitted Variable Bias
Deriving the direction of omitted variable bias with more independent variables is more difficult
-Note that correlation between any single explanatory variable and the error generally causes ALL OLS estimates to be biased
-Consider the true and estimated models:

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u
\tilde{y} = \tilde\beta_0 + \tilde\beta_1 x_1 + \tilde\beta_2 x_2

where x_3 is omitted and is correlated with x_1 but not with x_2
Both \tilde\beta_1 and \tilde\beta_2 will generally be biased unless x_1 and x_2 are uncorrelated

3.3 General Omitted Variable Bias
Since the x's can be pairwise correlated, it is hard to derive the bias of each OLS estimate
-If we assume that x_1 and x_2 are uncorrelated, we can analyze \tilde\beta_1's bias without x_2 having an effect, just as in the two-variable regression:

E(\tilde\beta_1) \approx \beta_1 + \beta_3 \tilde\delta_1

where \tilde\delta_1 is now the slope from regressing the omitted x_3 on x_1
-With this formula, which parallels (3.45), the previous table can again be used to determine the direction of the bias
-Note that a lot of assumed uncorrelatedness is needed before the bias can be signed
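A simulation sketch of this three-regressor case (assumed parameter values): with x_1 and x_2 uncorrelated and the omitted x_3 correlated only with x_1, the short regression's slope on x_1 is off by roughly \beta_3 \tilde\delta_1 while the slope on x_2 remains approximately unbiased.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 300, 3_000
beta = np.array([1.0, 2.0, -1.0, 3.0])   # assumed beta0..beta3
delta1 = 0.6                             # slope of x3 on x1

est = np.empty((reps, 2))
for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)                      # uncorrelated with x1
    x3 = delta1 * x1 + rng.normal(size=n)        # correlated with x1 only
    y = beta[0] + beta[1]*x1 + beta[2]*x2 + beta[3]*x3 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])    # x3 omitted
    est[r] = np.linalg.lstsq(X, y, rcond=None)[0][1:]

print(est.mean(axis=0))   # ≈ [beta1 + beta3*delta1, beta2] = [3.8, -1.0]
```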

3.4 The Variance of the OLS Estimators
-We now know the expected value, or central tendency, of the OLS estimators
-Next we need information on how much spread the OLS estimators have in their sampling distributions
-To calculate the variance, we impose a HOMOSKEDASTICITY (constant error variance) assumption in order to:
1) Simplify the variance formulas
2) Give OLS an important efficiency property

Assumption MLR.5 (Homoskedasticity)
The error u has the same variance given any values of the explanatory variables. In other words,

Var(u | x_1, ..., x_k) = \sigma^2

Assumption MLR.5 Notes
-MLR.5 assumes that the variance of the error term, u, is the SAME for ANY combination of explanatory variables
-If ANY explanatory variable affects the error's variance, HETEROSKEDASTICITY is present
-Assumptions MLR.1 through MLR.5 are called the GAUSS-MARKOV ASSUMPTIONS
-As stated here, they apply only to cross-sectional data with random sampling
-time series and panel data analysis require more complicated, related assumptions

Assumption MLR.5 Notes
If we let X stand for all of the explanatory variables (x_1, ..., x_k), combining Assumptions MLR.1 through MLR.4 gives us:

E(y | X) = \beta_0 + \beta_1 x_1 + ... + \beta_k x_k

while MLR.5 can be written equivalently as:

Var(y | X) = \sigma^2
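A quick illustration of what MLR.5 does and does not allow (simulated data, assumed functional forms): under homoskedasticity the error variance is the same for small and large values of x, while an error whose spread grows with x violates the assumption.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
x = rng.uniform(1, 5, size=n)

u_hom = rng.normal(scale=2.0, size=n)   # Var(u|x) = 4 for every x
u_het = x * rng.normal(size=n)          # Var(u|x) = x^2, violates MLR.5

for name, u in [("homoskedastic", u_hom), ("heteroskedastic", u_het)]:
    low, high = u[x < 2], u[x > 4]
    print(name, low.var(ddof=1), high.var(ddof=1))
# Under MLR.5 the two variances are about equal; under heteroskedasticity
# the variance grows with x.
```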

3.4 MLR.4 vs. MLR.5
"Assumption MLR.4 says that the expected value of y, given X, is linear in the parameters – but it certainly depends on x_1, x_2, ..., x_k."
"Assumption MLR.5 says that the variance of y, given X, does not depend on the values of the independent variables." (bold added)

Theorem 3.2 (Sampling Variances of the OLS Slope Estimators)
Under Assumptions MLR.1 through MLR.5, conditional on the sample values of the independent variables,

Var(\hat\beta_j) = \sigma^2 / [SST_j (1 - R_j^2)]

for j = 1, 2, ..., k, where R_j^2 is the R-squared from regressing x_j on all of the other independent variables (including an intercept) and SST_j is the total sample variation in x_j:

SST_j = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2

Theorem 3.2 Notes
Note that all FIVE Gauss-Markov assumptions are needed for this theorem
Homoskedasticity (MLR.5) was not needed to show that OLS is unbiased
The size of Var(\hat\beta_j) is very important:
-a large variance leads to larger confidence intervals and less accurate hypothesis tests
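A sketch tying Theorem 3.2 to the output of a standard package (simulated data; assumes the statsmodels library is available): computing \hat\sigma^2, SST_j, and R_j^2 by hand and plugging them into the formula reproduces the default (nonrobust) standard error reported for \hat\beta_1.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 500
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)            # some multicollinearity
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()

# Theorem 3.2 pieces for j = 1 (the coefficient on x1), with sigma^2
# replaced by its unbiased estimate SSR / (n - k - 1):
sigma2_hat = fit.ssr / (n - 2 - 1)
sst1 = np.sum((x1 - x1.mean()) ** 2)
r2_1 = sm.OLS(x1, sm.add_constant(x2)).fit().rsquared   # x1 on the other x's

se_by_hand = np.sqrt(sigma2_hat / (sst1 * (1 - r2_1)))
print(se_by_hand, fit.bse[1])    # the two standard errors agree
```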