Announcements Reminder: Exam 1 on Wed. You may bring one 3x5-inch index card with notes, which I will collect with your exam. No other notes, books, or cell phones. You will be given a simple calculator to use. No bathroom breaks. Last year's exams are on Piazza; solutions will be posted tonight. Review session by TAs tomorrow 7-9pm.

Announcements Reminder: Lab session today. Please bring at least two potential research topics.

Recap: Omitted Variable Bias Example: omitting innate ability in a wage equation. When is there no omitted variable bias? If the omitted variable is irrelevant (β2 = 0) or uncorrelated with x1 (δ1 = 0). In the ability example, both are likely positive, so the return to education will be overestimated: it will look as if people with many years of education earn very high wages, but this is partly due to the fact that people with more education are also more able, on average.

Predicting the sign of the Omitted Variable Bias E[β̂1] = β1 + β2·δ1, where β2·δ1 is called the bias term. To figure out whether the expected bias from omitting x2 is positive or negative, simply reason through: Is β2 likely positive or negative? Is δ1 likely positive or negative? Exercise in groups to practice this.
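
A minimal simulation sketch (not from the slides; the variable names and coefficient values are assumptions for illustration) showing the bias formula at work: x2 has a positive effect (β2 > 0) and is positively correlated with x1 (δ1 > 0), so the short regression that omits x2 overestimates β1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Data-generating process: y = 1 + 0.5*x1 + 0.8*x2 + u, with x2 correlated with x1
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)        # delta1 is about 0.6 > 0
u = rng.normal(size=n)
y = 1 + 0.5 * x1 + 0.8 * x2 + u           # beta1 = 0.5, beta2 = 0.8

# Short regression of y on x1 only (omitting x2)
X_short = np.column_stack([np.ones(n), x1])
b_short = np.linalg.lstsq(X_short, y, rcond=None)[0]

# Long regression including x2
X_long = np.column_stack([np.ones(n), x1, x2])
b_long = np.linalg.lstsq(X_long, y, rcond=None)[0]

print("short-regression slope on x1:", b_short[1])   # about 0.5 + 0.8*0.6 = 0.98 (biased upward)
print("long-regression slope on x1: ", b_long[1])    # about 0.5 (unbiased)
```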

Multiple Regression Analysis: Estimation Properties of OLS on any sample of data: fitted (predicted) values and residuals. Algebraic properties of OLS regression: the deviations from the regression line (the residuals) sum up to zero; the correlations between these deviations and the regressors are zero; and the sample averages of y and of the regressors lie on the regression line.
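
A short numpy sketch (illustrative only, using simulated data) that verifies these algebraic properties numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 2 + 1.0 * x1 - 0.5 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta_hat           # fitted (predicted) values
u_hat = y - y_hat              # residuals

print(u_hat.sum())             # ~0: residuals sum to zero
print(x1 @ u_hat, x2 @ u_hat)  # ~0: residuals are uncorrelated with each regressor
print(y.mean() - beta_hat @ np.array([1, x1.mean(), x2.mean()]))  # ~0: sample averages lie on the line
```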

Multiple Regression Analysis: Estimation Goodness-of-Fit: decomposition of total variation and R-squared. Question: What happens to the R-squared value if we add one additional explanatory variable to our regression? Answer: It cannot decrease; that is mathematically impossible. In practice it increases at least slightly with each explanatory variable that is added.
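
The decomposition referred to here is the standard one; the slide's formulas are shown as images, so they are restated here for reference:

```latex
\mathrm{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \quad
\mathrm{SSE} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \quad
\mathrm{SSR} = \sum_{i=1}^{n} \hat{u}_i^2, \\
\mathrm{SST} = \mathrm{SSE} + \mathrm{SSR}, \qquad
R^2 = \frac{\mathrm{SSE}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSR}}{\mathrm{SST}}
```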

Multiple Regression Analysis: Estimation Example: Explaining log wage. log(wage) = β0 + β1·Educ + β2·Exper + U, with R2 = .1309. Let's add IQ to the model and see how it affects R2 (and β1). Then let's add a less important variable to the model, one which we don't think has much to do with wages. Interpretation: Even with an unimportant variable that doesn't significantly affect y, the R-squared still increases by a little bit. General remark on R-squared: even if R-squared is small, the regression may still provide good estimates of ceteris paribus effects. (The purpose of this regression is not to claim that we can exactly predict a person's wage.)
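
The in-class demo's output is not in the transcript, so here is a hedged simulated illustration of the same point (variable names and coefficients are assumptions): adding an irrelevant regressor still nudges R-squared up slightly, while the fit barely changes.

```python
import numpy as np

def r_squared(X, y):
    """R-squared from an OLS fit of y on X (X already includes a constant column)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(2)
n = 200
educ = rng.normal(12, 2, n)
exper = rng.normal(10, 3, n)
junk = rng.normal(size=n)                       # irrelevant variable, unrelated to wages
logwage = 0.5 + 0.08 * educ + 0.02 * exper + rng.normal(scale=0.4, size=n)

X_base = np.column_stack([np.ones(n), educ, exper])
X_plus = np.column_stack([X_base, junk])
print(r_squared(X_base, logwage))   # baseline R-squared
print(r_squared(X_plus, logwage))   # slightly higher; it can never be lower
```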

Multiple Regression Analysis: Estimation Standard assumptions for the multiple regression model. Assumption MLR.1 (Linear in parameters): in the population, the relationship between y and the explanatory variables is linear. Note the difference between linear in parameters and linear in variables. Assumption MLR.2 (Random sampling): the data are a random sample drawn from the population, so each data point follows the population equation.
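
The population model behind MLR.1, restated in its standard form (the slide shows it as an image):

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u
```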

Multiple Regression Analysis: Estimation Standard assumptions for the multiple regression model (cont.) Assumption MLR.3 (No perfect collinearity): "In the sample (and therefore in the population), none of the independent variables is constant and there are no exact relationships among the independent variables." Remarks on MLR.3: the assumption only rules out perfect collinearity/correlation between explanatory variables; imperfect correlation is allowed. If an explanatory variable is a perfect linear combination of other explanatory variables, it is superfluous and may be eliminated. Constant variables are also ruled out (they are collinear with the intercept).

Multiple Regression Analysis: Estimation Example of perfect collinearity: exact relationships between regressors. shareA and shareB represent the shares of total campaign spending spent by candidate A and candidate B, respectively. This model cannot be estimated by Stata because there is an exact linear relationship between them: shareA + shareB = 1. Simple solution: either shareA or shareB needs to be dropped from the regression model.
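
A quick numerical sketch (simulated data, not the campaign-spending dataset) of why estimation fails here: with an intercept plus shareA and shareB = 1 - shareA, the design matrix is rank-deficient, so X'X cannot be inverted.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
shareA = rng.uniform(0, 1, n)
shareB = 1 - shareA                 # exact linear relationship: shareA + shareB = 1

X = np.column_stack([np.ones(n), shareA, shareB])
print(np.linalg.matrix_rank(X))     # 2, not 3: the design matrix is rank-deficient,
                                    # so X'X is singular and the OLS coefficients are not identified
```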

Multiple Regression Analysis: Estimation Standard assumptions for the multiple regression model (cont.) Assumption MLR.4 (Zero conditional mean): the values of the explanatory variables must contain no information about the mean of the unobserved factors. In a multiple regression model, the zero conditional mean assumption is easier to justify because fewer things end up in the error. Example: explaining variation in test scores across public schools. If avginc were not included in the regression, it would end up in the error term; it would then be hard to defend that expend is uncorrelated with the error.
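
Formally, the standard statement of MLR.4 (the slide's formula is an image):

```latex
E(u \mid x_1, x_2, \dots, x_k) = 0
```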

Multiple Regression Analysis: Estimation Discussion of the zero conditional mean assumption. Explanatory variables that are correlated with the error term are called endogenous; endogeneity is a violation of assumption MLR.4. Explanatory variables that are uncorrelated with the error term are called exogenous; MLR.4 holds if all explanatory variables are exogenous. Exogeneity is the key assumption for a causal interpretation of the regression, and for unbiasedness of the OLS estimators. Can we find a source of exogenous variation in X if it is typically endogenous? Theorem 3.1 (Unbiasedness of OLS): unbiasedness is an average property in repeated samples; in one given sample, the estimates may still be far away from the true values.
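
The formal content of Theorem 3.1, restated since the slide shows it as an image: under assumptions MLR.1 through MLR.4,

```latex
E(\hat{\beta}_j) = \beta_j, \qquad j = 0, 1, \dots, k,
```

for any values of the population parameters.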

Multiple Regression Analysis: Estimation Standard assumptions for the multiple regression model (cont.) Assumption MLR.5 (Homoscedasticity): the values of the explanatory variables must contain no information about the variance of the unobserved factors. This assumption may also be hard to justify in many cases. Example: the wage equation. Shorthand notation: all explanatory variables are collected in a random vector x = (x1, ..., xk), so the assumption can be written compactly as Var(u | x) = σ².

Multiple Regression Analysis: Estimation Theorem 3.2 (Sampling variances of OLS slope estimators). Under assumptions MLR.1 - MLR.5, the sampling variance of each slope estimator depends on: the variance of the error term; the total sample variation in the explanatory variable xj; and the R-squared from a regression of explanatory variable xj on all other independent variables (including a constant).
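
The formula the theorem refers to, restated in its standard form (the slide shows it as an image):

```latex
\operatorname{Var}(\hat{\beta}_j) = \frac{\sigma^2}{\mathrm{SST}_j \,(1 - R_j^2)}, \qquad j = 1, \dots, k,
\qquad \text{where } \mathrm{SST}_j = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2 .
```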

Multiple Regression Analysis: Estimation Components of OLS Variances: 1) The error variance. A high error variance increases the sampling variance because there is more "noise" in the equation; a large error variance always makes estimates more imprecise. The error variance does not decrease with sample size. 2) The total sample variation in the explanatory variable. More sample variation leads to more precise estimates. Total sample variation automatically increases with the sample size, so increasing the sample size is a way to get more precise estimates.

Multiple Regression Analysis: Estimation 3) Linear relationships among the independent variables. The sampling variance of β̂j increases as the explanatory variable xj can be better linearly explained by the other independent variables. To measure this, regress xj on all other independent variables (including a constant); a high R-squared from this regression means that xj can be linearly explained pretty well by the other independent variables, which is bad for the precision of the estimate of βj. The problem of almost linearly dependent explanatory variables is called multicollinearity (i.e., Rj² is close to 1 for some j).

Multiple Regression Analysis: Estimation Example of Multicollinearity: Explaining log wage. The typical "Mincer" equation (named after Jacob Mincer) is: log(wage) = β0 + β1·Educ + β2·Exper + β3·Exper² + U. Why include experience squared? To allow the effect of experience on log wage to be nonlinear. We can estimate this using the wage2 dataset: DEMO. Interpretation: In this dataset, it's better not to include Exper²: because the range of ages is so narrow (28-38), Exper² is highly correlated with Exper itself. What about the wage1 dataset?
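
A hedged simulated illustration (not the wage2 demo itself; the ranges are assumptions) of why a narrow range makes Exper and Exper² nearly collinear:

```python
import numpy as np

rng = np.random.default_rng(4)

exper_narrow = rng.uniform(8, 18, 1000)   # narrow experience range, as with ages 28-38
exper_wide = rng.uniform(0, 40, 1000)     # much wider range of experience

print(np.corrcoef(exper_narrow, exper_narrow**2)[0, 1])  # roughly 0.99+: near-perfect collinearity
print(np.corrcoef(exper_wide, exper_wide**2)[0, 1])      # lower (around 0.97), so 1/(1 - r**2) is far smaller
```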

Multiple Regression Analysis: Estimation Second example of multicollinearity: average standardized test score of a school regressed on expenditures for teachers, expenditures for instructional materials, and other expenditures. The different expenditure categories will be strongly correlated because if a school has a lot of resources it will spend a lot on everything. It will be hard to estimate the differential effects of different expenditure categories because all expenditures are either high or low. For precise estimates of the differential effects, one would need information about situations where expenditure categories change differentially. As a consequence, the sampling variance of the estimated effects will be large.

Multiple Regression Analysis: Estimation Discussion of the multicollinearity problem. In the above example, it would probably be better to lump all expenditure categories together because their effects cannot be disentangled. In other cases, dropping some independent variables may reduce multicollinearity (but this may lead to omitted variable bias). Only the sampling variance of the variables involved in multicollinearity will be inflated; the estimates of other effects are not affected. Note that multicollinearity is not a violation of MLR.3 in the strict sense. Multicollinearity may be detected through "variance inflation factors"; as an (arbitrary) rule of thumb, the variance inflation factor should not be larger than 10.
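
A sketch of computing variance inflation factors in Python with statsmodels, on simulated data (the variables here are assumptions for illustration, not the course datasets):

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(5)
n = 300
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.2 * rng.normal(size=n)     # nearly collinear with x1
x3 = rng.normal(size=n)                        # unrelated to the others

X = np.column_stack([np.ones(n), x1, x2, x3])  # column 0 is the constant
for j, name in zip([1, 2, 3], ["x1", "x2", "x3"]):
    print(name, variance_inflation_factor(X, j))
# x1 and x2 should show VIFs well above the rule-of-thumb threshold of 10;
# x3 should be close to 1
```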

Multiple Regression Analysis: Estimation Estimating the error variance. Theorem 3.3 (Unbiased estimator of the error variance): an unbiased estimate of the error variance can be obtained by dividing the sum of squared residuals by the number of observations minus the number of estimated regression coefficients. The number of observations minus the number of estimated parameters is also called the degrees of freedom.
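
The estimator itself, restated in standard form (the slide shows it as an image):

```latex
\hat{\sigma}^2 = \frac{1}{n - k - 1} \sum_{i=1}^{n} \hat{u}_i^2 = \frac{\mathrm{SSR}}{n - k - 1},
\qquad E(\hat{\sigma}^2) = \sigma^2 \ \text{under MLR.1--MLR.5}.
```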

Multiple Regression Analysis: Estimation Estimation of the sampling variances of the OLS estimators. The true sampling variance of β̂j involves the unknown error variance σ²; plugging in the estimate σ̂² for the unknown σ² gives the estimated sampling variance of β̂j. Note that these formulas are only valid under assumptions MLR.1 - MLR.5 (in particular, there has to be homoscedasticity).
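
Restating the two formulas in standard form (the slide shows them as images): plugging the estimate into the variance expression from Theorem 3.2 gives

```latex
\widehat{\operatorname{Var}}(\hat{\beta}_j) = \frac{\hat{\sigma}^2}{\mathrm{SST}_j \,(1 - R_j^2)},
\qquad
\operatorname{se}(\hat{\beta}_j) = \frac{\hat{\sigma}}{\sqrt{\mathrm{SST}_j \,(1 - R_j^2)}} .
```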

Multiple Regression Analysis: Estimation Efficiency of OLS: The Gauss-Markov Theorem. Under assumptions MLR.1 - MLR.5, OLS is unbiased. However, under these same assumptions there may be many other estimators that are unbiased. Which one is the unbiased estimator with the smallest variance? In order to answer this question one usually limits the comparison to linear estimators, i.e. estimators that are linear in the dependent variable, of the form β̃j = Σi wij·yi, where each weight wij may be an arbitrary function of the sample values of all the explanatory variables; the OLS estimator can be shown to be of this form.

Multiple Regression Analysis: Estimation Theorem 3.4 (Gauss-Markov Theorem): under assumptions MLR.1 - MLR.5, the OLS estimators are the best linear unbiased estimators (BLUE) of the regression coefficients, i.e. Var(β̂j) ≤ Var(β̃j) for all linear estimators β̃j = Σi wij·yi for which E(β̃j) = βj. OLS is only the best (BLUE) estimator if MLR.1 - MLR.5 hold. Question: What if there is heteroscedasticity? Is the estimator still: B? L? U?