Class 18 – Thursday, Nov. 11 Omitted Variables Bias

Class 18 – Thursday, Nov. 11: Omitted Variables Bias. Specially constructed explanatory variables: interactions, squared terms for curvature, dummy variables for categorical variables (next class). I will e-mail you Homework 7 after class. It will be due next Thursday.

California Test Score Data The California Standardized Testing and Reporting (STAR) data set californiastar.JMP contains data on test performance, school characteristics and student demographic backgrounds from 1998-1999. Average Test Score is the average of the reading and math scores for a standardized test administered to 5th grade students. One interesting question: What would be the causal effect of decreasing the student-teacher ratio by one student per teacher?

Multiple Regression and Causal Inference Goal: figure out what the causal effect on average test score would be of decreasing the student-teacher ratio while keeping everything else in the world fixed. Lurking variable: a variable that is associated with both average test score and student-teacher ratio. In order to figure out whether a drop in student-teacher ratio causes higher test scores, we want to compare mean test scores among schools with different student-teacher ratios but the same values of the lurking variables. If we include all of the lurking variables in the multiple regression model, the coefficient on student-teacher ratio represents the change in the mean of test scores caused by a one-unit increase in student-teacher ratio.

Omitted Variables Bias Schools with many English learners tend to have worse resources. The multiple regression that shows how mean test score changes when student-teacher ratio changes but percent of English learners is held fixed gives a better idea of the causal effect of the student-teacher ratio than the simple linear regression that does not hold percent of English learners fixed. Omitted variables bias of omitting percentage of English learners = -2.28 - (-1.10) = -1.18.

Omitted Variables Bias: General Formula What happens if we omit a lurking variable from the regression? Suppose we are interested in the causal effect of x1 on y, believe that x2 is a lurking variable, and that β1 is the causal effect of x1 on y in the model E(y) = β0 + β1x1 + β2x2. If we omit the lurking variable x2, then the simple regression of y on x1 will estimate a different coefficient, β1*, as the coefficient on x1. How different are β1 and β1*?

Omitted Variables Bias Formula Suppose that E(y|x1, x2) = β0 + β1x1 + β2x2 and E(x2|x1) = γ0 + γ1x1. Then the regression of y on x1 alone estimates β1* = β1 + β2γ1, so the bias from omitting x2 is β1* − β1 = β2γ1. The formula tells us the direction and magnitude of the bias from omitting a variable in estimating a causal effect. The formula also applies to the least squares estimates: b1* ≈ b1 + b2g1, where g1 is the least squares slope from regressing x2 on x1. Key point: in order for there to be omitted variables bias, the omitted variable must be associated with both the explanatory variable of interest (γ1 ≠ 0) and the response (β2 ≠ 0).
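A minimal simulation can confirm the formula. The coefficient values below are illustrative (not estimated from the California data, though they are chosen so the short-regression slope comes out near the -2.28 seen on the earlier slide): the regression of y on x1 alone recovers approximately β1 + β2γ1 rather than β1.

```python
import numpy as np

# Hypothetical simulation of the omitted variable bias formula.
# True model: y = b0 + b1*x1 + b2*x2 + noise, with the lurking variable
# related to x1 via x2 = g0 + g1*x1 + u. Omitting x2, the "short"
# regression of y on x1 estimates roughly b1 + b2*g1, not b1.
rng = np.random.default_rng(0)
n = 100_000
b0, b1, b2 = 1.0, -1.10, 0.5   # causal coefficients (illustrative)
g0, g1 = 2.0, -2.36            # association of x2 with x1 (illustrative)

x1 = rng.normal(size=n)
x2 = g0 + g1 * x1 + rng.normal(size=n)
y = b0 + b1 * x1 + b2 * x2 + rng.normal(size=n)

# Short regression: y on x1 only (intercept plus x1)
X_short = np.column_stack([np.ones(n), x1])
coef_short = np.linalg.lstsq(X_short, y, rcond=None)[0]

print(coef_short[1])   # close to b1 + b2*g1 = -1.10 + 0.5*(-2.36) = -2.28
print(b1 + b2 * g1)    # the omitted-variable-bias prediction, -2.28
```

With a large sample the short-regression slope sits very close to β1 + β2γ1, matching the direction and magnitude the formula predicts.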

Omitted Variables Bias Examples Would you expect the slope coefficient on X to be too high, too low, or to have no bias for the regression that omits the given variable? Y = Test Score, X = Number of Music Classes Taken, Omitted Variable = Student Ability. Y = Salary, X = Gender (1=Female, 0=Male), Omitted Variable = Education.

Key Warning About Multiple Regression Even if we have included many lurking variables in the multiple regression, we may have failed to include one or may not have enough data to include one. There will then be omitted variables bias. The best way to study causal effects is to do a randomized experiment (coming up next week).

Specially Constructed Explanatory Variables Interaction variables Squared and higher polynomial terms for curvature Dummy variables for categorical variables.

Interaction Interaction is a three-variable concept. One of these is the response variable (Y) and the other two are explanatory variables (X1 and X2). There is an interaction between X1 and X2 if the impact of an increase in X2 on Y depends on the level of X1. To incorporate interaction in the multiple regression model, we add the explanatory variable X1*X2. There is evidence of an interaction if the coefficient on X1*X2 is significant (t-test has p-value < .05).
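A quick sketch of how the interaction column enters the design matrix, using simulated data (the variable names and coefficient values are illustrative, not the accident data):

```python
import numpy as np

# Simulated data with a genuine interaction: the effect of x2 on y
# grows with x1, via the term 0.05*x1*x2.
rng = np.random.default_rng(1)
n = 500
x1 = rng.uniform(0, 10, n)     # e.g., traffic volume (illustrative)
x2 = rng.uniform(40, 70, n)    # e.g., average speed (illustrative)
y = 2 + 0.3 * x1 + 0.1 * x2 + 0.05 * x1 * x2 + rng.normal(size=n)

# Design matrix: intercept, both main effects, and the product column
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
print(coef[3])   # estimate of the interaction coefficient, close to 0.05
```

The interaction variable is literally the elementwise product x1*x2 appended as one more column; its estimated coefficient tells us how much the slope on x2 changes per unit increase in x1.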

Interaction variables in JMP To add an interaction variable in Fit Model in JMP, add the usual explanatory variables first, then highlight X1 in the Select Columns box and X2 in the Construct Model Effects box. Then click Cross in the Construct Model Effects box. JMP creates the explanatory variable X1*X2.

Interaction Example The number of car accidents on a stretch of highway seems to be related to the number of vehicles that travel over it and the speed at which they are traveling. A city alderman has decided to ask the county sheriff to provide her with statistics covering the last few years, with the intention of examining these data statistically so that she can introduce new speed laws that will reduce traffic accidents. accidents.JMP contains data for different time periods on the number of cars passing along the stretch of road, the average speed of the cars and the number of accidents during the time period.

Interactions in Accident Data Increases in speed have a worse impact on number of accidents when there are a large number of cars on the road than when there are a small number of cars on the road.

Notes on Interactions The need for interactions is not easily spotted with residual plots. It is best to try including an interaction term and see if it is significant. To understand better the multiple regression relationship when there is an interaction, it is useful to make an Interaction Plot. After Fit Model, click the red triangle next to Response, click Factor Profiling and then click Interaction Plots.

The plot on the left displays E(Accidents|Cars, Speed=56.6) and E(Accidents|Cars, Speed=62.5) as a function of Cars. The plot on the right displays E(Accidents|Cars=12.6, Speed) and E(Accidents|Cars=7, Speed) as a function of Speed. We can see that the impact of speed on Accidents depends critically on the number of cars on the road.

Fast Food Locations An analyst working for a fast food chain is asked to construct a multiple regression model to identify new locations that are likely to be profitable. For a sample of 25 locations, the analyst has the annual gross revenue of the restaurant (y), the mean annual household income and the mean age of children in the area. Data are in fastfoodchain.jmp.

[Scatterplot matrix and correlation table for Revenue, Income, and Age from fastfoodchain.jmp.]

Squared Terms for Curvature To capture a quadratic relationship between X1 and Y, we add the squared term X1*X1 as an explanatory variable. To do this in JMP, add X1 to the model, then highlight X1 in the Select Columns box, highlight X1 in the Construct Model Effects box, and click Cross.
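The squared term is just another column in the design matrix. A minimal sketch with simulated data (coefficient values illustrative):

```python
import numpy as np

# Simulated quadratic relationship: y bends downward in x1.
rng = np.random.default_rng(2)
n = 400
x1 = rng.uniform(0, 10, n)
y = 5 + 2 * x1 - 0.25 * x1**2 + rng.normal(size=n)

# Design matrix: intercept, linear term, and the squared term x1^2
X = np.column_stack([np.ones(n), x1, x1**2])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
print(coef)   # estimates close to the true (5, 2, -0.25)
```

A significant negative coefficient on the squared term, as here, indicates a relationship that rises and then flattens or turns down.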

Notes on Squared Terms for Curvature If the t-test for the squared term has p-value < .05, indicating that there is curvature, then we keep the linear term in the model regardless of its p-value. Coefficients in a model with squared terms for curvature are tricky to interpret: if we have the explanatory variables X1 and X1*X1 in the model, then we can't keep X1*X1 fixed and change X1. As with interactions, to better understand the multiple regression relationship when there is a squared term for curvature, a plot is useful. After Fit Model, click the red triangle next to Response, click Factor Profiling and click Profiler. JMP shows a plot for each explanatory variable of how the mean of Y changes as the explanatory variable is increased and the other explanatory variables are held fixed at their mean values.

The left-hand plot shows mean Revenue for different levels of Income when Age is held fixed at its mean value of 8.392. The 1208.257 +/- 32.825 is a confidence interval for the mean response at Income = 24.2, Age = 8.392.
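The interval JMP reports is the usual confidence interval for the mean response, fit ± t(.975, n−p)·se(fit). A sketch of the computation on simulated data (the true curve, noise level, and 24.2 evaluation point are illustrative; this is not fastfoodchain.jmp):

```python
import numpy as np
from scipy import stats

# Simulated quadratic revenue-income relationship, n = 25 as in the example.
rng = np.random.default_rng(3)
n = 25
income = rng.uniform(20, 35, n)
y = 500 + 60 * income - 1.0 * income**2 + rng.normal(scale=30, size=n)

X = np.column_stack([np.ones(n), income, income**2])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ coef
df = n - X.shape[1]                 # residual degrees of freedom
s2 = resid @ resid / df             # residual variance estimate

x0 = np.array([1.0, 24.2, 24.2**2])  # evaluation point, Income = 24.2
fit = x0 @ coef                      # estimated mean response
se = np.sqrt(s2 * (x0 @ np.linalg.inv(X.T @ X) @ x0))
half = stats.t.ppf(0.975, df) * se   # half-width of the 95% CI
print(f"{fit:.1f} +/- {half:.1f}")
```

The half-width shrinks near the center of the observed Income values and widens toward the edges, which is why the Profiler's bands flare outward.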

Regression Model for Fast Food Chain Data Interactions and polynomial terms can be combined in a multiple regression model. There is strong evidence of a quadratic relationship between revenue and age and between revenue and income, and moderate evidence of an interaction between age and income.