QM222 Class 13 Section D1 Omitted variable bias (Chapter 13.)

The bias on a regression coefficient due to leaving out confounding factors from a regression. QM222 Fall 2016 Section D1

Assignment 4: Due Friday at 6pm (hard copy and online)

Part A: Current Project Status. If you have changed or added any aspect of the Current Project Status (Q1-6), revise it.

Part B: Questions on your dependent variable (if you have more than one, choose the most important one).
If you have a numeric dependent variable, create a histogram of it in Stata (histogram varname). If you have a categorical dependent variable, tabulate it with the Stata command tab varname, missing. What do you learn from this histogram or tabulation?
If you have a numeric dependent variable, get descriptive statistics for it in Stata with summarize varname, detail. If you have a categorical dependent variable, make it into a single indicator variable, making sure that any missing values are left as missing, then run summarize varname, detail. What important things do you learn about the distribution of your dependent variable from these descriptive statistics? Answer in 1-4 sentences.
Based on this evidence, are there any observations with values that seem like mistakes? Should you drop these observations or correct the mistake? Explain, and drop if appropriate.
(Numeric variables only) Based on this evidence, is your dependent variable very skewed, and in particular, are there any extreme outliers? If so, do you think we should top-code these values (or use logs, etc.)? Explain why. Then top-code or change into logs if appropriate.
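The assignment asks for these checks in Stata. As a language-neutral illustration only, the Python sketch below mimics the logic on hypothetical data: it compares the mean, median, and a moment-based skewness, then top-codes by capping values above a chosen cap. The function names, the sample prices, and the cap of 1000 are all illustrative assumptions, not part of the assignment.

```python
import math
import statistics

def describe(values):
    """Rough analogue of a detailed summary: n, mean, median, skewness.
    Missing values (None) are excluded from every statistic."""
    obs = [v for v in values if v is not None]
    n = len(obs)
    mean = statistics.fmean(obs)
    sd = statistics.pstdev(obs)
    # Moment-based skewness: average cubed z-score (0 for symmetric data)
    skew = sum(((v - mean) / sd) ** 3 for v in obs) / n
    return {"n": n, "mean": mean, "median": statistics.median(obs), "skewness": skew}

def top_code(values, cap):
    """Replace values above `cap` with `cap`; missing values stay missing."""
    return [None if v is None else min(v, cap) for v in values]

def log_values(values):
    """Natural log of strictly positive values; missing values stay missing."""
    return [None if v is None else math.log(v) for v in values]

# Hypothetical prices with one extreme outlier and one missing value
prices = [210, 250, 260, 300, 320, 350, 400, 5000, None]
print(describe(prices))
capped = top_code(prices, 1000)  # the cap of 1000 is an illustrative choice
print(describe(capped))
```

Here top-coding pulls the mean from 886.25 down to 386.25 while the median stays at 310, which is the kind of mean-versus-median comparison the skewness question asks you to make.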

Assignment 4: Due Friday at 6pm (hard copy and online)

Part C: Questions on your key explanatory variable (if you have more than one, choose the most important one).
If it is a numeric variable, create a histogram of it in Stata. If it is a categorical variable, tabulate it with the Stata command tab varname, missing.
If it is a numeric variable, get descriptive statistics for it with summarize varname, detail. If it is categorical, make it into a single indicator (dummy) variable, keeping missing values as missing. What important things do you learn about the distribution of your key explanatory variable from these descriptive statistics?
Based on this evidence, are there any observations with values that seem like mistakes? Do you think we should drop these observations or correct the mistake? Explain, and drop if appropriate.
Based on this evidence, is your explanatory variable very skewed, and in particular, are there any extreme outliers? If so, do you think we should top-code these values (or use logs, etc.)? Explain, then top-code or change into logs if appropriate.
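Keeping missing values as missing matters because a plain equality test codes a missing observation as 0, silently treating it as "not in the category". A minimal Python sketch of the intended recoding, using a hypothetical function name and hypothetical street data (None marks a missing value):

```python
def to_indicator(values, category):
    """Return 1 if the value equals `category`, 0 for any other observed value,
    and None (missing) when the value itself is missing."""
    return [None if v is None else int(v == category) for v in values]

# Hypothetical categorical variable with one missing observation
street = ["Beacon", "Commonwealth", None, "Beacon", "Marlborough"]

naive = [int(v == "Beacon") for v in street]  # wrongly codes the missing value as 0
careful = to_indicator(street, "Beacon")      # keeps the missing value as missing
print(naive)    # [1, 0, 0, 1, 0]
print(careful)  # [1, 0, None, 1, 0]
```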

Assignment 4: Due Friday at 6pm (hard copy and online)

Part D: Questions on correlation. Correlate all the variables you plan to use. What important things do you learn about the relationship between your dependent variable(s) and your key explanatory variable(s) from this correlation table?

Part E: Simple regression. Run a simple regression of your key dependent variable on your key explanatory variable (or on one of them, if you have several). What important things do you learn about the relationship between your key dependent and explanatory variables from this regression? In your answer, discuss the explanatory variable's coefficient, its t-statistic, and its confidence interval.
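The quantities Part E asks about can be computed by hand, which helps when reading Stata's regression output. The Python sketch below (function name and data are hypothetical) computes the correlation, the intercept and slope, the slope's standard error and t-statistic, and a 95% confidence interval. Stata uses the exact t critical value with n - 2 degrees of freedom; this sketch defaults to the large-sample value 1.96, so its interval is somewhat too narrow in a small sample like this toy one.

```python
import math

def simple_regression(x, y, t_crit=1.96):
    """OLS of y on x with an intercept, plus the statistics Part E asks about.
    t_crit = 1.96 is a large-sample approximation to the t critical value."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Sum of squared residuals, then the standard error of the slope
    ssr = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se = math.sqrt(ssr / (n - 2) / sxx)
    return {
        "corr": sxy / math.sqrt(sxx * syy),
        "intercept": intercept,
        "slope": slope,
        "se": se,
        "t": slope / se,
        "ci95": (slope - t_crit * se, slope + t_crit * se),
    }

# Tiny hypothetical dataset
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(simple_regression(x, y))
```

For this toy data the slope is 0.6 with a standard error of about 0.283, so the t-statistic is about 2.12.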

Omitted Variable Bias

Why learn about this? In your projects, it helps you understand why coefficients change when you add a variable, so that you know which coefficient answers your question. It also helps you decide which possibly confounding variables to search for. And if there is a confounding variable that you cannot measure, it lets you predict the sign of the omitted variable bias. Finally, it will be on the test.

Multiple regression helps us measure the individual impact of each factor on the dependent variable Y, holding the other factors constant. In this way it isolates each factor's effect.

Simple regression: Price = 520729 - 46969 BEACON
Multiple regression: Price = 6981 + 409 SIZE + 32936 BEACON
Why are the coefficients on Beacon so different?
The coefficient on Beacon in the first (simple) regression says: across all the properties in our dataset, those on Beacon cost $46,969 less on average.
In contrast, the coefficient on Beacon in the multiple regression says: if we compare two condos of the same size, one on Beacon and one not on Beacon, the one on Beacon costs $32,936 more.
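Plugging numbers into the two fitted equations shows what "holding size constant" means. The sketch below assumes SIZE is measured in square feet and uses a hypothetical 1,000 square-foot condo; the function names are illustrative.

```python
def simple_fit(beacon):
    """Fitted price from the simple regression Price = 520729 - 46969*BEACON.
    With one dummy regressor, these fitted values equal the two group means."""
    return 520729 - 46969 * beacon

def multiple_fit(size, beacon):
    """Fitted price from the multiple regression
    Price = 6981 + 409*SIZE + 32936*BEACON."""
    return 6981 + 409 * size + 32936 * beacon

# Simple regression: average price off Beacon vs. on Beacon
print(simple_fit(0), simple_fit(1))                   # 520729 473760
# Multiple regression: two condos of the same (hypothetical) size
size = 1000
print(multiple_fit(size, 0), multiple_fit(size, 1))   # 415981 448917
print(multiple_fit(size, 1) - multiple_fit(size, 0))  # 32936
```

Whatever size you pick, the on-Beacon minus off-Beacon gap in the multiple regression stays at 32936, because SIZE enters both predictions identically.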

If you really want to measure the effect of X1 alone (e.g. Beacon), you need to control for possibly confounding factors. If you don't, the coefficient on X1 is biased. We call this omitted (or missing) variable bias. Omitted variable bias occurs when:
1. the omitted variable has an effect on the dependent variable, AND
2. the omitted variable is correlated with the explanatory variable of interest.
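A small simulation makes the two conditions concrete. The Python sketch below uses hypothetical variables: the true direct effect of X1 on Y is 2, and it runs the simple regression of Y on X1 alone in three scenarios. The values 0.5 and 3.0, the sample size, and the seed are arbitrary illustrative choices.

```python
import random

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate_c1(a1, b2, b1=2.0, n=20000, seed=0):
    """Simulate X2 = a1*X1 + noise and Y = 1 + b1*X1 + b2*X2 + noise,
    then return the slope from regressing Y on X1 alone (X2 omitted)."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [a1 * v + rng.gauss(0, 1) for v in x1]
    y = [1 + b1 * u + b2 * v + rng.gauss(0, 1) for u, v in zip(x1, x2)]
    return ols_slope(x1, y)

print(simulate_c1(a1=0.5, b2=0.0))  # X2 correlated with X1, no effect on Y: close to 2
print(simulate_c1(a1=0.0, b2=3.0))  # X2 affects Y, unrelated to X1: close to 2
print(simulate_c1(a1=0.5, b2=3.0))  # both conditions hold: close to 3.5, biased
```

Only the third scenario, where both conditions hold, moves the simple-regression coefficient away from the true direct effect.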

Omitted variable bias in the condo case: Price = 520729 - 46969 BEACON (simple regression). In a simple regression of Y on X1, the coefficient on X1 measures the combined effect of:
the direct (often called "causal") effect of the included variable X1 on Y, PLUS
an "omitted variable bias" due to factors that were left out (omitted) from the regression.
Often we want to measure the direct, causal effect. In that case, the coefficient in the simple regression is biased.

Another example: How does getting more education affect salaries? Suppose you run this regression: Income = 20,000 + 4000 Education (in years). The coefficient 4000 may partly pick up the fact that more intelligent people have both more education and higher income. If you could add the variable IQ to the regression, the coefficient on Education would hold IQ constant.

We will first use diagrams to understand omitted variable bias. In reality, both being on Beacon and size affect price, as in the multiple regression Y = b0 + b1X1 + b2X2. Let's call this the Full model, and call b1 and b2 the direct effects.

The mis-specified or Limited model: in the simple (one X variable) regression, we measure only a combined effect of Beacon on price. Call its coefficient c1: Y = c0 + c1X1. Let's call c1 the combined effect.

The reason there is a bias on X1 is that there is a Background Relationship between the X's. There is a relationship between X1 (Beacon) and X2 (Size), which we call the Background Relationship:

    . correlate price size Beacon_Street
    (obs=1085)

                 |    price     size  Beacon~t
    -------------+---------------------------
           price |   1.0000
            size |   0.8655   1.0000
    Beacon_Str~t |  -0.0552  -0.1081   1.0000

This background relationship, shown in the diagram as a1, is negative: the correlation between size and Beacon_Street is -0.1081, so condos on Beacon tend to be smaller.
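The slides do not report a1 directly, but it can be backed out from the two regressions with the standard OLS identity c1 = b1 + b2 × a1, where a1 is the slope from regressing X2 (size) on X1 (Beacon). This is a sketch under two labeled assumptions: both regressions used the same 1,085 condos, and SIZE is in square feet.

```python
# Coefficients reported on the slides (same sample assumed for both regressions)
c1 = -46969  # Beacon coefficient, simple (limited) regression
b1 = 32936   # Beacon coefficient, multiple (full) regression
b2 = 409     # SIZE coefficient, multiple (full) regression

# Solve the OLS identity c1 = b1 + b2 * a1 for the background slope a1
a1 = (c1 - b1) / b2
print(round(a1, 1))  # -195.4
```

Under those assumptions, condos on Beacon are about 195 square feet smaller on average, which matches the negative sign of the -0.1081 correlation.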

Let's combine all 3 pictures: the full model, the limited model, and the background relationship. The effect of X1 on Y has two channels. The first is the direct effect b1. The second is the indirect effect through X2: when X1 changes, X2 also tends to change (a1), and this change in X2 has its own effect on Y (b2).
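The two channels can be written out algebraically. Treat the background relationship as the regression of X2 on X1 with slope a1; the residuals of both that regression and the full model are uncorrelated with X1 by construction, so they drop out of the slope. Substituting the background relationship into the full model gives the limited model's coefficient:

```latex
\begin{aligned}
\text{Full model:}\quad & Y = b_0 + b_1 X_1 + b_2 X_2 \\
\text{Background relationship:}\quad & X_2 = a_0 + a_1 X_1 \\
\text{Substitute:}\quad & Y = (b_0 + b_2 a_0) + (b_1 + b_2 a_1)\, X_1 \\
\text{Limited model:}\quad & Y = c_0 + c_1 X_1, \qquad c_1 = b_1 + b_2 a_1 \\
\text{Omitted variable bias:}\quad & c_1 - b_1 = b_2 a_1
\end{aligned}
```

In the condo case b2 = 409 > 0 and a1 < 0, so the bias is negative, which is why the simple-regression coefficient (-46969) lies below the multiple-regression one (32936).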

If we want the direct effect only: when we include both X1 and X2 in a multiple regression, we get the coefficient b1, the direct effect of X1.
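A simulation can confirm both points at once: the multiple regression recovers the direct effect b1, and in the sample the simple-regression coefficient c1 equals b1 + b2 × a1 exactly. The Python sketch below uses hypothetical data with a negative background relationship (X2 falls as X1 rises), loosely mirroring the condo case; the parameter values and seed are arbitrary.

```python
import random

def cov_sum(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def full_model(x1, x2, y):
    """OLS slopes (b1, b2) from regressing y on x1 and x2 with an intercept."""
    s11, s22, s12 = cov_sum(x1, x1), cov_sum(x2, x2), cov_sum(x1, x2)
    s1y, s2y = cov_sum(x1, y), cov_sum(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

def simple_slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    return cov_sum(x, y) / cov_sum(x, x)

# Hypothetical data: true b1 = 2, b2 = 3, background slope -0.5
rng = random.Random(1)
x1 = [rng.gauss(0, 1) for _ in range(5000)]
x2 = [-0.5 * v + rng.gauss(0, 1) for v in x1]
y = [10 + 2.0 * u + 3.0 * v + rng.gauss(0, 1) for u, v in zip(x1, x2)]

b1, b2 = full_model(x1, x2, y)  # direct effects (full model)
c1 = simple_slope(x1, y)        # combined effect (limited model)
a1 = simple_slope(x1, x2)       # background relationship
print(b1, b2, a1, c1, b1 + b2 * a1)  # last two numbers agree
```

Here b1 comes out near 2 while c1 lands near 2 + 3 × (-0.5) = 0.5: leaving out X2 drags the coefficient down, just as leaving out size does for Beacon.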