QM222 Class 13 Section D1 Omitted variable bias (Chapter 13.)


QM222 Class 13 Section D1: Omitted variable bias (Chapter 13)
The bias on a regression coefficient due to leaving out confounding factors from a regression
QM222 Fall 2016 Section D1

Assignment 4: Due Friday at 6pm (hard copy and online)
Part A: Current Project Status
If you have changed or added any aspect of the Current Project Status (Q1-6), revise it.
Part B: Questions on your dependent variable (if you have more than one, choose the most important one)
- If you have a numeric dependent variable, create a histogram of it in Stata (histogram varname). If you have a categorical dependent variable, tabulate it with the Stata command: tab variablename, missing. What do you learn from this histogram or tabulation?
- If you have a numeric dependent variable, get descriptive statistics for your (key) dependent variable in Stata by using summarize variablename, detail. If you have a categorical dependent variable, make it into a single indicator variable, making sure that any missing values are left as missing. Then summarize varname, detail. What important things do you learn about the distribution of your dependent variable from these descriptive statistics? Answer in 1-4 sentences.
- Based on this evidence, are there any observations with values that seem like mistakes? Should you drop these observations or correct the mistake? Explain and drop.
- (For numeric variables only) Based on this evidence, is your dependent variable very skewed, and in particular are there any extreme outliers? If so, do you think we should top-code these values (or use logs etc.)? Explain why. Then top-code or change into logs if appropriate.

Assignment 4 (continued)
Part C: Questions on your key explanatory variable (if you have more than one, choose the most important one)
- If it is a numeric variable, create a histogram of it in Stata. If it is a categorical variable, tabulate it with the Stata command: tab variablename, missing.
- If it is a numeric variable, get descriptive statistics for it with summarize variablename, detail. If it is categorical, make it into a single indicator (dummy) variable, keeping missing values as missing. What important things do you learn about the distribution of your key explanatory variable from these descriptive statistics?
- Based on this evidence, are there any observations with values that seem like mistakes? Do you think we should drop these observations or correct the mistake? Explain, and drop if appropriate.
- Based on this evidence, is your explanatory variable very skewed, and in particular are there any extreme outliers? If so, do you think we should top-code these values (or use logs etc.)? Explain. Then top-code or change into logs if appropriate.

Assignment 4 (continued)
Part D: Questions on Correlation
Correlate all variables you plan to use. What important things do you learn about the relationship between your dependent variable(s) and your key explanatory variable(s) from this correlation table?
Part E: Simple Regression
Run a simple regression of your key dependent variable on your key explanatory variable (or one of them, if you have several). What important things do you learn about the relationship between your key dependent and explanatory variables from this regression? In your answer, include a discussion of the explanatory variable's coefficient, its t-statistic and its confidence interval.

Omitted Variable Bias

Why know about this?
- It is useful in your projects to understand why coefficients change when you add a variable, so that you know which coefficient answers your question.
- It is useful in your projects to understand which possibly confounding variables you should search for.
- Also, if there is a confounding variable that you cannot measure, this will help you predict the sign of the omitted variable bias.
- Finally, it will be on the test.

Multiple regression measures the individual impacts of different factors on Y
Multiple regression helps us to measure the individual impacts of different factors on our dependent variable Y, holding the other factors constant, so isolating each factor's effect.

Condos
Simple regression: Price = 520729 - 46969 BEACON
Multiple regression: Price = 6981 + 409 SIZE + 32936 BEACON
Why are the coefficients on Beacon so different?
The coefficient on Beacon in the first (simple) regression says: across all the properties in our dataset, those on Beacon cost $46,969 less on average.
In contrast, the coefficient on Beacon in the multiple regression says: if we compare two condos of the same size, one on Beacon and one not on Beacon, the one on Beacon costs $32,936 more.

If you really want to measure the effect of X1 alone (e.g. Beacon), you need to control for possibly confounding factors.
If you don't, the coefficient on X1 is biased. We call this omitted or missing variable bias.
Omitted variable bias occurs when:
1. The omitted variable has an effect on the dependent variable, AND
2. The omitted variable is correlated with the explanatory variable of interest.

Omitted variable bias in the condo case
Price = 520729 - 46969 BEACON (simple regression)
In a simple regression of Y on X1, the coefficient b1 measures the combined effects of:
- the direct (often called "causal") effect of the included variable X1 on Y, PLUS
- an "omitted variable bias" due to factors that were left out (omitted) from the regression.
Often we want to measure the direct, causal effect. In this case, the coefficient in the simple regression is biased.

Another example: How does getting more education affect salaries?
Let's say you run this regression: Income = 20,000 + 4000 Education (in years).
But the coefficient 4000 may pick up the fact that more intelligent people have both more education and higher income.
If you could add the variable IQ to the regression, the coefficient on education would hold IQ constant.

We are going to learn methods so that you can understand Omitted Variable Bias, first with graphs
Really, both being on Beacon (X1) and size (X2) affect price (Y), as in the multiple regression
Y = b0 + b1X1 + b2X2
Let's call this the Full model. Let's call b1 and b2 the direct effects.

The mis-specified or Limited model
However, in the simple (1 X variable) regression, we measure only a (combined) effect of Beacon on price. Call its coefficient c1:
Y = c0 + c1X1
Let's call c1 the combined effect.

The reason that there is a bias on X1 is that there is a Background Relationship between the X's
We also know that there is a relationship between X1 (Beacon) and X2 (Size). We call this the Background Relationship:

    . correlate price size Beacon_Street
    (obs=1085)

                 |    price     size Beacon~t
    -------------+---------------------------
           price |   1.0000
            size |   0.8655   1.0000
    Beacon_Str~t |  -0.0552  -0.1081   1.0000

This background relationship, shown here as a1, is negative.

Let's combine all 3 pictures: the full model, the limited model & the background relationship
The effect of X1 on Y has two channels.
- The first one is the direct effect b1.
- The second channel is the indirect effect through X2: when X1 changes, X2 also tends to change (a1), and this change in X2 has another effect on Y (b2).
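
The two channels can be written as one identity. This is a sketch rather than something stated on the slides above: it assumes a1 is the slope from a simple regression of X2 on X1 (X2 = a0 + a1X1), whereas the slide shows only the correlation between the two, which has the same sign. Under that assumption, the combined effect equals the direct effect plus the indirect effect, and the condo coefficients reported earlier imply a value for a1:

```latex
\begin{align*}
c_1 &= b_1 + a_1 b_2 \\
-46969 &= 32936 + a_1 \times 409 \\
a_1 &= \frac{-46969 - 32936}{409} \approx -195.4
\end{align*}
```

Read this way, condos on Beacon are on average about 195 units of SIZE smaller than the others, and that size difference, valued at 409 per unit, is what turns a positive direct effect into a negative simple-regression coefficient. The figure is implied by the identity, not reported in the regression output.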

If we want the direct effect only
When we include both X1 and X2 in a multiple regression, we get the coefficient b1, the direct effect of X1.
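
A small simulation may make the difference between the limited and full models concrete. The sketch below is in Python rather than Stata and uses made-up data: the variable names, sample size and all parameter values (a 30% share of Beacon condos, a 200-unit size gap, the noise levels) are assumptions chosen only to resemble the condo example, not the course dataset. It fits both regressions by ordinary least squares using only the standard library.

```python
import random


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = c0 + c1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def two_var_ols(x1, x2, y):
    """Return (b0, b1, b2) for y = b0 + b1*x1 + b2*x2 via the normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return my - b1 * m1 - b2 * m2, b1, b2


# Hypothetical data: Beacon condos are smaller on average (the background
# relationship), but being on Beacon raises price for a given size.
rng = random.Random(0)
n = 1000
beacon = [1 if rng.random() < 0.3 else 0 for _ in range(n)]
size = [rng.gauss(1100, 300) - 200 * b for b in beacon]
price = [7000 + 33000 * b + 400 * s + rng.gauss(0, 50000)
         for b, s in zip(beacon, size)]

c0, c1 = simple_ols(beacon, price)             # limited model: combined effect
b0, b1, b2 = two_var_ols(beacon, size, price)  # full model: direct effects
a0, a1 = simple_ols(beacon, size)              # background relationship

print(f"limited model  c1 = {c1:.1f}")
print(f"full model     b1 = {b1:.1f}, b2 = {b2:.1f}")
print(f"background     a1 = {a1:.1f}")
print(f"b1 + a1*b2        = {b1 + a1 * b2:.1f}")
```

In any sample, the last line reproduces c1 up to rounding, because the relationship c1 = b1 + a1*b2 is an algebraic property of least squares and not only an approximation. The gap between c1 and b1 is the omitted variable bias from leaving size out.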