Managerial Economics & Decision Sciences Department: cross-section and panel data ▌ fixed effects ▌ omitted variable bias ▌ business analytics II


business analytics II ▌ panel data models
Developed for the Kellogg School of Management | Managerial Economics and Decision Sciences Department | © 2016

session nine: panel data models

readings
► statistics & econometrics (MSN): Chapter 8
► (CS): Bonus Data
► (KTN): Fixed Effects

learning objectives
► cross section and panel data
  • working with data across years
  • regression for panel data
► fixed effects
  • definition
  • use of fixed effects to eliminate ovb
► fixed effects regression: xi:regress

cross section and panel data
► So far we have only looked at data sets without taking into account that observations might be recorded at different points in time.
► Suppose that you work in the central office of a global sales organization.
  • The central office sets base pay across the organization.
  • Regional managers set bonuses for their regional salespeople. The bonus is a percentage of sales, and the percentage is set at the start of the year.
  • You want to know whether higher bonuses translate into greater sales effort.
  • You have the following data from four sales offices:
[Table: region, year, bonus, sales for Atlanta, Beijing, Cairo and Delhi, one row per office in each of 2010 and 2011]
Figure 1. Sales and related bonuses for offices across the world
► Looking only at the data for 2010, or only at the data for 2011, means looking at the data through cross-section “glasses”.
► However, we can also “follow” one particular office across time. This is the panel-data interpretation.

cross section and panel data
► We usually write the cross-section regression as
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \varepsilon_i$$
where
  • $i$ indexes the individuals from 1 to $n$ (thus we have a total of $n$ individuals)
  • $k$ is the number of independent variables
  • for example, $x_{1i}$ is the value of independent variable $x_1$ for the $i$-th individual
► In this formulation we do not take into account the possible time index of each observation.
► If we take time into account, we write the panel-data regression as
$$y_{it} = \beta_0 + \beta_1 x_{1it} + \beta_2 x_{2it} + \dots + \beta_k x_{kit} + \varepsilon_{it}$$
where
  • $i$ indexes the individuals from 1 to $n$ (thus we have a total of $n$ individuals)
  • $t$ indexes time from 1 to $T$ (thus there are $T$ periods)
  • $k$ is the number of independent variables
  • for example, $x_{1it}$ is the value of independent variable $x_1$ for the $i$-th individual in period $t$
► With cross-section regressions we can proceed in two ways:
  • by period: we run $T$ regressions, one for each period
  • pooled over all periods: we simply ignore the time index and pool all observations

cross section and panel data
► Let’s consider again the data on bonuses and run the cross-section regressions.
► Separate regression for each year:
regress sales bonus if year == 2010
regress sales bonus if year == 2011
► Pooled regression:
regress sales bonus
► Results for the three regressions are presented below.
[Table: region, year, bonus, sales for Atlanta, Beijing, Cairo and Delhi in 2010 and 2011]
Figure 2. Sales and related bonuses for offices across the world
[Table: model (cross-section for 2010, cross-section for 2011, pooled), constant, coefficient on bonus, R²; the pooled constant is 57.77 and the coefficient on bonus is negative in each regression]
Figure 3. Results for the three regressions
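To make the two cross-section strategies concrete, here is a minimal Python sketch. The slides use Stata’s `regress`, and Python appears here only as an illustration. The data are hypothetical: each office’s sales equal 1.5 times its bonus plus a fixed office-level factor, and offices with a large factor pay low bonuses. These are not the values in Figure 1, and the helper name `ols_slope` is our own.

```python
def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical panel in long format: (region, year, bonus, sales).
# Sales = 1.5 * bonus + 3 * (office factor); high-factor offices pay low bonuses.
factor = {"Atlanta": 0.0, "Beijing": 10.0, "Cairo": 20.0, "Delhi": 30.0}
base_bonus = {"Atlanta": 4.0, "Beijing": 3.0, "Cairo": 2.0, "Delhi": 1.0}
rows = []
for year, extra in [(2010, 0.0), (2011, 0.5)]:
    for region in factor:
        bonus = base_bonus[region] + extra
        rows.append((region, year, bonus, 1.5 * bonus + 3.0 * factor[region]))

# By period: one regression per year (like "regress sales bonus if year == 2010").
by_year = {}
for year in (2010, 2011):
    sub = [r for r in rows if r[1] == year]
    by_year[year] = ols_slope([r[2] for r in sub], [r[3] for r in sub])

# Pooled: ignore the time index (like "regress sales bonus").
pooled = ols_slope([r[2] for r in rows], [r[3] for r in rows])
print(by_year, pooled)
```

In this made-up example, every cross-section slope is negative (−28.5 in each year, about −27.1 pooled), even though the built-in effect of bonus on sales is +1.5. This mirrors the puzzle of negative bonus coefficients that the session goes on to explain.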

cross section and panel data
► The results of our regressions fly in the face of economic theory. Higher bonus percentages should lead to higher effort, but:
  • it seems that higher bonuses actually cause lower effort…
  • are salespeople behaving irrationally?
  • are the regression results biased?
► We suspect omitted variable bias. A possible solution is to add additional controls.
[Diagram: the truncated regression of sales on bonus mixes a direct channel (bonus → sales) with an indirect channel through z (a correlation link between bonus and z and a causal link from z to sales)]
► As we have seen several times so far, in case of omitted variable bias we look for a candidate variable, which we called z. This variable is currently omitted from the regression but is:
  • correlated with bonus (x)
  • causal to sales (y)
► We then infer qualitatively whether the coefficient on x is biased, and in which direction. But how can we be sure that we have identified all the candidates?

fixed effects
► Let’s consider the following situation. There is indeed a variable z that we cannot identify, that is probably correlated with x, and that has a causal impact on y. However, we make a very important assumption about this omitted variable:
  • for each individual, the variable $z_{it}$ is fixed across time periods, so instead of $z_{it}$ we write $z_i$
► The correct regression, by individual and time, is thus:
$$y_{it} = \beta_0 + \beta_1 x_{it} + \beta_2 z_i + \varepsilon_{it}$$
Remark. The index of the omitted variable is only “i”, not “it” as for the other variables. The values of x and y can:
  • vary for each individual across periods of time (within-group variation, or within-group effect)
  • vary for each period across individuals (between-group variation, or between-group effect)
Under the assumption above, z has only between-group variation, i.e. it is fixed across time for each individual. This is the fixed effects framework.
Remark. For our sales/bonus example: $i \in$ {Atlanta, Beijing, Cairo, Delhi} and $n = 4$ (number of individuals); $t \in$ {2010, 2011} and $T = 2$ (number of periods).
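The within/between distinction in the remark can be checked numerically. This minimal Python sketch uses hypothetical numbers and a helper name (`within_between`) of our own. It splits the total sum of squares of a variable into two parts. The within-group part measures deviations from each individual’s own mean. The between-group part measures deviations of individual means from the overall mean. A time-invariant variable like $z_i$ has a within part of exactly zero.

```python
from collections import defaultdict

def within_between(ids, values):
    """Return (within_ss, between_ss) for one variable observed as a panel."""
    groups = defaultdict(list)
    for g, v in zip(ids, values):
        groups[g].append(v)
    grand_mean = sum(values) / len(values)
    within = 0.0
    between = 0.0
    for vs in groups.values():
        m = sum(vs) / len(vs)
        within += sum((v - m) ** 2 for v in vs)      # variation over time inside one individual
        between += len(vs) * (m - grand_mean) ** 2   # variation of the individual means
    return within, between

ids = ["Atlanta", "Atlanta", "Beijing", "Beijing"]  # hypothetical: two periods per office
x = [1.0, 3.0, 2.0, 6.0]   # varies over time and across offices
z = [5.0, 5.0, 9.0, 9.0]   # fixed over time within each office
print(within_between(ids, x))  # → (10.0, 4.0)
print(within_between(ids, z))  # → (0.0, 16.0)
```

For x, the two parts add up to the total sum of squares around the overall mean (14 in this example). For z, all of the variation is between groups.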

fixed effects
► Back to the true regression, written by individual and time:
$$y_{it} = \beta_0 + \beta_1 x_{it} + \beta_2 z_i + \varepsilon_{it}$$
► For each individual $i$, add the above equality over all $T$ periods and divide by $T$:
$$\frac{1}{T}\sum_{t=1}^{T} y_{it} = \beta_0 + \beta_1 \frac{1}{T}\sum_{t=1}^{T} x_{it} + \beta_2 z_i + \frac{1}{T}\sum_{t=1}^{T} \varepsilon_{it}$$
► These seemingly complicated expressions are simply the averages across time for each individual:
$$\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}, \qquad \bar{x}_i = \frac{1}{T}\sum_{t=1}^{T} x_{it}, \qquad \bar{\varepsilon}_i = \frac{1}{T}\sum_{t=1}^{T} \varepsilon_{it}$$
► Thus we can write
$$\bar{y}_i = \beta_0 + \beta_1 \bar{x}_i + \beta_2 z_i + \bar{\varepsilon}_i$$
► Subtract this last equality from the initial regression equation to get, for each $i$ and $t$:
$$y_{it} - \bar{y}_i = \beta_1 (x_{it} - \bar{x}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i)$$
► Surprise!!! (and a pleasant one…) By taking this difference we got rid of the omitted variable $z_i$. Because $z_i$ does not change over time, it equals its own time average and cancels out, and so does the constant $\beta_0$.
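The derivation can be verified on made-up data. The Python sketch below uses hypothetical numbers, and the helper names `demean_by_group` and `slope_through_origin` are our own. It generates $y_{it} = \beta_0 + \beta_1 x_{it} + \beta_2 z_i$ without noise, with $\beta_0 = 2$, $\beta_1 = 1.5$, $\beta_2 = 3$, and with $z_i$ negatively related to x across offices. Subtracting each office’s time average removes $z_i$ and $\beta_0$, and the slope of the demeaned data recovers $\beta_1$. The pooled slope does not.

```python
from collections import defaultdict

def demean_by_group(ids, values):
    """Subtract each individual's time average (the within transformation)."""
    groups = defaultdict(list)
    for g, v in zip(ids, values):
        groups[g].append(v)
    means = {g: sum(vs) / len(vs) for g, vs in groups.items()}
    return [v - means[g] for g, v in zip(ids, values)]

def slope_through_origin(x, y):
    """OLS slope without an intercept; valid here because the inputs have mean zero."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

ids = ["Atlanta", "Beijing", "Cairo", "Delhi"] * 2                     # periods 1 and 2
z = {"Atlanta": 0.0, "Beijing": 10.0, "Cairo": 20.0, "Delhi": 30.0}    # hypothetical fixed factor
x = [4.0, 3.0, 2.0, 1.0, 4.5, 3.5, 2.5, 1.5]
y = [2.0 + 1.5 * xi + 3.0 * z[g] for g, xi in zip(ids, x)]             # no noise

z_dm = demean_by_group(ids, [z[g] for g in ids])                       # all zeros: z_i drops out
beta1_within = slope_through_origin(demean_by_group(ids, x), demean_by_group(ids, y))

# Pooled OLS slope for comparison (demean by the overall means only).
mx, my = sum(x) / len(x), sum(y) / len(y)
beta1_pooled = slope_through_origin([a - mx for a in x], [b - my for b in y])
print(z_dm, beta1_within, beta1_pooled)
```

Here `beta1_within` equals 1.5, while `beta1_pooled` is negative because it absorbs the omitted $z_i$.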

fixed effects
► Let’s write the equation as:
$$y_{it} = \beta_1 x_{it} + (\varepsilon_{it} - \bar{\varepsilon}_i) + (\bar{y}_i - \beta_1 \bar{x}_i)$$
► Notice that the last term is specific to each individual, i.e. it is indexed only by “i”. This has the flavor of a dummy variable framework. Let
$$\alpha_i = \bar{y}_i - \beta_1 \bar{x}_i$$
Since, as mentioned above, this variable is specific to each individual, we can write it as a sum of dummy variables:
$$\alpha_i = a_0 + a_1 d_{1i} + a_2 d_{2i} + \dots + a_{n-1} d_{n-1,i}$$
where $d_{1i} = 1$ if $i = 1$ and 0 otherwise, $d_{2i} = 1$ if $i = 2$ and 0 otherwise, …, $d_{n-1,i} = 1$ if $i = n - 1$ and 0 otherwise. Individual $n$ has all dummies equal to 0, so it is the excluded individual and $\alpha_n = a_0$.
► Basically we write:
$$y_{it} = a_0 + a_1 d_{1i} + \dots + a_{n-1} d_{n-1,i} + b_1 x_{it} + e_{it}$$
with $b_1 = \beta_1$ and $e_{it} = \varepsilon_{it} - \bar{\varepsilon}_i$.

fixed effects
► We get a very useful result: to eliminate the omitted variable bias we simply run the regression
$$y_{it} = a_0 + a_1 d_{1i} + \dots + a_{n-1} d_{n-1,i} + b_1 x_{it} + e_{it}$$
► The steps to construct the above regression are:
Step 1: Construct $n - 1$ dummy variables (where $n$ is the number of different individuals) using the rule: $d_{ji} = 1$ if $i = j$ and 0 otherwise, for $j = 1, \dots, n - 1$.
Step 2: Run the regression above on the $n - 1$ dummy variables and the x variable(s).
Step 3: Interpret the coefficients. This follows directly from our earlier work with dummy variables:
  • $a_0$ is the average y for the excluded individual when $x_{it} = 0$
  • $a_1$ is the difference in average y between individual 1 and the excluded individual, holding $x_{it}$ constant
  • …
  • $a_{n-1}$ is the difference in average y between individual $n - 1$ and the excluded individual, holding $x_{it}$ constant
  • $b_1$ is the change in average y when x changes by one unit, holding the individual fixed
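Steps 1 and 2 can be sketched in plain Python. The sketch uses noise-free hypothetical data with $\beta_0 = 2$, $\beta_1 = 1.5$ and $\beta_2 = 3$, and it takes Atlanta as the excluded individual to mirror the Stata output on the next slide. The helpers `solve` and `lsdv` are our own: they build the $n - 1$ dummies and solve the OLS normal equations by Gaussian elimination, which regression software normally does for you. With Atlanta excluded, $a_0$ should equal $2 + 3 \cdot 0 = 2$ and each $a_j$ should equal $3(z_j - z_{Atlanta})$. The slope $b_1$ should equal 1.5, the same slope the within transformation derived above produces.

```python
def solve(A, b):
    """Solve the square system A v = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        v[r] = (M[r][n] - sum(M[r][c] * v[c] for c in range(r + 1, n))) / M[r][r]
    return v

def lsdv(ids, x, y, excluded):
    """Regress y on a constant, n - 1 individual dummies and x (least squares dummy variables)."""
    others = sorted(set(ids) - {excluded})                 # Step 1: one dummy per non-excluded individual
    X = [[1.0] + [1.0 if g == o else 0.0 for o in others] + [xi] for g, xi in zip(ids, x)]
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    coef = solve(XtX, Xty)                                 # Step 2: OLS via the normal equations
    return coef[0], dict(zip(others, coef[1:-1])), coef[-1]

ids = ["Atlanta", "Beijing", "Cairo", "Delhi"] * 2
z = {"Atlanta": 0.0, "Beijing": 10.0, "Cairo": 20.0, "Delhi": 30.0}   # hypothetical fixed factor
x = [4.0, 3.0, 2.0, 1.0, 4.5, 3.5, 2.5, 1.5]
y = [2.0 + 1.5 * xi + 3.0 * z[g] for g, xi in zip(ids, x)]

a0, a, b1 = lsdv(ids, x, y, excluded="Atlanta")
print(a0, a, b1)
```

The dummy coefficients absorb the fixed factor (30, 60 and 90 for Beijing, Cairo and Delhi), which leaves $b_1$ free of the omitted-variable bias.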

fixed effects
► Luckily, STATA offers a very easy way to run this regression:
xi:regress y x i.individual_label
[Stata output: i.region _Iregion_1-4 (_Iregion_1 for region==Atlanta omitted); table of Coef., Std. Err., t, P>|t| and 95% Conf. Interval for bonus, _Iregion_2, _Iregion_3, _Iregion_4 and _cons]
Figure 4. Results for regression by individuals: xi:regress sales bonus i.region
► STATA indicates which individual is excluded, so the coefficients should be interpreted accordingly.
► The coefficient on bonus is now positive. This is an improvement, since it is in line with economic theory.
Remark. individual_label is the name of the variable that identifies individuals. For our sales/bonus example, individual_label is region.

omitted variable bias
► By controlling for all time-invariant differences in unobservable factors, fixed effects models remove a potential source of ovb.
► The first limitation: if some unobservables vary over time within each group, the fixed effects approach will not remove ovb from those sources.
  • In our example, suppose you are comfortable assuming that customer characteristics in each region are stable during the time covered by your data. Then you can be comfortable that fixed effects models eliminate ovb.
  • If you are not comfortable with this assumption, then fixed effects results can still be biased. Even so, the potential for bias would be even greater if you did not include fixed effects. Put another way, we all intuitively believe that before/after comparisons are more valid than cross-section comparisons, and fixed effects are like before/after comparisons.
► The second limitation of fixed effects models is that we cannot assess the effect of variables that do not vary within groups over time. For example, if bonuses did not vary over time, we could not use fixed effects.
► If it is crucial to learn the effect of a variable that lacks within-group variation, then we have to forgo fixed effects estimation. We would have to rely on between-group variation and work to minimize ovb.
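The second limitation can be seen directly from the within transformation. A variable that never changes within an individual becomes all zeros after demeaning, so there is nothing left to estimate its slope from. This minimal Python sketch uses hypothetical data. Its helper `within_slope` is our own and returns `None` in that case instead of dividing by zero.

```python
from collections import defaultdict

def within_slope(ids, x, y):
    """Fixed-effects (within) slope of y on x; None if x has no within-group variation."""
    def demean(values):
        groups = defaultdict(list)
        for g, v in zip(ids, values):
            groups[g].append(v)
        means = {g: sum(vs) / len(vs) for g, vs in groups.items()}
        return [v - means[g] for g, v in zip(ids, values)]

    x_dm, y_dm = demean(x), demean(y)
    sxx = sum(v * v for v in x_dm)
    if sxx < 1e-12:          # x is constant within every group: slope not identified
        return None
    return sum(a * b for a, b in zip(x_dm, y_dm)) / sxx

ids = ["Atlanta", "Beijing", "Atlanta", "Beijing"]   # hypothetical: two years per office
sales = [10.0, 20.0, 12.0, 21.0]
bonus_fixed = [5.0, 7.0, 5.0, 7.0]    # same bonus in both years: no within variation
bonus_varies = [5.0, 7.0, 6.0, 8.0]
print(within_slope(ids, bonus_fixed, sales))    # None
print(within_slope(ids, bonus_varies, sales))   # 1.5
```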

omitted variable bias: EuroPet S.A.
► linear regression. First, a simple regression of Sales on FuelPrice and Radio:
$$E[Sales] = \beta_0 + \beta_1 \cdot FuelPrice + \beta_2 \cdot Radio$$
[Stata output: Coef., Std. Err., t, P>|t| and 95% Conf. Interval for FuelPrice, Radio and _cons]
Figure 5. Results for regression of Sales on FuelPrice and Radio
Figure 6. The rvfplot for regression of Sales on FuelPrice and Radio
Remark. The estimated regression is
$$\text{Est.}E[Sales] = -175{,}016 + 1{,}892 \cdot FuelPrice + 14 \cdot Radio$$
  • The positive coefficient on FuelPrice is suspicious: it says that the higher the fuel price, the higher the (non-fuel-related) sales.
  • The rvfplot indicates possible curvature in the data. Because the rvfplot is U-shaped, we use a log-linear model:
$$E[\ln(Sales)] = \beta_0 + \beta_1 \cdot FuelPrice + \beta_2 \cdot Radio$$

omitted variable bias: EuroPet S.A.
► log-linear regression. We try the log-linear specification:
$$E[\ln(Sales)] = \beta_0 + \beta_1 \cdot FuelPrice + \beta_2 \cdot Radio$$
[Stata output: Coef., Std. Err., t, P>|t| and 95% Conf. Interval for FuelPrice, Radio and _cons]
Figure 7. Results for regression of ln(Sales) on FuelPrice and Radio
Figure 8. The rvfplot for regression of ln(Sales) on FuelPrice and Radio
Remark. In the estimated regression Est.E[lnSales], the constant is 4.53 and the coefficient on FuelPrice is 0.05.
  • The positive coefficient on FuelPrice is still suspicious.
  • The rvfplot indicates that the curvature problem has been solved. In addition, we can immediately test for heteroskedasticity (we cannot reject constant variance at 5%):
Ho: Constant variance
Variables: fitted values of lnsales
chi2(1) = 2.80
Prob > chi2 =
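The heteroskedasticity output above has the layout of Stata’s Breusch-Pagan/Cook-Weisberg test (`hettest`). Its statistic is compared with a chi-squared distribution with 1 degree of freedom. A chi2(1) variable is the square of a standard normal, so its upper-tail probability is erfc(√(stat/2)). This minimal Python sketch (the function name is ours) shows why chi2(1) = 2.80 does not lead to rejection at 5%.

```python
import math

def chi2_1_pvalue(stat):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom.

    If Z is standard normal, P(Z**2 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))

p = chi2_1_pvalue(2.80)          # statistic reported in the output above
reject_at_5pct = p < 0.05
print(round(p, 4), reject_at_5pct)
```

The p-value comes out at roughly 0.094, which is above 0.05 and consistent with “cannot reject at 5%”. For comparison, the 5% critical value of chi2(1) is about 3.84.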

omitted variable bias: EuroPet S.A.
► omitted variable bias. The regression above potentially suffers from omitted variable bias (ovb). Locations with higher fuel prices may well be in higher-traffic areas and/or face less competition, and such factors would also likely support higher sales at the convenience stores.
► We need to eliminate any ovb coming from characteristics of the location that are constant over time, because we are trying to estimate what will happen to sales when the Marseille location changes its prices.
► None of the other characteristics of the location are changing, so it is crucial to control for them when estimating the price effect. Since we have panel data, the best way to do this is a fixed effects model.
► We use the log-linear specification, with results:
xi:regress lnSales FuelPrice Radio i.StoreId
i.StoreId _IStoreId_1-20 (naturally coded; _IStoreId_1 omitted)
note: _IStoreId_13 omitted because of collinearity
[Stata output: Coef., Std. Err., t, P>|t| and 95% Conf. Interval for FuelPrice, Radio and _cons]
Figure 8. Results for fixed effects regression of lnSales on FuelPrice and Radio

omitted variable bias: EuroPet S.A.
Figure 8. The rvfplot for regression of lnSales on FuelPrice and Radio
Remark. The estimated fixed effects regression of lnSales on FuelPrice and Radio is reported without the coefficients on the dummy variables, for presentation purposes.
  • The negative coefficient on FuelPrice is in line with expectations.
  • The rvfplot indicates no curvature in the data. In addition, we can immediately test for heteroskedasticity (we cannot reject constant variance at 5%).
► confidence interval. To obtain the estimate and the 95% confidence interval for the change in sales corresponding to the 50 cents increase in fuel price, we use the klincom command:
klincom _b[FuelPrice]*0.5
[Stata output: Coef., Std. Err., t, P>|t| and 95% Conf. Interval for the linear combination (1)]
Figure 9. The klincom results
Remark. The estimated change in Sales is about −1.75%, and the 95% confidence interval for this change runs from −3.37% to −0.12%.
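`klincom` does not appear to be part of official Stata (Stata’s own command is `lincom`), so we treat it as a course-provided variant. For a single coefficient scaled by a constant c, the arithmetic behind such a linear combination is straightforward:

- the estimate is c·b̂
- its standard error is |c|·SE(b̂)
- the interval is the estimate plus or minus a critical value times that standard error

This minimal Python sketch uses hypothetical inputs and assumes a normal critical value of 1.96. After `regress`, Stata uses the t distribution with the residual degrees of freedom instead. In a log-linear model the result is in log points, which the slide reads as approximate percentages.

```python
import math

def scaled_coefficient_ci(c, b, se, crit=1.96):
    """Estimate and confidence interval for c * b, where b has standard error se.

    crit=1.96 is a large-sample normal approximation (an assumption here)."""
    est = c * b
    half_width = crit * abs(c) * se
    return est, est - half_width, est + half_width

def log_points_to_pct(d):
    """Exact percent change in y implied by a change of d in ln(y)."""
    return 100.0 * (math.exp(d) - 1.0)

# Hypothetical coefficient and standard error, not taken from the EuroPet output.
est, lo, hi = scaled_coefficient_ci(0.5, -0.04, 0.01)
print(est, lo, hi)                  # in log points
print(log_points_to_pct(-0.0175))   # exact counterpart of "about -1.75%"
```

The slide’s −1.75% corresponds to 100 × 0.5 × b̂ in log points. The exact conversion, exp(−0.0175) − 1, gives about −1.73%, so the approximation is close for changes this small.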