Published by Magdalene Dickerson. Modified over 8 years ago.
1
Managerial Economics & Decision Sciences Department
business analytics II
week 9 ▌ panel data models
► cross-section and panel data
► fixed effects
► omitted variable bias
© 2016 kellogg school of management | managerial economics and decision sciences department | business analytics II
2
session nine: panel data models

learning objectives
► statistics & econometrics
  - cross section and panel data: working with data across years, regression for panel data
  - fixed effects: definition, use of fixed effects to eliminate ovb
► fixed effects regression: xi:regress

readings
► (MSN) Chapter 8
► (CS) Bonus Data
► (KTN) Fixed Effects
3
cross section and panel data

► So far we have looked at data sets without taking into account that observations might be recorded at different points in time.
► Suppose that you work in the central office of a global sales organization.
  - The central office sets base pay across the organization.
  - Regional managers set bonuses for their regional salespeople; the bonus is a percentage of sales, and the percentage is set at the start of the year.
  - You want to know whether higher bonuses translate into greater sales effort.
  - You have the following data from four sales offices:

  region    year   bonus   sales
  Atlanta   2010     10      56
  Beijing   2010     20      50
  Cairo     2010     30      44
  Delhi     2010     40      40
  Atlanta   2011     16      60
  Beijing   2011     18      49
  Cairo     2011     40      50
  Delhi     2011     50      47
  Figure 1. Sales and related bonuses for offices across the world

► Looking at the data for 2010 only, or for 2011 only, we view them through cross-section "glasses".
► However, we can also "follow" one particular office across time: this is the panel-data interpretation.
4
cross section and panel data

► We usually write the regression as
  cross-section: $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \varepsilon_i$
  where
  - $i$ indexes the individuals from 1 to $n$ (thus we have a total of $n$ individuals)
  - $k$ is the number of independent variables
  - for example, $x_{1i}$ is the value of independent variable $x_1$ for the $i$-th individual
► In this formulation we do not take into account a possible time index for each observation.
► If we take time into account we write
  panel data: $y_{it} = \beta_0 + \beta_1 x_{1it} + \beta_2 x_{2it} + \dots + \beta_k x_{kit} + \varepsilon_{it}$
  where
  - $i$ indexes the individuals from 1 to $n$ (thus we have a total of $n$ individuals)
  - $t$ indexes time from 1 to $T$ (thus there are $T$ periods)
  - $k$ is the number of independent variables
  - for example, $x_{1it}$ is the value of independent variable $x_1$ for the $i$-th individual in period $t$
► For the cross-section regression we can run two types of regressions:
  - by period: we run $T$ regressions, one for each period
  - pooled over all periods: we simply ignore the time index and pool all observations
5
cross section and panel data

► Let's consider again the data on bonuses and run the cross-section regressions.
► Separate regression for each year:
  regress sales bonus if year == 2010
  regress sales bonus if year == 2011
► Pooled regression:
  regress sales bonus
► Results for the three regressions are presented below.

  region    year   bonus   sales
  Atlanta   2010     10      56
  Beijing   2010     20      50
  Cairo     2010     30      44
  Delhi     2010     40      40
  Atlanta   2011     16      60
  Beijing   2011     18      49
  Cairo     2011     40      50
  Delhi     2011     50      47
  Figure 2. Sales and related bonuses for offices across the world

  model            constant   coefficient on bonus   R2
  cross for 2010     61.00          -0.54            0.99
  cross for 2011     58.69          -0.23            0.44
  pooled             57.77          -0.29            0.44
  Figure 3. Results for the three regressions
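As a rough cross-check of Figure 3, the three regressions can also be sketched outside Stata. The snippet below is a minimal illustration in Python with NumPy, neither of which is part of the original course material. The helper name `ols` is hypothetical, and the Delhi 2010 sales figure is taken to be 40, an assumption chosen because it is the value consistent with the coefficients reported in Figure 3.

```python
import numpy as np

# Sales/bonus data from Figure 1.
# Assumption: Delhi 2010 sales = 40 (the value consistent with Figure 3).
year = np.array([2010] * 4 + [2011] * 4)
bonus = np.array([10, 20, 30, 40, 16, 18, 40, 50], dtype=float)
sales = np.array([56, 50, 44, 40, 60, 49, 50, 47], dtype=float)


def ols(x, y):
    """Return (constant, slope, R-squared) for a simple regression of y on x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    total = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (total @ total)
    return coef[0], coef[1], r2


results = {
    # Same role as: regress sales bonus if year == 2010
    "cross for 2010": ols(bonus[year == 2010], sales[year == 2010]),
    # Same role as: regress sales bonus if year == 2011
    "cross for 2011": ols(bonus[year == 2011], sales[year == 2011]),
    # Same role as: regress sales bonus
    "pooled": ols(bonus, sales),
}

for name, (const, slope, r2) in results.items():
    print(f"{name:15s} constant={const:7.2f}  slope={slope:6.2f}  R2={r2:5.2f}")
```

All three slopes come out negative, which is the puzzle that the rest of the session addresses.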
6
cross section and panel data

► The results of our regressions fly in the face of economic theory: higher bonus percentages should lead to greater effort, yet the estimates suggest that higher bonuses cause lower effort.
  - Are salespeople behaving irrationally?
  - Are the regression results biased?
► A possible solution to our problem is to add controls, i.e. we suspect omitted variable bias.
  [Diagram of the omitted variable bias channels: direct channel, indirect channel (causal), correlation channel]
► As we have seen several times so far, in the case of omitted variable bias we look for a candidate variable, call it z, that is currently omitted from the regression but is:
  - correlated with bonus (x)
  - causal to sales (y)
► We then infer qualitatively whether the coefficient on x is biased and, if so, in which direction. But when can we be sure that we have identified all the candidates?
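The qualitative argument can be made precise with the standard omitted variable bias formula. The sketch below assumes a single included regressor $x$, a single omitted variable $z$, and an error term uncorrelated with $x$; the symbols $b_0$, $b_1$ and $u$ for the short regression are introduced here purely for illustration.

```latex
\begin{aligned}
\text{true model:}\quad & y = \beta_0 + \beta_1 x + \beta_2 z + \varepsilon \\
\text{regression actually run:}\quad & y = b_0 + b_1 x + u \\
\text{large-sample slope:}\quad & b_1 \;\longrightarrow\; \beta_1
   + \underbrace{\beta_2 \,\frac{\operatorname{Cov}(x, z)}{\operatorname{Var}(x)}}_{\text{omitted variable bias}}
\end{aligned}
```

The sign of the bias is the sign of $\beta_2$ (the causal effect of $z$ on $y$) times the sign of the correlation between $x$ and $z$. In the bonus example, a negative estimate despite an expected $\beta_1 > 0$ would be consistent with an omitted factor that raises sales and is negatively correlated with bonus, or one that lowers sales and is positively correlated with bonus.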
7
fixed effects

► Let's consider the following situation: there is indeed a variable z that we cannot identify, that is probably correlated with x, and that has a causal impact on y. We make one very important assumption about the omitted variable: for each individual, z is fixed across time periods, so instead of $z_{it}$ we write $z_i$.
► The correct regression, by individual and time, is thus:
  $y_{it} = \beta_0 + \beta_1 x_{it} + \beta_2 z_i + \varepsilon_{it}$

Remark. The index of the omitted variable is only "i", not "it" as for the other variables. The values of x and y can
  - vary for each individual across periods of time (within-group variation, or within-group effect)
  - vary in each period across individuals (between-group variation, or between-group effect)
Given the assumption above, z has only between-group variation, i.e. it is fixed across time for each individual. This is the fixed effects framework.

Remark. For our sales/bonus example:
  - $i \in$ {Atlanta, Beijing, Cairo, Delhi} and $n = 4$ (number of individuals)
  - $t \in$ {2010, 2011} and $T = 2$ (number of periods)
8
fixed effects

► Back to the true regression, written by individual and time:
  $y_{it} = \beta_0 + \beta_1 x_{it} + \beta_2 z_i + \varepsilon_{it}$
► For each individual, i.e. for each $i$, add the above equality over all $T$ periods and divide by $T$:
  $\frac{1}{T}\sum_{t=1}^{T} y_{it} = \beta_0 + \beta_1 \frac{1}{T}\sum_{t=1}^{T} x_{it} + \beta_2 z_i + \frac{1}{T}\sum_{t=1}^{T} \varepsilon_{it}$
► The complicated expressions are simply the averages across time for each individual, i.e.:
  $\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}$, $\bar{x}_i = \frac{1}{T}\sum_{t=1}^{T} x_{it}$, $\bar{\varepsilon}_i = \frac{1}{T}\sum_{t=1}^{T} \varepsilon_{it}$
► Thus we can write
  $\bar{y}_i = \beta_0 + \beta_1 \bar{x}_i + \beta_2 z_i + \bar{\varepsilon}_i$
► Subtract this last equality from the initial regression equation to get, for each $i$ and $t$:
  $y_{it} - \bar{y}_i = \beta_1 (x_{it} - \bar{x}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i)$
► Surprise (and a pleasant one): by taking this difference we got rid of the omitted variable. Both $\beta_0$ and $\beta_2 z_i$ drop out because they do not vary over time.
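The demeaning argument can be tried directly on the sales/bonus data. The following is a minimal sketch in Python with NumPy (not part of the original Stata material); `demean_by_group` is a hypothetical helper name, and the Delhi 2010 sales value of 40 is an assumption consistent with the regression results reported in the slides.

```python
import numpy as np

# Sales/bonus panel: 4 regions observed in 2010 and 2011.
# Assumption: Delhi 2010 sales = 40.
region = np.array(["Atlanta", "Beijing", "Cairo", "Delhi"] * 2)
bonus = np.array([10, 20, 30, 40, 16, 18, 40, 50], dtype=float)
sales = np.array([56, 50, 44, 40, 60, 49, 50, 47], dtype=float)


def demean_by_group(values, groups):
    """Subtract each group's average over time from its own observations."""
    out = values.astype(float).copy()
    for g in np.unique(groups):
        mask = groups == g
        out[mask] -= values[mask].mean()
    return out


# x_it - xbar_i and y_it - ybar_i: anything fixed within a region drops out.
x_within = demean_by_group(bonus, region)
y_within = demean_by_group(sales, region)

# The demeaned variables have mean zero, so no constant is needed:
# the slope is sum(x*y) / sum(x*x).
b1_within = (x_within @ y_within) / (x_within @ x_within)
print(round(b1_within, 4))
```

Under these assumptions the within slope is 0.65: once each office is compared only with itself over time, the bonus coefficient turns positive.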
9
fixed effects

► Let's write the equation as:
  $y_{it} = \beta_1 x_{it} + \varepsilon_{it} + (\bar{y}_i - \beta_1 \bar{x}_i - \bar{\varepsilon}_i)$
► Notice that the last term is specific to each individual, i.e. it is indexed only by "i". This has the flavor of a dummy variable framework. Let
  $\alpha_i = \bar{y}_i - \beta_1 \bar{x}_i - \bar{\varepsilon}_i$
  and since, as mentioned above, this variable is specific to each individual, we can write it as a sum of dummy variables:
  $\alpha_i = a_0 + a_1 d_1 + a_2 d_2 + \dots + a_{n-1} d_{n-1}$
  where $d_1 = 1$ if $i = 1$ and 0 otherwise, $d_2 = 1$ if $i = 2$ and 0 otherwise, ..., $d_{n-1} = 1$ if $i = n-1$ and 0 otherwise.
► Basically we write:
  $y_{it} = a_0 + a_1 d_1 + a_2 d_2 + \dots + a_{n-1} d_{n-1} + b_1 x_{it} + \varepsilon_{it}$
  where $b_1$ denotes the coefficient $\beta_1$ on x.
10
fixed effects

► We get a very useful result: in order to eliminate the omitted variable bias we simply run the regression
  $y_{it} = a_0 + a_1 d_1 + a_2 d_2 + \dots + a_{n-1} d_{n-1} + b_1 x_{it} + \varepsilon_{it}$
► The steps to construct the above regression are:
  Step 1: Construct $n-1$ dummy variables (where $n$ is the number of different individuals) using the rule: $d_j = 1$ if $i = j$ and 0 otherwise, for $j = 1, \dots, n-1$.
  Step 2: Run the regression above on the $n-1$ dummy variables and the x variable(s).
  Step 3: Interpret the coefficients; this follows directly from the part in which we studied dummy variables:
  - $a_0$ is the average y for the excluded individual when $x_{it} = 0$
  - $a_1$ is the difference in average y between individual 1 and the excluded individual, holding $x_{it}$ constant
  - ...
  - $a_{n-1}$ is the difference in average y between individual $n-1$ and the excluded individual, holding $x_{it}$ constant
  - $b_1$ is the change in average y when x changes by one unit, for a given individual
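The three steps can be sketched on the sales/bonus data as follows. This is an illustrative Python/NumPy version, not the course's Stata workflow. It excludes Atlanta (as Stata does in Figure 4) rather than the last individual, and it assumes Delhi 2010 sales = 40, the value consistent with the reported coefficients.

```python
import numpy as np

# Assumption: Delhi 2010 sales = 40.
region = np.array(["Atlanta", "Beijing", "Cairo", "Delhi"] * 2)
bonus = np.array([10, 20, 30, 40, 16, 18, 40, 50], dtype=float)
sales = np.array([56, 50, 44, 40, 60, 49, 50, 47], dtype=float)

labels = ["Atlanta", "Beijing", "Cairo", "Delhi"]
included = labels[1:]  # Atlanta is the excluded (baseline) individual

# Step 1: n - 1 dummy variables, one per non-excluded region.
dummies = np.column_stack([(region == name).astype(float) for name in included])

# Step 2: regress sales on a constant, bonus and the dummies.
X = np.column_stack([np.ones(len(sales)), bonus, dummies])
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)

# Step 3: name the coefficients for interpretation.
a0 = coef[0]                      # baseline (Atlanta) intercept
b1 = coef[1]                      # effect of bonus within a region
a = dict(zip(included, coef[2:]))  # intercept shifts relative to Atlanta

print(f"constant = {a0:.2f}, bonus = {b1:.2f}")
for name, shift in a.items():
    print(f"{name:8s} shift = {shift:.1f}")
```

The point estimates can be compared with the `xi:regress sales bonus i.region` output in Figure 4; the sketch does not compute standard errors.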
11
fixed effects

► Luckily, Stata offers a very easy way to run this regression:
  xi:regress y x i.individual_label

i.region          _Iregion_1-4        (_Iregion_1 for region==Atlanta omitted)
------------------------------------------------------------------------------
       sales |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       bonus |        .65   .0288675    22.52   0.000     .5581307    .7418693
  _Iregion_2 |      -12.4   .3605551   -34.39   0.000    -13.54745   -11.25255
  _Iregion_3 |      -25.3   .7094599   -35.66   0.000    -27.55782   -23.04218
  _Iregion_4 |      -35.3   .9763879   -36.15   0.000     -38.4073    -32.1927
       _cons |      49.55   .4368447   113.43   0.000     48.15977    50.94023
------------------------------------------------------------------------------
Figure 4. Results for regression by individuals: xi:regress sales bonus i.region

► Stata indicates which individual is excluded, so the coefficients should be interpreted relative to that individual.
► The coefficient on bonus is now positive: an improvement from an economic point of view.

Remark. individual_label is the name of the variable that identifies individuals. For our sales/bonus example, individual_label is region.
12
omitted variable bias

► By controlling for all time-invariant differences in unobservable factors, fixed effects models remove a potential source of ovb.
► If there are unobservables that vary over time within each group, the fixed effects approach will not remove ovb from those sources.
  - In our example, if you are comfortable assuming that customer characteristics in each region are stable during the time covered by your data, then you can be comfortable that fixed effects models eliminate ovb.
  - If you are not comfortable with this assumption, then fixed effects results can still be biased. Even so, the potential for bias would be even greater if you did not include fixed effects.
  - Put another way, we all intuitively believe that before/after comparisons are more valid than cross-section comparisons. Fixed effects are like before/after comparisons.
► The second limitation of fixed effects models is that we cannot assess the effect of variables that do not vary within groups over time, e.g. if bonuses did not vary over time, we could not use fixed effects to estimate their effect.
► If it is crucial to learn the effect of a variable that lacks within-group variation, then we have to forgo fixed effects estimation. We have to rely on between-group variation and work to minimize ovb.
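The second limitation can be seen mechanically: a regressor that is constant within each group is an exact linear combination of the constant and the group dummies, so adding it brings no new information. The Python/NumPy sketch below illustrates this. The variable `office_size` and its values are entirely hypothetical (made up for this illustration, not part of the course data), and Delhi 2010 sales is again assumed to be 40.

```python
import numpy as np

regions = ["Atlanta", "Beijing", "Cairo", "Delhi"] * 2
region = np.array(regions)
bonus = np.array([10, 20, 30, 40, 16, 18, 40, 50], dtype=float)

# Hypothetical time-invariant regressor: made-up values, constant per region.
office_size = {"Atlanta": 12.0, "Beijing": 30.0, "Cairo": 8.0, "Delhi": 21.0}
size = np.array([office_size[name] for name in regions])

# Fixed effects design matrix: constant, bonus, n - 1 region dummies.
dummies = np.column_stack(
    [(region == name).astype(float) for name in ["Beijing", "Cairo", "Delhi"]]
)
X_fe = np.column_stack([np.ones(len(bonus)), bonus, dummies])

# Adding the time-invariant regressor adds a column but no new information.
X_fe_plus = np.column_stack([X_fe, size])

rank_fe = np.linalg.matrix_rank(X_fe)
rank_plus = np.linalg.matrix_rank(X_fe_plus)
print(X_fe.shape[1], rank_fe)         # columns vs. rank without the extra regressor
print(X_fe_plus.shape[1], rank_plus)  # columns vs. rank with it
```

Because the rank does not increase when the extra column is added, its coefficient cannot be separated from the region effects; regression software would typically drop one of the collinear variables.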
13
omitted variable bias: EuroPet S.A.

► linear regression. First, a simple regression of Sales on FuelPrice and Radio:
  $E[Sales] = \beta_0 + \beta_1 \cdot FuelPrice + \beta_2 \cdot Radio$

       Sales |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   FuelPrice |   1892.702   497.7717     3.80   0.000     904.7625    2880.641
       Radio |   14.64805   39.74652     0.37   0.713     -64.2378     93.5339
       _cons |  -175016.9   56122.55    -3.12   0.002    -286404.6   -63629.17
Figure 5. Results for regression of Sales on FuelPrice and Radio

Figure 6. The rvfplot for regression of Sales on FuelPrice and Radio

Remark. The estimated regression is
  $Est.E[Sales] = -175{,}016 + 1{,}892 \cdot FuelPrice + 14 \cdot Radio$
  - The positive coefficient on FuelPrice is suspicious: the higher the FuelPrice, the higher the (non-fuel related) Sales.
  - The rvfplot indicates possible curvature in the data. The U-shaped rvfplot suggests using a log-linear model:
    $E[\ln(Sales)] = \beta_0 + \beta_1 \cdot FuelPrice + \beta_2 \cdot Radio$
14
omitted variable bias: EuroPet S.A.

► log-linear regression. We try the log-linear specification:
  $E[\ln(Sales)] = \beta_0 + \beta_1 \cdot FuelPrice + \beta_2 \cdot Radio$

     lnSales |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   FuelPrice |   .0515498   .0091854     5.61   0.000     .0333193    .0697803
       Radio |  -.0000846   .0007334    -0.12   0.908    -.0015403     .001371
       _cons |   4.536015   1.035632     4.38   0.000     2.480573    6.591457
Figure 7. Results for regression of ln(Sales) on FuelPrice and Radio

Figure 8. The rvfplot for regression of ln(Sales) on FuelPrice and Radio

Remark. The estimated regression is
  $Est.E[lnSales] = 4.53 + 0.05 \cdot FuelPrice - 0.00008 \cdot Radio$
  - The positive coefficient on FuelPrice is still suspicious.
  - The rvfplot indicates that the curvature in the data has been resolved.
  - In addition, we can immediately test for heteroskedasticity; we cannot reject constant variance at the 5% level:

         Ho: Constant variance
         Variables: fitted values of lnsales
         chi2(1)      =     2.80
         Prob > chi2  =   0.0945
15
omitted variable bias: EuroPet S.A.

► omitted variable bias. The regression above potentially suffers from omitted variable bias (ovb): stations that have higher fuel prices may well be in higher-traffic locations and/or face less competition, and such factors would also likely support higher sales at the convenience stores.
► We need to eliminate any ovb coming from characteristics of the location that are constant over time, because we are trying to estimate what will happen to sales when the Marseille location changes its prices.
► None of the other characteristics of the location are changing, so it is crucial to control for them when estimating the price effect. Since we have panel data, we can best do this by using a fixed effects model.
► We use the log-linear specification, with results:
  xi:regress lnSales FuelPrice Radio i.StoreId

i.StoreId         _IStoreId_1-20      (naturally coded; _IStoreId_1 omitted)
note: _IStoreId_13 omitted because of collinearity

     lnsales |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   FuelPrice |  -.0350037   .0163477    -2.14   0.035    -.0675429   -.0024645
       Radio |  -.0004923   .0014257    -0.35   0.731    -.0033302    .0023455
       _cons |   15.11625   1.897308     7.97   0.000     11.33975    18.89274
Figure 8. Results for fixed effects regression of lnSales on FuelPrice and Radio
16
omitted variable bias: EuroPet S.A.

Figure 8. The rvfplot for regression of lnSales on FuelPrice and Radio

Remark. The estimated fixed effects regression (for presentation purposes the coefficients on the dummy variables are not included) is
  $Est.E[lnSales] = 15.11 - 0.035 \cdot FuelPrice - 0.00049 \cdot Radio$
  - The negative coefficient on FuelPrice is in line with expectations.
  - The rvfplot indicates no curvature in the data.
  - In addition, we can immediately test for heteroskedasticity; we cannot reject constant variance at the 5% level.

► confidence interval. To obtain the estimate and the 95% confidence interval for the change in sales corresponding to a 50 cents increase in fuel price, we use the klincom command:
  klincom _b[FuelPrice]*0.5

     lnsales |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |  -.0175019   .0081738    -2.14   0.035    -.0337715   -.0012323
Figure 9. The klincom results

Remark. The estimated change in Sales is about –1.75%, and the 95% interval for this change is from –3.37% to –0.12%.
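The klincom figures can be checked by hand, because the quantity of interest is just the FuelPrice coefficient multiplied by 0.5. The Python sketch below does that arithmetic using the numbers printed in the fixed effects output. It is not a reimplementation of klincom, and the final step (converting the log change to an exact percentage with exp) is an addition that goes beyond the approximation used in the slides.

```python
import math

# Values copied from the fixed effects regression output for FuelPrice.
coef = -0.0350037
std_err = 0.0163477
ci_low, ci_high = -0.0675429, -0.0024645

# The slides use 0.5 for a "50 cents increase" in the fuel price.
scale = 0.5

# A positive rescaling of one coefficient rescales the estimate, its
# standard error and both confidence limits by the same factor;
# the t statistic and p-value are unchanged.
estimate = scale * coef
estimate_se = abs(scale) * std_err
estimate_ci = (scale * ci_low, scale * ci_high)
t_stat = estimate / estimate_se

print(f"estimate = {estimate:.7f}, std. err. = {estimate_se:.7f}, t = {t_stat:.2f}")
print(f"95% CI   = [{estimate_ci[0]:.7f}, {estimate_ci[1]:.7f}]")

# In a log-linear model a change d in ln(Sales) is approximately 100*d percent;
# the exact percentage change is 100 * (exp(d) - 1).
pct_approx = [100 * d for d in (estimate_ci[0], estimate, estimate_ci[1])]
pct_exact = [100 * (math.exp(d) - 1) for d in (estimate_ci[0], estimate, estimate_ci[1])]
print("approx %:", [round(p, 2) for p in pct_approx])
print("exact  %:", [round(p, 2) for p in pct_exact])
```

For changes this small the approximate and exact percentages are very close, which is why reading the coefficient directly as a percentage change is reasonable here.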