business analytics II ▌panel data models

1 business analytics II ▌panel data models
Managerial Economics & Decision Sciences Department
Developed for business analytics II
week 9 ▌panel data models: cross-section and panel data, fixed effects, omitted variable bias
© 2016 kellogg school of management | managerial economics and decision sciences department | business analytics II

2 learning objectives
session nine: panel data models

learning objectives
► statistics & econometrics
  • cross section and panel data: working with data across years; regression for panel data
  • fixed effects: definition; use of fixed effects to eliminate ovb; fixed effects regression: xi:regress

readings
► (MSN) Chapter 8
► (KTN) Fixed Effects
► (CS) Bonus Data

3 cross section and panel data
► So far we have only looked at data sets without taking into account that observations might be recorded at different points in time.
► Suppose that you work in the central office of a global sales organization.
  • The central office sets base pay across the organization.
  • Regional managers set bonuses for their regional salespeople; the bonus is a percentage of sales, and the percentage is set at the start of the year.
  • You want to know whether higher bonuses translate into greater sales effort.
  • You have the following data from four sales offices.

Figure 1. Sales and related bonuses for offices across the world

region   year  bonus  sales
Atlanta  2010  10     56
Beijing  2010  20     50
Cairo    2010  30     44
Delhi    2010  40     40

For 2011 (same four offices) the transcript preserves only the values 16, 60, 18, 49, 47; the remaining cells are missing.

► Looking at the data for 2010 only, or for 2011 only, we view the data through cross-section “glasses”.
► However, we can also “follow” one particular observation across time: this is the panel-data interpretation.

4 cross section and panel data
► We usually write the cross-section regression as

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \varepsilon_i$$

where
  • i indexes the individuals from 1 to n (thus we have a total of n individuals)
  • k is the number of independent variables
  • for example, $x_{1i}$ is the value of independent variable $x_1$ for the i-th individual
► This formulation does not take into account a possible time index for each observation.
► If we take time into account we write the panel-data regression as

$$y_{it} = \beta_0 + \beta_1 x_{1it} + \beta_2 x_{2it} + \dots + \beta_k x_{kit} + \varepsilon_{it}$$

where t indexes time from 1 to T (thus there are T periods); for example, $x_{1it}$ is the value of independent variable $x_1$ for the i-th individual in period t.
► For the cross-section regression we can run two types of regressions:
  • by period: we run T regressions, one for each period
  • pooled over all periods: we simply ignore the time index and pool all observations

5 cross section and panel data
► Let’s consider again the data on bonuses and run the cross-section regressions.

Figure 2. Sales and related bonuses for offices across the world (same data as Figure 1)

► Separate regression for each year:
  regress sales bonus if year == 2010
  regress sales bonus if year == 2011
► Pooled regression:
  regress sales bonus
► Results for the three regressions are presented below.

Figure 3. Results for the three regressions

model           constant  coefficient on bonus  R2
cross for 2010  61.00     –0.54                 0.99
cross for 2011  58.69     –0.23                 0.44
pooled          57.77     –0.29
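The 2010 row of Figure 3 can be checked by hand. The slides use Stata's regress; the sketch below is a plain-Python stand-in for a one-regressor least-squares fit, not the course's own code. The function name `ols_simple` is made up for illustration, and Delhi's 2010 sales value is taken to be 40, an assumption that reproduces the reported constant, slope and R2.

```python
def ols_simple(x, y):
    """Return (intercept, slope, r_squared) for a one-regressor least-squares fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


# 2010 cross-section, in the order Atlanta, Beijing, Cairo, Delhi.
bonus_2010 = [10, 20, 30, 40]
sales_2010 = [56, 50, 44, 40]  # Delhi's 40 is an assumption (see the lead-in)

intercept, slope, r2 = ols_simple(bonus_2010, sales_2010)
print(f"constant={intercept:.2f}  bonus={slope:.2f}  R2={r2:.2f}")
# prints: constant=61.00  bonus=-0.54  R2=0.99
```

The negative slope is the puzzle the next slide picks up: within a single year, offices with higher bonus percentages show lower sales.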

6 cross section and panel data
► The results of our regressions fly in the face of economic theory: higher bonus percentages should lead to higher effort, but
  • it seems that higher bonuses actually cause lower effort…
  • are salespeople behaving irrationally?
  • are the regression results biased?
► A possible solution to our problem is to add controls, i.e. we suspect an omitted variable bias.
► As we have seen several times so far, in the case of omitted variable bias we look for a candidate variable, call it z, that is currently omitted from the regression but that is:
  • correlated with bonus (x)
  • causal to sales (y)
► We then infer qualitatively whether the coefficient of x is biased, and in which direction. But when can we be sure that we have identified all the candidates?

[Diagram labels: correlation channel, correlation, direct channel, causal, indirect channel, truncated]
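The qualitative reasoning about the direction of the bias can be summarized with the standard omitted-variable-bias formula for one included regressor x and one omitted variable z. The symbols $\gamma$, $\delta_0$, $\delta_1$ and $u$ below are notation chosen for this sketch, not taken from the slides.

```latex
\begin{aligned}
\text{true model:} \quad & y = \beta_0 + \beta_1 x + \gamma z + \varepsilon \\
\text{link between } z \text{ and } x\text{:} \quad & z = \delta_0 + \delta_1 x + u \\
\text{regression of } y \text{ on } x \text{ alone estimates:} \quad & \beta_1 + \gamma\,\delta_1
\end{aligned}
```

The bias term $\gamma\,\delta_1$ is non-zero only when z is both causal to y ($\gamma \neq 0$) and correlated with x ($\delta_1 \neq 0$), and its sign is the product of the two signs. For instance, a z that raises sales but is negatively correlated with the bonus percentage would push the estimated bonus coefficient downward.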

7 fixed effects
► Let’s consider the following situation: there is indeed a variable z that we cannot identify, that is probably correlated with x, and that has a causal impact on y. But we make a very important assumption about the omitted variable:
  • for each individual the variable z is fixed across time periods, thus instead of $z_{it}$ we write $z_i$
► The correct regression, by individual and time, is thus: [equation not preserved in this transcript]

Remark. The index of the omitted variable is only “i”, not “it” as for the other variables. This means that the values of x and y can
  • vary for each individual across periods of time (within-group variation or within-group effect)
  • vary for each period across individuals (between-group variation or between-group effect)
whereas, given the assumption above, z has only between-group variation, i.e. it is fixed across time for each individual. This is the fixed effects framework.

Remark. For our sales/bonus example:
  • i ∈ {Atlanta, Beijing, Cairo, Delhi} and n = 4 (number of individuals)
  • t ∈ {2010, 2011} and T = 2 (number of periods)
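One way to write the regression this slide describes, assuming a single regressor x and using $\gamma$ as a placeholder name for the coefficient on the omitted variable (the slide's own symbol is not available in the transcript):

```latex
y_{it} = \beta_0 + \beta_1 x_{it} + \gamma z_i + \varepsilon_{it},
\qquad i = 1, \dots, n, \quad t = 1, \dots, T
```

The only term without a time index is $\gamma z_i$, which is what the remark on the index of the omitted variable refers to.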

8 fixed effects
► Back to the true regression, written by individual and time:
► For each individual, i.e. for each i, add the above equality over all periods (assume there are T periods) and divide by T:
► The complicated expressions are simply the averages across time for each individual, i.e.:
► Thus we can write:
► Subtract this last equality from the initial regression equation for each individual and time to get, for each i and t:
► Surprise! (and a pleasant one…) By taking this difference we managed to get rid of the omitted variable.

[The equations for these steps are not preserved in this transcript.]
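The steps described on this slide can be sketched as follows. This is a reconstruction from the text, not the slide's own equations; it assumes a single regressor x and uses $\gamma$ for the coefficient on the omitted variable $z_i$ and bars for averages over time.

```latex
\begin{aligned}
y_{it} &= \beta_0 + \beta_1 x_{it} + \gamma z_i + \varepsilon_{it}
  && \text{(regression by individual and time)} \\
\frac{1}{T}\sum_{t=1}^{T} y_{it}
  &= \beta_0 + \beta_1 \frac{1}{T}\sum_{t=1}^{T} x_{it} + \gamma z_i
     + \frac{1}{T}\sum_{t=1}^{T} \varepsilon_{it}
  && \text{(sum over } t \text{, divide by } T\text{)} \\
\bar{y}_i &= \beta_0 + \beta_1 \bar{x}_i + \gamma z_i + \bar{\varepsilon}_i
  && \text{(averages across time for individual } i\text{)} \\
y_{it} - \bar{y}_i &= \beta_1 \left(x_{it} - \bar{x}_i\right)
     + \left(\varepsilon_{it} - \bar{\varepsilon}_i\right)
  && \text{(subtract the third line from the first)}
\end{aligned}
```

Because $z_i$ does not change over time, $\gamma z_i$ appears unchanged in the averaged equation and cancels in the difference, together with $\beta_0$.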

9 fixed effects
► Let’s write the equation as:
► Notice that the last term is specific to each individual, i.e. it is indexed only by “i”. This has the flavor of a dummy variable framework. We give this term a name and, since it is specific to each individual, we can write it as a sum of dummy variables, where $d_1 = 1$ if $i = 1$ and 0 otherwise, $d_2 = 1$ if $i = 2$ and 0 otherwise, …, $d_{n-1} = 1$ if $i = n - 1$ and 0 otherwise.
► Basically we write:

[The equations on this slide are not preserved in this transcript.]
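A sketch of the rewriting this slide describes, under the same assumptions as the fixed effects model with one regressor x and a time-invariant omitted variable $z_i$ with coefficient $\gamma$. The name $\alpha_i$ for the individual-specific term is a label chosen here; the coefficient names $a_0, \dots, a_{n-1}$ follow the slide that interprets them.

```latex
\begin{aligned}
y_{it} &= \beta_1 x_{it} + \varepsilon_{it} + \alpha_i,
  \qquad \alpha_i = \bar{y}_i - \beta_1 \bar{x}_i - \bar{\varepsilon}_i
                  = \beta_0 + \gamma z_i \\
\alpha_i &= a_0 + a_1 d_1 + a_2 d_2 + \dots + a_{n-1} d_{n-1} \\
y_{it} &= a_0 + a_1 d_1 + \dots + a_{n-1} d_{n-1} + \beta_1 x_{it} + \varepsilon_{it}
\end{aligned}
```

With dummies $d_1, \dots, d_{n-1}$, individual n is the excluded one, so $a_0 = \alpha_n$ and $a_j = \alpha_j - \alpha_n$ for $j = 1, \dots, n-1$.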

10 fixed effects
► We get a very useful result: in order to eliminate the omitted variable bias we simply run the regression

$$y_{it} = a_0 + a_1 d_1 + \dots + a_{n-1} d_{n-1} + b_1 x_{it} + \varepsilon_{it}$$

► The steps to construct the above regression are:
  Step 1: Construct n – 1 dummy variables (where n is the number of different individuals) using the rule: $d_j = 1$ if $i = j$ and 0 otherwise, for $j = 1, \dots, n - 1$
  Step 2: Run the regression above on the n – 1 dummy variables and the x variable(s)
  Step 3: Interpret the coefficients; this follows directly from the part in which we studied dummy variables:
    • $a_0$ is the average y for the excluded individual when $x_{it} = 0$
    • $a_1$ is the difference in average y between individual 1 and the excluded individual when $x_{it}$ is held constant
    • $a_{n-1}$ is the difference in average y between individual n – 1 and the excluded individual when $x_{it}$ is held constant
    • $b_1$ is the change in average y when x changes by one unit, for a given individual
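The three steps can be tried on a toy panel. The sketch below is plain Python rather than the Stata used in the course, the helper names (`solve`, `ols`, `within_slope`) are invented for illustration, and the eight data points are hypothetical numbers, not the slide's data. It builds the n – 1 dummies, runs the dummy-variable regression through the normal equations, and compares the coefficient on x with the slope obtained by demeaning x and y within each individual.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def within_slope(ids, x, y):
    """Slope from regressing demeaned y on demeaned x (demeaning within each id)."""
    def demean(values):
        groups = {}
        for g, v in zip(ids, values):
            groups.setdefault(g, []).append(v)
        means = {g: sum(vs) / len(vs) for g, vs in groups.items()}
        return [v - means[g] for g, v in zip(ids, values)]
    xd, yd = demean(x), demean(y)
    return sum(p * q for p, q in zip(xd, yd)) / sum(p * p for p in xd)


# Hypothetical panel: four individuals observed in two periods each.
ids = ["A", "A", "B", "B", "C", "C", "D", "D"]
x = [10, 16, 20, 18, 30, 34, 40, 44]
y = [56, 60, 50, 49, 44, 47, 40, 42]

# Step 1: n - 1 dummies; the first individual ("A") is the excluded one here.
labels = sorted(set(ids))
design = [[1.0] + [1.0 if g == lab else 0.0 for lab in labels[1:]] + [float(xi)]
          for g, xi in zip(ids, x)]

# Step 2: run the regression on the dummies and x.
lsdv = ols(design, y)                              # [a0, a_B, a_C, a_D, b1]
pooled = ols([[1.0, float(xi)] for xi in x], y)    # [constant, slope]
fe = within_slope(ids, x, y)

# Step 3: compare the coefficient on x across the approaches.
print(f"pooled slope={pooled[1]:.3f}  dummy-regression b1={lsdv[-1]:.3f}  within slope={fe:.3f}")
```

On these made-up numbers the pooled slope is negative while the dummy-variable regression gives a positive coefficient on x, and that coefficient coincides with the within (demeaned) slope.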

11 fixed effects
► Luckily Stata offers a very easy way to generate the regression:
  xi:regress y x i.individual_label

Remark. The individual label is the name of the variable that identifies individuals. For our sales/bonus example the individual label is region.

Figure 4. Results for regression by individuals: xi:regress sales bonus i.region

► Stata indicates which individual is excluded, so the coefficients should be interpreted accordingly.

i.region  _Iregion_  (_Iregion_1 for region==Atlanta omitted)
sales       | Coef.  Std. Err.  t  P>|t|  [95% Conf. Interval]
bonus       |
_Iregion_2  |
_Iregion_3  |
_Iregion_4  |
_cons       |
(the numeric estimates are not preserved in this transcript)

► The coefficient on bonus is positive: an improvement (from an economic point of view).

12 omitted variable bias
► By controlling for all time-invariant differences in unobservable factors, fixed effects models remove a potential source of ovb.
► If there are some unobservables that vary over time within each group, the fixed effects approach will not remove ovb from those sources.
  • In our example, if you are comfortable assuming that customer characteristics in each region are stable during the time covered by your data, then you can be comfortable that fixed effects models eliminate ovb.
  • If you are not comfortable with this assumption, then fixed effects results can still be biased. Even so, the potential for bias would be even greater if you did not include fixed effects. Put another way, we all intuitively believe that before/after comparisons are more valid than cross-section comparisons. Fixed effects are like before/after comparisons.
► The second limitation of fixed effects models is that we cannot assess the effect of variables that do not vary within groups over time, e.g. if bonuses did not vary over time, we could not use fixed effects.
► If it is crucial to learn the effect of a variable that lacks within-group variation, then we would have to forgo fixed effects estimation. We would have to rely on between-group variation and work to minimize ovb.
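The second limitation can be seen directly in the arithmetic: the fixed effects slope divides by the within-group variation of x, and that quantity is zero when x never changes for any individual. A minimal Python sketch, with a made-up helper name (`within_variation`) and hypothetical bonus values:

```python
def within_variation(ids, values):
    """Sum of squared deviations of each value from its own group's mean."""
    groups = {}
    for g, v in zip(ids, values):
        groups.setdefault(g, []).append(v)
    means = {g: sum(vs) / len(vs) for g, vs in groups.items()}
    return sum((v - means[g]) ** 2 for g, v in zip(ids, values))


# Hypothetical data: two offices, two years each.
ids = ["A", "A", "B", "B"]
bonus_fixed = [10, 10, 20, 20]    # bonus never changes within an office
bonus_varying = [10, 16, 20, 18]  # bonus changes from one year to the next

print(within_variation(ids, bonus_fixed))    # prints 0.0
print(within_variation(ids, bonus_varying))  # prints 20.0
```

With `bonus_fixed` there is nothing left to estimate from once the office dummies are in the regression, even though the bonus differs between offices.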

13 omitted variable bias: EuroPet S.A.
► linear regression. First a simple regression of Sales on FuelPrice and Radio:

$$E[\text{Sales}] = \beta_0 + \beta_1\,\text{FuelPrice} + \beta_2\,\text{Radio}$$

Figure 5. Results for regression of Sales on FuelPrice and Radio

Sales      | Coef.  Std. Err.  t  P>|t|  [95% Conf. Interval]
FuelPrice  |
Radio      |
_cons      |

Figure 6. The rvfplot for regression of Sales on FuelPrice and Radio

Remark. The estimated regression is

$$\text{Est.}E[\text{Sales}] = 175{,}016 + 1{,}892\,\text{FuelPrice} + 14\,\text{Radio}$$

  • The positive coefficient on FuelPrice is suspicious: the higher the FuelPrice, the higher the (non-fuel related) Sales.
  • The rvfplot indicates possible curvature in the data. The U-shaped rvfplot suggests using a log-linear model:

$$E[\ln(\text{Sales})] = \beta_0 + \beta_1\,\text{FuelPrice} + \beta_2\,\text{Radio}$$

14 omitted variable bias: EuroPet S.A.
► log-linear regression. We try the log-linear specification:

$$E[\ln(\text{Sales})] = \beta_0 + \beta_1\,\text{FuelPrice} + \beta_2\,\text{Radio}$$

Figure 7. Results for regression of ln(Sales) on FuelPrice and Radio

lnSales    | Coef.  Std. Err.  t  P>|t|  [95% Conf. Interval]
FuelPrice  |
Radio      |
_cons      |

Figure 8. The rvfplot for regression of ln(Sales) on FuelPrice and Radio

Remark. The estimated regression has constant 4.53 and coefficient 0.05 on FuelPrice; the coefficient on Radio is not preserved in this transcript.
  • The positive coefficient on FuelPrice is still suspicious.
  • The rvfplot indicates that the curvature in the data has been resolved.
In addition we can immediately test for heteroskedasticity (we cannot reject constant variance at the 5% level):

Ho: Constant variance
Variables: fitted values of lnsales
chi2(1) =
Prob > chi2 =

15 omitted variable bias: EuroPet S.A.
► omitted variable bias. The regression above potentially suffers from omitted variable bias (ovb): locations that have higher fuel prices may well be in higher-traffic areas and/or face less competition, and such factors would also likely support higher sales at the convenience stores.
► We need to eliminate any ovb coming from characteristics of the location that are constant over time, because we are trying to estimate what will happen to sales when the Marseille location changes its prices.
► None of the other characteristics of the location are changing, so it is crucial to control for them when estimating the price effect. Since we have panel data, we can best do this by using a fixed effects model.
► We use the log-linear specification:
  xi:regress lnSales FuelPrice Radio i.StoreId

Figure 8. Results for fixed effects regression of lnSales on FuelPrice and Radio

i.StoreId  _IStoreId_1-20  (naturally coded; _IStoreId_1 omitted)
note: _IStoreId_13 omitted because of collinearity
lnsales    | Coef.  Std. Err.  t  P>|t|  [95% Conf. Interval]
FuelPrice  |
Radio      |
_cons      |

16 omitted variable bias: EuroPet S.A.
Figure 8. The rvfplot for regression of lnSales on FuelPrice and Radio

Remark. In the estimated fixed effects regression the coefficient on FuelPrice is –0.035 (for presentation purposes the coefficients on the dummy variables are not included; the coefficient on Radio is not preserved in this transcript).
  • The negative coefficient on FuelPrice is in line with expectations.
  • The rvfplot indicates no curvature in the data.
In addition we can immediately test for heteroskedasticity (we cannot reject constant variance at the 5% level).

► confidence interval. To obtain the estimate and the 95% confidence interval for the change in sales corresponding to a 50 cent increase in fuel price, we use the klincom command:
  klincom _b[FuelPrice]*0.5

Figure 9. The klincom results

lnsales  | Coef.  Std. Err.  t  P>|t|  [95% Conf. Interval]
(1)      |

Remark. The estimated change in Sales is about –1.75% and the 95% interval for this change is from –3.37% to –0.12%.
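The –1.75% figure follows from the log-linear form: a change of 0.5 in FuelPrice changes ln(Sales) by 0.5 times the FuelPrice coefficient, and a small change in a logarithm is approximately a proportional change. The Python sketch below redoes that arithmetic with the rounded coefficient –0.035 quoted above; the function name is made up for illustration, and the exact conversion through the exponential is shown only for comparison, since the slides report the approximate figure.

```python
import math


def pct_change_from_log(delta_log):
    """Convert a change in ln(y) into an approximate and an exact percent change in y."""
    approx = 100 * delta_log
    exact = 100 * (math.exp(delta_log) - 1)
    return approx, exact


b_fuel = -0.035           # rounded FuelPrice coefficient from the fixed effects regression
delta = 0.5 * b_fuel      # change in ln(Sales) for a 0.5 increase in FuelPrice

approx, exact = pct_change_from_log(delta)
print(f"approximate: {approx:.2f}%  exact: {exact:.2f}%")
# prints: approximate: -1.75%  exact: -1.73%
```

The same conversion applies to each end of the reported interval; for changes this small the approximate and exact percentages differ only in the second decimal.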

