Managerial Economics & Decision Sciences Department
business analytics II
▌ applications: cigarettes | car dealership | horse racing | orangia
Developed for business analytics II, week 9 and week 10
© 2016 kellogg school of management | managerial economics and decision sciences department | business analytics II
session ten ▌ applications: learning objectives
► linear regression: estimate the model, interpret coefficients; statistical significance, p-values and confidence intervals
► confidence and prediction intervals: klincom and kpredint commands, their use and misuse
► dummy variables: definition and interpretation of dummy and slope-dummy variables; use of dummy and slope-dummy regressions in hypothesis testing
► pitfalls for linear regression: omitted variable bias (identify the bias); multicollinearity (test and correct); spurious regression (identify); heteroskedasticity (identify/test and correct); curvature (identify and correct)
► non-linear models: log specification (definition, estimation and interpretation)
► panel data models: assumptions, use and estimation of fixed-effects models
cigarettes manufacturing
► For each store in a random sample of 100 stores, the following information was recorded:
SALES: the number of packs sold in a year
NICOTINE: the nicotine content of the cigarettes in milligrams per cigarette
STORE: dummy variable equal to 0 for a convenience store and 1 for a supermarket
► A regression of the number of packs sold against nicotine content and store type yields the following output:
Est.E[SALES] = 2127 + 257·NICOTINE + 1137·STORE, with standard errors 105.2 for the NICOTINE coefficient and 247.3 for the STORE coefficient
Using this regression, estimate the change in sales of cigarette packs in a supermarket if the nicotine content of the cigarettes is reduced by 0.2 milligrams per cigarette.
i. Identify the change: ΔEst.E[SALES] = b1·ΔNICOTINE, where b1 = 257 and ΔNICOTINE = -0.2, thus ΔEst.E[SALES] = 257·(-0.2) = -51.4
Remark: a change in the level of the dependent variable is always tied to the change in one or several x-variables through their slopes.
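Since only the fitted equation (not the data) is reported, the point estimate is just arithmetic; a minimal check in Stata:
. display 257*(-0.2)
-51.4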
cigarettes manufacturing (continued)
Provide a 95% interval that contains the true change in sales given this reduction in nicotine content.
ii. The general form of an interval with confidence level 1 - α is
Estimate - Std.Error[Estimate]·t(df, α/2) ≤ true value ≤ Estimate + Std.Error[Estimate]·t(df, α/2)
Since the estimate is b1·ΔNICOTINE, the interval can be based on the interval for β1 multiplied by ΔNICOTINE. The interval for β1 is simply
b1 - Std.Error[b1]·t(df, α/2) ≤ β1 ≤ b1 + Std.Error[b1]·t(df, α/2)
where b1 = 257, Std.Error[b1] = 105.2 and t(df, α/2) = invttail(97, 0.025) = 1.9847.
The interval for β1 is thus [257 - 1.9847·105.2, 257 + 1.9847·105.2] = [48.20956, 465.79044], and the interval for β1·ΔNICOTINE is [-93.158, -9.642] (multiplying by ΔNICOTINE = -0.2 flips the endpoints).
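The same bounds can be reproduced by hand in Stata (df = 100 - 3 = 97; note again that multiplying by the negative change swaps lower and upper bounds):
. display invttail(97, 0.025)                        // ≈ 1.9847
. display (257 + invttail(97, 0.025)*105.2)*(-0.2)   // lower bound, ≈ -93.16
. display (257 - invttail(97, 0.025)*105.2)*(-0.2)   // upper bound, ≈ -9.64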
cigarettes manufacturing (continued)
► A second regression, which omits STORE, is reported:
Est.E[SALES] = 2739 + 335·NICOTINE, with standard error 137.7 for the NICOTINE coefficient
The coefficient for NICOTINE in the first regression is lower than in the second regression. Why is that the case? What does this imply about the types of cigarettes that are sold at convenience stores as compared to supermarkets?
iii. The observed difference in estimated coefficients is most likely the result of omitted variable bias, where the omitted variable (in the second regression) is STORE: b1* > b1 (335 > 257). The overestimation means b2·a1 > 0, and since we already know that b2 > 0 (the STORE coefficient is 1137), it must be the case that a1 > 0, where a1 is the slope from the auxiliary regression of the omitted variable STORE on NICOTINE.
[omitted-variable-bias diagram: direct (causal) channel vs. indirect channel through the correlation with the omitted variable]
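As a check on the sign of a1: with a single omitted regressor the textbook decomposition b1* = b1 + b2·a1 holds exactly in the estimation sample, so the auxiliary slope implied by the two reported regressions can be backed out directly:
. display (335 - 257)/1137    // implied a1 ≈ 0.069 > 0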
cigarettes manufacturing (continued)
iii. (continued) Having a1 > 0 means that STORE and NICOTINE are positively related:
a high level of STORE (i.e., STORE = 1) is likely associated with high levels of NICOTINE
a low level of STORE (i.e., STORE = 0) is likely associated with low levels of NICOTINE
Thus supermarkets (STORE = 1) are likely to sell cigarettes with a higher NICOTINE level than convenience stores (STORE = 0) do.
car dealership
► You have collected data from a random sample of 62 past transactions (auto.dta), which contain the following variables:
• GENDER: gender of the buyer, equal to 1 if male and 0 if female
• INCOME: yearly income of the buyer in $
• AGE: age of the buyer in years
• COLLEGE: a dummy variable equal to 1 if the buyer is a college graduate and 0 otherwise
• PRICE: the price of the car in $
Run a regression of price on the remaining 4 variables. Report the estimated regression equation. Do not drop any variables from the regression.
i. The regression is
Est.E[price] = 2,280.36 + 1,444.20·gender + 0.1861·income - 15.59·age + 2,080.86·college
Figure 1. Regression results
   price |     Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------+----------------------------------------------------------------
  gender |  1444.197   738.0868     1.96   0.055    -33.79631     2922.19
  income |  .1860856   .0246546     7.55   0.000     .1367155    .2354556
     age | -15.58905   46.07751    -0.34   0.736    -107.8577    76.67957
 college |  2080.855   673.0907     3.09   0.003     733.0142    3428.696
   _cons |  2280.362   1271.326     1.79   0.078    -265.4246    4826.149
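For reference, the output in Figure 1 is what regress would produce for this specification (variable names as they appear in auto.dta):
. regress price gender income age college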
car dealership (continued)
Can you prove at a 10% significance level that the average price of cars bought by 30-year-old male college graduates with an income of $90,000 is higher than $20,000?
ii. We are asked to evaluate whether the level of the selling price is greater than a certain level ($20,000). We base the hypothesis on
E[price] = β0 + β1·gender + β2·income + β3·age + β4·college
with gender = 1 (male), income = 90,000, age = 30 and college = 1 (college graduate).
set hypotheses (the claim is placed in the null): H0: E[price] ≥ 20,000 vs. Ha: E[price] < 20,000, that is
H0: β0 + β1·1 + β2·90,000 + β3·30 + β4·1 ≥ 20,000 vs. Ha: β0 + β1·1 + β2·90,000 + β3·30 + β4·1 < 20,000
We test a combination of coefficients using either klincom or kpredint; here we deal with an average across such buyers, so klincom is the right tool:
. klincom _b[_cons]+_b[gender]*1+_b[income]*90000+_b[age]*30+_b[college]*1-20000
       price |     Coef.   Std. Err.      t    P>|t|     [90% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |  2085.445   1254.126     1.66   0.102    -11.49022    4182.381
------------------------------------------------------------------------------
If Ha: < then Pr(T < t) = .949
If Ha: not = then Pr(|T| > |t|) = .102
If Ha: > then Pr(T > t) = .051
The relevant p-value, Pr(T < t) = 0.949, is far above 0.10, so we cannot reject the null: the claim that the average price is above $20,000 is consistent with the data.
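The klincom point estimate can be verified by plugging the buyer's characteristics into the fitted equation from Figure 1 (small differences reflect rounding of the displayed coefficients):
. display 2280.362 + 1444.197 + .1860856*90000 - 15.58905*30 + 2080.855 - 20000   // ≈ 2085.45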
car dealership (continued)
Jane is a 45-year-old woman, has a college degree and earns $80,000. Provide a range of values that you are 95% confident will contain the price of Jane's next car.
iii. We are asked to provide an interval for the level of the selling price for one individual with given characteristics. The interval has the form
Est.E[price] - Std.Err.[prediction]·t(df, α/2) ≤ price of Jane's next car ≤ Est.E[price] + Std.Err.[prediction]·t(df, α/2)
Since we are asked for an interval for the level of the dependent variable we use either klincom or kpredint; because the question is about one individual, we use kpredint (an individual prediction interval). We are given gender = 0 (female), income = 80,000, age = 45 and college = 1 (college graduate), so the interval is built around
Est.E[price] = b0 + b1·0 + b2·80,000 + b3·45 + b4·1
. kpredint _b[_cons]+_b[gender]*0+_b[income]*80000+_b[age]*45+_b[college]*1
Estimate: 18546.557
Standard Error of Individual Prediction: 2492.0427
Individual Prediction Interval (95%): [13556.327, 23536.786]
t-ratio: 7.4423108
If Ha: < then Pr(T < t) = 1
If Ha: not = then Pr(|T| > |t|) = 0
If Ha: > then Pr(T > t) = 0
The 95% prediction interval for the price of Jane's next car is [13,556.33, 23,536.79].
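The reported interval can be reproduced by hand from the kpredint output, using df = 62 - 5 = 57 (up to rounding):
. display 18546.557 - invttail(57, 0.025)*2492.0427   // ≈ 13556
. display 18546.557 + invttail(57, 0.025)*2492.0427   // ≈ 23537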
car dealership (continued)
How would you modify the regression to allow you to test the following claims: "Women buy the same cars, i.e. cars with the same price, regardless of income level, while men tend to buy more expensive cars the higher their income"? Run the new regression and report the new estimated regression equation.
iv. Here the interaction between gender and income is fairly transparent, so a slope dummy defined as genderincome = gender·income will help in testing the claims above. The regression becomes
E[price] = β0 + β1·gender + β2·income + β3·age + β4·college + β5·genderincome
and the estimated regression is shown below.
Figure 2. Regression results
. generate genderincome=gender*income
. regress price gender income genderincome age college
       price |     Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gender |  2124.783   2294.148     0.93   0.358    -2470.948    6720.514
      income |  .2074518   .0725208     2.86   0.006     .0621751    .3527285
         age | -18.27614    47.23003   -0.39   0.700    -112.8893    76.33697
     college |  2043.064   689.0966     2.96   0.004     662.6371     3423.49
genderincome | -.0212742   .0678361    -0.31   0.755    -.1571662    .1146179
       _cons |  1721.126   2195.929     0.78   0.436    -2677.848    6120.101
car dealership (continued)
v. Based on the regression
E[price] = β0 + β1·gender + β2·income + β3·age + β4·college + β5·genderincome
we can now test claims that relate the selling price to the level of income for each gender. In particular, the change in selling price for a change in income is
ΔE[price] = β2·Δincome + β5·Δgenderincome
The first claim is that income has no impact on selling price for women (gender = 0). For gender = 0 the slope dummy is zero, so ΔE[price] = β2·Δincome, and "income has no impact on selling price for women" means testing:
set hypotheses: H0: β2 = 0 vs. Ha: β2 ≠ 0
From the regression table the p-value for income is 0.006, so we reject the null: income does affect the price women pay.
car dealership (continued)
v. (continued) The second claim is that men (gender = 1) tend to buy more expensive cars the higher their income. For gender = 1 we get ΔE[price] = β2·Δincome + β5·Δincome = (β2 + β5)·Δincome, so the claim amounts to testing:
set hypotheses (the claim, β2 + β5 > 0, is placed in the null): H0: β2 + β5 ≥ 0 vs. Ha: β2 + β5 < 0
We need to run klincom in order to test this hypothesis:
klincom _b[income]*1 + _b[genderincome]*1
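The point estimate of β2 + β5 is readily obtained from Figure 2 and is positive; the klincom command above adds its standard error and the one-sided p-values needed for the test:
. display .2074518 - .0212742    // b2 + b5 ≈ 0.186 > 0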
horse racing
► The regression is
Est.E[lnodds] = b0 + b1·distance + b2·starters + b3·last + b4·injured + b5·novice + b6·ratio + b7·noviceratio
Based on Steve's regression, provide a point estimate for the difference in lnodds for two horses that are identical except that one just suffered a minor injury whereas the other did not, assuming the two horses are competing in the same race.
i. We are comparing two horses identical in all respects except injury status:
no injury: Est.E[lnodds] = b0 + b1·distance + b2·starters + b3·last + b4·0 + b5·novice + b6·ratio + b7·noviceratio
injury: Est.E[lnodds] = b0 + b1·distance + b2·starters + b3·last + b4·1 + b5·novice + b6·ratio + b7·noviceratio
Thus Δlnodds for these two horses is simply b4 = 1.998479 (the difference between the two equations). Since the horses are otherwise identical, the values of all other variables are the same for the two horses and cancel out when taking the difference.
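If a confidence interval for this difference were also needed, the klincom pattern used elsewhere in these slides applies directly (coefficient name as in the regression output):
. klincom _b[injured]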
horse racing (continued)
A horse is participating in a race against seven other horses. Based on Steve's regression, all other factors in the regression held fixed, how would the odds on that horse be affected if two additional horses joined the race?
ii. We are comparing two races: the initial race has 8 horses while the second has 10.
8 starters: Est.E[lnodds] = b0 + b1·distance + b2·8 + b3·last + b4·injured + b5·novice + b6·ratio + b7·noviceratio
10 starters: Est.E[lnodds] = b0 + b1·distance + b2·10 + b3·last + b4·injured + b5·novice + b6·ratio + b7·noviceratio
Thus Δlnodds = 2·b2 = 0.1093014: the odds change by roughly 10.93% when two horses are added (see the note below for the exact figure). The two equations above are for the same horse, so all of its characteristics are identical and cancel out when taking the difference.
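The 10.93% figure uses the usual approximation that a change in lnodds is roughly the percentage change in odds; the exact implied change is:
. display exp(0.1093014) - 1    // ≈ 0.1155, i.e. about an 11.5% increase in the odds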
horse racing (continued)
Steve claims that for novice horses the ratio of past wins is irrelevant for the horse's odds, all other factors in the regression held fixed. Can you prove him wrong using a 10% level of significance?
iii. We are considering novice horses (novice = 1) and we are interested in whether a change in ratio has any effect on lnodds. For novice = 1:
Est.E[lnodds] = b0 + b1·distance + b2·starters + b3·last + b4·injured + b5·1 + b6·ratio + b7·ratio
thus ΔEst.E[lnodds] = b6·Δratio + b7·Δratio = (b6 + b7)·Δratio
Steve's claim basically requires a test of the following:
set hypotheses: H0: β6 + β7 = 0 vs. Ha: β6 + β7 ≠ 0
The command klincom _b[ratio] + _b[noviceratio] provides the following:
If Ha: < then Pr(T < t) = .053
If Ha: not = then Pr(|T| > |t|) = .106
If Ha: > then Pr(T > t) = .947
The two-sided p-value, 0.106, is above 0.10, so we cannot reject the null: we cannot prove Steve wrong at the 10% level.
horse racing (continued)
Steve claims that, all else in the regression held fixed, horses that are classified as sprinters have their probability of winning reduced, i.e. have their odds increase, as a race gets longer. What would you add to the regression in part i to allow you to evaluate this claim?
iv. We are clearly looking at an interaction between being a sprinter and the length of the race, so a slope dummy capturing this interaction is required: sprinterdistance = sprinter·distance. The regression becomes (we also need to include the dummy sprinter itself):
Est.E[lnodds] = b0 + b1·distance + b2·starters + b3·last + b4·injured + b5·novice + b6·ratio + b7·noviceratio + b8·sprinter + b9·sprinterdistance
horse racing (continued)
Carry out the modification you suggest in part iv. and write down the new estimated regression equation.
v. The estimated regression is
Figure 3. Regression results
. generate sprinterdistance = sprinter*distance
. regress lnodds distance starters last injured novice ratio noviceratio sprinter sprinterdistance
          lnOdds |     Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
        distance | -.0008171   .0002318    -3.52   0.001    -.0012774   -.0003567
        starters |  .0531778   .0178135     2.99   0.004     .0177987    .0885568
            last |  .0428266   .0128888     3.32   0.001     .0172284    .0684248
         injured |  1.820267   .1748925    10.41   0.000     1.472915    2.167618
          novice |  .1995128     .31917     0.63   0.533    -.4343864    .8334119
           ratio | -3.639829   .6952844    -5.24   0.000    -5.020724   -2.258935
     noviceratio |  1.411642   1.290283     1.09   0.277    -1.150971    3.974254
        sprinter | -2.806884   .5395038    -5.20   0.000    -3.878385   -1.735383
sprinterdistance |  .0019458   .0003504     5.55   0.000     .0012499    .0026416
           _cons |  3.119532   .4811813     6.48   0.000     2.163864    4.075199
horse racing (continued)
Steve claims that, all else in the regression held fixed, horses that are classified as sprinters have their probability of winning reduced, i.e. have their odds increase, as a race gets longer. In terms of your new regression model, what must be true in order for Steve's claim to be correct? Test the claim.
vi. We need to evaluate the change in lnodds for sprinters as distance changes. For sprinters (sprinter = 1):
Est.E[lnodds] = b0 + b1·distance + b2·starters + b3·last + b4·injured + b5·novice + b6·ratio + b7·noviceratio + b8·1 + b9·distance
The change in lnodds for a change in distance is thus
ΔEst.E[lnodds]|sprinter = b1·Δdistance + b9·Δdistance = (b1 + b9)·Δdistance
Steve's claim (higher odds for longer races) is that ΔEst.E[lnodds]|sprinter > 0 when Δdistance > 0; using the expression above, the claim is really about β1 + β9 > 0.
horse racing (continued)
vii. The test is about
set hypotheses (Steve's claim, β1 + β9 > 0, is placed in the null): H0: β1 + β9 ≥ 0 vs. Ha: β1 + β9 < 0
The command klincom _b[distance] + _b[sprinterdistance] provides the following:
If Ha: < then Pr(T < t) = 1
If Ha: not = then Pr(|T| > |t|) = 0
If Ha: > then Pr(T > t) = 0
The relevant p-value, Pr(T < t) = 1, is far above any usual significance level, so we cannot reject the null: the data are consistent with Steve's claim.
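From Figure 3 the point estimate of β1 + β9 is indeed positive, in line with Steve's claim; the klincom output above supplies the corresponding standard error and p-values:
. display -.0008171 + .0019458    // b1 + b9 ≈ 0.0011 > 0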
horse racing (continued)
Claim (Alison): "Older horses would probably be more prone to injury, and older horses are also less likely to win, i.e. have higher odds." Steve counters that older horses are in fact less likely to get injured (young horses are less disciplined and get minor injuries all the time), but agrees that older horses are less likely to win, all else equal. He reruns the regression with Age in it in addition to all the original variables. In this new regression, the estimated coefficient on injured is 1.305.
viii. When Age is omitted from the regression, the coefficient on injured carries an omitted variable bias. The sign of the ovb equals the product of
(a) the sign of the relation between Age and lnodds (the coefficient Age would get in the full regression, call it b10)
(b) the sign of the relation between injured and Age (the slope from the auxiliary regression of Age on injured, call it a2)
Alison and Steve agree that Age and odds are positively related, thus b10 > 0. Alison believes that injured and Age are also positively related, thus a2 > 0, so according to her the ovb must be positive, which corresponds to overestimation: b4* > b4. Steve believes that injured and Age are negatively related (a2 < 0), so he expects a negative ovb, which corresponds to underestimation: b4* < b4.
When Steve reruns the regression with Age included among the regressors, the coefficient on injured goes down (from 1.998 to 1.305), i.e. b4* > b4. This finding is consistent with Alison's opinion, but not with Steve's.
[omitted-variable-bias diagram: direct (causal) channel vs. indirect channel through the correlation with the omitted variable]
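Taking the coefficient from part i as the without-Age estimate, the size of the implied bias is simply the gap between the two estimates of the injured coefficient:
. display 1.998479 - 1.305    // implied (positive) bias ≈ 0.69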
orangia
► The regression is:
E[ratio] = β0 + β1·fairpr + β2·bidders + β3·rigged + β4·length + β5·fxcost + β6·days
Based on the regression in the table, give your best estimate and a 90 percent confidence interval of what will happen to the ratio of the actual price to the estimated cost if the number of days for a project decreases by 250, holding the other independent variables fixed.
i. We are asked to evaluate the change in ratio for a change in days with everything else held constant: ΔE[ratio] = β6·Δdays, where Δdays = -250. Thus the estimated change is
ΔEst.E[ratio] = b6·Δdays = 0.0002077·(-250) = -0.051925
We can obtain the 90% confidence interval for this change using klincom (alternatively, use the standard error together with the required t-value, as sketched below):
. klincom _b[days]*(-250), level(90)
       ratio |     Coef.   Std. Err.      t    P>|t|     [90% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) | -.0519238   .0304968    -1.70   0.091     -.102458   -.0013895
------------------------------------------------------------------------------
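The by-hand alternative mentioned above reproduces the klincom interval, using the reported standard error of the combination and the 126 residual degrees of freedom implied by the F(2,126) reported by testparm below:
. display -.0519238 - invttail(126, 0.05)*.0304968   // ≈ -0.1025
. display -.0519238 + invttail(126, 0.05)*.0304968   // ≈ -0.0014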
orangia (continued)
Can you claim at the 5 percent significance level that an increase in the number of bidders, holding the other independent variables fixed, would on average decrease the project's ratio of actual price to estimated cost?
ii. We are asked to evaluate the change in ratio for a change in bidders with everything else held constant: ΔE[ratio] = β2·Δbidders. The claim is that an increase in bidders results in a decrease in ratio, that is, we should test:
set hypotheses (the claim, β2 < 0, is placed in the null): H0: β2 ≤ 0 vs. Ha: β2 > 0
Running klincom _b[bidders] gives
( 1) bidders = 0
If Ha: < then Pr(T < t) = .031
If Ha: not = then Pr(|T| > |t|) = .063
If Ha: > then Pr(T > t) = .969
The relevant p-value, Pr(T > t) = 0.969, is far above 0.05, so we cannot reject the null: the claim that more bidders lower the ratio is consistent with the data.
orangia (continued)
Would it be legitimate to drop the variables FairPr and FxCost from the regression if you wanted to do so? If the answer is yes, write down the new estimated regression equation.
iii. We run the multicollinearity diagnostics vif and testparm for fairpr and fxcost:
. vif
    Variable |      VIF       1/VIF
-------------+----------------------
      fairpr |    54.77    0.018259
      fxcost |    42.73    0.023405
        days |     4.77    0.209548
      length |     1.42    0.705534
     bidders |     1.41    0.711154
      rigged |     1.31    0.761681
    Mean VIF |    17.73
. testparm fairpr fxcost
( 1) fairpr = 0
( 2) fxcost = 0
F( 2, 126) = 0.78
Prob > F = 0.4601
The vif output indicates badly inflated standard errors for fairpr and fxcost. Since the p-value of the F-test (0.4601) is higher than any reasonable significance level, we cannot reject the null that both coefficients are zero: the two variables are not jointly significant, so it is legitimate to drop both and re-run the regression without them.
orangia (continued)
► The new regression is:
E[ratio] = β0 + β1·bidders + β2·rigged + β3·length + β4·days
iii. (continued) The estimated regression is
Figure 4. Regression results
   ratio |     Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------+----------------------------------------------------------------
 bidders | -.0080003   .0039566    -2.02   0.045    -.0158291   -.0001715
  rigged |   .184931   .0247339     7.48   0.000     .1359908    .2338712
  length | -.0002443   .0021309    -0.11   0.909    -.0044606     .003972
    days |  .0000828   .0000589     1.40   0.163    -.0000338    .0001994
   _cons |  .9272769    .027766    33.40   0.000     .8723372    .9822166
. rvfplot
. hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of ratio
chi2(1) = 0.63
Prob > chi2 = 0.4260
Since the p-value is 0.426 we cannot reject the null of constant variance: there is no evidence of heteroskedasticity in the new regression.
orangia (continued)
► Develop a regression model to estimate and predict the winning bid (Price) on the final contract for the year, which has the following characteristics: the estimated cost is $1,000,000, of which $700,000 is due to fixed costs, and the four contractors interested in the project are expected not to rig the auction. Write down the estimated regression equation and explain how you came to choose it.
iv. The variables for which we have values are: fairpr = 1,000, fxcost = 700, bidders = 4, and rigged = 0. All of these variables are plausibly related to the winning bid (price), therefore we should initially run a regression of price against these four variables. We do not include any slope dummies, as requested.
E[price] = β0 + β1·fairpr + β2·fxcost + β3·bidders + β4·rigged
The estimated regression is
Figure 5. Regression results
   price |     Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------+----------------------------------------------------------------
  fairpr |  .8514863   .0771848    11.03   0.000     .6987629     1.00421
  fxcost |  .1059598   .1238826     0.86   0.394     -.139163    .3510827
 bidders |  -17.3139   9.677152    -1.79   0.076     -36.4618    1.833996
  rigged |  88.10779   59.66611     1.48   0.142     -29.9518    206.1674
   _cons |  93.96637   64.09852     1.47   0.145    -32.86351    220.7963
orangia (continued)
iv. (continued) We check for heteroskedasticity first:
. rvfplot
. hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of price
chi2(1) = 552.19
Prob > chi2 = 0.0000
The hettest results indicate that we reject the null of homoskedasticity: the regression is "tainted" by heteroskedasticity.
orangia (continued)
iv. (continued) To deal with the heteroskedasticity we try log specifications. Below, the first specification is linear-log and the second is log-linear. First we generate the log variables, with the exception of rigged, which is a dummy variable (a sketch of the generate commands follows after the output).
. regress price lnfairpr lnfxcost lnbidders rigged          linear-log specification
       price |     Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lnfairpr |  652.6868   139.2706     4.69   0.000     377.1162    928.2574
    lnfxcost |  181.3509   101.7388     1.78   0.077    -19.95669    382.6584
   lnbidders | -91.85952   185.6159    -0.49   0.622    -459.1323    275.4132
      rigged | -243.3638   209.8808    -1.16   0.248    -658.6488    171.9213
       _cons | -3635.201    537.064    -6.77   0.000    -4697.874   -2572.529
. regress lnprice fairpr fxcost bidders rigged              log-linear specification
     lnprice |     Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      fairpr |  .0017956    .000211     8.51   0.000     .0013782     .002213
      fxcost | -.0018892   .0003386    -5.58   0.000    -.0025591   -.0012193
     bidders |  .0598305   .0264482     2.26   0.025     .0074982    .1121629
      rigged |  .5460886    .163071     3.35   0.001     .2234247    .8687525
       _cons |  4.643232   .1751851    26.50   0.000     4.296599    4.989866
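The log variables used in these specifications would be created beforehand along these lines (names as they appear in the regression output; the same pattern applies later to length and days):
. generate lnprice = ln(price)
. generate lnfairpr = ln(fairpr)
. generate lnfxcost = ln(fxcost)
. generate lnbidders = ln(bidders)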
orangia (continued)
iv. (continued) How do we choose between the two (if either)? We check for curvature first:
linear-log specification: the rvfplot is "U"-shaped
log-linear specification: the rvfplot is "∩"-shaped
► The "U"-shaped rvfplot indicates that the y-variable has to be "logged"; thus the next step from a linear-log specification is the log-log specification.
► The "∩"-shaped rvfplot indicates that the x-variables have to be "logged"; thus the next step from a log-linear specification is the log-log specification.
orangia (continued)
iv. (continued) The log-log regression and its estimation are given below.
E[lnprice] = β0 + β1·lnfairpr + β2·lnfxcost + β3·lnbidders + β4·rigged
Figure 6. Regression results
   lnprice |     Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-----------+----------------------------------------------------------------
  lnfairpr |  .9845441   .0189004    52.09   0.000     .9471465    1.021942
  lnfxcost |  .0174492   .0138069     1.26   0.209    -.0098702    .0447686
 lnbidders | -.0480919   .0251899    -1.91   0.058    -.0979344    .0017505
    rigged |  .1790167   .0284829     6.29   0.000     .1226585    .2353749
     _cons | -.0269711   .0728848    -0.37   0.712    -.1711861    .1172439
. rvfplot
. hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of lnprice
chi2(1) = 0.09
Prob > chi2 = 0.7582
With a p-value of 0.7582 we cannot reject the null of constant variance: no heteroskedasticity issue for the log-log specification.
orangia (continued)
Predict the winning bid and provide an interval that will contain the winning bid with 95 percent confidence.
v. The x-values are: lnfairpr = ln(1000) = 6.907755, lnfxcost = ln(700) = 6.55108, lnbidders = ln(4) = 1.386294, rigged = 0. We are asked to estimate, and provide an interval for, the level of the winning bid on this one contract, so we use these values in the kpredint command:
. kpredint _b[_cons]+_b[lnfairpr]*6.9077+_b[lnfxcost]*6.5510+_b[lnbidders]*1.3862+_b[rigged]*0
Estimate: 6.8216599
Standard Error of Individual Prediction: .12481638
Individual Prediction Interval (95%): [6.5746894,7.0686304]
We exponentiate the above results (they are in logs of price) to find:
point estimate for price: exp(6.8216599) ≈ 917,507
lower bound: exp(6.5746894) ≈ 716,723
upper bound: exp(7.0686304) ≈ 1,174,538
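The exponentiation can be done directly in Stata; since fairpr = 1,000 corresponds to a $1,000,000 estimated cost, price appears to be measured in thousands of dollars, which is how exp(6.82166) ≈ 917.5 becomes roughly $917,507:
. display exp(6.8216599)    // ≈ 917.51
. display exp(6.5746894)    // ≈ 716.72
. display exp(7.0686304)    // ≈ 1174.54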
orangia (continued)
► ODOT has a road reconstruction project that is in the early planning phase. Just before putting the job up for auction, it learns that an additional pedestrian bridge will be necessary as part of the project. This change will not affect job duration or road length, but will increase fixed costs (FxCost) by 15 percent and overall estimated costs (FairPr) by 5 percent. Develop a regression model to estimate the percentage increase in the winning bid (the Price of the contract) that will ultimately result from the change in projected costs. What regression would you use to estimate the increase in Price? Write down the estimated regression equation and explain how you arrived at that regression.
vi. We are told how fairpr, fxcost, length and days will change (by 5%, 15%, 0% and 0%, respectively) and all are plausibly related to the winning price, therefore we must initially include at least these variables in our regression. Since we are in the pre-announcement (planning) phase, the number of bidders and whether the auction will be rigged are not under our control and might themselves react to the changes, so we must not include these variables in the initial regression.
There are four possible specifications:
Model             Dependent variable   Independent variables
standard linear   y                    x
log-linear        ln(y)                x
linear-log        y                    ln(x)
log-log           ln(y)                ln(x)
orangia (continued)
. regress price fairpr fxcost length days              linear-linear specification
       price |     Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      fairpr |  .5759849   .1048968     5.49   0.000     .3684287    .7835411
      fxcost |  .3944423   .1492031     2.64   0.009     .0992185     .689666
      length |  13.26696   5.850897     2.27   0.025     1.689957    24.84396
        days |  .9197984   .2924231     3.15   0.002     .3411893    1.498407
       _cons | -59.00671   43.61251    -1.35   0.178    -145.3015    27.28809
rvfplot: linear-linear specification
. hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of price
chi2(1) = 599.25
Prob > chi2 = 0.0000
We reject the null of constant variance: the linear-linear specification fails the heteroskedasticity test.
orangia (continued)
. regress price lnfairpr lnfxcost lnlength lndays       linear-log specification
       price |     Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lnfairpr |  731.5671   216.1821     3.38   0.001      303.814     1159.32
    lnfxcost | -70.77947   129.2247    -0.55   0.585    -326.4725    184.9136
    lnlength | -170.3642   89.99115    -1.89   0.061    -348.4271     7.69867
      lndays |  555.5358   272.7642     2.04   0.044     15.82517    1095.246
       _cons | -5698.615   796.5472    -7.15   0.000    -7274.719    -4122.51
rvfplot: linear-log specification
. hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of price
chi2(1) = 87.32
Prob > chi2 = 0.0000
We reject the null of constant variance: the linear-log specification also fails the heteroskedasticity test.
orangia (continued)
. regress lnprice fairpr fxcost length days             log-linear specification
     lnprice |     Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      fairpr |   .000834   .0002781     3.00   0.003     .0002837    .0013843
      fxcost |  -.000801   .0003956    -2.02   0.045    -.0015838   -.0000183
      length |  .0495787   .0155129     3.20   0.002     .0188838    .0802735
        days |  .0033391   .0007753     4.31   0.000      .001805    .0048732
       _cons |  4.712592   .1156327    40.75   0.000     4.483793    4.941391
rvfplot: log-linear specification
. hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of price
chi2(1) = 0.62
Prob > chi2 = 0.4314
We cannot reject the null of constant variance: the log-linear specification passes the heteroskedasticity test. However, the ∩-shaped rvfplot suggests logging the x-variables as well.
orangia (continued)
. regress lnprice lnfairpr lnfxcost lnlength lndays      log-log specification
     lnprice |     Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lnfairpr |  1.000758   .0380092    26.33   0.000     .9255507    1.075966
    lnfxcost |  .0225473   .0227203     0.99   0.323    -.0224087    .0675034
    lnlength | -.0008182   .0158223    -0.05   0.959    -.0321252    .0304889
      lndays | -.0674284   .0479575    -1.41   0.162    -.1623206    .0274638
       _cons |  .1588012   .1400493     1.13   0.259    -.1183103    .4359126
rvfplot: log-log specification
. hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of price
chi2(1) = 0.10
Prob > chi2 = 0.7572
We cannot reject the null of constant variance: the log-log specification passes the heteroskedasticity test, and the rvfplot suggests no curvature-related issues.
orangia (continued)
vi. (continued) The final check here is for multicollinearity between lnfairpr and lnfxcost.
. vif
    Variable |      VIF       1/VIF
-------------+----------------------
    lnfairpr |    14.56    0.068685
    lnfxcost |    10.32    0.096924
      lndays |     7.39    0.135314
    lnlength |     1.98    0.506191
    Mean VIF |     8.56
. testparm lnfairpr lnfxcost
( 1) lnfairpr = 0
( 2) lnfxcost = 0
F( 2, 128) = 695.42
Prob > F = 0.0000
The vif output shows inflated standard errors for lnfairpr and lnfxcost, but the testparm F-test strongly rejects the null that both coefficients are zero: the two variables are jointly significant, so we keep them in the regression.
orangia (continued)
Using this regression, what is your estimate for the percentage increase in the Price of this contract?
vii. The estimated regression is
. regress lnprice lnfairpr lnfxcost lnlength lndays      log-log specification
     lnprice |     Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lnfairpr |  1.000758   .0380092    26.33   0.000     .9255507    1.075966
    lnfxcost |  .0225473   .0227203     0.99   0.323    -.0224087    .0675034
    lnlength | -.0008182   .0158223    -0.05   0.959    -.0321252    .0304889
      lndays | -.0674284   .0479575    -1.41   0.162    -.1623206    .0274638
       _cons |  .1588012   .1400493     1.13   0.259    -.1183103    .4359126
We are given the percentage change in each of the four variables, and since we are using a log-log specification we can plug the percentage changes in the x-variables directly into the regression to find the corresponding percentage change in the y-variable. Multiplying the percentage changes by their respective coefficients gives the estimated percentage change in price:
1.0008·5 + 0.0225·15 + (-0.0008)·0 + (-0.0674)·0 ≈ 5.34
Thus price increases by about 5.34%.
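A quick check of the arithmetic with the unrounded coefficients from the log-log output:
. display 1.000758*5 + .0225473*15    // ≈ 5.34 percent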