
1 business analytics II ▌ applications
Managerial Economics & Decision Sciences Department
Developed for business analytics II, session ten
applications: cigarettes | car dealership | horse racing | orangia
© 2016 kellogg school of management | managerial economics and decision sciences department | business analytics II

2 learning objectives
► linear regression
• estimate the model, interpret coefficients
• statistical significance, p-values and confidence intervals
► confidence and prediction intervals
• klincom and kpredint commands: use and misuse
► dummy variables
• definition and interpretation of dummy and slope-dummy variables
• use of dummy and slope-dummy regressions in hypothesis testing
► pitfalls for linear regression
• omitted variable bias: identify the bias
• multicollinearity: test and correct
• spurious regression: identify
• heteroskedasticity: identify (test) and correct
• curvature: identify and correct
► non-linear models
• log specification: definition, estimation and interpretation
► panel data models
• assumptions, use and estimation of fixed effects models

3 cigarettes manufacturing
► For each store in a random sample of 100 stores, the following information was recorded:
• SALES     the number of packs sold in a year
• NICOTINE  the nicotine content of the cigarettes in milligrams per cigarette
• STORE     dummy variable that equals 0 for a convenience store and 1 for a supermarket
► A regression of the number of packs sold against nicotine content and store type yields the following output (the standard error of each coefficient is reported below the coefficient):
Est.E[SALES] = 2127 + 257·NICOTINE + 1137·STORE
                     (105.2)        (247.3)
Using this regression, estimate the change in sales of cigarette packs in a supermarket if the nicotine content of the cigarettes is reduced by 0.2 milligrams per cigarette.
i. Identify the change: ΔEst.E[SALES] = b1·ΔNICOTINE, where b1 = 257 and ΔNICOTINE = −0.2, thus ΔEst.E[SALES] = 257·(−0.2) = −51.4
Remark: a change in level is always related to the change in one or several "x-variables" and their slopes.
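The arithmetic in step i. can be double-checked outside Stata; a minimal Python sketch using only the coefficient reported on the slide:

```python
# Change in expected sales from a change in NICOTINE, holding STORE fixed:
# delta E[SALES] = b1 * delta NICOTINE
b1 = 257.0             # estimated NICOTINE coefficient from the slide
delta_nicotine = -0.2  # nicotine reduced by 0.2 mg per cigarette

delta_sales = b1 * delta_nicotine
print(delta_sales)     # about -51.4 packs per year
```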

4 cigarettes manufacturing
► A regression of the number of packs sold against nicotine content and store type yields (standard errors below the coefficients):
Est.E[SALES] = 2127 + 257·NICOTINE + 1137·STORE
                     (105.2)        (247.3)
Provide a 95% interval that contains the true change in sales given this reduction in nicotine content.
ii. The general form of an interval with confidence level 1 − α is
Estimate − Std.Error[Estimate]·t(df,α/2) ≤ True Value ≤ Estimate + Std.Error[Estimate]·t(df,α/2)
Since Estimate = b1·ΔNICOTINE, the above interval can be based on the interval for β1 multiplied by ΔNICOTINE. The interval for β1 is simply
b1 − Std.Error[b1]·t(df,α/2) ≤ β1 ≤ b1 + Std.Error[b1]·t(df,α/2)
where b1 = 257, Std.Error[b1] = 105.2 and t(df,α/2) = invttail(97, 0.025) ≈ 1.9847.
The interval for β1 is thus [257 − 1.9847·105.2, 257 + 1.9847·105.2] ≈ [48.2, 465.8]. Multiplying by ΔNICOTINE = −0.2 flips the endpoints, so the interval for β1·ΔNICOTINE is [−93.16, −9.642].
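The same interval can be reproduced in Python; the critical value below is the approximate equivalent of Stata's invttail(97, 0.025):

```python
b1, se_b1 = 257.0, 105.2   # NICOTINE coefficient and its standard error (slide values)
t_crit = 1.9847            # approx. invttail(97, 0.025); df = 100 - 3 = 97
half_width = t_crit * se_b1

lo, hi = b1 - half_width, b1 + half_width   # 95% CI for beta1: about [48.2, 465.8]
delta = -0.2                                # change in NICOTINE
# multiplying an interval by a negative number flips its endpoints
ci_change = (hi * delta, lo * delta)        # about (-93.16, -9.642)
print(ci_change)
```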

5 cigarettes manufacturing
► A regression of the number of packs sold against nicotine content and store type yields (standard errors below the coefficients):
Est.E[SALES] = 2127 + 257·NICOTINE + 1137·STORE
                     (105.2)        (247.3)
► A second regression is reported:
Est.E[SALES] = 2739 + 335·NICOTINE
                     (137.7)
The coefficient for NICOTINE in the first regression is lower than in the second regression. Why is that the case? What does this imply about the types of cigarettes that are sold at convenience stores as compared to supermarkets?
iii. The observed difference in estimated coefficients is most likely a result of omitted variable bias, where the omitted variable (in the second regression) is STORE: b1 < b1*. The overestimation means b2·a1 > 0, and since we already know that b2 > 0 (the coefficient on STORE is 1137) it must be the case that a1 > 0, where a1 is the slope in the auxiliary relation between the omitted STORE and the included NICOTINE.
[diagram: direct causal channel vs. indirect correlation channel, truncated once STORE is included]

6 cigarettes manufacturing
► First regression: Est.E[SALES] = 2127 + 257·NICOTINE + 1137·STORE; second regression: Est.E[SALES] = 2739 + 335·NICOTINE.
What does this imply about the types of cigarettes that are sold at convenience stores as compared to supermarkets?
iii. Having a1 > 0 means that STORE and NICOTINE are positively related:
• a high level of STORE (i.e., STORE = 1) is likely associated with high levels of NICOTINE
• a low level of STORE (i.e., STORE = 0) is likely associated with low levels of NICOTINE
Thus supermarkets (STORE = 1) are likely to sell cigarettes with a higher NICOTINE level than convenience stores (STORE = 0) do.
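The direction of the bias can be illustrated with a tiny, noise-free toy data set (hypothetical numbers, not the course data): when STORE is omitted and the auxiliary slope a1 is positive, the short-regression slope on NICOTINE equals b1 + b2·a1 exactly, hence b1* > b1.

```python
# hypothetical, noise-free illustration of omitted variable bias
store    = [0, 0, 1, 1]                 # 0 = convenience store, 1 = supermarket
nicotine = [1.0, 1.2, 1.5, 1.7]         # a1 > 0: higher where STORE = 1
sales    = [2127 + 257 * n + 1137 * s for n, s in zip(nicotine, store)]

def slope(x, y):
    """OLS slope of y on a single regressor x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

b1_short = slope(nicotine, sales)       # short regression: STORE omitted
a1 = slope(nicotine, store)             # auxiliary regression: STORE on NICOTINE
print(b1_short, 257 + 1137 * a1)        # identical: bias = b2 * a1 > 0, so b1* > 257
```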

7 car dealership
► You have collected data from a random sample of 62 past transactions (auto.dta), which contain the following variables:
• GENDER   gender of the buyer, equal to 1 if male and 0 if female
• INCOME   yearly income of the buyer in $
• AGE      age of the buyer in years
• COLLEGE  a dummy variable equal to 1 if the buyer is a college graduate and 0 otherwise
• PRICE    the price of the car in $
Run a regression of price on the remaining 4 variables. Report the estimated regression equation. Do not drop any variables from the regression.
i. The regression is
Est.E[price] = 2,… + 1,444.20·gender + …·income + 15.59·age + 2,080.86·college
Figure 1. Regression results (coefficient table for gender, income, age, college and the constant; values not recoverable from the transcript)

8 car dealership
Can you prove at a 10% significance level that the average price of cars bought by 30-year-old male college graduates with income of $90,000 is higher than $20,000?
ii. We are asked to evaluate whether the level of the selling price is greater than a certain level (20,000). We base the hypothesis on
E[price] = β0 + β1·gender + β2·income + β3·age + β4·college
with gender = 1 (male), income = 90,000, age = 30 and college = 1 (college graduate):
H0: β0 + β1·1 + β2·90,000 + β3·30 + β4·1 ≤ 20,000
Ha: β0 + β1·1 + β2·90,000 + β3·30 + β4·1 > 20,000
We test a combination of coefficients using either klincom or kpredint; here we deal with the average across such buyers, so we use klincom:
. klincom _b[_cons]+_b[gender]*1+_b[income]*90000+_b[age]*30+_b[college]*
If Ha: < then Pr(T < t) = .949
If Ha: not = then Pr(|T| > |t|) = .102
If Ha: > then Pr(T > t) = .051
The relevant one-sided p-value is Pr(T > t) = .051 < .10, so at the 10% significance level we reject the null: the average price for such buyers is higher than $20,000.
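klincom reports all three alternatives at once; their relationship for a symmetric t distribution can be sketched in Python (the .102 two-sided value is from the slide, the positive sign of t is inferred from the reported one-sided values):

```python
def one_sided_p(two_sided_p, t_stat, direction):
    """Convert a two-sided p-value into a one-sided one, given the sign of t.
    direction '>' tests Ha: combination > 0; '<' tests Ha: combination < 0."""
    half = two_sided_p / 2
    if (direction == '>') == (t_stat > 0):
        return half          # t lies on the side the alternative points to
    return 1 - half

# slide values: two-sided p = .102 with a positive t statistic
print(one_sided_p(0.102, 2.0, '>'))  # 0.051, as klincom reports for Ha: >
print(one_sided_p(0.102, 2.0, '<'))  # 0.949, as klincom reports for Ha: <
```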

9 car dealership
Jane is a woman, 45 years old, has a college degree and earns $80,000. Provide a range of values that you are 95% confident will contain the price of Jane's next car.
iii. We are asked to provide an interval for the level of the selling price for one individual with certain characteristics. The interval has the form:
Est.E[price] − Std.Err.[price]·t(df,α/2) ≤ price ≤ Est.E[price] + Std.Err.[price]·t(df,α/2)
Since we are asked for an interval for the level of the dependent variable we use either klincom or kpredint. Here the question is about the price for one individual, thus we use kpredint. We are given gender = 0 (female), income = 80,000, age = 45 and college = 1 (college graduate), thus the interval is built around
E[price] = β0 + β1·0 + β2·80,000 + β3·45 + β4·1
. kpredint _b[_cons]+_b[gender]*0+_b[income]*80000+_b[age]*45+_b[college]*1
Estimate: …
Standard Error of Individual Prediction: …
Individual Prediction Interval (95%): [ , ]
If Ha: < then Pr(T < t) = 1
If Ha: not = then Pr(|T| > |t|) = 0
If Ha: > then Pr(T > t) = 0
→ the prediction interval is the answer (the numeric output is not recoverable from the transcript)
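Why kpredint rather than klincom: an individual prediction adds the residual variance to the variance of the estimated mean, so its interval is wider. A sketch of that arithmetic with purely hypothetical numbers (the slide's actual estimates are not shown):

```python
import math

# hypothetical values for illustration only -- not the slide's output
est     = 22_000.0   # Est.E[price] at Jane's characteristics
se_mean = 600.0      # standard error of the estimated mean (what klincom uses)
s_resid = 2_500.0    # residual standard error of the regression
t_crit  = 2.003      # approx. 95% t critical value, df = 62 - 5 = 57

# individual prediction: variance of the mean plus residual variance
se_pred = math.sqrt(se_mean**2 + s_resid**2)
pred_interval = (est - t_crit * se_pred, est + t_crit * se_pred)
print(se_pred, pred_interval)  # se_pred exceeds both components
```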

10 car dealership
How would you modify the regression to allow you to test the following claims: "Women buy the same cars, i.e. cars with the same price, regardless of income level, while men tend to buy more expensive cars the higher income they have"? Run the new regression and report the new estimated regression equation.
iv. Here the interaction between gender and income is fairly transparent, thus a slope dummy variable defined as genderincome = gender·income will help test the claims above. The regression becomes
E[price] = β0 + β1·gender + β2·income + β3·age + β4·college + β5·genderincome
and the estimated regression is shown below.
Figure 2. Regression results
. generate genderincome=gender*income
. regress price gender income genderincome age college
(coefficient table for gender, income, genderincome, age, college and the constant; values not recoverable from the transcript)

11 car dealership
v. Based on the regression
E[price] = β0 + β1·gender + β2·income + β3·age + β4·college + β5·genderincome
we can now test claims that relate the selling price to the level of income for different genders. In particular we can test claims that relate the change in selling price to the change in income:
ΔE[price] = β2·Δincome + β5·Δgenderincome
• The first claim is that income has no impact on selling price for women (gender = 0). For gender = 0 we get ΔE[price] = β2·Δincome, thus "income has no impact on selling price for women" means testing:
H0: β2 = 0
Ha: β2 ≠ 0
We read the p-value for income directly from the regression table; if it falls below the significance level we reject the null, i.e. we reject the claim that income does not matter for women.

12 car dealership
vi. Based on the same regression we can test the second claim: men (gender = 1) tend to buy more expensive cars the higher their income. For gender = 1 we get
ΔE[price] = β2·Δincome + β5·Δincome = (β2 + β5)·Δincome
thus the claim means testing:
H0: β2 + β5 ≤ 0
Ha: β2 + β5 > 0
We need to run klincom in order to test this hypothesis:
. klincom _b[income]*1 + _b[genderincome]*1
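What klincom computes for _b[income] + _b[genderincome] can be sketched by hand: the standard error of a sum of two estimates requires their covariance, not just their individual variances. A generic helper with toy numbers (not the course output):

```python
import math

def lincom(b, V, w):
    """Point estimate and standard error of the weighted sum w'b,
    given a coefficient vector b and its covariance matrix V (as lists)."""
    est = sum(wi * bi for wi, bi in zip(w, b))
    k = len(w)
    var = sum(w[i] * w[j] * V[i][j] for i in range(k) for j in range(k))
    return est, math.sqrt(var)

# toy example: b2 (income) and b5 (genderincome) with their 2x2 covariance block
b = [0.05, 0.08]
V = [[0.0004, -0.0001],
     [-0.0001, 0.0009]]
est, se = lincom(b, V, [1.0, 1.0])
print(est, se)  # se = sqrt(0.0004 + 0.0009 + 2*(-0.0001)) = sqrt(0.0011)
```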

13 horse racing
► The regression is
Est.E[lnodds] = b0 + b1·distance + b2·starters + b3·last + b4·injured + b5·novice + b6·ratio + b7·noviceratio
Based on Steve's regression, provide a point estimate for the difference in lnodds for two horses that are identical except that one just suffered a minor injury whereas the other did not, assuming the two horses are competing in the same race.
i. We are considering two horses identical in all respects except injury status:
no injury: Est.E[lnodds] = b0 + b1·distance + b2·starters + b3·last + b4·0 + b5·novice + b6·ratio + b7·noviceratio
injury:    Est.E[lnodds] = b0 + b1·distance + b2·starters + b3·last + b4·1 + b5·novice + b6·ratio + b7·noviceratio
Thus Δlnodds for these two horses is simply b4 (the difference between the two equations). Notice that the horses are otherwise identical, so the values of all other variables are the same for both horses and they cancel out when taking the difference between the two equations above.

14 horse racing
► The regression is
Est.E[lnodds] = b0 + b1·distance + b2·starters + b3·last + b4·injured + b5·novice + b6·ratio + b7·noviceratio
A horse is participating in a race against seven other horses. Based on Steve's regression, all other factors in the regression held fixed, how would the odds on that horse be affected if two additional horses joined the race?
ii. We are considering two races: the initial race has 8 horses (starters) while the second has 10. Thus
8 horses:  Est.E[lnodds] = b0 + b1·distance + b2·8 + b3·last + b4·injured + b5·novice + b6·ratio + b7·noviceratio
10 horses: Est.E[lnodds] = b0 + b1·distance + b2·10 + b3·last + b4·injured + b5·novice + b6·ratio + b7·noviceratio
Thus Δlnodds between the two races is simply Δlnodds = 2·b2. Since lnodds changes by 2·b2, the odds themselves change by the factor exp(2·b2), i.e. by exp(2·b2) − 1 ≈ 10.93% when two horses are added. Notice that the two equations above are for the same horse, so all its characteristics are identical and they cancel out when taking the difference between the two equations above.
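Converting the log change into a percentage change is exp(Δln) − 1; a small Python sketch (the coefficient b2 below is a hypothetical value chosen to reproduce the slide's 10.93%, since the estimate itself is not shown in the transcript):

```python
import math

def pct_change_from_log(delta_ln):
    """Exact percent change implied by a change in ln(odds)."""
    return math.exp(delta_ln) - 1.0

b2 = 0.0519                           # hypothetical starters coefficient
print(pct_change_from_log(2 * b2))    # about 0.109, i.e. odds rise about 10.9%
```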

15 horse racing
► The regression is
Est.E[lnodds] = b0 + b1·distance + b2·starters + b3·last + b4·injured + b5·novice + b6·ratio + b7·noviceratio
Steve claims that for novice horses the ratio of past wins is irrelevant for the horse's odds, all other factors in the regression held fixed. Can you prove him wrong using a 10% level of significance?
iii. We are considering novice horses (for which novice = 1) and we are interested in whether a change in ratio has any effect on lnodds. For novice = 1:
Est.E[lnodds] = b0 + b1·distance + b2·starters + b3·last + b4·injured + b5·1 + b6·ratio + b7·ratio
thus ΔEst.E[lnodds] = b6·Δratio + b7·Δratio = (b6 + b7)·Δratio
Steve's claim basically requires a test of the following:
H0: β6 + β7 = 0
Ha: β6 + β7 ≠ 0
The command klincom _b[ratio] + _b[noviceratio] provides the following:
If Ha: < then Pr(T < t) = .053
If Ha: not = then Pr(|T| > |t|) = .106
If Ha: > then Pr(T > t) = .947
The two-sided p-value is .106 > .10, so we cannot reject the null: Steve cannot be proven wrong at the 10% level.

16 horse racing
► The regression is
Est.E[lnodds] = b0 + b1·distance + b2·starters + b3·last + b4·injured + b5·novice + b6·ratio + b7·noviceratio
Steve claims that, all else in the regression held fixed, horses that are classified as sprinters have their probability of winning reduced, i.e. have their odds increase, as a race gets longer. What would you add to the regression in part i. to allow you to evaluate this claim?
iv. We are clearly looking at an interaction between being a sprinter and the length of the race, thus a slope dummy capturing this interaction is required:
sprinterdistance = sprinter·distance
The regression becomes (we need to include the dummy sprinter as well):
Est.E[lnodds] = b0 + b1·distance + b2·starters + b3·last + b4·injured + b5·novice + b6·ratio + b7·noviceratio + b8·sprinter + b9·sprinterdistance

17 horse racing
Steve claims that, all else in the regression held fixed, horses that are classified as sprinters have their probability of winning reduced, i.e. have their odds increase, as a race gets longer. Carry out the modification you suggest in part iv. and write down the new estimated regression equation.
v. The estimated regression is
Figure 3. Regression results
. generate sprinterdistance = sprinter*distance
. regress lnodds distance starters last injured novice ratio noviceratio sprinter sprinterdistance
(coefficient table for distance, starters, last, injured, novice, ratio, noviceratio, sprinter, sprinterdistance and the constant; values not recoverable from the transcript)

18 horse racing
► The regression is
Est.E[lnodds] = b0 + b1·distance + b2·starters + b3·last + b4·injured + b5·novice + b6·ratio + b7·noviceratio + b8·sprinter + b9·sprinterdistance
Steve claims that, all else in the regression held fixed, sprinters have their odds increase as a race gets longer. In terms of your new regression model, what must be true in order for Steve's claim to be correct? Test the claim.
vi. We need to evaluate the change in lnodds for sprinters as we change distance.
sprinters (sprinter = 1): Est.E[lnodds] = b0 + b1·distance + b2·starters + b3·last + b4·injured + b5·novice + b6·ratio + b7·noviceratio + b8·1 + b9·distance
The change in lnodds for a change in distance is thus:
ΔEst.E[lnodds]|sprinter = b1·Δdistance + b9·Δdistance = (b1 + b9)·Δdistance
Steve's claim is that ΔEst.E[lnodds]|sprinter > 0 for Δdistance > 0. Using the expression above, the claim is really about β1 + β9 > 0.

19 horse racing
vii. The test is about:
H0: β1 + β9 ≤ 0
Ha: β1 + β9 > 0
The command klincom _b[distance] + _b[sprinterdistance] provides the following:
If Ha: < then Pr(T < t) = 1
If Ha: not = then Pr(|T| > |t|) = 0
If Ha: > then Pr(T > t) = 0
The relevant one-sided p-value is Pr(T > t) = 0, so we reject the null at any conventional significance level: the data support Steve's claim.

20 horse racing
Claim: "Older horses would probably be more prone to injury, and older horses are also less likely to win, i.e. have higher odds." Steve counters that older horses are in fact less likely to get injured (young horses are less disciplined and get minor injuries all the time), but agrees that older horses are less likely to win, all else equal. He reruns the regression with Age in it in addition to all the original variables. In this new regression, the estimated coefficient on injured is 1.305.
viii. When Age is omitted from the regression, the coefficient on injured carries an omitted variable bias. The sign of the ovb equals the product of
(a) the sign of the relation between Age and lnodds – this is b10 below
(b) the sign of the relation between injured and Age – this is a2 below
• Alison and Steve agree that Age and odds are positively related, thus b10 > 0.
• Alison believes that injured and Age are also positively related, thus a2 > 0; the ovb according to her must therefore be positive, which corresponds to overestimation: b4* > b4.
• Steve says that injured and Age are negatively related, thus a2 < 0; he therefore expects a negative ovb, which corresponds to underestimation: b4* < b4.
• When Steve reruns the regression with Age included among the regressors, the coefficient on injured goes down to 1.305, i.e. b4* > b4. This finding is consistent with Alison's opinion, but not with Steve's.
[diagram: direct causal channel vs. truncated indirect correlation channel]

21 orangia
► The regression is:
E[ratio] = β0 + β1·fairpr + β2·bidders + β3·rigged + β4·length + β5·fxcost + β6·days
Based on the regression in the table, give your best estimate and a 90 percent confidence interval of what will happen to the ratio of the actual price to the estimated cost if the number of days for a project decreases by 250, holding the other independent variables fixed.
i. We are asked to evaluate the change in ratio for a change in days with everything else held constant:
ΔE[ratio] = β6·Δdays, where Δdays = −250. Thus the estimated change is:
ΔEst.E[ratio] = b6·Δdays = 0.0002077·(−250) = −0.051925
We can obtain the 90% confidence interval for this change using klincom (alternatively, use the standard error of the coefficient on days and the required t-value):
. klincom _b[days]*(-250), level(90)
(the reported estimate and 90% confidence interval are not recoverable from the transcript)
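The point estimate in step i. is a one-line check in Python, using the coefficient reported on the slide:

```python
b6 = 0.0002077          # estimated coefficient on days (slide value)
delta_days = -250       # the project is shortened by 250 days

delta_ratio = b6 * delta_days
print(delta_ratio)      # about -0.051925
```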

22 orangia
► The regression is:
E[ratio] = β0 + β1·fairpr + β2·bidders + β3·rigged + β4·length + β5·fxcost + β6·days
Can you claim at the 5 percent significance level that an increase in the number of bidders, holding the other independent variables fixed, would on average decrease the project's ratio of actual price to estimated cost?
ii. We are asked to evaluate the change in ratio for a change in bidders with everything else held constant:
ΔE[ratio] = β2·Δbidders
The claim is that an increase in bidders results in a decrease in ratio, that is, we should test:
H0: β2 ≥ 0
Ha: β2 < 0
Running klincom _b[bidders] gives
( 1) bidders = 0
If Ha: < then Pr(T < t) = .031
If Ha: not = then Pr(|T| > |t|) = .063
If Ha: > then Pr(T > t) = .969
The relevant one-sided p-value is Pr(T < t) = .031 < .05, so we reject the null at the 5% level: yes, we can make the claim.

23 orangia
► The regression is:
E[ratio] = β0 + β1·fairpr + β2·bidders + β3·rigged + β4·length + β5·fxcost + β6·days
Would it be legitimate to drop the variables FairPr and FxCost from the regression if you wanted to do so? If the answer is yes, write down the new estimated regression equation.
iii. We run the multicollinearity diagnostics vif and testparm for fairpr and fxcost:
. vif
Variable | VIF  1/VIF
(values for fairpr, fxcost, days, length, bidders, rigged and the mean VIF not recoverable from the transcript)
. testparm fairpr fxcost
( 1) fairpr = 0
( 2) fxcost = 0
F( 2, 126) =    Prob > F =
The vif indicates inflated standard errors for fairpr and fxcost. Since the p-value of the F-test is higher than any reasonable significance level, we cannot reject the joint null hypothesis that the two coefficients are both zero, i.e. the two variables are jointly insignificant. It is therefore legitimate to drop both variables and re-run the regression without them.
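vif reports, for each regressor, 1/(1 − R²_j) from regressing that variable on the remaining ones; a minimal pure-Python sketch for the two-regressor case (toy numbers, not the orangia data set):

```python
def r_squared(x, y):
    """R^2 from a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

# toy regressors: fairpr-like and fxcost-like series that move almost in lockstep
fairpr = [100, 120, 140, 160, 180]
fxcost = [71, 84, 97, 113, 124]

vif = 1.0 / (1.0 - r_squared(fxcost, fairpr))
print(vif)  # large VIF: the two regressors are nearly collinear
```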

24 orangia
► The new regression is:
E[ratio] = β0 + β1·bidders + β2·rigged + β3·length + β4·days
iii. The estimated regression is
Figure 4. Regression results (coefficient table for bidders, rigged, length, days and the constant; values not recoverable from the transcript)
. rvfplot
. hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of ratio
chi2(1) =    Prob > chi2 =
cannot reject the null: no evidence of heteroskedasticity in the new regression

25 orangia
► Develop a regression model to estimate and predict the winning bid (Price) on the final contract for the year, which has the following characteristics: the estimated cost is $1,000,000, of which $700,000 is due to fixed costs, and the four contractors interested in the project are expected not to rig the auction. Write down the estimated regression equation and explain how you came to choose it.
iv. The variables for which we have values are: fairpr = 1,000, fxcost = 700, bidders = 4 and rigged = 0. All of these variables are plausibly related to the winning bid (price). Therefore, we should initially run a regression of price against these four variables. We do not include any slope dummies (as was requested).
E[price] = β0 + β1·fairpr + β2·fxcost + β3·bidders + β4·rigged
The estimated regression is
Figure 5. Regression results (coefficient table for fairpr, fxcost, bidders, rigged and the constant; values not recoverable from the transcript)

26 orangia
iv. We check for heteroskedasticity first:
. rvfplot
. hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of price
chi2(1) =    Prob > chi2 =
reject the null
The hettest results indicate that we reject the null of homoskedasticity. Thus the regression is "tainted" by heteroskedasticity.
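One common form of the test statistic behind hettest (the studentized, Koenker variant) is n·R² from regressing the squared residuals on the fitted values, compared to a chi-squared(1) critical value of about 3.84 at 5%. A toy Python sketch under that assumption (hypothetical residuals, not the slide's output):

```python
def r_squared(x, y):
    """R^2 from a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

# toy fitted values and residuals whose spread grows with the fitted value
fitted = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
resid  = [0.1, -0.3, 0.5, -0.8, 1.1, -1.4]

e2 = [e * e for e in resid]                  # squared residuals
lm = len(fitted) * r_squared(fitted, e2)     # LM = n * R^2
print(lm)  # above 3.84 here: reject homoskedasticity for these toy data
```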

27 iv. To deal with heteroskedasticity we try log specifications.

iv. To deal with heteroskedasticity we try log specifications: the first below is the linear-log and the second the log-linear. First we generate the log variables, with the exception of rigged, which is a dummy variable.

. regress price lnfairpr lnfxcost lnbidders rigged          linear-log specification
price     | Coef.   Std. Err.   t   P>|t|   [95% Conf. Interval]
lnfairpr  |
lnfxcost  |
lnbidders |
rigged    |
_cons     |

. regress lnprice fairpr fxcost bidders rigged              log-linear specification
lnprice | Coef.   Std. Err.   t   P>|t|   [95% Conf. Interval]
fairpr  |
fxcost  |
bidders |
rigged  |
_cons   |
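In Stata the log variables come from commands like `generate lnprice = ln(price)`. The same step in Python (hypothetical sample values, since the data set is not reproduced here), leaving the dummy rigged untouched:

```python
import math

# hypothetical sample values ($000s); the actual course data are not shown
price   = [850.0, 1200.0, 640.0]
fairpr  = [900.0, 1150.0, 700.0]
fxcost  = [600.0, 800.0, 450.0]
bidders = [4, 6, 3]
rigged  = [0, 1, 0]   # dummy variable: left un-logged

lnprice   = [math.log(v) for v in price]
lnfairpr  = [math.log(v) for v in fairpr]
lnfxcost  = [math.log(v) for v in fxcost]
lnbidders = [math.log(v) for v in bidders]
```

Logging a 0/1 dummy would be meaningless (ln(0) is undefined), which is why rigged enters every specification in levels.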

28 iv. How do we choose between the two (if either)? We check for curvature first.

iv. How do we choose between the two (if either)? We check for curvature first:

linear-log specification: “U”-shaped rvfplot
log-linear specification: “∩”-shaped rvfplot

► The “U”-shaped rvfplot indicates that the y-variable has to be “logged”; thus the next step from a linear-log specification is the log-log specification.
► The “∩”-shaped rvfplot indicates that the x-variable has to be “logged”; thus the next step from a log-linear specification is the log-log specification.
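The U/∩ reading of an rvfplot is done by eye. As a rough numeric stand-in (a sketch, not a formal test), compare the middle of the residual-versus-fitted band with its ends:

```python
def rvf_shape(fitted, residuals):
    """Crude proxy for eyeballing an rvfplot.

    Sort residuals by fitted value, then compare the middle third's mean
    residual with the outer thirds'. Middle below outer suggests a "U"
    (log the y-variable); middle above outer suggests a "∩" (log the
    x-variables).
    """
    pairs = sorted(zip(fitted, residuals))
    res = [r for _, r in pairs]
    n = len(res)
    third = n // 3
    outer = res[:third] + res[-third:]
    middle = res[third:n - third]
    mid_mean = sum(middle) / len(middle)
    out_mean = sum(outer) / len(outer)
    return "U" if mid_mean < out_mean else "∩"

print(rvf_shape([1, 2, 3, 4, 5, 6], [2, 0, -2, -2, 0, 2]))  # U
print(rvf_shape([1, 2, 3, 4, 5, 6], [-2, 0, 2, 2, 0, -2]))  # ∩
```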

29 E[lnprice] = β0 + β1·lnfairpr + β2·lnfxcost + β3·lnbidders + β4·rigged
iv. The log-log regression and its estimation are given below.

E[lnprice] = β0 + β1·lnfairpr + β2·lnfxcost + β3·lnbidders + β4·rigged

Figure 6. Regression results
lnprice   | Coef.   Std. Err.   t   P>|t|   [95% Conf. Interval]
lnfairpr  |
lnfxcost  |
lnbidders |
rigged    |
_cons     |

rvfplot

. hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of lnprice
chi2(1)     =
Prob > chi2 =           → cannot reject the null

30 v. We exponentiate the kpredint results to express the prediction in dollars.
orangia ► Predict the winning bid and provide an interval that will contain the winning bid with 95 percent confidence.

v. The x-variables are: lnfairpr = ln(1000) ≈ 6.9078, lnfxcost = ln(700) ≈ 6.5511, lnbidders = ln(4) ≈ 1.3863, and rigged = 0. We are asked to estimate, and provide an interval for, the level of the winning bid, so we use these values in the kpredint command:

. kpredint _b[_cons] + _b[lnfairpr]*6.9078 + _b[lnfxcost]*6.5511 + _b[lnbidders]*1.3863 + _b[rigged]*0
Estimate:
Standard Error of Individual Prediction:
Individual Prediction Interval (95%): [ , ]

We exponentiate the above results to find:
estimate for price:        exp(estimate)     ≈ 917,507
estimate for lower bound:  exp(lower bound)  ≈ 716,723
estimate for upper bound:  exp(upper bound)  ≈ 1,174,538
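The arithmetic behind the plug-in values and the exponentiation, in Python. The log-scale point estimate and bounds are recovered here from the slide's dollar figures, since the kpredint output itself is not legible in this transcript:

```python
import math

# plug-in values for the final contract (fairpr and fxcost in $000s)
lnfairpr, lnfxcost, lnbidders = math.log(1000), math.log(700), math.log(4)
print(round(lnfairpr, 4), round(lnfxcost, 4), round(lnbidders, 4))
# 6.9078 6.5511 1.3863

# kpredint reports its estimate and 95% interval on the log scale;
# exponentiating maps them back to dollars. The log-scale values below are
# recovered from the slide's exponentiated dollar figures.
point = math.log(917_507)
lo    = math.log(716_723)
hi    = math.log(1_174_538)
print(round(math.exp(point)), round(math.exp(lo)), round(math.exp(hi)))
# 917507 716723 1174538
```

Note that the interval is a prediction interval for an individual winning bid, not a confidence interval for its mean, which is why kpredint (rather than klincom) is used.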

31 vi. There are four possible specifications.
orangia ► ODOT has a road reconstruction project that is in the early planning phase. Just before putting the job up for auction, it learns that an additional pedestrian bridge will be necessary as part of the project. This change will not affect job duration or road length, but will increase fixed costs (FxCost) by 15 percent and overall estimated costs (FairPr) by 5 percent. Develop a regression model to estimate the percentage increase in the winning bid (the Price of the contract) that will ultimately result from the change in projected costs. What regression would you use to estimate the increase in Price? Write down the estimated regression equation and explain how you arrived at that regression.

vi. We are told how fairpr, fxcost, length and days will change (by 5%, 15%, 0%, and 0%, respectively), and all are plausibly related to the winning price; therefore we must initially include at least these variables in our regression. Since we are in the pre-announcement (planning) phase, the number of bidders and whether the auction will be rigged are not under our control and might react to the changes, so we must not include these variables in the initial regression.

There are four possible specifications:

Model            Dependent variable   Independent variable
standard linear  y                    x
log-linear       ln(y)                x
linear-log       y                    ln(x)
log-log          ln(y)                ln(x)
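The table above amounts to choosing which side(s) of the regression get log-transformed before estimation. A compact sketch (Python, toy values):

```python
import math

# toy values standing in for the actual price and fairpr columns
y = [2.0, 4.0, 8.0]
x = [1.0, 2.0, 3.0]

# the four specifications differ only in which side is logged
specs = {
    "standard linear": (y, x),
    "log-linear":      ([math.log(v) for v in y], x),
    "linear-log":      (y, [math.log(v) for v in x]),
    "log-log":         ([math.log(v) for v in y], [math.log(v) for v in x]),
}

for name, (dep, indep) in specs.items():
    print(name, dep[0], indep[0])
```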

32 The linear-linear specification fails the heteroskedasticity tests.
. regress price fairpr fxcost length days                   linear-linear specification
price  | Coef.   Std. Err.   t   P>|t|   [95% Conf. Interval]
fairpr |
fxcost |
length |
days   |
_cons  |

rvfplot: linear-linear specification

. hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of price
chi2(1)     =
Prob > chi2 =           → reject the null

The linear-linear specification fails the heteroskedasticity test.

33 The linear-log specification fails the heteroskedasticity tests.
. regress price lnfairpr lnfxcost lnlength lndays           linear-log specification
price     | Coef.   Std. Err.   t   P>|t|   [95% Conf. Interval]
lnfairpr  |
lnfxcost  |
lnlength  |
lndays    |
_cons     |

rvfplot: linear-log specification

. hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of price
chi2(1)     =
Prob > chi2 =           → reject the null

The linear-log specification fails the heteroskedasticity test.

34 The log-linear specification passes the heteroskedasticity test but its rvfplot shows curvature.

. regress lnprice fairpr fxcost length days                 log-linear specification
lnprice | Coef.   Std. Err.   t   P>|t|   [95% Conf. Interval]
fairpr  |
fxcost  |
length  |
days    |
_cons   |

rvfplot: log-linear specification

. hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of lnprice
chi2(1)     =
Prob > chi2 =           → cannot reject the null

The log-linear specification passes the heteroskedasticity test. However, the ∩-shaped rvfplot suggests logging the x-variables.

35 The log-log specification passes the heteroskedasticity test with no curvature issues.

. regress lnprice lnfairpr lnfxcost lnlength lndays         log-log specification
lnprice   | Coef.   Std. Err.   t   P>|t|   [95% Conf. Interval]
lnfairpr  |
lnfxcost  |
lnlength  |
lndays    |
_cons     |

rvfplot: log-log specification

. hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of lnprice
chi2(1)     =
Prob > chi2 =           → cannot reject the null

The log-log specification passes the heteroskedasticity test, and the rvfplot suggests no curvature-related issues.

36 vi. The final test here is for multicollinearity of lnfairpr and lnfxcost.

vi. The final test here is for multicollinearity of lnfairpr and lnfxcost.

. vif
Variable  |  VIF    1/VIF
lnfairpr  |
lnfxcost  |
lndays    |
lnlength  |
Mean VIF  |

. testparm lnfairpr lnfxcost
( 1)  lnfairpr = 0
( 2)  lnfxcost = 0
F( 2, 128) =
Prob > F   =           → reject the null

The vif output shows inflated standard errors for lnfairpr and lnfxcost, but the testparm F-test indicates that the two variables are jointly significant, so we keep them in the regression.
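The VIF column reported by vif is simply 1/(1 − R²j), where R²j comes from regressing regressor j on all the other regressors; a common rule of thumb flags VIF above 10. A one-line sketch:

```python
def vif(r_squared_j):
    """Variance inflation factor for regressor j.

    r_squared_j is the R-squared from regressing x_j on the remaining
    regressors; VIF_j = 1 / (1 - R^2_j). Values above roughly 10 are
    often taken as a sign of problematic multicollinearity.
    """
    return 1.0 / (1.0 - r_squared_j)

print(vif(0.9))   # R^2 of 0.9 gives VIF = 10
```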

37 vii. The estimated regression is
Using this regression, what is your estimate for the percentage increase in the Price of this contract?

vii. The estimated regression is

. regress lnprice lnfairpr lnfxcost lnlength lndays         log-log specification
lnprice   | Coef.   Std. Err.   t   P>|t|   [95% Conf. Interval]
lnfairpr  |
lnfxcost  |
lnlength  |
lndays    |
_cons     |

We are given the percentage change in each of the four variables, and since we are using a log-log specification we can plug the percentage changes in the x-variables directly into the regression to find the corresponding percentage change in the y-variable. Multiplying each percentage change by its respective coefficient and summing:

1.0008·5 + b(lnfxcost)·15 + b(lnlength)·0 + b(lndays)·0 ≈ 5.34

Thus price increases by about 5.34%.
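The percentage-change arithmetic, as a sketch. Only the lnfairpr coefficient (1.0008) is legible in this transcript; the other coefficients below are hypothetical placeholders, with the lnfxcost value chosen so the total matches the slide's 5.34%:

```python
# lnfairpr's coefficient is from the slide; the rest are hypothetical
coefs   = {"lnfairpr": 1.0008, "lnfxcost": 0.0224, "lnlength": 0.05, "lndays": 0.03}
pct_chg = {"lnfairpr": 5.0, "lnfxcost": 15.0, "lnlength": 0.0, "lndays": 0.0}

# log-log model: the coefficients are elasticities, so
# %change in y ≈ sum over j of (beta_j * %change in x_j)
pct_change_price = sum(coefs[v] * pct_chg[v] for v in coefs)
print(round(pct_change_price, 2))   # 5.34 with these illustrative numbers
```

Because length and days do not change, their coefficients drop out of the sum regardless of their values.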




