business analytics II ▌ applications: cigarettes ◄ car dealership ◄ horse racing ◄ orangia


Managerial Economics & Decision Sciences Department
business analytics II ▌ applications: cigarettes ◄ car dealership ◄ horse racing ◄ orangia
© 2016 kellogg school of management | managerial economics and decision sciences department | business analytics II

session ten: applications ▌ learning objectives
► linear regression
• estimate the model, interpret coefficients
• statistical significance, p-values and confidence intervals
► confidence and prediction intervals
• klincom and kpredint commands: use and misuse
► dummy variables
• definition and interpretation of dummy and slope-dummy variables
• use of dummy and slope-dummy regressions in hypothesis testing
► pitfalls for linear regression
• omitted variable bias: identify the bias
• multicollinearity: test and correct
• spurious regression: identify
• heteroskedasticity: identify (test) and correct
• curvature: identify and correct
► non-linear models
• log specification: definition, estimation and interpretation
► panel data models
• assumptions, use and estimation of fixed-effects models

Est.E[SALES]  b1·NICOTINE Managerial Economics & Decision Sciences Department session ten applications Developed for business analytics II cigarettes ◄ car dealership ◄ horse racing ◄ orangia ◄ cigarettes manufacturing ► For each store in a random sample of 100 stores, the following information was recorded:  SALES The number of packs sold in a year  NICOTINE The nicotine content of the cigarettes in milligrams per cigarette  STORE Dummy variable that equals 0 for a convenience store and equals 1 for a supermarket ► A regression of the number of packs sold against nicotine content and store type yields the following output (the standard error of each coefficient is reported below the coefficient): Est.E[SALES]  2127  257·NICOTINE  1137·STORE (105.2) (247.3) Using this regression, estimate the change in sales of cigarette packs in a supermarket if the nicotine content of the cigarettes is reduced by 0.2 milligrams per cigarette. i. Identify the change: Est.E[SALES]  b1·NICOTINE where b1  257 and NICOTINE   0.2, thus Est.E[SALES]  257·( 0.2)   51.4 Remark: A change in level is always related to the change in one or several “x-variables” and their slopes. © 2016 kellogg school of management | managerial economics and decision sciences department | business analytics II session ten | page 1

b1  Std.Error[b1]tdf,/2  1  b1  Std.Error[b1]tdf,/2 Managerial Economics & Decision Sciences Department session ten applications Developed for business analytics II cigarettes ◄ car dealership ◄ horse racing ◄ orangia ◄ cigarettes manufacturing ► For each store in a random sample of 100 stores, the following information was recorded:  SALES The number of packs sold in a year  NICOTINE The nicotine content of the cigarettes in milligrams per cigarette  STORE Dummy variable that equals 0 for a convenience store and equals 1 for a supermarket ► A regression of the number of packs sold against nicotine content and store type yields the following output (the standard error of each coefficient is reported below the coefficient): Est.E[SALES]  2127  257·NICOTINE  1137·STORE (105.2) (247.3) Provide a 95% interval that contains the true change in sales given this reduction in nicotine content. ii. The general form of an interval with confidence level 1   is Estimate  Std.Error[Estimate]tdf,/2  True Value of Estimate  Estimate  Std.Error[Estimate]tdf,/2 Since Estimate  b1·NICOTINE the above interval can be based on the interval for 1 multiplied by NICOTINE. The interval for 1 is simply b1  Std.Error[b1]tdf,/2  1  b1  Std.Error[b1]tdf,/2 where b1  257, Std.Error[b1]  105.2 and tdf,/2  invttail(97,0.025)  1.9847 The interval for 1 is thus [257  1.9847·105.2, 257  1.9847·105.2]  [48.20956,465.79044] and the interval for 1·NICOTINE is [ 93.158,  9.642]. © 2016 kellogg school of management | managerial economics and decision sciences department | business analytics II session ten | page 2

cigarettes manufacturing
► The first regression is Est.E[SALES] = 2127 + 257·NICOTINE + 1137·STORE, with standard errors (105.2) for NICOTINE and (247.3) for STORE.
► A second regression is reported:

Est.E[SALES] = 2739 + 335·NICOTINE
                     (137.7)

The coefficient for NICOTINE in the first regression is lower than in the second regression. Why is that the case? What does this imply about the types of cigarettes that are sold at convenience stores as compared to supermarkets?

iii. The observed difference in estimated coefficients is most likely the result of omitted variable bias, where the omitted variable (in the second regression) is STORE: b1 < b1*. The overestimation means b2·a1 > 0, and since we already know that b2 > 0 it must be the case that a1 > 0, where a1 is the slope of the truncated relation between the omitted STORE and NICOTINE.
[diagram: the direct causal channel from NICOTINE to SALES vs. the indirect correlation channel through the truncated (omitted) variable STORE]

cigarettes manufacturing
iii. (continued) Having a1 > 0 in the truncated relation means that STORE and NICOTINE are positively related:
• a high level of STORE (i.e., STORE = 1) is likely associated with high levels of NICOTINE
• a low level of STORE (i.e., STORE = 0) is likely associated with low levels of NICOTINE
Thus supermarkets (STORE = 1) are likely to sell cigarettes with a higher NICOTINE level than convenience stores (STORE = 0) do.
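The sign reasoning can be checked numerically with the omitted-variable-bias identity b1* = b1 + b2·a1; solving for a1 using the two reported NICOTINE coefficients (a sketch, since a1 itself is not reported on the slide):

```python
# omitted variable bias identity: b1_star = b1 + b2 * a1, where a1 is the
# slope of the truncated relation of the omitted STORE on NICOTINE
b1_star = 335.0   # NICOTINE coefficient when STORE is omitted
b1 = 257.0        # NICOTINE coefficient when STORE is included
b2 = 1137.0       # STORE coefficient

a1_implied = (b1_star - b1) / b2   # about 0.0686, positive as argued
```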

car dealership
► You have collected data from a random sample of 62 past transactions (auto.dta), which contains the following variables:
• GENDER  gender of the buyer, equal to 1 if male and 0 if female
• INCOME  yearly income of the buyer in $
• AGE  age of the buyer in years
• COLLEGE  a dummy variable equal to 1 if the buyer is a college graduate and 0 otherwise
• PRICE  the price of the car in $

Run a regression of price on the remaining 4 variables. Report the estimated regression equation. Do not drop any variables from the regression.

i. The regression is

Est.E[price] = 2,280.36 + 1,444.20·gender + 0.1861·income − 15.59·age + 2,080.86·college

Figure 1. Regression results
price   |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------+----------------------------------------------------------------
gender  |   1444.197    738.0868     1.96   0.055    -33.79631     2922.19
income  |   .1860856    .0246546     7.55   0.000     .1367155    .2354556
age     |  -15.58905    46.07751    -0.34   0.736    -107.8577    76.67957
college |   2080.855    673.0907     3.09   0.003     733.0142    3428.696
_cons   |   2280.362    1271.326     1.79   0.078    -265.4246    4826.149

car dealership
Can you prove at a 10% significance level that the average price of cars bought by 30-year-old male college graduates with an income of $90,000 is higher than $20,000?

ii. We are asked to evaluate whether the average selling price is greater than a given level (20,000). We base the hypotheses on

E[price] = β0 + β1·gender + β2·income + β3·age + β4·college

with gender = 1 (male), income = 90,000, age = 30 and college = 1 (college graduate):

H0: β0 + β1·1 + β2·90,000 + β3·30 + β4·1 ≤ 20,000
Ha: β0 + β1·1 + β2·90,000 + β3·30 + β4·1 > 20,000

We test a combination of coefficients using klincom (rather than kpredint, because the claim is about the average across such buyers, not an individual prediction):

. klincom _b[_cons]+_b[gender]*1+_b[income]*90000+_b[age]*30+_b[college]*1-20000

price        |      Coef.   Std. Err.      t    P>|t|     [90% Conf. Interval]
-------------+----------------------------------------------------------------
(1)          |   2085.445    1254.126     1.66   0.102    -11.49022    4182.381
------------------------------------------------------------------------------
If Ha: <     then Pr(T < t) = .949
If Ha: not = then Pr(|T| > |t|) = .102
If Ha: >     then Pr(T > t) = .051

Since Pr(T > t) = .051 < 0.10, we reject the null at the 10% significance level: the data support the claim that the average price is higher than $20,000.
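The t-ratio in the klincom output is just the estimated combination divided by its standard error; replaying that in Python (values from the output above):

```python
# t-ratio behind the klincom test of E[price] - 20000 for this buyer profile
estimate, se = 2085.445, 1254.126   # from the klincom output
t_ratio = estimate / se             # about 1.66, matching the output

# klincom reports the one-sided p-value Pr(T > t) = .051; at the 10% level
# .051 < .10, so the null is rejected
alpha = 0.10
reject_null = 0.051 < alpha
```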

E[price]  0  1·0  2·80,000  3·45  4·1 Managerial Economics & Decision Sciences Department session ten applications Developed for business analytics II cigarettes ◄ car dealership ◄ horse racing ◄ orangia ◄ car dealership Jane is a woman and 45 years old, has a college degree and earns $80,000. Provide a range of values that you are 95% confident will contain the price of Jane’s next car.. iii. We are asked to provide an interval for the level of selling price for one individual with certain characteristics. The interval has the form: Est.E[price]  Std.Err.[price]tdf,/2  E[price]  Est.E[price]  Std.Err.[price]tdf,/2 Since we are asked for an interval for the level of the dependent variable we use either klincom or kpredint. Here the question is about the level of selling price for one individual thus we use kpredint. We are given gender  0 (female), income  80,000, age  45 and college  1 (college graduate) thus the interval is for E[price]  0  1·0  2·80,000  3·45  4·1 . kpredint _b[_cons]+_b[gender]*0+_b[income]*80000+_b[age]*45+_b[college]*1 Estimate: 18546.557 Standard Error of Individual Prediction: 2492.0427 Individual Prediction Interval (95%): [13556.327,23536.786] t-ratio: 7.4423108 If Ha: < then Pr(T < t) = 1 If Ha: not = then Pr(|T| > |t|) = 0 If Ha: > then Pr(T > t) = 0 the prediction interval © 2016 kellogg school of management | managerial economics and decision sciences department | business analytics II session ten | page 7

genderincome  gender·income Managerial Economics & Decision Sciences Department session ten applications Developed for business analytics II cigarettes ◄ car dealership ◄ horse racing ◄ orangia ◄ car dealership How would you modify the regression to allow you to test the following claims: “Women buy the same cars, i.e. cars with the same price, regardless of income level, while men tend to buy more expensive cars the higher income they have”? Run the new regression and report the new estimated regression equation. iv. Here the interaction between gender and income is fairly transparent thus a slope dummy variable defined as genderincome  gender·income will help testing the claims above. The regression becomes E[price]  0  1 ·gender  2·income  3·age  4·college  5·genderincome and the estimated regression is shown below. Figure 2. Regression results . generate genderincome=gender*income . regress price gender income genderincome age college price | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- gender | 2124.783 2294.148 0.93 0.358 -2470.948 6720.514 income | .2074518 .0725208 2.86 0.006 .0621751 .3527285 age | -18.27614 47.23003 -0.39 0.700 -112.8893 76.33697 college | 2043.064 689.0966 2.96 0.004 662.6371 3423.49 genderincome | -.0212742 .0678361 -0.31 0.755 -.1571662 .1146179 _cons | 1721.126 2195.929 0.78 0.436 -2677.848 6120.101 © 2016 kellogg school of management | managerial economics and decision sciences department | business analytics II session ten | page 8

E[price]  2·income  5·genderincome Managerial Economics & Decision Sciences Department session ten applications Developed for business analytics II cigarettes ◄ car dealership ◄ horse racing ◄ orangia ◄ car dealership How would you modify the regression to allow you to test the following claims: “Women buy the same cars, i.e. cars with the same price, regardless of income level, while men tend to buy more expensive cars the higher income they have”? Run the new regression and report the new estimated regression equation. v. Based on the regression E[price]  0  1 ·gender  2·income  3·age  4·college  5·genderincome we can now test claims that relate the selling price with the level of income for different genders. In particular we can test claims that relate the change in selling price with the change in income: E[price]  2·income  5·genderincome  The first claim is that income has no impact on selling price for women (gender  0). But for gender  0 we get E[price]  2·income thus “income has no impact on selling price for women” means to test: hypothesis H0: 2  0 Ha: 2  0 set hypotheses From the regression table we find immediately for income pvalue  0.006 thus we reject the null. © 2016 kellogg school of management | managerial economics and decision sciences department | business analytics II session ten | page 9

car dealership
v. (continued)
• The second claim is that men (gender = 1) tend to buy more expensive cars the higher their income. For gender = 1 we get ΔE[price] = β2·Δincome + β5·Δincome = (β2 + β5)·Δincome, thus the claim means testing:

H0: β2 + β5 ≤ 0
Ha: β2 + β5 > 0

We need to run klincom in order to test this hypothesis:

klincom _b[income]*1 + _b[genderincome]*1

horse racing
► The regression is

Est.E[lnodds] = b0 + b1·distance + b2·starters + b3·last + b4·injured + b5·novice + b6·ratio + b7·noviceratio

Based on Steve's regression, provide a point estimate for the difference in lnodds for two horses that are identical except that one just suffered a minor injury whereas the other did not, assuming the two horses are competing in the same race.

i. We are considering two horses identical in all respects except the injury:

no injury: Est.E[lnodds] = b0 + b1·distance + b2·starters + b3·last + b4·0 + b5·novice + b6·ratio + b7·noviceratio
injury:    Est.E[lnodds] = b0 + b1·distance + b2·starters + b3·last + b4·1 + b5·novice + b6·ratio + b7·noviceratio

Thus Δlnodds for these two horses is simply b4 = 1.998479 (the difference between the two equations). Because the horses are otherwise identical, the values of all other variables are the same for the two horses, so they "cancel out" when taking the difference between the two equations above.

horse racing
A horse is participating in a race against seven other horses. Based on Steve's regression, all other factors in the regression held fixed, how would the odds on that horse be affected if two additional horses joined the race?

ii. We are comparing two races: the initial race has 8 horses while the second has 10. Thus

8 horses:  Est.E[lnodds] = b0 + b1·distance + b2·8 + b3·last + b4·injured + b5·novice + b6·ratio + b7·noviceratio
10 horses: Est.E[lnodds] = b0 + b1·distance + b2·10 + b3·last + b4·injured + b5·novice + b6·ratio + b7·noviceratio

Thus Δlnodds for these two races is simply Δlnodds = 2·b2 = 0.1093014: the odds change by roughly 10.93% when two horses are added (using the approximation that a change in the log is a percentage change). The two equations above are for the same horse, so all other characteristics are identical and "cancel out" when taking the difference.
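A caveat worth making explicit: 2·b2 = 0.1093 is a change in log odds, and reading it as "10.93%" uses the log-change approximation. The exact percent change in odds is exp(Δlnodds) − 1. A sketch in Python (b2 backed out from the slide's 2·b2 = 0.1093014):

```python
import math

b2 = 0.0546507          # implied coefficient on starters (2*b2 = 0.1093014)
d_lnodds = 2 * b2       # change in lnodds from two extra starters

approx_pct = d_lnodds                # ~0.1093: log-change approximation
exact_pct = math.exp(d_lnodds) - 1   # ~0.1155: exact percent change in odds
```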

Est.E[lnodds]  b6·ratio  b7·ratio  (b6  b7)·ratio Managerial Economics & Decision Sciences Department session ten applications Developed for business analytics II cigarettes ◄ car dealership ◄ horse racing ◄ orangia ◄ horse racing ► The regression is Est.E[lnodds]  b0  b1·distance  b2·starters  b3·last  b4·injured  b5·novice  b6·ratio  b7·noviceratio Steve claims that for novice horses the ratio of past wins is irrelevant for the horse’s odds, all other factors in the regression held fixed. Can you prove him wrong using a 10% level of significance? iii. We are considering novice horses (for which novice  1) and we are interested whether the change in ratio has any effect on lnodds. For novice  1: Est.E[lnodds]  b0  b1·distance  b2·starters  b3·last  b4·injury  b5·1  b6·ratio  b7·ratio thus Est.E[lnodds]  b6·ratio  b7·ratio  (b6  b7)·ratio Steve claims basically requires a test of the following hypothesis H0: 6  7  0 Ha: 6  7  0 set hypotheses The command klincom _b[ratio]  _b[noviceratio] provides the following: If Ha: < then Pr(T < t) = .053 If Ha: not = then Pr(|T| > |t|) = .106 If Ha: > then Pr(T > t) = .947 cannot reject the null © 2016 kellogg school of management | managerial economics and decision sciences department | business analytics II session ten | page 13

sprinterdistance  sprinter·distance Managerial Economics & Decision Sciences Department session ten applications Developed for business analytics II cigarettes ◄ car dealership ◄ horse racing ◄ orangia ◄ horse racing ► The regression is Est.E[lnodds]  b0  b1·distance  b2·starters  b3·last  b4·injured  b5·novice  b6·ratio  b7·noviceratio Steve claims that, all else in the regression held fixed, horses that are classified as sprinters have their probability of winning reduced, i.e. have their odds increase, as a race gets longer. What would you add to the regression in Part I to allow you to evaluate this claim?. iv. We are clearly looking at interaction between being a sprinter and the length of the race thus a slope dummy capturing this interaction is required: sprinterdistance  sprinter·distance The regression becomes (we need to include also the dummy sprinter): Est.E[lnodds]  b0  b1·distance  b2·starters  b3·last  b4·injury   b5·novice  b6·ratio  b7·rationovice   b8·sprinter  b9·sprinterdistance © 2016 kellogg school of management | managerial economics and decision sciences department | business analytics II session ten | page 14

horse racing
Steve claims that, all else in the regression held fixed, sprinters' odds increase as a race gets longer. Carry out the modification suggested in part iv. and write down the new estimated regression equation.

v. The estimated regression is

Figure 3. Regression results
. generate sprinterdistance = sprinter*distance
. regress lnodds distance starters last injured novice ratio noviceratio sprinter sprinterdistance

lnodds           |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
distance         |  -.0008171    .0002318    -3.52   0.001    -.0012774   -.0003567
starters         |   .0531778    .0178135     2.99   0.004     .0177987    .0885568
last             |   .0428266    .0128888     3.32   0.001     .0172284    .0684248
injured          |   1.820267    .1748925    10.41   0.000     1.472915    2.167618
novice           |   .1995128      .31917     0.63   0.533    -.4343864    .8334119
ratio            |  -3.639829    .6952844    -5.24   0.000    -5.020724   -2.258935
noviceratio      |   1.411642    1.290283     1.09   0.277    -1.150971    3.974254
sprinter         |  -2.806884    .5395038    -5.20   0.000    -3.878385   -1.735383
sprinterdistance |   .0019458    .0003504     5.55   0.000     .0012499    .0026416
_cons            |   3.119532    .4811813     6.48   0.000     2.163864    4.075199

Est.E[lnodds]|sprinter  0 Managerial Economics & Decision Sciences Department session ten applications Developed for business analytics II cigarettes ◄ car dealership ◄ horse racing ◄ orangia ◄ horse racing ► The regression is Est.E[lnodds]  b0  b1·distance  b2·starters  b3·last  b4·injury  b5·novice  b6·ratio  b7·rationovice  b8·sprinter  b9·sprinterdistance Steve claims that, all else in the regression held fixed, horses that are classified as sprinters have their probability of winning reduced, i.e. have their odds increase, as a race gets longer. In terms of your new regression model, what must be true in order for Steve’s claim to be correct? Test the claim. vi. We need to evaluate the change in lnodds for sprinters as we change distance: sprinters: Est.E[lnodds]  b0  b1·distance  b2·starters  b3·last  b4·injury  b5·novice  b6·ratio  b7·rationovice  b8·1  b9·distance The change in lnodds for a change in distance is thus: sprinters: Est.E[lnodds]|sprinter  b1·distance  b9·distance  (b1  b9)·distance Steve’s claim is that Est.E[lnodds]|sprinter  0 Using the expression above the claim is really about b1  b9  0 © 2016 kellogg school of management | managerial economics and decision sciences department | business analytics II session ten | page 16

horse racing
vii. The test is:

H0: β1 + β9 ≤ 0
Ha: β1 + β9 > 0

The command klincom _b[distance] + _b[sprinterdistance] provides the following:

If Ha: <     then Pr(T < t) = 1
If Ha: not = then Pr(|T| > |t|) = 0
If Ha: >     then Pr(T > t) = 0

Since Pr(T > t) ≈ 0, we reject the null: the data support Steve's claim that sprinters' odds increase with race length.
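The direction of the result can be read straight off Figure 3: the tested combination is b1 + b9, and it is positive. A quick check in Python using the reported coefficients:

```python
# Steve's claim requires beta1 + beta9 > 0; plug in the Figure 3 estimates
b1 = -0.0008171   # coefficient on distance
b9 = 0.0019458    # coefficient on sprinterdistance

combo = b1 + b9   # 0.0011287 > 0: for sprinters, lnodds rises with distance
```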

horse racing
Alison's claim: "Older horses would probably be more prone to injury, and older horses are also less likely to win, i.e. have higher odds." Steve counters that older horses are in fact less likely to get injured (young horses are less disciplined and get minor injuries all the time), but agrees that older horses are less likely to win, all else equal. He reruns the regression with age in it in addition to all the original variables. In this new regression, the estimated coefficient on injured is 1.305.

viii. When age is omitted from the regression, the coefficient on injured carries an omitted variable bias. The sign of the ovb equals the product of:
(a) the sign of the relation between age and lnodds (call this coefficient b10)
(b) the sign of the truncated relation between injured and age (call this slope a1)
• Alison and Steve agree that age and odds are positively related, thus b10 > 0.
• Alison believes that injured and age are also positively related, thus a1 > 0; the ovb according to her must therefore be positive, which corresponds to overestimation: b4* > b4.
• Steve says that injured and age are negatively related, so he expects a negative ovb, which corresponds to underestimation: b4* < b4.
• When Steve reruns the regression with age included among the regressors, the coefficient on injured goes down to 1.305, i.e. b4* > b4. This finding is consistent with Alison's opinion, but not with Steve's.
[diagram: the direct causal channel from injured to lnodds vs. the indirect correlation channel through the truncated (omitted) variable age]

orangia ► The regression is:

E[ratio] = β0 + β1·fairpr + β2·bidders + β3·rigged + β4·length + β5·fxcost + β6·days

Based on the regression in the table, give your best estimate and a 90 percent confidence interval of what will happen to the ratio of the actual price to the estimated cost if the number of days for a project decreases by 250, holding the other independent variables fixed.

i. We are asked to evaluate the change in ratio for a change in days with everything else held constant: ΔE[ratio] = β6·Δdays, where Δdays = −250. Thus the estimated change is:

Est.ΔE[ratio] = b6·Δdays = 0.0002077·(−250) = −0.051925

We can obtain the 90% confidence interval for this change using klincom (alternatively, use the standard error of the coefficient on days and the required t-value):

. klincom _b[days]*(-250), level(90)

       ratio |      Coef.   Std. Err.      t    P>|t|     [90% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |  -.0519238   .0304968    -1.70   0.091     -.102458   -.0013895
------------------------------------------------------------------------------
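The klincom interval can be reproduced by hand from the reported coefficient and standard error. A minimal Python sketch (the course uses Stata; the residual degrees of freedom, 126, are assumed from the testparm output F(2, 126) on this data set):

```python
from scipy import stats

b, se, df = -0.0519238, 0.0304968, 126   # klincom estimate, std. err., residual d.o.f.
t_crit = stats.t.ppf(0.95, df)           # a 90% two-sided CI uses the 95th percentile
lower, upper = b - t_crit * se, b + t_crit * se
print(lower, upper)                      # ≈ [-.102458, -.0013895], matching klincom
```

The same recipe works for any level: replace 0.95 with 1 − α/2.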

orangia ► The regression is:

E[ratio] = β0 + β1·fairpr + β2·bidders + β3·rigged + β4·length + β5·fxcost + β6·days

Can you claim at the 5 percent significance level that an increase in the number of bidders, holding the other independent variables fixed, would on average decrease the project's ratio of actual price to estimated cost?

ii. We are asked to evaluate the change in ratio for a change in bidders with everything else held constant: ΔE[ratio] = β2·Δbidders. The claim is that an increase in bidders results in a decrease in ratio; we set the claim as the null:

set hypotheses   H0: β2 ≤ 0   Ha: β2 > 0

Running klincom _b[bidders] gives

( 1)  bidders = 0
If Ha: <     then Pr(T < t) = .031
If Ha: not = then Pr(|T| > |t|) = .063
If Ha: >     then Pr(T > t) = .969

Since the p-value for Ha: > is .969, we cannot reject the null: the data do not contradict the claim that more bidders decrease the ratio.

orangia ► The regression is:

E[ratio] = β0 + β1·fairpr + β2·bidders + β3·rigged + β4·length + β5·fxcost + β6·days

Would it be legitimate to drop the variables FairPr and FxCost from the regression if you wanted to do so? If the answer is yes, write down the new estimated regression equation.

iii. We run the multicollinearity diagnostics vif and testparm for fairpr and fxcost:

. vif

    Variable |       VIF       1/VIF
-------------+----------------------
      fairpr |     54.77    0.018259
      fxcost |     42.73    0.023405
        days |      4.77    0.209548
      length |      1.42    0.705534
     bidders |      1.41    0.711154
      rigged |      1.31    0.761681
    Mean VIF |     17.73

. testparm fairpr fxcost
 ( 1)  fairpr = 0
 ( 2)  fxcost = 0
       F(  2,   126) =    0.78
            Prob > F =    0.4601

The vif indicates inflated standard errors for fairpr and fxcost. Since the p-value for the F-test is far higher than any reasonable significance level, we cannot reject the joint null hypothesis that the coefficients on the two variables are both zero; the two variables are jointly insignificant. It is therefore legitimate to drop both variables and re-run the regression without them.
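The VIF reported by Stata is just 1/(1 − R²) from regressing each regressor on all the others. A hypothetical numpy sketch on synthetic data (not the Orangia data set) showing how near-collinearity inflates the VIF:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)                   # independent regressor
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    # regress column j on the remaining columns (plus a constant), then 1/(1 - R^2)
    y = X[:, j]
    others = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
    resid = y - others @ np.linalg.lstsq(others, y, rcond=None)[0]
    r2 = 1 - resid.var() / y.var()
    return 1 / (1 - r2)

# vif(X, 0) and vif(X, 1) come out very large; vif(X, 2) stays near 1
```

The common rule of thumb flags VIF values above roughly 10, as with fairpr (54.77) and fxcost (42.73) above.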

orangia ► The new regression is:

E[ratio] = β0 + β1·bidders + β2·rigged + β3·length + β4·days

Would it be legitimate to drop the variables FairPr and FxCost from the regression if you wanted to do so? If the answer is yes, write down the new estimated regression equation.

iii. The estimated regression is

Est.E[ratio] = 0.9272769 − 0.0080003·bidders + 0.184931·rigged − 0.0002443·length + 0.0000828·days

Figure 4. Regression results

       ratio |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     bidders |  -.0080003   .0039566    -2.02   0.045    -.0158291   -.0001715
      rigged |    .184931   .0247339     7.48   0.000     .1359908    .2338712
      length |  -.0002443   .0021309    -0.11   0.909    -.0044606     .003972
        days |   .0000828   .0000589     1.40   0.163    -.0000338    .0001994
       _cons |   .9272769    .027766    33.40   0.000     .8723372    .9822166

[rvfplot: residual-versus-fitted plot]

. hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of ratio
         chi2(1)      =     0.63
         Prob > chi2  =   0.4260

cannot reject the null

orangia ► Develop a regression model to estimate and predict the winning bid (Price) on the final contract for the year, which has the following characteristics: the estimated cost is $1,000,000, of which $700,000 is due to fixed costs, and the four contractors interested in the project are expected not to rig the auction. Write down the estimated regression equation and explain how you came to choose it.

iv. The variables for which we have values are: fairpr = 1,000, fxcost = 700, bidders = 4, and rigged = 0. All of these variables are plausibly related to the winning bid (price); therefore, we should initially run a regression of price against these four variables. We do not include any slope dummies (as was requested).

E[price] = β0 + β1·fairpr + β2·fxcost + β3·bidders + β4·rigged

The estimated regression is

Figure 5. Regression results

       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      fairpr |   .8514863   .0771848    11.03   0.000     .6987629     1.00421
      fxcost |   .1059598   .1238826     0.86   0.394     -.139163    .3510827
     bidders |   -17.3139   9.677152    -1.79   0.076     -36.4618    1.833996
      rigged |   88.10779   59.66611     1.48   0.142     -29.9518    206.1674
       _cons |   93.96637   64.09852     1.47   0.145    -32.86351    220.7963

orangia ► Develop a regression model to estimate and predict the winning bid (Price) on the final contract for the year, which has the following characteristics: the estimated cost is $1,000,000, of which $700,000 is due to fixed costs, and the four contractors interested in the project are expected not to rig the auction. Write down the estimated regression equation and explain how you came to choose it.

iv. We check for heteroskedasticity first:

[rvfplot: residual-versus-fitted plot]

. hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of price
         chi2(1)      =   552.19
         Prob > chi2  =   0.0000

reject the null

The hettest results indicate that we reject the null of homoskedasticity. Thus the regression is "tainted" by heteroskedasticity.
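The hettest statistic is the Breusch-Pagan LM statistic: n·R² from regressing the squared residuals on the regressors, compared against a chi-squared distribution. A self-contained sketch on simulated heteroskedastic data (illustrative only, not the contract data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 400
x = rng.uniform(1, 10, n)
y = 2 + 3 * x + rng.normal(0, 0.5 * x, n)   # error spread grows with x
X = np.column_stack([np.ones(n), x])

beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
# Breusch-Pagan auxiliary regression: squared residuals on the regressors
g = resid**2
gamma = np.linalg.lstsq(X, g, rcond=None)[0]
r2 = 1 - ((g - X @ gamma)**2).sum() / ((g - g.mean())**2).sum()
lm = n * r2                        # LM statistic, chi2 with k = 1 d.o.f. here
p_value = stats.chi2.sf(lm, 1)
# a tiny p_value rejects the null of constant variance, as hettest does above
```

With homoskedastic errors the same code produces a large p-value and the null survives.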

orangia ► Develop a regression model to estimate and predict the winning bid (Price) on the final contract for the year, which has the following characteristics: the estimated cost is $1,000,000, of which $700,000 is due to fixed costs, and the four contractors interested in the project are expected not to rig the auction. Write down the estimated regression equation and explain how you came to choose it.

iv. To deal with heteroskedasticity we try log specifications: the first below is linear-log and the second is log-linear. First we generate the log variables, with the exception of rigged, which is a dummy variable.

. regress price lnfairpr lnfxcost lnbidders rigged          linear-log specification

       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lnfairpr |   652.6868   139.2706     4.69   0.000     377.1162    928.2574
    lnfxcost |   181.3509   101.7388     1.78   0.077    -19.95669    382.6584
   lnbidders |  -91.85952   185.6159    -0.49   0.622    -459.1323    275.4132
      rigged |  -243.3638   209.8808    -1.16   0.248    -658.6488    171.9213
       _cons |  -3635.201    537.064    -6.77   0.000    -4697.874   -2572.529

. regress lnprice fairpr fxcost bidders rigged              log-linear specification

     lnprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      fairpr |   .0017956    .000211     8.51   0.000     .0013782     .002213
      fxcost |  -.0018892   .0003386    -5.58   0.000    -.0025591   -.0012193
     bidders |   .0598305   .0264482     2.26   0.025     .0074982    .1121629
      rigged |   .5460886    .163071     3.35   0.001     .2234247    .8687525
       _cons |   4.643232   .1751851    26.50   0.000     4.296599    4.989866

orangia ► Develop a regression model to estimate and predict the winning bid (Price) on the final contract for the year, which has the following characteristics: the estimated cost is $1,000,000, of which $700,000 is due to fixed costs, and the four contractors interested in the project are expected not to rig the auction. Write down the estimated regression equation and explain how you came to choose it.

iv. How do we choose between the two (if either)? We check for curvature first:

linear-log specification: "U"-shaped rvfplot
log-linear specification: "∩"-shaped rvfplot

► The "U"-shaped rvfplot indicates that the y-variable has to be "logged"; thus the next step from a linear-log specification is the log-log specification.
► The "∩"-shaped rvfplot indicates that the x-variables have to be "logged"; thus the next step from a log-linear specification is the log-log specification.

orangia ► Develop a regression model to estimate and predict the winning bid (Price) on the final contract for the year, which has the following characteristics: the estimated cost is $1,000,000, of which $700,000 is due to fixed costs, and the four contractors interested in the project are expected not to rig the auction. Write down the estimated regression equation and explain how you came to choose it.

iv. The log-log regression and its estimation are given below.

E[lnprice] = β0 + β1·lnfairpr + β2·lnfxcost + β3·lnbidders + β4·rigged

Figure 6. Regression results

     lnprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lnfairpr |   .9845441   .0189004    52.09   0.000     .9471465    1.021942
    lnfxcost |   .0174492   .0138069     1.26   0.209    -.0098702    .0447686
   lnbidders |  -.0480919   .0251899    -1.91   0.058    -.0979344    .0017505
      rigged |   .1790167   .0284829     6.29   0.000     .1226585    .2353749
       _cons |  -.0269711   .0728848    -0.37   0.712    -.1711861    .1172439

[rvfplot: residual-versus-fitted plot]

. hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of lnprice
         chi2(1)      =     0.09
         Prob > chi2  =   0.7582

cannot reject the null

orangia ► Develop a regression model to estimate and predict the winning bid (Price) on the final contract for the year, which has the following characteristics: the estimated cost is $1,000,000, of which $700,000 is due to fixed costs, and the four contractors interested in the project are expected not to rig the auction. Predict the winning bid and provide an interval that will contain the winning bid with 95 percent confidence.

v. The x-variables are: lnfairpr = ln(1000) = 6.907755, lnfxcost = ln(700) = 6.551080, lnbidders = ln(4) = 1.386294, rigged = 0. We are asked to estimate, and provide an interval for, the level of the winning bid, so we use these values in the kpredint command:

. kpredint _b[_cons]+_b[lnfairpr]*6.9077+_b[lnfxcost]*6.5510+_b[lnbidders]*1.3862+_b[rigged]*0
Estimate: 6.8216599
Standard Error of Individual Prediction: .12481638
Individual Prediction Interval (95%): [6.5746894,7.0686304]

We exponentiate these results (price is measured in $000s) to find:
estimate for price:       exp(6.8216599)·1000 ≈ $917,507
estimate for lower bound: exp(6.5746894)·1000 ≈ $716,723
estimate for upper bound: exp(7.0686304)·1000 ≈ $1,174,538
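Since the prediction and its interval are on the ln(price) scale, the dollar figures come from exponentiating each endpoint. A short Python check (the factor of 1,000 reflects price being measured in $000s):

```python
import math

# kpredint output on the ln(price) scale
est, lower, upper = 6.8216599, 6.5746894, 7.0686304
point = math.exp(est) * 1000     # ≈ 917,507
lo = math.exp(lower) * 1000      # ≈ 716,723
hi = math.exp(upper) * 1000      # ≈ 1,174,538
```

Note that the exponentiated interval is no longer symmetric around the point prediction: the upper bound sits farther away than the lower bound.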

orangia ► ODOT has a road reconstruction project that is in the early planning phase. Just before putting the job up for auction, it learns that an additional pedestrian bridge will be necessary as part of the project. This change will not affect job duration or road length, but will increase fixed costs (FxCost) by 15 percent and overall estimated costs (FairPr) by 5 percent. Develop a regression model to estimate the percentage increase in the winning bid (the Price of the contract) that will ultimately result from the change in projected costs. What regression would you use to estimate the increase in Price? Write down the estimated regression equation and explain how you arrived at that regression.

vi. We are told how fairpr, fxcost, length and days will change (by 5%, 15%, 0%, and 0%, respectively) and all are plausibly related to the winning price; therefore we must initially include at least these variables in our regression. Since we are in the pre-announcement (planning) phase, the number of bidders and whether the auction will be rigged are not under our control and might react to the changes – we must not include these variables in the initial regression.

There are four possible specifications:

Model            Dependent variable   Independent variables
standard linear  y                    x
log-linear       ln(y)                x
linear-log       y                    ln(x)
log-log          ln(y)                ln(x)

orangia ► What regression would you use to estimate the increase in Price? Write down the estimated regression equation and explain how you arrived at that regression.

. regress price fairpr fxcost length days                   linear-linear specification

       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      fairpr |   .5759849   .1048968     5.49   0.000     .3684287    .7835411
      fxcost |   .3944423   .1492031     2.64   0.009     .0992185     .689666
      length |   13.26696   5.850897     2.27   0.025     1.689957    24.84396
        days |   .9197984   .2924231     3.15   0.002     .3411893    1.498407
       _cons |  -59.00671   43.61251    -1.35   0.178    -145.3015    27.28809

[rvfplot: linear-linear specification]

. hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of price
         chi2(1)      =   599.25
         Prob > chi2  =   0.0000

reject the null

The linear-linear specification fails the heteroskedasticity test.

orangia ► What regression would you use to estimate the increase in Price? Write down the estimated regression equation and explain how you arrived at that regression.

. regress price lnfairpr lnfxcost lnlength lndays           linear-log specification

       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lnfairpr |   731.5671   216.1821     3.38   0.001      303.814     1159.32
    lnfxcost |  -70.77947   129.2247    -0.55   0.585    -326.4725    184.9136
    lnlength |  -170.3642   89.99115    -1.89   0.061    -348.4271     7.69867
      lndays |   555.5358   272.7642     2.04   0.044     15.82517    1095.246
       _cons |  -5698.615   796.5472    -7.15   0.000    -7274.719    -4122.51

[rvfplot: linear-log specification]

. hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of price
         chi2(1)      =    87.32
         Prob > chi2  =   0.0000

reject the null

The linear-log specification fails the heteroskedasticity test.

orangia ► What regression would you use to estimate the increase in Price? Write down the estimated regression equation and explain how you arrived at that regression.

. regress lnprice fairpr fxcost length days                 log-linear specification

     lnprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      fairpr |    .000834   .0002781     3.00   0.003     .0002837    .0013843
      fxcost |   -.000801   .0003956    -2.02   0.045    -.0015838   -.0000183
      length |   .0495787   .0155129     3.20   0.002     .0188838    .0802735
        days |   .0033391   .0007753     4.31   0.000      .001805    .0048732
       _cons |   4.712592   .1156327    40.75   0.000     4.483793    4.941391

[rvfplot: log-linear specification]

. hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of lnprice
         chi2(1)      =     0.62
         Prob > chi2  =   0.4314

cannot reject the null

The log-linear specification passes the heteroskedasticity test; however, the ∩-shaped rvfplot suggests logging the x-variables.

orangia ► What regression would you use to estimate the increase in Price? Write down the estimated regression equation and explain how you arrived at that regression.

. regress lnprice lnfairpr lnfxcost lnlength lndays         log-log specification

     lnprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lnfairpr |   1.000758   .0380092    26.33   0.000     .9255507    1.075966
    lnfxcost |   .0225473   .0227203     0.99   0.323    -.0224087    .0675034
    lnlength |  -.0008182   .0158223    -0.05   0.959    -.0321252    .0304889
      lndays |  -.0674284   .0479575    -1.41   0.162    -.1623206    .0274638
       _cons |   .1588012   .1400493     1.13   0.259    -.1183103    .4359126

[rvfplot: log-log specification]

. hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of lnprice
         chi2(1)      =     0.10
         Prob > chi2  =   0.7572

cannot reject the null

The log-log specification passes the heteroskedasticity test, and the rvfplot suggests no curvature-related issues.

orangia ► What regression would you use to estimate the increase in Price? Write down the estimated regression equation and explain how you arrived at that regression.

vi. The final test here is for multicollinearity of lnfairpr and lnfxcost.

. vif

    Variable |       VIF       1/VIF
-------------+----------------------
    lnfairpr |     14.56    0.068685
    lnfxcost |     10.32    0.096924
      lndays |      7.39    0.135314
    lnlength |      1.98    0.506191
    Mean VIF |      8.56

. testparm lnfairpr lnfxcost
 ( 1)  lnfairpr = 0
 ( 2)  lnfxcost = 0
       F(  2,   128) =   695.42
            Prob > F =    0.0000

reject the null

The vif shows inflated standard errors for lnfairpr and lnfxcost, but the testparm F-test rejects the null that both coefficients are zero: the two variables are jointly significant, so we keep them in the regression.

orangia ► Using this regression, what is your estimate for the percentage increase in the Price of this contract?

vii. The estimated regression is

. regress lnprice lnfairpr lnfxcost lnlength lndays         log-log specification

     lnprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lnfairpr |   1.000758   .0380092    26.33   0.000     .9255507    1.075966
    lnfxcost |   .0225473   .0227203     0.99   0.323    -.0224087    .0675034
    lnlength |  -.0008182   .0158223    -0.05   0.959    -.0321252    .0304889
      lndays |  -.0674284   .0479575    -1.41   0.162    -.1623206    .0274638
       _cons |   .1588012   .1400493     1.13   0.259    -.1183103    .4359126

We are given the percentage change in each of the four variables, and since we use a log-log specification we can plug the percentage changes in the x-variables directly into the regression to obtain the corresponding percentage change in the y-variable. Multiplying each percentage change by its coefficient:

1.000758·5 + 0.0225473·15 − 0.0008182·0 − 0.0674284·0 ≈ 5.34

Thus price increases by about 5.34%.
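In a log-log specification the coefficients act as elasticities, so the calculation above is just a weighted sum of the percentage changes. A one-line Python check:

```python
# log-log coefficients from the regression: lnfairpr, lnfxcost, lnlength, lndays
coefs = [1.000758, 0.0225473, -0.0008182, -0.0674284]
changes = [5, 15, 0, 0]   # % changes in fairpr, fxcost, length, days
pct_price = sum(b * dx for b, dx in zip(coefs, changes))
print(round(pct_price, 2))   # 5.34
```

Because the log-log elasticity approximation is local, this shortcut is most accurate for modest percentage changes like the 5% and 15% here.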