CHAPTER 17 Model Building to accompany Introduction to Business Statistics fourth edition, by Ronald M. Weiers Presentation by Priscilla Chaffe-Stengel Donald N. Stengel © 2002 The Wadsworth Group
Chapter 17 - Learning Objectives
Build polynomial regression models to describe curvilinear relationships.
Apply qualitative variables representing two or three categories.
Use logarithmic transformations to construct exponential and multiplicative models.
Identify and compensate for multicollinearity.
Apply stepwise regression.
Select the most suitable model among competing models.
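Polynomial Models with One Quantitative Predictor Variable
Simple linear regression equation: \( \hat{y} = b_0 + b_1 x \)
Equation for second-order polynomial model: \( \hat{y} = b_0 + b_1 x + b_2 x^2 \)
Equation for third-order polynomial model: \( \hat{y} = b_0 + b_1 x + b_2 x^2 + b_3 x^3 \)
Equation for general polynomial model: \( \hat{y} = b_0 + b_1 x + b_2 x^2 + \cdots + b_p x^p \)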
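Once the coefficients b0, b1, ..., bp of a polynomial model are known, an estimate is just the polynomial evaluated at x. A minimal sketch in Python, using a hypothetical helper `poly_predict` (not from the text) that applies Horner's rule:

```python
def poly_predict(coefs, x):
    """Evaluate y-hat = b0 + b1*x + ... + bp*x**p.

    coefs lists the coefficients from b0 up to bp. Horner's rule
    multiplies through from the highest power down, so no power of x
    has to be computed separately. (Hypothetical helper for illustration.)
    """
    result = 0.0
    for b in reversed(coefs):
        result = result * x + b
    return result


# Second-order example: y-hat = 1 + 2x + 3x^2 at x = 2 gives 1 + 4 + 12
print(poly_predict([1, 2, 3], 2))  # 17.0
```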
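Polynomial Models with Two Quantitative Predictor Variables
First-order model with no interaction: \( \hat{y} = b_0 + b_1 x_1 + b_2 x_2 \)
First-order model with interaction: \( \hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2 \)
Second-order model with no interaction: \( \hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1^2 + b_4 x_2^2 \)
Second-order model with interaction: \( \hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1^2 + b_4 x_2^2 + b_5 x_1 x_2 \)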
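All four two-predictor models are still linear in their coefficients, so each one is fitted by ordinary multiple regression on a set of derived predictor columns. The sketch below, with a hypothetical helper `design_row`, builds those columns for one observation; the column order assumed here is the constant, x1, x2, then the squared terms, then the cross-product x1*x2.

```python
def design_row(x1, x2, order=1, interaction=False):
    """Predictor terms for one observation, in coefficient order b0, b1, ...

    order=1 gives [1, x1, x2]; order=2 adds x1**2 and x2**2;
    interaction=True appends the cross-product x1*x2.
    (Hypothetical helper for illustration.)
    """
    row = [1.0, x1, x2]
    if order == 2:
        row += [x1 ** 2, x2 ** 2]
    if interaction:
        row.append(x1 * x2)
    return row


print(design_row(2, 3))                             # [1.0, 2, 3]
print(design_row(2, 3, interaction=True))           # [1.0, 2, 3, 6]
print(design_row(2, 3, order=2, interaction=True))  # [1.0, 2, 3, 4, 9, 6]
```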
Models with Qualitative Variables
Equation for a model with a categorical independent variable with two possible states: \( \hat{y} = b_0 + b_1 x \), where state 1 is coded x = 1 and state 2 is coded x = 0.
Equation for a model with a categorical independent variable with three possible states: \( \hat{y} = b_0 + b_1 x_1 + b_2 x_2 \), where
state 1 is coded x1 = 1, x2 = 0;
state 2 is coded x1 = 0, x2 = 1;
state 3 is coded x1 = 0, x2 = 0.
In general, a variable with k states needs k - 1 dummy variables; the state coded with all zeros serves as the baseline.
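The coding rule can be written as a small function. In this sketch the hypothetical helper `dummy_code` treats the last listed state as the all-zero baseline, matching the codings above, and returns k - 1 dummy values for a variable with k states.

```python
def dummy_code(state, states):
    """Return the 0/1 dummy variables for `state`.

    `states` lists the k possible states; the last one is the baseline
    (all dummies 0), so k states produce k - 1 dummy variables.
    (Hypothetical helper for illustration.)
    """
    if state not in states:
        raise ValueError(f"unknown state: {state!r}")
    return [1 if state == s else 0 for s in states[:-1]]


print(dummy_code("state 1", ["state 1", "state 2"]))             # [1]
print(dummy_code("state 2", ["state 1", "state 2", "state 3"]))  # [0, 1]
print(dummy_code("state 3", ["state 1", "state 2", "state 3"]))  # [0, 0]
```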
Models with Data Transformations
Exponential Model:
General equation for an exponential model: \( \hat{y} = b_0 b_1^{x} \)
Corresponding linear regression equation for an exponential model: \( \log \hat{y} = \log b_0 + x \log b_1 \)
Multiplicative Model:
General equation for a multiplicative model: \( \hat{y} = b_0 x_1^{b_1} x_2^{b_2} \)
Corresponding linear regression equation for a multiplicative model: \( \log \hat{y} = \log b_0 + b_1 \log x_1 + b_2 \log x_2 \)
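Both transformed models reduce to a straight-line fit on logarithms, followed by a back-transformation of the coefficients. The sketch below assumes base-10 logarithms and, for brevity, a multiplicative model with a single predictor (\( \hat{y} = b_0 x^{b_1} \)) rather than the two-predictor form above; the helper names are hypothetical. It checks itself on synthetic data that follow each model exactly.

```python
import math


def simple_ols(us, vs):
    """Intercept and slope of the least-squares line v = a + b*u."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    sxy = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    sxx = sum((u - mu) ** 2 for u in us)
    slope = sxy / sxx
    return mv - slope * mu, slope


def fit_exponential(xs, ys):
    """Fit y-hat = b0 * b1**x by regressing log10(y) on x."""
    a, b = simple_ols(xs, [math.log10(y) for y in ys])
    return 10 ** a, 10 ** b  # b0 = 10**intercept, b1 = 10**slope


def fit_multiplicative(xs, ys):
    """Fit y-hat = b0 * x**b1 (one predictor) by regressing log10(y) on log10(x)."""
    a, b1 = simple_ols([math.log10(x) for x in xs], [math.log10(y) for y in ys])
    return 10 ** a, b1  # the slope is the exponent itself


# Synthetic data that follow y = 3 * 2**x and y = 5 * x**1.5 exactly
print(fit_exponential([0, 1, 2, 3], [3, 6, 12, 24]))
print(fit_multiplicative([1, 2, 4, 8], [5 * x ** 1.5 for x in [1, 2, 4, 8]]))
```

Note the asymmetry: in the exponential model the slope on the log scale must be exponentiated to recover b1, while in the multiplicative model the slope already is the exponent.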
Example, Problem 17.8
International Data Corporation has reported the following costs per gigabyte of hard drive storage space for the years 1995 through 2000. Using x = 1 through 6 to represent the years 1995 through 2000, fit a second-order polynomial model to the data and estimate the cost per gigabyte for the year 2008. The regression equation will have the form: \( \hat{y} = b_0 + b_1 x + b_2 x^2 \)
Year   x = Yr   y = Cost ($ per GB)
1995   1        261.84
1996   2        137.94
1997   3         69.68
1998   4         29.30
1999   5         13.09
2000   6          6.46
Example, Problem 17.8, cont.
Microsoft Excel Output
SUMMARY OUTPUT
Regression Statistics
Multiple R       0.99655892
R Square         0.99312968
Adj R Square     0.98854948
Standard Error   10.5650522
Observations     6
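The adjusted R-square penalizes R-square for the number of predictors: \( R^2_{adj} = 1 - (1 - R^2)\frac{n - 1}{n - k - 1} \), where n is the number of observations and k the number of predictor variables. A short Python sketch (hypothetical helper name) reproduces the Excel figure from R Square, with n = 6 and k = 2 (x and x^2):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-square for n observations and k predictor variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Problem 17.8: n = 6 observations, k = 2 predictors (x and x^2)
print(f"{adjusted_r2(0.99312968, n=6, k=2):.6f}")  # 0.988549
```

The same formula applies to any of the Excel summaries in this chapter once n and k are read off the model.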
Example, Problem 17.8, cont.
Microsoft Excel Output
The regression equation is: \( \hat{y} = 387.993 - 147.65675x + 14.1883929x^2 \)
            Coefficients   Standard Error   t Stat      P-value
Intercept   387.993        18.8993399       20.529447   0.0002527
x           -147.65675     12.3644646       -11.94203   0.0012629
x^2         14.1883929     1.72911255       8.2055924   0.0037879
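The coefficients above can be reproduced outside Excel by ordinary least squares with columns 1, x and x^2. This is a minimal pure-Python sketch that solves the normal equations (X'X)b = X'y by Gaussian elimination; the helper names `solve` and `ols` are hypothetical, and Excel's own numerical routine may differ, though the results should agree to the digits shown.

```python
def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, ys):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


# Problem 17.8: x = 1..6 for 1995..2000, y = cost per gigabyte in dollars
xs = [1, 2, 3, 4, 5, 6]
ys = [261.84, 137.94, 69.68, 29.30, 13.09, 6.46]
rows = [[1.0, x, x ** 2] for x in xs]  # columns: intercept, x, x^2
b0, b1, b2 = ols(rows, ys)
print(f"y-hat = {b0:.3f} + ({b1:.5f})x + ({b2:.7f})x^2")
```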
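Example, Problem 17.8, cont.
To estimate the cost per gigabyte for the year 2008, evaluate the regression equation at x = 14 (since x = 1 represents 1995):
\( \hat{y} = 387.99 - 147.66(14) + 14.19(14)^2 = 1101.99 \)
So the cost per gigabyte in 2008 is estimated to be $1101.99 (with the unrounded coefficients, about $1101.72). Does this make sense? Of course not: the estimate is far above even the 1995 cost of $261.84, although costs fell every year in the data.
Explanation: Although the polynomial equation provides a good fit for the data during the period 1995-2000, this form is not appropriate for extrapolating out to 2008. The fitted parabola reaches its minimum at \( x = 147.66/(2 \times 14.19) \approx 5.2 \), inside the data range, and rises steadily after that.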
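The extrapolation and the turning point can be checked directly from the Excel coefficients. In this Python sketch, `predict` is a hypothetical helper; the minimum comes from setting the derivative b1 + 2*b2*x to zero.

```python
# Coefficients as reported in the Excel output for Problem 17.8
b0, b1, b2 = 387.993, -147.65675, 14.1883929


def predict(x):
    """Second-order model y-hat = b0 + b1*x + b2*x**2."""
    return b0 + b1 * x + b2 * x ** 2


x_2008 = 2008 - 1994  # x = 1 corresponds to 1995, so 2008 is x = 14
print(f"{predict(x_2008):.2f}")  # 1101.72 with unrounded coefficients

# Where the parabola bottoms out: d(y-hat)/dx = b1 + 2*b2*x = 0
x_min = -b1 / (2 * b2)
print(f"{x_min:.2f}")  # 5.20, inside the 1995-2000 data range
```

Because the minimum falls inside the observed range, every forecast beyond 2000 moves upward, which is why the quadratic form breaks down for extrapolation.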
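Example, Problem 17.32
An exponential model will probably be more appropriate for the data used in Problem 17.8: each year's cost is roughly half of the previous year's (between about 42% and 53%), which is the constant-ratio pattern an exponential model describes. Taking base-10 logarithms of y gives:
y        log y      x
261.84   2.418036   1
137.94   2.139690   2
69.68    1.843108   3
29.30    1.466868   4
13.09    1.116940   5
6.46     0.810233   6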
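Both the log column and the roughly constant year-over-year ratio can be computed in a few lines of Python; `math.log10` gives the base-10 logarithms used in the table.

```python
import math

costs = [261.84, 137.94, 69.68, 29.30, 13.09, 6.46]  # 1995-2000, $ per GB

log_costs = [math.log10(c) for c in costs]
print([round(v, 6) for v in log_costs])

# Year-over-year ratio: each year's cost as a fraction of the previous year's
ratios = [later / earlier for earlier, later in zip(costs, costs[1:])]
print([round(r, 3) for r in ratios])
```

A roughly constant ratio means log y falls by a roughly constant amount each year, so log y is close to linear in x.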
Example, Problem 17.32, cont.
Microsoft Excel Output
SUMMARY OUTPUT
Regression Statistics
Multiple R       0.998899423
R Square         0.997800057
Adj R Square     0.997250071
Standard Error   0.03222401
Observations     6
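Example, Problem 17.32, cont.
Microsoft Excel Output
The regression equation is: \( \log \hat{y} = 2.780829985 - 0.32810028x \)
            Coefficients   Standard Error   t Stat     P-value
Intercept   2.780829985    0.02999892       92.69767   8.12E-08
x           -0.32810028    0.00770301       -42.5938   1.82E-06
Equivalently, \( \hat{y} = 10^{2.780829985}(10^{-0.32810028})^{x} \approx 603.7(0.4698)^{x} \): each year's estimated cost is about 47% of the previous year's.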
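The log-scale fit is a simple linear regression of log10(y) on x, so it can be reproduced with the textbook slope and intercept formulas and then back-transformed to the exponential form \( \hat{y} = b_0 b_1^x \). A minimal Python sketch:

```python
import math

xs = [1, 2, 3, 4, 5, 6]
costs = [261.84, 137.94, 69.68, 29.30, 13.09, 6.46]
log_y = [math.log10(c) for c in costs]

# Simple least-squares line log10(y) = a + b*x
n = len(xs)
mx, my = sum(xs) / n, sum(log_y) / n
sxx = sum((x - mx) ** 2 for x in xs)
sxy = sum((x - mx) * (v - my) for x, v in zip(xs, log_y))
b = sxy / sxx
a = my - b * mx
print(f"log y-hat = {a:.6f} + ({b:.6f})x")

# Back-transform to the exponential form y-hat = b0 * b1**x
b0, b1 = 10 ** a, 10 ** b
print(f"y-hat = {b0:.1f} * {b1:.4f}**x")
```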
Example, Problem 17.32, cont.
For x = 14, \( \log \hat{y} = 2.780829985 - 0.32810028(14) = -1.812574 \), so \( \hat{y} = 10^{-1.812574} \approx 0.0154 \).
Based on the exponential model, the cost per gigabyte in 2008 is estimated to be $0.0154, or about 1.5 cents.
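The 2008 estimate is computed on the log scale and then converted back with a power of 10. A short Python check using the Excel coefficients:

```python
a, b = 2.780829985, -0.32810028  # intercept and slope from the Excel output

x_2008 = 14  # x = 1 corresponds to 1995
log_pred = a + b * x_2008
pred = 10 ** log_pred  # undo the base-10 log transformation
print(f"log y-hat = {log_pred:.6f}, y-hat = ${pred:.4f}")
```

Forgetting the back-transformation is a common slip: the value -1.81 is a logarithm, not a cost.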
Example, Problem 17.27
An efficiency expert has studied 12 employees who perform similar assembly tasks, recording each one's productivity (units per hour), number of years of experience, and which of three popular assembly methods the individual has chosen to use in performing the task. Using the data shown on the next slide, determine the linear regression equation for estimating productivity from the other variables. For any qualitative variables that are used, be sure to specify the coding strategy each will employ.
Example, Problem 17.27, cont.
Worker  Prod.  Yrs.Exp  Method
1       75     7        A
2       88     10       C
3       91     4        B
4       93     5        B
5       95     11       C
6       77     3        A
7       97     12       B
8       85     10       C
9       102    12       C
10      93     13       A
11      112    12       B
12      86     14       A
Example, Problem 17.27, cont.
The equation for a model with one quantitative variable and a categorical independent variable with three possible states is:
\( \hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 \)
where x1 represents the years of experience, and the assembly method is coded as follows:
state 1 (method A): x2 = 1, x3 = 0
state 2 (method B): x2 = 0, x3 = 1
state 3 (method C): x2 = 0, x3 = 0
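Applying this coding strategy to the raw data is a simple lookup. In the Python sketch below, the data are transcribed from the problem's table, and `CODES` is a hypothetical name for the mapping from method to (x2, x3), with method C as the all-zero baseline.

```python
# Problem 17.27 data as (worker, productivity, years of experience, method)
data = [
    (1, 75, 7, "A"), (2, 88, 10, "C"), (3, 91, 4, "B"), (4, 93, 5, "B"),
    (5, 95, 11, "C"), (6, 77, 3, "A"), (7, 97, 12, "B"), (8, 85, 10, "C"),
    (9, 102, 12, "C"), (10, 93, 13, "A"), (11, 112, 12, "B"), (12, 86, 14, "A"),
]

# x2 flags method A, x3 flags method B; method C (x2 = x3 = 0) is the baseline
CODES = {"A": (1, 0), "B": (0, 1), "C": (0, 0)}

# Each coded row is (worker, y, x1, x2, x3)
coded = [(w, y, x1, *CODES[m]) for w, y, x1, m in data]
for row in coded:
    print(row)
```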
Example, Problem 17.27, cont.
So the data to be analyzed are:
Worker  y     x1   x2   x3
1       75    7    1    0
2       88    10   0    0
3       91    4    0    1
4       93    5    0    1
5       95    11   0    0
6       77    3    1    0
Example, Problem 17.27, cont.
Worker  y     x1   x2   x3
7       97    12   0    1
8       85    10   0    0
9       102   12   0    0
10      93    13   1    0
11      112   12   0    1
12      86    14   1    0
Example, Problem 17.27, cont.
Microsoft Excel Output
SUMMARY OUTPUT
Regression Statistics
Multiple R       0.86075031
R Square         0.74089109
Adj R Square     0.64372525
Standard Error   6.0861957
Observations     12
Example, Problem 17.27, cont.
Microsoft Excel Output
The regression equation is: \( \hat{y} = 75.368984 + 1.59358289x_1 - 7.3596257x_2 + 9.73395722x_3 \)
            Coefficients   Standard Error   t Stat      P-value
Intercept   75.368984      6.30729302       11.949498   2.214E-06
x1          1.59358289     0.51391877       3.1008459   0.014647
x2          -7.3596257     4.37208671       -1.683321   0.1308108
x3          9.73395722     4.49127957       2.1673016   0.062079
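The same least-squares machinery reproduces this output once the dummy-coded columns are in place. This pure-Python sketch solves the normal equations with hypothetical helpers `solve` and `ols` (not Excel's own routine) and also recomputes R Square as 1 - SSE/SST.

```python
def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, ys):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


# Coded data: (y, x1 = years of experience, x2 = method A, x3 = method B)
obs = [
    (75, 7, 1, 0), (88, 10, 0, 0), (91, 4, 0, 1), (93, 5, 0, 1),
    (95, 11, 0, 0), (77, 3, 1, 0), (97, 12, 0, 1), (85, 10, 0, 0),
    (102, 12, 0, 0), (93, 13, 1, 0), (112, 12, 0, 1), (86, 14, 1, 0),
]
ys = [o[0] for o in obs]
rows = [[1.0, x1, x2, x3] for _, x1, x2, x3 in obs]
coefs = ols(rows, ys)

# R-square = 1 - SSE / SST
fitted = [sum(b * v for b, v in zip(coefs, r)) for r in rows]
y_bar = sum(ys) / len(ys)
sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
sst = sum((y - y_bar) ** 2 for y in ys)
r2 = 1 - sse / sst
print([round(c, 6) for c in coefs], round(r2, 6))
```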
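Example, Problem 17.27, cont.
The regression equation has an R-square of 0.741 and an adjusted R-square of 0.644, so the model accounts for about 74% of the variation in productivity. This indicates that the regression model provides a reasonable explanation for the variation in the data set. Only the coefficient for x1 (years of experience, p = 0.015) is significant at the 0.05 level; the assembly-method coefficients have p-values of 0.131 (x2) and 0.062 (x3). One might therefore consider removing the assembly method from the model.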
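The significance statements follow from comparing each t statistic (coefficient divided by its standard error) with the two-tailed critical value for n - k - 1 = 12 - 3 - 1 = 8 degrees of freedom. The Python sketch below assumes the critical value t(0.025, 8) = 2.306 taken from a standard t table; `T_CRIT` and `estimates` are hypothetical names.

```python
# (coefficient, standard error) pairs from the Excel output for Problem 17.27
estimates = {
    "x1": (1.59358289, 0.51391877),
    "x2": (-7.3596257, 4.37208671),
    "x3": (9.73395722, 4.49127957),
}

# Two-tailed test at alpha = 0.05 with 8 degrees of freedom;
# t(0.025, 8) = 2.306 is taken from a standard t table.
T_CRIT = 2.306

for name, (b, se) in estimates.items():
    t = b / se
    verdict = "significant" if abs(t) > T_CRIT else "not significant"
    print(f"{name}: t = {t:.3f} -> {verdict}")

significant = [name for name, (b, se) in estimates.items() if abs(b / se) > T_CRIT]
```

The computed t values match Excel's t Stat column; x3 falls just short of the cutoff, consistent with its p-value of 0.062.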