Example 12.3 Explaining Spending Amounts at HyTex Include/Exclude Decisions
| 12.2 | 12.3a | 12.1a | 12.4 | 12.4a | a12.1a a12.5
Objective
To see which potential explanatory variables are useful for explaining current-year spending amounts at HyTex, using multiple regression.
CATALOGS.XLS
• This file contains data on 1000 customers who purchased mail-order products from the HyTex Company in the current year.
• Recall from Example 3.11 that HyTex is a direct marketer of stereo equipment, personal computers, and other electronic products.
• HyTex advertises entirely by mailing catalogs to its customers, and all of its orders are taken over the telephone.
• We want to estimate and interpret a regression equation for AmountSpent based on all of the available customer variables.
The Data
• The company spends a great deal of money on its catalog mailings, and it wants to be sure that this is paying off in sales.
• For each customer there are data on the following variables:
  – Age: age of the customer at the end of the current year
  – Gender: coded as 1 for males, 0 for females
  – OwnHome: coded as 1 if the customer owns a home, 0 otherwise
  – Married: coded as 1 if the customer is currently married, 0 otherwise
The Data -- continued
  – Close: coded as 1 if the customer lives reasonably close to a shopping area that sells similar merchandise, 2 otherwise
  – Salary: combined annual salary of the customer and spouse (if any)
  – Children: number of children living with the customer
  – PrevCust: coded as 1 if the customer purchased from HyTex during the previous year, 0 otherwise
  – PrevSpent: total amount of purchases made from HyTex during the previous year
  – Catalogs: number of catalogs sent to the customer this year
  – AmountSpent: total amount of purchases made from HyTex during this year
The Data -- continued
• With this much data, 1000 observations, we can certainly afford to set aside part of the data set for validation.
• Although any split could be used, let’s base the regression on the first 250 observations and use the other 750 for validation.
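The split described on this slide can be sketched in a few lines of Python. This is only an illustration, not the procedure used to produce the slides: the function name `split_first_n` and the placeholder rows are hypothetical, and real rows would come from CATALOGS.XLS.

```python
def split_first_n(rows, n_train=250):
    """Return (training, validation): the first n_train rows and the rest."""
    if not 0 < n_train < len(rows):
        raise ValueError("n_train must be between 1 and len(rows) - 1")
    return rows[:n_train], rows[n_train:]


# Placeholder records standing in for the 1000 customers (hypothetical data).
customers = [{"customer_id": i} for i in range(1, 1001)]
training, validation = split_first_n(customers)
print(len(training), len(validation))
```

Taking the first 250 rows is reasonable only if the file is not sorted in some meaningful order; if it were, a random split would be the safer choice.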
The Regression
• We begin by entering all of the potential explanatory variables.
• Our goal then is to exclude variables that aren’t necessary, based on their t-values and p-values. To do this we follow the Guidelines for Including / Excluding Variables in a Regression Equation.
• The regression output with all explanatory variables included is provided on the following slide.
[Slide: regression output with all explanatory variables included]
Analysis
• This output indicates a fairly good fit. The R² value is 79.1% and s_e is about $424.
• From the p-value column, we see that three variables, Age, OwnHome, and Married, have p-values well above conventional significance levels.
• These are the obvious candidates for exclusion. It is often best to exclude one variable at a time, starting with the variable with the highest p-value.
• The regression output with all insignificant variables excluded appears on the next slide.
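The one-at-a-time exclusion rule can be sketched as a small loop. This is a minimal sketch under stated assumptions: the cutoff `alpha=0.05` is an assumed value, the function names are hypothetical, and the regression fit itself is supplied by the caller as `fit_pvalues`. The p-values in the stand-in fit are made up for illustration and are not taken from the HyTex output.

```python
def backward_eliminate(variables, fit_pvalues, alpha=0.05):
    """Drop one variable per round: the one with the largest p-value above alpha.

    fit_pvalues(names) must refit the regression on `names` and return a dict
    mapping each variable name to its p-value (supplied by the caller).
    """
    kept = list(variables)
    dropped = []
    while kept:
        pvalues = fit_pvalues(kept)
        worst = max(kept, key=lambda name: pvalues[name])
        if pvalues[worst] <= alpha:
            break  # every remaining variable is significant at level alpha
        kept.remove(worst)
        dropped.append(worst)
    return kept, dropped


# Stand-in for a real regression fit: fixed, made-up p-values (hypothetical).
MADE_UP_PVALUES = {"Age": 0.61, "Gender": 0.01, "OwnHome": 0.48,
                   "Married": 0.75, "Salary": 0.0001}


def fake_fit(names):
    return {name: MADE_UP_PVALUES[name] for name in names}


kept, dropped = backward_eliminate(list(MADE_UP_PVALUES), fake_fit)
print(kept)     # ['Gender', 'Salary']
print(dropped)  # ['Married', 'Age', 'OwnHome']
```

With a real fit, the p-values of the remaining variables change after each exclusion, which is the reason for refitting inside the loop rather than dropping all candidates at once.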
[Slide: regression output with the insignificant variables excluded]
Interpretation of Final Regression Equation
• The coefficient of Gender implies that an average male customer spent about $130 less than the average female customer. Similarly, an average customer living close to stores with this type of merchandise spent about $288 less than customers living far from such stores.
• The coefficient of Salary implies that, on average, about 1.5 cents of every salary dollar was spent on HyTex merchandise.
Interpretation of Final Regression Equation -- continued
• The coefficient of Children implies that about $158 less was spent for every extra child living at home.
• The PrevCust and PrevSpent terms are somewhat more difficult to interpret.
  – First, both of these terms are 0 for customers who didn’t purchase from HyTex in the previous year.
  – For those who did, the terms become approximately -724 + 0.47 × PrevSpent.
  – The coefficient 0.47 implies that each extra dollar spent in the previous year can be expected to contribute an extra 47 cents in the current year.
Interpretation of Final Regression Equation -- continued
  – The median spender last year spent about $900. Substituting this for PrevSpent, we obtain approximately -724 + 0.47(900) = -301.
  – Therefore, this “median” spender from last year can be expected to spend about $301 less this year than a customer who did not purchase in the previous year.
• The coefficient of Catalogs implies that each extra catalog can be expected to generate about $43 in extra spending.
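The arithmetic behind the PrevCust and PrevSpent terms can be checked with a short calculation. The slope 0.47 is quoted on the slides; the PrevCust coefficient of about -724 is an assumption here, backed out from the $301 figure at PrevSpent = 900, so it should be read as approximate. The function and constant names are hypothetical.

```python
# Assumed coefficients: 0.47 is quoted on the slides; -724 is backed out from
# the $301 figure at PrevSpent = 900 and is therefore approximate.
B_PREV_CUST = -724.0
B_PREV_SPENT = 0.47


def previous_customer_effect(prev_spent):
    """Combined PrevCust + PrevSpent contribution for a previous-year customer."""
    return B_PREV_CUST + B_PREV_SPENT * prev_spent


# PrevSpent level at which a previous customer matches a non-customer.
break_even = -B_PREV_CUST / B_PREV_SPENT

print(round(previous_customer_effect(900)))  # -301
print(round(break_even))                     # 1540
```

Under these assumed values, a previous-year customer is predicted to outspend an otherwise identical non-customer only if last year's purchases exceeded roughly $1540.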
Cautionary Notes
• When we validate this final regression equation with the other 750 customers, using the procedure from Section 11.7, we find R² and s_e values of 71.8% and $522.
• These aren’t bad. They show little deterioration from the values based on the original 250 customers.
• We haven’t tried all possibilities yet. We haven’t tried nonlinear or interaction variables, nor have we looked at different coding schemes; we haven’t checked for nonconstant error variance or looked at potential effects of outliers.
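One common way to compute validation figures like these is to apply the training-set equation to the held-out customers, take R² as the squared correlation between actual and predicted values, and take s_e as the square root of the summed squared prediction errors divided by n - k - 1. Whether this matches the Section 11.7 procedure in every detail is an assumption; the function name and the toy numbers below are hypothetical.

```python
import math


def validation_metrics(actual, predicted, n_explanatory):
    """Return (r_squared, s_e) for predictions on a held-out set."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r_squared = cov ** 2 / (var_a * var_p)  # squared correlation
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    s_e = math.sqrt(sse / (n - n_explanatory - 1))
    return r_squared, s_e


# Toy numbers (made up): every prediction is exactly 1 too low.
actual = [2.0, 3.0, 4.0, 5.0]
predicted = [1.0, 2.0, 3.0, 4.0]
r2, s_e = validation_metrics(actual, predicted, n_explanatory=1)
print(r2, s_e)
```

The toy case shows why both numbers are reported: the squared correlation is 1 even though every prediction is off by 1, and only s_e picks up that error.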