Published by Dwain Terry. Modified over 9 years ago.
1
General Qualitative Data, and “Dummy Variables”
How might we have represented “make-of-car” in the motorpool case, had there been more than just two makes?
– Assume that Make takes four categorical values (Ford, Honda, BMW, and Sterling). Choose one value as the “foundation” case. Create three 0/1 (“yes”/“no”, so-called “dummy”) variables for the other three cases. These three variables jointly represent the four-valued qualitative Make variable.
– We’ll use this representational trick to include “day of game” (Friday, Saturday, or Sunday) in a model that predicts attendance at a professional indoor soccer team’s home games.
– Using this trick requires that we extend the “significance level” (with respect to whether a variable “belongs” in the model) to groups of variables. This is done via “analysis of variance” (ANOVA).
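A minimal Python sketch of the dummy-variable trick, assuming the data arrive as plain lists of category labels. The helper name `dummy_encode` and the `D_<level>` column names are illustrative, not from the slides. Ford is the foundation case for Make; for day of game, Friday is an arbitrary choice, since the slides do not say which day serves as the foundation.

```python
from typing import Dict, List, Sequence


def dummy_encode(values: Sequence[str], levels: Sequence[str],
                 foundation: str) -> List[Dict[str, int]]:
    """Encode a k-valued categorical variable as k-1 dummy (0/1) columns.

    Each non-foundation level L gets a column "D_L" that is 1 when the
    observation equals L and 0 otherwise. The foundation case is the one
    whose dummies are all 0. (Hypothetical helper for illustration.)
    """
    if foundation not in levels:
        raise ValueError(f"foundation {foundation!r} is not one of {list(levels)}")
    others = [lvl for lvl in levels if lvl != foundation]
    rows = []
    for v in values:
        if v not in levels:
            raise ValueError(f"unknown category {v!r}")
        rows.append({f"D_{lvl}": int(v == lvl) for lvl in others})
    return rows


# Four makes -> three dummies; three game days -> two dummies.
makes = ["Ford", "Honda", "BMW", "Sterling"]
cars = dummy_encode(["Ford", "BMW", "Sterling", "Honda"], makes, foundation="Ford")
days = dummy_encode(["Friday", "Sunday"], ["Friday", "Saturday", "Sunday"],
                    foundation="Friday")
for row in cars + days:
    print(row)
```

A Ford row is all zeros, which is what makes Ford the baseline that every dummy coefficient is measured against.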
2
Discounts on Car Purchases: Does Salesperson Identity Matter?
Assume there are five salesfolk: Andy, Bob, Chuck, Dave, and Ed. Take one (e.g., Andy) as the foundation case, and add four new “dummy” variables:
– \(D_B = 1\) only if Bob, 0 otherwise
– \(D_C = 1\) only if Chuck, 0 otherwise
– \(D_D = 1\) only if Dave, 0 otherwise
– \(D_E = 1\) only if Ed, 0 otherwise
The coefficient of each (in the most-complete model) measures how much the average discount that salesperson gives a customer differs from the average discount Andy would give the same customer.
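To see what a dummy coefficient measures, here is a small Python sketch that fits least squares to hypothetical discount data (two sales per salesperson, invented for illustration) using only an intercept and the four dummies. With no other variables in the model, the intercept comes out as Andy's average discount (1100) and each dummy coefficient as that salesperson's average minus Andy's (300, -300, -50, 350). The solver `ols` is a hypothetical helper based on the normal equations, not a library routine; in the slides' most-complete model the coefficients would instead compare salespeople while holding Age, Income, and Sex fixed.

```python
from typing import List


def ols(X: List[List[float]], y: List[float]) -> List[float]:
    """Least-squares coefficients b solving the normal equations (X'X) b = X'y.

    Gauss-Jordan elimination with partial pivoting; adequate for a handful
    of columns, not a production regression routine.
    """
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X | X'y], one row per coefficient.
    A = [
        [sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
        + [sum(X[k][i] * y[k] for k in range(n))]
        for i in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-9:
            raise ValueError("X'X is singular: is there a dummy for every category?")
        A[col], A[pivot] = A[pivot], A[col]
        lead = A[col][col]
        A[col] = [a / lead for a in A[col]]
        # Clear this column from every other row.
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]


people = ["Andy", "Bob", "Chuck", "Dave", "Ed"]
# Hypothetical discounts in dollars, two sales per salesperson.
sales = [("Andy", 1000), ("Andy", 1200), ("Bob", 1300), ("Bob", 1500),
         ("Chuck", 800), ("Chuck", 800), ("Dave", 1000), ("Dave", 1100),
         ("Ed", 1500), ("Ed", 1400)]
# Columns: intercept, then D_B, D_C, D_D, D_E (Andy is the foundation case).
X = [[1.0] + [float(who == p) for p in people[1:]] for who, _ in sales]
y = [float(d) for _, d in sales]
b = ols(X, y)
print([round(v, 6) for v in b])
```

Adding a fifth dummy for Andy alongside the intercept makes the columns perfectly collinear (the five dummies always sum to 1, the same as the intercept column), so X'X is singular and `ols` raises an error. Choosing a foundation case avoids this.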
3
Does Salesperson Identity Matter?
Imagine that, after adding the new variables (four new columns of data) to your model, the regression yields:
\[ \text{Discount}_{\text{pred}} = 980 + 9.5\,\text{Age} - 0.035\,\text{Income} + 446\,\text{Sex} + 240\,D_B - 300\,D_C - 50\,D_D + 370\,D_E \]
– With similar customers, you’d expect Bob to give a discount $240 higher than would Andy.
– With similar customers, you’d expect Chuck to give a discount $300 lower than would Andy, $540 lower than would Bob, and also lower than would Dave (by $250) and Ed (by $670).
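The pairwise comparisons above are just differences of dummy coefficients: for the same customer, the Age, Income, and Sex terms are identical and cancel, and Andy's implicit coefficient is 0. A short Python sketch (the dictionary `coef` and the function `expected_gap` are illustrative names) reproduces the slide's figures for every pair.

```python
from itertools import combinations

# Salesperson coefficients from the slide's regression. Andy is the
# foundation case, so his implicit coefficient is 0.
coef = {"Andy": 0, "Bob": 240, "Chuck": -300, "Dave": -50, "Ed": 370}


def expected_gap(a: str, b: str) -> int:
    """Expected discount from salesperson a minus that from b, same customer.

    The Age, Income, and Sex terms are the same for both and cancel out.
    """
    return coef[a] - coef[b]


# Print every pairwise gap, e.g. "Bob vs Chuck: +540".
for a, b in combinations(coef, 2):
    print(f"{a} vs {b}: {expected_gap(a, b):+d}")
```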
4
Does “Salesperson” Interact with “Sex”?
Are some of the salesfolk better at selling to a particular sex of customer?
– Add \(D_B, D_C, D_D, D_E\), and the products \(D_B \cdot \text{Sex}, D_C \cdot \text{Sex}, D_D \cdot \text{Sex}, D_E \cdot \text{Sex}\) to the model.
– Imagine that your regression yields:
\[ \text{Discount}_{\text{pred}} = 980 + 9.5\,\text{Age} - 0.035\,\text{Income} + 446\,\text{Sex} + 240\,D_B - 350\,D_C + 75\,D_D + 10\,D_E - 375\,(D_B \cdot \text{Sex}) - 150\,(D_C \cdot \text{Sex}) - 50\,(D_D \cdot \text{Sex}) + 450\,(D_E \cdot \text{Sex}) \]
– Interpret this back in the “conceptual” model:
\[ \text{Discount}_{\text{pred}} = 980 + 9.5\,\text{Age} - 0.035\,\text{Income} + 446\,\text{Sex} + (240 - 375\,\text{Sex})\,D_B + (-350 - 150\,\text{Sex})\,D_C + (75 - 50\,\text{Sex})\,D_D + (10 + 450\,\text{Sex})\,D_E \]
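The step from the fitted model to the “conceptual” model groups each salesperson's dummy coefficient with the matching \(D \cdot \text{Sex}\) coefficient. A minimal Python sketch, using the slide's coefficients and its coding Sex = 0 (male), Sex = 1 (female); `effect_vs_andy` is a hypothetical helper name.

```python
# Coefficients read off the slide's interaction model (Andy = foundation case).
main = {"Bob": 240, "Chuck": -350, "Dave": 75, "Ed": 10}         # D_X terms
with_sex = {"Bob": -375, "Chuck": -150, "Dave": -50, "Ed": 450}  # D_X * Sex terms


def effect_vs_andy(person: str, sex: int) -> int:
    """Coefficient of D_person in the conceptual model: main + with_sex * Sex.

    Sex = 0 for a male customer, 1 for a female customer (the slides' coding).
    """
    if person == "Andy":
        return 0  # foundation case: all dummies are 0
    return main[person] + with_sex[person] * sex


for person in ["Bob", "Chuck", "Dave", "Ed"]:
    print(person, "male:", effect_vs_andy(person, 0),
          "female:", effect_vs_andy(person, 1))
```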
5
\[ \text{Discount}_{\text{pred}} = 980 + 9.5\,\text{Age} - 0.035\,\text{Income} + 446\,\text{Sex} + (240 - 375\,\text{Sex})\,D_B + (-350 - 150\,\text{Sex})\,D_C + (75 - 50\,\text{Sex})\,D_D + (10 + 450\,\text{Sex})\,D_E \]
– Given a male (Sex = 0) customer, you’d expect Bob (\(D_B = 1\)) to give a greater discount than Andy (by $240 - $375 × 0 = $240).
– Given a female (Sex = 1) customer, you’d expect Bob to give a smaller discount than Andy (by $240 - $375 × 1 = -$135, i.e., $135 less).
– Chuck has been giving smaller discounts than Andy to both men (by $350) and women (by $350 + $150 = $500), and Dave and Ed have been giving larger discounts than Andy to both sexes (Dave: $75 more for men, $25 more for women; Ed: $10 more for men, $460 more for women).
– And we could take the same approach to investigate whether “Salesperson” interacts with Age, by also including \(D_B \cdot \text{Age}, D_C \cdot \text{Age}, D_D \cdot \text{Age}, D_E \cdot \text{Age}\) in our model.
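Deciding whether a whole group of variables, such as the four \(D \cdot \text{Age}\) terms, belongs in the model is the group-level significance question raised at the start. One standard ANOVA-based tool for it is the partial F test, which compares the error sums of squares (SSE) of the models with and without the group. The Python sketch below only computes the statistic; the function name `partial_f`, the sample size, and the SSE values are hypothetical, chosen to show the arithmetic.

```python
def partial_f(sse_reduced: float, sse_full: float, n: int,
              k_full: int, q: int) -> float:
    """Partial F statistic for dropping a group of q variables from a model.

    sse_reduced: error sum of squares of the model WITHOUT the group
    sse_full:    error sum of squares of the model WITH the group
    n:           number of observations
    k_full:      predictors in the full model, excluding the intercept
    """
    df_full = n - k_full - 1
    return ((sse_reduced - sse_full) / q) / (sse_full / df_full)


# Hypothetical full model: Age, Income, Sex, four D's, four D*Sex and
# four D*Age terms, so k_full = 15; the tested group is the q = 4 D*Age terms.
F = partial_f(sse_reduced=1_200_000, sse_full=1_000_000, n=100, k_full=15, q=4)
print(round(F, 3))
```

A large F relative to the F distribution with q and n − k_full − 1 degrees of freedom (here 4 and 84) suggests the group belongs in the model. If SciPy is installed, `scipy.stats.f.sf(F, 4, 84)` returns the corresponding p-value.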
6
Outliers
An outlier is a sample observation that fails to “fit” with the rest of the sample data. Such observations may distort the results of an entire study.
– Types of outliers (three)
– Identification of outliers (via “model analysis”)
– Dealing with outliers (perhaps yielding a better model)
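The slides leave the details of outlier identification to other material. As one common model-based check (not necessarily the one the original course uses), a Python sketch can flag observations whose residual is more than two standard errors of the regression away from zero. The residuals below are hypothetical, `flag_large_residuals` is an illustrative name, and the cutoff of 2 is a rule of thumb rather than a fixed standard.

```python
import math
from typing import List, Sequence


def flag_large_residuals(residuals: Sequence[float], k: int,
                         cutoff: float = 2.0) -> List[int]:
    """Indices of observations whose |residual| exceeds cutoff * s.

    s is the standard error of the regression, sqrt(SSE / (n - k - 1)),
    where k is the number of predictors excluding the intercept.
    """
    n = len(residuals)
    sse = sum(e * e for e in residuals)
    s = math.sqrt(sse / (n - k - 1))
    return [i for i, e in enumerate(residuals) if abs(e) / s > cutoff]


# Hypothetical residuals from a one-predictor model (they sum to zero).
resid = [-8, -12, 5, -9, 11, -10, 60, -7, -13, 9, -6, -20]
print(flag_large_residuals(resid, k=1))
```

Because a large outlier also inflates the standard error it is measured against, refinements such as studentized (deleted) residuals are often preferred; this sketch keeps the simplest version.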