General Qualitative Data, and “Dummy Variables” How might we have represented “make-of-car” in the motorpool case, had there been more than just two makes?


1 General Qualitative Data, and “Dummy Variables”
How might we have represented “make-of-car” in the motorpool case, had there been more than just two makes?
– Assume that Make takes four categorical values (Ford, Honda, BMW, and Sterling). Choose one value as the “foundation” case. Create three 0/1 (“yes”/“no”, so-called “dummy”) variables for the other three cases. These three variables jointly represent the four-valued qualitative Make variable.
– We’ll use this representational trick to include “day of game” (Friday, Saturday, or Sunday) in a model that predicts attendance at a professional indoor soccer team’s home games.
– Using this trick requires that we extend the “significance level” (with respect to whether a variable “belongs” in the model) to groups of variables. This is done via “analysis of variance” (ANOVA).
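
The dummy-variable construction on this slide can be sketched in a few lines of Python. This is only an illustration: the slide does not say which make serves as the foundation case, so Ford is assumed here, and the names `MAKES`, `FOUNDATION`, and `dummy_columns` are hypothetical.

```python
# Four categorical values of Make, as listed on the slide.
MAKES = ["Ford", "Honda", "BMW", "Sterling"]

# Assumption: the slide does not name the foundation case; Ford is picked arbitrarily.
FOUNDATION = "Ford"


def dummy_columns(make):
    """Return the three 0/1 dummy variables that encode one value of Make."""
    if make not in MAKES:
        raise ValueError(f"unknown make: {make!r}")
    # One dummy per non-foundation make; the foundation case is all zeros.
    return {f"D_{m}": int(make == m) for m in MAKES if m != FOUNDATION}


for make in MAKES:
    print(make, dummy_columns(make))
```

The foundation case is the one row where every dummy is 0, which is why three columns are enough to represent four values.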

2 Discounts on Car Purchases: Does Salesperson Identity Matter?
Assume there are five salesfolk: Andy, Bob, Chuck, Dave, and Ed. Take one (e.g., Andy) as the foundation case, and add four new “dummy” variables:
D_B = 1 only if Bob, 0 otherwise
D_C = 1 only if Chuck, 0 otherwise
D_D = 1 only if Dave, 0 otherwise
D_E = 1 only if Ed, 0 otherwise
The coefficient of each (in the most-complete model) is the difference between the average discount that salesperson gives a customer and the average discount Andy would give the same customer.

3 Does Salesperson Identity Matter?
Imagine that, after adding the new variables (four new columns of data) to your model, the regression yields:
Discount_pred = 980 + 9.5 × Age – 0.035 × Income + 446 × Sex + 240 × D_B + (–300) × D_C + (–50) × D_D + 370 × D_E
– With similar customers, you’d expect Bob to give a discount $240 higher than would Andy.
– With similar customers, you’d expect Chuck to give a discount $300 lower than would Andy, $540 lower than would Bob, and also lower than would Dave (by $250) and Ed (by $670).
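
The pairwise comparisons on this slide are just differences between dummy coefficients, with Andy (the foundation case) at 0. A minimal sketch, using the coefficients quoted in the regression above; the names `COEF` and `expected_gap` are hypothetical:

```python
# Dummy coefficients from the slide's regression; Andy is the foundation case,
# so his implicit coefficient is 0.
COEF = {"Andy": 0, "Bob": 240, "Chuck": -300, "Dave": -50, "Ed": 370}


def expected_gap(a, b):
    """Expected discount of salesperson a minus that of b, for similar customers."""
    return COEF[a] - COEF[b]


for other in ["Andy", "Bob", "Dave", "Ed"]:
    print(f"Chuck vs {other}: {expected_gap('Chuck', other):+d}")
```

Age, Income, and Sex drop out of every comparison because they are held equal ("similar customers"), so only the dummy coefficients matter.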

4 Does “Salesperson” Interact with “Sex”?
Are some of the salesfolk better at selling to a particular Sex of customer?
– Add D_B, D_C, D_D, D_E, and D_B × Sex, D_C × Sex, D_D × Sex, D_E × Sex to the model.
– Imagine that your regression yields:
Discount_pred = 980 + 9.5 × Age – 0.035 × Income + 446 × Sex + 240 × D_B – 350 × D_C + 75 × D_D + 10 × D_E – 375 × (D_B × Sex) – 150 × (D_C × Sex) – 50 × (D_D × Sex) + 450 × (D_E × Sex)
– Interpret this back in the “conceptual” model:
Discount_pred = 980 + 9.5 × Age – 0.035 × Income + 446 × Sex + (240 – 375 × Sex) × D_B + (–350 – 150 × Sex) × D_C + (75 – 50 × Sex) × D_D + (10 + 450 × Sex) × D_E
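
To make the eight added columns concrete, here is a small sketch of how one row of data could be extended with the four dummies and the four dummy × Sex products. It assumes Sex is coded 0/1 and identifies salesfolk by initial; the names `OTHERS` and `design_columns` are hypothetical.

```python
# Initials of the non-foundation salesfolk (Bob, Chuck, Dave, Ed); Andy is "A".
OTHERS = ["B", "C", "D", "E"]


def design_columns(initial, sex):
    """Build the four dummy columns and four interaction columns for one sale."""
    cols = {}
    for s in OTHERS:
        d = int(initial == s)
        cols[f"D_{s}"] = d
        # The interaction column is simply the product of the dummy and Sex.
        cols[f"D_{s}*Sex"] = d * sex
    return cols


print(design_columns("B", 1))  # a sale by Bob with Sex = 1
print(design_columns("A", 1))  # a sale by Andy: every added column is 0
```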

5 Discount_pred = 980 + 9.5 × Age – 0.035 × Income + 446 × Sex + (240 – 375 × Sex) × D_B + (–350 – 150 × Sex) × D_C + (75 – 50 × Sex) × D_D + (10 + 450 × Sex) × D_E
– Given a male (Sex = 0) customer, you’d expect Bob (D_B = 1) to give a greater discount than Andy, by $240 – $375 × 0 = $240.
– Given a female (Sex = 1) customer, you’d expect Bob to give a smaller discount than Andy: the difference is $240 – $375 × 1 = –$135, i.e., $135 less.
– Chuck has been giving smaller discounts to both men and women than has Andy, and Dave and Ed have been giving larger discounts than Andy to both sexes.
– We could take the same approach to investigate whether “Salesperson” interacts with Age, by also including D_B × Age, D_C × Age, D_D × Age, D_E × Age in our model.
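
The sex-specific comparisons follow directly from the grouped coefficients: each salesperson’s gap relative to Andy is (main coefficient) + (interaction coefficient) × Sex. A minimal sketch using the numbers from the slide, with Sex = 0 for male and Sex = 1 for female as stated above; the names `MAIN`, `INTERACTION`, and `gap_vs_andy` are hypothetical:

```python
# Coefficients of D_B..D_E and of the D × Sex products, from the slide.
MAIN = {"Bob": 240, "Chuck": -350, "Dave": 75, "Ed": 10}
INTERACTION = {"Bob": -375, "Chuck": -150, "Dave": -50, "Ed": 450}


def gap_vs_andy(person, sex):
    """Expected discount relative to Andy for the same customer (sex: 0 male, 1 female)."""
    return MAIN[person] + INTERACTION[person] * sex


for person in MAIN:
    print(person, "male:", gap_vs_andy(person, 0), "female:", gap_vs_andy(person, 1))
```

Chuck’s gap is negative for both values of Sex, while Dave’s and Ed’s gaps are positive for both, which matches the slide’s summary; only Bob’s gap changes sign.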

6 Outliers
An outlier is a sample observation which fails to “fit” with the rest of the sample data. Such observations may distort the results of an entire study.
– Types of outliers (three)
– Identification of outliers (via “model analysis”)
– Dealing with outliers (perhaps yielding a better model)

