Copyright © 2014, 2011 Pearson Education, Inc. 1 Chapter 25 Categorical Explanatory Variables.


1 Chapter 25 Categorical Explanatory Variables

2 25.1 Two-Sample Comparisons
Does Wal-Mart discriminate against female employees? Are they paid less than men?
• Use multiple regression with a categorical explanatory variable representing gender to analyze pay data.
• Regression analysis can adjust the comparison between men and women to account for other variables that may affect pay.

3 25.1 Two-Sample Comparison
Example: Mid-Level Managers’ Salaries
The average salary for women is $140,000 and the average salary for men is $144,700.

4 25.1 Two-Sample Comparison
Example: Mid-Level Managers’ Salaries
The 95% confidence interval for the difference in mean salaries is $740 to $8,590. Since 0 is not in this interval, the difference is statistically significant. Assume the conditions for inference are satisfied.
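The interval logic above can be sketched in a few lines. This is a minimal illustration, not the calculation behind the slide: the salary lists are hypothetical, the function name `diff_in_means_ci` is made up for this example, and a normal quantile stands in for the t quantile a statistics package would use, so the interval is slightly too narrow for small samples.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def diff_in_means_ci(group_a, group_b, level=0.95):
    """Large-sample confidence interval for mean(group_a) - mean(group_b)."""
    diff = mean(group_a) - mean(group_b)
    # Standard error of the difference, allowing unequal variances.
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    # Normal quantile (about 1.96 for 95%); a t quantile would be a bit larger.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff - z * se, diff + z * se

# Hypothetical salaries in thousands of dollars (NOT the managers' data).
men = [150, 138, 146, 152, 141, 149, 144, 139]
women = [142, 137, 145, 136, 143, 139, 140, 138]

low, high = diff_in_means_ci(men, women)
# The difference is statistically significant when 0 lies outside the interval.
significant = not (low <= 0 <= high)
print(f"95% CI: {low:.2f} to {high:.2f}; significant: {significant}")
```

The same check, whether 0 falls inside the interval, is what the slide applies to the $740 to $8,590 interval.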

5 25.1 Two-Sample Comparison
Confounding Variables
• Without a randomized experiment, we must be careful about lurking variables (e.g., experience) that could account for the significant difference between average salaries.
• Experience is a confounding variable if it is correlated with salary and the two groups (men and women) differ with regard to experience.

6 25.1 Two-Sample Comparison
Subsets and Confounding
Restrict the analysis to a subset of cases with matching levels of the confounding variable (e.g., compare men and women with 5 years of experience).

7 25.1 Two-Sample Comparison
Subsets and Confounding
• The 95% confidence interval for the difference in average salaries between men and women within the subset of managers with 5 years of experience includes 0 (the difference is not significant).
• However, the standard error of the difference is much larger in the subset, so the subset does not produce a precise estimate.

8 25.2 Analysis of Covariance
Regression on Subsets
• What about the difference between average salaries for managers with 2, 10, or 15 years of experience?
• Analysis of covariance: a regression that combines categorical and numerical explanatory variables; it adjusts the comparison of means for the effects of confounding variables.

9 25.2 Analysis of Covariance
Regression on Subsets

10 25.2 Analysis of Covariance
Regression on Subsets
Simple regressions fit separately to men and women show that estimated salary rises faster with experience for women than for men.

11 25.2 Analysis of Covariance
Combining Regressions
• Combining the separate regressions for men and women requires a dummy variable identifying whether a manager is male or female (Group = 1 for men; Group = 0 for women).
• It also requires the interaction term Group × Years. An interaction term is the product of two explanatory variables in a regression model.

12 25.2 Analysis of Covariance
Combining Regressions

13 25.2 Analysis of Covariance
Combining Regressions

14 25.2 Analysis of Covariance
Interpreting Coefficients
• The equation for the group coded as 0 in the dummy variable forms a baseline for comparison.
• The slope of the dummy variable is the difference between the estimated intercepts in the simple regressions.
• The slope of the interaction is the difference between the estimated slopes in the simple regressions.
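The relationship between the separate fits and the combined model can be checked numerically. The sketch below uses hypothetical data (not the managers' salaries) and a helper named `simple_fit` invented for this example. It builds the four coefficients of Salary = b0 + b1 Years + b2 Group + b3 Group × Years from the two simple regressions, then confirms they solve the least-squares normal equations of the combined model.

```python
def simple_fit(x, y):
    """Least-squares intercept and slope for one explanatory variable."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data: years of experience and salary (thousands of dollars).
years_women = [2, 5, 8, 12, 15]
salary_women = [128, 135, 141, 150, 158]
years_men = [3, 6, 9, 13, 16]
salary_men = [136, 139, 144, 149, 153]

a_w, b_w = simple_fit(years_women, salary_women)  # baseline group (Group = 0)
a_m, b_m = simple_fit(years_men, salary_men)      # Group = 1

# Coefficients of the combined model, built from the separate fits.
b0, b1 = a_w, b_w
b2 = a_m - a_w  # dummy coefficient: difference between intercepts
b3 = b_m - b_w  # interaction coefficient: difference between slopes

# Design-matrix rows: (constant, Years, Group, Group * Years, Salary).
rows = [(1, yr, 0, 0, s) for yr, s in zip(years_women, salary_women)]
rows += [(1, yr, 1, yr, s) for yr, s in zip(years_men, salary_men)]

# Least squares requires the residuals to be orthogonal to every column.
resid = [s - (b0 * c + b1 * yr + b2 * g + b3 * gy) for c, yr, g, gy, s in rows]
normal_eqs = [sum(r * row[j] for r, row in zip(resid, rows)) for j in range(4)]
print([abs(v) < 1e-6 for v in normal_eqs])
```

All four sums vanish up to rounding error, so the combined regression reproduces the two separate lines exactly; only the standard errors differ, because the combined model pools the residual variance.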

15 25.3 Checking Conditions
• The scatterplot reveals a weak linear association between Salary and Years.
• Some caution is necessary regarding lurking variables (e.g., educational background or business aptitude).

16 25.3 Checking Conditions
Checking for Similar Variances
• Plot the residuals against the fitted values.
• Compare side-by-side boxplots of the residuals for each group. The similar variance condition is violated if the IQR in one boxplot is more than twice the IQR in the other.
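The boxplot rule of thumb can be written as a small check. This is a sketch under stated assumptions: the residuals are hypothetical, the function names are made up for this example, and the quartiles come from the default method of Python's `statistics.quantiles`, which may differ slightly from the quartiles a plotting package draws.

```python
from statistics import quantiles

def iqr(values):
    """Interquartile range using the default quartile method of `statistics`."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

def similar_variances(resid_a, resid_b, max_ratio=2.0):
    """Return (ok, ratio); ok is False when one IQR exceeds max_ratio times the other."""
    iqr_a, iqr_b = iqr(resid_a), iqr(resid_b)
    ratio = max(iqr_a, iqr_b) / min(iqr_a, iqr_b)
    return ratio <= max_ratio, ratio

# Hypothetical residuals for the two groups.
resid_men = [-6.0, -3.5, -1.0, 0.5, 1.5, 3.0, 5.5]
resid_women = [-5.0, -2.5, -1.5, 0.0, 2.0, 2.5, 4.5]

ok, ratio = similar_variances(resid_men, resid_women)
print(f"IQR ratio = {ratio:.2f}; similar variances: {ok}")
```

The check complements the plots rather than replacing them: a residual plot can reveal patterns, such as variance that grows with the fitted values, that a single ratio hides.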

17 25.3 Checking Conditions
Checking for Similar Variances

18 25.3 Checking Conditions
Checking for Similar Variances

19 25.3 Checking Conditions
• The similar variance condition is satisfied.
• Examining the normal quantile plot confirms that the residuals are nearly normal.

20 25.4 Interactions and Inference
• Principle of marginality: if the interaction is statistically significant, retain it as well as both of its components, regardless of their level of significance.
• If the interaction is not statistically significant, remove it from the regression and re-estimate the equation. A model without an interaction term is simpler to interpret because the lines fit to the groups are parallel.

21 25.4 Interactions and Inference
Interactions and Collinearity
An interaction in a multiple regression introduces collinearity (see the large VIF for Group × Years).
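To see why the interaction inflates the VIF, recall that the VIF of an explanatory variable is 1 / (1 − R²), where R² comes from regressing that variable on the other explanatory variables. The sketch below computes this with plain Python on a hypothetical design (not the managers' data); the helpers `solve`, `r_squared`, and `vif` are written for this illustration and solve the normal equations directly, which is adequate only for small, well-conditioned examples.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def r_squared(target, predictors):
    """R^2 from regressing target on the predictors plus an intercept."""
    cols = [[1.0] * len(target)] + predictors
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * t for u, t in zip(ci, target)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(b * col[i] for b, col in zip(beta, cols)) for i in range(len(target))]
    mean_t = sum(target) / len(target)
    sse = sum((t - f) ** 2 for t, f in zip(target, fitted))
    sst = sum((t - mean_t) ** 2 for t in target)
    return 1 - sse / sst

def vif(target, others):
    """Variance inflation factor of `target` given the other explanatory variables."""
    return 1 / (1 - r_squared(target, others))

# Hypothetical design: Group dummy and Years of experience.
group = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
years = [2, 5, 8, 12, 15, 3, 6, 9, 13, 16]
interaction = [g * y for g, y in zip(group, years)]

vif_group_alone = vif(group, [years])                # model without interaction
vif_group_full = vif(group, [years, interaction])    # model with interaction
vif_interaction = vif(interaction, [group, years])
print(round(vif_group_alone, 2), round(vif_group_full, 2), round(vif_interaction, 2))
```

Because Group × Years equals 0 for one group and Years for the other, it is largely predictable from Group and Years, which is what drives its VIF up.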

22 25.4 Interactions and Inference
Interactions and Collinearity
Since the interaction in this example is not significant, remove it and re-estimate the MRM.

23 25.4 Interactions and Inference
Parallel Fits
• The slope for Group estimates the difference between the intercepts for male and female managers.
• The coefficient of the dummy variable (1.024, with salary measured in thousands of dollars) means that the line for men is shifted up from the line for women by $1,024 at all levels of experience.

24 25.4 Interactions and Inference
Parallel Fits

25 25.4 Interactions and Inference
Parallel Fits
• The t-statistic and associated p-value (0.6193) for the slope of Group indicate that it is not statistically significant.
• This model finds no statistically significant difference between the average salaries of male and female managers when comparing managers with equal years of experience.

26 4M Example 25.1: PRIMING IN ADVERTISING
Motivation
FedEx introduced the Courier Pak using two waves of promotion: an ad to raise awareness (i.e., priming) and a visit to existing clients by a sales rep. Management has two questions: (1) How many shipments were generated by a typical one-hour contact by the sales rep? and (2) Was the promotion more effective for clients who were already aware of the Courier Pak?

27 4M Example 25.1: PRIMING IN ADVERTISING
Method
Based on data from 125 customers, fit a multiple regression with a categorical variable. The response is the number of shipments using Courier Pak. The explanatory variables are the amount of time spent with the client by a sales rep and a dummy variable indicating whether or not the client was aware of the Courier Pak. The interaction between the explanatory variables is included.

28 4M Example 25.1: PRIMING IN ADVERTISING
Method
Scatterplot with lines fit separately for each group (clients aware of Courier Pak shown in green).

29 4M Example 25.1: PRIMING IN ADVERTISING
Method
The association within each group appears linear. The scatterplot suggests an interaction because the slopes appear different. The interaction indicates whether prior awareness of Courier Paks affects how the sales rep's visit influences the client.

30 4M Example 25.1: PRIMING IN ADVERTISING
Mechanics: Estimate Model

31 4M Example 25.1: PRIMING IN ADVERTISING
Mechanics: Check Conditions
Nothing in the plots suggests dependence. The similar variance condition is satisfied.

32 4M Example 25.1: PRIMING IN ADVERTISING
Mechanics: Check Conditions
Similar variances confirmed.

33 4M Example 25.1: PRIMING IN ADVERTISING
Mechanics: Check Conditions
The nearly normal condition is satisfied.

34 4M Example 25.1: PRIMING IN ADVERTISING
Mechanics
Based on the F-statistic, we can conclude that the model explains statistically significant variation. The interaction between awareness and hours of contact is statistically significant. Following the principle of marginality, we retain Aware in the model. The interaction implies that the gap between the lines widens as the number of contact hours increases.

35 4M Example 25.1: PRIMING IN ADVERTISING
Message
Priming produces a statistically significant increase in the subsequent use of Courier Paks when followed by a visit from a sales rep. Each additional hour of contact with a sales rep produces about 4.3 more uses of the Courier Paks with priming than without priming.

36 25.5 Regression with Several Groups
Example: Estimating Store Sales
• The explanatory variables are median household income in the surrounding community, size of the local population, and market (urban, suburban, rural).
• The response is sales in dollars per square foot.

37 25.5 Regression with Several Groups
Scatterplot Matrix (rural: red; suburban: green; urban: blue)
The association within each group appears linear.

38 25.5 Regression with Several Groups
Example: Estimating Store Sales
• In general, distinguishing J groups requires J − 1 dummy variables.
• For this example, use two dummy variables:
  Suburban Dummy = 1 if suburban, 0 otherwise
  Urban Dummy = 1 if urban, 0 otherwise
Note that rural locations are coded (0, 0).
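The J − 1 coding can be sketched as follows. The function name `dummy_code` and the sample labels are made up for this illustration; the baseline level gets no column of its own, so it is represented by zeros in every dummy.

```python
def dummy_code(labels, baseline):
    """Return {level: 0/1 column} for every level except the baseline."""
    levels = sorted(set(labels) - {baseline})
    return {lev: [1 if lab == lev else 0 for lab in labels] for lev in levels}

# Hypothetical market labels for five stores.
markets = ["rural", "urban", "suburban", "rural", "urban"]
dummies = dummy_code(markets, baseline="rural")
print(dummies)
# {'suburban': [0, 0, 1, 0, 0], 'urban': [0, 1, 0, 0, 1]}
```

Adding a third dummy for rural stores would make the three dummies sum to the constant column, so the regression could not separate the intercept from the group effects.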

39 25.5 Regression with Several Groups
Example: Estimating Store Sales

40 25.5 Regression with Several Groups
Example: Estimating Store Sales
• The interpretation of the estimates is similar to the interpretation of models with two groups.
• Coefficients associated with dummy variables reflect differences between stores in other locations and rural stores.

41 25.5 Regression with Several Groups
Estimating Sales for Rural Stores
The estimated equation for the baseline group (stores in a rural location) is
Estimated Sales ($/SqFt) = -388.6992 + 0.0097 Income + 0.2401 Population

42 25.5 Regression with Several Groups
Estimating Sales for Urban Stores
Consider stores in an urban location. The estimated sales are given by
Estimated Sales ($/SqFt) = (-388.6992 + 468.8654) + (0.0097 - 0.0053) Income + 0.2401 Population
Estimated Sales ($/SqFt) = 80.1662 + 0.0044 Income + 0.2401 Population
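The arithmetic that turns the baseline equation into the urban equation can be reproduced with a short function. The coefficients are the estimates quoted on the slides; the function name, the constant names, and the input values are hypothetical. The suburban adjustments are omitted because the transcript does not report them.

```python
# Estimates quoted on the slides: rural baseline plus urban adjustments.
INTERCEPT = -388.6992
B_INCOME = 0.0097
B_POPULATION = 0.2401
B_URBAN = 468.8654         # shift in the intercept for urban stores
B_URBAN_INCOME = -0.0053   # change in the Income slope for urban stores

def estimated_sales(income, population, urban):
    """Estimated sales ($/SqFt) for a rural (urban=0) or urban (urban=1) store."""
    return (INTERCEPT + B_URBAN * urban
            + (B_INCOME + B_URBAN_INCOME * urban) * income
            + B_POPULATION * population)

# Hypothetical inputs, chosen only to exercise the function.
income, population = 60_000, 1_500
rural = estimated_sales(income, population, urban=0)
urban = estimated_sales(income, population, urban=1)

# The gap depends on Income (interaction) but not on Population.
print(round(urban - rural, 4))
```

Setting the dummy to 1 adds 468.8654 to the intercept and −0.0053 to the Income slope, which gives the urban intercept 80.1662 and slope 0.0044 shown above.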

43 25.5 Regression with Several Groups
Interpretation of Results
• Sales at a given income are higher in urban stores than in rural stores (for incomes below about $88,500, where the fitted lines cross), but do not grow as fast with increases in income.
• Population has the same effect in every location because the model does not include an interaction term between Population and the dummy variables for location.

44 Best Practices
• Be thorough in your search for confounding variables.
• Consider interactions.
• Choose an appropriate baseline group.
• Write out the fits for separate groups.

45 Best Practices (Continued)
• Be careful interpreting the coefficient of the dummy variable.
• Check for comparable variances in the groups.
• Use color-coding or different plot symbols to identify subsets of observations in plots.

46 Pitfalls
• Don’t use too many dummy variables.
• Don’t confuse interaction with correlation.
• Don’t think that you have adjusted for all of the confounding factors.
• Don’t confuse the different types of slopes.
• Don’t forget to check the conditions of the MRM.

