
1 Using alternative reference categories to test statistical significance of an interaction
This podcast is the last in the series on testing statistical significance of interactions. It assumes familiarity with the concepts in the podcasts on specifying models to test for interactions, calculating the overall shape of an interaction from regression coefficients, introduction to testing statistical significance of interactions, and approaches to testing statistical significance of interactions.

Please also be familiar with the podcast on using simple slopes for compound coefficients, because this lecture builds on concepts from that talk. Also review the podcast on choosing a reference category. It will help you decide upon your preferred “base” specification: the version for which you will present a table of detailed coefficients, standard errors, and model goodness-of-fit (GOF) statistics.
Jane E. Miller, PhD
The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition.

2 Overview
- Reasons for estimating models with alternate reference categories
- Review: confidence intervals from simple slopes calculations for compound coefficients
- Contrasts that can be assessed for statistical significance
  - with the “base” specification
  - with a different reference category
- Suggestions for presenting results of alternate specifications

To set the scene, I will start with an example of the substantive question behind an interaction model. This shows the set of contrasts we want to address using inferential statistics. I will then review which contrasts can be formally tested using post-hoc calculations of simple slopes for compound coefficients. Next, I will illustrate the additional contrasts that can be formally tested by altering the choice of a reference category within the interaction specification. I will close with a brief discussion of how to present the results of these calculations succinctly and clearly.

3 Substantive question behind the interaction model: “Does race modify the association between education and birth weight?”
Two questions involved in the interaction:
- Are differences across racial/ethnic groups within each education level statistically significant? (Within cluster, across bar colors)
- Are differences across education levels within each racial/ethnic group statistically significant? (Within bar color, across clusters)

To help understand why we need alternate specifications of our interaction model, let’s step back to the substantive question we’ve been tracing throughout the podcasts on multivariate models and interactions. We’ve been testing an interaction between two independent variables, race/ethnicity and mother’s education, on the dependent variable, birth weight. Two sets of questions related to statistical significance come out of that overall topic. First [read bullets; gesture within and across clusters].

4 Why change the reference category?
- Standard errors calculated using the simple slopes technique allow formal inferential statistical testing of each of the other groups:
  - only against the reference category
  - not against one another
- Comparisons among the other groups are important for characterizing how the two independent variables involved in an interaction relate to the dependent variable
  - e.g., how race and educational attainment together relate to birth weight

However, the standard errors calculated using the simple slopes technique allow formal inferential testing of each of the other groups only against the reference category, not against one another. For instance, with > HS, non-Hispanic white as the reference category, one cannot assess the statistical significance of two kinds of differences:
- differences in predicted birth weight across racial/ethnic groups within the < HS group
- differences across education levels among non-Hispanic blacks

Such comparisons are highly salient when trying to characterize how race and educational attainment together relate to birth weight.

5 Model Specification A with interactions: race and education
BW = f (race, education, race_education)
- Birth weight is a function of race, education, and the race-by-education interaction
- Also includes controls for other demographic factors

To specify the model, we need ALL of the main effects and interaction term variables related to race and mother’s education:
BW = f (NHB, <HS, =HS, NHB_<HS, NHB_=HS)
In this specification, the omitted (reference) category is non-Hispanic white infants born to mothers with > HS.

As a review, we specify the model with main effects terms for race and education (shown in yellow) and interaction terms (shown in green). This is the same color coding I have used throughout the series of podcasts on interactions.

For the specification used in the related podcasts, we chose non-Hispanic white infants born to mothers with more than a high school education as our reference category. So the omitted category for race/ethnicity is NHW, and the omitted category for educational attainment is > HS. I will refer to this as “Specification A”. See the podcast on model misspecification errors for interactions for why we start with a model that includes all of these main effects and interaction terms.
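The dummy coding behind Specification A can be sketched in Python. The category labels (NHW, MexAm, NHB; <HS, =HS, >HS) and the function name `code_dummies` are hypothetical, not taken from the podcast's data. The sketch codes every non-reference category, so with three racial/ethnic groups it also creates Mexican American main effect and interaction terms. The formula above lists those terms only for NHB, so treat their inclusion as an assumption. Each interaction dummy is the product of the corresponding race and education dummies, and a record in the reference category gets zeros everywhere.

```python
# Hypothetical category labels; the podcast does not show its actual variable names.
RACES = ["NHW", "MexAm", "NHB"]
EDUCS = ["<HS", "=HS", ">HS"]


def code_dummies(race, educ, ref_race="NHW", ref_educ=">HS"):
    """Return 0/1 main-effect and interaction dummies for one record.

    Defaults reproduce Specification A (reference: NHW, > HS). Every
    non-reference category gets a main-effect dummy, and each pair of
    non-reference race and education categories gets an interaction dummy.
    """
    if race not in RACES or educ not in EDUCS:
        raise ValueError(f"unknown category: {race!r}, {educ!r}")
    dummies = {}
    for r in RACES:
        if r != ref_race:
            dummies[r] = int(race == r)
    for e in EDUCS:
        if e != ref_educ:
            dummies[e] = int(educ == e)
    for r in RACES:
        for e in EDUCS:
            if r != ref_race and e != ref_educ:
                # interaction term = product of the two main-effect dummies
                dummies[f"{r}_{e}"] = dummies[r] * dummies[e]
    return dummies


print(code_dummies("NHB", "<HS"))  # NHB, <HS, and NHB_<HS are 1; all others 0
print(code_dummies("NHW", ">HS"))  # reference category: all zeros
```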

6 Which contrasts can be tested with Specification A?
“Is the difference in birth weight for the group shown statistically significantly lower than for NHW & > HS?”
Reference category = Non-Hispanic white & > HS

As discussed in the podcast on simple slopes, we can conduct formal tests of statistical significance of each group ONLY against the reference category. In Specification A, that means each of the other race/education combinations can be compared only against NHW, > HS, not against one another.

Here I have drawn a red horizontal reference line at the predicted mean birth weight for that group, which is the estimated constant term for the overall model (β0). The red brackets indicate the difference between birth weight for each of the other groups and that value. These are SOME but not all of the substantively important contrasts for this interaction pattern. To formally test other contrasts, we will need to use a different reference category.

7 Which contrasts are possible with non-Hispanic white, > HS as the reference category?
Predicted birth weight (grams), by mother’s education and race/ethnicity, United States, 1988–1994 NHANES III

                       Mother’s educational attainment
Race/ethnicity         < HS      = HS      > HS
Non-Hispanic white     3,044     3,112     [not shown]
Mexican American       3,027     3,031     3,003
Non-Hispanic black     2,820     2,883     2,937

- The circled cell is the reference category (non-Hispanic white, mother’s education > HS)
- Yellow-shaded cells can be compared to the reference category based on the standard errors of the associated main effects terms alone
- Green-shaded cells can be compared to the reference category using standard errors calculated using the simple slope for a compound coefficient

To help understand which contrasts are possible with that choice of reference category, here is a different illustration. This grid has one row for each racial/ethnic group and one column for each education category. Circled in red is predicted birth weight for the reference category, which for Specification A is NHW, > HS.

The difference in BW between that group and the groups shaded in yellow can be tested using the standard errors on the pertinent main effects terms, which are included in the standard output from a regression model. Those shaded in green must be tested using standard errors calculated using simple slopes for the compound coefficients. These are computed from one main effect and the interaction term related to the group at hand.

Differences between two yellow-shaded groups, between two green-shaded groups, or between one yellow- and one green-shaded group CANNOT be formally tested using the standard errors generated by Specification A.
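The shading rule on this slide can be written down directly:
- a cell that differs from the reference cell on only one of the two IVs (same row or same column) is yellow;
- a cell that differs on both IVs is green.

The Python sketch below applies that rule for any choice of reference cell. `shade_grid` is a hypothetical helper, not something from the podcast. It can also be used to lay out the grid asked for in the suggested exercise.

```python
# Hypothetical labels matching the rows and columns of the grid.
RACES = ["Non-Hispanic white", "Mexican American", "Non-Hispanic black"]
EDUCS = ["< HS", "= HS", "> HS"]


def shade_grid(ref_race, ref_educ):
    """Return {(race, educ): shade} for one choice of reference cell."""
    shades = {}
    for race in RACES:
        for educ in EDUCS:
            if (race, educ) == (ref_race, ref_educ):
                shades[(race, educ)] = "reference"
            elif race == ref_race or educ == ref_educ:
                # differs from the reference on one IV only: one main-effect term
                shades[(race, educ)] = "yellow"
            else:
                # differs on both IVs: needs a compound coefficient (simple slopes SE)
                shades[(race, educ)] = "green"
    return shades


grid = shade_grid("Non-Hispanic white", "> HS")  # Specification A
print(f"{'':<20}" + "".join(f"{e:>11}" for e in EDUCS))
for race in RACES:
    print(f"{race:<20}" + "".join(f"{grid[(race, e)]:>11}" for e in EDUCS))
```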

8 Confidence intervals around estimates from Specification A – from simple slopes
Reference category = Non-Hispanic white & > HS
95% confidence intervals shown in pink

This chart shows the predicted difference in birth weight using the approach explained in the podcast on calculating the overall shape of an interaction based on the regression coefficients. The pink vertical lines are the 95% confidence intervals calculated using simple slopes for the compound coefficients based on Specification A. In red is the reference line at Y = 0, meaning no difference between the reference category (NHW, > HS) and the other groups.

Groups whose pink range does not overlap the red line have birth weight statistically significantly different from that of NHW, > HS. Examples are:
- NHB at all education levels
- NHW with < HS or = HS
- Mexican Americans with > HS

For Mexican Americans at the two lower education levels, the difference is not statistically significant.

However, we cannot formally test other relevant contrasts, such as differences in birth weight across education levels within the NHB group. For that, we need to change the reference category for race.
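The confidence intervals on this chart come from the simple slopes calculation. The standard error of a sum of coefficients is the square root of the sum of their variances and all pairwise covariances, taken from the model's variance-covariance matrix. For two terms, such as a main effect b1 and an interaction term b3, that is sqrt(var(b1) + var(b3) + 2 cov(b1, b3)).

The Python sketch below is illustrative only: the function names and all coefficient and covariance values are hypothetical. It uses the normal critical value 1.96 for a 95% interval; with small samples a t critical value would be slightly larger. A CI that excludes 0 corresponds to a difference that is statistically significant at p < 0.05 (two-sided).

```python
import math


def compound_se(terms, vcov):
    """Standard error of the sum of the coefficients named in `terms`.

    Var(sum) is the sum of every entry of the variance-covariance submatrix for
    those terms; for two terms this is var(b1) + var(b2) + 2 * cov(b1, b2).
    """
    variance = sum(vcov[a][b] for a in terms for b in terms)
    return math.sqrt(variance)


def compound_ci(terms, coefs, vcov, z=1.96):
    """Point estimate, SE, and normal-approximation CI for a compound coefficient."""
    estimate = sum(coefs[t] for t in terms)
    se = compound_se(terms, vcov)
    return estimate, se, (estimate - z * se, estimate + z * se)


# Hypothetical estimates (grams) and variance-covariance entries, for illustration only.
coefs = {"<HS": -60.0, "NHB_<HS": -40.0}
vcov = {
    "<HS": {"<HS": 400.0, "NHB_<HS": -150.0},
    "NHB_<HS": {"<HS": -150.0, "NHB_<HS": 900.0},
}

est, se, (lo, hi) = compound_ci(["<HS", "NHB_<HS"], coefs, vcov)
significant = not (lo <= 0.0 <= hi)  # CI excludes zero -> p < 0.05, two-sided
print(f"estimate={est:.1f}  SE={se:.1f}  95% CI=({lo:.1f}, {hi:.1f})  significant={significant}")
```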

9 Model Specification B
- Still testing the same substantive question about birth weight as a function of race and education, including interactions between race and education
- Change the reference category to NHB & < HS
  - Had been: BW = f (NHB, <HS, =HS, NHB_<HS, NHB_=HS)
  - Specification B: BW = f (NHW, =HS, >HS, NHW_=HS, NHW_>HS)
- Requires a different set of dummy variables
  - Drop main effect and interaction terms related to NHB and to <HS from the original specification
  - Add main effects and interaction terms related to NHW and >HS in their place
- Retain the same set of controls for other demographic factors

To test differences across education levels among NHB and across races within < HS, we revise our specification. NHB becomes the omitted category for race, and < HS the omitted category for educational attainment. We will call this Specification B. It requires a different set of dummy variables in the model [show and explain animation].

Other than the new reference categories for race and education, Specification B is identical to Specification A in two respects:
- the other independent variables in the model
- the fact that we are testing interactions between race and education
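Switching reference categories only changes which dummies enter the model. The Python sketch below lists the terms for a given reference cell and compares Specifications A and B; `model_terms` is a hypothetical helper, and the category labels are assumptions. Like any full coding of three racial/ethnic groups, it includes Mexican American terms, which the formulas on the slide do not show. Both specifications end up with the same number of terms.

```python
from itertools import product

# Hypothetical category labels for the two IVs in the interaction.
RACES = ["NHW", "MexAm", "NHB"]
EDUCS = ["<HS", "=HS", ">HS"]


def model_terms(ref_race, ref_educ):
    """Main effects and interaction terms when (ref_race, ref_educ) is omitted."""
    races = [r for r in RACES if r != ref_race]
    educs = [e for e in EDUCS if e != ref_educ]
    interactions = [f"{r}_{e}" for r, e in product(races, educs)]
    return races + educs + interactions


spec_a = model_terms("NHW", ">HS")
spec_b = model_terms("NHB", "<HS")
dropped = [t for t in spec_a if t not in spec_b]  # terms removed moving from A to B
added = [t for t in spec_b if t not in spec_a]    # terms added in their place

print("Specification A:", spec_a)
print("Specification B:", spec_b)
print("Dropped:", dropped)
print("Added:", added)
```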

10 Which contrasts can be formally tested with Specification B?
Reference category = Non-Hispanic black & < HS
“Is the difference in birth weight for the group shown statistically significantly different from that for NHB & < HS?”

With Specification B, we can test a different set of contrasts. Again, I have drawn a red horizontal reference line at the birth weight value for that group. The red brackets indicate the difference between birth weight for each of the other groups and that value. These are the contrasts that can be formally tested for statistical significance using the standard errors from simple slopes based on the results of Specification B.

11 Which contrasts are possible with non-Hispanic black, < HS as the reference category?
Predicted birth weight (grams), by mother’s education and race/ethnicity, United States, 1988–1994 NHANES III

                       Mother’s educational attainment
Race/ethnicity         < HS      = HS      > HS
Non-Hispanic white     3,044     3,112     [not shown]
Mexican American       3,027     3,031     3,003
Non-Hispanic black     2,820     2,883     2,937

- The circled cell is the reference category (non-Hispanic black, mother’s education < HS)
- Yellow-shaded cells can be compared to the reference category based on the standard errors of the associated main effects terms alone
- Green-shaded cells can be compared to the reference category using standard errors calculated using the simple slope for a compound coefficient

Here is a revised version of the grid we saw earlier, to help understand which contrasts are possible with the revised choice of reference category. The layout is the same as before, with one row for each racial/ethnic group and one column for each education category. Circled in red is predicted birth weight for the reference category, which for Specification B is NHB, < HS.

The difference in BW between that group and the groups shaded in yellow can be tested using only the standard errors on the pertinent main effects terms. Those shaded in green must be tested using standard errors calculated using simple slopes for the compound coefficients, computed from one main effect and one interaction term.

As before, differences between two yellow-shaded groups, two green-shaded groups, or one yellow- and one green-shaded group CANNOT be formally tested using the standard errors generated by Specification B.

12 Model Specification C
- Again, testing the same substantive question about birth weight as a function of race and education, including interactions between race and education
- Change the reference category to NHB & = HS
  - Specification C: BW = f (NHW, <HS, >HS, NHW_<HS, NHW_>HS)
- Requires yet a different set of dummy variables than Specification B
  - Replace the main effect and interaction terms related to =HS from Specification B with main effects and interaction terms related to <HS
- Retain the same set of controls for other demographic factors

Quickly, here is a specification we might use to test differences across racial/ethnic groups within the “= HS” group. We keep NHB as the reference category for race. Again, this requires a change in the set of dummy variables for main effects and interactions. The dummy for < HS replaces the one for = HS, and a dummy for the NHW by < HS interaction replaces NHW_= HS.

13 Which contrasts can be formally tested with Specification C?
“Is the difference in birth weight for the group shown statistically significantly different from that for NHB & = HS?”
Reference category = Non-Hispanic black & = HS

Here is the modified graph to visually depict the set of contrasts from this specification.

14 Which contrasts are possible with non-Hispanic black, = HS as the reference category?
Predicted birth weight (grams), by mother’s education and race/ethnicity, United States, 1988–1994 NHANES III

                       Mother’s educational attainment
Race/ethnicity         < HS      = HS      > HS
Non-Hispanic white     3,044     3,112     [not shown]
Mexican American       3,027     3,031     3,003
Non-Hispanic black     2,820     2,883     2,937

- The circled cell is the reference category (non-Hispanic black, mother’s education = HS)
- Yellow-shaded cells can be compared to the reference category based on the standard errors of the associated main effects terms alone
- Green-shaded cells can be compared to the reference category using standard errors calculated using the simple slope for a compound coefficient

Here is a revised version of the grid we saw earlier, to help understand which contrasts are possible with the revised choice of reference category. Again, the reference category is circled in red; for Specification C it is NHB, = HS. This version lets us formally test differences within the = HS group across racial/ethnic groups, a set of contrasts we could not test with either of the other two specifications.

15 Similarities across models with different reference categories
Alternate specifications with different reference categories produce identical results in terms of:
- the overall direction and magnitude of differences in the DV across groups
- overall model fit
  - F-statistic
  - R² statistic

You might be wondering whether the choice of reference category affects the shape or size of the pattern among the two IVs and the DV. It does NOT. When done correctly, alternate specifications with different reference categories give identical results for:
- the overall direction and magnitude of differences across groups
- overall model fit (F-statistic, R² statistic)

In other words, you can use any of them to obtain the coefficients you need to calculate the overall shape of the association between race/ethnicity, education, and birth weight. For a review, see the podcast on calculating the overall shape of an interaction pattern from regression coefficients. You will also arrive at the same conclusions about overall model GOF regardless of which version you use.

16 Differences across models with different reference categories
Alternate specifications with different reference categories differ in terms of:
- which contrasts can be formally evaluated for statistical significance, based on the estimated standard errors for each β̂i and the associated variance-covariance matrix
- the estimated constant, which depends on the choice of omitted (reference) category
  - The constant term will equal the predicted value of the DV for that category, with all other IVs in the model = 0
  - This is true whether the model includes only main effects dummies or both main effects and interaction terms

The key benefit of the alternative specifications is that each lets you test the statistical significance of a different set of contrasts. These contrasts are across groups defined by the two independent variables involved in the interaction. The tests rely on the estimated standard errors for each β̂i and the associated variance-covariance matrix.

The constant term will change across models, reflecting the predicted value of the dependent variable for cases in the reference category. However, combined with the coefficients on the main effect and interaction terms, it yields the same predicted values of the DV for all groups, regardless of the choice of reference category.
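The last point can be checked with a little algebra. Take a saturated race-by-education model with no control variables, or equivalently predictions with the controls held at 0. Its coefficients break down as follows:
- the constant is the reference cell's predicted value;
- each main effect is a difference from that cell along one IV;
- each interaction term is the remaining difference-in-differences.

The Python sketch below is illustrative and skips estimation entirely. The cell values are hypothetical round numbers, not the NHANES III estimates, and `coefficients` and `predict` are hypothetical helpers. It derives the coefficients for Specifications A and B from the same predicted values. It then confirms that the constants differ while every cell's prediction is identical.

```python
RACES = ["NHW", "MexAm", "NHB"]
EDUCS = ["<HS", "=HS", ">HS"]

# Hypothetical predicted birth weights (grams) for each (race, education) cell.
means = {
    ("NHW", "<HS"): 3000, ("NHW", "=HS"): 3100, ("NHW", ">HS"): 3200,
    ("MexAm", "<HS"): 3000, ("MexAm", "=HS"): 3050, ("MexAm", ">HS"): 3000,
    ("NHB", "<HS"): 2800, ("NHB", "=HS"): 2900, ("NHB", ">HS"): 2950,
}


def coefficients(cells, ref_race, ref_educ):
    """Constant, main effects, and interactions implied by the cell values."""
    base = cells[(ref_race, ref_educ)]
    coefs = {"const": base}  # constant = predicted value for the reference cell
    for r in RACES:
        if r != ref_race:
            coefs[r] = cells[(r, ref_educ)] - base
    for e in EDUCS:
        if e != ref_educ:
            coefs[e] = cells[(ref_race, e)] - base
    for r in RACES:
        for e in EDUCS:
            if r != ref_race and e != ref_educ:
                # difference-in-differences relative to the reference row and column
                coefs[f"{r}_{e}"] = cells[(r, e)] - cells[(r, ref_educ)] - cells[(ref_race, e)] + base
    return coefs


def predict(coefs, race, educ):
    """Constant plus whichever main effects and interaction apply (absent terms count as 0)."""
    return (coefs["const"] + coefs.get(race, 0) + coefs.get(educ, 0)
            + coefs.get(f"{race}_{educ}", 0))


spec_a = coefficients(means, "NHW", ">HS")
spec_b = coefficients(means, "NHB", "<HS")
print("Constants:", spec_a["const"], spec_b["const"])
for r in RACES:
    for e in EDUCS:
        assert predict(spec_a, r, e) == predict(spec_b, r, e) == means[(r, e)]
print("Both specifications reproduce every cell's predicted value.")
```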

17 Presenting results of statistical tests from alternative specifications
No one set of is and associated standard errors (from any one specification) will allow readers to conduct inferential tests for all possible contrasts Estimate the models and conduct post-hoc tests for your substantive hypotheses behind the scenes Do NOT present the is, standard errors, and simple slopes calculations for each of the alternative specifications! Create a table of detailed results from your “base” specification Supplement it with a table or chart reporting the predicted values of the dependent variable for pertinent values of the IVs in the interaction, calculated from the is Use symbols to denote which values are statistically significantly different from one another Once you have estimated the alternative specifications, you might be wondering how to present all of those results. You clearly do not want to present coefficients and standard errors from ALL of those specifications. Your audience will lose the forest for the trees, and the alternative specifications (if done correctly) all yield the same overall shape of the pattern among the IVs and DV No one set of ^is and associated standard errors (from any one specification) will allow readers to conduct inferential tests for all possible contrasts, creating an awkward situation for an author seeking to communicate the results efficiently. Decide on the “base” specification and present detailed results from that model in a table. The podcast on choosing a reference category will help you identify pertinent theoretical and empirical criteria to decide upon your preferred “base” specification, the version for which you will present a table of detailed coefficients, standard errors, and model GOF statistics. Then create a table or chart that conveys the overall pattern of the association among the IVs and DV, using symbols to denote which values are statistically significant different from one another.

18 Chart of results of post-hoc testing with alternative reference categories
[Chart: results of post-hoc testing with alternative reference categories, with significance symbols marking individual groups]
* denotes statistically significantly different at p < 0.05 from non-Hispanic white > HS
† denotes statistically significantly different at p < 0.05 from non-Hispanic black < HS
£ denotes statistically significantly different at p < 0.05 from non-Hispanic black = HS
¥ denotes statistically significantly different at p < 0.05 from Mexican American = HS

Here is an example of such a chart. It would be accompanied by a prose description of the direction, magnitude, and statistical significance of the overall interaction pattern.
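Once the post-hoc tests have been run, assembling the chart symbols is bookkeeping. The Python sketch below assumes the p-values have already been collected from the specifications whose reference categories match the legend. The helper name, labels, and p-values are hypothetical. Symbols are concatenated in legend order, so a group that differs from both non-Hispanic white > HS and non-Hispanic black < HS is marked "*†".

```python
# Legend: reference cell of the specification used for the test -> symbol.
SYMBOLS = {
    ("NHW", ">HS"): "*",
    ("NHB", "<HS"): "†",
    ("NHB", "=HS"): "£",
    ("MexAm", "=HS"): "¥",
}


def significance_marks(p_values, alpha=0.05):
    """Map each cell to its chart label.

    p_values maps (cell, reference_cell) -> p-value from the specification that
    omits reference_cell. Cells with no significant contrasts are left out.
    """
    cells = {cell for cell, _ in p_values}
    marks = {}
    for cell in cells:
        label = "".join(symbol for ref, symbol in SYMBOLS.items()
                        if p_values.get((cell, ref), 1.0) < alpha)
        if label:
            marks[cell] = label
    return marks


# Hypothetical p-values, for illustration only.
p_values = {
    (("NHB", "=HS"), ("NHW", ">HS")): 0.001,
    (("NHB", "=HS"), ("NHB", "<HS")): 0.030,
    (("MexAm", "=HS"), ("NHB", "=HS")): 0.200,
}
print(significance_marks(p_values))
```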

19 Summary
- By estimating a series of models with different reference categories, you can test the statistical significance of different substantively important contrasts among groups defined by the interaction
  - Use the simple slope technique to calculate standard errors for compound coefficients involved in the interaction
- The overall specification must be otherwise identical:
  - same other control variables
  - same amount of detail in which main effects and interactions are conceptually tested
  - same analytic sample
- Conduct the post-hoc tests behind the scenes and communicate the results with a chart of the overall pattern, with symbols for statistical significance

20 Suggested resources
Cohen, Jacob, Patricia Cohen, Stephen G. West, and Leona S. Aiken. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, 3rd Edition. Florence, KY: Routledge, chapters 8 and 9.
Figueiras, Adolfo, Jose Maria Domenech-Massons, and Carmen Cadarso. “Regression Models: Calculating the Confidence Interval of Effects in the Presence of Interactions.” Statistics in Medicine 17: 2099–2105.
Miller, Jane E. The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition. University of Chicago Press, chapters 5 (tables), 6 (charts), and 16.

21 Suggested online resources
Podcasts on:
- Introduction to interactions
- Choosing a reference category
- Specifying a model to test for interactions
- Calculating the overall shape of an interaction from regression coefficients
- Introduction to testing statistical significance of interactions
- Approaches to testing statistical significance of interactions
- Conducting post-hoc tests of compound coefficients using simple slopes

As mentioned at the outset of this podcast, the material covered here is best understood as part of a set of techniques related to statistical significance testing and to interactions. Please review these other podcasts if any of the steps are confusing to you.

22 Suggested exercise
- Estimate an OLS model with an interaction between a three-category independent variable (IV1) and a two-category independent variable (IV2)
- Create a grid with a row for each category of IV1 and a column for each category of IV2
  - Shade the cells to show which contrasts can be formally tested for statistical significance based on this specification:
    - standard errors for main effects terms
    - standard errors calculated for compound coefficients using the simple slopes technique
  - Fill the values of the compound coefficients and standard errors into each cell
- Create a chart with 95% confidence intervals around each compound coefficient

The online study guide for WAMA II does not include problem set questions or suggested course extensions related to the concepts and skills in this podcast. For those of you who want to practice applying these skills, [read next three slides].

23 Suggested exercise, cont.
- Respecify the model, changing the reference category for the three-category IV involved in the interaction
- Create a grid to display compound coefficients and standard errors from Specification 2
- Create a chart with 95% confidence intervals around each compound coefficient based on the results of Specification 2
- Comment on which tests of statistical significance are possible with each of the two model specifications
- Describe which differences are statistically significant

24 Suggested exercise, cont.
- Calculate the predicted value of the dependent variable (including the intercept term) separately for:
  - Specification 1
  - Specification 2
- Compare Specifications 1 and 2 in terms of:
  - their GOF statistics
  - the overall shape of the interaction pattern
- Create a chart to summarize the findings of your post-hoc testing
- Write a 1–2 paragraph description of the direction, magnitude, and statistical significance of the interaction between IV1 and IV2

25 Contact information Jane E. Miller, PhD Online materials available at The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition.

