

Introduction to interactions in regression models: Concepts and equations
Jane E. Miller, PhD

Interactions in regression models occur when the association between one independent variable and the dependent variable differs depending on values of a second independent variable. The example we will trace in this series of lectures investigates whether the association between socioeconomic status and birthweight is the same for all racial/ethnic groups, where birthweight is our dependent variable (or outcome), and race and socioeconomic status are our independent variables (or predictors). Later in this lecture, I will mention a couple of other examples of questions that can be addressed using an interaction specification in a regression model.

Overview
What is an interaction?
  Definitions
  Synonyms
Model specifications: equations for
  Main-effects-only models
  Models with interactions
Illustrative charts

In this module, we will cover the basic conceptual definition of an interaction, learn some synonyms, and see how to specify models with interactions as contrasted with those that include only main effects of the constituent variables. I will then illustrate how a model with interactions affects the implied shape of the association between the two IVs and the DV.

What is an interaction?
The association between one independent variable (X1) and the dependent variable (Y) differs depending on the value of a second independent variable (X2).
Can be thought of as an exception to a general pattern: X1 is associated with Y in one way when X2 = 1, but in a different way when X2 = 2.
X1 is sometimes termed the “focal predictor.”
X2 is referred to as the “modifier” or “modifying variable.”

In the most general terms, we say that an interaction occurs when the level or shape of an association between one IV and the DV differs depending on the value of a second IV. One way to think about this is that one IV (which we refer to as X1) is associated with the DV (called Y) in one way when a second IV (called X2) is equal to 1, and in a different way when that second IV is equal to 2. The association between X1 and Y could take on a different direction depending on the value of X2, or that association could vary in size depending on the value of X2.

Statistical interactions defined
An interaction occurs when X1 and X2 not only potentially have separate effects on Y, but also have a joint effect that is different from the simple sum of their respective individual effects.
The association between X1 and Y is conditional on X2.
The specific combinations of values of X1 and X2 determine the value of Y.

Put differently, an interaction occurs when two independent variables, X1 and X2, not only each have separate effects on Y, but also have a joint effect that differs from the simple sum of their individual effects.
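As a small numeric sketch of the "joint effect differs from the simple sum" idea, consider two 0/1 variables and the mean of Y in each of the four cells. The cell means below are made up for illustration and do not come from the lecture.

```python
# Hypothetical mean values of Y for each combination of two 0/1 variables.
# Keys are (X1, X2); the numbers are illustrative only.
cell_means = {
    (0, 0): 100.0,
    (1, 0): 110.0,
    (0, 1): 105.0,
    (1, 1): 130.0,
}

baseline = cell_means[(0, 0)]
effect_x1_alone = cell_means[(1, 0)] - baseline  # X1 changes, X2 held at 0
effect_x2_alone = cell_means[(0, 1)] - baseline  # X2 changes, X1 held at 0
joint_effect = cell_means[(1, 1)] - baseline     # both change together

# With no interaction, the joint effect would equal the sum of the
# two separate effects. Any difference is the interaction.
sum_of_separate = effect_x1_alone + effect_x2_alone
interaction = joint_effect - sum_of_separate

print("separate effects:", effect_x1_alone, effect_x2_alone)
print("joint effect:", joint_effect, "sum of separate:", sum_of_separate)
print("interaction:", interaction)
```

In this made-up case the joint effect (30) exceeds the sum of the separate effects (10 + 5), so the effect of X1 on Y is larger when X2 = 1 than when X2 = 0.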

Three general shapes of interaction patterns
Size: The effect of X1 on Y is larger for some values of X2 than for others.
Direction: The effect of X1 on Y is positive for some values of X2 but negative for other values of X2.
Statistical significance: The effect of X1 on Y is non-zero (either positive or negative) for some values of X2 but is not statistically significantly different from zero for other values of X2.

Interactions can occur in any of three broad types. The association between X1 and Y can differ in terms of:
Size: for example, larger for some values of X2 than for others.
Direction: a positive association for some values of X2 and a negative association for others.
Statistical significance: a non-zero (positive or negative) association for some values of the modifying variable but a zero (or not statistically significant) association for other values of the modifier.
Now let’s look at an example of each in turn.

Example interaction topics: Magnitude
The interaction can occur in terms of magnitude.
The size of the association between X1 and Y depends on values of X2.
Birth weight increases more rapidly with family income for non-Hispanic white than for Latino infants.
The steepness of the income (X1)/birth weight (Y) gradient depends on ethnicity (X2).

Example interaction topics: Direction
The interaction can occur in terms of direction.
The direction of the association between X1 and Y depends on values of X2.
Being married is associated with higher earnings for men, but lower earnings for women.
The association between marital status (X1) and earnings (Y) works in opposite directions for the two genders (X2).

Example interaction topics: Effect for some but not all groups
The interaction can occur in terms of statistical significance.
The association between X1 and Y is statistically significant only for some values of X2.
The harmful effect of secondhand smoke (X1) on childhood asthma (Y) is ameliorated if the child was breast-fed (X2 = 1) but remains for children who were not breast-fed (X2 = 2).
Breast-feeding modifies the smoke/asthma association.

Synonyms for “interaction”
Terminology for interactions varies by discipline. Common synonyms include:
Effects modification
Moderating effect
Modifying effect
Joint effect
Contingency effect
Conditioning effect
Heterogeneity of effects

Before we go on to look at some illustrations of hypothetical interaction patterns, let me mention some synonyms for “interaction” that you might have encountered. One is “effects modification,” which, as the term suggests, asks whether the effect of one IV (e.g., mother’s education) on the DV (birthweight) is modified by race. This type of pattern is sometimes termed an “elaboration paradigm,” meaning that the simple two-way association between X and Y (e.g., education and birthweight) needs to be elaborated by taking into account a second IV (race), because no one pattern characterizes the education/birthweight association for all racial/ethnic groups.

Recognizing when an interaction specification should be tested
Could be based on
  Theory of how X1, X2, and Y are related to one another. E.g., different mechanisms linking X1 and Y for different values of X2.
  Previous studies of the same topic.
  Empirical evidence in your own data: a three-way association among X1, X2, and Y.

As we will see in later podcasts, it takes a lot of work to specify interactions and then calculate the overall pattern, so we need some basis for deciding for which variables we want to test for interactions. Test for an interaction if theory, previous studies, or a three-way association among X1, X2, and Y in your own data suggests that the direction or size of an association between one independent variable and the dependent variable varies depending on values of a second independent variable.

Specifying an interaction model
Multivariate regression specifications to test for interactions include a combination of “main effects terms” and “interaction terms.”

We will go into more detail about how to specify a model with interactions in a later module, but first, let’s get acquainted with some terminology used for interactions, and learn how the various pieces are defined and specified.

Main-effects-only specification
A main-effects-only model implies that, controlling for other covariates (Xi), the effect of X1 on Y is the same for all values of X2, and the effect of X2 is the same for all values of X1.
Its specification can be written:
Y = β0 + β1X1 + β2X2,
where
  X1 is the main effect term for the first independent variable (IV),
  X2 is the main effect term for a second IV.

Example: Main-effects-only model
If
  Y is birth weight in grams,
  X1 is family income in dollars,
  X2 is a dummy variable for black race, coded 1 for black infants and 0 for white infants (the reference category),
then
Birth weight = β0 + β1Income + β2Black
implies that
  the slope of the income/birth weight curve is the same for black as for white infants;
  the income/birth weight (X1/Y) association is not modified by race.

Let’s apply these general concepts to a specific example. Our dependent variable, Y, is birth weight in grams. Our first independent variable is family income in dollars. Our second IV is a dummy variable for black race. Substituting those specific concepts into the general equation gives the birth weight equation on this slide. This specification implies that the income/birth weight association is not modified by race.

Main effects of race and income, but no interaction with income
Income/birth weight curves for blacks and whites have the same slope (their curves are parallel) but different intercepts.
[Chart: birth weight (grams) on the vertical axis against income ($) on the horizontal axis, with one line for white infants and one for black infants]

Suppose we obtain a positive coefficient on income and a negative coefficient on black race. If we were to graph those results, we would obtain upward-sloping parallel lines for the association between income and birth weight for each racial group. They slope upward because of the positive income coefficient. The line for blacks is lower than the line for whites because of the negative coefficient on black race. The income/birth weight lines are parallel because the main-effects-only specification imposes the assumption of no difference in slope. If we want to test whether those slopes differ, we need a specification that includes an interaction between race and income.
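The parallel-lines implication can be checked numerically. The sketch below assumes hypothetical coefficients (positive on income, negative on black race, chosen only for illustration and not estimated from any data) and shows that the predicted black/white gap is the same at every income level.

```python
# Hypothetical coefficients for the main-effects-only model:
#   birth weight = B0 + B1*income + B2*black
# These values are illustrative assumptions, not estimates.
B0 = 3000.0   # intercept, in grams
B1 = 0.005    # grams per additional dollar of family income
B2 = -200.0   # difference for black relative to white infants, in grams


def predict_main_effects(income, black):
    """Predicted birth weight (grams) from the main-effects-only model."""
    return B0 + B1 * income + B2 * black


incomes = [0, 20_000, 40_000, 60_000]

# Black-minus-white difference in predicted birth weight at each income.
gaps = [
    predict_main_effects(x, 1) - predict_main_effects(x, 0)
    for x in incomes
]

for x, gap in zip(incomes, gaps):
    print(f"income ${x:>6,}: black - white = {gap:.1f} g")
```

Because the model has no interaction term, the gap equals B2 at every income, which is exactly what "parallel lines" means.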

Interaction specification
A model with interactions implies that, controlling for other covariates, the effect of X1 on Y is different for different values of X2.
Y = β0 + β1X1 + β2X2 + β3X1_X2,
where
  X1 is the main effect term for the focal IV in the interaction,
  X2 is the main effect term for the modifying IV,
  X1_X2 is the interaction term between the focal and modifying IVs.

A specification with main effects and interaction means that…

Interaction term
The value of the interaction term variable is defined as the product of the two component variables:
X1_X2 = X1 × X2
When naming an interaction term variable, I often use an “_” to connect the names of the two component variables. E.g., <HS_black would be the interaction between the two variables “<HS” and “black.”

The way we implement this is to create a new variable, called an interaction term, that is calculated as the product of the two component variables. E.g., the interaction term X1_X2 takes on the value X1 × X2 for each case. I often use the naming convention of connecting the names of the two component variables with an underscore.

Example calculation of interaction term
E.g., for case #1, if age (X1) = 27 and income (X2) = $10,000, the interaction term age_income = 27 × 10,000 = 270,000.
See podcast on creating variables to test for interactions for additional detailed examples.

As a concrete example, suppose we are calculating an interaction term between age in years and income in dollars, written age_income. If respondent #1 is 27 years old with an income of $10,000, the value of age_income for that case would be 27 × 10,000, or 270,000. See the podcast on creating variables to test for interactions for additional detailed examples.
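The same calculation can be sketched in a few lines of code. Only case #1 (age 27, income $10,000) comes from the slide; the other records and the dictionary layout are hypothetical.

```python
# Each record is one case. Only case #1 is taken from the lecture;
# cases #2 and #3 are made up for illustration.
cases = [
    {"id": 1, "age": 27, "income": 10_000},
    {"id": 2, "age": 35, "income": 42_000},
    {"id": 3, "age": 51, "income": 0},
]

for case in cases:
    # Interaction term = product of the two component variables,
    # named by joining the component names with an underscore.
    case["age_income"] = case["age"] * case["income"]

for case in cases:
    print(case["id"], case["age"], case["income"], case["age_income"])
```

For case #1 the new variable age_income equals 27 × 10,000 = 270,000. Note that any case with a value of 0 on either component gets a value of 0 on the interaction term.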

Contingency of coefficients in an interaction model
Y = β0 + β1X1 + β2X2 + β3X1_X2
Inclusion of the interaction term X1_X2 means that the βs on the main effects terms X1 and X2 no longer apply to all values of X1 and X2.
The main effects and interaction βs for X1 and X2 are contingent upon one another and cannot be considered separately.

Once we include that interaction term X1_X2 in the model, the coefficients involving X1 and X2 can no longer be interpreted independently of one another. Another way to think about this is that the main effects coefficients for X1 and X2 no longer apply to all values of those variables, but rather must be interpreted for specific combinations of values of X1 and X2.

Implications for interpreting main effects and interaction coefficients
Y = β0 + β1X1 + β2X2 + β3X1_X2
In the interaction model:
β1 estimates the effect of X1 on Y when X2 = 0,
β2 estimates the effect of X2 on Y when X1 = 0,
β3 must also be considered in order to calculate the shape of the overall pattern among X1, X2, and Y, e.g., when X1 and X2 take on other values.
See podcast on calculating the shape of an interaction pattern.

The implications of the interaction specification are listed on the slide. They will be explained in detail in the podcasts on calculating the overall shape of the interaction pattern from regression coefficients.
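One way to see why β1 and β2 apply only when the other variable equals 0 is to group terms in the interaction equation. This is a standard algebraic rearrangement offered as an aid, not a formula quoted from the slides; it writes the interaction term as the product X1 × X2.

```latex
\begin{align*}
Y &= \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 (X_1 \times X_2) \\
  &= \beta_0 + \beta_2 X_2 + (\beta_1 + \beta_3 X_2)\, X_1 \\
  &= \beta_0 + \beta_1 X_1 + (\beta_2 + \beta_3 X_1)\, X_2
\end{align*}
```

Read this way, a one-unit increase in X1 changes Y by β1 + β3X2, which reduces to β1 only when X2 = 0. Likewise, a one-unit increase in X2 changes Y by β2 + β3X1, which reduces to β2 only when X1 = 0.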

Example: Interaction model
BW = β0 + β1Income + β2Black + β3Income_black
If β3 is statistically significantly different from zero, the slope of the income/birth weight curve is different for black than for white infants.
β1 estimates the association between income and birth weight among whites (i.e., when Black = 0).
β2 estimates the difference in birth weight for blacks compared to whites at income = 0.
β3 estimates how predicted birth weight deviates from the value implied by β1 and β2 alone, for different combinations of race and income.

Applying these general concepts to our topic example, the equation for a model with main effects and interactions would read as shown on this slide.
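To make the meaning of each coefficient concrete, the sketch below uses a fact that holds when the modifier is a 0/1 dummy and the model has no other covariates: the interaction model gives the same point estimates as fitting one income/birth weight line per group. The data values and the helper name simple_ols are hypothetical, invented for this illustration.

```python
def simple_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: family income in dollars, birth weight in grams.
income = [10_000, 30_000, 50_000, 70_000]
bw_white = [3250, 3350, 3450, 3550]  # rises 100 g per $20,000
bw_black = [3030, 3090, 3150, 3210]  # rises 60 g per $20,000

a_white, s_white = simple_ols(income, bw_white)
a_black, s_black = simple_ols(income, bw_black)

# Translate the two group-specific lines into the coefficients of
#   BW = b0 + b1*Income + b2*Black + b3*Income_black
b0 = a_white            # intercept for the reference group (white)
b1 = s_white            # income slope when Black = 0
b2 = a_black - a_white  # black/white difference at income = 0
b3 = s_black - s_white  # difference in income slope, black vs. white

print(f"b0={b0:.1f}  b1={b1:.4f}  b2={b2:.1f}  b3={b3:.4f}")
```

In this made-up data set b3 is negative, meaning the income slope is shallower for black than for white infants. The sketch covers point estimates only; testing whether b3 differs from zero requires standard errors from the full model.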

Main effects of race and income, and interaction of race with income
Birth weight/income curves for blacks and whites have different slopes and different intercepts.
[Chart: birth weight (grams) on the vertical axis against income ($) on the horizontal axis, with one line for white infants and one for black infants]

By including the interaction between income and race, the model allows not only for different intercepts by race, but also for different slopes of the income/birth weight curves by race. In this hypothetical picture, white infants weigh more than black infants at all income levels, and both blacks and whites have increasing birth weight with increasing income, but the increase in birth weight per dollar of income is lower for blacks than for whites: a shallower positive slope.
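A pattern like the one in this hypothetical chart can be reproduced from a set of assumed coefficients. The values below are illustrative only (not estimates from any data), and the function names are made up for this sketch.

```python
# Hypothetical coefficients for the interaction model:
#   birth weight = B0 + B1*income + B2*black + B3*(income * black)
B0, B1, B2, B3 = 3200.0, 0.005, -200.0, -0.002


def predict_interaction(income, black):
    """Predicted birth weight (grams) from the interaction model."""
    income_black = income * black  # interaction term for this case
    return B0 + B1 * income + B2 * black + B3 * income_black


def income_slope(black):
    """Change in predicted birth weight per additional dollar of income."""
    return B1 + B3 * black


incomes = (0, 25_000, 50_000)

# White-minus-black difference in predicted birth weight at each income.
gaps = [
    predict_interaction(x, 0) - predict_interaction(x, 1)
    for x in incomes
]

for x, gap in zip(incomes, gaps):
    print(f"income ${x:>6,}: white - black = {gap:.0f} g")

print("slope for whites:", income_slope(0))
print("slope for blacks:", income_slope(1))
```

With these assumed values both slopes are positive, the slope for blacks is shallower, and the white/black gap widens as income rises, so the two lines are no longer parallel.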

Summary
Interactions occur when the association between one IV and the DV differs depending on the values of a second IV.
Can occur in terms of
  Direction
  Magnitude
  Statistical significance
Can be tested using multivariate regression models involving main effects and interaction terms (variables).

Suggested resources
Miller, J. E. 2013. The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition. Chapter 16.
Cohen et al. 2003. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, 3rd Edition. Florence, KY: Routledge. Chapters 8 and 9.

Suggested online resources
Podcasts on
  Visualizing shapes of interactions
  Creating variables and specifying models to test for interactions
  Calculating overall shape of an interaction pattern from regression coefficients

Now that you’ve been introduced to these general concepts about interactions, I recommend that you watch the next in the series of podcasts, starting with the one on visualizing shapes of interaction patterns.

Suggested practice exercises
Study guide to The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition.
  Question #1 in the problem set for Chapter 16
  “Reviewing” exercise #1 in the suggested course extensions for Chapter 16

Contact information
Jane E. Miller, PhD
jmiller@ifh.rutgers.edu
Online materials available at http://press.uchicago.edu/books/miller/multivariate/index.html