Beyond Bivariate: Exploring Multivariate Analysis
Three Topics Covered
1. Logic of introducing a third variable
2. Multiple linear regression: Which independent (predictor) variables are significantly related to the dependent (outcome) variable?
3. Logistic regression: Binary outcome variable
A Focal Relationship
Residential mobility and school achievement. This is a negative or inverse relationship: higher residential mobility → lower achievement. WHY?
The 0-Order Bivariate Relationship
We are going to call our initial bivariate relationship the 0-order (zero-order) relationship: Residential mobility → School achievement
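A zero-order relationship between two interval-ratio variables is often summarized with Pearson's r. The sketch below computes r on a handful of made-up cases. The data, and the helper name `pearson_r`, are hypothetical and only illustrate what a negative (inverse) relationship looks like numerically.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical data: number of residential moves and grade point average
moves = [0, 1, 2, 3, 4, 5]
gpa = [3.5, 3.6, 3.0, 3.2, 2.6, 2.8]

r = pearson_r(moves, gpa)
print(round(r, 2))  # -0.86: more moves go with lower grades
```

A negative r only describes the 0-order association. The third-variable techniques that follow ask whether it survives, changes, or disappears once another variable is taken into account.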
Spurious Relationship/Explanation
Could there be variables that are associated both with high levels of residential mobility and with low school achievement, creating an apparent but spurious relationship between residential mobility and achievement, and thus EXPLAINING AWAY the initial bivariate relationship?
Spurious Relationship
Do taller people like action movies more than shorter people do? What is the third variable?
Do days of high lemonade sales have more drowning fatalities than days with low lemonade sales? What is the third variable?
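One way to check a suspected spurious relationship between interval-ratio variables is a first-order partial correlation. It asks how strongly two variables are related once a third is held constant. The sketch below simulates data rather than using real records. It assumes, for illustration, that hot weather drives both lemonade sales and drownings and that lemonade has no direct effect on drownings. The helper names are hypothetical.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, holding z constant."""
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5

random.seed(42)
n = 5000
# Simulated (hypothetical) days: temperature drives both outcomes;
# lemonade sales have no direct effect on drownings.
temperature = [random.gauss(0, 1) for _ in range(n)]
lemonade = [t + random.gauss(0, 1) for t in temperature]
drownings = [t + random.gauss(0, 1) for t in temperature]

r_zero_order = pearson_r(lemonade, drownings)
r_partial = partial_r(r_zero_order,
                      pearson_r(lemonade, temperature),
                      pearson_r(drownings, temperature))
print(round(r_zero_order, 2), round(r_partial, 2))  # near 0.5 and near 0
```

The 0-order correlation is clearly positive. With temperature held constant it falls to roughly zero, which is what "explaining away" looks like in numbers.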
Intervening Variables: Interpretation
What variables can you suggest that “go in between” residential mobility and school achievement and might help us understand our focal relationship better? These intervening variables do NOT explain away the relationship; they clarify why and how it comes about.
Intervening Variables: Interpretation Examples
Why do women have lower incomes than men? Maybe they have not acquired the technical and managerial skills that men have. Maybe they are less interested in promotions into management than men are. (These interpretations suggest that gender discrimination in salary decisions is not the only reason women have lower incomes than men.)
The Difference between Interpretation (Intervening) and Explanation (Spurious)
Height ← Gender → Movie preferences
Gender, the third variable, explains away the spurious relationship between height and movie preferences.
Gender → Career choices → Income
Career choices, the intervening third variable, help to interpret the initial relationship between gender and income.
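The arithmetic of "controlling for" a third variable is the same whether that variable is a spurious cause or an intervening one. What differs is the assumed causal order. The sketch below uses entirely made-up group sizes and mean incomes. It compares the overall gender gap with the gap inside each career category, treating career type as the intervening variable. The category labels and the helper name are hypothetical.

```python
# Hypothetical (number of people, mean income) by gender and career type.
groups = {
    ("men", "managerial/technical"): (60, 80000),
    ("men", "other"): (40, 50000),
    ("women", "managerial/technical"): (20, 78000),
    ("women", "other"): (80, 48000),
}

def mean_income(gender, career=None):
    """Weighted mean income for a gender, optionally within one career type."""
    cells = [(n, m) for (g, c), (n, m) in groups.items()
             if g == gender and (career is None or c == career)]
    total_n = sum(n for n, _ in cells)
    return sum(n * m for n, m in cells) / total_n

overall_gap = mean_income("men") - mean_income("women")
within_gaps = {c: mean_income("men", c) - mean_income("women", c)
               for c in ("managerial/technical", "other")}
print(overall_gap)   # 14000.0
print(within_gaps)   # {'managerial/technical': 2000.0, 'other': 2000.0}
```

In this invented example, most of the 14,000 gap runs through career type. A smaller gap remains within each career category, so career choices interpret much of the relationship without explaining it away entirely.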
Specification or Interaction Effects
Sometimes when we introduce a third variable, we find that the initial bivariate (0-order) relationship is different for different categories of the third variable.
Specification: Examples [1]
In research on school achievement, we (Prof. Bootcheck and I) looked at the relationship between living in a nuclear family and grades. For whites, this relationship was positive. For all other racial-ethnic categories, there was no relationship.
Specification: Examples [2]
Can you think of a third variable we could introduce into our analysis of the relationship between residential mobility and school achievement, such that the bivariate relationship differs across its categories (strong in one category, absent in another)?
Specification in a Crosstab
In a crosstab, this specification or interaction effect would show up as a strong, significant relationship in the partial table for one category of the layer variable (the third variable) and a nonsignificant relationship in the partial table for the other category. In other words, the chi-square for one partial table is significant, but the chi-square for the other partial table is not.
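The specification check can be sketched by computing a Pearson chi-square separately for each partial table. The counts below are invented: rows are high/low residential mobility, columns are low/high achievement, and "category A" and "category B" stand for the two values of a hypothetical layer variable. The statistic is compared with 3.841, the chi-square critical value for df = 1 at α = .05.

```python
def chi_square_2x2(table):
    """Pearson chi-square for a 2x2 table [[a, b], [c, d]] (no Yates correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2

CRITICAL_05_DF1 = 3.841  # chi-square critical value, df = 1, alpha = .05

# Hypothetical partial tables, one per category of the layer variable.
# Rows: high / low mobility. Columns: low / high achievement.
partial_a = [[60, 40], [35, 65]]
partial_b = [[50, 50], [48, 52]]

for name, table in [("category A", partial_a), ("category B", partial_b)]:
    chi2 = chi_square_2x2(table)
    verdict = "significant" if chi2 > CRITICAL_05_DF1 else "not significant"
    print(f"{name}: chi-square = {chi2:.2f} ({verdict})")
# category A: chi-square = 12.53 (significant)
# category B: chi-square = 0.08 (not significant)
```

This computes the uncorrected statistic. Statistical packages may differ: SciPy's `scipy.stats.chi2_contingency`, for example, applies Yates' continuity correction to 2x2 tables by default.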
Suppressed Effects [1]
Introducing a third variable can reveal suppressed effects: relationships that run in opposite directions within the categories of the third variable and cancel each other out in the combined data.
Fictitious example: Religious intensity and death penalty views
0-order: There appears to be no relationship.
Suppressed Effects [2]
When we introduce region (north or south), we see that the effects are opposite. For people living in the north of this fictitious country, high religious intensity goes with opposition to the death penalty. For people living in the south, high religious intensity goes with support for the death penalty. The two opposed relationships cancel each other out in the combined data; they become visible only when we break the data down by region.
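The fictitious example can be put into numbers with invented counts. The sketch below compares the percentage supporting the death penalty among high- versus low-intensity respondents, first in the pooled (0-order) table and then in each regional partial table. All counts and helper names are hypothetical.

```python
# Hypothetical counts for a fictitious country:
# (region, religious intensity) -> (support death penalty, oppose death penalty)
counts = {
    ("north", "high"): (30, 70),
    ("north", "low"): (70, 30),
    ("south", "high"): (70, 30),
    ("south", "low"): (30, 70),
}

def pct_support(support, oppose):
    """Percentage supporting the death penalty in one row of a table."""
    return 100 * support / (support + oppose)

def pct_difference(table):
    """% support among high-intensity minus % support among low-intensity."""
    return pct_support(*table["high"]) - pct_support(*table["low"])

# Partial tables: one per category of the layer variable (region).
north = {level: cell for (region, level), cell in counts.items() if region == "north"}
south = {level: cell for (region, level), cell in counts.items() if region == "south"}

# 0-order table: add the two regions together.
pooled = {}
for (region, level), (support, oppose) in counts.items():
    s, o = pooled.get(level, (0, 0))
    pooled[level] = (s + support, o + oppose)

print(pct_difference(pooled))  # 0.0
print(pct_difference(north))   # -40.0
print(pct_difference(south))   # 40.0
```

A 40-point relationship in each region disappears completely when the regions are pooled, because the two relationships run in opposite directions.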
Final Possibility: Replication
It is possible that the initial bivariate relationship persists when we introduce the third variable: the partial tables for the categories of the third (layer) variable look essentially the same as the initial two-variable table.
Multivariate or Multiple Linear Regression
We specify two or more independent variables. Each may have a significant, perhaps moderate or even strong, correlation with the dependent variable. When they are placed together in the regression model, “only the strongest survive.” If an independent variable has no relationship with the DV independent of its relationship with the other IVs, it will not be significant in the model.
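The "only the strongest survive" idea can be seen in a small simulation. The sketch below is hypothetical. `x2` is strongly correlated with `x1`, and `y` depends on `x1` only. `x2` therefore has a sizable bivariate correlation with `y`, yet its slope in a two-predictor OLS model is close to zero. The code estimates coefficients only. Judging significance would also need standard errors and p-values, which a statistics package (for example statsmodels' `OLS`) reports.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def ols_two_predictors(x1, x2, y):
    """OLS slopes b1, b2 for y = a + b1*x1 + b2*x2 (normal equations, centered data)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return b1, b2

random.seed(1)
n = 5000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [a + random.gauss(0, 0.5) for a in x1]   # strongly correlated with x1
y = [2 * a + random.gauss(0, 1) for a in x1]  # y depends on x1 only

r_x2_y = pearson_r(x2, y)                     # theoretical value 0.8
b1, b2 = ols_two_predictors(x1, x2, y)
print(round(r_x2_y, 2), round(b1, 2), round(b2, 2))  # b1 near 2, b2 near 0
```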
Examples from the Country Data Set
Look at adjusted R². Which variables have significant coefficients? What do the relative sizes of the betas tell you? Multivariate models are hard to visualize.
Building models: all variables entered at the same time, or stepwise.
See Nardi (2006, p. 97), which is cited in Garner (2010, p. 333).
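Two of the quantities on this slide have simple formulas. Adjusted R² is 1 − (1 − R²)(n − 1)/(n − k − 1) for n cases and k predictors. A standardized beta rescales an unstandardized slope b by s_x / s_y, so betas for predictors measured in different units can be compared in size. The sketch below uses made-up inputs, not values from the Country Data Set.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with k predictors fitted to n cases.

    It penalizes R-squared for each added predictor, so adding a
    nearly useless variable can lower it.
    """
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def standardized_beta(b, sd_x, sd_y):
    """Convert an unstandardized slope b into a standardized beta."""
    return b * sd_x / sd_y

# Hypothetical inputs for illustration only.
print(round(adjusted_r2(0.50, n=101, k=4), 4))     # 0.4792
print(standardized_beta(2.0, sd_x=1.5, sd_y=6.0))  # 0.5

# Adding a fifth predictor that raises R-squared only slightly lowers adjusted R-squared.
print(adjusted_r2(0.501, n=101, k=5) < adjusted_r2(0.50, n=101, k=4))  # True
```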
Logistic Regression [1]
Currently, logistic regression is a very popular statistical technique! It involves a dichotomous (binary) outcome variable. We can compute the overall odds of one outcome versus the other (the number of cases with the outcome divided by the number without it). We then examine predictor variables (IVs) to see whether each one is related to a change in the odds from their overall level; the odds ratio expresses the size of that change.
EXAMPLE: Does growing up in a bilingual family raise or lower an individual’s odds of completing high school, compared with the overall odds of doing so?
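The difference between odds and an odds ratio is easiest to see with counts. The numbers below for the bilingual-family example are invented. The sketch computes the overall odds of completing high school, the odds within each group, and the odds ratio comparing the groups.

```python
def odds(yes, no):
    """Odds of an outcome: cases with it divided by cases without it."""
    return yes / no

# Hypothetical counts (not real data): completed high school vs did not.
bilingual_yes, bilingual_no = 180, 20
other_yes, other_no = 240, 60

overall_odds = odds(bilingual_yes + other_yes, bilingual_no + other_no)  # 420 / 80
bilingual_odds = odds(bilingual_yes, bilingual_no)                         # 180 / 20
other_odds = odds(other_yes, other_no)                                     # 240 / 60

# Odds ratio: how many times larger the odds are for the bilingual group.
odds_ratio = bilingual_odds / other_odds

print(overall_odds, bilingual_odds, other_odds, odds_ratio)  # 5.25 9.0 4.0 2.25
```

In a fitted logistic regression, exponentiating a predictor's coefficient gives this kind of odds ratio. Values above 1 mean higher odds and values below 1 mean lower odds.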
Logistic Regression [2]
Independent variables need to be interval-ratio or dummy variables (a categorical variable broken down into binary variables).
Alert: Which categories are coded 0 and which 1 for each of the binary variables?
Negative coefficients mean lower odds: the odds ratio falls below 1.
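Dummy coding turns one categorical variable into several 0/1 variables. The reference category gets no column of its own. The sketch below codes a hypothetical region variable with "North" as the reference. It then shows that a negative coefficient (here an invented value of −0.5) corresponds to an odds ratio below 1. The helper `dummy_code` is an illustrative name, not a library function.

```python
import math

def dummy_code(values, reference):
    """Turn one categorical variable into 0/1 dummy variables.

    The reference category gets no column: it is the case where every dummy is 0.
    """
    categories = sorted(set(values) - {reference})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}

# Hypothetical region values for five respondents; "North" is the reference.
region = ["North", "South", "West", "South", "North"]
dummies = dummy_code(region, reference="North")
print(dummies)  # {'South': [0, 1, 0, 1, 0], 'West': [0, 0, 1, 0, 0]}

# A hypothetical negative logistic coefficient for the "South" dummy:
b_south = -0.5
odds_ratio_south = math.exp(b_south)  # about 0.61: lower odds than the North
print(round(odds_ratio_south, 4))     # 0.6065
```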
Logistic Regression: Example 1
Are income, race-ethnicity, gender, region, and religion related to a vote for the Republican presidential candidate? Which characteristics raise, and which lower, the odds of a Republican vote? Which categories are labelled 1? Which 0? (This makes a difference in how to read the table of coefficients.)
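Why the 0/1 labelling matters can be shown with a 2x2 table. In a logistic regression whose only predictor is a binary variable, the fitted coefficient equals the natural log of the table's odds ratio. Swapping which category is coded 1 flips the sign of the coefficient and inverts the odds ratio. With several predictors the coefficients are adjusted for the others, but recoding a dummy still flips its sign. The vote counts below are invented.

```python
import math

def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a, b: outcome yes / no for the group coded 1
    c, d: outcome yes / no for the group coded 0
    """
    return (a / b) / (c / d)

# Hypothetical counts: Republican vote (yes) vs other vote (no).
women_yes, women_no = 40, 60
men_yes, men_no = 60, 40

# Coding 1: female = 1, male = 0
or_female1 = odds_ratio(women_yes, women_no, men_yes, men_no)
b_female1 = math.log(or_female1)

# Coding 2: male = 1, female = 0
or_male1 = odds_ratio(men_yes, men_no, women_yes, women_no)
b_male1 = math.log(or_male1)

print(round(or_female1, 3), round(b_female1, 3))  # 0.444 -0.811
print(round(or_male1, 3), round(b_male1, 3))      # 2.25 0.811
```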
Logistic Regression: Example 2
What individual characteristics are related to experiencing foreclosure on one’s home? The outcome is binary (foreclosed or not foreclosed), so we use logistic regression.
Contrast this with a question that could be answered with linear regression: What neighbourhood characteristics are related to a high foreclosure rate? Here the outcome, a neighbourhood’s foreclosure rate, is an interval-ratio variable.
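For the individual-level foreclosure question, a logistic model can be fitted by maximum likelihood. The sketch below uses Newton-Raphson with one binary predictor. Everything in it is hypothetical: `x` marks some unnamed individual characteristic, and the counts are invented. It is a teaching sketch, not a replacement for a statistics package such as statsmodels' `Logit`. With a single binary predictor, exp(b1) should reproduce the 2x2 odds ratio (30/70) / (10/90) = 27/7.

```python
import math

def fit_logistic(xs, ys, iterations=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p               # gradient of the log-likelihood
            g1 += (y - p) * x
            w = p * (1.0 - p)         # entries of the information matrix
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical individuals: x = 1 if they have the (unnamed) characteristic,
# y = 1 if their home was foreclosed.
xs = [1] * 100 + [0] * 100
ys = [1] * 30 + [0] * 70 + [1] * 10 + [0] * 90

b0, b1 = fit_logistic(xs, ys)
print(round(math.exp(b1), 3))  # 3.857, the odds ratio (30/70) / (10/90)
```

For the neighbourhood question, the outcome (a foreclosure rate) is interval-ratio, so an ordinary linear regression like the one sketched for the previous slides would be the natural fit instead.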