Unit 4c: Taxonomies of Logistic Regression Models
© Andrew Ho, Harvard Graduate School of Education
Unit 4c – Slide 1
Slide 2: Building the Logistic Regression Model; Dichotomous Predictors; Interactions; Post-Hoc GLH Tests
Course Roadmap: Unit 4c (today’s topic area is highlighted on the slide)
Multiple Regression Analysis (MRA)
Do your residuals meet the required assumptions? Test for residual normality. Use influence statistics to detect atypical datapoints.
If your residuals are not independent, replace OLS by GLS regression analysis: use individual growth modeling; specify a multi-level model.
If time is a predictor, you need discrete-time survival analysis…
If your outcome is categorical, you need to use binomial logistic regression analysis (dichotomous outcome) or multinomial logistic regression analysis (polytomous outcome).
If you have more predictors than you can deal with: create taxonomies of fitted models and compare them; form composites of the indicators of any common construct; conduct a Principal Components Analysis; use Cluster Analysis; use Factor Analysis (EFA or CFA?).
If your outcome vs. predictor relationship is non-linear: use non-linear regression analysis; transform the outcome or predictor.
Slide 3: The Bivariate Distribution of HOME on HUBSAL
RQ: In 1976, were married Canadian women who had children at home and husbands with higher salaries more likely to work at home rather than joining the labor force (when compared to their married peers with no children at home and husbands who earn less)?
Slide 4: The Bivariate Distribution of HOME on CHILD
Scatterplots don’t work very well with dichotomous outcomes and dichotomous predictors. Instead, try a 2x2 table with the “tabulate” command. Note that (1,1) is in the lower right for tables but in the upper right for scatterplots.
The output reports conditional percentages by row (and joint probabilities by cell). Given that there is a child present, the sample probability of being a homemaker is 86.58%. Given that there is no child present, the sample probability of being a homemaker is 35.29%.
Slide 5: Sample Probabilities, Odds, Log-Odds, and Odds Ratios
Table columns: Are Children Present in the Home?; Sample Probability Homemaker; Sample Log-Odds (Logit); Sample Difference in Log-Odds; Sample Odds Ratio; Sample Log-Odds Ratio.
No Child: sample probability 35.29%.
Children: sample probability 86.58%.
I recommend understanding the logit scale (nonlinear in probability): -2 is around 10%, -1 is around 25%, 0 is 50%, 1 is around 75%, 2 is around 90%.
We note that an increment from No Child (0) to Children (1) increments the log-odds by 2.47.
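The table's derived quantities follow from the two sample probabilities alone. The sketch below reproduces them and checks the logit-scale landmarks. It is written in Python purely for illustration (the course's own code uses Stata), and the helper names `logit` and `inv_logit` are hypothetical.

```python
import math

def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1 - p))

def inv_logit(x):
    """Probability corresponding to a log-odds value x."""
    return 1 / (1 + math.exp(-x))

# Sample probabilities of being a homemaker, taken from the slide.
p_no_child = 0.3529
p_children = 0.8658

# Odds = p / (1 - p); log-odds (logit) = ln(odds).
odds_no_child = p_no_child / (1 - p_no_child)
odds_children = p_children / (1 - p_children)
logit_no_child = logit(p_no_child)
logit_children = logit(p_children)

# The difference in log-odds is the log of the odds ratio.
diff_log_odds = logit_children - logit_no_child
odds_ratio = odds_children / odds_no_child

print(f"logit (no child):       {logit_no_child:.3f}")
print(f"logit (children):       {logit_children:.3f}")
print(f"difference in log-odds: {diff_log_odds:.2f}")
print(f"odds ratio:             {odds_ratio:.2f}")

# Landmarks on the logit scale, converted back to probabilities.
for x in (-2, -1, 0, 1, 2):
    print(f"logit {x:+d} -> probability {inv_logit(x):.3f}")
```

The difference in log-odds matches the 2.47 quoted on the slide, and exponentiating it gives the odds ratio. The landmark loop shows that the "10%, 25%, 75%, 90%" figures are convenient roundings rather than exact values.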
Slide 6: Modeling a Dichotomous Outcome on a Dichotomous Predictor
Table columns: Are Children Present in the Home?; Sample Probability Homemaker; Sample Log-Odds (Logit); Sample Difference in Log-Odds.
No Child: sample probability 35.29%.
Children: sample probability 86.58%; sample log-odds 1.864.
Slide 7: Building the logistic regression model
Our old friend eststo: begin with the baseline model, with no predictors and a constant only (Model 1). Add the main effects separately (Models 2 and 3), then together (Model 4), and then add an interaction (Model 5). At each step, save the “deviance” (-2*loglikelihood).
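To make the deviance concrete, here is a minimal Python sketch (not the course's Stata code) that computes -2*loglikelihood for a constant-only model and for a model with one dichotomous predictor. The counts are made-up toy data, not the course dataset, and the function name `deviance` is hypothetical. The sketch relies on two standard facts: the constant-only model's maximum-likelihood fitted probability is the overall sample proportion, and with a single dichotomous predictor the fitted probabilities are the two group proportions.

```python
import math

def deviance(y, p_hat):
    """-2 * Bernoulli log-likelihood for 0/1 outcomes y and fitted probabilities p_hat."""
    loglik = sum(
        math.log(p) if yi == 1 else math.log(1 - p)
        for yi, p in zip(y, p_hat)
    )
    return -2 * loglik

# Made-up toy data (NOT the course dataset): (child, home) pairs.
data = [(0, 1)] * 4 + [(0, 0)] * 6 + [(1, 1)] * 8 + [(1, 0)] * 2
child = [c for c, _ in data]
home = [h for _, h in data]

# Constant-only model: every observation gets the overall sample proportion.
p_all = sum(home) / len(home)
dev_constant = deviance(home, [p_all] * len(home))

# One dichotomous predictor: each observation gets its group's sample proportion.
p_by_group = {
    g: sum(h for c, h in data if c == g) / sum(1 for c, _ in data if c == g)
    for g in (0, 1)
}
dev_child = deviance(home, [p_by_group[c] for c in child])

# Adding a predictor can only lower (or leave unchanged) the deviance.
drop_in_deviance = dev_constant - dev_child
print(f"deviance, constant only: {dev_constant:.3f}")
print(f"deviance, with CHILD:    {dev_child:.3f}")
print(f"drop in deviance:        {drop_in_deviance:.3f}")
```

Saving the deviance at each step matters because the drop in deviance between nested models is the quantity that model comparisons are built on.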
Slide 8: Interpretation of Main Effects
Slide 9: Interpretation of Fit Statistics
Slide 10: Graphical Representation of Model 4
It is always good practice to plot fitted curves only in the range of the data whose relationships they describe. This is particularly important when graphing logistic regression models on the probability metric, where the relationships are clearly nonlinear. See today’s code for details. Label curves.
Curve labels: No Children; Children.
How do we interpret the varying gap? As an interaction? No! There is no interaction in Model 4. The scale is not what it seems: this is actually a linear model in the log-odds. The distance is just as large at the extremes as it is in the center; it just doesn’t seem that way because we are plotting on the probability metric.
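A short numerical sketch can show why the gap looks uneven. In the Python below (illustrative only; the course's code is in Stata), the coefficients are hypothetical stand-ins, not the fitted Model 4 estimates. The model has main effects only, so the gap between the two curves is constant in log-odds, yet the same gap translates into different probability differences along the HUBSAL axis.

```python
import math

def inv_logit(x):
    """Probability corresponding to a log-odds value x."""
    return 1 / (1 + math.exp(-x))

# Hypothetical coefficients (NOT the fitted Model 4 values):
# logit(p) = b0 + b_child * CHILD + b_hubsal * HUBSAL
b0, b_child, b_hubsal = -1.0, 2.0, 0.1

rows = []
for hubsal in (0, 10, 20, 30):
    logit_no = b0 + b_hubsal * hubsal      # CHILD = 0
    logit_yes = logit_no + b_child         # CHILD = 1
    gap_logit = logit_yes - logit_no       # constant: no interaction term
    gap_prob = inv_logit(logit_yes) - inv_logit(logit_no)
    rows.append((hubsal, gap_logit, gap_prob))
    print(f"HUBSAL={hubsal:2d}  gap in log-odds={gap_logit:.2f}  gap in probability={gap_prob:.3f}")
```

Under these assumed coefficients the log-odds gap is always `b_child`, while the probability gap shrinks as both curves approach 1. A changing gap on the probability metric is therefore not evidence of an interaction.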
Slide 11: Contrasting Graphical Representations of Model 4
Curve labels (both panels): No Children; Children.
Slide 12: Interpretation of Model 5
Curve labels: No Children; Children.
Slide 13: Contrasting Graphical Representations of Model 5
Curve labels (both panels): No Children; Children.
Slide 14: Post-Hoc GLH Tests: Gaps Between Conditional Logistic Curves
Curve labels (both panels): No Children; Children.
Slide 15: Post-Hoc GLH Tests: Conditional Slopes
Curve labels (both panels): No Children; Children.
Are these “slopes” 0 in the population?
Slide 16: Even more “general” GLH Tests: Any two points.
Curve labels (both panels): No Children; Children.
Does a wife with 1+ child and a low-income husband ($1K) have the same population probability of being a homemaker as a wife with no children but a wealthier husband ($35K)?
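Because the logit is a one-to-one function of probability, two points have equal population probabilities exactly when they have equal log-odds, so this question reduces to a test of one linear combination of coefficients. The Python sketch below (illustrative, not the course's Stata code) derives the contrast weights. It assumes Model 5 has an intercept, CHILD, HUBSAL, and their product, and that HUBSAL is recorded in thousands of dollars; both are assumptions about the setup, and `design_row` is a hypothetical helper.

```python
def design_row(child, hubsal):
    """Predictor row for an assumed Model 5: intercept, CHILD, HUBSAL, CHILD x HUBSAL."""
    return [1, child, hubsal, child * hubsal]

# Point A: children present, husband's salary $1K (HUBSAL = 1, assumed units of $1,000).
row_a = design_row(child=1, hubsal=1)
# Point B: no children, husband's salary $35K (HUBSAL = 35).
row_b = design_row(child=0, hubsal=35)

# The difference in fitted log-odds between A and B is contrast . beta.
contrast = [a - b for a, b in zip(row_a, row_b)]

labels = ["intercept", "CHILD", "HUBSAL", "CHILD x HUBSAL"]
for name, weight in zip(labels, contrast):
    print(f"{name:>15}: {weight:+d}")
```

Under these assumptions the null hypothesis is that the CHILD coefficient, minus 34 times the HUBSAL coefficient, plus the interaction coefficient, equals zero. The intercept cancels because both points share it.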
Slide 17: Revisiting Model Fit and Error Variance