Unit 4b: Fitting the Logistic Model to Data
© Andrew Ho, Harvard Graduate School of Education
Unit 4b – Slide 1
Unit 4b – Slide 2: Course Roadmap: Unit 4b
Multiple Regression Analysis (MRA)
- Do your residuals meet the required assumptions? Test for residual normality, and use influence statistics to detect atypical data points.
- If your residuals are not independent, replace OLS by GLS regression analysis, use individual growth modeling, or specify a multilevel model.
- If time is a predictor, you need discrete-time survival analysis.
- If your outcome is categorical, you need to use binomial logistic regression analysis (dichotomous outcome) or multinomial logistic regression analysis (polytomous outcome).
- If you have more predictors than you can deal with, create taxonomies of fitted models and compare them, form composites of the indicators of any common construct, conduct a principal components analysis, use cluster analysis, or use factor analysis (EFA or CFA?).
- If your outcome vs. predictor relationship is non-linear, use non-linear regression analysis or transform the outcome or predictor.
Today's topic area: binomial logistic regression analysis (dichotomous outcome).
Unit 4b – Slide 3: The Bivariate Distribution of HOME on HUBSAL
RQ: In 1976, were married Canadian women who had children at home and husbands with higher salaries more likely to work at home rather than join the labor force (when compared to their married peers with no children at home and husbands who earn less)?
Unit 4b – Slide 4: The Logistic Regression Model
We consider the non-linear logistic regression model for representing the hypothesized population relationship between the dichotomous outcome, HOME, and predictors:
$p(HOME = 1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 HUBSAL)}}$
The outcome being modeled is the underlying probability that the value of the outcome HOME equals 1.
Parameter $\beta_1$ determines the slope of the curve, but is not equal to it (in fact, the slope is different at every point on the curve).
Parameter $\beta_0$ determines the intercept of the curve, but is not equal to it.
This will be our statistical model for relating a categorical outcome to predictors. We will fit it to data using nonlinear regression analysis.
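A minimal Python sketch of the curve above, using hypothetical parameter values (b0 = -2.0, b1 = 0.5) that are not estimates from the HOME data. It computes the fitted probability and the slope of the curve, b1 · p · (1 − p), which shows why $\beta_1$ determines the slope without equaling it: the slope is largest (b1/4) where p = 0.5 and shrinks toward both tails. Likewise, the curve's height at x = 0 is 1/(1 + e^{−b0}), not b0 itself.

```python
import math

def logistic(x, b0, b1):
    """Fitted probability that the outcome equals 1 at predictor value x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def logistic_slope(x, b0, b1):
    """Slope of the logistic curve at x, using dp/dx = b1 * p * (1 - p)."""
    p = logistic(x, b0, b1)
    return b1 * p * (1.0 - p)

# Hypothetical parameters, for illustration only (not fitted to the HOME data).
b0, b1 = -2.0, 0.5

for x in (0, 4, 8):
    p = logistic(x, b0, b1)
    print(f"x = {x}: p = {p:.3f}, slope = {logistic_slope(x, b0, b1):.4f}")

# The curve's height at x = 0 is 1 / (1 + e^2), not b0 = -2.0.
intercept_height = logistic(0, b0, b1)
```

The slope printed at x = 0 and x = 8 is the same, because the curve is symmetric around the point where p = 0.5.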
Unit 4b – Slide 5: Building the Logistic Regression Model: The Unconditional Model
The unconditional model has no predictors: $p(HOME = 1) = \frac{1}{1 + e^{-\beta_0}}$.
We recall from multilevel modeling that we wish to maximize our likelihood ("maximum likelihood" estimation). Because the likelihood is a product of many, many probabilities, each less than 1, it becomes vanishingly small; we therefore maximize the log-likelihood, which is the sum of the log-probabilities. Each log-probability is negative, so maximizing the log-likelihood is an attempt at making a negative number as close to zero as possible. Later, we'll use the difference in −2 × log-likelihood (the deviance) between two models in a statistical test to compare them.
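A small Python sketch of the unconditional model's log-likelihood, on hypothetical outcomes (3,000 ones and 2,000 zeros, not the course data). With no predictors every case shares one probability, and its maximum-likelihood estimate is the sample proportion. The sketch also multiplies the probabilities directly to show the product underflowing to zero, which is one practical reason to sum logs instead.

```python
import math

def log_likelihood(y, p):
    """Sum of log-probabilities of 0/1 outcomes y when every case has probability p of a 1."""
    return sum(math.log(p) if yi == 1 else math.log(1.0 - p) for yi in y)

# Hypothetical outcomes (1 = homemaker, 0 = in the labor force); not the course data.
y = [1] * 3000 + [0] * 2000

# In the unconditional model every case shares one probability; its
# maximum-likelihood estimate is the sample proportion of 1s.
p_hat = sum(y) / len(y)
b0_hat = math.log(p_hat / (1.0 - p_hat))  # the same estimate on the log-odds scale

ll = log_likelihood(y, p_hat)   # a negative number
deviance = -2.0 * ll            # the quantity compared across models later

# Multiplying 5,000 probabilities directly underflows to 0.0 in floating point.
raw_likelihood = math.prod(p_hat if yi == 1 else 1.0 - p_hat for yi in y)

print(f"p_hat = {p_hat}, b0_hat = {b0_hat:.4f}, LL = {ll:.2f}, deviance = {deviance:.2f}")
print(f"raw product of probabilities: {raw_likelihood}")
```

Trying any other common probability (say 0.55 or 0.65) in `log_likelihood` gives a more negative value than at `p_hat`.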
Unit 4b – Slide 6: Building the Logistic Regression Model
Unit 4b – Slide 7: Graphical Interpretation of the Logistic Regression Model
[Figure: local polynomial, linear, and logistic fits to the data, compared.]
Unit 4b – Slide 8: The Likelihood Ratio Chi-Square
Our log-likelihood from our baseline model, with no predictors, is $LL_0$, so its deviance is $-2 \times LL_0$.
Our log-likelihood from our 1-predictor model is $LL_1$. The log-likelihood of the data is less negative (the data are more likely) given this model's parameter estimates, so its deviance, $-2 \times LL_1$, is smaller.
The deviance has dropped, and adding a predictor to a model can never make it rise. The likelihood ratio chi-square is this drop in deviance, $(-2 LL_0) - (-2 LL_1)$, which we test against a chi-square distribution with degrees of freedom equal to the number of added predictors (here, 1).
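A minimal Python sketch of the likelihood ratio chi-square for one added predictor. The log-likelihood values in the usage line are hypothetical, since the slide's numbers are not reproduced here. For df = 1 the sketch uses the identity that the upper-tail chi-square probability equals erfc(sqrt(G/2)), which avoids needing a statistics library; a model comparison with more than one added predictor would need a general chi-square tail function instead.

```python
import math

def likelihood_ratio_test(ll_baseline, ll_fitted):
    """Likelihood ratio chi-square for a model with one more predictor than its baseline.

    The statistic is the drop in deviance, (-2 * ll_baseline) - (-2 * ll_fitted).
    With df = 1 the upper-tail chi-square probability is erfc(sqrt(G / 2)),
    because a chi-square(1) variable is the square of a standard normal variable.
    """
    g = (-2.0 * ll_baseline) - (-2.0 * ll_fitted)
    if g < 0:
        raise ValueError("the fitted model should not have a larger deviance than its nested baseline")
    return g, math.erfc(math.sqrt(g / 2.0))

# Hypothetical log-likelihoods, not the values from the slide.
g, p = likelihood_ratio_test(ll_baseline=-350.0, ll_fitted=-340.0)
print(f"LR chi-square = {g:.1f} on 1 df, p = {p:.2g}")
```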
Unit 4b – Slide 10: Interpreting Model Results Graphically, Formulaically
Husband's income (1976 Canadian dollars) | Estimated probability that the wife is a homemaker
$10,000 | 64%
$20,000 | 80%
$30,000 | 90%
$40,000 | 95%
Unit 4b – Slide 11: Interpreting Logistic Model Parameter Estimates – Interpreting Sign
Unit 4b – Slide 12: Probability, Odds, and Log-Odds: Formulaically
Object | Is it an Easter egg? (0 = no; 1 = yes) | Probability of picking an Easter egg at random, p | Odds of picking an Easter egg (vs. not an Easter egg), p/(1 − p) | Log-odds of picking an Easter egg (vs. not an Easter egg), log(p/(1 − p))
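The three columns above can be computed from a list of 0/1 indicators. This Python sketch uses a hypothetical basket of 10 objects, 3 of them Easter eggs (the slide's own objects are not reproduced here), and uses the natural log for the log-odds. It also shows that the odds are simply the count of eggs divided by the count of non-eggs.

```python
import math

# Hypothetical basket of 10 objects: 1 = Easter egg, 0 = not an Easter egg.
is_egg = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]

eggs = sum(is_egg)
not_eggs = len(is_egg) - eggs

p = eggs / len(is_egg)       # probability of picking an egg at random
odds = p / (1 - p)           # odds of an egg vs. not an egg
log_odds = math.log(odds)    # log-odds, using the natural log

# The odds are also just the ratio of the two counts: 3 eggs to 7 non-eggs.
odds_from_counts = eggs / not_eggs

print(f"p = {p:.2f}, odds = {odds:.3f}, log-odds = {log_odds:.3f}")
```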
Unit 4b – Slide 13: Probability, Odds, and Log-Odds: By Range
One issue with probabilities is that their range of admissible values is restricted to lie between 0 and 1. This was one of our clues that a linear model would be inappropriate. The logit transformation stretches the probability scale, facilitating a linear relationship.
Quantity | Formula | Theoretical minimum | Theoretical maximum
Probability | p | 0 | 1
Odds | p/(1 − p) | 0 | +∞
Log(Odds), or "logit" | log(p/(1 − p)) | −∞ | +∞
Notice that a log-odds transformation of a probability leads to a scale with an unrestricted range.
Unit 4b – Slide 14: From Probabilities to Odds
Percentage | Probability | Odds
10% | 0.10 | 1/9 ≈ 0.11
25% | 0.25 | 1/3 ≈ 0.33
50% | 0.50 | 1/1 = 1
75% | 0.75 | 3/1 = 3
90% | 0.90 | 9/1 = 9
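A short Python sketch that reproduces the odds column exactly, using the standard-library `fractions.Fraction` type so that 10% comes out as 1/9 rather than a rounded decimal.

```python
from fractions import Fraction

def odds(p):
    """Odds p / (1 - p) for a probability p with 0 <= p < 1."""
    return p / (1 - p)

# Exact rational arithmetic reproduces the fractions in the table.
table = {pct: odds(Fraction(pct, 100)) for pct in (10, 25, 50, 75, 90)}

for pct, o in table.items():
    print(f"{pct}%: odds = {o} (about {float(o):.2f})")
```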
Unit 4b – Slide 15: From Probabilities to Log-Odds (Logits)
Percentage | Probability | Logits
10% | 0.10 | −2.20
25% | 0.25 | −1.10
50% | 0.50 | 0.00
75% | 0.75 | 1.10
90% | 0.90 | 2.20
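A short Python sketch of the logit column, assuming natural logs. Probabilities equally far above and below 50% map to logits of equal size and opposite sign, and 50% maps to exactly 0.

```python
import math

def logit(p):
    """Log-odds log(p / (1 - p)), natural log, for 0 < p < 1."""
    return math.log(p / (1 - p))

logits = {pct: logit(pct / 100) for pct in (10, 25, 50, 75, 90)}

for pct, value in logits.items():
    print(f"{pct}%: logit = {value:+.2f}")
```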
Unit 4b – Slide 16: The Logistic Function as the Inverse of the Logit Function
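A Python sketch of the inverse relationship: the logistic function takes a log-odds value back to a probability, so applying one after the other returns the starting value. The two-branch form of `logistic` is a common numerical safeguard (our choice here, not something from the slide): both branches equal 1/(1 + e^{−z}), but the second avoids an overflow in `math.exp` for large negative z.

```python
import math

def logit(p):
    """Probability -> log-odds."""
    return math.log(p / (1 - p))

def logistic(z):
    """Log-odds -> probability; the inverse of logit.

    Both branches compute 1 / (1 + e^(-z)); the second avoids overflow
    in math.exp when z is a large negative number.
    """
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

for p in (0.10, 0.25, 0.50, 0.75, 0.90):
    z = logit(p)
    print(f"p = {p:.2f} -> logit = {z:+.3f} -> back to p = {logistic(z):.2f}")
```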
Unit 4b – Slide 17: General Relationship; Our Model
Unit 4b – Slide 18: Interpreting Coefficients in Terms of Logits (Log-Odds)
Unit 4b – Slide 19: Interpreting Model Results in Terms of Odds
When the husband earns $10K/year, the fitted odds that the woman is a homemaker are 1.77 to 1.
Equivalently, when the husband earns $10K/year, for every woman in the workforce, we estimate that 1.77 are homemakers.
Equivalently, when the husband earns $10K/year, the estimated probability that the woman is a homemaker is 1.77 times the estimated probability that the woman works outside the home.
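A small Python check connecting these odds to the probability table: converting the fitted odds of 1.77 back with p = odds/(1 + odds) gives about 64%, the estimated probability reported for a $10,000 husband's income, and the ratio of the two probabilities recovers the odds.

```python
def odds_to_probability(odds):
    """Convert odds (e.g., homemaker vs. workforce) to a probability: odds / (1 + odds)."""
    return odds / (1.0 + odds)

fitted_odds = 1.77  # fitted odds from the slide when the husband earns $10K/year
p_home = odds_to_probability(fitted_odds)
p_work = 1.0 - p_home

print(f"P(homemaker) = {p_home:.2f}, P(workforce) = {p_work:.2f}")
print(f"ratio of the two probabilities = {p_home / p_work:.2f}")  # recovers the odds
```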
Unit 4b – Slide 20: Interpreting Model Results in Terms of Odds Ratios
Husband's income (1976 Canadian dollars) | Estimated probability that the wife is a homemaker | Estimated odds that the wife is a homemaker
$10,000 | 64% | 1.77
$20,000 | 80% | 3.99
$30,000 | 90% |
$40,000 | 95% | 20.15
We can calculate the ratio of odds at regular intervals. How much greater are the odds that a wife is a homemaker when the husband's salary is $20,000 vs. $10,000? This odds ratio is 3.99/1.77 ≈ 2.25. This is not a typo! Successive odds ratios are constant: each additional $10,000 multiplies the estimated odds by the same factor.
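Why the successive odds ratios are constant: under the logistic model the fitted odds are exp(b0 + b1 · x), so the ratio of odds for a 10-unit increase is exp(10 · b1), whatever the starting x or the intercept. The Python sketch below checks this numerically with hypothetical parameters (b0 = -1.0, b1 = 0.3, x in $1,000s), not the estimates behind the table.

```python
import math

def fitted_odds(x, b0, b1):
    """Fitted odds exp(b0 + b1 * x) from a one-predictor logistic model."""
    return math.exp(b0 + b1 * x)

# Hypothetical parameters (x in $1,000s); not the estimates from the slides.
b0, b1 = -1.0, 0.3

# Odds ratio for a $10,000 increase, starting from several salary levels.
ratios = [fitted_odds(x + 10, b0, b1) / fitted_odds(x, b0, b1) for x in (10, 20, 30)]
print([round(r, 4) for r in ratios])   # all the same value
print(round(math.exp(10 * b1), 4))     # exp(10 * b1) gives that value directly
```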
Unit 4b – Slide 21: From Log-Odds to Odds Ratios
Unit 4b – Slide 22: Four Ways to Interpret Slope Coefficients in a Logistic Regression Model
Log-odds/logits: Two women whose husbands' 1976 salaries differ by $1,000 differ by 0.081 in their fitted log-odds of being a homemaker.
Pick prototypical odds: estimated odds of being a homemaker across prototypical husband's income levels.
Pick prototypical probabilities: estimated probabilities of being a homemaker across prototypical husband's income levels.
Husband's income (1976 Canadian dollars) | Estimated probability that the wife is a homemaker | Estimated odds that the wife is a homemaker
$10,000 | 64% | 1.77
$20,000 | 80% | 3.99
$30,000 | 90% |
$40,000 | 95% | 20.15
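A Python sketch converting the log-odds interpretation into an odds-ratio interpretation: exponentiating the slope of 0.081 per $1,000 gives the factor by which the fitted odds multiply per $1,000, and exponentiating 10 × 0.081 gives the factor per $10,000. That second value (about 2.25) agrees with 3.99/1.77 up to a small gap that is presumably due to rounding in the reported slope and odds.

```python
import math

b1_per_1k = 0.081  # fitted log-odds difference per $1,000 of husband's 1976 salary (from the slide)

or_per_1k = math.exp(b1_per_1k)         # odds multiply by this per $1,000
or_per_10k = math.exp(10 * b1_per_1k)   # and by this per $10,000 (= or_per_1k ** 10)
pct_change_per_1k = 100 * (or_per_1k - 1)

print(f"odds ratio per $1,000: {or_per_1k:.3f} (about {pct_change_per_1k:.1f}% higher odds)")
print(f"odds ratio per $10,000: {or_per_10k:.2f}")
print(f"ratio of the rounded fitted odds, 3.99 / 1.77: {3.99 / 1.77:.2f}")
```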