Multinomial Logistic Regression
David F. Staples
Outline
Review of Logistic Regression
  BCS Example
Extension to Multiple Response Groups
  Nominal Categories
  Ordinal Categories
Model Fitting & Interpretation
  Shallow Lake Trophic Status
Logistic Regression
Based on a Binomial Random Variable: $Y \in \{0, 1\}$, with $\Pr(Y = 1) = p$ and $\Pr(Y = 0) = 1 - p$.
$$p(x) = P(Y_i = 1 \mid X_i) = \frac{e^{X\beta}}{1 + e^{X\beta}}, \quad \text{where } X\beta = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k.$$
A logit transformation is used to linearize $p(x)$:
$$\log\left(\frac{p(x)}{1 - p(x)}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k = X\beta$$
The left-hand side is the Log Odds of 'Success'.
→ The β's give the additive effect of the X's on the Log Odds.
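The two transformations above can be sketched in base R. This is a minimal illustration, assuming hypothetical coefficients `b0` and `b1` that do not come from the slides; `plogis()` and `qlogis()` are the base R logistic and logit functions.

```r
# Hypothetical coefficients, for illustration only (not from the slides)
b0 <- -1.0
b1 <- 0.5
x  <- c(-2, 0, 2, 4)

eta <- b0 + b1 * x                 # linear predictor X * beta
p   <- exp(eta) / (1 + exp(eta))   # inverse logit gives p(x)

# plogis() is the same inverse logit; qlogis() is the logit log(p / (1 - p))
same_p   <- all.equal(p, plogis(eta))
back_eta <- all.equal(qlogis(p), eta)

# On the log-odds scale the effect of x is additive:
# the change in log odds per unit of x equals b1 everywhere
log_odds <- log(p / (1 - p))
slopes   <- diff(log_odds) / diff(x)
```

On the probability scale the same one-unit change in x moves p by different amounts depending on where x starts, which is why interpretation is usually done on the log-odds or odds scale.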
Logistic Regression Example
Dichotomous Variable is the Presence/Absence of BCS:
  Y = 1 if BCS Present
  Y = 0 if BCS Absent
  p = Prob(BCS Present)
Model p as a function of Macrophyte Patch Area:
glm(BCS ~ Patch_area, family = binomial)
             Estimate     SE    z     Pr(>|z|)
Intercept    -2.433       …     …     …e-06
Patch_area   1.765e-04    …     …     …
Interpreting Logistic Regression
The effect of Patch Area on P(BCS) depends on:
  the Non-Linear Transformation
  the Value of the Intercept
  the Values of Other Variables
Interpreting Logistic Regression
For the average patch area (8374), the log odds are:
$-2.433 + 1.765 \times 10^{-4} \times 8374 = -0.955$
Exponentiate to get the Odds of Success: $\exp(-0.955) = p / (1 - p) = 0.38$.
Solve for p: Prob(BCS Present | Area = 8374) = 0.38 / 1.38 = 0.28.
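The arithmetic on this slide can be checked with a few lines of R. This is a sketch under two assumptions: the intercept is the -2.433 quoted in the p = 0.5 calculation, and the slope is 1.765e-04, where the exponent is inferred so that the log odds at an area of 8374 reproduce the -0.955 shown here.

```r
b0   <- -2.433      # intercept, as quoted in the p = 0.5 calculation
b1   <- 1.765e-04   # slope; the exponent is inferred, not read from the table
area <- 8374        # average patch area

log_odds <- b0 + b1 * area   # linear predictor on the log-odds scale
odds     <- exp(log_odds)    # odds of BCS presence, p / (1 - p)
p        <- odds / (1 + odds) # solve the odds for p

# plogis() does the last two steps in one call
p_direct <- plogis(log_odds)
```

With a fitted model object, `predict(fit, newdata, type = "response")` returns the same probability without the manual steps.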
Interpreting Logistic Regression
When p = 0.5, the log odds equal 0: $-2.433 + 1.765 \times 10^{-4} \times \text{Area} = 0$.
Thus, the patch area for p = 0.50 is $2.433 / (1.765 \times 10^{-4}) \approx 13{,}785$.
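Setting the log odds to zero and solving for the predictor is a one-line calculation. The sketch below assumes the intercept -2.433 quoted on this slide and a slope of 1.765e-04 (the exponent is inferred from the -0.955 log odds at an area of 8374).

```r
b0 <- -2.433      # intercept from the slide
b1 <- 1.765e-04   # slope; exponent inferred, an assumption

# p = 0.5 means log odds = 0, so b0 + b1 * area = 0
area_50 <- -b0 / b1

# Check: the predicted probability at that area is one half
p_check <- plogis(b0 + b1 * area_50)
```

The same ratio, minus the intercept over the slope, gives the 50% point for any single-predictor logistic model.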
Multinomial Logistic Regression
Logistic Regression with > 2 response categories
Model Probabilities Relative to a 'Reference' Category
Response May be Nominal or Ordinal
Shallow Lake Trophic Status
3 Categories Defining Lake State:
  Y = 1 if Lake Clear
  Y = 2 if Lake Shifting States
  Y = 3 if Lake Turbid
Nominal (un-ordered) Multinomial Logistic
library(nnet)
multinom(StateNom ~ TP)
[Output: coefficients and standard errors for (Int) and TP, Residual Deviance, and AIC; numeric values missing.]
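Nominal (un-ordered) Multinomial Logistic
library(nnet)
multinom(StateNom ~ TP)
For TP = 50, p(Shifting) is about 16% of p(Clear): exponentiating the Shifting state's linear predictor, (Int) + TP coefficient × 50, gives p(Shifting)/p(Clear) ≈ 0.16.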
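Nominal (un-ordered) Multinomial Logistic
library(nnet)
multinom(StateNom ~ TP)
For TP = 50, p(Turbid) is about 30% of p(Clear): exponentiating the Turbid state's linear predictor, (Int) + TP coefficient × 50, gives p(Turbid)/p(Clear) ≈ 0.30.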
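The ratios relative to the reference category determine all three probabilities, because the probabilities must sum to 1. A minimal sketch using the two ratios quoted for TP = 50 (0.16 for Shifting and 0.30 for Turbid, each relative to Clear); the variable names are illustrative only.

```r
# Ratios p(state) / p(Clear) at TP = 50, as quoted on the slides;
# the reference category has ratio 1 by definition
ratio <- c(Clear = 1, Shifting = 0.16, Turbid = 0.30)

# Normalise so the three probabilities sum to 1
probs <- ratio / sum(ratio)
```

This is the same normalisation that `predict()` on a `multinom` fit performs with `type = "probs"`: exponentiate each non-reference linear predictor, then divide by one plus their sum.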
Nominal (un-ordered) Multinomial Logistic Odds of Shifting State vs. Clear State
Ordinal Multinomial Logistic a.k.a. Proportional Odds Model
3 Ordered Status Categories:
  Y = 1 if lake clear
  Y = 2 if lake shifting states
  Y = 3 if lake turbid
Ordinal Multinomial Logistic a.k.a. Proportional Odds Model
Ordered Status Categories:
  Y = 1 if lake clear
  Y = 2 if lake shifting states
  Y = 3 if lake turbid
library(MASS)
StateOrd = as.ordered(StateNom)
polr(StateOrd ~ TP)
[Output: Value, SE, and t value for TP and for the Intercepts 1|2 and 2|3, Residual Deviance, and AIC; numeric values missing.]
Assume Same Slope for every category boundary => Fewer Parameters (3 here: one slope and two intercepts, versus 4 for the nominal model).
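To see how one slope and two intercepts produce three category probabilities, the sketch below uses the parameterization documented for `polr()`, logit P(Y ≤ j) = ζ_j − β·x. The values of `beta` and `zeta` are hypothetical and chosen only for illustration; they are not the fitted estimates from the lake data.

```r
# Hypothetical values, for illustration only (not the fitted estimates)
beta <- 0.01                          # common slope for TP
zeta <- c("1|2" = 1.5, "2|3" = 2.5)   # cutpoints (the 'Intercepts' in polr output)
TP   <- 50

# Cumulative probabilities P(Y <= 1) and P(Y <= 2)
cum <- plogis(zeta - beta * TP)

# Category probabilities are differences of the cumulative ones
probs <- c(Clear    = cum[[1]],
           Shifting = cum[[2]] - cum[[1]],
           Turbid   = 1 - cum[[2]])
```

Because the same `beta` enters both cumulative logits, a change in TP shifts both by the same amount; that shared shift is the proportional odds assumption.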
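library(MASS)
m2 = polr(StateOrd ~ TP)
newd = data.frame(TP = seq(0, 600))
# type = "probs" returns one column of predicted probabilities per lake state
prd = predict(m2, newdata = newd, type = "probs")
# Draw one line per state across the TP range
matplot(newd$TP, prd, type = "l", xlab = "TP", ylab = "Predicted probability")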
Nominal/Ordinal Comparison
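One way to make such a comparison concrete is to fit both models to the same data and compare parameter counts and AIC. The sketch below is an assumption-laden illustration: the data are simulated from a latent-variable model with made-up settings and are not the shallow lake data set, and the object names are hypothetical.

```r
library(nnet)   # multinom()
library(MASS)   # polr()

set.seed(1)
# Simulated, hypothetical data (NOT the shallow lake data)
n      <- 300
TP     <- runif(n, 0, 600)
latent <- 0.01 * TP + rlogis(n)   # latent trophic score with logistic noise
StateNom <- cut(latent, breaks = c(-Inf, 2, 4, Inf),
                labels = c("Clear", "Shifting", "Turbid"))
StateOrd <- as.ordered(StateNom)

# Nominal model: separate intercept and slope for each non-reference state
m_nom <- multinom(StateNom ~ TP, trace = FALSE)
# Ordinal model: one slope shared across the two cutpoints
m_ord <- polr(StateOrd ~ TP, Hess = TRUE)

n_par <- c(nominal = length(coef(m_nom)),
           ordinal = length(coef(m_ord)) + length(m_ord$zeta))
aic   <- c(nominal = AIC(m_nom), ordinal = AIC(m_ord))
```

When the ordering is real and the shared slope is reasonable, the ordinal model usually matches the nominal fit with one fewer parameter; a much lower AIC for the nominal model suggests the proportional odds assumption is too restrictive.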
Nominal (un-ordered) Multinomial Logistic
For J = 3 Categories defining lake state:
  Y = 1 if lake clear
  Y = 2 if lake shifting states
  Y = 3 if lake turbid
library(nnet)
multinom(StateNom ~ TP)
[Output: coefficients and standard errors for (Intercept) and TP, Residual Deviance, and AIC; numeric values missing.]
Ordinal Multinomial Logistic a.k.a. Proportional Odds Model
For J = 3 Categories defining lake state:
  Y = 1 if lake clear
  Y = 2 if lake shifting states
  Y = 3 if lake turbid
  (State 2 is Intermediate between 1 & 3)
library(MASS)
StateOrd = as.ordered(StateNom)
polr(StateOrd ~ TP, Hess = TRUE)
[Output: Value, SE, and t value for TP and for the Intercepts 1|2 and 2|3, Residual Deviance, and AIC; numeric values missing.]