LOGISTIC REGRESSION: Basic Introduction
Mike Bailey
2/19/2019
Course at statistics.com
BASICS
Response Y: dichotomous (0/1)
Predictors X: categorical (usually recoded as dichotomous design variables) or real-valued
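The sketch below shows how R turns a categorical predictor into dichotomous design variables. It assumes a made-up factor `branch`, which is hypothetical and not taken from the slides' data. `model.matrix()` builds one 0/1 column for each non-reference level, and the first level serves as the baseline.

```r
# Hypothetical categorical predictor with three levels
branch <- factor(c("army", "navy", "marine", "army", "navy"))

# model.matrix() expands the factor into 0/1 design variables;
# the first level alphabetically ("army") is the reference category,
# so it gets no column of its own
X <- model.matrix(~ branch)
print(X)
```

With k levels you get k - 1 design variables. A row of all zeros (after the intercept) means the reference level.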
PREDICTING PROBABILITIES
Goal: predict p(x) = P[Y = 1 | x], the probability that Y = 1 given the predictors x.
LOGISTIC MODEL
$$p(x) = \frac{e^{b_0 + b_1 x_1 + \cdots + b_k x_k}}{1 + e^{b_0 + b_1 x_1 + \cdots + b_k x_k}}$$
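LOGIT FUNCTION
$$\operatorname{logit} p(x) = \ln\frac{p(x)}{1 - p(x)} = b_0 + b_1 x_1 + \cdots + b_k x_k$$
So if we can estimate p(x) and take the logit, we have a linear function of the x's, and we can use regression to estimate the b's. In practice, R's glm fits them by maximum likelihood rather than by ordinary least squares on the logits.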
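Here is a small numerical check that the logit undoes the logistic model. It assumes a single predictor and made-up coefficients `b0` and `b1`. Base R's `plogis()` and `qlogis()` are the built-in logistic and logit functions. The hand-written versions are there only to mirror the formulas.

```r
# Logistic model: maps a linear predictor eta = b0 + b1*x to a probability
logistic <- function(eta) exp(eta) / (1 + exp(eta))

# Logit: the log-odds ln(p / (1 - p))
logit <- function(p) log(p / (1 - p))

b0 <- -1.0   # hypothetical intercept
b1 <-  0.5   # hypothetical slope
x  <- c(-2, 0, 2, 4)

p <- logistic(b0 + b1 * x)   # probabilities strictly between 0 and 1
round(p, 3)

# Taking the logit of p(x) recovers the linear function of x
all.equal(logit(p), b0 + b1 * x)
```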
ODDS
p(x)/(1 - p(x)) is the ODDS that Y = 1 given x.
CASE 1: DICHOTOMOUS x
Data contingency table (counts):
         Y=0   Y=1
  X=0     a     d
  X=1     c     b
ODDS
What are the odds of Y = 1 when X = 1? From the table, they are estimated as b/c.
ODDS RATIO
Odds of Y = 1 when X = 1: b/c
Odds of Y = 1 when X = 0: d/a
The ratio of the odds for Y = 1 is O.R. = (b/c)/(d/a) = ab/(cd).
Odds ratios have an easily understood interpretation.
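The same arithmetic can be written in R. The counts are hypothetical and laid out like the slide's table. They were chosen so that the odds ratio comes out to exactly 1/2.

```r
# Hypothetical counts in the slide's 2x2 layout (not data from the slides)
n <- c(a = 40,  # X=0, Y=0
       d = 20,  # X=0, Y=1
       c = 60,  # X=1, Y=0
       b = 15)  # X=1, Y=1

odds_x1 <- n[["b"]] / n[["c"]]   # odds of Y=1 when X=1
odds_x0 <- n[["d"]] / n[["a"]]   # odds of Y=1 when X=0
odds_ratio <- odds_x1 / odds_x0  # same as (a*b)/(c*d)

print(odds_ratio)  # [1] 0.5
```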
EXAMPLE
Y = 1 if the baby has low birth weight
X = 1 if the mother has frequent prenatal care
ODDS RATIO: the factor by which the odds of Y = 1 are multiplied when X = 1. This is close to the ratio of the probabilities P[Y = 1] only when Y = 1 is rare.
"Low birth weight occurs half as often (O.R. = 1/2) when the mother has adequate prenatal care."
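The odds ratio and the ratio of probabilities coincide only when Y = 1 is rare. The quick check below uses made-up counts, not the birth-weight data. In this example the odds are exactly halved, yet the probability falls by only 40%.

```r
# Hypothetical 2x2 counts, made up for illustration
p_x1 <- 15 / (60 + 15)   # P[Y=1 | X=1] = 0.2
p_x0 <- 20 / (40 + 20)   # P[Y=1 | X=0] = 1/3

odds <- function(p) p / (1 - p)

prob_ratio <- p_x1 / p_x0               # ratio of probabilities (0.6)
odds_ratio <- odds(p_x1) / odds(p_x0)   # ratio of odds (0.5)
c(prob_ratio = prob_ratio, odds_ratio = odds_ratio)
```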
THE MAGIC CONTINUES...
b1 = ln(O.R.), equivalently O.R. = e^{b1}. Because the logit is linear in x, b1 is the change in the log-odds of Y = 1 when x goes from 0 to 1.
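Here is a short derivation for a dichotomous x. It assumes the single-predictor model logit p(x) = b0 + b1 x and writes p(1), p(0) for p(x) at x = 1 and x = 0.

$$
\operatorname{logit} p(1) - \operatorname{logit} p(0) = (b_0 + b_1) - b_0 = b_1,
\qquad
\operatorname{logit} p(1) - \operatorname{logit} p(0) = \ln\frac{p(1)/(1-p(1))}{p(0)/(1-p(0))} = \ln(\text{O.R.})
$$

Hence b1 = ln(O.R.). With the contingency-table counts, this gives e^{b1} = ab/(cd).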
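USING R
G <- glm(formula = weight ~ prenatal, family = binomial(link = logit), data = births)
Here births is a hypothetical data frame containing the 0/1 columns weight and prenatal. exp(coef(G)) gives the odds ratios.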
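As a sanity check of b1 = ln(O.R.), the sketch below builds a hypothetical `births` data frame from made-up 2x2 counts and fits the model from the slide. It then compares exp(b1) with ab/(cd). With a single dichotomous predictor, the maximum-likelihood fit reproduces the table's odds ratio up to convergence tolerance.

```r
# Hypothetical counts: a = 40 (X=0,Y=0), d = 20 (X=0,Y=1),
#                      c = 60 (X=1,Y=0), b = 15 (X=1,Y=1)
births <- data.frame(
  prenatal = rep(c(0, 0, 1, 1), times = c(40, 20, 60, 15)),
  weight   = rep(c(0, 1, 0, 1), times = c(40, 20, 60, 15))
)

G <- glm(formula = weight ~ prenatal, family = binomial(link = logit), data = births)

b1 <- coef(G)[["prenatal"]]
exp(b1)                    # odds ratio from the fit
(40 * 15) / (60 * 20)      # odds ratio from the table, ab/(cd) = 0.5
```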
DATA
Save the spreadsheet out of Excel as a .csv file with a header row of variable names (e.g. y, x1, x2, x3, x4, x5, marine, army, navy, iraqi), then read it into R:
> eof2 <- read.csv(file = "e:datafile2.csv", header = TRUE)
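The sketch below reproduces this step without Excel. It writes a tiny hypothetical data set to a temporary .csv file as a stand-in for the exported spreadsheet, then reads it back with `read.csv()`. The file path and column values are made up.

```r
# Tiny hypothetical data set standing in for the file saved out of Excel
demo <- data.frame(y = c(0, 1, 1, 0), marine = c(1, 0, 0, 0), army = c(0, 1, 0, 1))
path <- tempfile(fileext = ".csv")
write.csv(demo, path, row.names = FALSE)

# header = TRUE takes the variable names from the first row
eof_demo <- read.csv(file = path, header = TRUE)
str(eof_demo)

# glm(family = binomial) expects a 0/1 response, so check the coding
stopifnot(all(eof_demo$y %in% c(0, 1)))
```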
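RESULTS
> g2 <- glm(formula = y ~ x1 + x2 + x3 + x4 + x5, family = binomial(link = logit), data = eof2)
> g2
Call: glm(formula = y ~ x1 + x2 + x3 + x4 + x5, family = binomial(link = logit), data = eof)
Coefficients:
(Intercept)       x1        x2       x3        x4        x5
   -950.506    3.714    -3.716       NA   951.613   -75.118
Degrees of Freedom: 36 Total (i.e. Null); 32 Residual
Null Deviance: 29.31
Residual Deviance: 3.802    AIC: 13.8
The NA for x3 means R dropped it as linearly dependent on the other predictors. The very large coefficients suggest that the predictors (nearly) perfectly separate y = 0 from y = 1, which makes these estimates unstable.
H0: the model doesn't explain the variability in the data (all slope coefficients are 0).
The deviance plays the role of the residual sum of squares. Under H0, the drop in deviance (29.31 - 3.802 = 25.51 on 36 - 32 = 4 df) is approximately χ²-distributed.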
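The deviance test can be computed from the printed numbers alone. The sketch below copies the null and residual deviances from the g2 output. With the data at hand, one could instead fit a hypothetical intercept-only model `g0 <- glm(y ~ 1, family = binomial, data = eof2)` and run `anova(g0, g2, test = "Chisq")`. The AIC line checks that 3.802 + 2 × 5 (an intercept plus the four non-NA slopes) matches the printed 13.8.

```r
# Numbers copied from the printed g2 output
null_dev  <- 29.31
resid_dev <- 3.802
df_drop   <- 36 - 32   # null df minus residual df

# Likelihood-ratio (deviance) test of H0: all slopes are zero
lr_stat <- null_dev - resid_dev
p_value <- pchisq(lr_stat, df = df_drop, lower.tail = FALSE)
c(statistic = lr_stat, df = df_drop, p.value = p_value)

# AIC = residual deviance + 2 * (number of estimated coefficients)
aic <- resid_dev + 2 * 5
aic
```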
ARMY vs. USMC
> SERV2 <- glm(formula = y ~ marine + army, family = binomial(link = logit), data = eof2)
> SERV2
Call: glm(formula = y ~ marine + army, family = binomial(link = logit), data = eof2)
Coefficients:
(Intercept)      marine        army
 -1.757e+01   1.577e+01   2.312e-09
Degrees of Freedom: 36 Total (i.e. Null); 34 Residual
Null Deviance: 29.31
Residual Deviance: 28.71    AIC: 34.71
AOR
> region2 <- glm(formula = y ~ raleigh + topeka + denver + mobile + oshkosh + eagle, family = binomial(link = logit), data = eof2)
> region2
Call: glm(formula = y ~ raleigh + topeka + denver + mobile + oshkosh + eagle, family = binomial(link = logit), data = eof2)
Coefficients:
(Intercept)     raleigh      topeka      denver      mobile     oshkosh       eagle
 -1.957e+01   1.693e+01  -2.086e-08   1.847e+01   3.913e+01  -2.086e-08          NA
Degrees of Freedom: 36 Total (i.e. Null); 31 Residual
Null Deviance: 29.31
Residual Deviance: 20.84    AIC: 32.84
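The same deviance comparison can be applied to the SERV2 and region2 fits, using only the deviances and degrees of freedom printed above. For both models, the drop from the null deviance gives a p-value above 0.05. On this reading, neither service branch nor region explains much of the variability on its own. This is a sketch based on the printed summaries, not a re-analysis of the data.

```r
# Deviances and residual df read off the printed SERV2 and region2 output
null_dev <- 29.31
fits <- data.frame(
  model     = c("SERV2", "region2"),
  resid_dev = c(28.71, 20.84),
  resid_df  = c(34, 31)
)
fits$lr_stat <- null_dev - fits$resid_dev   # drop in deviance: 0.60 and 8.47
fits$df      <- 36 - fits$resid_df          # df of the drop: 2 and 5
fits$p_value <- pchisq(fits$lr_stat, df = fits$df, lower.tail = FALSE)
print(fits)
```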
EXAMPLE: Fear of Violence in Children
Y = 1 iff the interviewee anticipates being the victim of violence in the next 6 months
Predictors are demographic:
Age (design variable, 2-year categories)
Race (design variable)
Below the poverty line (dichotomous)
Sex (dichotomous)
Two-parent home (dichotomous)
Recent victim (dichotomous)
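A model of this shape mixes one design (factor) variable with several dichotomous predictors in a single `glm()` call. The sketch below runs on simulated data only. The column names, the age categories, and the "true" effects are all hypothetical, and none of it comes from the poster study. Race and sex are omitted to keep the sketch short.

```r
set.seed(42)
n <- 500

# Hypothetical survey data: one factor (design) variable, three dichotomous ones
kids <- data.frame(
  age_cat       = factor(sample(c("10-11", "12-13", "14-15"), n, replace = TRUE)),
  poverty       = rbinom(n, 1, 0.3),
  two_parent    = rbinom(n, 1, 0.6),
  recent_victim = rbinom(n, 1, 0.2)
)

# Made-up "true" log-odds, used only to simulate a 0/1 response
eta <- -1 + 0.4 * kids$poverty - 0.3 * kids$two_parent + 0.8 * kids$recent_victim
kids$fear <- rbinom(n, 1, plogis(eta))

fit <- glm(fear ~ age_cat + poverty + two_parent + recent_victim,
           family = binomial(link = logit), data = kids)

# Exponentiated coefficients are odds ratios; age effects are relative to "10-11"
round(exp(coef(fit)), 2)
```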
EXAMPLE: Fear of Violence in Children
source: poster display, Gornto Teletechnet Center, ODU
EARLY SEXUAL EXPERIENCE AND IQ
Y = 1 if the subject had sexual experience
Predictors (X) are:
design variables for intervals of the AHPVT (IQ)
design variables for age (HS, undergrad, grad)
design variables for specific universities
source: http://www.gnxp.com/blog/2007/04/intercourse-and-intelligence.php
RESULTS
Subjects with an IQ of 100 had about 5 times the odds of having had intercourse as those with an IQ of 130 (odds ratios).
Each IQ point increases the odds of virginity by 2.7% for males and 1.7% for females (from the estimates of b: e^b ≈ 1.027 and 1.017).
Probability of virginity (1 minus the predicted probability that Y = 1):
Age 19 males: 20%
Age 19 females: 25%
College aged: 13%
Princeton undergrads: 44%
Harvard undergrads: 41%
MIT graduate students: 35%
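Per-point effects like these are multiplicative. A 2.7% increase in the odds per IQ point means e^b ≈ 1.027, so over k points the odds are multiplied by 1.027^k rather than increased by 2.7k%. The arithmetic check below back-calculates the b values from the slide's percentages; they are not taken from the source study. The 10-point span is an arbitrary example.

```r
per_point <- c(male = 1.027, female = 1.017)   # odds multipliers per IQ point (from the slide)
b <- log(per_point)                             # implied coefficients b = ln(multiplier)
round(b, 4)

# Odds multiplier for a 10-point IQ difference: compounding, not 10 * 2.7%
round(per_point^10, 3)
```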
SUMMARY
Logistic regression produces odds ratios, predicted values, and regression coefficients.
Odds ratios are easily interpreted.
Predictors (x's) are often categorical or dichotomous.