Unit 32: The Generalized Linear Model

What are the assumptions of general linear models? What are the consequences of violating each of these assumptions? What options exist when these assumptions are violated?

The general linear model makes the 5 assumptions below. When these assumptions are met, OLS regression coefficients are MVUE (Minimum Variance Unbiased Estimators) and BLUE (Best Linear Unbiased Estimators).

1. Exact X: The IVs are assumed to be known exactly (i.e., without measurement error)
2. Independence: Residuals are independently distributed (the probability of obtaining a specific observation does not depend on other observations)
3. Normality: All residual distributions are normally distributed
4. Constant variance: All residual distributions have a constant variance, SEE²
5. Linearity: All residual distributions (i.e., for each Y') are assumed to have means equal to zero

Problems and Solutions

Exact X: Biased parameters (to the degree that measurement error exists). Use reliable measures.
Independence: Inaccurate standard errors, degrees of freedom, and significance tests. Use repeated measures, linear mixed effects models, or ANCOVA.
Normality: Inefficient (with large N). Use power transformations or generalized linear models.
Constant variance: Inefficient and inaccurate standard errors. Use power transformations, SE corrections, weighted least squares, or generalized linear models.
Linearity: Biased parameter estimates. Use power transformations, polynomial regression, or generalized linear models.

glm(formula, family=familytype(link=linkfunction), data=)

Family: Default Link Function
binomial (link = "logit")
gaussian (link = "identity")
Gamma (link = "inverse")
inverse.gaussian (link = "1/mu^2")
poisson (link = "log")
quasi (link = "identity", variance = "constant")
quasibinomial
quasipoisson
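A brief sketch of how the family and link arguments are supplied in practice, using a hypothetical data frame dat with outcome y and predictor x (not the admissions data used below):

mBin  <- glm(y ~ x, data = dat, family = binomial(link = "logit"))     # binary outcome
mPois <- glm(y ~ x, data = dat, family = poisson(link = "log"))        # count outcome
mGaus <- glm(y ~ x, data = dat, family = gaussian(link = "identity"))  # equivalent to lm(y ~ x)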

An Example: Predicting admission to a grad program in engineering based on quantitative GRE, GPA, and Undergraduate Institution Rank

         n    mean      sd     min  max
admit  400    0.42    0.49    0.00    1
gre    400  587.70  115.52  220.00  800
gpa    400    3.39    0.39    2.05    4
rank   400    2.48    0.94    1.00    4
rank2  400    1.47    0.50    1.00    2

Describe the effect of GPA on admission to grad school

mLM = lm(admit ~ gpa, data=d)
summary(mLM)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.94317    0.18202  -10.68   <2e-16 ***
gpa          0.69691    0.05339   13.05   <2e-16 ***
---
Residual standard error: 0.4137 on 398 degrees of freedom
Multiple R-squared: 0.2998, Adjusted R-squared: 0.298
F-statistic: 170.4 on 1 and 398 DF, p-value: < 2.2e-16

What are the problems with using a general linear model to assess the effects of these predictors on admission outcomes?

Residuals will not be normal (not efficient)
Residual variance often will not be constant (not efficient, SEs are inaccurate)
Relationship will not be linear (parameter estimates biased)
Y is not constrained between 0 and 1 (model may make nonsensical predictions)

plot(d$gpa,d$admit, type='p', pch=20)

plot(d$gre,jitter(d$admit,1), type='p', pch=20)

abline(mLM)

ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
Level of Significance = 0.05

Call: gvlma(x = model)

                      Value   p-value                   Decision
Global Stat        42.36416 1.402e-08 Assumptions NOT satisfied!
Skewness            5.12561 2.358e-02 Assumptions NOT satisfied!
Kurtosis           13.81465 2.018e-04 Assumptions NOT satisfied!
Link Function      23.39426 1.320e-06 Assumptions NOT satisfied!
Heteroscedasticity  0.02965 8.633e-01    Assumptions acceptable.
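For reference, a minimal sketch of how output like the above can be produced, assuming the gvlma package is installed and mLM is the linear model fit earlier:

library(gvlma)   # global validation of linear model assumptions
gvlma(mLM)       # prints the global test and its four component tests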

modelAssumptions(mLM,'normal')

modelAssumptions(mLM,'constant')

modelAssumptions(mLM, 'linear')

mGLM = glm(admit ~ gpa, data = d, family = binomial(logit))
summary(mGLM)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-2.0618 -0.8447 -0.3531  0.7644  2.3527

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -14.4968     1.5071  -9.619   <2e-16 ***
gpa           4.1238     0.4335   9.513   <2e-16 ***
---
(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 543.58 on 399 degrees of freedom
Residual deviance: 400.03 on 398 degrees of freedom
AIC: 404.03

Number of Fisher Scoring iterations: 5

What other non-linear shapes do you know how to model in the general linear model?

Simple monotone relationships with power transforms
Quadratic, cubic, etc. relationships with polynomial regression
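A sketch of both options, using hypothetical variables y and x in a data frame dat:

mPow  <- lm(log(y) ~ x, data = dat)       # power (log) transformation of Y for a monotone curve
mPoly <- lm(y ~ x + I(x^2), data = dat)   # polynomial (quadratic) regression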

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -14.4968     1.5071  -9.619   <2e-16 ***
gpa           4.1238     0.4335   9.513   <2e-16 ***

Linear Regression
  Residuals are gaussian
  Link function is identity
  Y = 1 * (b0 + b1X1 + … + bkXk)

Logistic Regression
  Residuals are binomial
  Link function is logit (a transformation of the logistic function)

Logistic Function:
  π = e^(b0 + b1X1 + … + bkXk) / (1 + e^(b0 + b1X1 + … + bkXk))

  π = probability of Y = 1
  e = 2.718 (approx)

Logs and Exponentials

You are likely familiar with logs using base 10:

log10(10)    [1] 1
log10(100)   [1] 2
log10(1000)  [1] 3
log10(1)     [1] 0
log10(15)    [1] 1.176091
log10(0)     [1] -Inf
log10(.1)    [1] -1

Logs and Exponentials

The natural log (often abbreviated ln) is similar but uses base e (approx 2.718) rather than base 10.

log(2.718282)  [1] 1
log(10)        [1] 2.302585
log(1)         [1] 0
log(0)         [1] -Inf

Logs and Exponentials

The inverse of the natural log is the exponential function: exp(). This function simply raises e to the power of X (whatever value you provide).

exp(1)   [1] 2.718282
exp(2)   [1] 7.389056
exp(0)   [1] 1
exp(-1)  [1] 0.3678794

Logistic regression uses natural logs and exponentials for the transformations of Y and Xs.
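R also provides the logistic function and its inverse, the logit, directly; a short sketch showing how they relate to log() and exp():

exp(log(5))          [1] 5           # exp() and log() are inverses
plogis(0)            [1] 0.5         # logistic function: exp(x) / (1 + exp(x))
qlogis(0.5)          [1] 0           # logit function: log(p / (1 - p))
plogis(qlogis(.75))  [1] 0.75        # logit and logistic undo one another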

π = e^(b0 + b1X1 + … + bnXn) / (1 + e^(b0 + b1X1 + … + bnXn))

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -14.4968     1.5071  -9.619   <2e-16 ***
gpa           4.1238     0.4335   9.513   <2e-16 ***

π = e^(-14.5 + 4.1*X1) / (1 + e^(-14.5 + 4.1*X1))

[Figures: the logistic function π = e^(b0 + b1X1) / (1 + e^(b0 + b1X1)) plotted for b0 = 0, b1 = 0; b0 = 0, b1 = 0 to 1; b0 = 0, b1 = -1 to 0; b0 = -5 to 5, b1 = 0; and b0 = -5 to 5, b1 = 1]
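A sketch of how curves like these can be drawn in R; plogis(x) computes e^x / (1 + e^x):

x <- seq(-10, 10, length.out = 200)
plot(x, plogis(0 + 0*x), type = "l", ylim = c(0, 1), ylab = "pi")  # b0 = 0, b1 = 0: flat at 0.5
lines(x, plogis(0 + 1*x), lty = 2)    # b1 = 1: S-shaped increase
lines(x, plogis(0 - 1*x), lty = 3)    # b1 = -1: S-shaped decrease
lines(x, plogis(-5 + 1*x), lty = 4)   # changing b0 shifts the curve left/right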

Odds = π / (1 - π)

What are the odds of obtaining a head on a fair coin toss?
  Odds = 0.5 / (1 - 0.5) = 0.5/0.5 = 1 [1:1]

What would the odds of obtaining a head be if I altered the coin to have a probability of heads = 0.67?
  Odds = 0.67 / (1 - 0.67) = 0.67/0.33 = 2 [2:1]

π = Odds / (Odds + 1)

What is the probability of an event that has an odds of 3?
  π = 3 / (3 + 1) = .75

Odds can range from 0 to infinity.

Odds = π / (1 - π)

What are the approximate odds of getting into grad school with a GPA of 3.5?

Odds = 0.5 / (1 - 0.5) = 1 [1:1]

Logistic Function (probability of Y = 1):
  π = e^(b0 + b1X1 + … + bkXk) / (1 + e^(b0 + b1X1 + … + bkXk))        0 < π < 1

Convert π to Odds (odds of Y = 1):
  π / (1 - π) = e^(b0 + b1X1 + … + bkXk)                               0 < Odds < INF

What are the predicted odds of getting into grad school with a GPA of 3.5?

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -14.4968     1.5071  -9.619   <2e-16 ***
gpa           4.1238     0.4335   9.513   <2e-16 ***

π / (1 - π) = e^(b0 + b1X1 + … + bnXn) = e^(-14.5 + 4.1*3.5) = e^(-0.15) = 0.86 [0.86:1]

NOTE: Probability of 0.50 occurs with Odds of 1

Logistic Function (probability of Y = 1):
  π = e^(b0 + b1X1 + … + bkXk) / (1 + e^(b0 + b1X1 + … + bkXk))        0 < π < 1

Convert π to Odds (odds of Y = 1):
  π / (1 - π) = e^(b0 + b1X1 + … + bkXk)                               0 < Odds < INF

Convert Odds to Log-Odds (logit function; log-odds of Y = 1):
  ln(π / (1 - π)) = b0 + b1X1 + … + bkXk                               -INF < Logit < INF

What are the log-odds (logit) of getting into grad school with a GPA of 3.5?

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -14.4968     1.5071  -9.619   <2e-16 ***
gpa           4.1238     0.4335   9.513   <2e-16 ***

Logit = b0 + b1X1 + … + bkXk = b0 + b1*GPA = -14.5 + (4.1 * 3.5) = -0.15

NOTE: Odds of 1 occur when logit = 0
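These predicted values can also be obtained directly from the fitted model; a sketch using predict() (the values differ slightly from the slide because the coefficients are not rounded here):

logitHat <- predict(mGLM, data.frame(gpa = 3.5), type = "link")   # about -0.06 (rounded coefficients give -0.15)
exp(logitHat)                                                     # predicted odds, about 0.94
predict(mGLM, data.frame(gpa = 3.5), type = "response")           # predicted probability, about 0.48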

Log-odds are not very intuitive. However, they are a linear function of our regressors. GLM estimates the parameters in the logit function.

Logit(Y) = -14.5 + 4.1 * GPA

We transform the logit function to either odds or probability functions to convey relationships in meaningful units.

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -14.4968     1.5071  -9.619   <2e-16 ***
gpa           4.1238     0.4335   9.513   <2e-16 ***

[Figures: the fitted GPA model displayed on three scales: Logit(Y), Odds(Y), and Probability(Y)]

Log-odds are a linear function of the Xs, but these parameter estimates are not very descriptive.

Logit(Admission) = -14.5 + 4.1*GPA

It is not clear how to interpret a parameter estimate of 4.1 for GPA. The log-odds of admission increase by 4.1 units for every 1 point increase in GPA??

Odds and probability are more descriptive, but they are not linear functions of the Xs, so their parameter estimates aren't very useful for describing the effect of the Xs. We can't make a simple statement about the change in Odds or Probability that results from a unit change in GPA.

π / (1 - π) = e^(-14.5 + 4.1*GPA)

π = e^(-14.5 + 4.1*GPA) / (1 + e^(-14.5 + 4.1*GPA))

This is where the Odds Ratio comes in!

Odds are defined at a specific point for X. The odds ratio is the change in odds for a change in X of some magnitude c:

Odds ratio (OR) = odds(X + c) / odds(X) = e^(b0 + b1(X + c)) / e^(b0 + b1(X)) = e^(c*b1)

What is the odds ratio for a change in GPA of 1.0?

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -14.4968     1.5071  -9.619   <2e-16 ***
gpa           4.1238     0.4335   9.513   <2e-16 ***

OR = e^(c * b1) = e^(1.0 * 4.1) = 60.3

The odds of getting into grad school increase by a factor of 60.3 for every 1 point increase in GPA.

This is the preferred descriptor for the effect of X. For a quantitative variable, choose a meaningful value for c (though often 1 is most appropriate).
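A sketch of the same computation from the fitted model object; the value of c is the analyst's choice:

b1 <- coef(mGLM)["gpa"]
exp(1.0 * b1)   # about 61.8 with the unrounded coefficient (60.3 using b1 rounded to 4.1)
exp(0.1 * b1)   # odds ratio for a 0.1-point increase in GPA, about 1.51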

Odds ratio (OR) = e^(c*b1)

When b1 = 0, the odds ratio = 1, indicating no change in odds for a change in X.
If b1 > 0, the odds ratio is > 1, indicating an increase in odds with increasing X.
If b1 < 0, the odds ratio is < 1, indicating a decrease in odds with increasing X.
The odds ratio is never negative.

mGLM = glm(admit ~ gpa, data= d, family= binomial(logit))
summary(mGLM)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-2.0618 -0.8447 -0.3531  0.7644  2.3527

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -14.4968     1.5071  -9.619   <2e-16 ***
gpa           4.1238     0.4335   9.513   <2e-16 ***
---
(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 543.58 on 399 degrees of freedom
Residual deviance: 400.03 on 398 degrees of freedom
AIC: 404.03

Number of Fisher Scoring iterations: 5

There are three common tests for a parameter in logistic regression:

1. Z test: Reported by summary() in R.  z = bj / SEj
2. Wald test: Reported by SPSS.  Wald = bj² / SEj².  Wald is asymptotically distributed as chi-square with 1 df.
3. Likelihood ratio test: Reported by Anova() in R.

Tests 1 and 2 are not preferred because they have higher Type II error rates than the likelihood ratio test.
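A sketch contrasting the three tests for the GPA coefficient (Anova() is the car package's function, as used below):

z <- coef(summary(mGLM))["gpa", "z value"]   # z test statistic, about 9.51
z^2                                          # Wald statistic, about 90.5
pchisq(z^2, df = 1, lower.tail = FALSE)      # Wald test p-value
library(car)
Anova(mGLM, type = 3)                        # likelihood ratio test (preferred)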

Deviance is the maximum likelihood generalization of SSE from OLS. The likelihood ratio test involves a comparison of two models' deviances.

To test the effect of GPA, compare the deviances of:
  Model C: Intercept only (null model)
  Model A: Model with GPA

LR test = Deviance(Model C) - Deviance(Model A), distributed as chi-square with df = df(A) - df(C)

    Null deviance: 543.58 on 399 degrees of freedom
Residual deviance: 400.03 on 398 degrees of freedom

LR test = Model C - Model A = 543.58 - 400.03 = 143.55 with 1 df

pchisq(143.54, df=1, lower.tail=FALSE)
[1] 4.478824e-33

Anova(mGLM, type=3)
Analysis of Deviance Table (Type III tests)
Response: admit
    LR Chisq Df Pr(>Chisq)
gpa   143.54  1  < 2.2e-16 ***

Maximum likelihood estimation (MLE)

The regression coefficients are estimated using maximum likelihood estimation. It is iterative (Newton's method).

The model may fail to converge if there is:
  A high ratio of predictors to infrequent cases (aim for at least 10 "events" per predictor)
  High multicollinearity
  Sparseness (cells with zero events); a bigger problem for categorical predictors
  Complete separation: predictors perfectly predict the criterion

MLE can yield biased parameters with small samples. Definitions of what is not small vary, but N > 200 is a reasonable minimum.

Model assumptions

Exact X
Independence
Logistic function and logit correctly specify the form of the relationship (the equivalent of correct fit in linear regression). The logistic function is almost always correct for dichotomous data. You can examine the logit function directly to assess the shape of the relationship.

What about categorical variables?

Handled exactly as in linear regression: contrast codes, dummy codes. Issues of family-wise error rates apply as before for non-orthogonal and unplanned contrasts; Holm-Bonferroni is available. The odds ratio is for the contrast (assuming unit weights).

str(d$rank2)
 int [1:400] 2 1 1 2 1 1 2 1 1 1 ...

d$rank2 = factor(d$rank2)
 Factor w/ 2 levels "1","2": 2 1 1 2 1 1 2 1 1 1 ...

contrasts(d$rank2) = varContrasts(d$rank2, Type='POC', POCList = list(c(1,-1)))
  POC1
1  0.5
2 -0.5

mRank2 = glm(admit~ rank2, data=d, family= binomial(logit))
summary(mRank2)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-1.1455 -1.1455 -0.9212  1.2096  1.4574

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.3567     0.1030  -3.464 0.000533 ***
rank2POC1     0.5623     0.2059   2.730 0.006326 **
---
(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 543.58 on 399 degrees of freedom
Residual deviance: 536.03 on 398 degrees of freedom
AIC: 540.03

Number of Fisher Scoring iterations: 4

Anova(mRank2, type=3)
Analysis of Deviance Table (Type III tests)
Response: admit
      LR Chisq Df Pr(>Chisq)
rank2   7.5509  1   0.005998 **
---

Odds1 = exp(mRank2$coefficients[1] + mRank2$coefficients[2]*  0.5)
0.9272727
Odds2 = exp(mRank2$coefficients[1] + mRank2$coefficients[2]* -0.5)
0.5284553

dNew = data.frame(rank2 = factor(c('1','2')))
p = predict(mRank2, dNew, type='response')
p
        1         2
0.4811321 0.3457447

p[1] / (1-p[1])   # odds for rank = 1
        1
0.9272727
p[2] / (1-p[2])   # odds for rank = 2
        2
0.5284553

(p[1] / (1-p[1])) / (p[2] / (1-p[2]))   # odds ratio
1.754685

exp(mRank2$coefficients[2])   # c = 1
rank2POC1
 1.754685

What about multiple predictors? What about interactions? Nothing new!

d$cGPA = d$gpa - mean(d$gpa)
d$rank2 = factor(d$rank2)
contrasts(d$rank2) = varContrasts(d$rank2, Type='POC', POCList = list(c(1,-1)))
  POC1
1  0.5
2 -0.5

m = glm(admit~ cGPA*rank2, data=d, family = binomial(logit))

summary(m)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-2.0136 -0.7829 -0.2957  0.7243  2.7099

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)     -0.6096     0.1378  -4.422 9.76e-06 ***
cGPA             4.3658     0.4619   9.452  < 2e-16 ***
rank2POC1        0.9551     0.2757   3.465 0.000531 ***
cGPA:rank2POC1  -0.9738     0.9238  -1.054 0.291814
---
(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 543.58 on 399 degrees of freedom
Residual deviance: 387.13 on 396 degrees of freedom
AIC: 395.13

Number of Fisher Scoring iterations: 5

Anova(m, type=3)   # likelihood ratio test
Analysis of Deviance Table (Type III tests)
Response: admit
           LR Chisq Df Pr(>Chisq)
cGPA        148.504  1  < 2.2e-16 ***
rank2        12.902  1  0.0003283 ***
cGPA:rank2    1.134  1  0.2868907
---

exp(m$coefficients[2])   # odds ratio for 2nd coefficient (cGPA)
    cGPA
78.71607

exp(m$coefficients[3])   # odds ratio for 3rd coefficient (rank2)
rank2POC1
  2.59898

We analyzed admission (yes vs. no) in a generalized linear model (GLM) that included GPA, School rank (High rank vs. Low rank), and their interaction as regressors. We used the binomial family with the logit link function for the GLM because the dependent variable, admission, was dichotomous. GPA was mean centered and School rank was coded using centered, unit-weighted contrast codes to represent the contrast of High vs. Low rank. We report the raw parameter estimates from the GLM and the odds ratio to quantify the effect size of significant effects. The effect of GPA was significant, b = 4.37, SE = 0.46, χ²(1) = 148.50, p < .001, which indicates that the odds of admission increase by a factor of 78.7 for every one point increase in GPA. The effect of School rank was significant, b = 0.96, SE = 0.28, χ²(1) = 12.90, p < .001, which indicates that the odds of admission increase by a factor of 2.6 for students at high relative to low ranked undergraduate institutions. The interaction between GPA and School rank was not significant, b = -0.97, SE = 0.92, χ²(1) = 1.13, p = .287. See Figure 1 for a display of the probability of admission as a function of GPA and school rank.
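A sketch of how a figure like Figure 1 might be drawn from the fitted model m; it assumes level 1 of rank2 denotes the high-rank institutions, and the plotting choices are illustrative:

gpaSeq <- seq(min(d$gpa), max(d$gpa), length.out = 100)
dHigh  <- data.frame(cGPA = gpaSeq - mean(d$gpa), rank2 = factor("1", levels = c("1", "2")))
dLow   <- data.frame(cGPA = gpaSeq - mean(d$gpa), rank2 = factor("2", levels = c("1", "2")))
plot(gpaSeq, predict(m, dHigh, type = "response"), type = "l", ylim = c(0, 1),
     xlab = "GPA", ylab = "Probability of admission")
lines(gpaSeq, predict(m, dLow, type = "response"), lty = 2)
legend("topleft", legend = c("High rank", "Low rank"), lty = 1:2)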

What about multi-level models?

glmer(formula, data, family = gaussian, start = NULL, verbose = FALSE,
      nAGQ = 1, doFit = TRUE, subset, weights, na.action, offset,
      contrasts = NULL, model = TRUE, control = list(), ...)
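A sketch of a multilevel logistic regression with glmer() from the lme4 package, assuming a hypothetical grouping factor dept (e.g., the department applied to) were available in the data:

library(lme4)
# 'dept' is a hypothetical grouping factor; (1 | dept) adds a random intercept per department
mMLM <- glmer(admit ~ cGPA + rank2 + (1 | dept), data = d,
              family = binomial(link = "logit"))
summary(mMLM)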