Unit 32: The Generalized Linear Model


1 Unit 32: The Generalized Linear Model

2 What are the assumptions of general linear models?
What are the consequences of violating each of these assumptions? What options exist when these assumptions are violated?

3 The general linear model makes the 5 assumptions below
The general linear model makes the 5 assumptions below. When these assumptions are met, OLS regression coefficients are MVUE (Minimum Variance Unbiased Estimators) and BLUE (Best Linear Unbiased Estimators).
1. Exact X: The IVs are assumed to be known exactly (i.e., without measurement error)
2. Independence: Residuals are independently distributed (the probability of obtaining a specific observation does not depend on other observations)
3. Normality: All residual distributions are normally distributed
4. Constant variance: All residual distributions have a constant variance, SEE²
5. Linearity: All residual distributions (i.e., for each Y') are assumed to have means equal to zero

4 Problems and Solutions
Exact X: Biased parameters (to the degree that measurement error exists). Use reliable measures.
Independence: Inaccurate standard errors, degrees of freedom, and significance tests. Use repeated-measures/linear mixed effects models or ANCOVA.
Normality: Inefficient (with large N). Use power transformations or generalized linear models.
Constant variance: Inefficient and inaccurate standard errors. Use power transformations, SE corrections, weighted least squares, or generalized linear models.
Linearity: Biased parameter estimates. Use power transformations, polynomial regression, or generalized linear models.

5 glm(formula, family=familytype(link=linkfunction), data=)
Family: Default link function
binomial: (link = "logit")
gaussian: (link = "identity")
Gamma: (link = "inverse")
inverse.gaussian: (link = "1/mu^2")
poisson: (link = "log")
quasi: (link = "identity", variance = "constant")
quasibinomial: (link = "logit")
quasipoisson: (link = "log")
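For concreteness, a minimal sketch of the call pattern (this assumes the admissions data frame d introduced on the next slide; mSketch is a throwaway name, and the binomial family with logit link matches the dichotomous outcome):

# Logistic regression via glm(): binomial family, logit link
mSketch = glm(admit ~ gpa,
              family = binomial(link = "logit"),
              data = d)
summary(mSketch)   # coefficients are reported on the link (logit) scale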

6 An Example: Predicting admission to a grad program in engineering based on quantitative GRE, GPA, and Undergraduate Institution Rank. [The slide's table of descriptive statistics (n, mean, sd, min, max for admit, gre, gpa, and rank) was not preserved in the transcript.]

7 Describe the effect of GPA on admission to grad school
mLM = lm(admit ~ gpa, data=d)
summary(mLM)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                               <2e-16 ***
gpa                                       <2e-16 ***
---
Residual standard error: on 398 degrees of freedom
Multiple R-squared: , Adjusted R-squared: 0.298
F-statistic: on 1 and 398 DF, p-value: < 2.2e-16

Describe the effect of GPA on admission to grad school.

8 What are the problems with using a general linear model to assess the effects of these predictors on admission outcomes?
Residuals will not be normal (not efficient)
Residual variance often will not be constant (not efficient, SEs are inaccurate)
Relationship will not be linear (parameter estimates biased)
Y is not constrained between 0 and 1 (model may make nonsensical predictions)

9 plot(d$gpa,d$admit, type='p', pch=20)

10 plot(d$gre,jitter(d$admit,1), type='p', pch=20)

11 abline(mLM)

12 [figure not preserved in the transcript]

13 ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
Level of Significance = 0.05

Call: gvlma(x = model)

                     Value  p-value  Decision
Global Stat                 …e-08    Assumptions NOT satisfied!
Skewness                    …e-02    Assumptions NOT satisfied!
Kurtosis                    …e-04    Assumptions NOT satisfied!
Link Function               …e-06    Assumptions NOT satisfied!
Heteroscedasticity          …e-01    Assumptions acceptable.

14 modelAssumptions(mLM,'normal')

15 [figure not preserved in the transcript]

16 modelAssumptions(mLM,'constant')

17 modelAssumptions(mLM, 'linear')

18 mGLM = glm(admit ~ gpa, data = d, family = binomial(logit))
summary(mGLM)

Deviance Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)                               <2e-16 ***
gpa                                       <2e-16 ***
---
(Dispersion parameter for binomial family taken to be 1)

Null deviance: 543.58 on 399 degrees of freedom
Residual deviance: 400.04 on 398 degrees of freedom
AIC:

Number of Fisher Scoring iterations: 5

19 [figure not preserved in the transcript]

20 [figure not preserved in the transcript]

21 What other non-linear shapes do you know how to model in the general linear model?
Simple monotone relationships with power transforms
Quadratic, cubic, etc. relationships with polynomial regression (see the sketch below)
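As a reminder of what those two options look like in code, a minimal sketch (the data frame dXY and variables y and x are hypothetical stand-ins, not the admissions data):

# Monotone bends: transform X (log shown; sqrt or other powers work the same way)
mTrans = lm(y ~ log(x), data = dXY)
# Quadratic, cubic, etc.: polynomial regression
mQuad = lm(y ~ x + I(x^2), data = dXY)   # raw polynomial terms
mPoly = lm(y ~ poly(x, 2), data = dXY)   # orthogonal-polynomial equivalent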

22 [figure not preserved in the transcript]

23 [figure not preserved in the transcript]

24 [figure not preserved in the transcript]

25 Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)                               <2e-16 ***
gpa                                       <2e-16 ***

26 Linear Regression
Residuals are gaussian
Link function is identity
Y = 1 * (b0 + b1X1 + … + bkXk)

Logistic Regression
Residuals are binomial
Link function is logit (a transformation of the logistic function)

Logistic Function:
π = e^(b0 + b1X1 + … + bkXk) / (1 + e^(b0 + b1X1 + … + bkXk))
π = probability of Y = 1
e ≈ 2.718
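To see that this function really is S-shaped and bounded by 0 and 1, a small sketch evaluates it directly (the rounded coefficients -14.5 and 4.1 are the ones used later in these slides; the GPA range is illustrative):

b0 = -14.5; b1 = 4.1
gpa = seq(2, 4, by = 0.05)                        # a hypothetical range of GPAs
p = exp(b0 + b1*gpa) / (1 + exp(b0 + b1*gpa))     # the logistic function
all.equal(p, plogis(b0 + b1*gpa))                 # TRUE: plogis() computes the same curve
plot(gpa, p, type = 'l')                          # probabilities stay between 0 and 1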

27 Logs and Exponentials
You are likely familiar with logs using base 10
log10(10) [1] 1
log10(100) [1] 2
log10(1000) [1] 3
log10(1) [1] 0
log10(15) [1] 1.176091
log10(0) [1] -Inf
log10(.1) [1] -1

28 Logs and Exponentials
The natural log (often abbreviated ln) is similar but uses base e (approx 2.718) rather than base 10.
log(2.718282) [1] 1
log(10) [1] 2.302585
log(1) [1] 0
log(0) [1] -Inf

29 Logs and Exponentials
The inverse of the natural log is the exponential function: exp(). This function simply raises e to the power of X (whatever value you provide).
exp(1) [1] 2.718282
exp(2) [1] 7.389056
exp(0) [1] 1
exp(-1) [1] 0.3678794
Logistic regression uses natural logs and exponentials for the transformations of Y and Xs

30 π = e^(b0 + b1X1 + … + bnXn) / (1 + e^(b0 + b1X1 + … + bnXn))
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)                               <2e-16 ***
gpa                                       <2e-16 ***

π = e^(-14.5 + 4.1*X1) / (1 + e^(-14.5 + 4.1*X1))

31 π = e^(b0 + b1X1) / (1 + e^(b0 + b1X1)); b0 = 0, b1 = 0

32 π = e^(b0 + b1X1) / (1 + e^(b0 + b1X1)); b0 = 0, b1 = 0 to 1

33 π = e^(b0 + b1X1) / (1 + e^(b0 + b1X1)); b0 = 0, b1 = -1 to 0

34 π = e^(b0 + b1X1) / (1 + e^(b0 + b1X1)); b0 = -5 to 5, b1 = 0

35 π = e^(b0 + b1X1) / (1 + e^(b0 + b1X1)); b0 = -5 to 5, b1 = 1

36 Odds = π / (1 - π)
What are the odds of obtaining a head on a fair coin toss?
Odds = 0.5 / (1 - 0.5) = 0.5/0.5 = 1 [1:1]
What would the odds of obtaining a head be if I altered the coin to have a probability of heads = 0.67?
Odds = 0.67 / (1 - 0.67) = 0.67/0.33 = 2 [2:1]
π = Odds / (Odds + 1)
What is the probability of an event that has an odds of 3?
π = 3 / (3 + 1) = .75
Odds can range from 0 to infinity
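The same conversions in a quick sketch (plain arithmetic, using only the numbers worked above):

p = 0.5;  p / (1 - p)         # 1: even [1:1] odds for a fair coin
p = 0.67; p / (1 - p)         # ~2: [2:1] odds for the altered coin
odds = 3; odds / (odds + 1)   # 0.75: probability for odds of 3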

37 Odds = π / (1 - π)
What are the approximate odds of getting into grad school with a GPA of 3.5?
Odds = 0.5 / (1 - 0.5) = 1 [1:1]

38 Logistic Function (probability Y=1)
π = e^(b0 + b1X1 + … + bkXk) / (1 + e^(b0 + b1X1 + … + bkXk));  0 < π < 1

Convert π to Odds (Odds of Y=1)
π / (1 - π) = e^(b0 + b1X1 + … + bkXk);  0 < Odds < INF

39 What are the predicted odds of getting into grad school with a GPA of 3.5?
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)                               <2e-16 ***
gpa                                       <2e-16 ***

π / (1 - π) = e^(b0 + b1X1 + … + bnXn) = e^(-14.5 + 4.1*3.5) = e^(-0.15) = 0.86 [0.86:1]
NOTE: Probability of 0.50 occurs with Odds of 1

40 Logistic Function (probability Y=1)
π = e^(b0 + b1X1 + … + bkXk) / (1 + e^(b0 + b1X1 + … + bkXk));  0 < π < 1

Convert π to Odds (Odds of Y=1)
π / (1 - π) = e^(b0 + b1X1 + … + bkXk);  0 < Odds < INF

Convert Odds to Log-Odds (Logit function; log-odds of Y=1)
ln(π / (1 - π)) = b0 + b1X1 + … + bkXk;  -INF < Logit < INF
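A sketch connecting the three scales for the fitted model mGLM from slide 18 (dPred is a hypothetical one-row data frame holding the GPA of interest):

dPred = data.frame(gpa = 3.5)
logit = predict(mGLM, dPred, type = 'link')       # b0 + b1*GPA: the log-odds
odds  = exp(logit)                                # odds of admission
prob  = predict(mGLM, dPred, type = 'response')   # probability; equals odds/(1 + odds)
c(logit, odds, prob)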

41 What are the log-odds (logit) of getting into grad school with a GPA of 3.5?
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)                               <2e-16 ***
gpa                                       <2e-16 ***

Logit = b0 + b1X1 + … + bkXk = b0 + b1*GPA = -14.5 + (4.1 * 3.5) = -0.15
NOTE: Odds of 1 occur when logit = 0

42 Log-odds are not very intuitive.
However, they are a linear function of our regressors. GLM estimates the parameters in the logit function:
Logit(Y) = -14.5 + 4.1 * GPA
We transform the logit function to either odds or probability functions to convey relationships in meaningful units.

43 [Figure: the fitted model displayed on three scales — Logit(Y), Odds(Y), Probability(Y); not preserved in the transcript]

44 Log-odds are a linear function of the Xs, but these parameter estimates are not very descriptive.
Logit(Admission) = -14.5 + 4.1*GPA
It is not clear how to interpret a parameter estimate of 4.1 for GPA. The log-odds of admission increase by 4.1 units for every 1-point increase in GPA??

45 Odds and probability are more descriptive, but they are not linear functions of the Xs, so their parameter estimates aren't very useful for describing the effect of the Xs. We can't make a simple statement about the unit change in Odds or Probability that results from a unit change in GPA.
π / (1 - π) = e^(-14.5 + 4.1*GPA)
π = e^(-14.5 + 4.1*GPA) / (1 + e^(-14.5 + 4.1*GPA))

46 This is where the Odds Ratio comes in!
Odds are defined at a specific point for X
Odds ratio = change in odds for a change in X of some magnitude c
Odds ratio (θ) = odds(X+c) / odds(X) = e^(b0 + b1(X+c)) / e^(b0 + b1(X)) = e^(c*b1)
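In code, the odds ratio falls straight out of the slope estimate (mGLM as fit on slide 18; c1 is the size of the change in X, here 1 GPA point):

c1 = 1                        # magnitude of the change in X
exp(c1 * coef(mGLM)['gpa'])   # e^(c*b1): the odds ratio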

47 What is the odds ratio for a change in GPA of 1.0?
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)                               <2e-16 ***
gpa                                       <2e-16 ***

Odds ratio = e^(c * b1) = e^(1.0 * 4.1) = 60.3
The odds of getting into grad school increase by a factor of 60.3 for every 1-point increase in GPA.
This is the preferred descriptor for the effect of X. For a quantitative variable, choose a meaningful value for c (though often 1 is most appropriate).

48 Odds ratio (θ) = e^(c*b1)
When b1 = 0, the odds ratio = 1, indicating no change in odds for a change in X
If b1 > 0, the odds ratio is > 1, indicating an increase in odds with increasing X
If b1 < 0, the odds ratio is < 1, indicating a decrease in odds with increasing X
The odds ratio is never negative

49 mGLM = glm(admit ~ gpa, data= d, family= binomial(logit))
summary(mGLM)

Deviance Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)                               <2e-16 ***
gpa                                       <2e-16 ***
---
(Dispersion parameter for binomial family taken to be 1)

Null deviance: 543.58 on 399 degrees of freedom
Residual deviance: 400.04 on 398 degrees of freedom
AIC:

Number of Fisher Scoring iterations: 5

50 There are three common tests for a parameter in logistic regression
1. z test: Reported by summary() in R. z = bj / SEj
2. Wald test: Reported by SPSS. Wald = bj² / SEj². Wald is asymptotically distributed as chi-square with 1 df.
Tests 1 and 2 are not preferred, as they have higher Type II error rates than the third option.
3. Likelihood ratio test: Reported by Anova() in R
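A sketch computing all three statistics for the gpa slope of mGLM (Anova() comes from the car package, as used later in these slides):

b  = coef(summary(mGLM))['gpa', 'Estimate']
se = coef(summary(mGLM))['gpa', 'Std. Error']
b / se                  # 1. z test, as reported by summary()
(b / se)^2              # 2. Wald chi-square on 1 df
library(car)
Anova(mGLM, type = 3)   # 3. likelihood ratio test (preferred)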

51 Deviance is the maximum likelihood generalization of SSE from OLS
The likelihood ratio test involves a comparison of two models' deviances.
To test the effect of GPA, compare the deviances of:
C: Intercept only (null model)
A: Model with GPA
LR test = Deviance(Model C) – Deviance(Model A)
Distributed as chi-square with df = df(C) – df(A) (the difference in residual degrees of freedom)

52 Null deviance: 543.58 on 399 degrees of freedom
Residual deviance: 400.04 on 398 degrees of freedom
LR test = Model C – Model A = 543.58 – 400.04 = 143.54 with 1 df
pchisq(143.54, df=1, lower.tail=FALSE)
[1] 4.48e-33

Anova(mGLM, type=3)
Analysis of Deviance Table (Type III tests)
Response: admit
    LR Chisq Df Pr(>Chisq)
gpa   143.54  1  < 2.2e-16 ***

53 Maximum likelihood estimation (MLE)
The regression coefficients are estimated using maximum likelihood estimation.
It is iterative (Newton's method).
The model may fail to converge given:
A low ratio of "events" to predictors (rule of thumb: at least 10 events per predictor)
High multicollinearity
Sparseness (cells with zero events) — a bigger problem for categorical predictors
Complete separation: predictors perfectly predict the criterion
MLE can yield biased parameters with small samples. Definitions of what is "not small" vary, but N > 200 is a reasonable minimum.

54 Model assumptions
Exact X
Independence
Logistic function and logit correctly specify the form of the relationship (the equivalent of correct fit in linear regression). The logistic function is almost always correct for dichotomous data. You can examine the logit function directly to assess the shape of the relationship.

55 What about categorical variables?
Handled exactly as in linear regression: contrast codes, dummy codes.
Issues of family-wise error rates apply as before for non-orthogonal and unplanned contrasts. Holm-Bonferroni is available.
The odds ratio is for the contrast (assuming unit-weighted codes).

56 str(d$rank2)
 int [1:400] …
d$rank2 = factor(d$rank2)
str(d$rank2)
 Factor w/ 2 levels "1","2": …
contrasts(d$rank2) = varContrasts(d$rank2, Type='POC', POCList = list(c(1,-1)))
contrasts(d$rank2)
  POC1
1  0.5
2 -0.5

57 mRank2 = glm(admit ~ rank2, data=d, family= binomial(logit))
summary(mRank2)

Deviance Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)                                   ***
rank2POC1                                     **
---
(Dispersion parameter for binomial family taken to be 1)

Null deviance: 543.58 on 399 degrees of freedom
Residual deviance: on 398 degrees of freedom
AIC:

Number of Fisher Scoring iterations: 4

58 Anova(mRank2, type=3)
Analysis of Deviance Table (Type III tests)
Response: admit
      LR Chisq Df Pr(>Chisq)
rank2           1            **
---

Odds1 = exp(mRank2$coefficients[1] + mRank2$coefficients[2] * 0.5)
Odds2 = exp(mRank2$coefficients[1] + mRank2$coefficients[2] * -0.5)

59 dNew = data.frame(rank2 = factor(c('1','2')))
p = predict(mRank2, dNew, type='response')
p
p[1] / (1 - p[1])   # odds for rank = 1
p[2] / (1 - p[2])   # odds for rank = 2
(p[1] / (1 - p[1])) / (p[2] / (1 - p[2]))   # odds ratio
exp(mRank2$coefficients[2])   # c = 1

60 What about multiple predictors?
What about interactions? Nothing new!

61 d$cGPA = d$gpa - mean(d$gpa)
d$rank2 = factor(d$rank2)
contrasts(d$rank2) = varContrasts(d$rank2, Type='POC', POCList = list(c(1,-1)))
contrasts(d$rank2)
  POC1
1  0.5
2 -0.5

m = glm(admit ~ cGPA*rank2, data=d, family = binomial(logit))

62 summary(m)

Deviance Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)                                   e-06 ***
cGPA                                       < 2e-16 ***
rank2POC1                                          ***
cGPA:rank2POC1
---
(Dispersion parameter for binomial family taken to be 1)

Null deviance: 543.58 on 399 degrees of freedom
Residual deviance: on 396 degrees of freedom
AIC:

Number of Fisher Scoring iterations: 5

63 Anova(m, type=3)   # likelihood ratio test
Analysis of Deviance Table (Type III tests)
Response: admit
           LR Chisq Df Pr(>Chisq)
cGPA                   < 2.2e-16 ***
rank2                             ***
cGPA:rank2
---

exp(m$coefficients[2])   # odds ratio for 2nd coefficient (cGPA)
exp(m$coefficients[3])   # odds ratio for 3rd coefficient (rank2)

64 [Figure 1: probability of admission as a function of GPA and school rank; not preserved in the transcript]

65 We analyzed admission (yes vs. no) in a generalized linear model (GLM) that included GPA, School rank (High rank vs. Low rank), and their interaction as regressors. We used the binomial family with the logit link function for the GLM because the dependent variable, admission, was dichotomous. GPA was mean-centered and School rank was coded using centered, unit-weighted contrast codes to represent the contrast of High vs. Low rank. We report the raw parameter estimates from the GLM and the odds ratio to quantify the effect size of significant effects. The effect of GPA was significant, b = 4.37, SE = 0.46, p < .001, which indicates that the odds of admission increase by a factor of 78.7 for every one-point increase in GPA. The effect of School rank was significant, b = 0.96, SE = 0.28, χ²(1) = 12.90, p < .001, which indicates that the odds of admission increase by a factor of 2.6 for students at high relative to low ranked undergraduate institutions. The interaction between GPA and School rank was not significant, b = -0.97, SE = 0.92. See Figure 1 for a display of the probability of admission as a function of GPA and school rank.

66 What about multi-level models?
glmer(formula, data, family = gaussian, start = NULL, verbose = FALSE, nAGQ = 1, doFit = TRUE, subset, weights, na.action, offset, contrasts = NULL, model = TRUE, control = list(), ...)
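For illustration, a minimal hedged sketch of a multilevel logistic model with lme4 (the grouping variable school is hypothetical — it is not part of the slide data):

library(lme4)
mML = glmer(admit ~ cGPA + (1 | school),         # random intercept per school
            data = d, family = binomial(link = 'logit'))
summary(mML)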

