Lecture 7 GLMs II Binomial Family Olivier MISSA, Advanced Research Skills.


2 Outline
Continue our introduction to Generalized Linear Models.
In this lecture: illustrate the use of GLMs for proportion and binary data.

3 Reminder
Binary and proportion data tend to follow the Binomial distribution.
The canonical link of this GLM family is the logit function:
$$\operatorname{logit}(p) = \log\left(\frac{p}{1-p}\right)$$
The variance, proportional to $p(1-p)$, is largest for intermediate values of p (peaking at p = 0.5) and smallest (zero) at p = 0 or p = 1.
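The reminder above can be checked numerically. This is a minimal sketch, assuming only base R: it uses the `linkfun`, `linkinv` and `variance` components of the `binomial()` family object. The grid `p` is an illustrative choice, not something from the lecture.

```r
# Components of the binomial family object (base R, stats package)
fam <- binomial()                  # the default link is "logit"

p <- seq(0.05, 0.95, by = 0.05)    # illustrative grid of probabilities

# Canonical link: logit(p) = log(p / (1 - p))
eta <- fam$linkfun(p)
manual_logit <- log(p / (1 - p))

# The inverse link maps the linear predictor back to a probability
p_back <- fam$linkinv(eta)

# Variance function p * (1 - p): largest at p = 0.5, zero at p = 0 and p = 1
v <- fam$variance(p)
p_at_max <- p[which.max(v)]

print(all.equal(eta, manual_logit))
print(p_at_max)
print(fam$variance(c(0, 1)))
```

`qlogis()` and `plogis()` give the same link and inverse link without going through the family object.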

4 Three ways to work with binary data
In R, binary/proportion data can be entered into a model as a response in three different ways:
- as a numeric vector holding 0/1 outcomes, or the proportion of successes with the number of trials supplied through the weights argument;
- as a logical vector or a factor (TRUE, or any factor level other than the first, is treated as a success);
- as a two-column matrix (the first column holding the number of successes and the second column the number of failures).
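The three response formats listed above can be compared on a small data set. This is a sketch on made-up numbers: `x`, `n`, `succ` and the model names are hypothetical and are not the lecture's data. All fits should return the same intercept and slope, up to convergence tolerance.

```r
# Hypothetical grouped data: 10 trials at each of 5 values of x
x    <- 1:5
n    <- rep(10, 5)
succ <- c(1, 3, 5, 7, 9)       # made-up success counts
fail <- n - succ

# (1) two-column matrix: successes first, failures second
m_matrix <- glm(cbind(succ, fail) ~ x, family = binomial)

# (2) numeric vector of proportions, with the number of trials as weights
m_prop <- glm(succ / n ~ x, family = binomial, weights = n)

# (3) one row per trial, response as a logical vector ...
x_long <- rep(x, times = n)
y_long <- unlist(Map(function(s, f) rep(c(TRUE, FALSE), c(s, f)), succ, fail))
m_logical <- glm(y_long ~ x_long, family = binomial)

# ... or as a factor: every level except the first counts as a success
y_factor <- factor(ifelse(y_long, "yes", "no"), levels = c("no", "yes"))
m_factor <- glm(y_factor ~ x_long, family = binomial)

# Collect the estimates side by side
coefs <- rbind(matrix     = coef(m_matrix),
               proportion = coef(m_prop),
               logical    = coef(m_logical),
               factor     = coef(m_factor))
print(round(coefs, 4))
```

The residual deviance differs between the grouped and the one-row-per-trial fits, because the saturated model is different, but the coefficient estimates do not.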

5 1st Example
Toxicity to the tobacco budworm (a moth) of different doses of trans-cypermethrin. Batches of 20 moths of each sex were exposed for three days to increasing doses of the pyrethroid.

Number of dead moths out of 20 tested:
Dose (micrograms)    1    2    4    8   16   32
Male                 1    4    9   13   18   20
Female               0    2    6   10   12   16

> (dose <- rep(2^(0:5), 2))
 [1]  1  2  4  8 16 32  1  2  4  8 16 32
> numdead <- c(1,4,9,13,18,20,0,2,6,10,12,16)
> (sex <- factor( rep( c("M","F"), c(6,6) ) ))
 [1] M M M M M M F F F F F F
Levels: F M
> SF <- cbind(numdead, numalive=20-numdead)

6 1st Example
> modb <- glm(SF ~ sex*dose, family=binomial)
> summary(modb)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) e-07 ***
sexM
dose e-06 ***
sexM:dose **
---
(Dispersion parameter for binomial family taken to be 1)
Null deviance: on 11 degrees of freedom
Residual deviance: on 8 degrees of freedom
AIC:
Note: what is modelled is the proportion of successes (here, dead moths, the first column of SF).

7 1st Example
> ldose <- log2(dose)
> modb2 <- glm(SF ~ sex*ldose, family=binomial)
> summary(modb2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) e-08 ***
sexM
ldose e-08 ***
sexM:ldose
(Dispersion parameter for binomial family taken to be 1)
Null deviance: on 11 degrees of freedom
Residual deviance: on 8 degrees of freedom
AIC:

8 1st Example
> drop1(modb2, test="Chisq")
Single term deletions
Model: SF ~ sex * ldose
Df Deviance AIC LRT Pr(Chi)
sex:ldose
> modb3 <- update(modb2, ~ . - sex:ldose)
> summary(modb3)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) e-13 ***
sexM **
ldose e-16 ***
---
(Dispersion parameter for binomial family taken to be 1)
Null deviance: on 11 degrees of freedom
Residual deviance: on 9 degrees of freedom
AIC:
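The likelihood-ratio test reported by `drop1()` can be reproduced by hand from the two residual deviances. This sketch re-enters the budworm data from the slides so that it runs on its own. The names `full`, `reduced`, `lrt` and `d1` are illustrative, with `full` and `reduced` playing the roles of `modb2` and `modb3`.

```r
# Budworm data as entered on the slides
dose    <- rep(2^(0:5), 2)
numdead <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16)
sex     <- factor(rep(c("M", "F"), c(6, 6)))
SF      <- cbind(numdead, numalive = 20 - numdead)
ldose   <- log2(dose)

full    <- glm(SF ~ sex * ldose, family = binomial)   # modb2 on the slides
reduced <- update(full, . ~ . - sex:ldose)            # modb3 on the slides

# Likelihood-ratio statistic = increase in residual deviance
lrt <- deviance(reduced) - deviance(full)
df  <- df.residual(reduced) - df.residual(full)
p   <- pchisq(lrt, df = df, lower.tail = FALSE)

# drop1() reports the same statistic and p-value in its sex:ldose row
d1 <- drop1(full, test = "Chisq")
print(c(LRT = lrt, df = df, p.value = p))
print(d1)
```

`anova(reduced, full, test = "Chisq")` performs the same comparison for two explicitly named nested models.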

9 1st Example
> drop1(modb3, test="Chisq")
Single term deletions
Model: SF ~ sex + ldose
Df Deviance AIC LRT Pr(Chi)
sex **
ldose < 2.2e-16 ***
> shapiro.test(residuals(modb3, type="deviance"))
Shapiro-Wilk normality test
data: residuals(modb3, type = "deviance")
W = , p-value =

10 1st Example
> par(mfrow=c(2,2))
> plot(modb3)

11 1st Example
> plot( c(0,1) ~ c(1,32), type="n", log="x", xlab="dose", ylab="Probability")
> text(dose, numdead/20, labels=as.character(sex) )
> ld <- seq(1,32,0.5)   ## start at 1, the lowest dose tested: log2(0) is -Inf
> lines (ld, predict(modb3, data.frame(ldose=log2(ld),
    sex=factor(rep("M", length(ld)), levels=levels(sex))), type="response") )
> lines (ld, predict(modb3, data.frame(ldose=log2(ld),
    sex=factor(rep("F", length(ld)), levels=levels(sex))), type="response"),
    lty=2, col="red" )
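The fitted curves above rely on `predict(..., type="response")`. This short sketch shows what that option does. It re-enters the slide's data so that it runs alone, and uses a small illustrative prediction grid (`newdat`, a hypothetical name) rather than the plotting sequence.

```r
# Budworm data and the additive model (modb3) from the slides
dose    <- rep(2^(0:5), 2)
numdead <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16)
sex     <- factor(rep(c("M", "F"), c(6, 6)))
SF      <- cbind(numdead, numalive = 20 - numdead)
ldose   <- log2(dose)
modb3   <- glm(SF ~ sex + ldose, family = binomial)

# Prediction grid: both sexes at a few doses (illustrative choice)
newdat <- expand.grid(sex  = factor(c("F", "M"), levels = levels(sex)),
                      dose = c(1, 4, 16))
newdat$ldose <- log2(newdat$dose)

eta  <- predict(modb3, newdat, type = "link")       # logit scale
prob <- predict(modb3, newdat, type = "response")   # probability scale

# type = "response" is the inverse logit of the linear predictor
newdat$prob <- prob
print(newdat)
print(all.equal(prob, plogis(eta)))
```

Keeping `sex` as a factor with the same levels as in the fitted data, as the slide does, avoids level mismatches in `predict()`.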

12 1st Example
> modbp <- glm(SF ~ sex*ldose, family=binomial(link="probit"))
> AIC(modbp)
[1]
> modbc <- glm(SF ~ sex*ldose, family=binomial(link="cloglog"))
> AIC(modbc)
[1]
> AIC(modb3)
[1]
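On the slide, the probit and cloglog fits include the sex:ldose interaction while modb3 does not, so the AIC values reflect both the link and the model terms. The sketch below is a variation, not the slide's own comparison: it holds the formula fixed at `SF ~ sex + ldose` so that only the link changes. The names `links`, `fits` and `aics` are illustrative.

```r
# Budworm data as entered on the slides
dose    <- rep(2^(0:5), 2)
numdead <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16)
sex     <- factor(rep(c("M", "F"), c(6, 6)))
SF      <- cbind(numdead, numalive = 20 - numdead)
ldose   <- log2(dose)

# Same linear predictor, three different link functions
links <- c("logit", "probit", "cloglog")
fits  <- lapply(links, function(l) {
  glm(SF ~ sex + ldose, family = binomial(link = l))
})
names(fits) <- links

# One AIC per link; lower values indicate a better trade-off of fit and complexity
aics <- sapply(fits, AIC)
print(round(aics, 2))
```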

13 1st Example
> summary(modb3)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) e-13 ***
sexM **
ldose e-16 ***
---
The coefficients above are on the logit scale.
> exp(modb3$coeff)   ## careful, it may be misleading
(Intercept) sexM ldose
## odds: p / (1-p)
> exp(modb3$coeff[1]+modb3$coeff[2])   ## odds for males at dose = 1 (ldose = 0)
(Intercept)
Every doubling of the dose multiplies the odds of dying (versus surviving) by a factor of 2.899.
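The odds interpretation above can be verified directly from the fitted model. This sketch re-enters the slide's data so that it runs alone. The helper `odds()` and the names `odds_factor`, `odds_M`, `p_M` and `ratio` are illustrative, not part of the lecture.

```r
# Budworm data and the additive model (modb3) from the slides
dose    <- rep(2^(0:5), 2)
numdead <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16)
sex     <- factor(rep(c("M", "F"), c(6, 6)))
SF      <- cbind(numdead, numalive = 20 - numdead)
ldose   <- log2(dose)
modb3   <- glm(SF ~ sex + ldose, family = binomial)

b <- coef(modb3)

# Multiplicative change in the odds of dying per doubling of the dose
odds_factor <- exp(b["ldose"])

# Odds of dying for males at dose = 1 (ldose = 0), and the matching probability
odds_M <- exp(b["(Intercept)"] + b["sexM"])
p_M    <- odds_M / (1 + odds_M)

# Check against predictions: odds at ldose = 3 divided by odds at ldose = 2
odds <- function(p) p / (1 - p)
male <- factor("M", levels = levels(sex))
p2 <- predict(modb3, data.frame(sex = male, ldose = 2), type = "response")
p3 <- predict(modb3, data.frame(sex = male, ldose = 3), type = "response")
ratio <- odds(p3) / odds(p2)

print(round(c(odds_factor = unname(odds_factor),
              ratio       = unname(ratio),
              odds_M      = unname(odds_M),
              p_M         = unname(p_M)), 3))
```

Because the model has no interaction, the same factor applies to females; only the baseline odds differ between the sexes.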

14 2nd Example
Erythrocyte Sedimentation Rate (ESR) in a group of patients. Two groups: ESR < 20 mm/hour and ESR > 20 mm/hour (ill).
Q: Is it related to the globulin and fibrinogen levels in the blood?
> data("plasma", package="HSAUR")
> str(plasma)
'data.frame': 32 obs. of 3 variables:
 $ fibrinogen: num
 $ globulin  : int
 $ ESR       : Factor w/ 2 levels "ESR < 20","ESR > 20":
> summary(plasma)
   fibrinogen       globulin           ESR
 Min.   :2.090   Min.   :28.00   ESR < 20:26
 1st Qu.:        1st Qu.:31.75   ESR > 20: 6
 Median :2.600   Median :36.00
 Mean   :2.789   Mean   :
 3rd Qu.:        3rd Qu.:38.00
 Max.   :5.060   Max.   :46.00

15 2nd Example
> stripchart(globulin ~ ESR, vertical=TRUE, data=plasma,
    xlab="Erythrocyte Sedimentation Rate (mm/hr)",
    ylab="Globulin blood level", method="jitter" )
> stripchart(fibrinogen ~ ESR, vertical=TRUE, data=plasma,
    xlab="Erythrocyte Sedimentation Rate (mm/hr)",
    ylab="Fibrinogen blood level", method="jitter" )

16 2nd Example
> mod1 <- glm(ESR~fibrinogen, data=plasma, family=binomial)   ## the response ESR is a factor
> summary(mod1)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) *
fibrinogen *
---
(Dispersion parameter for binomial family taken to be 1)
Null deviance: on 31 degrees of freedom
Residual deviance: on 30 degrees of freedom
AIC:
> mod2 <- glm(ESR~fibrinogen+globulin, data=plasma, family=binomial)
> AIC(mod2)
[1]

17 2nd Example
> anova(mod1, mod2, test="Chisq")
Analysis of Deviance Table
Model 1: ESR ~ fibrinogen
Model 2: ESR ~ fibrinogen + globulin
Resid. Df Resid. Dev Df Deviance P(>|Chi|)
> summary(mod2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) *
fibrinogen *
globulin
(Dispersion parameter for binomial family taken to be 1)
Null deviance: on 31 degrees of freedom
Residual deviance: on 29 degrees of freedom
AIC:
The difference in deviance between these two models is not significant, which leads us to select the simpler model (mod1).

18 2nd Example
> shapiro.test(residuals(mod1, type="deviance"))
Shapiro-Wilk normality test
data: residuals(mod1, type = "deviance")
W = , p-value = 5.465e-07
> par(mfrow=c(2,2))
> plot(mod1)