Lecture 21: Poisson regression, log-linear regression. BMTRY 701 Biostatistical Methods II

Presentation transcript:

Lecture 21: Poisson regression, log-linear regression. BMTRY 701 Biostatistical Methods II

Poisson distribution
- Used for count data
- Generally, rare events
- In space or time
- Upper limit is theoretically infinite
- Examples: earthquakes, hurricanes, cancer incidence (spatial), absences in a school year, AIDS deaths in a region
- Assessing disease in different groups: probability, risk, rate, incidence, prevalence

The Poisson distribution
- Probability mass function: $P(Y = y) = e^{-\lambda} \lambda^{y} / y!$ for $y = 0, 1, 2, \ldots$
- Approximates a binomial for rare events
- Notice it has only ONE parameter: λ
- Mean = variance = λ
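The two properties on this slide (the binomial approximation and mean = variance = λ) can be checked numerically. The sketch below uses base R only; the values of n and p are hypothetical and chosen just to represent a rare event.

```r
# Compare Binomial(n, p) with Poisson(lambda = n * p) for a rare event.
n <- 10000       # hypothetical number of trials
p <- 0.0003      # hypothetical per-trial event probability
lambda <- n * p  # matching Poisson mean (3 here)

y <- 0:10
pois <- dpois(y, lambda)
binom <- dbinom(y, size = n, prob = p)
max_abs_diff <- max(abs(pois - binom))
print(round(cbind(y, binomial = binom, poisson = pois), 5))

# Mean and variance computed from the pmf; both should be close to lambda.
# The support is truncated at 200, where the remaining tail mass is negligible.
support <- 0:200
pmf <- dpois(support, lambda)
pois_mean <- sum(support * pmf)
pois_var <- sum((support - pois_mean)^2 * pmf)
print(c(mean = pois_mean, variance = pois_var))
```

The approximation degrades as p grows, which is why the slide restricts it to rare events.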

Simple Poisson distribution example
- The infection rate at a Neonatal Intensive Care Unit (NICU) is typically expressed as a number of infections per patient-day. This counts events across both time and patients.
- Assume that the probability of getting an infection over a short time period is proportional to the length of the time period. In other words, a patient who stays one hour in the NICU has twice the risk of a single infection as a patient who stays 30 minutes.
- Assume that, for a small enough interval, the probability of getting two infections is negligible.
- Assume that the probability of infection does not change over time or across infants.
- Assume independence. The probability of seeing an infection in one child does not increase or decrease the probability of seeing an infection in another child. If an infant gets an infection during one time interval, it does not change the probability that he or she will get another infection during a later time interval.

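Poisson regression
- Based on the idea that the log of the probability of disease is a linear function of risk factors
- The rate ratio (“relative risk”) is modeled
- Interpretation of slope: in a model $\log(r_i) = \beta_0 + \beta_1 x_i$, $e^{\beta_1}$ is the rate ratio associated with a one-unit increase in $x_i$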
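As a rough illustration of the slope interpretation, the sketch below fits a Poisson regression to simulated data with one binary risk factor. The data, the seed, and the true rate ratio of 2 are all hypothetical. With a single binary covariate, the exponentiated slope should match the crude ratio of the group means.

```r
set.seed(701)
n <- 5000
x <- rbinom(n, 1, 0.5)   # hypothetical binary risk factor
true_beta0 <- log(0.5)   # hypothetical baseline rate of 0.5 events per unit
true_beta1 <- log(2)     # hypothetical true rate ratio of 2
y <- rpois(n, exp(true_beta0 + true_beta1 * x))

fit <- glm(y ~ x, family = poisson)

# exp(slope) is the estimated rate ratio for x = 1 vs. x = 0.
rate_ratio <- exp(coef(fit)[["x"]])

# With one binary covariate, this equals the ratio of observed group means.
crude_ratio <- mean(y[x == 1]) / mean(y[x == 0])
print(c(rate_ratio = rate_ratio, crude_ratio = crude_ratio))
```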

Implementation
- $r_i$ is the rate
- Often we observe a number of events over a geographic region, a period of time, or a number of person-years
- Need to account for these differences: rates based on smaller “exposure” are less precise, so an adjustment is made

Implementation
- Unless time, space, etc. are uniform across observations, the following is generally implemented: an “OFFSET”

Offset term
- Notice: NO COEFFICIENT on the offset
- Adjusts for population size or space
- Example: breast cancer incidence per county in South Carolina
  - Cases are the number of women (and men) diagnosed within a county in SC in one year.
  - The offset would be the population size in the county in that year (probably estimated).
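A minimal sketch of what "no coefficient on the offset" means in R, using simulated county-level data. The counts, populations, and covariate are invented and are not the SC registry data. With an offset, the model is log(expected cases) = log(population) + β0 + β1 x, so the coefficient on log(population) is fixed at 1 rather than estimated.

```r
set.seed(21)
n_county <- 46                              # hypothetical number of counties
popn <- round(runif(n_county, 5e3, 5e5))    # hypothetical population sizes
x <- rnorm(n_county)                        # hypothetical county-level covariate
cases <- rpois(n_county, popn * exp(-7 + 0.3 * x))

# Offset in the formula: log(popn) enters with its coefficient fixed at 1.
fit_offset <- glm(cases ~ x + offset(log(popn)), family = poisson)

# Same model using the offset argument, the form used in the lecture's glm() calls.
fit_arg <- glm(cases ~ x, family = poisson, offset = log(popn))

# For contrast: log(popn) as an ordinary covariate gets an estimated coefficient.
fit_free <- glm(cases ~ x + log(popn), family = poisson)

print(coef(fit_offset))
print(coef(fit_free))
```

The two offset forms give the same fit. The third model estimates an extra coefficient that the offset version fixes at 1.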

Caveat
- Standard Poisson regression relies on the Poisson assumption about the variance
- If events tend to occur in clusters, then there is “overdispersion”
- This leads to a more general form of model: the log-linear model (later)
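One common way to look for overdispersion is to compare the Pearson chi-square statistic with its residual degrees of freedom. The sketch below does this on simulated clustered counts. The data are hypothetical, and the quasi-Poisson refit is one standard remedy shown for illustration, not necessarily the approach the lecture goes on to use.

```r
set.seed(99)
n <- 500
x <- rnorm(n)
mu <- exp(1 + 0.5 * x)
# Negative binomial counts: variance = mu + mu^2 / size, so larger than the mean.
y <- rnbinom(n, mu = mu, size = 1.5)

fit <- glm(y ~ x, family = poisson)

# Pearson chi-square over residual df; values well above 1 suggest overdispersion.
dispersion <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
print(dispersion)

# Quasi-Poisson keeps the same coefficients but inflates the standard errors
# by sqrt(dispersion).
fit_quasi <- glm(y ~ x, family = quasipoisson)
se_ratio <- summary(fit_quasi)$coefficients[, "Std. Error"] /
  summary(fit)$coefficients[, "Std. Error"]
print(se_ratio)
```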

Example: Catheter-Related Bloodstream Infections in the ICU (Critical Care Medicine, 2004)
- Objective: to determine whether a multifaceted systems intervention would eliminate catheter-related bloodstream infections (CR-BSIs)
- Design: prospective cohort in the surgical ICU at JHU, including all patients with a central venous catheter in the ICU
- Two ICUs
- Interventions:
  - educating staff
  - creating a catheter insertion cart
  - asking providers daily if catheters could be removed
  - implementing a checklist to ensure adherence to guidelines
  - empowering nurses to stop catheter insertion if a violation of guidelines was observed

Example: Catheter-Related Bloodstream Infections in the ICU (Critical Care Medicine, 2004)
- Analysis: Poisson regression
- Outcome is rate of CR-BSIs
- Data structure:
  - number of infections per quarter in ICU
  - number of catheter days (counting every patient who has a catheter at 12am each day). Patients each counted only once
  - indicator of control vs. intervention ICU
- Intervention not implemented until 1st quarter 1999.

Dataset: listing with columns quarter, ncase, cathdays, rate, dataset, quartern (the data values were not preserved in this transcript).

Observed Data

R code
data <- read.csv("csicu7.csv")
# observed rates for all ICU-quarters (black), then overlay dataset == 1 in red
plot(data$quartern, data$rate, xlab="Quarter",
     ylab="Rate of Infection per 1000 catheter days", pch=16)
points(data$quartern[data$dataset==1], data$rate[data$dataset==1], pch=16, col=2)
# connect the points within each ICU
lines(data$quartern[data$dataset==0], data$rate[data$dataset==0], col=1)
lines(data$quartern[data$dataset==1], data$rate[data$dataset==1], col=2)
legend(12, 22, c("Intervention ICU","Control ICU"), col=c(1,2), pch=c(16,16))
# dotted vertical line at quarter 5 (the first 4 quarters had no intervention)
abline(v=5, lty=3)

Estimating the Poisson regression
- Want to model change in rates
- However, in the first 4 quarters there was no intervention.
- Based on the observed data and on the data structure, what model is appropriate?

Poisson regression model
What is the model for:
- IV=0 and quarter<5?
- IV=0 and quarter≥5?
- IV=1 and quarter<5?
- IV=1 and quarter≥5?
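One way to write out the four cases, assuming the coefficients are numbered in the order R reports them for the lecture's model (β1 intercept, β2 intervention, β3 quarter, β4 knot term, β5 intervention × quarter, β6 intervention × knot term). Here q is the quarter, r is the rate per catheter day, and the knot term is k1 = max(q − 5, 0). This numbering is an assumption chosen to match the slopes quoted later in the lecture.

```latex
\begin{aligned}
\log r &= \beta_1 + \beta_2\,\mathrm{IV} + \beta_3\,q + \beta_4\,k_1
          + \beta_5\,\mathrm{IV}\,q + \beta_6\,\mathrm{IV}\,k_1,
          \qquad k_1 = \max(q - 5,\ 0) \\[4pt]
\mathrm{IV}=0,\ q<5:\quad    & \log r = \beta_1 + \beta_3\,q \\
\mathrm{IV}=0,\ q\ge 5:\quad & \log r = (\beta_1 - 5\beta_4) + (\beta_3 + \beta_4)\,q \\
\mathrm{IV}=1,\ q<5:\quad    & \log r = (\beta_1 + \beta_2) + (\beta_3 + \beta_5)\,q \\
\mathrm{IV}=1,\ q\ge 5:\quad & \log r = (\beta_1 + \beta_2 - 5\beta_4 - 5\beta_6)
                                 + (\beta_3 + \beta_4 + \beta_5 + \beta_6)\,q
\end{aligned}
```

At q = 5 the knot term is zero, so the early and late expressions agree and each ICU's fitted line is continuous at the knot.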

R code
ncase <- data$ncase
cathdays <- data$cathdays
control <- data$dataset
intervention <- 1 - control
quartern <- data$quartern
# create knot for spline model
k1 <- ifelse(quartern > 5, quartern - 5, 0)
# FIT MODEL WITH INTERACTIONS WITH TIME FOR BOTH GROUPS
reg <- glm(ncase ~ intervention*quartern + intervention*k1,
           family=poisson, offset=log(cathdays))
summary(reg)

Results
Call: glm(formula = ncase ~ intervention * quartern + intervention * k1, family = poisson, offset = log(cathdays))
Coefficients: (Intercept), intervention, quartern, k1, intervention:quartern, intervention:k1. The estimates, standard errors, z values, and p-values were not preserved in this transcript; only the intercept's p-value (<2e-16) survives.
(Dispersion parameter for poisson family taken to be 1)
Null deviance on 39 degrees of freedom; residual deviance on 34 degrees of freedom (deviance and AIC values not preserved).

Fitted model, rate scale

R code
# coefficient vector from the fitted model, in the order shown in summary(reg)
b <- coef(reg)
# linear predictors (log rate per catheter day) before and after the knot at quarter 5
fit.early.0 <- b[1] + b[3]*seq(1,5,1)
fit.late.0 <- (b[1]-b[4]*5) + (b[3]+b[4])*seq(5,20,1)
fit.early.1 <- (b[1]+b[2]) + (b[3]+b[5])*seq(1,5,1)
fit.late.1 <- (b[1]+b[2]-b[4]*5-b[6]*5) + (b[3]+b[4]+b[5]+b[6])*seq(5,20,1)
fit.early.0
# convert to rates per 1000 catheter days
rate.early.0 <- exp(fit.early.0)*1000
rate.early.0
rate.early.1 <- exp(fit.early.1)*1000
rate.late.0 <- exp(fit.late.0)*1000
rate.late.1 <- exp(fit.late.1)*1000
# add lines to plot for fitted control ICU
lines(seq(1,5,1), rate.early.0, col=2)
lines(seq(5,20,1), rate.late.0, col=2)
# add lines to plot for fitted intervention ICU
lines(seq(1,5,1), rate.early.1, col=1)
lines(seq(5,20,1), rate.late.1, col=1)

Fitted model, linear predictor scale

Real question
- Is the change in infection rates different in the two ICUs?
- That is, are the slopes after Q5 different?
- How to test that:
  - slope in control ICU: β3 + β4
  - slope in intervention ICU: β3 + β4 + β5 + β6
- What is the hypothesis test?

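Linear Combination of Coefficients
> library(gmodels)  # provides estimable()
> estimable(reg, c(0,0,0,0,1,1))
(Output columns: Estimate, Std. Error, X^2 value, DF, Pr(>|X^2|); the values were not preserved in this transcript.)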
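The contrast c(0,0,0,0,1,1) picks out β5 + β6, the difference between the two post-Q5 slopes, so the null hypothesis being tested is β5 + β6 = 0. The same kind of Wald test can be sketched by hand in base R from the coefficient vector and its estimated covariance matrix. The helper `wald_lincom` and the simulated data below are hypothetical stand-ins; the real csicu7.csv values are not used.

```r
# Wald test of H0: w' beta = 0 for a fitted glm, using its covariance matrix.
wald_lincom <- function(fit, w) {
  est <- sum(w * coef(fit))
  se <- sqrt(drop(t(w) %*% vcov(fit) %*% w))
  x2 <- (est / se)^2
  c(estimate = est, se = se, x2 = x2, df = 1,
    p = pchisq(x2, df = 1, lower.tail = FALSE))
}

# Simulated stand-in for the ICU data: 2 ICUs x 20 quarters (hypothetical values).
set.seed(5)
d <- expand.grid(quartern = 1:20, intervention = 0:1)
d$k1 <- ifelse(d$quartern > 5, d$quartern - 5, 0)
d$cathdays <- round(runif(nrow(d), 400, 900))
d$ncase <- rpois(nrow(d), d$cathdays * exp(-4.5 - 0.1 * d$intervention * d$k1))

sim_reg <- glm(ncase ~ intervention * quartern + intervention * k1,
               family = poisson, offset = log(cathdays), data = d)

# Difference in post-knot slopes between ICUs: beta5 + beta6.
res <- wald_lincom(sim_reg, c(0, 0, 0, 0, 1, 1))
print(res)
```

The weight vector relies on the coefficient order intercept, intervention, quartern, k1, intervention:quartern, intervention:k1, so it is worth checking `names(coef(sim_reg))` before trusting the contrast.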

Example: Breast Cancer Incidence in SC
- Cunningham et al.
- Hypothesize that there are differences in subtypes of breast cancer by race
  - ER+ vs. ER-
  - Grades 1, 2, 3
  - Stage 1, 2, 3, 4
- Incidence of breast cancer varies by age
- Data:
  - Tumor registry data for SC (and Ohio)
  - Census data for SC

Poisson modeling
- Rate of incidence per cancer type
- Modeled as a function of ER, grade and race
> summary(reg1)
Call:
glm(formula = nc ~ age + age2 + age3 + bl + er + gr + age * bl + age2 * bl + age3 * bl + age * er + age2 * er + age3 * er + age * gr + age2 * gr + age3 * gr + bl * er + bl * gr + er * gr, family = poisson, offset = log(9 * popn))

Results

Confidence Intervals

Incidence Ratio for AA vs. EA