Lecture 21: Poisson regression, log-linear regression
BMTRY 701 Biostatistical Methods II
Poisson distribution
Used for count data: generally, rare events in space or time.
The upper limit is theoretically infinite.
Examples:
  earthquakes, hurricanes
  cancer incidence (spatial)
  absences in a school year
  AIDS deaths in a region
Assessing disease in different groups: probability, risk, rate, incidence, prevalence.
The Poisson distribution
Probability mass function: P(Y = y) = e^(-λ) λ^y / y!, for y = 0, 1, 2, and so on.
Approximates a binomial for rare events.
Notice it has only ONE parameter: λ.
Mean = variance = λ.
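As a quick numerical check of the two properties on this slide, the sketch below compares Poisson and binomial probabilities for a rare event and computes the mean and variance from the mass function. The values n = 1000 and p = 0.002 are illustrative assumptions, not taken from the lecture.

```r
# Illustrative (assumed) values: many trials, small event probability
n <- 1000
p <- 0.002
lambda <- n * p                          # Poisson parameter matching the binomial mean

y <- 0:10
pois <- dpois(y, lambda)                 # Poisson probabilities
binom <- dbinom(y, size = n, prob = p)   # binomial probabilities

# Largest absolute difference between the two mass functions
max.diff <- max(abs(pois - binom))
print(round(cbind(y, pois, binom), 5))
print(max.diff)

# Mean and variance computed from the Poisson mass function on a wide grid
grid <- 0:100
pois.mean <- sum(grid * dpois(grid, lambda))
pois.var <- sum((grid - pois.mean)^2 * dpois(grid, lambda))
print(c(mean = pois.mean, variance = pois.var))
```

Both the mean and the variance come out equal to λ, and the two mass functions agree closely when the event is rare.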
Simple Poisson distribution example
The infection rate at a Neonatal Intensive Care Unit (NICU) is typically expressed as a number of infections per patient-days. This counts events across both time and patients.
Assumptions:
  The probability of getting an infection over a short time period is proportional to the length of the time period. In other words, a patient who stays one hour in the NICU has twice the risk of a single infection as a patient who stays 30 minutes.
  For a small enough interval, the probability of getting two infections is negligible.
  The probability of infection does not change over time or across infants.
  Independence: the probability of seeing an infection in one child does not increase or decrease the probability of seeing an infection in another child. If an infant gets an infection during one time interval, it does not change the probability that he or she will get another infection during a later time interval.
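The first two assumptions can be illustrated numerically. The sketch below assumes a hypothetical infection rate per patient-hour (the value 0.002 is made up) and compares a 30-minute stay with a one-hour stay under a Poisson count model.

```r
# Hypothetical infection rate per patient-hour (assumed for illustration only)
rate <- 0.002

# Probability of at least one infection during a stay of t hours,
# when the number of infections is Poisson with mean rate * t
p.any <- function(t) 1 - dpois(0, rate * t)

p.half <- p.any(0.5)   # 30-minute stay
p.one <- p.any(1)      # one-hour stay
risk.ratio <- p.one / p.half
print(risk.ratio)

# Probability of two or more infections during the 30-minute stay
p.two.plus <- ppois(1, rate * 0.5, lower.tail = FALSE)
print(p.two.plus)
```

For short stays the risk ratio is very close to 2, and the chance of two or more infections is orders of magnitude smaller than the chance of one.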
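Poisson regression
Based on the idea that the log of the probability of disease (the rate) is a linear function of risk factors.
The rate ratio (“relative risk”) is modeled.
Interpretation of slope: for a risk factor x with slope β, exp(β) is the rate ratio for a one-unit increase in x.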
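A minimal sketch of the slope interpretation, using made-up counts for two groups with equal exposure. The variable names and data are hypothetical. With a single binary risk factor, the exponentiated slope should equal the ratio of the group means.

```r
# Hypothetical counts for two groups observed over equal exposure
y <- c(8, 12, 10, 18, 22, 20)   # event counts (made up for illustration)
x <- c(0, 0, 0, 1, 1, 1)        # risk factor: 0 = unexposed, 1 = exposed

fit <- glm(y ~ x, family = poisson)

# The slope is a log rate ratio; exponentiate to get the rate ratio
rate.ratio <- unname(exp(coef(fit)["x"]))
print(rate.ratio)

# Same quantity computed directly from the group means
direct <- mean(y[x == 1]) / mean(y[x == 0])
print(direct)
```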
Implementation
r_i is the rate.
Often we observe a number of events over a geographic region, a time period, or a number of person-years.
Need to account for these differences:
  rates based on smaller “exposure” are less precise
  an adjustment is made
Implementation
Unless there is uniform time, space, etc., an “OFFSET” term is generally included in the model.
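One standard way to write a Poisson model with an offset is sketched below. The notation is assumed here rather than taken from the slide: μ_i is the expected count, n_i the exposure (population, person-years, catheter days), r_i = μ_i / n_i the rate, and x_i the vector of covariates.

```latex
\begin{align*}
\log r_i = \log\!\left(\frac{\mu_i}{n_i}\right) &= \mathbf{x}_i^\top \boldsymbol{\beta} \\
\log \mu_i &= \log n_i + \mathbf{x}_i^\top \boldsymbol{\beta}
\end{align*}
```

The term log n_i is the offset: it sits on the right-hand side with its coefficient fixed at 1, so the remaining coefficients describe the rate rather than the raw count.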
Offset term
Notice: NO COEFFICIENT on the offset.
Adjusts for population size or space.
Example: breast cancer incidence per county in South Carolina
  cases are the number of women (& men) diagnosed within a county in SC in one year
  the offset would be (the log of) the population size in the county in that year (probably estimated)
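A small sketch of how an offset enters a fit in R, using made-up county counts and populations (all values hypothetical). For an intercept-only model, the estimated rate should match total cases divided by total population.

```r
# Hypothetical county data (made up for illustration)
cases <- c(3, 5, 7, 15)            # diagnosed cases in one year
pop <- c(1000, 2000, 4000, 8000)   # county population sizes

# Intercept-only Poisson model; log(pop) enters as an offset with no coefficient
fit <- glm(cases ~ 1, family = poisson, offset = log(pop))

# exp(intercept) is the estimated incidence rate per person
rate.hat <- unname(exp(coef(fit)[1]))
print(rate.hat)

# For this simple model it matches total cases divided by total population
pooled <- sum(cases) / sum(pop)
print(pooled)
```

Note that the offset is supplied on the log scale, matching the log link.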
Caveat
Standard Poisson regression relies on the Poisson assumption about the variance (variance = mean).
If events tend to occur in clusters, then there is “overdispersion”.
This leads to a more general form of model: the log-linear model (later).
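One common way to check for overdispersion is to compare the Pearson chi-square statistic with its residual degrees of freedom. The sketch below uses made-up clustered counts. The quasi-Poisson fit is shown only as one widely used option and is not necessarily the model the lecture turns to later.

```r
# Hypothetical clustered counts (made up): many zeros and a few large values
y <- c(0, 0, 1, 12, 0, 15, 2, 0, 9, 0)

fit <- glm(y ~ 1, family = poisson)

# Pearson chi-square divided by residual degrees of freedom;
# values well above 1 suggest overdispersion relative to the Poisson
dispersion <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
print(dispersion)

# Sample variance versus sample mean tells the same story
print(c(mean = mean(y), variance = var(y)))

# The quasi-Poisson family reports this same dispersion estimate
fit.q <- glm(y ~ 1, family = quasipoisson)
print(summary(fit.q)$dispersion)
```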
Example: Catheter-Related Bloodstream Infections in the ICU (Critical Care Medicine, 2004).
Objective: To determine whether a multifaceted systems intervention would eliminate catheter-related bloodstream infections (CR-BSIs).
Design: prospective cohort in surgical ICU at JHU, including all patients with a central venous catheter in the ICU. Two ICUs.
Interventions:
  educating staff
  creating a catheter insertion cart
  asking providers daily if catheters could be removed
  implementing a checklist to ensure adherence to guidelines
  empowering nurses to stop catheter insertion if a violation of guidelines was observed
Example: Catheter-Related Bloodstream Infections in the ICU (Critical Care Medicine, 2004).
Analysis: Poisson regression. Outcome is the rate of CR-BSIs.
Data structure:
  number of infections per quarter in each ICU
  number of catheter days (counting every patient who has a catheter at 12 am each day); patients are each counted only once
  indicator of control vs. intervention ICU
Intervention not implemented until the 1st quarter of 1999.
Dataset
[Table: listing of the dataset (. list) with columns quarter, ncase, cathdays, rate, dataset, quartern; rows 1 to 16]
Observed Data
R code
data <- read.csv("csicu7.csv")
# observed rates for both ICUs (default color 1 = black)
plot(data$quartern, data$rate, xlab="Quarter",
     ylab="Rate of Infection per 1000 catheter days", pch=16)
# overplot the control ICU (dataset == 1) in red
points(data$quartern[data$dataset==1], data$rate[data$dataset==1], pch=16, col=2)
# connect the observed points within each ICU
lines(data$quartern[data$dataset==0], data$rate[data$dataset==0], col=1)
lines(data$quartern[data$dataset==1], data$rate[data$dataset==1], col=2)
legend(12, 22, c("Intervention ICU","Control ICU"), col=c(1,2), pch=c(16,16))
# vertical reference line at quarter 5
abline(v=5, lty=3)
Estimating the Poisson regression
Want to model the change in rates.
However, in the first 4 quarters there was no intervention.
Based on the observed data and on the data structure, what model is appropriate?
Poisson regression model
What is the model for:
  IV=0 and quarter<5?
  IV=0 and quarter ≥5?
  IV=1 and quarter<5?
  IV=1 and quarter ≥5?
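The four cases can be worked out from the spline model that the lecture's R code fits. The sketch below assumes the coefficient numbering used in that code (β1 the intercept through β6 the last interaction), IV as the intervention indicator, q as the quarter, n as catheter days, and (q-5)_+ = max(q-5, 0) as the knot term.

```latex
\begin{align*}
\log\!\left(\frac{\mu}{n}\right) &= \beta_1 + \beta_2\,IV + \beta_3\,q + \beta_4\,(q-5)_+ + \beta_5\,IV\cdot q + \beta_6\,IV\cdot(q-5)_+ \\[4pt]
IV=0,\ q<5:&\quad \beta_1 + \beta_3\,q \\
IV=0,\ q\ge 5:&\quad (\beta_1 - 5\beta_4) + (\beta_3+\beta_4)\,q \\
IV=1,\ q<5:&\quad (\beta_1+\beta_2) + (\beta_3+\beta_5)\,q \\
IV=1,\ q\ge 5:&\quad (\beta_1+\beta_2-5\beta_4-5\beta_6) + (\beta_3+\beta_4+\beta_5+\beta_6)\,q
\end{align*}
```

Each case is a straight line in q on the log-rate scale, and the two pieces for a given ICU meet at q = 5.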
R code
ncase <- data$ncase
cathdays <- data$cathdays
control <- data$dataset
intervention <- 1 - control
quartern <- data$quartern
# create knot for spline model: 0 through quarter 5, then quarters elapsed since quarter 5
k1 <- ifelse(quartern>5, quartern-5, 0)
# FIT MODEL WITH INTERACTIONS WITH TIME FOR BOTH GROUPS
reg <- glm(ncase~intervention*quartern + intervention*k1, family=poisson, offset=log(cathdays))
summary(reg)
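To see what the knot variable looks like, here is a small sketch assuming quarters 1 through 20 (the range used by the lecture's fitted-line code) and a knot at quarter 5. The pmax() form is an alternative spelling added here for comparison.

```r
quartern <- 1:20   # quarter index (assumed range)

# Hinge term with a knot at quarter 5
k1 <- ifelse(quartern > 5, quartern - 5, 0)
print(rbind(quartern, k1))

# An equivalent way to build the same term
k1.alt <- pmax(quartern - 5, 0)
print(all(k1 == k1.alt))
```

The term is zero up to and including quarter 5 and then counts quarters since the knot, which lets the slope change at quarter 5 while keeping the fitted line continuous.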
Results
Call:
glm(formula = ncase ~ intervention * quartern + intervention * k1, family = poisson, offset = log(cathdays))

Deviance Residuals:
Min 1Q Median 3Q Max

Coefficients:
                      Estimate Std. Error z value Pr(>|z|)
(Intercept)                                       <2e-16 ***
intervention
quartern
k1
intervention:quartern
intervention:k1

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

Null deviance: on 39 degrees of freedom
Residual deviance: on 34 degrees of freedom
AIC:
Fitted model, rate scale
R code
# coefficient vector of the fitted model (assumed: b holds coef(reg))
b <- coef(reg)
# linear predictors (log rate per catheter day), split at the knot (quarter 5)
fit.early.0 <- b[1] + b[3]*seq(1,5,1)
fit.late.0 <- (b[1]-b[4]*5) + (b[3]+b[4])*seq(5,20,1)
fit.early.1 <- (b[1]+b[2]) + (b[3]+b[5])*seq(1,5,1)
fit.late.1 <- (b[1]+b[2]-b[4]*5-b[6]*5) + (b[3]+b[4]+b[5]+b[6])*seq(5,20,1)
fit.early.0
# back-transform to rates per 1000 catheter days
rate.early.0 <- exp(fit.early.0)*1000
rate.early.0
rate.early.1 <- exp(fit.early.1)*1000
rate.late.0 <- exp(fit.late.0)*1000
rate.late.1 <- exp(fit.late.1)*1000
# add lines to plot for fitted control ICU
lines(seq(1,5,1), rate.early.0, col=2)
lines(seq(5,20,1), rate.late.0, col=2)
# add lines to plot for fitted intervention ICU
lines(seq(1,5,1), rate.early.1, col=1)
lines(seq(5,20,1), rate.late.1, col=1)
Fitted model, linear predictor scale
Real question
Is the change in infection rates different in the two ICUs? That is, are the slopes after Q5 different?
How to test that:
  slope in control ICU: β3 + β4
  slope in intervention ICU: β3 + β4 + β5 + β6
What is the hypothesis test?
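Subtracting the two slopes gives the quantity to test. The Wald form below is a sketch of one standard way to test a linear combination of coefficients, and the hats denote the fitted estimates.

```latex
\begin{align*}
(\beta_3+\beta_4+\beta_5+\beta_6) - (\beta_3+\beta_4) &= \beta_5+\beta_6 \\
H_0:\ \beta_5+\beta_6 = 0 \quad &\text{vs.} \quad H_1:\ \beta_5+\beta_6 \neq 0 \\
X^2 = \frac{(\hat\beta_5+\hat\beta_6)^2}{\widehat{\operatorname{Var}}(\hat\beta_5)+\widehat{\operatorname{Var}}(\hat\beta_6)+2\,\widehat{\operatorname{Cov}}(\hat\beta_5,\hat\beta_6)} &\sim \chi^2_1 \ \text{approximately, under } H_0
\end{align*}
```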
Linear Combination of Coefficients
> estimable(reg, c(0,0,0,0,1,1))
              Estimate Std. Error X^2 value DF Pr(>|X^2|)
(0 0 0 0 1 1)
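The same kind of Wald test can be computed by hand from coef() and vcov(), which may help show what a call like estimable() (from the gmodels package, as far as I know) reports. The data below are made up to mimic the layout of the lecture's dataset. The counts, the catheter days and the data frame name d are all hypothetical, so the numbers are not the lecture's results.

```r
# Hypothetical data with the same layout as the lecture's dataset (values made up)
d <- expand.grid(quartern = 1:20, intervention = 0:1)
d$k1 <- pmax(d$quartern - 5, 0)          # hinge term, knot at quarter 5
d$cathdays <- 1500                       # assumed catheter days per quarter
# Made-up counts: flat in the control ICU, declining after quarter 5 in the other
d$ncase <- round(d$cathdays * 0.010 * exp(-0.10 * d$intervention * d$k1))

fit <- glm(ncase ~ intervention * quartern + intervention * k1,
           family = poisson, offset = log(cathdays), data = d)

# Contrast picking out the two interaction slopes (coefficients 5 and 6)
cvec <- c(0, 0, 0, 0, 1, 1)
est <- sum(cvec * coef(fit))                          # estimate of the combination
se <- sqrt(drop(t(cvec) %*% vcov(fit) %*% cvec))      # its standard error
x2 <- (est / se)^2                                    # Wald chi-square, 1 df
p.value <- pchisq(x2, df = 1, lower.tail = FALSE)

print(c(Estimate = est, Std.Error = se, X2 = x2, DF = 1, p = p.value))
```

A negative estimate here means the log rate falls faster after quarter 5 in the intervention ICU than in the control ICU.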
Example: Breast Cancer Incidence in SC
Cunningham et al.
Hypothesize that there are differences in subtypes of breast cancer by race:
  ER+ vs. ER-
  Grades 1, 2, 3
  Stage 1, 2, 3, 4
Incidence of breast cancer varies by age.
Data:
  Tumor registry data for SC (and Ohio)
  Census data for SC
Poisson modeling
Rate of incidence per cancer type.
Modeled as a function of ER, grade and race.
> summary(reg1)
Call:
glm(formula = nc ~ age + age2 + age3 + bl + er + gr + age * bl + age2 * bl + age3 * bl + age * er + age2 * er + age3 * er + age * gr + age2 * gr + age3 * gr + bl * er + bl * gr + er * gr, family = poisson, offset = log(9 * popn))
Results
Confidence Intervals
Incidence Ratio for AA vs. EA
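As a rough illustration of how an incidence ratio and its confidence interval come out of a Poisson fit, the sketch below uses a two-group model with made-up counts and populations. It is far simpler than the lecture's model, where interactions make the ratio depend on age, ER status and grade. All names and values here are hypothetical.

```r
# Hypothetical counts and populations for two groups (made up for illustration)
cases <- c(30, 45)
popn <- c(10000, 10000)
group <- c(0, 1)

fit <- glm(cases ~ group, family = poisson, offset = log(popn))

# Incidence ratio and Wald 95% confidence interval,
# back-transformed from the log scale
ratio <- unname(exp(coef(fit)["group"]))
ci <- unname(exp(confint.default(fit)["group", ]))
print(c(ratio = ratio, lower = ci[1], upper = ci[2]))
```

The interval is built on the log scale and then exponentiated, so it is not symmetric around the ratio.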