Design and Analysis of Clinical Study
10. Cohort Study
Dr. Tuan V. Nguyen
Garvan Institute of Medical Research, Sydney, Australia
Uses of Cohort Studies
– Identification of risk factors (or prognostic factors)
– Use of risk factors (or prognostic factors), e.g. to identify high-risk groups and target interventions
– Relation of risk measures to risk factors, e.g. “It is estimated that smoking is responsible for 100,000 deaths from lung cancer annually.”
Risk
Risk is the prospective chance (probability), or rate of occurrence (incidence), of a health-related event.
Measures of risk:
– Incidence density (including mortality rate)
– Cumulative incidence (including "attack rate")
– Case-fatality rate
– Survival rate
Criteria to Be Fulfilled in Cohort Studies
– Observation must take place over a meaningful period of time.
– All members of the cohort must be observed; drop-outs distort the study.
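Types of Cohort (timeline: past, present, future)
– Concurrent (prospective) cohort: assembled now and followed into the future
– Historical cohort: assembled from past records and followed up to the present
– Retrospective cohort: exposure and outcome have both already occurred when the study begins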
Analysis of Cohort Studies
Figure: exposed and non-exposed groups followed over time
– Exposed: diseased (n = 39) or healthy
– Not exposed: diseased (n = 6) or healthy
Risk Factors for Fracture
Modifiable risk factors:
– Bone-related factors: BMD, bone strength indices
– Falls and fall-related factors
– Lifestyle: smoking, alcohol
– Medication: corticosteroids
Non-modifiable risk factors:
– Prior fracture
– Advancing age
– Family history
– Genetics
Uses: intervention strategies; identifying high-risk groups
Prospective Cohort Study
Basal cohort(s), assembled by:
– Sampling from a defined population, or
– Stratified assembly, or
– Matched assembly
Observation continues for a defined period under a specified observational protocol.
The timing of data collection distinguishes prospective from retrospective cohort studies.
Factors in a Prospective Cohort Study
– Event (e.g. disease)
– Person at risk, population at risk
– Person-years
Figure: population at risk (N = 200) followed for 3 weeks; each O marks a new case (cumulative cases: 2 by week 1, 5 by week 2, 7 by week 3).
Person-time
Person-time = number of persons × duration of follow-up (in general, the sum of each person's time at risk).
Incidence rate (IR): during 20 person-years of follow-up there were 2 incident cases, so IR = 2/20 = 0.1 case per person-year.
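Because person-time is the sum of each participant's time at risk, it can be built directly from individual follow-up records. The minimal Python sketch below uses made-up follow-up times (hypothetical, not from any study) chosen to total 20 person-years with 2 cases, so it reproduces IR = 2/20 = 0.1 per person-year.

```python
# Hypothetical follow-up data: years each person was observed, and whether
# they developed the event (1) or not (0) during follow-up.
follow_up_years = [5.0, 3.0, 4.0, 2.0, 6.0]
event = [0, 1, 0, 1, 0]


def incidence_rate(times, events):
    """Incidence rate = number of incident cases / total person-time."""
    person_time = sum(times)
    cases = sum(events)
    return cases / person_time, cases, person_time


rate, cases, person_time = incidence_rate(follow_up_years, event)
print(f"{cases} cases / {person_time} person-years = {rate} per person-year")
```

When everyone is followed for the same duration, the sum reduces to number of persons × duration.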
Estimation of Incidence Rates
Consider a study in which P patient-years of follow-up were observed and N cases (e.g. deaths, survivors, new cases of disease) were recorded. Assuming N follows a Poisson distribution:
– Estimated incidence rate: $I = N/P$
– Standard error of I: $\mathrm{SE}(I) = \sqrt{N}/P$
– 95% confidence interval of the "true" incidence rate: $I \pm 1.96 \times \mathrm{SE}(I)$
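The Poisson-based estimate and its normal-approximation interval can be sketched in a few lines of Python. The function name is illustrative. The example uses 7 cases over 589.5 person-weeks, which is the population time of the 200-person, 3-week example in this section, obtained by adding up each person's time at risk.

```python
import math


def poisson_rate_ci(cases, person_time, z=1.96):
    """Incidence rate I = N / P with a normal-approximation 95% CI.

    Assumes the number of cases N is Poisson, so SE(I) = sqrt(N) / P.
    """
    rate = cases / person_time
    se = math.sqrt(cases) / person_time
    return rate, se, (rate - z * se, rate + z * se)


# Example: 7 cases in 589.5 person-weeks
rate, se, (lo, hi) = poisson_rate_ci(7, 589.5)
print(f"I = {rate:.4f}, SE = {se:.4f}, 95% CI = ({lo:.4f}, {hi:.4f}) per person-week")
```

With very few cases, for example N = 2, this symmetric interval can extend below zero. That is a known weakness of the normal approximation.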
Relative Risk
Incidence rate of ischemic heart disease (IHD) by energy intake (cut-off 2750 kcal). For each group the table gives person-years, new cases, the estimated rate and its SD.
Relative risk: $RR = I_1 / I_0$, with $L = \log(RR)$
Standard error of $\log(RR)$: $\mathrm{SE}(L) = \sqrt{1/N_1 + 1/N_0}$, where $N_1$ and $N_0$ are the numbers of new cases in the two groups
95% CI of L: $L \pm 1.96 \times \mathrm{SE} = 0.3055,\ 1.51$
95% CI of RR: $\exp(0.3055),\ \exp(1.51) = 1.36,\ 4.53$
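The log-scale recipe for the rate ratio can be sketched as follows. It assumes Poisson counts in each group. The counts in the example are hypothetical and are not the IHD data, whose person-years and case counts are not reproduced here.

```python
import math


def rate_ratio(cases1, pyears1, cases0, pyears0, z=1.96):
    """Rate ratio RR = (N1/P1) / (N0/P0) with a 95% CI on the log scale.

    Assumes Poisson counts, so SE(log RR) = sqrt(1/N1 + 1/N0).
    """
    rr = (cases1 / pyears1) / (cases0 / pyears0)
    log_rr = math.log(rr)
    se = math.sqrt(1 / cases1 + 1 / cases0)
    lower = math.exp(log_rr - z * se)
    upper = math.exp(log_rr + z * se)
    return rr, se, (lower, upper)


# Hypothetical counts: 30 cases in 2000 person-years vs 15 in 3000 person-years
rr, se, (lower, upper) = rate_ratio(30, 2000, 15, 3000)
print(f"RR = {rr:.2f}, SE(log RR) = {se:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The interval is built for log(RR) and then exponentiated, so it is asymmetric around RR. Its limits multiply to RR².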
Analysis of Difference in Incidence Rates
Incidence rate of ischemic heart disease (IHD) by energy intake (cut-off 2750 kcal). For each group the table gives person-years, new cases, the estimated rate and its SD.
Difference: $D = I_1 - I_0 = 15.1 - 6.1 \approx 8.93$ (using the unrounded rates)
Standard error of D: $\mathrm{SE}(D) = \sqrt{\mathrm{SE}(I_1)^2 + \mathrm{SE}(I_0)^2}$
95% CI of D: $D \pm 1.96 \times \mathrm{SE} = 8.93 \pm 1.96 \times 0.032 = 3.65,\ 14.2$
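Unlike the ratio, the rate difference is analysed on the original scale, so its interval is symmetric. A minimal sketch under the Poisson assumption follows. The counts are hypothetical and are not the IHD data.

```python
import math


def rate_difference(cases1, pyears1, cases0, pyears0, z=1.96):
    """Rate difference D = I1 - I0 with a normal-approximation 95% CI.

    Assumes Poisson counts, so Var(I) = N / P**2 in each group and
    SE(D) = sqrt(N1/P1**2 + N0/P0**2).
    """
    i1 = cases1 / pyears1
    i0 = cases0 / pyears0
    d = i1 - i0
    se = math.sqrt(cases1 / pyears1**2 + cases0 / pyears0**2)
    return d, se, (d - z * se, d + z * se)


# Hypothetical counts: 30 cases in 2000 person-years vs 15 in 3000 person-years
d, se, (lower, upper) = rate_difference(30, 2000, 15, 3000)
print(f"D = {1000 * d:.2f}, SE = {1000 * se:.2f}, "
      f"95% CI = ({1000 * lower:.2f}, {1000 * upper:.2f}) per 1000 person-years")
```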
Logistic Regression Analysis using R

# Read the data; "." marks missing values
fracture <- read.table("fracture.txt", header = TRUE, na.strings = ".")
# Logistic regression of fracture status (fx) on BMD
results <- glm(fx ~ bmd, family = binomial, data = fracture)
summary(results)

Output:
Deviance Residuals:
    Min   1Q   Median   3Q   Max
Coefficients:
             Estimate   Std. Error   z value   Pr(>|z|)
(Intercept)
bmd
(Dispersion parameter for binomial family taken to be 1)
    Null deviance:  on 136 degrees of freedom
Residual deviance:  on 135 degrees of freedom
AIC:
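For a view of what `glm(..., family = binomial)` estimates, here is a bare-bones Python sketch of the maximum-likelihood fit for one covariate using Newton-Raphson. For the logit link, this gives the same estimates as R's iteratively reweighted least squares. The function name and data are hypothetical. A binary indicator x stands in for continuous BMD so the answer can be checked by hand: the intercept should equal the log-odds of fracture when x = 0, and the slope should equal the log odds ratio.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson (maximum likelihood).

    A minimal stand-in for R's glm(y ~ x, family = binomial) with one
    covariate: no convergence check and no standard errors.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Score vector (g) and information matrix (h = minus the Hessian)
        g0 = g1 = 0.0
        h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve the 2x2 system h * delta = g
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: x = 1 for low BMD, y = 1 for fracture
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1, 0, 0, 0, 1, 1, 0, 0]
b0, b1 = fit_logistic(x, y)
print(f"intercept = {b0:.4f}, slope = {b1:.4f}, odds ratio = {math.exp(b1):.2f}")
```

Here 1 of 4 fracture when x = 0 and 2 of 4 when x = 1. The fit therefore converges to log(1/3) for the intercept and log(3) for the slope, an odds ratio of 3.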
Incidence Density
$$\mathrm{ID} = \frac{\text{number of new cases}}{\text{population time}} = \frac{7}{?}$$
Incidence Rate
Population time at risk:
– 200 people for 3 weeks = 600 person-weeks
– But 2 people became cases in the 1st week, 3 in the 2nd week and 2 in the 3rd week
– So only 193 people were at risk for the full 3 weeks
Incidence Rate
Population time:
– 2 people who became cases in the 1st week were at risk for 0.5 weeks each: 2 × 0.5 = 1.0
– 3 people who became cases in the 2nd week were at risk for 1.5 weeks each: 3 × 1.5 = 4.5
– 2 people who became cases in the 3rd week were at risk for 2.5 weeks each: 2 × 2.5 = 5.0
– 193 non-cases were at risk for 3 weeks each: 193 × 3 = 579
– Total population time = 1.0 + 4.5 + 5.0 + 579 = 589.5 person-weeks
Incidence rate (average over 3 weeks):
$$\mathrm{ID} = \frac{7}{589.5} \approx 0.0119 \text{ cases per person-week}$$
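The population-time bookkeeping above translates directly into code. This is a minimal Python sketch. It makes the same assumption as the slide: each case occurs at mid-week, so it contributes half a week of risk in its onset week. The function name is hypothetical.

```python
# Case counts by week of onset in a cohort of 200 people followed for 3 weeks.
# Assumption (as on the slide): each case occurs at mid-week.
cohort_size = 200
weeks_of_follow_up = 3
cases_by_week = {1: 2, 2: 3, 3: 2}


def population_time(n, weeks, cases_by_week):
    """Total person-weeks at risk for a closed cohort of n people."""
    person_weeks = 0.0
    for week, cases in cases_by_week.items():
        # Time at risk for a case with onset in this week = (week - 1) + 0.5
        person_weeks += cases * (week - 0.5)
    non_cases = n - sum(cases_by_week.values())
    person_weeks += non_cases * weeks  # non-cases are at risk throughout
    return person_weeks


pt = population_time(cohort_size, weeks_of_follow_up, cases_by_week)
total_cases = sum(cases_by_week.values())
print(f"Population time = {pt} person-weeks")
print(f"ID = {total_cases}/{pt} = {total_cases / pt:.4f} cases per person-week")
```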
Incidence Proportion
$$\text{Cumulative incidence (CI)} = \frac{\text{number of new cases}}{\text{population at risk}}$$
$$\text{3-week CI} = \frac{7}{200} = 0.035 \; (3.5\%)$$
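The incidence proportion and the incidence density describe the same 200-person cohort in different ways. The Python sketch below contrasts them using the relation CI ≈ 1 − exp(−ID × t). That relation is an addition not shown on the slide, and it holds only under the stated assumptions of a constant rate, a closed cohort and no competing risks. The 589.5 person-weeks is this cohort's population time (193 non-cases × 3 weeks plus the partial weeks contributed by the 7 cases).

```python
import math

new_cases = 7
population_at_risk = 200
weeks = 3

# Incidence proportion over the 3-week period
cumulative_incidence = new_cases / population_at_risk

# Incidence density from the same cohort: 7 cases over 589.5 person-weeks
incidence_density = new_cases / 589.5

# Assumes a constant rate, a closed cohort and no competing risks
approx_ci = 1 - math.exp(-incidence_density * weeks)

print(f"3-week CI = {cumulative_incidence:.3f}")
print(f"1 - exp(-ID x t) = {approx_ci:.4f}")
```

For a rare outcome over a short period, the two summaries agree closely.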
Summary of Cohort Study Results
| Exposure status | Total children | Developed disease (disabled) | Incidence |
|---|---|---|---|
| Low Apgar | $N_1$ | $X_1$ | $I_1 = X_1/N_1$ |
| High Apgar | $N_2$ | $X_2$ | $I_2 = X_2/N_2$ |
Relative risk: $RR = I_1 / I_2$
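The table's quantities can be computed with a short Python sketch. It also adds a 95% confidence interval using the common log-scale standard error for a risk ratio, sqrt(1/X1 − 1/N1 + 1/X2 − 1/N2). This interval is not shown on the slide. The counts are hypothetical.

```python
import math


def risk_ratio(x1, n1, x2, n2, z=1.96):
    """Risk ratio RR = I1 / I2 with I1 = X1/N1 and I2 = X2/N2.

    95% CI on the log scale, SE(log RR) = sqrt(1/X1 - 1/N1 + 1/X2 - 1/N2).
    """
    i1, i2 = x1 / n1, x2 / n2
    rr = i1 / i2
    se = math.sqrt(1 / x1 - 1 / n1 + 1 / x2 - 1 / n2)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return i1, i2, rr, (lower, upper)


# Hypothetical counts: 12 of 100 low-Apgar vs 20 of 500 high-Apgar children disabled
i1, i2, rr, (lower, upper) = risk_ratio(12, 100, 20, 500)
print(f"I1 = {i1:.2f}, I2 = {i2:.2f}, RR = {rr:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```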
Incidence
Dubbo Osteoporosis Epidemiology Study
1989–1993:
– Recruit 3000 individuals
– Measure bone mineral density (BMD)
– Classify BMD as normal or osteoporotic
1989–2005:
– Record the number of fractures
– Analyse the association between BMD and fracture
Dubbo Osteoporosis Epidemiology Study: 1287 women
– Low BMD: 345 (27%)
   – Fracture: 137 (40%)
   – No fracture: 208 (60%)
– Not low BMD: 942 (73%)
   – Fracture: 191 (20%)
   – No fracture: 751 (80%)
Of the 328 women with a fracture, 137 (42%) had low BMD.
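The percentages in the Dubbo tree follow directly from its counts. The Python sketch below recomputes them from those counts. It also computes the fracture risk ratio for low versus not-low BMD, which is not shown on the slide. The unlabeled 42% in the diagram appears to match the share of all 328 fractures that occurred in low-BMD women.

```python
# Counts from the Dubbo tree (women only)
low_bmd = {"fx": 137, "no_fx": 208}
not_low_bmd = {"fx": 191, "no_fx": 751}

n_low = sum(low_bmd.values())          # 345
n_not_low = sum(not_low_bmd.values())  # 942
n_total = n_low + n_not_low            # 1287

risk_low = low_bmd["fx"] / n_low              # fracture risk with low BMD
risk_not_low = not_low_bmd["fx"] / n_not_low  # fracture risk otherwise
share_low = low_bmd["fx"] / (low_bmd["fx"] + not_low_bmd["fx"])

print(f"Low BMD: {n_low}/{n_total} = {n_low / n_total:.0%}")
print(f"Fracture risk: low BMD {risk_low:.0%}, not low BMD {risk_not_low:.0%}")
print(f"Risk ratio = {risk_low / risk_not_low:.2f}")
print(f"Share of all fractures in low-BMD women = {share_low:.0%}")
```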
Advantages and Disadvantages of Cohort Studies
Advantages:
– Direct calculation of incidence
– The time sequence between exposure and outcome can be established
– Several different outcomes of a single exposure can be studied
Disadvantages:
– Large numbers of people must be measured over a long time
– Subclinical disease may escape diagnosis