SJS SDI_141 Design of Statistical Investigations Stephen Senn 14 Case Control Studies
SJS SDI_142 Case-Control Study Definition The observational epidemiologic study of persons with the disease (or other outcome variable) of interest and a suitable control (comparison, reference) group of persons without the disease. The relationship of an attribute to the disease is examined by comparing the diseased and nondiseased groups with regard to how frequently the attribute is present or, if quantitative, the levels of the attribute in each group. In short, the past history of exposure to a suspected risk factor is compared between cases and controls, the controls being persons who resemble the cases in such respects as age and sex but do not have the disease or condition of interest. Last, J.M. A Dictionary of Epidemiology
SJS SDI_143 Schematic Representation of Cohort Study Each point represents a member of the cohort of 10,000 persons
SJS SDI_ cases and 200 controls are sampled from diseased and healthy persons respectively
SJS SDI_145 The number of cases and controls is a foregone conclusion. Exposure becomes the random variable and is studied as a function of status. Note that the axes have been exchanged to reflect this.
SJS SDI_146 Smoking and Lung-Cancer Obs_7
Famous study of Hill and Doll
Sampled 1357 cases of lung cancer from four hospitals in the United Kingdom
Sampled 1357 hospital-based controls
Compared the two groups as regards smoking history
SJS SDI_147 Doll and Hill Data Obs_7
            Smoker   Non-smoker   Total
Cases         1350            7    1357
Controls      1296           61    1357
SJS SDI_148 In General
SJS SDI_149 A Model for Case-Control Studies
Number exposed (n_E)
Number unexposed (n_U)
Probability case if exposed
Probability case if unexposed
Probability recorded if case
Probability recorded if control
SJS SDI_1410 Expectations etc.
SJS SDI_1411 Notes Thus the odds ratio can be estimated even though n_E, n_U, and the two recording probabilities are unknown. However, although the assumption that the recording probabilities for cases and for controls are equal is not needed, an assumption that they do not vary with exposure is needed.
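The cancellation behind this note can be sketched as follows. The symbols for the four probabilities did not survive in the slides, so the ones used here are assumptions made only for illustration: \pi_E and \pi_U for the probability of being a case if exposed or unexposed, and \phi_1 and \phi_0 for the probability of being recorded if a case or a control.

```latex
% Assumed (hypothetical) symbols:
%   \pi_E, \pi_U   : probability of being a case if exposed / unexposed
%   \phi_1, \phi_0 : probability of being recorded if case / control
% Expected counts in the four cells of the case-control table:
\begin{align*}
E[\text{exposed cases}]      &= n_E\,\pi_E\,\phi_1,     & E[\text{unexposed cases}]    &= n_U\,\pi_U\,\phi_1,\\
E[\text{exposed controls}]   &= n_E\,(1-\pi_E)\,\phi_0, & E[\text{unexposed controls}] &= n_U\,(1-\pi_U)\,\phi_0.
\end{align*}
% Cross-product ratio of the expected counts:
\[
\frac{\bigl(n_E\,\pi_E\,\phi_1\bigr)\bigl(n_U\,(1-\pi_U)\,\phi_0\bigr)}
     {\bigl(n_U\,\pi_U\,\phi_1\bigr)\bigl(n_E\,(1-\pi_E)\,\phi_0\bigr)}
= \frac{\pi_E/(1-\pi_E)}{\pi_U/(1-\pi_U)} .
\]
```

The group sizes and both recording probabilities cancel, leaving the odds ratio. The cancellation works only because the same \phi_1 applies to exposed and unexposed cases, and the same \phi_0 to exposed and unexposed controls; \phi_1 and \phi_0 themselves may differ.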
SJS SDI_1412 Sources for Controls (Rothman)
Population: using a population register
Neighbourhood: for example, one or two controls from the neighbourhood of each case
  Not suitable for environmental exposures
Random digit dialing
Hospitals or clinics
SJS SDI_1413 Cohort and Case Control Studies
Cohort                                   Case Control
Complete population                      Sampled population
Can calculate incidence rates            Can calculate ratios only
Usually expensive                        Usually less expensive
Convenient for studying many diseases    Convenient for studying many exposures
Can be prospective or retrospective      Can be prospective or retrospective
Rothman p 91
SJS SDI_1414 The Delta Method
SJS SDI_1415 Variance of a Logit
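The formulas for these two slides did not survive, so what follows is a sketch of the standard first-order delta-method argument rather than the slide's own derivation. The notation (X, \mu, g, \hat p, x, n) is assumed for illustration.

```latex
% Delta method: first-order Taylor expansion of g about \mu = E(X)
\[
\operatorname{Var}\{g(X)\} \approx \{g'(\mu)\}^{2}\,\operatorname{Var}(X).
\]
% Logit of a binomial proportion: X \sim \mathrm{Bin}(n,p), \hat p = X/n,
% g(p) = \log\{p/(1-p)\}, so g'(p) = 1/\{p(1-p)\} and \operatorname{Var}(\hat p) = p(1-p)/n
\[
\operatorname{Var}\{\operatorname{logit}(\hat p)\}
\approx \frac{1}{\{p(1-p)\}^{2}}\cdot\frac{p(1-p)}{n}
= \frac{1}{n\,p\,(1-p)} .
\]
% Plugging in \hat p = x/n gives the sum of reciprocals of the two counts
\[
\widehat{\operatorname{Var}}\{\operatorname{logit}(\hat p)\}
= \frac{1}{n\,\hat p\,(1-\hat p)}
= \frac{1}{x} + \frac{1}{n-x} .
\]
```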
SJS SDI_1416 Variance of the log-odds ratio The log-odds ratio is the difference between two logits. Since these are independent, the variance of their difference is the sum of their variances. Thus, in terms of our previous table, the variance of the log-odds ratio is approximately the sum of the reciprocals of the four cell frequencies. Note the implications of the variance formula: the variance cannot be reduced below the reciprocal of the entry in a given cell by increasing the frequencies of the other cells.
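As a worked illustration, the variance formula can be applied to the Doll and Hill frequencies used in the S-Plus analysis (1350 and 7 smokers and non-smokers among cases, 1296 and 61 among controls). The cell labels a, b, c, d are assumed here and may not match the labels of the original table; the numerical values are rounded.

```latex
% Assumed labels: a = exposed cases, b = unexposed cases,
%                 c = exposed controls, d = unexposed controls
\[
\widehat{OR} = \frac{ad}{bc}, \qquad
\widehat{\operatorname{Var}}\bigl(\log \widehat{OR}\bigr) \approx \frac1a + \frac1b + \frac1c + \frac1d .
\]
% Doll and Hill: a = 1350, b = 7, c = 1296, d = 61
\[
\widehat{OR} = \frac{1350 \times 61}{7 \times 1296} \approx 9.08, \qquad \log \widehat{OR} \approx 2.206,
\]
\[
\widehat{\operatorname{Var}} \approx \frac{1}{1350} + \frac{1}{7} + \frac{1}{1296} + \frac{1}{61} \approx 0.1608,
\qquad SE \approx 0.401,
\]
\[
95\%\ \text{limits: } \exp(2.206 \pm 1.96 \times 0.401) \approx (4.14,\ 19.9).
\]
```

The term 1/7 from the seven non-smoking cases contributes about 0.143 of the total 0.161, which illustrates the point about the smallest cell: adding more smokers or more controls could not bring the variance below 1/7.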
SJS SDI_1417 S-Plus Analysis Obs_7
#Doll and Hill
options(contrasts=c("contr.treatment", "contr.poly")) #set contrast options
#To analyse the famous case-control study
Outcome<-factor(c("case","case","control","control"))
Exposure<-factor(rep(c("smoker","non-smoker"),2))
Freq<-c(1350,7,1296,61)
Doll.Hill<-data.frame(Outcome, Exposure, Freq)
Doll.Hill
OR<-Freq[1]*Freq[4]/(Freq[2]*Freq[3]) #cross-product (odds) ratio
l.OR<-log(OR)
var<-(1/Freq[1]+1/Freq[2]+1/Freq[3]+1/Freq[4]) #variance of the log-odds ratio
SE<-sqrt(var)
t<-l.OR/SE
LCL<-exp(l.OR-1.96*SE) #lower 95% confidence limit for the odds ratio
UCL<-exp(l.OR+1.96*SE) #upper 95% confidence limit for the odds ratio
results.1<-data.frame(l.OR,var,SE,t,LCL,OR,UCL)
results.1
SJS SDI_1418
#Fit results using a log-linear model
fit1<-glm(Freq~Exposure*Outcome,family=poisson)
summary(fit1,cor=F)
#Prepare data to perform logistic regression
#Y counts the smokers among cases (row 1) and among controls (row 2)
Y<-c(Freq[1],Freq[3])
N<-c(Freq[1]+Freq[2],Freq[3]+Freq[4])
#Note: the two rows are cases and controls, so this factor in effect labels status
Exposure2<-factor(c("Smoker","Non-smoker"))
P<-Y/N
DollHill.2<-data.frame(Y,N,P,Exposure2)
DollHill.2
#Logistic regression
fit2<-glm(P~Exposure2,family=binomial,weights=N)
summary(fit2,cor=F)
SJS SDI_1419
> Doll.Hill
  Outcome   Exposure Freq
1    case     smoker 1350
2    case non-smoker    7
3 control     smoker 1296
4 control non-smoker   61
> results.1
 l.OR var SE t LCL OR UCL
Call: glm(formula = Freq ~ Exposure * Outcome, family = poisson)
Coefficients:
 Value Std. Error t value
(Intercept)
Exposure
Outcome
Exposure:Outcome
SJS SDI_1420
> DollHill.2
 Y N P Exposure2
 Smoker
 Non-smoker
Call: glm(formula = P ~ Exposure2, family = binomial, weights = N)
Coefficients:
 Value Std. Error t value
(Intercept)
Exposure
(Dispersion Parameter for Binomial family taken to be 1 )
SJS SDI_1421 Questions
Why did Hill and Doll choose a case-control study rather than a cohort study?
We now believe that the choice of controls used in the Hill and Doll study led to an underestimate of the odds ratio for lung cancer and smoking. Why?
Consider the recent controversy over breast implants and connective tissue disease. What difficulty does press coverage cause for any case-control study in this field?
Why do epidemiologists rarely use more than three controls per case?