Case-Control Studies and Odds Ratio STAT 6395 Spring 2008 Filardo and Ng
Types of Epidemiologic studies
Type of studies: observational, case-control
Case-control studies
A study in which a group of persons with a disease (cases) and a comparison group of persons without the disease (controls) are compared with respect to their history of past exposures to factors of interest.
[Figure: timeline with labels Past, Present, Time]
Case-control studies (study schema)
[Figure: study schema]
Type of studies: observational, cohort
Cohort studies
A study in which a group of persons exposed to a factor of interest and a group of persons not exposed are followed over time and compared with respect to the incidence rate of the disease or other condition of interest.
Comparison is fundamental in determining the relationship between an exposure and a disease
Cohort study: the incidence rate in the exposed group is not enough (a nonexposed comparison group is needed).
Case-control study: the past exposures of a group of cases are not enough (a control group is needed for comparison).
Fundamental difference between a case-control study and a cohort study
A case-control study starts with people (a) with and (b) without the disease of interest and compares their past exposures.
A cohort study starts with people (a) with and (b) without the exposure of interest and compares their future disease.
The fundamental difference between a case-control study and a cohort study is not the calendar time period during which exposures took place. For example, in both a retrospective cohort study and a case-control study, the calendar time period during which exposures took place is in the past.
How do we measure past exposures in a case-control study?
Interview
Medical records/charts
Assays of biological specimens (e.g., nested case-control studies)
Goal: to measure exposures that occurred before the onset of disease.
Selection of cases: incident vs. prevalent
Incident (newly diagnosed) cases: risk factors found might contribute to the development of the disease.
Prevalent (existing) cases:
  Cannot distinguish between risk factors for the development of the disease and risk factors for cure or survival.
  More difficult to know which came first, the exposure or the disease.
  Exposure measurements are problematic.
Prevalence = number of existing cases of a given disease at a particular time point or during a particular time period, divided by the number of persons in the population (this is a proportion, not a rate).
Selection of cases
Definition of source population: the population that gives rise to the cases.
Case definition: need definite medical criteria for who is a case of the disease.
Case identification: need a system for finding all cases who meet the case definition and are members of the source population.
[Figure: source population, cases, and controls]
Selection of controls
Selection of appropriate controls is the major methodological challenge in case-control studies.
In a case-control study, we want to determine whether exposures of interest differ between the case group and the source population.
Controls should be selected from the source population that gave rise to the cases.
Selection of controls (continued)
The controls should be representative of the source population with respect to the exposures of interest.
Ideally, controls should be a random sample of the source population.
Prevalent cases of the disease should not be eligible to be controls.
Study I: Controls selected such that they have a higher level of exposure than the source population, producing an artifactual result that the exposure is negatively associated with the disease
Study II: Controls selected such that they have a lower level of exposure than the source population, producing an artifactual result that the exposure is positively associated with the disease
Study III: Controls selected such that they have the same level of exposure as the source population, producing the unbiased (true) result that the exposure is not associated with the disease
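The three studies can be illustrated with a small numerical sketch. All counts below are hypothetical: it assumes 30% of the source population is exposed, that the exposure is truly unrelated to the disease (so 30% of cases are exposed as well), and the usual 2x2 layout with a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls. Only the exposure level among the controls changes from study to study.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for a 2x2 table.

    a = exposed cases,   b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    return (a * d) / (b * c)


# Hypothetical study sizes; 300 of 1000 cases are exposed (30%),
# matching the assumed exposure level in the source population.
N_CASES = 1000
N_CONTROLS = 1000
EXPOSED_CASES = 300


def study_or(exposed_controls):
    """Odds ratio when `exposed_controls` of the 1000 controls are exposed."""
    return odds_ratio(
        EXPOSED_CASES,
        exposed_controls,
        N_CASES - EXPOSED_CASES,
        N_CONTROLS - exposed_controls,
    )


or_study_1 = study_or(400)  # Study I: controls over-exposed (40%)
or_study_2 = study_or(200)  # Study II: controls under-exposed (20%)
or_study_3 = study_or(300)  # Study III: controls match the source population (30%)

for name, value in [("I", or_study_1), ("II", or_study_2), ("III", or_study_3)]:
    print(f"Study {name}: OR = {value:.2f}")
```

Under these assumptions, Study I yields an odds ratio below 1, Study II an odds ratio above 1, and only Study III recovers the true value of 1.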
Case-control studies: population-based, hospital-based, nested
Case-control studies classified by type of source population (the population that gives rise to the cases):
Population-based case-control studies
Hospital (or clinic)-based case-control studies
Nested case-control studies
Population-based case-control studies
Source population: all residents of a defined geographic area who do not have Disease X.
Cases: all new cases of Disease X that occur among residents of the defined geographic area over a specified period of time.
Controls: sample (ideally random) of the source population over the same period of time.
Population-based: Parkinson’s disease
Source population: residents of Texas who do not have Parkinson’s disease.
Cases: all new cases of Parkinson’s disease among Texas residents identified over a 3-year period through a rapid-reporting system.
Controls: sample (ideally random) of residents of Texas who do not develop Parkinson’s disease over the same 3-year period.
Selection of controls in population-based case-control studies
Random sample from a population registry.
Neighborhood controls: sample of persons who reside in the same neighborhoods as the cases. Often done by matching.
Selection of controls in population-based case-control studies (continued)
Random selection of telephone numbers (random digit dialing).
Members of the source population do not all have the same probability of being contacted (not a random sample):
  Persons without a landline have zero probability of being selected.
  Households vary in the amount of time someone is home and in the number of telephones.
  Many people screen their telephone calls.
Hospital-based case-control studies
Source population: all people without Disease X who would attend Hospital A if they had Disease X.
Cases: all new cases of Disease X identified in Hospital A over a specified period of time.
Controls (most commonly): sample of patients in Hospital A with diagnoses other than Disease X over the same period of time.
Hospital-based: Parkinson’s disease
Source population: all persons who would attend Baylor University Medical Center (BUMC) if they had Parkinson’s disease. Note that this source population is, in practice, impossible to identify.
Cases: all new cases of Parkinson’s disease seen at BUMC over a 3-year period.
Controls: sample of patients at BUMC with diagnoses other than Parkinson’s disease over the same 3-year period.
Selection of controls in hospital-based case-control studies
Source population: all persons who would attend the hospital if they developed the disease of interest.
The source population in a hospital-based case-control study is usually not identifiable.
A random sample of the general population will not necessarily correspond to a random sample of the source population, because it does not take into account the referral patterns of the hospital.
Furthermore, the referral pattern depends on the disease.
Selection of controls in hospital-based case-control studies (continued)
Hospital-based controls are patients without the disease from the same hospital.
Hospital-based controls are a nonrandom sample of the source population, most of whose members are healthy.
Nonrandom sampling of the source population introduces the possibility that the distribution of the exposure of interest among the controls is not the same as it is in the source population.
Hospital-based controls may not reflect the exposure distribution in the source population
Exposures of interest may cause or prevent the diseases for which patients in the control group were hospitalized.
Hospital-based controls may not reflect the exposure distribution in the source population
Persons with an exposure of interest may be more or less likely than persons without the exposure to be hospitalized for their disease if they develop it (this could also be an issue for cases).
Hospital-based controls unrepresentative of the exposure distribution in the source population: Parkinson’s disease
Controls: random sample of persons hospitalized for other diseases, many of whom were hospitalized for heart disease.
Low folic acid intake is a risk factor for heart disease.
This control group would have a lower proportion of persons with high folic acid intake than the source population.
Controls selected such that they have a lower level of exposure than the source population, producing an artifactual result that the exposure is positively associated with the disease
Selection of controls in hospital-based case-control studies (continued)
Limit the controls to those hospitalized for diseases for which there is no suspicion of a relationship with the exposures of interest.
Selection of controls in hospital-based case-control studies (continued)
Include a variety of diseases in the control group, so as to dilute the biasing effects of including a disease that might be related to the exposure, unbeknownst to the investigator.
Selection of controls in hospital-based case-control studies (continued)
Excluded diseases should only apply to the diagnosis at the current hospitalization.
Excluded diseases should only apply to the diagnosis at the current hospitalization: Parkinson’s disease
Controls: persons hospitalized due to traumatic injury, who are believed to be representative of the source population with respect to folic acid intake.
Persons with a history of heart disease should not be excluded from this traumatic injury control group. Excluding them would cause the control group to over-represent persons with high folic acid intake.
Controls selected such that they have a higher level of exposure than the source population, producing an artifactual result that the exposure is negatively associated with the disease
Nested case-control studies (nested within a concurrent cohort study)
Source population: the subjects in an ongoing concurrent cohort study who did not have Disease X at baseline.
Cases: all new cases of Disease X that occurred in the cohort over a defined period of follow-up.
Controls: random sample of subjects in the cohort who did not develop Disease X over the defined period of follow-up.
Nested case-control studies (nested within a concurrent cohort study)
Exposures are measured by assay of stored biologic specimens collected from the subjects at baseline.
A nested case-control study has the advantage of a cohort study: exposure is measured at baseline, before development of disease.
Nested case-control studies (nested within a concurrent cohort study)
Collection of additional exposure information not collected at baseline requires labor-intensive data collection activities, such as abstraction from records.
Nested case-control studies (nested within a concurrent cohort study)
[Figure: timeline. Biologic specimens are collected from the subjects at baseline; cases and controls are identified; specimens are used to assess exposure and compare it among study groups.]
Nested case-control: Parkinson’s disease
Source population: the members of the Nurses’ Health Study cohort who donated blood samples in [year missing] and had no history of Parkinson’s disease.
Cases: all new cases of Parkinson’s disease that developed in this source population from 1991 to 2000.
Controls: random sample of the Nurses’ Health Study cohort who did not develop Parkinson’s disease from 1991 to 2000.
Measurement of exposure: serum folic acid level at baseline.
Advantages of the nested case-control study over the concurrent cohort study itself
Cost: suppose there were 32,000 women in the source population, 200 cases, and 200 controls. Only 400 specimens need to be assayed rather than 32,000.
Causality: the exposure occurred before the disease.
Further research: preservation of precious biologic specimens; the remaining specimens are available for other studies.
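The cost argument can be made concrete with a short sketch using the slide's numbers (32,000 women, 200 cases, 200 controls). The subject IDs, the random choice of which subjects are cases, and the seed are all hypothetical stand-ins; in a real study the cases come from follow-up, not from a random draw.

```python
import random

COHORT_SIZE = 32_000
N_CASES = 200
N_CONTROLS = 200

rng = random.Random(2008)  # fixed seed so the sketch is reproducible

# Hypothetical subject IDs for the cohort.
cohort_ids = list(range(COHORT_SIZE))

# Stand-in for the cases identified during follow-up (assumption: in practice
# these are observed, not sampled).
case_ids = set(rng.sample(cohort_ids, N_CASES))

# Controls: random sample of cohort members who did not develop the disease.
eligible_ids = [pid for pid in cohort_ids if pid not in case_ids]
control_ids = rng.sample(eligible_ids, N_CONTROLS)

# Number of stored specimens that must be assayed under each design.
assays_nested = N_CASES + N_CONTROLS
assays_full_cohort = COHORT_SIZE

print(f"Nested case-control: {assays_nested} assays")
print(f"Full cohort:         {assays_full_cohort} assays")
```

With these numbers the nested design needs 400 assays, one eightieth of what assaying the whole cohort would require, and the unused specimens stay in storage.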
Accounting for confounders in selecting controls
Matching: selection of controls such that they are similar to cases with respect to factors other than the exposures of interest.
Accounting for confounders: matching
Common matching factors: age, sex, race, socioeconomic status.
Matching
Frequency matching: selection of controls such that the distributions of the matching factors (e.g., age, sex) are similar in the case and control groups.
Matching
Individual matching: each control is individually matched to a case with respect to specific factors, resulting in matched case-control pairs.
For example, for each case, select a control of the same race, sex, age (within 3 years), and neighborhood (within 3 blocks).
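Individual matching can be sketched as a simple greedy procedure. This is only an illustration under stated assumptions: the `Person` record, the sample subjects, and the first-eligible-control rule are hypothetical, and the neighborhood criterion is left out because it would need address data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    pid: str
    sex: str
    race: str
    age: int


def is_match(case, control, age_window=3):
    """Same sex and race, and age within `age_window` years."""
    return (
        case.sex == control.sex
        and case.race == control.race
        and abs(case.age - control.age) <= age_window
    )


def match_pairs(cases, control_pool, age_window=3):
    """Greedily pair each case with the first unused eligible control."""
    available = list(control_pool)
    pairs = []
    for case in cases:
        for control in available:
            if is_match(case, control, age_window):
                pairs.append((case.pid, control.pid))
                available.remove(control)  # each control is used at most once
                break
    return pairs


# Hypothetical subjects.
cases = [Person("case1", "F", "white", 60), Person("case2", "M", "black", 45)]
pool = [
    Person("ctl1", "M", "black", 47),
    Person("ctl2", "F", "white", 64),  # 4 years from case1: outside the window
    Person("ctl3", "F", "white", 58),
]

pairs = match_pairs(cases, pool)
print(pairs)
```

A case with no eligible control is simply left unmatched here; a real study would need a rule for that situation.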
Matching is intuitively appealing, but its implications are complicated …
In a case-control study, the association between matching factors and disease cannot be studied.
Matching is intuitively appealing, but its implications are complicated …
Overmatching can occur if a matching factor is associated with the exposure of interest, thus making the controls artifactually like the cases with respect to that exposure.
Matching is intuitively appealing, but its implications are complicated …
Matching must be taken into account in the analysis through special analytic techniques. We will cover some of these techniques in this course.
Conventional data layout for case-control study (2x2 table)
               Cases   Controls
Exposed          a        b
Not exposed      c        d
Estimating relative risk in a case-control study
In a case-control study, we cannot measure incidence rates in the exposed and nonexposed groups, and therefore cannot calculate the relative risk directly.
In a case-control study, the odds ratio is a good approximation of the relative risk in some circumstances …
Odds
Probability that cases were exposed = a/(a+c)
Probability that cases were not exposed = c/(a+c)
Odds of a case having been exposed = [a/(a+c)]/[c/(a+c)] = a/c
Similarly, odds of a control having been exposed = b/d
Odds Ratio (OR)
The ratio of the odds that the cases were exposed to the odds that the controls were exposed:
OR = (a/c)/(b/d) = ad/bc
The odds ratio is the cross-product ratio in the 2x2 table.
Interpretation of the Odds Ratio
The odds ratio is a good approximation of the relative risk when the disease being studied occurs infrequently (which is the situation in most circumstances in which case-control studies are conducted).
Only in this case is the interpretation of the odds ratio in case-control studies the same as the interpretation of the relative risk in cohort studies.
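The rare-disease condition can be checked numerically. The sketch below uses hypothetical risks in the exposed and nonexposed groups (as they would be known in a cohort study) and compares the relative risk with the corresponding odds ratio, once for a rare disease and once for a common one.

```python
def relative_risk(risk_exposed, risk_unexposed):
    """Ratio of disease risks in the exposed and nonexposed groups."""
    return risk_exposed / risk_unexposed


def odds_ratio_from_risks(risk_exposed, risk_unexposed):
    """Ratio of disease odds, where odds = risk / (1 - risk)."""
    odds_exposed = risk_exposed / (1 - risk_exposed)
    odds_unexposed = risk_unexposed / (1 - risk_unexposed)
    return odds_exposed / odds_unexposed


# Hypothetical risks; in both scenarios the exposed have twice the risk.
scenarios = {
    "rare disease": (0.002, 0.001),
    "common disease": (0.4, 0.2),
}

results = {}
for label, (p1, p0) in scenarios.items():
    rr = relative_risk(p1, p0)
    or_ = odds_ratio_from_risks(p1, p0)
    results[label] = (rr, or_)
    print(f"{label}: RR = {rr:.3f}, OR = {or_:.3f}")
```

In the rare scenario the two measures almost coincide, while in the common scenario the odds ratio is noticeably further from 1 than the relative risk.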
Interpretation of the Odds Ratio
OR = 1: risk in exposed = risk in nonexposed. No association.
OR > 1: risk in exposed > risk in nonexposed. Positive association. The larger the OR, the stronger the association. May or may not be causal.
Interpretation of the Odds Ratio
OR < 1: risk in exposed < risk in nonexposed. Negative association. The smaller the OR, the stronger the negative association. May or may not be causal. If causal, indicates a protective effect.
Interpretation of the Odds Ratio: Example
OR = (59 x 44) / (33 x 17) = 4.63
Patients who eat ¾ or less of the food served have 4.63 times the odds of dependent feeding compared with patients who eat more than ¾ of the food served.
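The arithmetic in this example can be reproduced in a few lines. The cell assignment a = 59, b = 33, c = 17, d = 44 is an assumption inferred from the cross-product form ad/bc, since the underlying table is not shown; the sketch computes the odds ratio both as a ratio of two odds and as the cross-product ratio.

```python
# Cell counts inferred from OR = (59 x 44) / (33 x 17); assumed layout ad/bc.
a, b, c, d = 59, 33, 17, 44

odds_first_group = a / c    # odds in the group holding cells a and c
odds_second_group = b / d   # odds in the group holding cells b and d

or_from_odds = odds_first_group / odds_second_group
or_cross_product = (a * d) / (b * c)

print(f"Odds, first group:   {odds_first_group:.2f}")
print(f"Odds, second group:  {odds_second_group:.2f}")
print(f"OR (ratio of odds):  {or_from_odds:.2f}")
print(f"OR (cross-product):  {or_cross_product:.2f}")
```

Both routes give the same value, which rounds to the 4.63 reported in the example.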
Interpretation of the Odds Ratio
A further example of the calculation and interpretation of the odds ratio is given by Bland & Altman (Bland J.M. & Altman D.G. (2000) The odds ratio. British Medical Journal 320, 1468).
The odds ratio may be a misleading approximation to the relative risk if the event rate is high (Deeks (1996) and Davies et al. (1998)).
Since the odds ratio is difficult to interpret, why is it so widely used? Odds ratios can be calculated for case-control studies, whilst relative risks are not available for such studies.
Attributable risk percent (exposed) using odds ratio
AR% (exposed) = [(OR - 1)/OR] x 100
Tells us what percent of the disease among the exposed is due to the exposure.
Attributable risk percent (population) using odds ratio
AR% (population) = [P x (OR - 1)] / [P x (OR - 1) + 1] x 100
where P is the population prevalence of the exposure.
P can be estimated by the prevalence of the exposure in the controls.
Tells us what percent of the disease in the total population is due to the exposure.
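Both attributable risk formulas can be written as short functions. The sketch below is illustrative only: the function names, the odds ratio of 2.0, and the control counts (300 exposed, 700 unexposed) are hypothetical, with P estimated from the controls as b/(b+d).

```python
def attributable_risk_percent_exposed(odds_ratio):
    """[(OR - 1) / OR] x 100."""
    return (odds_ratio - 1) / odds_ratio * 100


def attributable_risk_percent_population(p_exposed, odds_ratio):
    """[P x (OR - 1)] / [P x (OR - 1) + 1] x 100."""
    excess = p_exposed * (odds_ratio - 1)
    return excess / (excess + 1) * 100


def exposure_prevalence_from_controls(b, d):
    """Estimate P from the controls: exposed controls / all controls."""
    return b / (b + d)


# Hypothetical inputs.
example_or = 2.0
p = exposure_prevalence_from_controls(b=300, d=700)

ar_exposed = attributable_risk_percent_exposed(example_or)
ar_population = attributable_risk_percent_population(p, example_or)

print(f"P estimated from controls: {p:.2f}")
print(f"AR% (exposed):    {ar_exposed:.1f}")
print(f"AR% (population): {ar_population:.1f}")
```

The population figure is always smaller than the figure for the exposed when P is below 1, because unexposed persons contribute disease that the exposure cannot explain.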