Lecture 1: Fundamentals of epidemiologic study design and analysis
Jeffrey E. Korte, PhD
BMTRY 747: Foundations of Epidemiology II
Department of Public Health Sciences, Medical University of South Carolina
Spring 2015
Basic study designs
- Ecologic
- Cohort
- Case-control
- Cross-sectional
- Longitudinal
  - Cohort
  - Case-control
- Randomized controlled trial
Other study designs
- Case-cohort
- Nested case-control
- Case-crossover
Ecologic studies
- Unit of observation: geographical area
- No individual information available
- Analyze correlations between:
  - Mean value of exposure of interest
  - Rate of disease of interest
- Vulnerable to “ecologic fallacy”
Ecologic fallacy
- “Marginal” information known
- Individual information not known
          Exp   Not exp   Total
Dis        ?       ?        50
No dis     ?       ?       950
Total     200     800     1000
Ecologic fallacy (exposure appears related to disease)
Region 1
          Exp   Not exp   Total
Dis        ?       ?        50
No dis     ?       ?       950
Total     200     800     1000
Region 2
          Exp   Not exp   Total
Dis        ?       ?       150
No dis     ?       ?       850
Total     500     500     1000
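Ecologic fallacy unmasked (there is actually no association)
Region 1
          Exp   Not exp   Total
Dis        10      40       50
No dis    190     760      950
Total     200     800     1000
Region 2
          Exp   Not exp   Total
Dis        75      75      150
No dis    425     425      850
Total     500     500     1000
Within each region the risk of disease is identical in exposed and unexposed people (5% in Region 1, 15% in Region 2).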
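As a rough check of the two regional tables, the within-region risk ratios can be computed directly. This is only a sketch: the helper name `risk_ratio` and the tuple layout are illustrative choices, and the second-column counts for Region 2 (75 and 425) are inferred from the row totals on the slide.

```python
# Each region: (exposed cases, exposed total, unexposed cases, unexposed total).
# Region 2's unexposed counts are inferred from the slide's row totals.
regions = {
    "Region 1": (10, 200, 40, 800),
    "Region 2": (75, 500, 75, 500),
}


def risk_ratio(exp_cases, exp_total, unexp_cases, unexp_total):
    """Risk of disease in the exposed divided by risk in the unexposed."""
    return (exp_cases / exp_total) / (unexp_cases / unexp_total)


for name, (a, n1, c, n0) in regions.items():
    exposure_prevalence = n1 / (n1 + n0)   # what an ecologic study sees
    overall_risk = (a + c) / (n1 + n0)     # what an ecologic study sees
    rr = risk_ratio(a, n1, c, n0)          # needs individual-level data
    print(f"{name}: exposed {exposure_prevalence:.0%}, "
          f"overall risk {overall_risk:.0%}, within-region RR {rr:.2f}")
```

Both regions give a within-region risk ratio of 1.0, even though the region with more exposure (50% vs. 20%) also has the higher overall risk (15% vs. 5%). That regional pattern is what an ecologic analysis could misread as an exposure effect.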
Ecologic fallacy
- Scenario 2: example from Szklo, page 16
- Are poor people bad drivers?
Ecologic studies
- Great for generating hypotheses
- Aggregate measures (mean across people)
- Environmental measures (physical exposures)
- Global measures (sociopolitical, etc.)
- e.g. dietary fat intake/breast cancer; vitamin D/prostate cancer
- Mixed individual-ecologic study
  - Some variables are measured using an ecologic criterion (neighborhood characteristics, etc.)
Cross-sectional studies
- Single timepoint
- May be baseline data from a cohort study
- Assess association between exposure and disease of interest
- Limited to prevalent disease outcomes
  - Prevalence reflects incidence rate and duration/survival
- Exposure data more vulnerable to recall bias
- Less time-consuming, less expensive
Cross-sectional studies
- Disadvantage: concurrent exposure and disease information restricts causal inference
- May be able to collect historical exposure data in questionnaire (vulnerable to recall bias)
Cohort studies
- Assemble individuals without disease
- Assess exposure status
- Follow individuals over time
- Observe incident disease events
- Avoid recall bias
- More expensive, time-consuming
Cohort studies
- Can estimate disease risk, disease rate
- Can estimate proportion exposed (if population-based sample)
- Can evaluate numerous outcomes
- Can evaluate numerous exposures
- Cohort study can be basis for more efficient study designs (case-cohort, nested case-control)
Cohort studies
- Occupational exposures:
  - Can use occupational cohorts to evaluate specific exposures (e.g. chemicals, radiation) at high doses in humans
  - Not representative of general population
  - Vulnerable to healthy worker survival effect
  - Exposed group and comparison group may have comparability problems
Cohort studies
- Retrospective cohort studies
  - Historical exposure data are available
  - Medical records are available through time
  - Cohort is assembled and followed through historical time to simulate a prospective cohort study
  - Often used in occupational studies
  - Less expensive and time-consuming
Case-control studies
- Individuals recruited into study based on disease status
- Case definition can be critical
- Historical exposure information obtained
- Exposure compared between cases and controls
- Vulnerable to recall bias
Case-control studies
- Selection of control group is critical
  - Population-based, hospital-based?
  - Matching?
- Controls should be representative of the population from which the cases arose (people who would have been recruited as cases if they had had the disease of interest)
Case-control studies
- See Figure 1-19, Szklo, page 26: survival bias
Case-cohort study
- Based in cohort study
- Sub-cohort is identified: subset of participants at baseline
- Sub-cohort may include eventual cases
- Individuals who become cases are compared to sub-cohort
Case-cohort study
- Advantages:
  - Less expensive than cohort study, if lab tests are done on selected stored samples instead of all samples
  - Exposures assessed before incident disease (no recall bias)
  - Sub-cohort can be comparison group for more than one case group (e.g. different disease)
Case-cohort study
- See Figure 1-21, Szklo, page 28
Nested case-control study
- Based in cohort study
- As cases arise, one or more (matched) controls are selected
- Individuals may serve as a control at one timepoint, then serve as a case at a later timepoint
- Advantages similar to case-cohort design
Nested case-control study
- See Figure 1-20, Szklo, page 27
Case-crossover design
- All individuals have the disease of interest
- Exposure for each individual is compared at one timepoint (e.g. just before diagnosis) versus another timepoint (e.g. one year earlier)
- Useful for acute effects of exposure (e.g. environmental, psychological, physical)
Randomized controlled trials
- Experiment
- May test medical, behavioral, social intervention
- Compare outcomes between groups
- Randomization is expected to balance confounders across groups on average, including unknown confounders (chance imbalances remain possible, especially in small trials)
- Findings may not be generalizable, depending on sampling strategy/recruitment into study
Measures and associations
- Strength of association
  - Risk ratio, odds ratio, hazard ratio, rate ratio
  - 1.0 denotes no association (i.e. the exposure groups have the same risk)
- Statistical significance / p value
  - Chi-square, t-test, multivariable regression
  - By convention, p<0.05 denotes statistical significance
- Confidence intervals
  - Show precision of estimate and statistical significance
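To illustrate how a confidence interval conveys both precision and significance, here is a minimal sketch of a 95% interval for a risk ratio using the common log-scale (Wald-type) approximation. The counts are hypothetical and not from the lecture, and `risk_ratio_ci` is an illustrative name.

```python
import math


def risk_ratio_ci(exp_cases, exp_total, unexp_cases, unexp_total, z=1.96):
    """Risk ratio with an approximate 95% CI computed on the log scale."""
    rr = (exp_cases / exp_total) / (unexp_cases / unexp_total)
    # Approximate standard error of ln(RR) for risk (cumulative incidence) data
    se_log_rr = math.sqrt(
        1 / exp_cases - 1 / exp_total + 1 / unexp_cases - 1 / unexp_total
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical cohort: 30 of 100 exposed and 15 of 100 unexposed develop disease
rr, lower, upper = risk_ratio_ci(30, 100, 15, 100)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")

# An interval that excludes the null value of 1.0 corresponds roughly to p<0.05
excludes_null = lower > 1.0 or upper < 1.0
print("Interval excludes 1.0:", excludes_null)
```

A narrower interval indicates a more precise estimate. With the same risks but a smaller sample, the interval would widen and could include 1.0.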
Measures and Associations (continuous outcome)
- Mean
- Median
- Percentiles
- Difference between means
- Can calculate risk ratio for 1-unit increase, 10-unit increase, etc. (association is assumed to be constant over the exposure range)
Measures and Associations (categorical outcome)
- Prevalence
- Incidence
- Risk
- Odds
- Relative risk / risk ratio
- Hazard ratio
- Odds ratio
- Rate ratio
Prevalence
- People currently living with a health outcome of interest (e.g. 72%, 137/100,000, etc.)
- Prevalence is a reflection of several factors
  - Incidence rate
  - Cure rate
  - Progression rate
  - Death rate
  - Explanatory factors (e.g. age, causal exposures, medical care)
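The dependence of prevalence on incidence and on how long people remain cases (before cure or death) can be sketched with the standard steady-state approximation, prevalence odds = incidence rate × mean duration. The numbers below are hypothetical, and the relation assumes a stable population in which incidence and duration are not changing over time.

```python
def steady_state_prevalence(incidence_rate, mean_duration_years):
    """Prevalence implied by prevalence odds = incidence rate x mean duration.

    Assumes a steady state; incidence_rate is in cases per person-year.
    """
    prevalence_odds = incidence_rate * mean_duration_years
    return prevalence_odds / (1 + prevalence_odds)


# Hypothetical disease with 5 new cases per 1,000 person-years
incidence = 0.005

short_course = steady_state_prevalence(incidence, 4)    # cured or dead after ~4 years
long_course = steady_state_prevalence(incidence, 10)    # better survival, no cure

print(f"Mean duration  4 years: prevalence {short_course:.1%}")
print(f"Mean duration 10 years: prevalence {long_course:.1%}")
```

With incidence held fixed, longer survival without cure raises prevalence, which is one reason prevalence alone is a poor guide to disease risk.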
Prevalence
- Point prevalence
- Period prevalence
- Lifetime prevalence
Incidence
- New cases of disease (risk or rate)
- Occur over time, during study follow-up or during public health surveillance
- Reflects:
  - Changes in diagnostic standards
  - Screening bias (early detection)
  - Latent period (undetected/undetectable)
  - (these factors also affect prevalence)
Risk
- Same as “proportion”
- Assumes all individuals have the same follow-up time
- Risk ratio: disease risk in exposed group, divided by risk in unexposed group
Rate
- Individuals do not need to have the same follow-up time
- Denominator is person-years of follow-up in each group
  - Sum of individual follow-up times
- Rate ratio: disease rate in exposed group, divided by rate in unexposed group
Rate (example)
           Deaths   Person-yrs   Rate
Group 1      45       13,739     32.8/10,000
Group 2               15,180     21.1/10,000
Relative rate = 1.6
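The arithmetic behind this example can be sketched as follows. The Group 1 rate is recomputed from its deaths and person-years; the second group's death count does not appear in the source, so its rate is taken as reported. The function name `rate_per_10000` is an illustrative choice.

```python
def rate_per_10000(events, person_years):
    """Events per 10,000 person-years of follow-up."""
    return events / person_years * 10_000


# Group 1: counts as given on the slide
group1_rate = rate_per_10000(45, 13_739)

# Second group: death count is not shown in the source, so use the reported rate
group2_rate = 21.1

relative_rate = group1_rate / group2_rate

print(f"Group 1 rate: {group1_rate:.1f} per 10,000 person-years")
print(f"Relative rate: {relative_rate:.1f}")
```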
Risk Ratio vs. Rate Ratio
- Example: compare two samples followed over 5 years
  - Group 1: 20% develop cancer, all within the first year
  - Group 2: 20% develop cancer, all during Year 5
- What is the risk ratio?
- What about the rate ratio?
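One way to work through the question is sketched here under assumptions that are not in the lecture: 100 people per group, no losses to follow-up, person-time stops at diagnosis, and each case is diagnosed on average at the midpoint of the year in which cancer develops.

```python
def incidence_rate(cases, years_per_case, noncases, years_per_noncase):
    """Cases divided by total person-years at risk."""
    person_years = cases * years_per_case + noncases * years_per_noncase
    return cases / person_years


# Assumed group size (hypothetical); 20% of each group develops cancer
n = 100
cases = 20
noncases = n - cases

# Risk over 5 years is the same in both groups
risk_group1 = cases / n
risk_group2 = cases / n
risk_ratio = risk_group1 / risk_group2

# Group 1 cases are at risk for about 0.5 years, Group 2 cases for about 4.5 years
# (midpoint-of-year assumption); non-cases contribute the full 5 years
rate_group1 = incidence_rate(cases, 0.5, noncases, 5.0)   # 20 / 410 person-years
rate_group2 = incidence_rate(cases, 4.5, noncases, 5.0)   # 20 / 490 person-years
rate_ratio = rate_group1 / rate_group2

print(f"Risk ratio: {risk_ratio:.2f}")
print(f"Rate ratio: {rate_ratio:.2f}")
```

Under these assumptions the risk ratio is exactly 1, while the rate ratio is above 1 because Group 1 accumulates less person-time before the same number of cases occur. The rate captures the timing of events; the 5-year risk does not.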
Odds
- Probability of having disease (or exposure), divided by probability of not having disease (or exposure)
- Useful for case-control study
- Odds ratio is a good estimate of risk ratio for rare diseases
Odds ratio
          Exp   No exp   Total
Dis        45      5       50
No dis     10     40       50
Total      55     45      100
- Odds of exposure in cases: 45/5
- Odds of exposure in controls: 10/40
- Exposure odds ratio = (45/5)/(10/40) = 36
- Disease odds ratio also equals 36
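The equality of the exposure odds ratio and the disease odds ratio can be checked numerically. This is a small sketch using the counts from the table; the cell labels `a`, `b`, `c`, `d` are a common convention rather than notation used in the lecture.

```python
# 2x2 table from the slide
a, b = 45, 5     # diseased:     exposed, unexposed
c, d = 10, 40    # not diseased: exposed, unexposed

# Odds of exposure among cases, divided by odds of exposure among controls
exposure_odds_ratio = (a / b) / (c / d)

# Odds of disease among exposed, divided by odds of disease among unexposed
disease_odds_ratio = (a / c) / (b / d)

# Both reduce algebraically to the same cross-product ratio
cross_product_ratio = (a * d) / (b * c)

print(exposure_odds_ratio, disease_odds_ratio, cross_product_ratio)
```

Because both forms reduce to ad/bc, a case-control study, which can only estimate exposure odds, still yields the disease odds ratio.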
Risk ratio
          Exp   No exp   Total
Dis        45      5       50
No dis     10     40       50
Total      55     45      100
- Risk of disease in exposed: 45/55
- Risk of disease in unexposed: 5/45
- Risk ratio = (45/55)/(5/45) = 7.4
Risk ratio
          Exp   No exp   Total
Dis         9      1       10
No dis    250    740      990
Total     259    741     1000
- Risk of disease in exposed: 9/259
- Risk of disease in unexposed: 1/741
- Risk ratio = (9/259)/(1/741) = 25.7
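Putting the two example tables side by side shows why the odds ratio approximates the risk ratio only when disease is rare. The sketch below uses the counts from the slides; the function names and the dictionary labels are illustrative.

```python
def risk_ratio(a, b, c, d):
    """a, b = diseased exposed/unexposed; c, d = non-diseased exposed/unexposed."""
    return (a / (a + c)) / (b / (b + d))


def odds_ratio(a, b, c, d):
    """Cross-product ratio ad/bc for the same cell layout."""
    return (a * d) / (b * c)


tables = {
    "common disease (50 of 100)": (45, 5, 10, 40),
    "rare disease (10 of 1000)": (9, 1, 250, 740),
}

for label, cells in tables.items():
    rr = risk_ratio(*cells)
    odds = odds_ratio(*cells)
    print(f"{label}: risk ratio {rr:.1f}, odds ratio {odds:.1f}")
```

In the first table the odds ratio is far from the risk ratio; in the second, where only 1% of people have the disease, the two are close.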
Hazard ratio
- Used in survival analysis (Cox proportional hazards model) of cohort studies
- “Time-to-event” information is the outcome of interest
- When each case arises, all other noncensored, healthy individuals serve as controls for that case at that timepoint
- The model assumes that the hazard ratio (exposed/unexposed) stays constant over time
Rate ratio
- Can be estimated in Poisson regression
  - Another generalized linear model
  - Uses “count” data, typically in cohort studies
- The model assumes that independent risk factors result in multiplicative risks
Risk difference
- Can be estimated in linear regression
  - Outcome is assumed to be continuous, normally distributed
  - Exposures can be continuous, ordinal, or categorical
- The model assumes that independent risk factors result in additive risks
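The contrast between multiplicative and additive assumptions can be made concrete with a small calculation. All numbers here are hypothetical: a baseline risk of 1% and two independent risk factors, A and B, that double and triple the risk on their own.

```python
# Hypothetical risks (not from the lecture)
baseline_risk = 0.01
rr_a, rr_b = 2.0, 3.0

risk_a_only = baseline_risk * rr_a   # risk with factor A alone
risk_b_only = baseline_risk * rr_b   # risk with factor B alone

# Multiplicative model (ratio scale): risk ratios multiply
joint_multiplicative = baseline_risk * rr_a * rr_b

# Additive model (difference scale): risk differences add
rd_a = risk_a_only - baseline_risk
rd_b = risk_b_only - baseline_risk
joint_additive = baseline_risk + rd_a + rd_b

print(f"Joint risk, multiplicative: {joint_multiplicative:.2%}")
print(f"Joint risk, additive:       {joint_additive:.2%}")
```

The two models predict different risks for people with both factors, so the choice of scale determines what counts as "no interaction" between them.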
The Golden Rule
- Thou shalt design and analyze epidemiologic studies in such a way as to allow you to answer the scientific question of interest.
- Corollary: methods are a means to an end.
- Implication: do not adapt the question to fit the methods.