The 2006 Summer Program in Applied Biostatistical & Epidemiological Methods
Nicholas P. Jewell, University of California Berkeley
Ohio State University, July 10, 2006
Day 1: Definitions, Measures of Disease Incidence & Association
Nicholas P. Jewell © Copyright 2006, all rights reserved

Course Outline
  Class meets from 8:30am to 12:15pm (break?)
  Labs meet 5:30 to 8pm (except Friday, when they stop at 7pm)
  Rough idea of topics:
    Day 1: Definitions, Measures of Disease Incidence and Association
    Day 2: Confounding, Interaction & Stratification Techniques
    Day 3: Regression Models, Logistic Regression and Maximum Likelihood
    Day 4: Confounding & Interaction in Logistic Regression Models, Model Building & Goodness of Fit
    Day 5: Matched Studies, Alternatives and Extensions to Logistic Regression
Binary Outcome Data

Binary Outcome                           Explanatory Factors
Use of mental health services in 2005    Costs of mental health visit, sex
Moved residence in 2005                  Family size, family income
Low birthweight of newborn               Health insurance status of mother, marital status of mother
Vote Republican in 2004 election         Parental voting pattern, sex
Health insurance coverage                Place of birth, marital status
Employment status in 2005                Education level
Choice of transportation to work         Income
Issues Related to Application Area
  Study design
  Randomized?
  Causality/association
  Definition of binary outcome
  Extensions
    Longitudinal observations
    More than 2 categories
    Ordered categories?

Other Issues
  Statistical art in addition to statistical science
  Case studies
    WCGS (CHD, men)
    Coffee drinking and pancreatic cancer
    Spontaneous abortion history and CHD (women)
    Titanic
How Do We Measure the Binary Outcome for Disease Occurrence?
  Incidence/prevalence
  Role of 'time'
    Chronological time
    Exposure time
    Age
    Number of contacts
  Incidence (time interval)
  Prevalence (time point or interval)
  Fractions: the incidence proportion is unitless

Incidence Proportion
  Definition (D, = 1, "yes"):
    Define the risk interval explicitly, including the time scale (calendar year 2005, year of age 55, first year after menopause, etc.)
    Be at risk at the beginning of the interval (define explicitly what 'at risk' means)
    Become an incident case during the interval
  The incidence proportion is the fraction of the at-risk population who are D
  Cumulative measure

Incidence Rate
  Introduces time at risk into our thinking:
    Incidence rate (time interval) "=" #D / cumulative time at risk
    Units are now time^-1
  The measure still applies to the whole interval (so it is still cumulative in that sense)
  Instantaneous incidence rate: hazard function
  I(t) is the incidence proportion over the time interval [0, t]
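To make the difference between the two measures concrete, here is a minimal Python sketch. The follow-up records are hypothetical (invented for illustration, not course data): each person is at risk at the start of a one-year interval and contributes time at risk until becoming a case or until the interval ends.

```python
# Hypothetical follow-up data for a one-year risk interval (illustration only).
# Each tuple: (years at risk during the interval, became an incident case?)
follow_up = [
    (1.0, False), (0.5, True), (1.0, False), (0.25, True),
    (1.0, False), (1.0, False), (0.75, True), (1.0, False),
]

n_at_risk = len(follow_up)                          # at risk at start of interval
n_cases = sum(1 for _, is_case in follow_up if is_case)
person_time = sum(years for years, _ in follow_up)  # cumulative time at risk

incidence_proportion = n_cases / n_at_risk          # unitless fraction
incidence_rate = n_cases / person_time              # units: 1 / year

print(f"Incidence proportion: {incidence_proportion:.3f}")
print(f"Incidence rate: {incidence_rate:.3f} per person-year")
```

The proportion divides by people, the rate by person-time, which is why only the rate carries units of time^-1.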
Figure: Hazard Function for Caucasian Males in California in 1980

Figure: Survival Function (1 - I(t)) for Caucasian Males in California in 1980
US Infant Mortality

                     Mother's Marital Status
Infant Mortality     Unmarried      Married        Total
Death                   16,712       18,784       35,496
Live at 1 Year       1,197,142    2,878,421    4,075,563
Total                1,213,854    2,897,205    4,111,059

US Infant Mortality
  A: Death in first year
  B: Unmarried mother
  P(A & B) = 16,712 / 4,111,059 ≈ 0.0041
  P(A) = 35,496 / 4,111,059 ≈ 0.0086
  P(B) = 1,213,854 / 4,111,059 ≈ 0.295
  P(A) x P(B) ≈ 0.0086 x 0.295 ≈ 0.0025
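The independence check on this slide can be reproduced directly from the US Infant Mortality counts. The sketch below is only an illustration; the variable names are my own.

```python
# Counts taken from the US Infant Mortality table.
deaths_unmarried = 16_712
deaths_total = 35_496
unmarried_total = 1_213_854
n = 4_111_059

p_a = deaths_total / n        # P(A): death in first year
p_b = unmarried_total / n     # P(B): unmarried mother
p_ab = deaths_unmarried / n   # P(A & B)

print(f"P(A & B)    = {p_ab:.4f}")
print(f"P(A) x P(B) = {p_a * p_b:.4f}")
# Under independence the two printed values would agree;
# here the joint probability is clearly larger than the product.
```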
Measures of Association: Relative Risk
  RR = P(D|E) / P(D|not E)
  Relative measure
  RR = 1 if and only if D and E are independent
  Note the upper bound: RR cannot exceed 1 / P(D|not E)
  RR is not symmetric in the roles of D and E

Non-Symmetry of RR
  In general, P(D|E) / P(D|not E) is not equal to P(E|D) / P(E|not D)

US Infant Mortality
  RR (associated with unmarried) = (16,712 / 1,213,854) / (18,784 / 2,897,205) ≈ 2.12
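A short sketch of the relative risk calculation for the infant mortality data, which also shows the non-symmetry by swapping the roles of D and E. The function name and argument order are my own choices, assuming the usual 2 x 2 layout with exposure in rows and disease in columns.

```python
def relative_risk(a, b, c, d):
    """RR = P(D|E) / P(D|not E) for a 2x2 table.

    Rows are E / not E, columns are D / not D:
        E      : a  b
        not E  : c  d
    """
    return (a / (a + b)) / (c / (c + d))

# E = unmarried mother, D = death in first year.
rr_unmarried = relative_risk(16_712, 1_197_142, 18_784, 2_878_421)

# Swap the roles of D and E by transposing the table:
# this compares P(unmarried | death) with P(unmarried | alive at 1 year).
rr_roles_swapped = relative_risk(16_712, 18_784, 1_197_142, 2_878_421)

print(f"RR for death, unmarried vs married: {rr_unmarried:.2f}")
print(f"RR with roles of D and E swapped:   {rr_roles_swapped:.2f}")
```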
Measures of Association: Odds Ratio
  OR = [P(D|E) / P(not D|E)] / [P(D|not E) / P(not D|not E)]
  Relative measure
  OR = 1 if and only if D and E are independent
  No upper bound
  OR is symmetric in the roles of D and E

Symmetry of OR
  The odds ratio for D comparing E with not E equals the odds ratio for E comparing D with not D

US Infant Mortality
  OR (associated with unmarried) = (16,712 x 2,878,421) / (18,784 x 1,197,142) ≈ 2.14
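The same data give the odds ratio through the cross-product ad/bc, and transposing the table leaves it unchanged, which is the symmetry property. This is a minimal sketch with my own function name.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio ad / bc for a 2x2 table
    with rows E / not E and columns D / not D."""
    return (a * d) / (b * c)

# E = unmarried mother, D = death in first year.
or_unmarried = odds_ratio(16_712, 1_197_142, 18_784, 2_878_421)

# Transposed table: roles of D and E swapped.
or_roles_swapped = odds_ratio(16_712, 18_784, 1_197_142, 2_878_421)

print(f"OR for death, unmarried vs married: {or_unmarried:.2f}")
print(f"OR with roles of D and E swapped:   {or_roles_swapped:.2f}")
```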
OR as Approximation to RR
Comparison of RR and OR at Various Risk Levels

P(D|not E)    P(D|E)    RR    OR    Relative Difference (%)
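Since OR = RR x [1 - P(D|not E)] / [1 - P(D|E)], the two measures agree closely only when both risks are small. The sketch below tabulates this for hypothetical risk levels (chosen for illustration, not the values from the slides), with RR held at 2. The relative difference is taken here as (OR - RR)/RR, which is an assumption about how the slide defines it.

```python
def compare_rr_or(p_unexposed, p_exposed):
    """Return (RR, OR, relative difference) for risks
    p_unexposed = P(D|not E) and p_exposed = P(D|E).

    Relative difference is defined here as (OR - RR) / RR.
    """
    rr = p_exposed / p_unexposed
    odds_exposed = p_exposed / (1 - p_exposed)
    odds_unexposed = p_unexposed / (1 - p_unexposed)
    odds_ratio = odds_exposed / odds_unexposed
    return rr, odds_ratio, (odds_ratio - rr) / rr

# Hypothetical baseline risks; the exposed risk is twice the baseline (RR = 2).
print("P(D|not E)  P(D|E)     RR      OR   Rel. diff.")
for p0 in (0.001, 0.01, 0.05, 0.1, 0.25, 0.4):
    p1 = 2 * p0
    rr, odds_ratio, rel_diff = compare_rr_or(p0, p1)
    print(f"{p0:10.3f}  {p1:6.3f}  {rr:5.2f}  {odds_ratio:6.2f}  {rel_diff:9.1%}")
```

As the baseline risk grows, the OR moves further from the RR, always away from 1.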
Figure: RH (solid line), RR (dotted line), OR (dash-dotted line) as Risk Period Extends in Time
Measures of Association: Excess Risk
  ER = P(D|E) - P(D|not E)
  Absolute comparison
  ER = 0 if and only if D and E are independent
  ER is not symmetric in the roles of D and E

Measures of Association: Attributable Risk
  Population size = N
  Number of cases with current exposure distribution: N x P(D)
  Number of cases with no exposure to E: N x P(D|not E)
  AR = [N x P(D) - N x P(D|not E)] / [N x P(D)] = [P(D) - P(D|not E)] / P(D)

US Infant Mortality
  AR (associated with unmarried) = (35,496 / 4,111,059 - 18,784 / 2,897,205) / (35,496 / 4,111,059) ≈ 0.25
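A minimal sketch of the excess risk and attributable risk for the infant mortality data, assuming the usual definitions ER = P(D|E) - P(D|not E) and AR = [P(D) - P(D|not E)] / P(D). Variable names are my own.

```python
# US Infant Mortality counts: E = unmarried mother, D = death in first year.
a, b = 16_712, 1_197_142   # unmarried: deaths, alive at 1 year
c, d = 18_784, 2_878_421   # married:   deaths, alive at 1 year
n = a + b + c + d

p_d = (a + c) / n               # P(D), overall risk
p_d_exposed = a / (a + b)       # P(D|E)
p_d_unexposed = c / (c + d)     # P(D|not E)

excess_risk = p_d_exposed - p_d_unexposed
attributable_risk = (p_d - p_d_unexposed) / p_d

print(f"ER = {excess_risk:.4f}")
print(f"AR = {attributable_risk:.3f}")
```

Read at face value, the AR says about a quarter of infant deaths would not occur if every infant had the married-mother risk, which is exactly the causal reading the next slide warns about.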
Attributable Risk: Caution!
  Encourages a causal interpretation that may be incorrect
  Assumes modification of E doesn't change other risk factors
  "Baseball is 90% mental -- the other half is physical." (Yogi Berra)

Target Population, Study Population and Sample
  Diagram: Target Population, Study Population, Sample
  Selection bias may occur when the Study Population differs from the Target Population
Population-Based Study
  Need: frame for the Study Population
  Take a simple random sample of size n
  Measure D and E on sampled individuals
  Can estimate:
    Joint probabilities, e.g. P(D & E)
    Marginal probabilities, e.g. P(D)
    Conditional probabilities, e.g. P(D | E)

Marital Status & Birthweight

                              Birthweight
Marital Status at Birth     Low    Normal    Total
Unmarried                     7        52       59
Married

  Joint probabilities
  Marginal probabilities
  Conditional probabilities
Cohort Study
  Need: frame for Exposed and Unexposed Populations
  Take two (or more) simple random samples, of sizes n_E and n_notE, separately from the exposed and unexposed populations, respectively
  Measure D on sampled individuals
  Can estimate some conditional probabilities, e.g. P(D | E)

Marital Status & Birthweight
  Table: Marital Status at Birth (Unmarried, Married) by Birthweight (Low, Normal)
  No joint probabilities
  No marginal probabilities
  Conditional probabilities

Case-Control Study
  Need: frame for Diseased and Non-Diseased Populations
  Take two simple random samples, of sizes n_D and n_notD, separately from the case-status groups
  Measure E on sampled individuals
  Can estimate some conditional probabilities, e.g. P(E | D)

Marital Status & Birthweight
  Table: Marital Status at Birth (Unmarried, Married) by Birthweight (Low, Normal)
  No joint probabilities
  No marginal probabilities
  Conditional probabilities
Risk-Set (Density) Sampling
  For each incident case sampled at time t, select a random set of controls from those still at risk at t
  Note that a control sampled at time s might later be sampled as a case at time t
  Diagram: time axis from 0 to T, with time t marked

Example: HSV-2 and Cervical Cancer
  Study population: 550,000 women with donations to serum banks in Finland, Norway, and Sweden
  Cervical cancer cases identified over time and linked to serum bank data for identification of HSV-2 status
  3 random controls chosen who were cancer free at the time of diagnosis of a case
  Caution: HSV-2 status is measured at time of donation rather than at time of sampling
Standard Case-Control Sampling
  Diagram: 2 x 2 table of Exposure (E, not E) by disease status (D, not D)

Risk-Set Sampling
  Diagram: 2 x 2 table of Exposure (E, not E) by disease status (D, not D), with a time axis from 0 to T and time t marked

Case-Cohort Sampling
  Select cases as for traditional or risk-set sampling; select a random set of m "controls" from all those at risk at the beginning of the interval
  Note that a "control" might also be sampled as a case
  Diagram: time axis from 0 to T with time t marked, labeled "All controls"
Example: Low Fat Diet and Breast Cancer
  Women's Health Trial randomly assigned 32,000 women (high-risk group) to a low-fat intervention or a control group
  All women filled out food questionnaires, and gave blood samples, at regular intervals over 10 years
  All breast cancer cases had their food diaries and blood samples analyzed
  10% of the original cohort were randomly selected to have their diaries and samples analyzed

Case-Cohort Sampling
  Diagram: 2 x 2 table of Exposure (E, not E) by disease status (D, not D), with a time axis from 0 to T and time t marked, labeled "All controls"

Case-Cohort Sampling: OR = RR (Bayes' Theorem)
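The identity behind this slide can be checked numerically. Under case-cohort sampling the "controls" estimate the exposure odds in the whole cohort at the start of the interval, so the odds ratio compares exposure odds among cases with exposure odds in the full cohort. The sketch below uses the full US Infant Mortality counts as if the entire cohort were the comparison group (an idealization of the sampling, used here only to show the algebra).

```python
# E = unmarried mother, D = death in first year.
cases_exposed, cases_unexposed = 16_712, 18_784
cohort_exposed, cohort_unexposed = 1_213_854, 2_897_205  # everyone at risk at time 0

# Exposure odds among cases, and in the whole cohort (cases included).
exposure_odds_cases = cases_exposed / cases_unexposed
exposure_odds_cohort = cohort_exposed / cohort_unexposed

case_cohort_or = exposure_odds_cases / exposure_odds_cohort

# Relative risk computed directly from the cohort.
rr = (cases_exposed / cohort_exposed) / (cases_unexposed / cohort_unexposed)

print(f"Case-cohort OR: {case_cohort_or:.4f}")
print(f"RR:             {rr:.4f}")
```

The two quantities are algebraically identical, so no rare disease assumption is involved.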
Rare Disease Assumption for OR ≈ RR
  Standard case-control sampling: need rare disease assumption
  Risk-set sampling: no rare disease assumption if RH is of interest
  Case-cohort sampling: no rare disease assumption if RR is of interest

2 x 2 Table Notation

                Disease Status
Exposure        D        not D     Total
E               a        b         a+b
not E           c        d         c+d
Total           a+c      b+d       n
Chi-Squared Test
  Population-based study: independence of D and E
  Look at the estimate of P(D & E) - P(D)P(E)
  Yields (ad - bc)/n^2
  Look at (ad - bc), or (ad - bc)^2, for simplicity
  Estimated variance of (ad - bc) is (a+b)(a+c)(b+d)(c+d)/n

Statistic for Assessing Independence
  χ² = n (ad - bc)^2 / [(a+b)(a+c)(b+d)(c+d)]
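A minimal sketch of the statistic n(ad - bc)^2 / [(a+b)(a+c)(b+d)(c+d)] and its p-value on 1 degree of freedom, using only the standard library. The unmarried row (7 low, 52 normal) is read from the Marital Status & Birthweight slide; the married row (7 low, 134 normal) is an assumption made for illustration, since those counts do not appear in the slide text.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Chi-squared statistic for independence in a 2x2 table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def p_value_1df(statistic):
    """Upper-tail probability of a chi-squared variable with 1 df.

    Uses P(Z^2 > x) = erfc(sqrt(x / 2)) for standard Normal Z.
    """
    return math.erfc(math.sqrt(statistic / 2))

# Unmarried row from the slide; married row is ASSUMED for illustration.
statistic = chi_squared_2x2(7, 52, 7, 134)
p_value = p_value_1df(statistic)

print(f"chi-squared = {statistic:.2f}, p = {p_value:.2f}")
```

With these assumed counts the p-value is close to the 0.08 quoted for the population-based design, but that agreement does not confirm the assumed row.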
Population-Based Study

                              Birthweight
Marital Status at Birth     Low    Normal    Total
Unmarried                     7        52       59
Married

  p = 0.08

Cohort Study
  Look at the estimate of P(D|E) - P(D|not E)
  Yields (a/n1) - (c/n2), where n1 = a+b and n2 = c+d
  Estimated variance of (a/n1) - (c/n2), under independence, is p(1 - p)(1/n1 + 1/n2), where p = (a+c)/n
  Yields the same statistic: χ² = n (ad - bc)^2 / [(a+b)(a+c)(b+d)(c+d)]

Cohort Study
  Table: Marital Status at Birth (Unmarried, Married) by Birthweight (Low, Normal)
  p = 0.08

Case-Control Study
  Look at the estimate of P(E|D) - P(E|not D)
  Yields (a/n1) - (b/n2), where n1 = a+c and n2 = b+d
  Estimated variance of (a/n1) - (b/n2), under independence, is p(1 - p)(1/n1 + 1/n2), where p = (a+b)/n
  Yields the same statistic: χ² = n (ad - bc)^2 / [(a+b)(a+c)(b+d)(c+d)]

Case-Control Study
  Table: Marital Status at Birth (Unmarried, Married) by Birthweight (Low, Normal)
  p = 0.002
Power Comparison
  Table: χ² statistic and P-value for the Population-Based, Cohort, and Case-Control designs

Power Comparison for Specific Population: Cohort vs. Population-Based
  The population difference P(D|E) - P(D|not E) is fixed
  The variance factor 1/n1 + 1/n2 is minimized, for fixed n, when n1 = n2 = n/2

Power Comparison for Specific Population: Case-Control vs. Population-Based
  The population difference P(E|D) - P(E|not D) is fixed
  The variance factor 1/n1 + 1/n2 is minimized, for fixed n, when n1 = n2 = n/2

Large-Sample Power Comparison
  With equal sample sizes of Exposed & Unexposed, Cohort is more powerful than Population-Based
  With equal sample sizes of Cases & Controls, Case-Control is more powerful than Population-Based
Power Comparison: Cohort & Case-Control (Equal Sample Sizes)
  Power depends on the size of d (where p = (p1 + p2)/2 because of equal sample sizes)
  d differs between cohort and case-control (although OR is fixed)

Figure: d against p
  d is biggest when p = (p1 + p2)/2 = 0.5

Power Comparison: Cohort & Case-Control (Equal Sample Sizes)
  When P(E) is closer to 0.5 than P(D), the case-control design has greater power than the cohort design, since then the average of P(E|D) and P(E|not D) is closer to 0.5 than the average of P(D|E) and P(D|not E)
  When P(D) is closer to 0.5 than P(E), the cohort design has greater power than the case-control design, since then the average of P(D|E) and P(D|not E) is closer to 0.5 than the average of P(E|D) and P(E|not D)
Rule of Thumb about Power/Precision
  Want both exposure and disease marginals to be as balanced as possible, given a fixed total sample size
  For a fixed design, more sample still always gives greater power
  For example, suppose a fixed number of cases (n1): increasing the number of controls (n2) still increases power, since 1/n1 + 1/n2 gets smaller, but with diminishing returns

Fixed Number of Cases, Increasing Number of Controls
  R bigger means the χ² statistic gets bigger by the same amount

How Many More Controls than Cases?
  Primary gain comes from going from k = 1 to k = 4 controls per case
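The diminishing returns can be seen with a few lines of arithmetic. This sketch assumes the variance of the case-control comparison is proportional to 1/n1 + 1/n2 with n2 = k x n1, so that, relative to an unlimited supply of controls, the efficiency with k controls per case is k/(k + 1). That efficiency measure is my framing for illustration, not necessarily the quantity plotted on the slide.

```python
def relative_efficiency(k):
    """Efficiency of k controls per case relative to unlimited controls.

    Variance is proportional to (1/n1) * (1 + 1/k); with unlimited
    controls it is 1/n1, so the ratio of the two is k / (k + 1).
    """
    return k / (k + 1)

for k in (1, 2, 3, 4, 5, 10, 20):
    print(f"k = {k:2d}: relative efficiency = {relative_efficiency(k):.3f}")
```

Going from k = 1 to k = 4 raises the efficiency from 0.5 to 0.8; everything beyond k = 4 shares the remaining 0.2.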
Cohort Study Example (Population OR = 1)

                     Disease Status
Exposure Status     D     not D    Total
E                   8        42       50
not E

  Typical study: p = 0.44

Cohort Study Example (Population OR = 1): 1,000 typical studies
  Smallest OR estimate = 0.15
  Largest OR estimate = 7.58
  Average of OR estimates = 1.16 (bias)
  Median of OR estimates = 1

Figure: Sampling Distribution of Odds Ratio Estimate (not Normal; skewed)
Cohort Study Example (Population OR = 1): 1,000 typical studies
  Smallest log(OR) estimate = log(0.15) ≈ -1.90
  Largest log(OR) estimate = 2.03 = log(7.58)
  Average of log(OR) estimates: close to 0 (little bias)
  Median of log(OR) estimates = 0 = log(1)
  I always use natural logarithms

Figure: Sampling Distribution of Log Odds Ratio Estimate
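The contrast between the skewed OR scale and the roughly symmetric log(OR) scale can be reproduced with a small simulation. This is a sketch under assumed settings: 50 exposed and 50 unexposed subjects per study and a common disease risk of 0.16 (suggested by the 8-out-of-50 row of the typical study, but an assumption nonetheless), so the population OR is 1. Studies with an empty cell are redrawn, which is a simplification of my own.

```python
import math
import random
import statistics

def simulate_or_estimates(n_per_group=50, risk=0.16, n_studies=1000, seed=2006):
    """Simulate OR estimates from cohort studies with population OR = 1."""
    rng = random.Random(seed)
    estimates = []
    while len(estimates) < n_studies:
        # Number of diseased subjects in the exposed and unexposed samples.
        a = sum(rng.random() < risk for _ in range(n_per_group))
        c = sum(rng.random() < risk for _ in range(n_per_group))
        b, d = n_per_group - a, n_per_group - c
        if min(a, b, c, d) == 0:
            continue  # OR undefined or zero; redraw this study
        estimates.append((a * d) / (b * c))
    return estimates

or_estimates = simulate_or_estimates()
log_or_estimates = [math.log(x) for x in or_estimates]

print(f"OR:      mean = {statistics.mean(or_estimates):.2f}, "
      f"median = {statistics.median(or_estimates):.2f}")
print(f"log(OR): mean = {statistics.mean(log_or_estimates):.2f}, "
      f"median = {statistics.median(log_or_estimates):.2f}")
```

The mean of the OR estimates sits above 1 because of the long right tail, while the log(OR) estimates are centered near 0.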
Confidence Intervals for the Odds Ratio
  2 x 2 table notation: cells a, b (row E) and c, d (row not E); columns D and not D; total n
  95% CIs for log(OR) and OR
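A minimal sketch of a large-sample interval, assuming the slide uses the standard variance estimate 1/a + 1/b + 1/c + 1/d for log(OR): the interval is built on the log scale and then exponentiated. It is applied to the US Infant Mortality counts; the function name is my own.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Estimate and approximate 95% CI for the OR.

    Uses log(OR) +/- z * sqrt(1/a + 1/b + 1/c + 1/d), then exponentiates.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)

# E = unmarried mother, D = death in first year.
or_hat, or_lower, or_upper = odds_ratio_ci(16_712, 1_197_142, 18_784, 2_878_421)
print(f"OR = {or_hat:.2f}, 95% CI ({or_lower:.2f}, {or_upper:.2f})")
```

Because the interval is symmetric on the log scale, it is asymmetric around the OR estimate itself.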
Case-Control Study of Pancreatic Cancer
  Table: Sex (Men, Women) and Disease Status (Case, Control) by Coffee Drinking (cups/day), with totals

Case-Control Study of Pancreatic Cancer
  Table: Coffee Drinking (cups/day) by Pancreatic Cancer (Cases, Controls)

Estimate & Confidence Intervals for the Relative Risk
  2 x 2 table notation: cells a, b (row E) and c, d (row not E); columns D and not D; total n
  95% CIs for log(RR) and RR
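For the relative risk, a common large-sample approach (assumed here to be what the slide presents) works on the log scale with variance estimate 1/a - 1/(a+b) + 1/c - 1/(c+d). The sketch applies it to the US Infant Mortality counts, treating them as cohort-type data.

```python
import math

def relative_risk_ci(a, b, c, d, z=1.96):
    """Estimate and approximate 95% CI for the RR.

    Uses log(RR) +/- z * sqrt(1/a - 1/(a+b) + 1/c - 1/(c+d)).
    """
    log_rr = math.log((a / (a + b)) / (c / (c + d)))
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return math.exp(log_rr), math.exp(log_rr - z * se), math.exp(log_rr + z * se)

# E = unmarried mother, D = death in first year.
rr_hat, rr_lower, rr_upper = relative_risk_ci(16_712, 1_197_142, 18_784, 2_878_421)
print(f"RR = {rr_hat:.2f}, 95% CI ({rr_lower:.2f}, {rr_upper:.2f})")
```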
Western Collaborative Group Study
  Table: Behavior Type (Type A, Type B) by Occurrence of CHD (Yes, No)

Estimate & Confidence Intervals for the Excess Risk
  2 x 2 table notation: cells a, b (row E) and c, d (row not E); columns D and not D; total n
  95% CIs for ER
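The excess risk is a difference of two proportions, so a simple interval needs no log transformation. This sketch assumes the unpooled variance estimate p1(1 - p1)/n1 + p2(1 - p2)/n2, which may differ in detail from the slide's formula, and uses the US Infant Mortality counts.

```python
import math

def excess_risk_ci(a, b, c, d, z=1.96):
    """Estimate and approximate 95% CI for ER = P(D|E) - P(D|not E)."""
    n1, n2 = a + b, c + d
    p1, p2 = a / n1, c / n2
    er = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return er, er - z * se, er + z * se

# E = unmarried mother, D = death in first year.
er_hat, er_lower, er_upper = excess_risk_ci(16_712, 1_197_142, 18_784, 2_878_421)
print(f"ER = {er_hat:.4f}, 95% CI ({er_lower:.4f}, {er_upper:.4f})")
```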
Western Collaborative Group Study
  Table: Behavior Type (Type A, Type B) by Occurrence of CHD (Yes, No)

Estimate & Confidence Intervals for the Attributable Risk: Population-Based Study
  2 x 2 table notation: cells a, b (row E) and c, d (row not E); columns D and not D; total n
  95% CIs for log(1 - AR) and AR

Western Collaborative Group Study
  Table: Behavior Type (Type A, Type B) by Occurrence of CHD (Yes, No)

Small Sample Adjustments
  Odds ratio
    Estimate
    CIs
  Relative risk
    Estimate
  Exact tests/CIs

Case-Control Study of Pancreatic Cancer
  Table: Coffee Drinking (cups/day) by Pancreatic Cancer (Cases, Controls)
  An exact 95% CI for OR is (1.64, 4.80)

Small Sample Ideas
  Be aware when you have entered "small sample world", where approximations may not be accurate and adjustments or exact methods may be required