The 2006 Summer Program in Applied Biostatistical & Epidemiological Methods
Nicholas P. Jewell, University of California, Berkeley
Ohio State University, July 10, 2006
Day 1: Definitions, Measures of Disease Incidence & Association

Nicholas P. Jewell © Copyright 2006, all rights reserved

Course Outline
- Class meets from 8:30am to 12:15pm
  - Break?
- Labs
  - Meet 5:30 to 8pm (except Friday, when it stops at 7pm)
- Rough idea of topics
  - Day 1: Definitions, Measures of Disease Incidence and Association
  - Day 2: Confounding, Interaction & Stratification Techniques
  - Day 3: Regression Models, Logistic Regression and Maximum Likelihood
  - Day 4: Confounding & Interaction in Logistic Regression Models, Model Building & Goodness of Fit
  - Day 5: Matched Studies, Alternatives and Extensions to Logistic Regression

Binary Outcome Data

| Binary Outcome | Explanatory Factors |
|---|---|
| Use of mental health services in 2005 | Costs of mental health visit, sex |
| Moved residence in 2005 | Family size, family income |
| Low birthweight of newborn | Health insurance status of mother, marital status of mother |
| Vote Republican in 2004 election | Parental voting pattern, sex |
| Health insurance coverage | Place of birth, marital status |
| Employment status in 2005 | Education level |
| Choice of transportation to work | Income |

Issues Related to Application Area
- Study design
  - Randomized?
- Causality/association
- Definition of binary outcome
- Extensions
  - Longitudinal observations
  - More than 2 categories
    - Ordered categories?

Other Issues
- Statistical Art in addition to Statistical Science
- Case studies
  - WCGS (CHD, men)
  - Coffee drinking and pancreatic cancer
  - Spontaneous abortion history and CHD (women)
  - Titanic

How Do We Measure the Binary Outcome for Disease Occurrence?
- Incidence/prevalence
- Role of 'time'
  - Chronological time
  - Exposure time
  - Age
  - Number of contacts
- Incidence (time interval)
- Prevalence (time point or interval)
- Fractions: incidence proportion
  - Unitless

Incidence Proportion
- Definition (D = 1, "yes"):
  - Define the risk interval explicitly, including the time scale (calendar year 2005, year of age 55, first year after menopause, etc.)
  - Be at risk at the beginning of the interval (define explicitly what 'at risk' means)
  - Become an incident case during the interval
- Incidence proportion is the fraction of the at-risk population who are D
- Cumulative measure

Incidence Rate
- Introduces time at risk into our thinking: incidence rate (time interval) "=" #D / cumulative time at risk
- Units are now time^-1
- The measure still applies to the whole interval (so it is still cumulative in that sense)
- Instantaneous incidence rate: hazard function
- I(t) is the incidence proportion over the time interval [0, t]

1991 US Infant Mortality (by Mother's Marital Status)

| Infant Mortality | Unmarried | Married | Total |
|---|---|---|---|
| Death | 16,712 | 18,784 | 35,496 |
| Live at 1 Year | 1,197,142 | 2,878,421 | 4,075,563 |
| Total | 1,213,854 | 2,897,205 | 4,111,059 |

1991 US Infant Mortality
- A: Death in first year; B: Unmarried mother
- P(A & B) = 0.0041
- P(A) = 0.0086
- P(B) = 0.295
- P(A) x P(B) = 0.0086 x 0.295 = 0.0025
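The probabilities on this slide can be reproduced directly from the 1991 infant mortality counts. The following is a minimal sketch; the variable names are illustrative choices, not part of the slides.

```python
# Counts from the 1991 US infant mortality table.
deaths_unmarried = 16_712      # A and B: death in first year, unmarried mother
deaths_total = 35_496          # A: death in first year
births_unmarried = 1_213_854   # B: unmarried mother
births_total = 4_111_059

p_a_and_b = deaths_unmarried / births_total   # P(A & B)
p_a = deaths_total / births_total             # P(A)
p_b = births_unmarried / births_total         # P(B)

# Under independence, P(A & B) would equal P(A) * P(B).
print(f"P(A & B)  = {p_a_and_b:.4f}")
print(f"P(A)      = {p_a:.4f}")
print(f"P(B)      = {p_b:.3f}")
print(f"P(A)P(B)  = {p_a * p_b:.4f}")
```

The joint probability exceeds the product of the marginals, which is the sense in which infant death and unmarried status are associated in these data.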

1991 US Infant Mortality (by Mother's Marital Status)

| Infant Mortality | Unmarried | Married | Total |
|---|---|---|---|
| Death | 16,712 | 18,784 | 35,496 |
| Live at 1 Year | 1,197,142 | 2,878,421 | 4,075,563 |
| Total | 1,213,854 | 2,897,205 | 4,111,059 |

RR (assoc. with unmarried) = (16,712 / 1,213,854) / (18,784 / 2,897,205) ≈ 2.12

1991 US Infant Mortality (by Mother's Marital Status)

| Infant Mortality | Unmarried | Married | Total |
|---|---|---|---|
| Death | 16,712 | 18,784 | 35,496 |
| Live at 1 Year | 1,197,142 | 2,878,421 | 4,075,563 |
| Total | 1,213,854 | 2,897,205 | 4,111,059 |

OR (assoc. with unmarried) = (16,712 x 2,878,421) / (18,784 x 1,197,142) ≈ 2.14
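As a worked check, the relative risk and odds ratio for the infant mortality table can be computed with two small helper functions. This is a sketch: the function names and the a, b, c, d cell layout (rows = exposure, columns = outcome) are assumptions made here for illustration.

```python
def relative_risk(a, b, c, d):
    """RR = P(D | E) / P(D | not E); rows are E / not E, columns are D / not D."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """OR = (a * d) / (b * c) for the same cell layout."""
    return (a * d) / (b * c)


# E = unmarried mother, D = death in first year.
a, b = 16_712, 1_197_142   # unmarried: deaths, alive at 1 year
c, d = 18_784, 2_878_421   # married:   deaths, alive at 1 year

rr = relative_risk(a, b, c, d)
or_hat = odds_ratio(a, b, c, d)
print(f"RR = {rr:.2f}, OR = {or_hat:.2f}")
```

Because infant death is rare in both groups, the two measures are nearly equal here.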

Comparison of RR and OR at Various Risk Levels

| P(D \| not E) | P(D \| E) | RR | OR | Relative Difference |
|---|---|---|---|---|
| 0.01 | 0.01 | 1.00 | 1.00 | 0% |
| 0.01 | 0.02 | 2.00 | 2.02 | 1% |
| 0.01 | 0.05 | 5.00 | 5.21 | 4.2% |
| 0.01 | 0.10 | 10.00 | 11.00 | 10% |


Comparison of RR and OR at Various Risk Levels

| P(D \| not E) | P(D \| E) | RR | OR | Relative Difference |
|---|---|---|---|---|
| 0.05 | 0.05 | 1.00 | 1.00 | 0% |
| 0.05 | 0.10 | 2.00 | 2.11 | 5.6% |
| 0.05 | 0.15 | 3.00 | 3.35 | 11.8% |
| 0.05 | 0.20 | 4.00 | 4.75 | 18.8% |
| 0.05 | 0.50 | 10.00 | 19.00 | 90% |


Comparison of RR and OR at Various Risk Levels

| P(D \| not E) | P(D \| E) | RR | OR | Relative Difference |
|---|---|---|---|---|
| 0.10 | 0.10 | 1.00 | 1.00 | 0% |
| 0.10 | 0.15 | 1.50 | 1.59 | 5.9% |
| 0.10 | 0.20 | 2.00 | 2.25 | 12.5% |
| 0.10 | 0.30 | 3.00 | 3.86 | 28.6% |
| 0.10 | 0.40 | 4.00 | 6.00 | 50% |
| 0.10 | 0.50 | 5.00 | 9.00 | 80% |

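Comparison of RR and OR at Various Risk Levels

| P(D \| not E) | P(D \| E) | RR | OR | Relative Difference |
|---|---|---|---|---|
| 0.20 | 0.20 | 1.00 | 1.00 | 0% |
| 0.20 | 0.30 | 1.50 | 1.71 | 14.3% |
| 0.20 | 0.40 | 2.00 | 2.67 | 33.3% |
| 0.20 | 0.50 | 2.50 | 4.00 | 60% |
| 0.20 | 0.60 | 3.00 | 6.00 | 100% |
| 0.20 | 0.80 | 4.00 | 16.00 | 300% |
| 0.20 | 1.00 | 5.00 | ∞ | ∞ |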
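The rows of these comparison tables follow from the two risks alone. A minimal sketch that regenerates the baseline-risk 0.20 table is shown here; `compare_rr_or` is a hypothetical helper name, and the relative difference is taken as (OR - RR) / RR, which matches the tabulated percentages.

```python
def compare_rr_or(p_unexposed, p_exposed):
    """Return (RR, OR, relative difference of OR from RR) for two risks."""
    rr = p_exposed / p_unexposed
    odds_ratio = (p_exposed / (1 - p_exposed)) / (p_unexposed / (1 - p_unexposed))
    return rr, odds_ratio, (odds_ratio - rr) / rr


baseline = 0.20  # P(D | not E)
# P(D | E) = 1.00 is left out because the odds are then infinite.
for p_exposed in (0.20, 0.30, 0.40, 0.50, 0.60, 0.80):
    rr, odds_ratio, rel_diff = compare_rr_or(baseline, p_exposed)
    print(f"P(D|E)={p_exposed:.2f}  RR={rr:.2f}  OR={odds_ratio:.2f}  "
          f"relative difference={rel_diff:.1%}")
```

Changing `baseline` to 0.01, 0.05, or 0.10 gives the other tables and shows how the OR drifts away from the RR as the outcome becomes more common.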

Measures of Association: Attributable Risk
- Population size = N
- Number of cases with the current exposure distribution
- Number of cases with no exposure to E
- AR = (cases with current exposure distribution - cases with no exposure to E) / (cases with current exposure distribution)

1991 US Infant Mortality (by Mother's Marital Status)

| Infant Mortality | Unmarried | Married | Total |
|---|---|---|---|
| Death | 16,712 | 18,784 | 35,496 |
| Live at 1 Year | 1,197,142 | 2,878,421 | 4,075,563 |
| Total | 1,213,854 | 2,897,205 | 4,111,059 |

AR (assoc. with unmarried) = [P(D) - P(D | not E)] / P(D) ≈ 0.249
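A short sketch of the attributable risk calculation for the infant mortality table follows. It assumes the usual definition AR = [P(D) - P(D | not E)] / P(D) and checks it against the equivalent form P(E)(RR - 1) / [1 + P(E)(RR - 1)]; the function name is an illustrative choice.

```python
def attributable_risk(a, b, c, d):
    """AR = [P(D) - P(D | not E)] / P(D); rows are E / not E, columns D / not D."""
    n = a + b + c + d
    p_d = (a + c) / n
    p_d_unexposed = c / (c + d)
    return (p_d - p_d_unexposed) / p_d


# E = unmarried mother, D = death in first year.
a, b = 16_712, 1_197_142
c, d = 18_784, 2_878_421

ar = attributable_risk(a, b, c, d)

# Equivalent expression in terms of P(E) and RR.
p_e = (a + b) / (a + b + c + d)
rr = (a / (a + b)) / (c / (c + d))
ar_alt = p_e * (rr - 1) / (1 + p_e * (rr - 1))

print(f"AR = {ar:.3f} (alternative form: {ar_alt:.3f})")
```

The value is a descriptive summary of the table; it carries a causal reading only under the assumptions the next slide cautions about.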

Attributable Risk: Caution!
- Encourages a causal interpretation that may be incorrect
- Assumes modification of E doesn't change other risk factors
- "Baseball is 90% mental -- the other half is physical." (Yogi Berra)

Target Population, Study Population and Sample
- Target Population → Study Population → Sample
- Selection bias may occur when the Target Population differs from the Study Population

Population-Based Study
- Need: frame for the Study Population
- Take a simple random sample of size n
- Measure D and E on sampled individuals
- Can estimate
  - Joint probabilities, e.g. P(D & E)
  - Marginal probabilities, e.g. P(D)
  - Conditional probabilities, e.g. P(D | E)

Marital Status & Birthweight (population-based sample)

| Marital Status at Birth | Low Birthweight | Normal Birthweight | Total |
|---|---|---|---|
| Unmarried | 7 | 52 | 59 |
| Married | 7 | 134 | 141 |
| Total | 14 | 186 | 200 |

- Can estimate joint probabilities
- Can estimate marginal probabilities
- Can estimate conditional probabilities
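To make the three kinds of probability concrete, the sketch below estimates each from the population-based birthweight sample. Labelling E = unmarried and D = low birthweight is a choice made here for illustration.

```python
# Population-based sample (n = 200): rows = marital status, columns = birthweight.
a, b = 7, 52     # unmarried: low, normal
c, d = 7, 134    # married:   low, normal
n = a + b + c + d

joint_d_and_e = a / n               # P(D & E)
marginal_d = (a + c) / n            # P(D)
marginal_e = (a + b) / n            # P(E)
cond_d_given_e = a / (a + b)        # P(D | E)
cond_d_given_not_e = c / (c + d)    # P(D | not E)
cond_e_given_d = a / (a + c)        # P(E | D)

print(f"P(D & E)     = {joint_d_and_e:.3f}")
print(f"P(D), P(E)   = {marginal_d:.3f}, {marginal_e:.3f}")
print(f"P(D | E)     = {cond_d_given_e:.3f}")
print(f"P(D | not E) = {cond_d_given_not_e:.3f}")
print(f"P(E | D)     = {cond_e_given_d:.3f}")
```

All of these ratios are meaningful only because a single random sample was drawn from the whole study population, so the row and column totals were not fixed by the design.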

Cohort Study
- Need: frame for the Exposed and Unexposed Populations
- Take two (or more) simple random samples of size n_E and n_(not E), separately from the exposed and unexposed populations, respectively
- Measure D on sampled individuals
- Can estimate
  - Some conditional probabilities, e.g. P(D | E)

Marital Status & Birthweight (cohort sample)

| Marital Status at Birth | Low Birthweight | Normal Birthweight | Total |
|---|---|---|---|
| Unmarried | 12 | 88 | 100 |
| Married | 5 | 95 | 100 |
| Total | 17 | 183 | 200 |

- No joint probabilities
- No marginal probabilities
- Can estimate conditional probabilities

Case-Control Study
- Need: frame for the Diseased and Not Diseased Populations
- Take two simple random samples of size n_D and n_(not D), separately from the case-status groups
- Measure E on sampled individuals
- Can estimate
  - Some conditional probabilities, e.g. P(E | D)

Marital Status & Birthweight (case-control sample)

| Marital Status at Birth | Low Birthweight | Normal Birthweight | Total |
|---|---|---|---|
| Unmarried | 50 | 28 | 78 |
| Married | 50 | 72 | 122 |
| Total | 100 | 100 | 200 |

- No joint probabilities
- No marginal probabilities
- Can estimate conditional probabilities
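Although each design supports different probabilities, the sample odds ratio can be computed from all three Marital Status & Birthweight tables, because the OR based on P(D | E) equals the OR based on P(E | D). The sketch below illustrates this; the dictionary keys are labels chosen here.

```python
def odds_ratio(a, b, c, d):
    """Cross-product ratio; rows are unmarried / married, columns are low / normal."""
    return (a * d) / (b * c)


tables = {
    "population-based": (7, 52, 7, 134),
    "cohort": (12, 88, 5, 95),
    "case-control": (50, 28, 50, 72),
}

estimates = {design: odds_ratio(*cells) for design, cells in tables.items()}
for design, or_hat in estimates.items():
    print(f"{design:>16}: OR = {or_hat:.2f}")
```

The three samples give similar odds ratios even though, for example, the proportion of low-birthweight infants in the case-control table (50%) says nothing about P(D).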

Risk-Set (Density) Sampling
- For each incident case sampled at time t, select a random set of controls from those still at risk at t
- Note that a control sampled at time s might be sampled as a case at time t
[Figure: timeline from 0 to T with sampling time t marked]

Example: HSV-2 and Cervical Cancer
- Study Population: 550,000 women with donations to serum banks in Finland, Norway, and Sweden
- Cervical cancer cases identified over time and linked to serum bank data for identification of HSV-2 status
- 3 random controls chosen who were cancer free at the time of diagnosis of a case
- Caution: HSV-2 status is measured at the time of donation rather than at the time of sampling

Case-Cohort Sampling
- Select cases as for traditional or risk-set sampling; select a random set of m "controls" from all those at risk at the beginning of the interval
- Note that a "control" might also be sampled as a case
[Figure: timeline from 0 to T with time t marked; all controls drawn at time 0]

Example: Low Fat Diet and Breast Cancer
- Women's Health Trial randomly assigned 32,000 women (high risk group) to a low fat intervention or a control group
- All women filled out food questionnaires, and gave blood samples, at regular intervals over 10 years
- All breast cancer cases had their food diaries and blood samples analyzed
- 10% of the original cohort were randomly selected to have their diaries and samples analyzed

Rare Disease Assumption for OR ≈ RR
- Standard case-control sampling
  - Need the rare disease assumption
- Risk-set sampling
  - No rare disease assumption if the relative hazard (RH) is of interest
- Case-cohort sampling
  - No rare disease assumption if RR is of interest

Chi-Squared Test
- Population-based study: independence of D and E
- Look at the estimate of P(D & E) - P(D)P(E)
- Yields $(ad - bc)/n^2$
- Look at $(ad - bc)$ or $(ad - bc)^2$ for simplicity
- Estimated variance of $(ad - bc)$ is $(a+b)(a+c)(b+d)(c+d)/n$
- Yields $\chi^2 = \dfrac{n\,(ad - bc)^2}{(a+b)(a+c)(b+d)(c+d)}$

Cohort Study
- Look at the estimate of P(D | E) - P(D | not E)
- Yields $(a/n_1) - (c/n_2)$, where $n_1 = a+b$ and $n_2 = c+d$
- Estimated variance of $(a/n_1) - (c/n_2)$ is [formula not preserved in transcript]
- Yields [formula not preserved in transcript]

Case-Control Study
- Look at the estimate of P(E | D) - P(E | not D)
- Yields $(a/n_1) - (b/n_2)$, where $n_1 = a+c$ and $n_2 = b+d$
- Estimated variance of $(a/n_1) - (b/n_2)$ is [formula not preserved in transcript]
- Yields [formula not preserved in transcript]
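The variance formulas for the cohort and case-control versions of the test did not survive in the transcript. As a sketch only, the code below assumes the standard pooled null-variance estimate p(1 - p)(1/n_1 + 1/n_2) for a difference in proportions, which is an assumption and not taken from the slides, and checks numerically that the resulting statistic matches the population-based chi-squared statistic n(ad - bc)^2 / [(a+b)(a+c)(b+d)(c+d)]. The counts are those of the cohort birthweight table.

```python
def chi2_population(a, b, c, d):
    """n (ad - bc)^2 / [(a+b)(a+c)(b+d)(c+d)] for a 2x2 table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (a + c) * (b + d) * (c + d))


def chi2_two_sample(x1, n1, x2, n2):
    """Squared difference in proportions over its pooled null variance (assumed form)."""
    pooled = (x1 + x2) / (n1 + n2)
    variance = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    return (x1 / n1 - x2 / n2) ** 2 / variance


a, b, c, d = 12, 88, 5, 95  # rows: unmarried / married; columns: low / normal

chi2_pop = chi2_population(a, b, c, d)
chi2_cohort = chi2_two_sample(a, a + b, c, c + d)        # compares P(D|E), P(D|not E)
chi2_case_control = chi2_two_sample(a, a + c, b, b + d)  # compares P(E|D), P(E|not D)

print(f"{chi2_pop:.4f} {chi2_cohort:.4f} {chi2_case_control:.4f}")
```

Under that assumption the three statistics agree, so the same test of association applies whichever of the three designs produced the table.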

Large-Sample Power Comparison
- Equal sample sizes of Exposed & Unexposed: Cohort is more powerful than Population-Based
- Equal sample sizes of Cases & Controls: Case-Control is more powerful than Population-Based

Power Comparison: Cohort & Case-Control (Equal Sample Sizes)
- [formula not preserved in transcript] fixed
- Power depends on the size of [formula not preserved in transcript] (where [formula not preserved in transcript] because of equal sample sizes)
- d differs between cohort and case-control (although OR is fixed)

Power Comparison: Cohort & Case-Control (Equal Sample Sizes)
- When P(E) is closer to 0.5 than P(D), the case-control design has greater power than the cohort
  - Since then the average of P(E | D) and P(E | not D) is closer to 0.5 than the average of P(D | E) and P(D | not E)
- When P(D) is closer to 0.5 than P(E), the cohort design has greater power than the case-control
  - Since then the average of P(D | E) and P(D | not E) is closer to 0.5 than the average of P(E | D) and P(E | not D)

Rule of Thumb about Power/Precision
- Want both exposure and disease marginals to be as balanced as possible, given a fixed total sample size
- For a fixed design, more sample still always gives greater power
- For example, suppose a fixed number of cases ($n_1$)
  - Increasing controls ($n_2$) still increases power, since [formula not preserved in transcript] will get smaller, but with diminishing returns
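The quantity that "will get smaller" is not preserved in the transcript. One plausible reading, stated here purely as an assumption, is the factor 1/n_1 + 1/n_2 that appears in the variance of a difference between two sample proportions. The sketch below tabulates that factor for a hypothetical fixed number of cases (100) as the control-to-case ratio grows.

```python
def sample_size_factor(n_cases, n_controls):
    """Assumed variance factor 1/n1 + 1/n2 (not taken from the slides)."""
    return 1 / n_cases + 1 / n_controls


n_cases = 100  # hypothetical fixed number of cases
factors = {}
for ratio in (1, 2, 3, 4, 5, 10):
    factors[ratio] = sample_size_factor(n_cases, ratio * n_cases)
    relative = factors[ratio] / factors[1]
    print(f"{ratio:>2} controls per case: factor = {factors[ratio]:.4f} "
          f"({relative:.0%} of the 1:1 value)")
```

The factor can never drop below 1/n_1, so under this reading each additional control per case buys less than the previous one.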

Cohort Study Example (Population OR = 1)

| Exposure Status | D | Not D | Total |
|---|---|---|---|
| E | 8 | 42 | 50 |
| not E | 11 | 39 | 50 |
| Total | 19 | 81 | 100 |

Typical study: p = 0.44

Cohort Study Example (Population OR = 1)
- 1,000 typical studies
- Smallest OR estimate = 0.15
- Largest OR estimate = 7.58
- Average of OR estimates = 1.16 (bias)
- Median of OR estimates = 1

Cohort Study Example (Population OR = 1)
- 1,000 typical studies
- Smallest log(OR) estimate = -1.90 = log(0.15)
- Largest log(OR) estimate = 2.03 = log(7.58)
- Average of log(OR) estimates = -0.011 (little bias)
- Median of log(OR) estimates = 0 = log(1)
- Note: I always use natural logarithms
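The contrast between the OR and log(OR) scales can be explored with a small simulation. This is a sketch, not the simulation behind the slides: it assumes 50 exposed and 50 unexposed subjects per study and a common disease probability of 0.2 in both groups (so the population OR is 1); the 0.2, the seed, and the number of replicates are illustrative choices. Studies with an empty cell are skipped because the OR is undefined there.

```python
import math
import random
import statistics


def simulate_or_estimates(n_per_group=50, p_disease=0.2, n_studies=5000, seed=2006):
    """Simulate cohort studies under OR = 1 and return the sample odds ratios."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_studies):
        a = sum(rng.random() < p_disease for _ in range(n_per_group))  # exposed, D
        c = sum(rng.random() < p_disease for _ in range(n_per_group))  # unexposed, D
        b, d = n_per_group - a, n_per_group - c
        if 0 in (a, b, c, d):
            continue  # OR undefined; skipped in this sketch
        estimates.append((a * d) / (b * c))
    return estimates


or_hats = simulate_or_estimates()
log_or_hats = [math.log(x) for x in or_hats]

print(f"OR:      mean = {statistics.mean(or_hats):.3f}, "
      f"median = {statistics.median(or_hats):.3f}")
print(f"log(OR): mean = {statistics.mean(log_or_hats):.3f}, "
      f"median = {statistics.median(log_or_hats):.3f}")
```

On the OR scale the estimates are skewed to the right, which pulls the average above 1, while on the log scale they are roughly symmetric about 0.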

Case-Control Study of Pancreatic Cancer

Coffee drinking (cups/day):

| Sex | Disease Status | 0 | 1-2 | 3-4 | 5+ | Total |
|---|---|---|---|---|---|---|
| Men | Case | 9 | 94 | 53 | 60 | 216 |
| Men | Control | 32 | 119 | 74 | 82 | 307 |
| Women | Case | 11 | 59 | 53 | 28 | 151 |
| Women | Control | 56 | 152 | 80 | 48 | 336 |
| Total | | 108 | 424 | 260 | 218 | 1010 |

Estimate & Confidence Intervals for the Attributable Risk: Population-Based Study

| Exposure | D | not D | Total |
|---|---|---|---|
| E | a | b | a+b |
| not E | c | d | c+d |
| Total | a+c | b+d | n |

95% CIs for log(1 - AR) and AR: [formulas not preserved in transcript]

Case-Control Study of Pancreatic Cancer

| Coffee Drinking (cups/day) | Cases | Controls | Total |
|---|---|---|---|
| ≥ 1 | 347 | 555 | 902 |
| 0 | 20 | 88 | 108 |
| Total | 367 | 643 | 1010 |

An exact 95% CI for OR is (1.64, 4.80)
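For comparison with the exact interval quoted on the slide, the sketch below computes the odds ratio for the coffee table and a large-sample 95% interval on the log scale, using the familiar standard error sqrt(1/a + 1/b + 1/c + 1/d). This approximate method is swapped in here for illustration; it is not the exact method the slide refers to, and the function name is hypothetical.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Sample OR and an approximate CI built on the log(OR) scale."""
    or_hat = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - z * se_log_or)
    upper = math.exp(math.log(or_hat) + z * se_log_or)
    return or_hat, lower, upper


# Rows: coffee >= 1 cup/day, 0 cups/day; columns: cases, controls.
or_hat, lower, upper = odds_ratio_with_ci(347, 555, 20, 88)
print(f"OR = {or_hat:.2f}, approximate 95% CI = ({lower:.2f}, {upper:.2f})")
```

The approximate interval comes out a little narrower than the exact one, mainly at the upper end, which is consistent with the small count of 20 unexposed cases driving the uncertainty.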

Small Sample Ideas
- Be aware when you have entered "small sample world", where approximations may not be accurate and adjustments or exact methods may be required
