The 2006 Summer Program in Applied Biostatical & Epidemiological Methods Nicholas P. Jewell University of California Berkeley Ohio State University July.


1 The 2006 Summer Program in Applied Biostatical & Epidemiological Methods
Nicholas P. Jewell, University of California, Berkeley
Ohio State University, July 10, 2006
Day 1: Definitions, Measures of Disease Incidence & Association

2 Nicholas P. Jewell © Copyright 2006, all rights reserved
Course Outline
• Class meets from 8:30am to 12:15pm
• Break?
• Labs: meet 5:30 to 8pm (except Friday, when they stop at 7pm)
• Rough idea of topics:
  ◦ Day 1: Definitions, Measures of Disease Incidence and Association
  ◦ Day 2: Confounding, Interaction & Stratification Techniques
  ◦ Day 3: Regression Models, Logistic Regression and Maximum Likelihood
  ◦ Day 4: Confounding & Interaction in Logistic Regression Models, Model Building & Goodness of Fit
  ◦ Day 5: Matched Studies, Alternatives and Extensions to Logistic Regression

3 Binary Outcome Data
Binary Outcome | Explanatory Factors
Use of mental health services in 2005 | Costs of mental health visit, sex
Moved residence in 2005 | Family size, family income
Low birthweight of newborn | Health insurance status of mother, marital status of mother
Vote Republican in 2004 election | Parental voting pattern, sex
Health insurance coverage | Place of birth, marital status
Employment status in 2005 | Education level
Choice of transportation to work | Income

4 Issues Related to Application Area
• Study design
  ◦ Randomized?
  ◦ Causality/association
• Definition of binary outcome
• Extensions
  ◦ Longitudinal observations
  ◦ More than 2 categories: ordered categories?

5 Other Issues
• Statistical art in addition to statistical science
• Case studies
  ◦ WCGS (Western Collaborative Group Study: CHD in men)
  ◦ Coffee drinking and pancreatic cancer
  ◦ Spontaneous abortion history and CHD (women)
  ◦ Titanic

6 How Do We Measure the Binary Outcome for Disease Occurrence?
• Incidence/prevalence
• Role of 'time': chronological time, exposure time, age, number of contacts
• Incidence (time interval)
• Prevalence (time point or interval)
• Fractions: incidence proportion (unitless)

7 Incidence Proportion
• Definition (D = 1, "yes"):
  ◦ Define the risk interval explicitly, including the time scale (calendar year 2005, year of age 55, first year after menopause, etc.)
  ◦ Be at risk at the beginning of the interval (define explicitly what 'at risk' means)
  ◦ Become an incident case during the interval
• Incidence proportion is the fraction of the at-risk population who are D
• Cumulative measure

8 Incidence Rate
• Introduces time at risk into our thinking: the incidence rate (over a time interval) is, roughly, #D / cumulative time at risk
• Units are now time⁻¹
• The measure still applies to the whole interval (so it is still cumulative in that sense)
• Instantaneous incidence rate: the hazard function $h(t) = I'(t)/[1 - I(t)]$, where I(t) is the incidence proportion over the time interval [0, t]
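
As a rough sketch with hypothetical numbers (not from the course), the incidence rate divides cases by accumulated time at risk. Under an assumed piecewise-constant hazard h, the incidence proportion over [0, t] is $I(t) = 1 - \exp\{-\int_0^t h(u)\,du\}$. The function names below are illustrative only.

```python
import math

def incidence_rate(cases, person_years):
    """Incidence rate: incident cases per unit of accumulated time at risk."""
    return cases / person_years

def incidence_proportion(hazards, widths):
    """Incidence proportion I(t) over [0, t] for an assumed piecewise-constant hazard.

    hazards[i] is the hazard on the i-th interval and widths[i] is its length,
    so t = sum(widths). Uses I(t) = 1 - exp(-cumulative hazard).
    """
    cumulative_hazard = sum(h * w for h, w in zip(hazards, widths))
    return 1.0 - math.exp(-cumulative_hazard)

# Hypothetical data: 30 cases observed over 2,500 person-years.
rate = incidence_rate(30, 2500)
# Hypothetical hazard of 0.01/year for 5 years, then 0.02/year for 5 more years.
risk_10y = incidence_proportion([0.01, 0.02], [5, 5])
print(f"rate = {rate:.3f} per person-year, I(10) = {risk_10y:.4f}")
```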

9 Hazard Function for Caucasian Males in California in 1980

10 Survival Function (1 − I(t)) for Caucasian Males in California in 1980

11 1991 US Infant Mortality
Infant Mortality | Unmarried Mother | Married Mother | Total
Death | 16,712 | 18,784 | 35,496
Live at 1 Year | 1,197,142 | 2,878,421 | 4,075,563
Total | 1,213,854 | 2,897,205 | 4,111,059

12 1991 US Infant Mortality
• A: death in first year; B: unmarried mother
• P(A & B) = 0.0041
• P(A) = 0.0086
• P(B) = 0.295
• P(A) × P(B) = 0.0086 × 0.295 = 0.0025, so P(A & B) ≠ P(A) × P(B): A and B are not independent
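
The arithmetic on this slide can be checked directly from the slide 11 counts. If A and B were independent, P(A & B) would equal P(A)P(B). This is a minimal sketch, and the variable names are illustrative.

```python
# Counts from the 1991 US infant mortality table (slide 11).
deaths_unmarried, deaths_married = 16_712, 18_784
total_unmarried, total_married = 1_213_854, 2_897_205
n = total_unmarried + total_married                # 4,111,059 births

p_a_and_b = deaths_unmarried / n                   # P(A & B)
p_a = (deaths_unmarried + deaths_married) / n      # P(A): death in first year
p_b = total_unmarried / n                          # P(B): unmarried mother
print(f"P(A&B) = {p_a_and_b:.4f}, P(A)P(B) = {p_a * p_b:.4f}")
```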

13 Measures of Association: Relative Risk
• $RR = P(D \mid E) / P(D \mid \text{not } E)$
• Relative measure
• RR = 1 ⇔ independence
• Note upper bound: since P(D | E) ≤ 1, $RR \le 1/P(D \mid \text{not } E)$
• RR is not symmetric in roles of D and E

14 Non-Symmetry of RR

15 1991 US Infant Mortality
Infant Mortality | Unmarried Mother | Married Mother | Total
Death | 16,712 | 18,784 | 35,496
Live at 1 Year | 1,197,142 | 2,878,421 | 4,075,563
Total | 1,213,854 | 2,897,205 | 4,111,059
RR (associated with unmarried mother) = (16,712/1,213,854)/(18,784/2,897,205) ≈ 2.12
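
The sketch below uses the slide 11 counts in the 2 × 2 notation of slide 57 (a, b = exposed deaths and survivors; c, d = unexposed). It computes the RR, then the same ratio with the roles of D and E swapped, which shows the non-symmetry from slide 14. It also computes the upper bound from slide 13.

```python
# 1991 US infant mortality (slide 11); E = unmarried mother, D = infant death.
a, b = 16_712, 1_197_142   # E: deaths, survivors to 1 year
c, d = 18_784, 2_878_421   # not E: deaths, survivors to 1 year

rr = (a / (a + b)) / (c / (c + d))            # P(D | E) / P(D | not E)
rr_swapped = (a / (a + c)) / (b / (b + d))    # P(E | D) / P(E | not D)
upper_bound = 1 / (c / (c + d))               # RR cannot exceed 1 / P(D | not E)
print(f"RR = {rr:.2f}; with D and E swapped = {rr_swapped:.2f}; bound = {upper_bound:.0f}")
```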

16 Measures of Association: Odds Ratio
• $OR = \frac{P(D \mid E)/[1 - P(D \mid E)]}{P(D \mid \text{not } E)/[1 - P(D \mid \text{not } E)]}$
• Relative measure
• OR = 1 ⇔ independence
• No upper bound
• OR is symmetric in roles of D and E

17 Symmetry of OR

18 Symmetry of OR
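
One standard way to see the symmetry is to write both odds ratios in terms of joint probabilities. This is a sketch and does not reproduce the slide images. Here $\bar E$ and $\bar D$ denote not E and not D.

$$
\frac{P(D \mid E)/P(\bar D \mid E)}{P(D \mid \bar E)/P(\bar D \mid \bar E)}
= \frac{P(D \,\&\, E)\,P(\bar D \,\&\, \bar E)}{P(\bar D \,\&\, E)\,P(D \,\&\, \bar E)}
= \frac{P(E \mid D)/P(\bar E \mid D)}{P(E \mid \bar D)/P(\bar E \mid \bar D)}
$$

In the 2 × 2 notation of slide 57, the middle expression is estimated by ad/(bc).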

19 1991 US Infant Mortality
Infant Mortality | Unmarried Mother | Married Mother | Total
Death | 16,712 | 18,784 | 35,496
Live at 1 Year | 1,197,142 | 2,878,421 | 4,075,563
Total | 1,213,854 | 2,897,205 | 4,111,059
OR (associated with unmarried mother) = (16,712 × 2,878,421)/(18,784 × 1,197,142) ≈ 2.14
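
A small numerical check of symmetry uses the same counts. Comparing the odds of D given E with the odds of E given D gives the same value, ad/(bc). The variable names are illustrative.

```python
a, b = 16_712, 1_197_142   # E (unmarried): deaths, survivors
c, d = 18_784, 2_878_421   # not E (married): deaths, survivors

odds_ratio_d_given_e = (a / b) / (c / d)   # odds of D among E vs. among not E
odds_ratio_e_given_d = (a / c) / (b / d)   # odds of E among D vs. among not D
print(f"{odds_ratio_d_given_e:.2f} {odds_ratio_e_given_d:.2f}")  # both equal ad/(bc)
```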

20 OR as Approximation to RR

21 OR as Approximation to RR

22 OR as Approximation to RR

23 OR as Approximation to RR

24 OR as Approximation to RR
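
The approximation rests on an exact identity. This is a standard algebraic step rather than a transcription of the slides, with $\bar E$ denoting not E.

$$
OR = \frac{P(D \mid E)/[1 - P(D \mid E)]}{P(D \mid \bar E)/[1 - P(D \mid \bar E)]}
= RR \times \frac{1 - P(D \mid \bar E)}{1 - P(D \mid E)}
$$

When both risks are small, the correction factor is close to 1, so OR ≈ RR. The factor exceeds 1 exactly when RR > 1, so the OR always lies farther from 1 than the RR, on the same side.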

28 Comparison of RR and OR at Various Risk Levels
P(D | not E) | P(D | E) | RR | OR | Relative Difference
0.01 | 0.01 | 1.00 | 1.00 | 0
0.01 | 0.02 | 2.00 | 2.02 | 1%
0.01 | 0.05 | 5.00 | 5.21 | 4.2%
0.01 | 0.10 | 10.00 | 11.00 | 10%
(Relative difference = (OR − RR)/RR)

31 Comparison of RR and OR at Various Risk Levels
P(D | not E) | P(D | E) | RR | OR | Relative Difference
0.05 | 0.05 | 1.00 | 1.00 | 0
0.05 | 0.10 | 2.00 | 2.11 | 5.6%
0.05 | 0.15 | 3.00 | 3.35 | 11.8%
0.05 | 0.20 | 4.00 | 4.75 | 18.8%
0.05 | 0.50 | 10.00 | 19.00 | 90%

33 Comparison of RR and OR at Various Risk Levels
P(D | not E) | P(D | E) | RR | OR | Relative Difference
0.10 | 0.10 | 1.00 | 1.00 | 0
0.10 | 0.15 | 1.50 | 1.59 | 5.9%
0.10 | 0.20 | 2.00 | 2.25 | 12.5%
0.10 | 0.30 | 3.00 | 3.86 | 28.6%
0.10 | 0.40 | 4.00 | 6.00 | 50%
0.10 | 0.50 | 5.00 | 9.00 | 80%

34 Comparison of RR and OR at Various Risk Levels
P(D | not E) | P(D | E) | RR | OR | Relative Difference
0.20 | 0.20 | 1.00 | 1.00 | 0
0.20 | 0.30 | 1.50 | 1.71 | 14.3%
0.20 | 0.40 | 2.00 | 2.67 | 33.3%
0.20 | 0.50 | 2.50 | 4.00 | 60%
0.20 | 0.60 | 3.00 | 6.00 | 100%
0.20 | 0.80 | 4.00 | 16.00 | 300%
0.20 | 1.00 | 5.00 | ∞ | ∞
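
Each row of the RR-versus-OR comparison tables follows from just two risks. The sketch below regenerates the rows, taking the relative difference as (OR − RR)/RR, which matches the tabulated percentages. The function name is illustrative.

```python
def rr_and_or(p_unexposed, p_exposed):
    """Return (RR, OR, relative difference (OR - RR)/RR) for two risks."""
    rr = p_exposed / p_unexposed
    if p_exposed == 1.0:
        return rr, float("inf"), float("inf")   # odds of a certain event are infinite
    odds_ratio = (p_exposed / (1 - p_exposed)) / (p_unexposed / (1 - p_unexposed))
    return rr, odds_ratio, (odds_ratio - rr) / rr

# Risk pairs from slides 28, 31, 33 and 34, keyed by P(D | not E).
table = {
    0.01: [0.01, 0.02, 0.05, 0.10],
    0.05: [0.05, 0.10, 0.15, 0.20, 0.50],
    0.10: [0.10, 0.15, 0.20, 0.30, 0.40, 0.50],
    0.20: [0.20, 0.30, 0.40, 0.50, 0.60, 0.80, 1.00],
}
for p0, exposed_risks in table.items():
    for p1 in exposed_risks:
        rr, odds_ratio, rel = rr_and_or(p0, p1)
        print(f"{p0:.2f} {p1:.2f} RR={rr:5.2f} OR={odds_ratio:6.2f} diff={rel:6.1%}")
```

The gap between OR and RR grows with the baseline risk, which is why the OR approximates the RR well only for rare outcomes.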

35 Relative Hazard RH (solid line), RR (dotted line) and OR (dash-dotted line) as the Risk Period Extends in Time

36 Measures of Association: Excess Risk
• $ER = P(D \mid E) - P(D \mid \text{not } E)$
• Absolute comparison
• ER = 0 ⇔ independence
• ER is not symmetric in roles of D and E

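37 Measures of Association: Attributable Risk
• Population size = N
• AR = [(number of cases with current exposure distribution) − (number of cases with no exposure to E)] / (number of cases with current exposure distribution)
• $AR = \frac{N\,P(D) - N\,P(D \mid \text{not } E)}{N\,P(D)} = \frac{P(D) - P(D \mid \text{not } E)}{P(D)}$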

38 1991 US Infant Mortality
Infant Mortality | Unmarried Mother | Married Mother | Total
Death | 16,712 | 18,784 | 35,496
Live at 1 Year | 1,197,142 | 2,878,421 | 4,075,563
Total | 1,213,854 | 2,897,205 | 4,111,059
AR (associated with unmarried mother) = [P(D) − P(D | not E)]/P(D) = (0.00863 − 0.00648)/0.00863 ≈ 0.249
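
A minimal sketch of the attributable-risk calculation for the infant mortality data follows. It uses the definition from slide 37 and cross-checks it against the algebraically equivalent form P(E)(RR − 1)/[1 + P(E)(RR − 1)], often attributed to Levin.

```python
a, b = 16_712, 1_197_142   # E (unmarried mother): deaths, survivors
c, d = 18_784, 2_878_421   # not E (married mother): deaths, survivors
n = a + b + c + d

p_d = (a + c) / n                  # P(D) under the current exposure distribution
p_d_unexposed = c / (c + d)        # P(D | not E): risk if nobody were exposed
ar = (p_d - p_d_unexposed) / p_d   # attributable risk (slide 37 definition)

# Equivalent form: AR = P(E)(RR - 1) / [1 + P(E)(RR - 1)].
p_e = (a + b) / n
rr = (a / (a + b)) / p_d_unexposed
ar_levin = p_e * (rr - 1) / (1 + p_e * (rr - 1))
print(f"AR = {ar:.3f} (Levin form: {ar_levin:.3f})")
```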

39 Attributable Risk: Caution!
• Encourages causal interpretation that may be incorrect
  ◦ Assumes modification of E doesn't change other risk factors
• "Baseball is 90% mental -- the other half is physical." (Yogi Berra)

40 Target Population, Study Population and Sample
(Figure: Target Population, Study Population, Sample)
• Selection bias may occur when the Study Population differs from the Target Population

41 Population-Based Study
• Need: frame for study population
• Take a simple random sample of size n
• Measure D and E on sampled individuals
• Can estimate:
  ◦ Joint probabilities, e.g. P(D & E)
  ◦ Marginal probabilities, e.g. P(D)
  ◦ Conditional probabilities, e.g. P(D | E)

42 Marital Status & Birthweight
Marital Status at Birth | Low Birthweight | Normal Birthweight | Total
Unmarried | 7 | 52 | 59
Married | 7 | 134 | 141
Total | 14 | 186 | 200

43 Marital Status & Birthweight
Marital Status at Birth | Low Birthweight | Normal Birthweight | Total
Unmarried | 7 | 52 | 59
Married | 7 | 134 | 141
Total | 14 | 186 | 200
Estimable: joint probabilities, marginal probabilities, conditional probabilities
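
For the population-based sample, each kind of probability listed on slide 41 has a direct estimate. In the sketch below, D is taken as low birthweight and E as an unmarried mother.

```python
# Population-based sample (slide 42): D = low birthweight, E = unmarried mother.
a, b = 7, 52       # unmarried: low, normal
c, d = 7, 134      # married: low, normal
n = a + b + c + d  # 200

p_d_and_e = a / n            # joint: P(D & E)
p_d = (a + c) / n            # marginal: P(D)
p_d_given_e = a / (a + b)    # conditional: P(D | E)
print(f"P(D&E) = {p_d_and_e:.3f}, P(D) = {p_d:.3f}, P(D|E) = {p_d_given_e:.3f}")
```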

44 Cohort Study
• Need: frame for exposed and unexposed populations
• Take two (or more) simple random samples, of sizes $n_E$ and $n_{\text{not } E}$, separately from the exposed and unexposed populations, respectively
• Measure D on sampled individuals
• Can estimate:
  ◦ Some conditional probabilities, e.g. P(D | E)

45 Marital Status & Birthweight
Marital Status at Birth | Low Birthweight | Normal Birthweight | Total
Unmarried | 12 | 88 | 100
Married | 5 | 95 | 100
Total | 17 | 183 | 200
Not estimable: joint probabilities, marginal probabilities. Estimable: conditional probabilities such as P(D | E)

46 Case-Control Study
• Need: frame for disease and no-disease populations
• Take two simple random samples, of sizes $n_D$ and $n_{\text{not } D}$, separately from the case-status groups
• Measure E on sampled individuals
• Can estimate:
  ◦ Some conditional probabilities, e.g. P(E | D)

47 Marital Status & Birthweight
Marital Status at Birth | Low Birthweight | Normal Birthweight | Total
Unmarried | 50 | 28 | 78
Married | 50 | 72 | 122
Total | 100 | 100 | 200
Not estimable: joint probabilities, marginal probabilities. Estimable: conditional probabilities such as P(E | D)
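
Because the OR is symmetric in D and E, it can be estimated from all three designs, even though joint and marginal probabilities cannot. The sketch below applies ad/(bc) to the example tables on slides 42, 45 and 47. Rows are unmarried and married; columns are low and normal birthweight. All three give similar estimates.

```python
def odds_ratio(a, b, c, d):
    """Sample odds ratio ad/(bc) for a 2 x 2 table (rows E, not E; columns D, not D)."""
    return (a * d) / (b * c)

designs = {
    "population-based": (7, 52, 7, 134),   # slide 42
    "cohort": (12, 88, 5, 95),              # slide 45
    "case-control": (50, 28, 50, 72),       # slide 47
}
for name, cells in designs.items():
    print(f"{name:16s} OR = {odds_ratio(*cells):.2f}")
```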

48 Risk-Set (Density) Sampling
• For each incident case sampled at time t, select a random set of controls from those still at risk at t
• Note: a control sampled at time s might be sampled as a case at a later time t
(Figure: follow-up timeline from 0 to T, with a case occurring at time t)
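
A minimal sketch of risk-set sampling follows. The data layout (subject id, exit time, case flag) and the function name are hypothetical. Each case's controls are drawn only from subjects whose follow-up extends beyond the case's diagnosis time. A subject can therefore serve as a control early on and later become a case.

```python
import random

def risk_set_sample(subjects, controls_per_case, seed=0):
    """Risk-set (density) sampling sketch.

    subjects: list of (subject_id, exit_time, is_case), where exit_time is the
    diagnosis time for cases and the end of follow-up otherwise (hypothetical
    layout). For each case diagnosed at time t, controls are drawn at random
    from everyone else still at risk at t (exit_time > t).
    """
    rng = random.Random(seed)
    sampled_sets = []
    for case_id, t, is_case in subjects:
        if not is_case:
            continue
        risk_set = [sid for sid, exit_time, _ in subjects
                    if sid != case_id and exit_time > t]
        k = min(controls_per_case, len(risk_set))
        sampled_sets.append((case_id, t, rng.sample(risk_set, k)))
    return sampled_sets

# Hypothetical cohort of six subjects followed from time 0 to T = 10.
cohort = [(1, 2.0, True), (2, 5.0, True), (3, 10.0, False),
          (4, 10.0, False), (5, 3.5, False), (6, 8.0, True)]
for case_id, t, controls in risk_set_sample(cohort, controls_per_case=2):
    print(f"case {case_id} at t={t}: controls {controls}")
```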

49 Example: HSV-2 and Cervical Cancer
• Study population: 550,000 women with donations to serum banks in Finland, Norway, and Sweden
• Cervical cancer cases identified over time and linked to serum bank data for identification of HSV-2 status
• 3 random controls chosen who were cancer-free at the time of diagnosis of a case
• Caution: HSV-2 status is measured at time of donation rather than at time of sampling

50 Standard Case-Control Sampling (figure: disease status D / not D by exposure E / not E)

51 Risk-Set Sampling (figure: disease status D / not D by exposure E / not E, over follow-up from 0 to T, with a case at time t)

52 Case-Cohort Sampling
• Select cases as for traditional or risk-set sampling; select a random set of m "controls" from all those at risk at the beginning of the interval
• Note: a "control" might also be sampled as a case
(Figure: follow-up timeline from 0 to T, with a case at time t; "All controls")
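
The case-cohort selection step can be sketched as follows, assuming a hypothetical (subject id, case flag) layout and taking all cases, as in the Women's Health Trial example. The "controls" form a simple random subcohort of size m drawn from the whole baseline cohort without regard to later case status, so the subcohort can include cases.

```python
import random

def case_cohort_sample(subjects, subcohort_size, seed=0):
    """Case-cohort sampling sketch (hypothetical data layout).

    subjects: list of (subject_id, is_case) for everyone at risk at the start
    of the interval. Returns (cases, subcohort): every case, plus a simple
    random subcohort of size m drawn from the full baseline cohort.
    """
    rng = random.Random(seed)
    cases = [sid for sid, is_case in subjects if is_case]
    subcohort = rng.sample([sid for sid, _ in subjects], subcohort_size)
    return cases, subcohort

cohort = [(i, i % 10 == 0) for i in range(1, 201)]   # hypothetical: 20 cases among 200
cases, subcohort = case_cohort_sample(cohort, subcohort_size=20)
print(len(cases), len(subcohort), sorted(set(cases) & set(subcohort)))
```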

53 Example: Low-Fat Diet and Breast Cancer
• The Women's Health Trial randomly assigned 32,000 women (a high-risk group) to a low-fat intervention or a control group
• All women filled out food questionnaires, and gave blood samples, at regular intervals over 10 years
• All breast cancer cases had their food diaries and blood samples analyzed
• 10% of the original cohort was randomly selected to have their diaries and samples analyzed

54 Case-Cohort Sampling (figure: disease status D / not D by exposure E / not E, over follow-up from 0 to T, with a case at time t; "All controls")

55 Case-Cohort Sampling: OR = RR (Bayes' Theorem)
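
A sketch of the Bayes' theorem argument follows. It assumes the case-cohort "controls" are a simple random sample of everyone at risk at the start of the interval, so their exposure odds estimate P(E)/P(not E). Here $\bar E$ denotes not E.

$$
\frac{P(E \mid D)/P(\bar E \mid D)}{P(E)/P(\bar E)}
= \frac{P(D \mid E)\,P(E)}{P(D \mid \bar E)\,P(\bar E)} \cdot \frac{P(\bar E)}{P(E)}
= \frac{P(D \mid E)}{P(D \mid \bar E)} = RR
$$

The first step applies Bayes' theorem to $P(E \mid D)$ and $P(\bar E \mid D)$, and the common factor $P(D)$ cancels. No rare-disease assumption enters.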

56 Rare Disease Assumption for OR ≈ RR
• Standard case-control sampling
  ◦ Need rare disease assumption
• Risk-set sampling
  ◦ No rare disease assumption if the relative hazard (RH) is of interest
• Case-cohort sampling
  ◦ No rare disease assumption if RR is of interest

57 2 × 2 Table Notation
Exposure | Disease Status D | not D | Total
E | a | b | a+b
not E | c | d | c+d
Total | a+c | b+d | n

58 Chi-Squared Test
• Population-based study: independence of D and E
• Look at estimate of P(D & E) − P(D)P(E)
• Yields $(ad - bc)/n^2$
• Look at (ad − bc) or (ad − bc)² for simplicity
• Estimated variance of (ad − bc) is (a+b)(a+c)(b+d)(c+d)/n
• Yields $\chi^2 = \frac{n(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}$

59 Statistic for Assessing Independence

60 Population-Based Study
Marital Status at Birth | Low Birthweight | Normal Birthweight | Total
Unmarried | 7 | 52 | 59
Married | 7 | 134 | 141
Total | 14 | 186 | 200
p = 0.08

61 Cohort Study
• Look at estimate of P(D | E) − P(D | not E)
• Yields $a/n_1 - c/n_2$, where $n_1 = a + b$ and $n_2 = c + d$
• Estimated variance (under independence) of $a/n_1 - c/n_2$ is $\hat p(1 - \hat p)(1/n_1 + 1/n_2)$, with $\hat p = (a + c)/n$
• Yields $\chi^2 = \frac{(a/n_1 - c/n_2)^2}{\hat p(1 - \hat p)(1/n_1 + 1/n_2)}$

62 Cohort Study
Marital Status at Birth | Low Birthweight | Normal Birthweight | Total
Unmarried | 12 | 88 | 100
Married | 5 | 95 | 100
Total | 17 | 183 | 200
p = 0.08

63 Case-Control Study
• Look at estimate of P(E | D) − P(E | not D)
• Yields $a/n_1 - b/n_2$, where $n_1 = a + c$ and $n_2 = b + d$
• Estimated variance (under independence) of $a/n_1 - b/n_2$ is $\hat p(1 - \hat p)(1/n_1 + 1/n_2)$, with $\hat p = (a + b)/n$
• Yields $\chi^2 = \frac{(a/n_1 - b/n_2)^2}{\hat p(1 - \hat p)(1/n_1 + 1/n_2)}$

64 Case-Control Study
Marital Status at Birth | Low Birthweight | Normal Birthweight | Total
Unmarried | 50 | 28 | 78
Married | 50 | 72 | 122
Total | 100 | 100 | 200
p = 0.002

65 Power Comparison
Design | Population-Based | Cohort | Case-Control
χ² statistic | 3.04 | 3.15 | 10.17
P-value | 0.08 | 0.08 | 0.002
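
The sketch below reproduces the χ² statistics in the power comparison from the three example tables. It assumes the pooled (null-hypothesis) variance for the cohort and case-control forms on slides 61 and 63. Under that assumption, both reduce algebraically to the Pearson statistic $n(ad - bc)^2/[(a+b)(c+d)(a+c)(b+d)]$, so one function serves all three designs. P-values use the 1-degree-of-freedom identity $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$.

```python
import math

def pearson_chi2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) for a 2 x 2 table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def p_value_1df(chi2):
    """Upper-tail p-value of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

tables = {
    "population-based": (7, 52, 7, 134),   # slide 60
    "cohort": (12, 88, 5, 95),              # slide 62
    "case-control": (50, 28, 50, 72),       # slide 64
}
for name, cells in tables.items():
    chi2 = pearson_chi2(*cells)
    print(f"{name:16s} chi2 = {chi2:5.2f}  p = {p_value_1df(chi2):.4f}")
```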

66 Power Comparison for Specific Population: Cohort vs. Population-Based
• For a fixed population, $1/n_1 + 1/n_2$ is minimized, for fixed n, when $n_1 = n_2 = n/2$

67 Power Comparison for Specific Population: Case-Control vs. Population-Based
• For a fixed population, $1/n_1 + 1/n_2$ is minimized, for fixed n, when $n_1 = n_2 = n/2$

68 Large-Sample Power Comparison
• Equal sample sizes of exposed and unexposed: cohort is more powerful than population-based
• Equal sample sizes of cases and controls: case-control is more powerful than population-based

69 Power Comparison: Cohort & Case-Control (Equal Sample Sizes)
• With the OR fixed, power depends on the size of d (where $p = (p_1 + p_2)/2$ because of the equal sample sizes)
• d differs between cohort and case-control designs (although the OR is fixed)

70 d against p (figure)
• d is biggest when $p = (p_1 + p_2)/2 = 0.5$

71 Power Comparison: Cohort & Case-Control (Equal Sample Sizes)
• When P(E) is closer to 0.5 than P(D), the case-control design has greater power than the cohort design, since then the average of P(E | D) and P(E | not D) is closer to 0.5 than the average of P(D | E) and P(D | not E)
• When P(D) is closer to 0.5 than P(E), the cohort design has greater power than the case-control design, since then the average of P(D | E) and P(D | not E) is closer to 0.5 than the average of P(E | D) and P(E | not D)

72 Rule of Thumb about Power/Precision
• Want both exposure and disease marginals to be as balanced as possible, given fixed total sample size
• For a fixed design, more sample still always gives greater power
  ◦ For example, suppose a fixed number of cases ($n_1$)
  ◦ Increasing the number of controls ($n_2$) still increases power, since $1/n_1 + 1/n_2$ will get smaller, but with diminishing returns

73 Fixed Number of Cases: Increasing Number of Controls
• R bigger means the χ² statistic gets bigger by the same amount

74 How Many More Controls than Cases?
• Primary gain comes from going from k = 1 to k = 4 controls per case
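
The diminishing returns can be sketched from the variance factor $1/n_1 + 1/n_2$. With $n_1$ cases and $n_2 = k n_1$ controls, it equals $(1 + 1/k)/n_1$, so the χ² statistic scales roughly like $k/(k+1)$. This approximation is an assumption of the sketch: it ignores the change in the pooled proportion as controls are added, so treat it as a rough guide.

```python
def relative_chi2_gain(k):
    """Approximate chi-squared with k controls per case, relative to k = 1.

    Uses only the variance factor 1/n1 + 1/(k * n1) = (1 + 1/k) / n1,
    so the statistic scales like k / (k + 1); the k = 1 value is 1/2.
    """
    return (k / (k + 1)) / (1 / 2)

for k in (1, 2, 3, 4, 5, 10, 100):
    print(f"k = {k:3d}: {relative_chi2_gain(k):.2f} x the k = 1 statistic")
```

Under this approximation, going from k = 1 to k = 4 gives a factor of 1.6, while no number of extra controls can push the factor beyond 2.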

75 2 × 2 Table Notation
Exposure | Disease Status D | not D | Total
E | a | b | a+b
not E | c | d | c+d
Total | a+c | b+d | n

76 Cohort Study Example (Population OR = 1)
Exposure Status | Disease Status D | not D | Total
E | 8 | 42 | 50
not E | 11 | 39 | 50
Total | 19 | 81 | 100
Typical study: p = 0.44

77 Cohort Study Example (Population OR = 1)
• 1,000 typical studies
• Smallest OR estimate = 0.15
• Largest OR estimate = 7.58
• Average of OR estimates = 1.16 (bias)
• Median of OR estimates = 1

78 Sampling Distribution of Odds Ratio Estimate: not Normal (skewed)

79 Cohort Study Example (Population OR = 1)
• 1,000 typical studies
• Smallest log(OR) estimate = −1.90 = log(0.15)
• Largest log(OR) estimate = 2.03 = log(7.58)
• Average of log(OR) estimates = −0.011 (little bias)
• Median of log(OR) estimates = 0 = log(1)
• I always use natural logarithms

80 Sampling Distribution of Log Odds Ratio Estimate
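
The skewness of the OR estimate, and the near-symmetry of its logarithm, can be explored with a small simulation. The common risk of 0.2 in both groups is a hypothetical choice, since the slides do not state the risk used. The 50 subjects per group match the typical study on slide 76, and tables with a zero cell are skipped.

```python
import math
import random
import statistics

def simulate_or_estimates(n_per_group=50, risk=0.2, n_studies=1000, seed=1):
    """Simulate cohort studies with equal risk in both groups (true OR = 1).

    risk = 0.2 is a hypothetical value. Tables with a zero cell are skipped
    because ad/(bc) is then 0 or undefined.
    """
    rng = random.Random(seed)
    estimates = []
    while len(estimates) < n_studies:
        a = sum(rng.random() < risk for _ in range(n_per_group))   # exposed cases
        c = sum(rng.random() < risk for _ in range(n_per_group))   # unexposed cases
        b, d = n_per_group - a, n_per_group - c
        if min(a, b, c, d) == 0:
            continue
        estimates.append(a * d / (b * c))
    return estimates

ors = simulate_or_estimates()
log_ors = [math.log(x) for x in ors]
print(f"OR: mean {statistics.mean(ors):.2f}, median {statistics.median(ors):.2f}")
print(f"log OR: mean {statistics.mean(log_ors):.3f}, median {statistics.median(log_ors):.3f}")
```

The mean of the OR estimates should sit above the median, while the log-scale mean should stay near 0, in line with slides 77 and 79.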

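81 Confidence Intervals for the Odds Ratio
Exposure | Disease Status D | not D | Total
E | a | b | a+b
not E | c | d | c+d
Total | a+c | b+d | n
• Estimate: $\widehat{OR} = ad/(bc)$
• 95% CI for log(OR): $\log\widehat{OR} \pm 1.96\sqrt{1/a + 1/b + 1/c + 1/d}$
• 95% CI for OR: exponentiate the endpoints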

82 Case-Control Study of Pancreatic Cancer
Coffee drinking (cups/day) by sex and disease status:
Sex | Disease Status | 0 | 1-2 | 3-4 | 5+ | Total
Men | Case | 9 | 94 | 53 | 60 | 216
Men | Control | 32 | 119 | 74 | 82 | 307
Women | Case | 11 | 59 | 53 | 28 | 151
Women | Control | 56 | 152 | 80 | 48 | 336
Total | | 108 | 424 | 260 | 218 | 1010

83 Case-Control Study of Pancreatic Cancer
Coffee Drinking (cups/day) | Cases | Controls | Total
≥ 1 | 347 | 555 | 902
0 | 20 | 88 | 108
Total | 367 | 643 | 1010
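
A minimal sketch of the large-sample (Woolf, log-based) interval for the OR, applied to the collapsed coffee table, follows. Drinking at least one cup per day is treated as exposure E. The result is roughly 2.75 (1.66, 4.55), a little narrower than the exact interval (1.64, 4.80) quoted on slide 91.

```python
import math

def or_with_ci(a, b, c, d, z=1.96):
    """Odds ratio ad/(bc) with a Woolf (log-based) large-sample 95% CI."""
    log_or = math.log(a * d / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)

# Slide 83: E = 1 or more cups/day; rows E, not E; columns cases, controls.
estimate, lower, upper = or_with_ci(347, 555, 20, 88)
print(f"OR = {estimate:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```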

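84 Estimate & Confidence Intervals for the Relative Risk
Exposure | Disease Status D | not D | Total
E | a | b | a+b
not E | c | d | c+d
Total | a+c | b+d | n
• Estimate: $\widehat{RR} = \frac{a/(a+b)}{c/(c+d)}$
• 95% CI for log(RR): $\log\widehat{RR} \pm 1.96\sqrt{\frac{b}{a(a+b)} + \frac{d}{c(c+d)}}$
• 95% CI for RR: exponentiate the endpoints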

85 Western Collaborative Group Study
Behavior Type | CHD Yes | CHD No | Total
Type A | 178 | 1,411 | 1,589
Type B | 79 | 1,486 | 1,565
Total | 257 | 2,897 | 3,154
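
The log-based RR interval can be sketched for the WCGS data with Type A behavior as exposure E. It gives RR ≈ 2.22 with a 95% CI of roughly (1.72, 2.87). The function name is illustrative.

```python
import math

def rr_with_ci(a, b, c, d, z=1.96):
    """Relative risk [a/(a+b)]/[c/(c+d)] with a log-based large-sample 95% CI."""
    log_rr = math.log((a / (a + b)) / (c / (c + d)))
    se = math.sqrt(b / (a * (a + b)) + d / (c * (c + d)))
    return math.exp(log_rr), math.exp(log_rr - z * se), math.exp(log_rr + z * se)

# WCGS (slide 85): rows Type A, Type B; columns CHD yes, CHD no.
estimate, lower, upper = rr_with_ci(178, 1411, 79, 1486)
print(f"RR = {estimate:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```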

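86 Estimate & Confidence Intervals for the Excess Risk
Exposure | Disease Status D | not D | Total
E | a | b | a+b
not E | c | d | c+d
Total | a+c | b+d | n
• Estimate: $\widehat{ER} = a/(a+b) - c/(c+d)$
• 95% CI for ER: $\widehat{ER} \pm 1.96\sqrt{\frac{ab}{(a+b)^3} + \frac{cd}{(c+d)^3}}$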

87 Western Collaborative Group Study
Behavior Type | CHD Yes | CHD No | Total
Type A | 178 | 1,411 | 1,589
Type B | 79 | 1,486 | 1,565
Total | 257 | 2,897 | 3,154
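
For the excess risk, a Wald interval on the difference of the two risks is a natural sketch. With Type A as E, it gives ER ≈ 0.0615 with a 95% CI of roughly (0.0426, 0.0805).

```python
import math

def er_with_ci(a, b, c, d, z=1.96):
    """Excess risk a/(a+b) - c/(c+d) with a large-sample (Wald) 95% CI."""
    p1, p2 = a / (a + b), c / (c + d)
    se = math.sqrt(p1 * (1 - p1) / (a + b) + p2 * (1 - p2) / (c + d))
    er = p1 - p2
    return er, er - z * se, er + z * se

estimate, lower, upper = er_with_ci(178, 1411, 79, 1486)   # WCGS, slide 87
print(f"ER = {estimate:.4f}, 95% CI ({lower:.4f}, {upper:.4f})")
```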

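88 Estimate & Confidence Intervals for the Attributable Risk: Population-Based Study
Exposure | Disease Status D | not D | Total
E | a | b | a+b
not E | c | d | c+d
Total | a+c | b+d | n
• Estimate: $\widehat{AR} = \frac{ad - bc}{(a+c)(c+d)}$
• 95% CI for log(1 − AR): $\log(1 - \widehat{AR}) \pm 1.96\sqrt{\frac{b + \widehat{AR}(a+d)}{nc}}$
• 95% CI for AR: transform the endpoints back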

89 Western Collaborative Group Study
Behavior Type | CHD Yes | CHD No | Total
Type A | 178 | 1,411 | 1,589
Type B | 79 | 1,486 | 1,565
Total | 257 | 2,897 | 3,154
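
The sketch below computes the attributable risk for WCGS, treated as a population-based sample as in the slide 88 title, with Type A as E. The interval is built on the log(1 − AR) scale using the delta-method variance $[b + \widehat{AR}(a+d)]/(nc)$. This is one common choice and not necessarily the slide's exact formula. It gives AR ≈ 0.381 with a 95% CI of roughly (0.260, 0.481).

```python
import math

def ar_with_ci(a, b, c, d, z=1.96):
    """Attributable risk (ad - bc)/[(a+c)(c+d)] for a population-based 2 x 2 table,
    with a 95% CI built on the log(1 - AR) scale using the delta-method
    variance [b + AR(a + d)] / (n c)."""
    n = a + b + c + d
    ar = (a * d - b * c) / ((a + c) * (c + d))
    se = math.sqrt((b + ar * (a + d)) / (n * c))
    log_1m_ar = math.log(1 - ar)
    lower = 1 - math.exp(log_1m_ar + z * se)   # upper end for 1 - AR -> lower end for AR
    upper = 1 - math.exp(log_1m_ar - z * se)
    return ar, lower, upper

estimate, lower, upper = ar_with_ci(178, 1411, 79, 1486)   # WCGS, slide 89
print(f"AR = {estimate:.3f}, 95% CI ({lower:.3f}, {upper:.3f})")
```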

90 Small Sample Adjustments
• Odds ratio
  ◦ Estimate
  ◦ CIs
• Relative risk
  ◦ Estimate
• Exact tests/CIs
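
One widely used small-sample adjustment adds 1/2 to every cell before computing the OR and its log-based interval. It is assumed here as an illustration and is not taken from the slide. It keeps the estimate finite when a cell is zero. The table below is hypothetical.

```python
import math

def or_half_corrected(a, b, c, d, z=1.96):
    """Odds ratio and log-based 95% CI after adding 1/2 to every cell
    (an assumed small-sample adjustment that keeps the estimate finite with zero cells)."""
    a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log(a * d / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical sparse table with a zero cell, where ad/(bc) would be 0.
print(or_half_corrected(0, 10, 5, 5))
```

Exact methods, such as the exact interval on slide 91, avoid such ad hoc corrections but need more computation.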

91 Case-Control Study of Pancreatic Cancer
Coffee Drinking (cups/day) | Cases | Controls | Total
≥ 1 | 347 | 555 | 902
0 | 20 | 88 | 108
Total | 367 | 643 | 1010
An exact 95% CI for OR is (1.64, 4.80)

92 Small Sample Ideas
• Be aware when you have entered "small sample world", where approximations may not be accurate and adjustments or exact methods may be required

