Epidemiologic design from a sampling perspective Epidemiology II Lecture April 14, 2005 David Jacobs.

1 Epidemiologic design from a sampling perspective Epidemiology II Lecture April 14, 2005 David Jacobs

2 Why different epidemiologic designs? It is generally not possible to observe everyone in a population; new questions arise after data or samples have been collected; cost and feasibility; statistical efficiency and appropriateness to the study question.

3 The possibilities: there are many approaches, including sampling from the whole population, sampling from exposure, sampling from caseness, and haphazard selection.

4 True Population Configuration Underlying Epidemiologic Study Designs
Time 0: exposed N = A + C; unexposed N = B + D
Time 1: exposed diseased N = A, exposed nondiseased N = C; unexposed diseased N = B, unexposed nondiseased N = D

5 True Population Configuration Underlying Epidemiologic Study Designs: Approximate numbers for Minnesota
Time 0: exposed N = 1,000,000; unexposed N = 3,000,000
Time 1: exposed diseased N = 50,000, exposed nondiseased N = 950,000; unexposed diseased N = 50,000, unexposed nondiseased N = 2,950,000

6 Alternate Format: Population Values by Exposure and Disease
Exposed: diseased A; not diseased C
Not exposed: diseased B; not diseased D
The numbers A, B, C, D are fixed and known exactly.

7 Alternate Format: Population Values by Exposure and Disease
Exposed: diseased A = 50,000; not diseased C = 950,000
Not exposed: diseased B = 50,000; not diseased D = 2,950,000
The numbers A, B, C, D are fixed and known exactly.

8 Measures of Risk and Relative Risk: Whole Population
Exposed: diseased A, not diseased C; odds A/C; risk (probability) A/(A+C)
Not exposed: diseased B, not diseased D; odds B/D; risk B/(B+D)
Exposure ratio (proportion exposed): A/(A+B) among diseased; C/(C+D) among not diseased
Risk difference = A/(A+C) - B/(B+D)
Risk ratio (relative risk) = [A/(A+C)] / [B/(B+D)]
Odds ratio = (A/C) / (B/D) = AD/(BC)

9 Measures of Risk and Relative Risk: Whole Population
Exposed: diseased 50,000, not diseased 950,000; odds 0.053; risk (probability) 0.05
Not exposed: diseased 50,000, not diseased 2,950,000; odds 0.017; risk 0.017
Exposure ratio: 0.5 among diseased; 0.244 among not diseased
Risk difference = 0.033; risk ratio (relative risk) = 3; odds ratio = 3.11
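
A minimal Python sketch of the slide 8 formulas, applied to the slide 9 population counts. The function name `two_by_two_measures` is hypothetical; the cell layout follows slides 6 and 7 (A and C exposed, B and D not exposed, A and B diseased).

```python
def two_by_two_measures(A, B, C, D):
    """Measures for a 2x2 table laid out as in slides 6-8.

    A = exposed & diseased, C = exposed & not diseased,
    B = not exposed & diseased, D = not exposed & not diseased.
    """
    risk_exp = A / (A + C)
    risk_unexp = B / (B + D)
    return {
        "odds_exposed": A / C,
        "odds_unexposed": B / D,
        "risk_exposed": risk_exp,
        "risk_unexposed": risk_unexp,
        "exposure_ratio_diseased": A / (A + B),
        "exposure_ratio_not_diseased": C / (C + D),
        "risk_difference": risk_exp - risk_unexp,
        "risk_ratio": risk_exp / risk_unexp,
        "odds_ratio": (A * D) / (B * C),
    }


# Approximate Minnesota population values from slides 5 and 7
pop = two_by_two_measures(A=50_000, B=50_000, C=950_000, D=2_950_000)
for name, value in pop.items():
    print(f"{name:28s} {value:.3f}")
```

Printed to three decimals, this gives a risk difference of 0.033, a risk ratio of 3.000, and an odds ratio of 3.105 (shown as 3.11 on the slide).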

10 Epidemiologic studies sample from A, B, C, and D to estimate odds or risk, and the risk difference or risk ratio (relative risk). Epidemiologic design is determined by investigator control, temporality, and sampling fraction.

11 Epidemiologic design: level of investigator control. Clinical trial: exposure assigned (at random); reflects a temporary state. Observational: exposure occurs naturally; often reflects a long-term state.

12 Epidemiologic design: temporality. Clinical trial, cohort, nested case-control, case-cohort: exposure assessed at variable times before disease. Cross-sectional: exposure assessed simultaneously with disease. Case-control: past exposure assessed simultaneously with disease.

13 Epidemiologic design: sampling fraction. Cells A, B, C, and D are sampled at random with constant probability (called the sampling fraction). Sample sizes are a, b, c, d. If a/A = b/B = c/C = d/D, then the sampling fraction is equal for all cells.

14 Sampling fractions
Exposed: diseased a/A = f_A; not diseased c/C = f_C
Not exposed: diseased b/B = f_B; not diseased d/D = f_D
The numbers A, B, C, D are fixed and known exactly. The numbers a, b, c, d are realized in a given study, determined during the study.

15 Expected observations given sampling fractions
Exposed: diseased a = 5,000 (f_A = 0.1); not diseased c = 950 (f_C = 0.001)
Not exposed: diseased b = 1,250 (f_B = 0.025); not diseased d = 5,900 (f_D = 0.002)
Naïve (and wrong) risks: 5,000/5,950 = 0.84 and 1,250/7,150 = 0.175, giving a naïve relative risk of 4.8.
Correct risks: (5,000/0.1) / (5,000/0.1 + 950/0.001) = 0.05 and (1,250/0.025) / (1,250/0.025 + 5,900/0.002) = 0.0167, giving a relative risk of 0.05/0.0167 = 3.
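
The slide 15 correction divides each sampled cell by its own sampling fraction before computing risks. A sketch in Python, with hypothetical helper names `naive_risk` and `weighted_risk`:

```python
def naive_risk(cases, noncases):
    """Risk computed as if every cell had been sampled with the same fraction."""
    return cases / (cases + noncases)


def weighted_risk(cases, noncases, f_cases, f_noncases):
    """Risk after scaling each sampled cell back up by 1 / its sampling fraction."""
    est_cases = cases / f_cases
    est_noncases = noncases / f_noncases
    return est_cases / (est_cases + est_noncases)


# Expected sample from slide 15
a, c, f_a, f_c = 5_000, 950, 0.1, 0.001      # exposed: diseased, not diseased
b, d, f_b, f_d = 1_250, 5_900, 0.025, 0.002  # not exposed: diseased, not diseased

naive_rr = naive_risk(a, c) / naive_risk(b, d)
correct_rr = weighted_risk(a, c, f_a, f_c) / weighted_risk(b, d, f_b, f_d)
print(f"naive RR = {naive_rr:.2f}, weighted RR = {correct_rr:.2f}")
```

This prints a naive RR of 4.81 and a weighted RR of 3.00, matching the slide.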

16 Observations given sampling fractions
Exposed: diseased a = 5,000 + ε_a (f_A = 0.1); not diseased c = 950 + ε_c (f_C = 0.001)
Not exposed: diseased b = 1,250 + ε_b (f_B = 0.025); not diseased d = 5,900 + ε_d (f_D = 0.002)
All estimates differ from population values by random amounts (see example in Excel file).

17 Epidemiologic design: sampling fraction. Cross-sectional: sample equally from everyone, so f_A = f_B = f_C = f_D. Clinical trial, cohort study: sample equally within the initial exposure groups, at f_{A+C} = (a+c)/(A+C) and f_{B+D} = (b+d)/(B+D); these usually differ in a clinical trial and are usually the same in a cohort study.

18 Cross-sectional Study
Exposed: diseased a/A = f; not diseased c/C = f
Not exposed: diseased b/B = f; not diseased d/D = f
The sampling fraction is the same in all cells. Risk and odds estimates are unbiased, so risk differences and ratios are unbiased.

19 Expected Cross-sectional Study
Exposed: diseased a = 50 (f_A = 0.001); not diseased c = 950 (f_C = 0.001)
Not exposed: diseased b = 50 (f_B = 0.001); not diseased d = 2,950 (f_D = 0.001)
Naïve risks and relative risks are correct! 50/1,000 = 0.05, etc.

20 Observed Cross-sectional Study
Exposed: diseased a = 50 + ε_a (f_A = 0.001); not diseased c = 950 + ε_c (f_C = 0.001)
Not exposed: diseased b = 50 + ε_b (f_B = 0.001); not diseased d = 2,950 - ε_abc (f_D = 0.001), where ε_abc = ε_a + ε_b + ε_c because the total sample size is fixed
All estimates differ from population values by random amounts.
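
To illustrate the random errors ε on slides 16 and 20, the sketch below draws one hypothetical cross-sectional sample of fixed total size 4,000 (the expected total on slide 19). It assumes sampling with replacement (`random.choices`) as an approximation to sampling without replacement from a 4,000,000-person population; the seed and the helper name `draw_cross_section` are arbitrary choices.

```python
import random
from collections import Counter

# Population cells from slide 7 (A, C exposed; B, D not exposed)
POPULATION = {"A": 50_000, "B": 50_000, "C": 950_000, "D": 2_950_000}


def draw_cross_section(n, rng):
    """Draw n people with probability proportional to cell size.

    Sampling with replacement approximates sampling without replacement
    closely when n is tiny relative to the population.
    """
    cells = list(POPULATION)
    weights = [POPULATION[k] for k in cells]
    counts = Counter(rng.choices(cells, weights=weights, k=n))
    return {k: counts.get(k, 0) for k in cells}


rng = random.Random(2005)
sample = draw_cross_section(4_000, rng)
a, b, c, d = sample["A"], sample["B"], sample["C"], sample["D"]
rr_hat = (a / (a + c)) / (b / (b + d))
print(sample, f"estimated RR = {rr_hat:.2f}")
```

Rerunning with different seeds shows a, b, and c scattering around 50, 50, and 950; because the total is fixed, d absorbs the other cells' deviations, which is why d carries the combined error term on slide 20.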

21 Clinical Trial or Cohort Study
Exposed: diseased a/A = f_{A+C}; not diseased c/C = f_{A+C}
Not exposed: diseased b/B = f_{B+D}; not diseased d/D = f_{B+D}
The sampling fraction is fixed within exposed and within not exposed. Usually f_{A+C} ≠ f_{B+D} in a clinical trial, and f_{A+C} = f_{B+D} in a cohort study (which has a cross-sectional baseline). Risk and odds estimates are unbiased, so risk differences and ratios are unbiased.

22 Expected Clinical Trial or Cohort Study
Exposed: diseased a = 100; not diseased c = 1,900 (f_{A+C} = 0.002)
Not exposed: diseased b = 50; not diseased d = 2,950 (f_{B+D} = 0.001)
f_{A+C} usually equals f_{B+D} in a cohort study; f_{A+C} may differ from f_{B+D} in a clinical trial (if treatment allocation is not 1:1).

23 Expected Measures of Risk and Relative Risk: Clinical Trial or Cohort Study
Exposed: diseased 100, not diseased 1,900; odds 0.053; risk (probability) 0.05
Not exposed: diseased 50, not diseased 2,950; odds 0.017; risk 0.017
Exposure ratio: 0.67 among diseased; 0.39 among not diseased (differs from the total population)
Correct risk difference = 0.033; correct risk ratio (relative risk) = 3; odds ratio = 3.11
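
Slides 22 and 23 can be reproduced by scaling each exposure row of the population table by its own sampling fraction. The Python sketch below (hypothetical helper `summarize`) compares the population with the expected cohort sample:

```python
def summarize(A, B, C, D):
    """Risk ratio, odds ratio, and proportion exposed among the diseased.

    Layout as in slides 6-8: A, C exposed; B, D not exposed; A, B diseased.
    """
    risk_ratio = (A / (A + C)) / (B / (B + D))
    odds_ratio = (A * D) / (B * C)
    exposed_among_diseased = A / (A + B)
    return risk_ratio, odds_ratio, exposed_among_diseased


population = (50_000, 50_000, 950_000, 2_950_000)  # A, B, C, D

# Cohort / trial: exposed row sampled at f_{A+C}, unexposed row at f_{B+D}
f_exposed, f_unexposed = 0.002, 0.001
A, B, C, D = population
cohort = (A * f_exposed, B * f_unexposed, C * f_exposed, D * f_unexposed)

for label, cells in [("population", population), ("cohort", cohort)]:
    rr, orr, pe = summarize(*cells)
    print(f"{label:10s} RR={rr:.2f} OR={orr:.2f} exposed among diseased={pe:.2f}")
```

Both rows print RR=3.00 and OR=3.11, while the proportion exposed among the diseased moves from 0.50 to 0.67: row-wise sampling leaves risks intact but distorts the exposure ratio.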

24 Observed Clinical Trial or Cohort Study
Exposed: diseased a = 100 + ε_a; not diseased c = 1,900 - ε_a (f_{A+C} = 0.002)
Not exposed: diseased b = 50 + ε_b; not diseased d = 2,950 - ε_b (f_{B+D} = 0.001)
All estimates differ from population values by random amounts.

25 Epidemiologic design: sampling fraction. Case-control: sample differentially within diseased and within nondiseased, so f_A = f_B = f_{A+B} and f_C = f_D = f_{C+D}. Usually f_{A+B} is much greater than f_{C+D}.

26 Case-control Study
Exposed: diseased a/A = f_{A+B}; not diseased c/C = f_{C+D}
Not exposed: diseased b/B = f_{A+B}; not diseased d/D = f_{C+D}
The sampling fraction is fixed within diseased and within not diseased. Exposure probabilities and exposure odds are unbiased, but risk, disease odds, risk differences, and risk ratios are biased. The odds ratio ≈ relative risk when disease is rare.

27 Expected Case-control Study
Exposed: diseased a = 500 (f_{A+B} = 0.01); not diseased c = 494 (f_{C+D} = 0.00052)
Not exposed: diseased b = 500 (f_{A+B} = 0.01); not diseased d = 1,534 (f_{C+D} = 0.00052)
f_{A+B} = 19.23 × f_{C+D}

28 Expected Measures of Risk and Relative Risk: Case-Control Study
Exposed: diseased 500, not diseased 494; odds 1.01; risk (probability) 0.503
Not exposed: diseased 500, not diseased 1,534; odds 0.33; risk 0.246
Exposure ratio: 0.5 among diseased; 0.24 among not diseased
The disease odds and risks above differ from the total population.
Incorrect risk difference = 0.257; incorrect risk ratio (relative risk) = 2.05; odds ratio = 3.11, which is correct and approximately equal to the true relative risk
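
A similar sketch for slides 27 and 28, this time scaling the diseased and not-diseased columns by different fractions (helper names are hypothetical):

```python
def risk_ratio(A, B, C, D):
    """Risk ratio with A, C exposed and B, D not exposed (A, B diseased)."""
    return (A / (A + C)) / (B / (B + D))


def odds_ratio(A, B, C, D):
    """Cross-product odds ratio AD/(BC)."""
    return (A * D) / (B * C)


population = (50_000, 50_000, 950_000, 2_950_000)  # A, B, C, D from slide 7
f_cases, f_controls = 0.01, 0.00052  # f_{A+B}, f_{C+D} from slide 27

A, B, C, D = population
case_control = (A * f_cases, B * f_cases, C * f_controls, D * f_controls)
print("cells:", [round(x) for x in case_control])
print(f"naive RR {risk_ratio(*case_control):.2f} vs true {risk_ratio(*population):.2f}")
print(f"OR       {odds_ratio(*case_control):.2f} vs true {odds_ratio(*population):.2f}")
```

Expected output: cells [500, 500, 494, 1534], a naive RR of 2.05 against a true 3.00, and an OR of 3.11 in both, because column-wise sampling fractions cancel in AD/(BC).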

29 Observed Case-control Study
Exposed: diseased a = 500 + ε_a (f_{A+B} = 0.01); not diseased c = 494 + ε_c (f_{C+D} = 0.00052)
Not exposed: diseased b = 500 - ε_a (f_{A+B} = 0.01); not diseased d = 1,534 - ε_c (f_{C+D} = 0.00052)

30 Epidemiologic design: sampling fraction. Nested case-control: sample differentially within diseased and within nondiseased, starting with a cross-sectional base, so exposure is measured prior to disease diagnosis: f_A = f_B = f_{A+B} and f_C = f_D = f_{C+D}. Often f_{A+B} = 1; usually f_{A+B} is somewhat greater than f_{C+D}.

31 Nested Case-Control Study, 1: Observed Cross-sectional Study
Exposed: diseased a = 500 + ε_a (f_A = 0.01); not diseased c = 9,500 + ε_c (f_C = 0.01)
Not exposed: diseased b = 500 + ε_b (f_B = 0.01); not diseased d = 29,500 - ε_abc (f_D = 0.01)
This is the previous cross-sectional example with sampling fractions increased by a factor of 10.

32 Nested Case-Control Study, 2: Sampling from the cross-section
Exposed: diseased a/A = f_{A+B}; not diseased c/C = f_{C+D}
Not exposed: diseased b/B = f_{A+B}; not diseased d/D = f_{C+D}
The sampling fraction is fixed within diseased and within not diseased; temporality is preserved. Exposure probabilities and exposure odds are unbiased, but risk, disease odds, risk differences, and risk ratios are biased. The odds ratio ≈ relative risk when disease is rare.

33 Observed Nested Case-Control Study (sampling fractions relative to the cross-section)
Exposed: diseased a = 500 + ε_a (f_{A+B} = 1); not diseased c = 950 + ε_c + ε_c1 (f_{C+D} = 0.1)
Not exposed: diseased b = 500 + ε_b (f_{A+B} = 1); not diseased d = 2,950 - ε_abc - ε_c1 (f_{C+D} = 0.1)
ε_a, ε_b, ε_c are ignored; if f_{A+B} < 1, then there is an ε_a1.

34 Expected Measures of Risk and Relative Risk: Nested Case-Control Study
Exposed: diseased 500, not diseased 950; odds 0.526; risk (probability) 0.345
Not exposed: diseased 500, not diseased 2,950; odds 0.169; risk 0.145
Exposure ratio: 0.5 among diseased; 0.24 among not diseased
The disease odds and risks above differ from the total population.
Incorrect risk difference = 0.200; incorrect risk ratio (relative risk) = 2.38; odds ratio = 3.11, which is correct and approximately equal to the true relative risk

35 Epidemiologic design: sampling fraction. Case-cohort: sample differentially within diseased and within everyone (diseased + nondiseased), starting with a cross-sectional base, so exposure is measured prior to disease diagnosis. f_A = f_B = f_{A+B}; the whole cohort is sampled at f_{A+B+C+D}. Usually f_{A+B} = 1, while f_{A+B+C+D} is a sizeable fraction such as 0.1 or 0.25.

36 Case-Cohort Study, 1: Observed Cross-sectional Study
Exposed: diseased a = 500 + ε_a (f_A = 0.01); not diseased c = 9,500 + ε_c (f_C = 0.01)
Not exposed: diseased b = 500 + ε_b (f_B = 0.01); not diseased d = 29,500 - ε_abc (f_D = 0.01)
This is the previous cross-sectional example with sampling fractions increased by a factor of 10.

37 Case-Cohort Study, 2: Sampling from the cross-section
Exposed: diseased a/A = f_{A+B} = 1; cohort (part of all participants) (a+c)/(A+C) = f_{A+B+C+D}
Not exposed: diseased b/B = f_{A+B} = 1; cohort (b+d)/(B+D) = f_{A+B+C+D}
The sampling fraction is fixed within diseased and within the whole cohort; temporality is preserved; the cohort sample includes both cases and noncases. Risks estimated within exposed and within unexposed are each inflated by the same factor (f_{A+B}/f_{A+B+C+D}), so risk differences are biased but risk ratios are unbiased.

38 Observed Case-Cohort Study
Exposed: diseased a = 500 + ε_a (f_{A+B} = 1); cohort c = 1,000 + ε_c + ε_c1 (f_{A+B+C+D} = 0.1)
Not exposed: diseased b = 500 + ε_b (f_{A+B} = 1); cohort d = 3,000 - ε_abc - ε_c1 (f_{A+B+C+D} = 0.1)

39 Observed Case-Cohort Study, with the cohort split by disease status
Exposed: cases (f_{A+B} = 1) a = 500 + ε_a; cohort (f_{A+B+C+D} = 0.1) diseased 50 + ε_c + ε_c1, not diseased 950 + ε_c + ε_c3
Not exposed: cases b = 500 + ε_b; cohort diseased 50 + ε_c + ε_c2, not diseased 2,950 - ε_abc - ε_c123
When f_{A+B} = 1, the cohort diseased are a subset of the case diseased. When f_{A+B} < 1, the cohort diseased usually overlap the case diseased.

40 Nested Case-Control vs Case-Cohort: the same cases are used in both; for a certain sampling strategy, the same noncases are used in both; the analytic strategy differs.

41 Expected Measures of Risk and Relative Risk: Case-Cohort Study
Exposed: diseased 500, cohort 1,000; odds n/a; risk (probability) 0.5
Not exposed: diseased 500, cohort 3,000; odds n/a; risk 0.167
Exposure ratio: 0.5 among diseased; 0.25 in the cohort
The risks above differ from the total population.
Incorrect risk difference = 0.333 (true risk difference / f_{A+B+C+D}); "odds ratio" = correct risk ratio (relative risk) = 3
The relative risk would be correct even if the disease were not rare.

42 Analysis of Case-Cohort Study: set up the table as if the cohort were the control group; include the overlapping cases in both the cases and the cohort; compute ad/bc. You have estimated the relative risk. Note: if you know the cohort sampling fraction, you can multiply the cohort up and estimate true risks. Given the additional error in second-stage cohort sampling, this is less efficient than estimating the relative risk without upweighting.
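
The slide 42 recipe can be sketched in Python using the expected counts from slides 38 and 41. Helper names are hypothetical, and the upweighting step assumes the subcohort fraction (0.1 here, relative to the cross-sectional base) is known exactly.

```python
def case_cohort_rr(a, b, c, d):
    """Relative risk from a case-cohort table: cases (a, b) vs. subcohort (c, d).

    a, b = exposed / unexposed cases; c, d = exposed / unexposed subcohort
    members (the subcohort includes any cases sampled into it).
    """
    return (a * d) / (b * c)


def upweighted_risks(a, b, c, d, f_subcohort, f_cases=1.0):
    """Scale the subcohort by 1/f_subcohort to recover each exposure group's
    size, and the cases by 1/f_cases, then compute risks."""
    return (a / f_cases) / (c / f_subcohort), (b / f_cases) / (d / f_subcohort)


# Expected case-cohort table (f_{A+B} = 1, f_{A+B+C+D} = 0.1)
a, b, c, d = 500, 500, 1_000, 3_000
print("RR from ad/bc:", case_cohort_rr(a, b, c, d))
risk_exp, risk_unexp = upweighted_risks(a, b, c, d, f_subcohort=0.1)
print(f"upweighted risks: {risk_exp:.3f} exposed, {risk_unexp:.4f} unexposed")
```

The first line prints 3.0; the upweighted risks, 0.050 and 0.0167, are the risks in the cross-sectional base, which match the population risks on slide 9.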

43 Analysis of Case-Control and Case-Cohort Studies. Case-control: logistic regression; e^b is an odds ratio; temporal bias? Nested case-control: logistic regression; e^b is an odds ratio; no temporal bias. Case-cohort: logistic regression or linear regression; e^b is a relative risk, not an odds ratio; no temporal bias; the variance is somewhat high unless a robust variance estimate is used (e.g., PROC GENMOD with the GEE option).
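
With a single binary exposure, the maximum-likelihood logistic slope b equals log(ad/bc), so e^b reproduces the 2x2 ratio: an odds ratio for the case-control table and, for the case-cohort table, a relative risk. The pure-Python Newton-Raphson fit below is a hypothetical illustration using the expected tables from slides 27 and 38; it gives point estimates only and does not compute the robust variance the slide recommends.

```python
import math


def fit_logistic_binary_exposure(groups, iterations=25):
    """Newton-Raphson fit of logit P(y=1) = b0 + b1 * x for grouped data.

    groups: list of (x, n_y1, n_y0) tuples, one per exposure level x.
    Returns (b0, b1).
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # score vector
        i00 = i01 = i11 = 0.0    # Fisher information entries
        for x, n1, n0 in groups:
            n = n1 + n0
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = n1 - n * p
            w = n * p * (1.0 - p)
            g0 += resid
            g1 += resid * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        # Newton step: (b0, b1) += information^{-1} @ score
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (-i01 * g0 + i00 * g1) / det
    return b0, b1


# y = 1 for cases, y = 0 for the comparison group; x = 1 if exposed
case_control = [(1, 500, 494), (0, 500, 1534)]   # slide 27
case_cohort = [(1, 500, 1000), (0, 500, 3000)]   # slide 38

for label, data in [("case-control", case_control), ("case-cohort", case_cohort)]:
    _, b1 = fit_logistic_binary_exposure(data)
    print(f"{label:12s} e^b = {math.exp(b1):.3f}")
```

This prints e^b = 3.105 for the case-control table and 3.000 for the case-cohort table.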

44 Disadvantage of Case-Control vs. Case-Cohort Study. Case-control and nested case-control: inflexible, because the outcome is fixed; even in a nested case-control study the sampling structure is usually unknown. Case-cohort: ideal for the intended outcome or for multiple outcomes; if the cohort is large enough, multiple outcomes can be analyzed; cases can be included in the analysis of an alternate dependent variable because the sampling structure is known.

45 Person years of risk, 1: The foregoing assumes that all cases occur at the same time (or can safely be treated as such). In many, even most, studies this assumption is reasonable. Person years ≈ length of follow-up × number of participants when events are rare and/or all participants start follow-up at nearly the same time.

46 Person years of risk, 2: Even if 50% of participants have events and these occur on average somewhat after the midpoint of follow-up, person years > ¾ × length of follow-up × number of participants. Incidence density rates are somewhat higher than correspondingly scaled cumulative incidence rates, but relative risks are probably not much affected by computing incidence density instead of cumulative incidence.
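
A small worked check of slide 46, assuming no censoring and that events occur on average at a given fraction of the follow-up period (the helper names and the midpoint assumption are illustrative):

```python
def mean_followup_fraction(cum_incidence, mean_event_time=0.5):
    """Mean person-time per participant as a fraction of full follow-up T.

    Assumes no censoring: noncases contribute all of T, cases contribute
    mean_event_time * T on average.
    """
    return (1.0 - cum_incidence) + cum_incidence * mean_event_time


def incidence_density(cum_incidence, mean_event_time=0.5):
    """Events per unit of person-time (with T = 1) implied by a cumulative incidence."""
    return cum_incidence / mean_followup_fraction(cum_incidence, mean_event_time)


# 50% events occurring, on average, at the midpoint of follow-up
print(mean_followup_fraction(0.5))

# Minnesota risks from slide 9 (0.05 exposed, 1/60 unexposed)
ci_ratio = 0.05 / (1 / 60)
id_ratio = incidence_density(0.05) / incidence_density(1 / 60)
print(f"cumulative incidence ratio {ci_ratio:.2f}, incidence density ratio {id_ratio:.2f}")
```

The first print gives 0.75, the ¾ bound on the slide; events later than the midpoint only raise it. For the Minnesota risks the cumulative incidence ratio is 3.00 and the incidence density ratio about 3.05.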

47 Person years of risk, 3: Proportional hazards models do not allow time dependency in prediction, so most analyses do not consider follow-up time in this way. The timing of events vs. censoring and competing risks may cause incidence density and cumulative incidence to give different findings, but this would be rare. Subgroups with very different follow-up times could create problems, but this is also rare.

48 In prospective studies events accumulate over time, so incidence density methods can be applied in both designs: nested case-control and case-cohort.

49 Nested Case-Control Study (1): Consider the following hypothetical cohort. [Figure: follow-up lines plotted against time, with cases (X) at times t1, t2, and t3 and losses to follow-up (O).] X = lung cancer case; O = loss to follow-up.

50 Nested Case-Control Study (2): At time t1 the first case occurs, for which 8 eligible controls are identified. Similarly, there are 5 eligible controls for the case at time t2 and 4 eligible controls for the case occurring at time t3. A control can become a case at a later time (e.g., the cases at t2 and t3 serve as controls for the case at t1). Controls can be selected randomly from all eligible controls (i.e., 1 or more controls for each case). The number of eligible controls decreases as the number of matching factors increases.
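
Risk-set (incidence density) sampling of controls as described on slide 50 can be sketched as follows. The cohort is hypothetical, built only so that its risk sets have 8, 5, and 4 eligible controls at t1, t2, and t3; the `Subject` class and helper names are illustrative. A subject counts as at risk if their exit time is on or after the case's event time.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    sid: str
    exit_time: float  # time of the event, loss to follow-up, or end of study
    is_case: bool


def eligible_controls(cohort, case):
    """Risk set at the case's event time, minus the case itself.

    Anyone still under follow-up at that time qualifies, including people
    who become cases later.
    """
    return [s for s in cohort if s.sid != case.sid and s.exit_time >= case.exit_time]


def sample_nested_case_control(cohort, controls_per_case, rng):
    """Return (case, sampled controls) pairs in order of event time."""
    matched_sets = []
    cases = sorted((s for s in cohort if s.is_case), key=lambda s: s.exit_time)
    for case in cases:
        pool = eligible_controls(cohort, case)
        k = min(controls_per_case, len(pool))
        matched_sets.append((case, rng.sample(pool, k)))
    return matched_sets


# Hypothetical cohort: cases at t1=1, t2=2, t3=3; two losses to follow-up
# between t1 and t2; four people followed to the end of study at t=4.
cohort = (
    [Subject("case_t1", 1.0, True), Subject("case_t2", 2.0, True), Subject("case_t3", 3.0, True)]
    + [Subject("lost_1", 1.4, False), Subject("lost_2", 1.7, False)]
    + [Subject(f"noncase_{i}", 4.0, False) for i in range(1, 5)]
)

for case, controls in sample_nested_case_control(cohort, 2, random.Random(1)):
    n_eligible = len(eligible_controls(cohort, case))
    print(case.sid, n_eligible, "eligible ->", [s.sid for s in controls])
```

Because the case at t2 is still under follow-up at t1, it can appear among the controls sampled for the first case.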

51 Case-Cohort Study (1): Consider the following hypothetical cohort. [Figure: follow-up lines plotted against time from a common start t0, with cases (X) and losses to follow-up (O).] X = lung cancer case; O = loss to follow-up.

52 Case-Cohort Study (2): In a closed cohort (here, when everybody enters the cohort at t0), a sample of all subjects (the “sub-cohort”) is randomly selected from cohort members at the start of follow-up, t0. In an open cohort (i.e., when time of entry into the cohort varies), a sample of all subjects (the “sub-cohort”) is randomly selected from members of the cohort as it is followed over time (i.e., regardless of when subjects entered the cohort).
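
A matching sketch for closed-cohort case-cohort selection: the subcohort is drawn at t0 without regard to later case status, and all cases are then added. The cohort size, case count, subcohort fraction, and helper name are hypothetical, and the open-cohort variant is not shown.

```python
import random


def case_cohort_sample(cohort_ids, case_ids, subcohort_fraction, rng):
    """Closed-cohort case-cohort selection at t0.

    A simple random subcohort is drawn from everyone at the start of follow-up;
    every case is then added. Cases that fall in the subcohort belong to both.
    """
    m = round(subcohort_fraction * len(cohort_ids))
    subcohort = set(rng.sample(cohort_ids, m))
    analysis_set = subcohort | set(case_ids)
    return subcohort, analysis_set


# Hypothetical closed cohort of 1,000 people, of whom 40 become cases
cohort_ids = list(range(1_000))
case_ids = list(range(40))
subcohort, analysis_set = case_cohort_sample(cohort_ids, case_ids, 0.1, random.Random(52))
overlap = subcohort & set(case_ids)
print(len(subcohort), "subcohort members,", len(overlap), "of them also cases;",
      len(analysis_set), "people need exposure data")
```

Cases that land in the subcohort are counted in both groups, which is the overlap slide 42 tells the analyst to keep.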

53 Merits of time-based selection: From a black-and-white theoretical perspective, time-based selection makes sense: a person is a noncase until a certain time, then becomes a case. In the life table approach, consistent with this thought, the risk set at any time point is the cases and noncases at that time point. However, much chronic disease develops slowly, and the black-white, case-noncase formulation does not apply very well. I am generally unenthusiastic about time-based selection of controls or noncases because I would like to maintain maximum separation between cases and noncases.

54 Analysis of events that evolve in time: Nevertheless, taking person years into account (incidence density analysis) is more precise than analysis of cumulative incidence. Analysis is therefore by Cox proportional hazards life table regression methods, or some similar technique. The nested case-control method is not easily adapted to this type of analysis. The case-cohort method is easily analyzed this way (PROC PHREG, or GENMOD with Poisson regression), but the variances of the slopes tend to be too large. Robust variance estimation is possible in GENMOD with the GEE option; Barlow provides a SAS macro for use with PHREG.

