Epidemiologic design from a sampling perspective Epidemiology II Lecture April 14, 2005 David Jacobs.



Why different epidemiologic designs?
- It is generally not possible to observe everyone in a population
- New questions arise after data / samples have been collected
- Cost and feasibility
- Statistical efficiency and appropriateness to the study question

The possibilities
There are many approaches:
- Sampling from the whole population
- Sampling from exposure
- Sampling from caseness
- Haphazard selection

True Population Configuration Underlying Epidemiologic Study Designs
Follow-up runs from Time 0 to Time 1.
- Exposed, N = A + C: Diseased N = A; Nondiseased N = C
- Unexposed, N = B + D: Diseased N = B; Nondiseased N = D

True Population Configuration Underlying Epidemiologic Study Designs: Approximate numbers for Minnesota
Follow-up runs from Time 0 to Time 1.
- Exposed, N = 1,000,000: Diseased N = 50,000; Nondiseased N = 950,000
- Unexposed, N = 3,000,000: Diseased N = 50,000; Nondiseased N = 2,950,000

Alternate Format: Population Values, Exposed and Diseased
              Diseased   Not diseased
Exposed       A          C
Not exposed   B          D
The numbers A, B, C, D are fixed and known exactly.

Alternate Format: Population Values, Exposed and Diseased
              Diseased     Not diseased
Exposed       A = 50,000   C = 950,000
Not exposed   B = 50,000   D = 2,950,000
The numbers A, B, C, D are fixed and known exactly.

Measures of Risk and Relative Risk: Whole Population
                 Diseased   Not diseased   Odds   Risk = Probability
Exposed          A          C              A/C    A/(A+C)
Not exposed      B          D              B/D    B/(B+D)
Exposure Ratio   A/(A+B)    C/(C+D)
- Risk Difference = A/(A+C) - B/(B+D)
- Risk Ratio, Relative Risk = {A/(A+C)} / {B/(B+D)}
- Odds Ratio = (A/C) / (B/D) = AD/BC

Measures of Risk and Relative Risk: Whole Population
                 Diseased   Not diseased   Odds     Risk = Probability
Exposed          50,000     950,000        0.0526   0.05
Not exposed      50,000     2,950,000      0.0169   0.0167
Exposure Ratio   0.5        0.244
- Risk Difference = 0.0333
- Risk Ratio, Relative Risk = 3
- Odds Ratio = 3.11
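The whole-population measures can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original lecture; the function name `measures` and the dictionary keys are illustrative choices, while the cell counts and formulas are the ones given on the slides.

```python
# Population 2x2 table from the slides (approximate numbers for Minnesota).
A, C = 50_000, 950_000      # exposed: diseased, not diseased
B, D = 50_000, 2_950_000    # not exposed: diseased, not diseased


def measures(a, b, c, d):
    """Odds, risks, and association measures for a 2x2 table.

    a, c are the exposed row (diseased, not diseased);
    b, d are the not-exposed row, following the slides' layout.
    """
    risk_exposed = a / (a + c)
    risk_unexposed = b / (b + d)
    return {
        "odds_exposed": a / c,
        "odds_unexposed": b / d,
        "risk_exposed": risk_exposed,
        "risk_unexposed": risk_unexposed,
        "exposure_ratio_diseased": a / (a + b),
        "exposure_ratio_not_diseased": c / (c + d),
        "risk_difference": risk_exposed - risk_unexposed,
        "risk_ratio": risk_exposed / risk_unexposed,
        "odds_ratio": (a * d) / (b * c),
    }


population = measures(A, B, C, D)
for name, value in population.items():
    print(f"{name}: {value:.4f}")
```

The same function can be applied to the sampled tables on later slides to see which measures survive a given sampling scheme.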

- Epidemiologic studies sample from A, B, C, and D to estimate Odds or Risk; Risk Difference, Risk Ratio, or Relative Risk
- Epidemiologic design is determined by investigator control, temporality, and sampling fraction

Epidemiologic design: level of investigator control
- Clinical Trial: exposure assigned (at random); reflects temporary state
- Observational: exposure occurs naturally; often reflects long-term state

Epidemiologic design: temporality
- Clinical Trial, Cohort, Nested case control, Case-cohort: exposure assessed at variable times before disease
- Cross-sectional: exposure assessed simultaneously with disease
- Case-control: past exposure assessed simultaneously with disease

Epidemiologic design: sampling fraction
- Cells A, B, C, and D are sampled at random with constant probability (called the sampling fraction)
- Sample size is a, b, c, d
- If a/A = b/B = c/C = d/D then the sampling fraction is equal for all cells

Sampling fractions
              Diseased    Not diseased
Exposed       a/A = f_A   c/C = f_C
Not exposed   b/B = f_B   d/D = f_D
The numbers A, B, C, D are fixed and known exactly. The numbers a, b, c, d are realized in a given study, determined during the study.

Expected observations given sampling fractions
              Diseased                  Not diseased
Exposed       a = 5,000 (f_A = 0.1)     c = 950 (f_C = 0.001)
Not exposed   b = 1,250 (f_B = 0.025)   d = 5,900 (f_D = 0.002)
- Naïve (and wrong) risks: 5,000/5,950 = 0.84 and 1,250/7,150 = 0.175; naïve relative risk = 4.8
- Correct risks: (5,000/0.1) / (5,000/0.1 + 950/0.001) = 0.05 and (1,250/0.025) / (1,250/0.025 + 5,900/0.002) = 0.0167, leading to relative risk = 0.05/0.0167 = 3
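The correction on this slide amounts to dividing each sampled cell by its sampling fraction before computing risks. A minimal sketch of that step follows, assuming the sampling fractions are known exactly as on the slide; the variable names are illustrative, not from the lecture.

```python
# Sampled counts and cell-specific sampling fractions from the slide.
sample = {"a": 5_000, "b": 1_250, "c": 950, "d": 5_900}
fraction = {"a": 0.1, "b": 0.025, "c": 0.001, "d": 0.002}

# Naive risks ignore the unequal sampling fractions.
naive_risk_exposed = sample["a"] / (sample["a"] + sample["c"])
naive_risk_unexposed = sample["b"] / (sample["b"] + sample["d"])
naive_rr = naive_risk_exposed / naive_risk_unexposed

# Corrected analysis: scale each cell back up to its population size.
estimated = {cell: sample[cell] / fraction[cell] for cell in sample}
risk_exposed = estimated["a"] / (estimated["a"] + estimated["c"])
risk_unexposed = estimated["b"] / (estimated["b"] + estimated["d"])
corrected_rr = risk_exposed / risk_unexposed

print(f"naive relative risk:     {naive_rr:.2f}")
print(f"corrected relative risk: {corrected_rr:.2f}")
```

With expected counts the scaled-up cells reproduce A, B, C, and D, so the corrected risks equal the population risks.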

Observations given sampling fractions
              Diseased                        Not diseased
Exposed       a = 5,000 + ε_a (f_A = 0.1)     c = 950 + ε_c (f_C = 0.001)
Not exposed   b = 1,250 + ε_b (f_B = 0.025)   d = 5,900 + ε_d (f_D = 0.002)
All estimates differ from population values by random amounts (see example in Excel file).

Epidemiologic design: sampling fraction
- Cross-sectional: sample equally from everyone, f_A = f_B = f_C = f_D
- Clinical trial, Cohort study: sample equally from initial exposure groups, f_{A+C} and f_{B+D}
- (a+c)/(A+C) usually differs from (b+d)/(B+D) in a clinical trial; the two are usually the same in a cohort study

Cross-sectional Study
              Diseased   Not diseased
Exposed       a/A = f    c/C = f
Not exposed   b/B = f    d/D = f
Sampling fraction is the same in all cells. Risk and odds estimates are unbiased, so risk differences and ratios are unbiased.

Expected Cross-sectional Study
              Diseased               Not diseased
Exposed       a = 50 (f_A = 0.001)   c = 950 (f_C = 0.001)
Not exposed   b = 50 (f_B = 0.001)   d = 2,950 (f_D = 0.001)
Naïve risks and relative risks are correct! 50/1,000 = 0.05, etc.

Observed Cross-sectional Study
              Diseased                     Not diseased
Exposed       a = 50 + ε_a (f_A = 0.001)   c = 950 + ε_c (f_C = 0.001)
Not exposed   b = 50 + ε_b (f_B = 0.001)   d = 2,950 - ε_abc (f_D = 0.001)
All estimates differ from population values by random amounts.
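The random amounts can be illustrated by drawing a few cross-sectional samples. This is a rough sketch under stated assumptions rather than the Excel example the lecture refers to: the total sample size is fixed at 4,000 (sampling fraction 0.001), and each participant is drawn with replacement in proportion to the population cell sizes, which closely approximates simple random sampling from 4,000,000 people. The function name and the seed are arbitrary.

```python
import random

cells = ["a", "b", "c", "d"]
population = [50_000, 50_000, 950_000, 2_950_000]   # A, B, C, D


def draw_cross_section(n, rng):
    """Draw n participants, each landing in a cell with probability cell/N."""
    draws = rng.choices(cells, weights=population, k=n)
    return {cell: draws.count(cell) for cell in cells}


def relative_risk(s):
    """Naive relative risk, which is valid for a cross-sectional sample."""
    return (s["a"] / (s["a"] + s["c"])) / (s["b"] / (s["b"] + s["d"]))


rng = random.Random(2005)   # arbitrary seed, only for reproducibility
for _ in range(3):
    study = draw_cross_section(4_000, rng)
    print(study, f"relative risk = {relative_risk(study):.2f}")
```

Each replicate gives counts near 50, 50, 950, and 2,950 and a relative risk that scatters around 3.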

Clinical Trial or Cohort Study
              Diseased          Not diseased
Exposed       a/A = f_{A+C}     c/C = f_{A+C}
Not exposed   b/B = f_{B+D}     d/D = f_{B+D}
- Sampling fraction is fixed within exposed and within not exposed.
- Usually f_{A+C} ≠ f_{B+D} in a clinical trial; f_{A+C} = f_{B+D} in a cohort study (which has a cross-sectional baseline).
- Risk and odds estimates are unbiased, so risk differences and ratios are unbiased.

Expected Clinical Trial or Cohort Study
              Diseased                     Not diseased
Exposed       a = 100 (f_{A+C} = 0.002)    c = 1,900 (f_{A+C} = 0.002)
Not exposed   b = 50 (f_{B+D} = 0.001)     d = 2,950 (f_{B+D} = 0.001)
- f_{A+C} usually = f_{B+D} in a cohort study
- f_{A+C} may differ from f_{B+D} in a clinical trial (if treatment allocation is not 1:1)

Expected Measures of Risk and Relative Risk: Clinical Trial or Cohort Study
                 Diseased   Not diseased   Odds     Risk = Probability
Exposed          100        1,900          0.0526   0.05
Not exposed      50         2,950          0.0169   0.0167
Exposure Ratio   0.667      0.392
- Exposure ratio differs from total population
- Correct Risk Difference = 0.0333
- Correct Risk Ratio, Relative Risk = 3
- Odds Ratio = 3.11

Observed Clinical Trial or Cohort Study
              Diseased                          Not diseased
Exposed       a = 100 + ε_a (f_{A+C} = 0.002)   c = 1,900 - ε_a (f_{A+C} = 0.002)
Not exposed   b = 50 + ε_b (f_{B+D} = 0.001)    d = 2,950 - ε_b (f_{B+D} = 0.001)
All estimates differ from population values by random amounts.

Epidemiologic design: sampling fraction
Case control: sample differentially within diseased and within nondiseased
- f_A = f_B = f_{A+B} and f_C = f_D = f_{C+D}
- Usually f_{A+B} is much greater than f_{C+D}

Case-control Study
              Diseased          Not diseased
Exposed       a/A = f_{A+B}     c/C = f_{C+D}
Not exposed   b/B = f_{A+B}     d/D = f_{C+D}
- Sampling fraction is fixed within diseased and within not diseased.
- Exposure probabilities and odds estimates are unbiased, but risk, disease odds, risk differences and ratios are biased.
- Odds ratio ~ relative risk when disease is rare.

Expected Case-control Study
              Diseased                     Not diseased
Exposed       a = 500 (f_{A+B} = 0.01)     c = 494 (f_{C+D} ≈ 0.00052)
Not exposed   b = 500 (f_{A+B} = 0.01)     d = 1,534 (f_{C+D} ≈ 0.00052)
f_{A+B} ≈ 19 × f_{C+D}

Expected Measures of Risk and Relative Risk: Case-Control Study
                 Diseased   Not diseased   Odds    Risk = Probability
Exposed          500        494            1.012   0.503
Not exposed      500        1,534          0.326   0.246
Exposure Ratio   0.5        0.244
- Odds and risk differ from total population
- Incorrect Risk Difference = 0.257
- Incorrect Risk Ratio, Relative Risk = 2.05
- Odds Ratio = 3.11 (correct, and approximately equal to the true relative risk)
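Why the odds ratio survives case-control sampling while the risk ratio does not can be seen by applying the two sampling fractions to the population table. A minimal sketch follows; the control sampling fraction is not legible on the slide, so it is taken here as 494/950,000 (about 0.00052), the value implied by the slide's c = 494, and should be read as an assumption.

```python
# Population cells from the slides.
A, C = 50_000, 950_000      # exposed: diseased, not diseased
B, D = 50_000, 2_950_000    # not exposed: diseased, not diseased

f_cases = 0.01              # f_{A+B}, as on the slide
f_controls = 494 / C        # f_{C+D}, assumed from the slide's c = 494

# Expected case-control table.
a, b = A * f_cases, B * f_cases
c, d = C * f_controls, D * f_controls

naive_rr = (a / (a + c)) / (b / (b + d))
odds_ratio = (a * d) / (b * c)
population_or = (A * D) / (B * C)

print(f"expected cells: a={a:.0f}, b={b:.0f}, c={c:.0f}, d={d:.0f}")
print(f"naive risk ratio: {naive_rr:.3f}")
print(f"odds ratio:       {odds_ratio:.3f} (population: {population_or:.3f})")
```

The sampling fractions cancel in the cross-product ad/bc, so the odds ratio equals the population value for any choice of f_cases and f_controls; the naive risk ratio depends on both.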

Observed Case-control Study
              Diseased                         Not diseased
Exposed       a = 500 + ε_a (f_{A+B} = 0.01)   c = 494 + ε_c (f_{C+D} ≈ 0.00052)
Not exposed   b = 500 - ε_a (f_{A+B} = 0.01)   d = 1,534 - ε_c (f_{C+D} ≈ 0.00052)

Epidemiologic design: sampling fraction
Nested case control: sample differentially within diseased and within nondiseased, starting with a cross-sectional base, so exposure is measured prior to disease diagnosis
- f_A = f_B = f_{A+B} and f_C = f_D = f_{C+D}
- Often f_{A+B} = 1
- Usually f_{A+B} is somewhat greater than f_{C+D}

Nested Case-Control Study, 1: Observed Cross-sectional Study
              Diseased                     Not diseased
Exposed       a = 500 + ε_a (f_A = 0.01)   c = 9,500 + ε_c (f_C = 0.01)
Not exposed   b = 500 + ε_b (f_B = 0.01)   d = 29,500 - ε_abc (f_D = 0.01)
Previous cross-sectional example with sampling fractions increased by a factor of 10.

Nested Case-Control Study, 2: Sampling from the cross-section
              Diseased          Not diseased
Exposed       a/A = f_{A+B}     c/C = f_{C+D}
Not exposed   b/B = f_{A+B}     d/D = f_{C+D}
- Sampling fraction is fixed within diseased and within not diseased; temporality preserved.
- Exposure probabilities and odds estimates are unbiased, but risk, disease odds, risk differences and ratios are biased.
- Odds ratio ~ relative risk when disease is rare.

Observed Nested Case-Control Study
              Diseased                      Not diseased
Exposed       a = 500 + ε_a (f_{A+B} = 1)   c = 950 + ε_c + ε_c1 (f_{C+D} = 0.01)
Not exposed   b = 500 + ε_b (f_{A+B} = 1)   d = 2,950 - ε_abc - ε_c1 (f_{C+D} = 0.01)
ε_a, ε_b, ε_c are ignored; if f_{A+B} < 1 then there is an ε_a1.

Expected Measures of Risk and Relative Risk: Nested Case-Control Study
                 Diseased   Not diseased   Odds    Risk = Probability
Exposed          500        950            0.526   0.345
Not exposed      500        2,950          0.169   0.145
Exposure Ratio   0.5        0.244
- Odds and risk differ from total population
- Incorrect Risk Difference = 0.200
- Incorrect Risk Ratio, Relative Risk = 2.38
- Odds Ratio = 3.11 (correct, and approximately equal to the true relative risk)

Epidemiologic design: sampling fraction
Case cohort: sample differentially within diseased and within everyone (diseased + nondiseased), starting with a cross-sectional base, so exposure is measured prior to disease diagnosis
- f_A = f_B = f_{A+B}; the whole cohort is sampled at f_{A+B+C+D}
- Usually f_{A+B} = 1, while f_{A+B+C+D} is a sizeable fraction like 0.1 or 0.25

Case-Cohort Study, 1: Observed Cross-sectional Study
              Diseased                     Not diseased
Exposed       a = 500 + ε_a (f_A = 0.01)   c = 9,500 + ε_c (f_C = 0.01)
Not exposed   b = 500 + ε_b (f_B = 0.01)   d = 29,500 - ε_abc (f_D = 0.01)
Previous cross-sectional example with sampling fractions increased by a factor of 10.

Case-Cohort Study, 2: sampling from the cross-section
              Diseased             Cohort (part of all participants)
Exposed       A/A, f_{A+B} = 1     (a+c)/(A+C) = f_{A+B+C+D}
Not exposed   B/B, f_{A+B} = 1     (b+d)/(B+D) = f_{A+B+C+D}
- Sampling fraction is fixed within diseased and within the cohort; temporality preserved; the cohort includes cases and noncases.
- Risk and odds estimates are unbiased within exposed and within unexposed but differently weighted, so risk differences are biased.
- Risk ratios are unbiased.

Observed Case-Cohort Study
              Diseased                      Cohort
Exposed       a = 500 + ε_a (f_{A+B} = 1)   c = 1,000 + ε_c + ε_c1 (f_{A+B+C+D} = 0.1)
Not exposed   b = 500 + ε_b (f_{A+B} = 1)   d = 3,000 - ε_abc - ε_c1 (f_{A+B+C+D} = 0.1)

Observed Case-Cohort Study
              Case, f_{A+B} = 1   Cohort, f_{A+B+C+D} = 0.1
                                  Diseased             Not diseased
Exposed       a = 500 + ε_a       50 + ε_c + ε_c1      950 + ε_c + ε_c3
Not exposed   b = 500 + ε_b       50 + ε_c + ε_c2      2,950 - ε_abc - ε_c123
- When f_{A+B} = 1, cohort diseased is a subset of case diseased.
- When f_{A+B} < 1, cohort diseased usually overlaps case diseased.

Nested Case-Control vs Case Cohort
- Same cases in both
- For a certain sampling strategy, same noncases in both
- Analytic strategy different

Expected Measures of Risk and Relative Risk: Case-Cohort Study
                 Diseased   Cohort   Odds   Risk = Probability
Exposed          500        1,000    n/a    0.5
Not exposed      500        3,000    n/a    0.167
Exposure Ratio   0.5        0.25
- Exposure ratio and risks differ from total population
- Incorrect Risk Difference = 0.333 (true risk difference / f_{A+B+C+D})
- "Odds ratio" = Correct Risk Ratio, Relative Risk = 3
- Relative risk would be correct even if the disease were not rare

Analysis of Case Cohort Study
- Set up the table as if the cohort were the control group
- Include the overlapping cases in both cases and cohort
- Compute ad/bc
- You have estimated relative risk
Note:
- If you know the cohort sampling fraction, you can multiply the cohort up and estimate true risks
- Given additional error in second-stage cohort sampling, this is less efficient than estimating relative risk without upweighting
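The recipe can be traced with the expected counts from the case-cohort slides. This is a minimal sketch using expected values only; the variable names are illustrative, and the cohort sampling fraction of 0.1 is the one stated on the slides.

```python
# Expected case-cohort counts from the slides: all cases in the cross-section,
# plus a 10% sub-cohort that includes its own share of the cases.
cases_exposed, cases_unexposed = 500, 500          # a, b
cohort_exposed, cohort_unexposed = 1_000, 3_000    # sub-cohort, cases included
f_cohort = 0.1                                     # f_{A+B+C+D}

# Treat the sub-cohort as the "control" column and compute ad/bc.
# Because the sub-cohort estimates the denominators (A+C) and (B+D) up to the
# common factor f_cohort, the cross-product is a risk ratio, not an odds ratio.
relative_risk = (cases_exposed * cohort_unexposed) / (cases_unexposed * cohort_exposed)

# Optional: with a known sampling fraction, scale the sub-cohort up to recover risks.
risk_exposed = cases_exposed / (cohort_exposed / f_cohort)
risk_unexposed = cases_unexposed / (cohort_unexposed / f_cohort)

print(f"relative risk (ad/bc): {relative_risk:.2f}")
print(f"risks after upweighting: {risk_exposed:.4f}, {risk_unexposed:.4f}")
```

The upweighted risks match the population risks of 0.05 and 0.0167, and their ratio is the same relative risk obtained from ad/bc directly.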

Analysis of Case Control and Case Cohort Study
Case Control
- Logistic regression
- e^b is an odds ratio
- Temporal bias?
Nested Case Control
- Logistic regression
- e^b is an odds ratio
- No temporal bias
Case Cohort
- Logistic regression or linear regression
- e^b is a relative risk, not an odds ratio
- No temporal bias
- Variance somewhat high unless a robust variance estimate is used (e.g. PROC GENMOD with GEE option)

Disadvantage of Case Control vs. Case Cohort Study
Case Control and Nested Case Control
- Inflexible: outcome is fixed
- Even in a nested case control study the sampling structure is usually unknown
Case Cohort
- Ideal for the intended outcome or for multiple outcomes
- If the cohort is large enough, multiple outcomes can be analyzed
- Cases can be included in analysis of an alternate dependent variable because the sampling structure is known

Person years of risk, 1
- The foregoing assumes that all cases occur at the same time (or can safely be treated as such)
- In many, even most studies, this assumption is reasonable
- Person years ≈ length of followup × number of participants when events are rare and/or all participants start followup at nearly the same time

Person years of risk, 2
- Even if there are 50% events and they occur on average somewhat after the midpoint of followup, person years > ¾ × length of followup × number of participants
- Incidence density rates are somewhat higher than correspondingly scaled cumulative incidence rates, but relative risks are probably not much affected by computation of incidence density vs cumulative incidence
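The ¾ bound follows from simple bookkeeping: the half without events contributes full followup, and the half with events contributes more than half of it on average. A small sketch with made-up numbers (1,000 participants, 10 years of followup, mean event time at 55% of followup; all three are assumptions chosen for illustration, and people without events are assumed to be followed to the end) shows the arithmetic.

```python
def person_years(n, follow_up, event_fraction, mean_event_time):
    """Total person-years when participants without events complete follow-up."""
    events = n * event_fraction
    return events * mean_event_time + (n - events) * follow_up


n, follow_up = 1_000, 10.0          # hypothetical cohort size and years
event_fraction = 0.5                # 50% of participants have an event
mean_event_time = 0.55 * follow_up  # events occur somewhat after the midpoint

total = person_years(n, follow_up, event_fraction, mean_event_time)
share_of_maximum = total / (n * follow_up)

# Incidence density (events per person-year) versus cumulative incidence
# scaled to a per-year figure by dividing by the length of follow-up.
incidence_density = (n * event_fraction) / total
scaled_cumulative_incidence = event_fraction / follow_up

print(f"person-years: {total:.0f} ({share_of_maximum:.3f} of n x follow-up)")
print(f"incidence density:           {incidence_density:.4f} per year")
print(f"scaled cumulative incidence: {scaled_cumulative_incidence:.4f} per year")
```

In this example the person-years come to 77.5% of the maximum, and the incidence density exceeds the scaled cumulative incidence, as the slide describes.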

Person years of risk, 3
- Proportional hazards models do not allow time dependency in prediction, so most analyses are not considering followup time in this way.
- The timing of events vs. censoring and competing risk may cause differences in findings for incidence density vs. cumulative incidence, but this would be rare.
- Subgroups with very different followup times could create problems, but this is rare.

In prospective studies, events are accumulated over time, so incidence density methods can be applied:
- Nested case control
- Case cohort

Nested Case-Control Study (1)
Consider the following hypothetical cohort:
[Figure: follow-up of cohort members along a time axis, with cases occurring at times t1, t2, and t3. X = lung cancer case, O = loss to follow-up]

Nested Case-Control Study (2)
- At time t1 the first case occurs, for which 8 eligible controls are identified
- Similarly, there are 5 eligible controls for the case at time t2, and 4 eligible controls for the case occurring at time t3
- A control can become a case at a later time (e.g., cases at t2 and t3 serve as controls for the case at t1)
- Controls can be selected randomly from all eligible controls (i.e., 1 or more controls for each case)
- Number of eligible controls decreases with increasing number of matching factors
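Selecting controls from everyone still under follow-up at each case's event time (often called risk-set or incidence density sampling) can be sketched as follows. The cohort below is invented so that the numbers of eligible controls come out as 8, 5, and 4, matching the slide; it is not a reconstruction of the lecture's figure, and the function names and seed are arbitrary.

```python
import random

# Hypothetical cohort: (subject id, time of exit from follow-up, reason).
cohort = [
    (1, 2.0, "case"),                     # case at t1
    (2, 3.0, "lost"),
    (3, 4.0, "lost"),
    (4, 5.0, "case"),                     # case at t2
    (5, 7.0, "case"),                     # case at t3
    (6, 10.0, "end"), (7, 10.0, "end"),
    (8, 10.0, "end"), (9, 10.0, "end"),
]


def eligible_controls(cohort, case_time):
    """Subjects still under follow-up (and not yet cases) after case_time."""
    return [sid for sid, exit_time, _ in cohort if exit_time > case_time]


def sample_risk_sets(cohort, controls_per_case, rng):
    """For each case, in time order, draw controls at random from its risk set."""
    matched = {}
    cases = sorted((t, sid) for sid, t, reason in cohort if reason == "case")
    for case_time, case_id in cases:
        pool = eligible_controls(cohort, case_time)
        matched[case_id] = rng.sample(pool, min(controls_per_case, len(pool)))
    return matched


eligible_counts = [len(eligible_controls(cohort, t))
                   for _, t, reason in cohort if reason == "case"]
print(eligible_counts)   # [8, 5, 4]

matched_sets = sample_risk_sets(cohort, controls_per_case=2, rng=random.Random(0))
print(matched_sets)
```

Subjects 4 and 5 are eligible as controls for the case at t1 even though they later become cases themselves, which is the point made in the third bullet.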

Case-Cohort Study (1)
Consider the following hypothetical cohort:
[Figure: follow-up of cohort members along a time axis starting at t0. X = lung cancer case, O = loss to follow-up]

Case-Cohort Study (2)
- In a closed cohort (in this case, when everybody enters the cohort at t0), a sample of all subjects ("sub-cohort") is randomly selected from cohort members at the start of follow-up, t0
- In an open cohort (i.e., when time of entry into the cohort is variable), a sample of all subjects ("sub-cohort") is randomly selected from members of the cohort as it is followed over time (i.e., regardless of when subjects entered the cohort)
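For the closed-cohort case, sub-cohort selection is a single random draw at baseline, made without regard to who later becomes a case. The sketch below uses a made-up cohort of 40 people with four cases; the function name, the ids, the sampling fraction of 0.25, and the seed are all illustrative assumptions.

```python
import random


def select_case_cohort(subject_ids, case_ids, fraction, rng):
    """Draw a random sub-cohort at baseline, then add every case to the analysis."""
    ids = sorted(subject_ids)
    size = round(len(ids) * fraction)
    subcohort = set(rng.sample(ids, size))
    cases = set(case_ids)
    return {
        "subcohort": subcohort,                  # includes any cases drawn by chance
        "cases_outside": cases - subcohort,      # cases added from the rest of the cohort
        "analysis_set": subcohort | cases,
    }


subjects = range(1, 41)       # hypothetical closed cohort, everyone enters at t0
cases = [3, 11, 27, 35]       # hypothetical lung cancer cases during follow-up
design = select_case_cohort(subjects, cases, fraction=0.25, rng=random.Random(14))

print(sorted(design["subcohort"]))
print(sorted(design["cases_outside"]))
```

Because the sub-cohort is drawn independently of outcome, the same sub-cohort could serve as the comparison group for a different set of cases.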

Merits of time-based selection
- From a black and white theoretical perspective, time-based selection makes sense: a person is a noncase until a certain time, then becomes a case. In the life table approach, consistent with this thought, the risk set at any time point is cases and noncases at that time point.
- However, much chronic disease develops slowly and the black-white, case-noncase formulation does not apply very well.
- I am unenthusiastic generally about time-based selection of controls or noncases because I would like to maintain maximum separation between cases and noncases.

Analysis of events that evolve in time
- Nevertheless, taking person years into account (incidence density analysis) is more precise than is analysis of cumulative incidence.
- Analysis is therefore by Cox proportional hazards life table regression methods, or some similar technique.
- The nested case-control method is not easily adapted to this type of analysis.
- The case-cohort method is easily analyzed this way (PROC PHREG or GENMOD with Poisson regression), but the variances of the slopes tend to be too large. Robust variance estimation is possible in GENMOD with the GEE option; Barlow provides a SAS macro for use of PHREG.