1 Borgan and Henderson: Event History Methodology Lancaster, September 2006 Session 8.1: Cohort sampling for the Cox model.


1 Borgan and Henderson: Event History Methodology Lancaster, September 2006 Session 8.1: Cohort sampling for the Cox model

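2 Relative risk regression models
Hazard rate for individual i: $\alpha_i(t) = \alpha_0(t)\, r(\beta, x_i(t))$, where $\alpha_0(t)$ is the baseline hazard and $r(\beta, x_i(t))$ is the relative risk (hazard ratio).
The relative risk for individual i depends on covariates $x_i = (x_{i1}, x_{i2}, \dots, x_{ip})$, which may be time-dependent.
Cox: $r(\beta, x_i) = \exp(\beta^T x_i)$
Excess relative risk: $r(\beta, x_i) = \prod_{j=1}^{p} (1 + \beta_j x_{ij})$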
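
The two relative risk functions named on this slide can be written out in a few lines. This is a minimal sketch, assuming the product form $\prod_j (1 + \beta_j x_j)$ for the excess relative risk model (the slide's own formula is not preserved in the transcript); the function names are illustrative and not taken from the course material.

```python
import math


def cox_relative_risk(beta, x):
    """Cox (log-linear) relative risk: exp(beta . x)."""
    return math.exp(sum(b * xj for b, xj in zip(beta, x)))


def excess_relative_risk(beta, x):
    """Excess relative risk, assumed here to be prod_j (1 + beta_j * x_j)."""
    result = 1.0
    for b, xj in zip(beta, x):
        result *= 1.0 + b * xj
    return result


def hazard(baseline_hazard, relative_risk, beta, x):
    """Hazard rate = baseline hazard times relative risk."""
    return baseline_hazard * relative_risk(beta, x)
```

Both functions equal 1 when all covariates are zero, so the baseline hazard is the hazard of an individual with $x_i = 0$ under either model.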

3 Cohort data with delayed entry
[Figure: individuals at risk plotted against study time; arrows are censored observations]

4 Need information on covariates for all individuals at risk
Estimate the regression coefficients by maximizing Cox's partial likelihood.
The partial likelihood is a product over all failure times (event times). The contribution from the individual $i_j$ failing at $t_j$ is
$\frac{r(\beta, x_{i_j}(t_j))}{\sum_{l \in R_j} r(\beta, x_l(t_j))}$
where $R_j$ is the risk set at $t_j$, and $n(t)$ is the number at risk at $t-$.
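
As an illustration of how the risk sets enter the partial likelihood with delayed entry, the sketch below evaluates the log partial likelihood for a toy cohort. It assumes the Cox form of the relative risk, no tied failure times, and that an individual is at risk at $t$ when entry < t <= exit; the record layout (`entry`, `exit`, `event`, `x`) is hypothetical.

```python
import math


def cox_relative_risk(beta, x):
    """Cox relative risk exp(beta . x)."""
    return math.exp(sum(b * xj for b, xj in zip(beta, x)))


def log_partial_likelihood(beta, subjects, relative_risk=cox_relative_risk):
    """Sum of log contributions over all failure times (no ties assumed).

    Each subject is a dict with keys 'entry', 'exit', 'event', 'x'.
    """
    total = 0.0
    for case in subjects:
        if not case["event"]:
            continue  # censored individuals contribute only through risk sets
        t = case["exit"]
        # Risk set at t: entered before t and still under observation at t.
        risk_set = [s for s in subjects if s["entry"] < t <= s["exit"]]
        denominator = sum(relative_risk(beta, s["x"]) for s in risk_set)
        total += math.log(relative_risk(beta, case["x"]) / denominator)
    return total


toy_cohort = [
    {"entry": 0.0, "exit": 2.0, "event": True, "x": [1.0]},
    {"entry": 0.0, "exit": 3.0, "event": False, "x": [0.0]},
    {"entry": 1.0, "exit": 4.0, "event": True, "x": [2.0]},
    {"entry": 2.5, "exit": 5.0, "event": False, "x": [0.5]},
]
```

With $\beta = 0$ every contribution reduces to $1/n(t_j)$, which is a convenient sanity check: in the toy cohort the risk sets have sizes 3 and 2. The denominator needs covariates for everyone at risk, which is what motivates the sampling designs on the following slides.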

5 Cohort sampling designs
Collecting and checking (!) covariate information for all individuals in a large cohort is expensive. It is also unnecessary when there are few events, because the cases carry most of the statistical information.
It is therefore useful to have cohort sampling designs where covariate information is needed only for the cases and a sample of controls:
– Nested case-control
– Case-cohort

6 Classical nested case-control design
At each failure time $t_j$, select at random $m - 1$ controls among the $n(t_j) - 1$ non-failures at risk.
[Figure: illustration for m = 2, with the case and the control marked]
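
The control selection described here amounts to simple random sampling without replacement from the non-failures at risk. A minimal sketch, with hypothetical function and argument names:

```python
import random


def sample_nested_case_control(case_id, risk_set_ids, m, rng=random):
    """Return the sampled risk set: the case plus m - 1 random controls.

    risk_set_ids: ids of everyone at risk at the failure time (case included).
    """
    candidates = [i for i in risk_set_ids if i != case_id]
    controls = rng.sample(candidates, m - 1)  # without replacement
    return [case_id] + controls
```

A usage example: `sample_nested_case_control(3, range(10), 2, random.Random(1))` returns the case 3 followed by one control drawn from the other nine individuals at risk. Sampling is repeated independently at each failure time, so the same individual may serve as a control for several cases and may later become a case.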

7 Counter-matched nested case-control design
The statistical information in a sampled risk set (a case and its controls) depends on the variation of the covariate values within the set.
We may obtain "large" variation in an exposure of interest by counter-matching on
(i) a surrogate measure for the exposure, or
(ii) the exposure itself, when correcting for a confounder.
Classify each individual at risk into one of L strata based on information available for everyone, and select the controls by stratified random sampling.

8 Want a specified number $m_l$ from each stratum l in a sampled risk set (a case and its controls).
Select $m_l$ controls among those at risk in stratum l, except for the case's stratum s, where only $m_s - 1$ controls are selected.
[Figure: illustration for L = 2 and $m_1 = m_2 = 1$]
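
The stratified selection rule can be sketched as follows. The data layout (a dict from individual id to stratum label, and a dict from stratum label to $m_l$) is an assumption made for the example.

```python
import random


def sample_counter_matched(case_id, stratum_of, m_per_stratum, rng=random):
    """Return a counter-matched sampled risk set (case first).

    stratum_of: dict id -> stratum, for everyone at risk (case included).
    m_per_stratum: dict stratum -> required number m_l in the sampled set.
    """
    case_stratum = stratum_of[case_id]
    sampled = [case_id]
    for stratum, m_l in m_per_stratum.items():
        candidates = [i for i, s in stratum_of.items()
                      if s == stratum and i != case_id]
        # The case already fills one slot in its own stratum.
        n_controls = m_l - 1 if stratum == case_stratum else m_l
        sampled.extend(rng.sample(candidates, n_controls))
    return sampled
```

With L = 2 and $m_1 = m_2 = 1$, as in the slide's illustration, the single control always comes from the stratum the case does not belong to, which is what forces the exposure contrast within the set.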

9 A sampling design is described by its sampling distribution
The sampled risk set $\tilde{R}_j$ consists of the case $i_j$ and the $m - 1$ controls.
Classical nested case-control design: if individual i fails at time t, the probability of selecting r as the sampled risk set is
$\pi_t(r \mid i) = \binom{n(t) - 1}{m - 1}^{-1}$
(we assume that r is a subset of the risk set, that r is of size m, and that i is in r).

10 Counter-matched nested case-control design:
Denote by $n_l(t)$ the number at risk in stratum l at time $t-$. If individual i in stratum s(i) fails at time t, the probability of selecting r as the sampled risk set is
$\pi_t(r \mid i) = \left[ \prod_{l=1}^{L} \binom{n_l(t)}{m_l} \right]^{-1} \frac{n_{s(i)}(t)}{m_{s(i)}}$
(under suitable assumptions on the set r).
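
A quick numerical check of the two sampling distributions: for a given case, the probabilities of all admissible sampled risk sets should add up to one. The sketch below assumes the standard forms $1/\binom{n(t)-1}{m-1}$ for the classical design and $\left[\prod_l \binom{n_l(t)}{m_l}\right]^{-1} n_{s(i)}(t)/m_{s(i)}$ for the counter-matched design; the function names are illustrative.

```python
from math import comb


def prob_classical(n_at_risk, m):
    """Probability of one particular sampled risk set, classical design."""
    return 1.0 / comb(n_at_risk - 1, m - 1)


def prob_counter_matched(n_per_stratum, m_per_stratum, case_stratum):
    """Probability of one particular sampled risk set, counter-matched design."""
    denominator = 1
    for stratum, n_l in n_per_stratum.items():
        denominator *= comb(n_l, m_per_stratum[stratum])
    weight = n_per_stratum[case_stratum] / m_per_stratum[case_stratum]
    return weight / denominator


def count_counter_matched_sets(n_per_stratum, m_per_stratum, case_stratum):
    """Number of admissible sampled risk sets that contain a given case."""
    count = 1
    for stratum, n_l in n_per_stratum.items():
        if stratum == case_stratum:
            count *= comb(n_l - 1, m_per_stratum[stratum] - 1)
        else:
            count *= comb(n_l, m_per_stratum[stratum])
    return count
```

For example, with 5 and 4 individuals at risk in two strata, $m_1 = 2$, $m_2 = 1$ and the case in the first stratum, there are 16 admissible sets, each with probability 1/16. The factor $n_{s(i)}(t)/m_{s(i)}$ is the only part of the counter-matched probability that depends on which member of r is the case.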

11 Partial likelihood
Introduce the counting process $N_{(i,r)}(t)$ counting the number of times in [0, t] that individual i fails and the sampled risk set equals r.
Corresponding intensity process:
$\lambda_{(i,r)}(t) = Y_i(t)\, \alpha_i(t)\, \pi_t(r \mid i)$
This takes the form at-risk indicator $Y_i(t)$ × hazard rate $\alpha_i(t)$ × sampling probability $\pi_t(r \mid i)$.

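12 Introduce the aggregated processes $N_r(t) = \sum_{i \in r} N_{(i,r)}(t)$, with intensity processes $\lambda_r(t) = \sum_{i \in r} \lambda_{(i,r)}(t)$.
Probability that individual i fails, given that a failure occurs at t and that the sampled risk set is r:
$\frac{\lambda_{(i,r)}(t)}{\lambda_r(t)} = \frac{r(\beta, x_i(t))\, \pi_t(r \mid i)}{\sum_{l \in r} r(\beta, x_l(t))\, \pi_t(r \mid l)}$
The partial likelihood is a product of such factors over all failures and sampled risk set occurrences (after cancelling common factors).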

13 Contribution to the partial likelihood from a sampled risk set $\tilde{R}_j$:
$\frac{r(\beta, x_{i_j}(t_j))\, w_{i_j}(t_j)}{\sum_{l \in \tilde{R}_j} r(\beta, x_l(t_j))\, w_l(t_j)}$
Classical nested case-control: $w_l(t) = 1$
Counter-matched case-control: $w_l(t) = n_{s(l)}(t) / m_{s(l)}$
The regression parameters may be estimated with software for relative risk regression (Cox, etc.) that allows for "offsets".
By counting process arguments similar to those for the full cohort, one may show that the usual large sample likelihood methods apply.
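
To see why "offsets" are enough, note that for the Cox form a weight $w_l$ multiplying $\exp(\beta^T x_l)$ is the same as adding $\log w_l$ to the linear predictor. The sketch below computes the log contribution of one sampled risk set that way. It assumes the Cox relative risk and the counter-matching weights $w_l = n_{s(l)}(t)/m_{s(l)}$; the function name and argument layout are hypothetical.

```python
import math


def sampled_risk_set_log_contribution(beta, case_index, covariates, weights):
    """Log of  exp(beta.x_case) * w_case / sum_l exp(beta.x_l) * w_l.

    covariates: list of covariate vectors for the sampled risk set.
    weights: list of positive weights w_l, entered as offsets log(w_l).
    """
    linear_predictors = [
        sum(b * xj for b, xj in zip(beta, x)) + math.log(w)  # offset term
        for x, w in zip(covariates, weights)
    ]
    log_denominator = math.log(sum(math.exp(v) for v in linear_predictors))
    return linear_predictors[case_index] - log_denominator
```

When all weights are equal they cancel, which recovers the classical nested case-control contribution. In practice one would pass `log(w)` as an offset to a conditional logistic or stratified Cox routine with one stratum per sampled risk set, rather than coding the likelihood by hand.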

14 Uranium miners cohort
3347 uranium miners from the Colorado Plateau are included in the study cohort.
Followed up until the end of [...]; [...] lung cancer deaths.
Interested in the effect of radon and smoking exposure on the risk of lung cancer death.
Exposure information is available for the full cohort, so cohort sampling is used here for illustration.

15 Fit the excess relative risk model $r(\beta, x_i) = (1 + \beta_1 x_{i1})(1 + \beta_2 x_{i2})$, where
$x_{i1}$ = cumulative radon exposure (100 WLMs)
$x_{i2}$ = cumulative smoking (1000 packs)
Counter-match on radon exposure quartiles.

Design                 Radon (β1)     Smoke (β2)
1:1 case-control       0.42 (0.20)    0.23 (0.10)
1:1 counter-matched    0.39 (0.14)    0.25 (0.10)
1:3 case-control       0.43 (0.16)    0.20 (0.07)
1:3 counter-matched    0.41 (0.13)    0.19 (0.07)
Full cohort            0.38 (0.11)    0.17 (0.05)
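
The table can be read more easily with a little arithmetic. The sketch below assumes that the numbers in parentheses are standard errors and that the excess relative risk model has the product form $(1 + \beta_1 x_1)(1 + \beta_2 x_2)$; neither is stated explicitly in the transcript. It computes a rough efficiency of each design relative to the full cohort as the squared ratio of standard errors, and the fitted relative risk at a given exposure.

```python
# (estimate, assumed standard error) as transcribed from the slide
results = {
    "1:1 case-control": {"radon": (0.42, 0.20), "smoke": (0.23, 0.10)},
    "1:1 counter-matched": {"radon": (0.39, 0.14), "smoke": (0.25, 0.10)},
    "1:3 case-control": {"radon": (0.43, 0.16), "smoke": (0.20, 0.07)},
    "1:3 counter-matched": {"radon": (0.41, 0.13), "smoke": (0.19, 0.07)},
    "Full cohort": {"radon": (0.38, 0.11), "smoke": (0.17, 0.05)},
}


def relative_efficiency(design, covariate):
    """Variance ratio full cohort / design, from the reported standard errors."""
    se_full = results["Full cohort"][covariate][1]
    se_design = results[design][covariate][1]
    return (se_full / se_design) ** 2


def fitted_relative_risk(design, radon_100_wlm, smoking_1000_packs):
    """(1 + b1 * radon)(1 + b2 * smoking) with the design's estimates."""
    b_radon = results[design]["radon"][0]
    b_smoke = results[design]["smoke"][0]
    return (1 + b_radon * radon_100_wlm) * (1 + b_smoke * smoking_1000_packs)
```

On this reading, the 1:1 classical design retains about 30% of the full-cohort information on the radon coefficient and the 1:1 counter-matched design about 62%, while the two designs do equally well for smoking. These are single-data-set figures based on rounded standard errors, so they are only indicative.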

16 Classical case-cohort design
Select at random a subcohort C consisting of a fraction p of the full cohort.
[Figure: illustration for p = 0.50, with the subcohort marked]

17 Use a pseudo likelihood for estimation
Contribution to the pseudo likelihood for a case: [...]
Software for relative risk regression (Cox, etc.) may be "tricked" into doing the estimation.
Likelihood methods do not apply: standard errors from statistical software need to be corrected, and likelihood ratio tests cannot be used.
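
The formula for the pseudo likelihood contribution is not preserved in the transcript, so the following is only a sketch of one common choice, not necessarily the version on the slide. It assumes a Prentice-type contribution in which the case is compared with the subcohort members at risk, plus the case itself when it lies outside the subcohort, and it uses a single covariate with the Cox form. All names are illustrative.

```python
import math
import random


def sample_subcohort(cohort_ids, p, rng=random):
    """Draw a simple random subcohort of (approximately) a fraction p."""
    ids = list(cohort_ids)
    size = round(p * len(ids))
    return set(rng.sample(ids, size))


def pseudo_log_contribution(beta, case_id, at_risk_ids, subcohort, x):
    """Log pseudo likelihood contribution for one case (Prentice-type, assumed).

    at_risk_ids: ids at risk at the case's failure time (case included).
    x: dict id -> scalar covariate value.
    """
    comparison_set = [i for i in at_risk_ids
                      if i in subcohort or i == case_id]
    denominator = sum(math.exp(beta * x[i]) for i in comparison_set)
    return beta * x[case_id] - math.log(denominator)
```

Unlike the nested case-control design, the same subcohort serves as the comparison group at every failure time. That reuse is why the contributions are not factors of a proper partial likelihood and why standard errors from ordinary Cox software need correcting.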

18 Stratified case-cohort design
Select the subcohort by stratified random sampling of a fraction $p_s$ from stratum s.
[Figure: illustration for S = 2 and $p_1 = p_2 = 0.50$]

19 Contribution to the pseudo likelihood for a case: [...]
Weights: [...] for i in stratum s
Alternative versions of the pseudo likelihood are available.

20 Simulation with one normal covariate:
Stratify into two strata according to a binary surrogate that is available for everyone.
10% of the individuals are surrogate positive.
The covariate is N(0, 1) for surrogate negative individuals and N(μ, σ²) for surrogate positive individuals.
The baseline hazard and the censoring are adjusted to give 10% failures and 20% censoring before the "closure of the study".

21 Simulation repeated 1000 times with
– 1000 individuals in the cohort
– 100 individuals in the subcohort for case-cohort
– 100 controls (on average) for nested case-control
Efficiencies in % relative to full cohort:

Design                                 μ = 2    μ = 4    μ = 4
                                       σ = 1    σ = 1    σ = 2
Classical nested case-control          [...]    [...]    [...]
Classical case-cohort                  [...]    [...]    [...]
Counter-matched nested case-control    [...]    [...]    [...]
Stratified case-cohort                 51       71       72

22 Nested case-control or case-cohort?
Statistical inference:
– Nested case-control (NCC): the usual likelihood methods apply, and standard software may be used for the analysis
– Case-cohort (CC): likelihood methods are not valid, but statistical software may be "tricked" into doing the analysis
Statistical efficiency is about the same for the two designs.
Missing covariates:
– NCC: in a 1:1 design a sampled risk set is lost if covariate information is missing for the control
– CC: missing covariates in the subcohort are less serious

23 Logistics for prospective studies:
– NCC: control sampling has to wait until cases occur
– CC: the subcohort can be selected at the outset
Time scale for analysis:
– NCC: must be decided before the controls are sampled
– CC: need not be decided before the subcohort is sampled