
Presentation transcript:

1 Using martingale residuals to assess goodness of fit for sampled risk set data. Ørnulf Borgan, Department of Mathematics, University of Oslo. Based on joint work with Bryan Langholz.

2 Outline: Example: uranium miners cohort; cohort model, data and martingale residuals; risk set sampling; martingale residuals and goodness-of-fit tests for sampled risk set data; concluding remarks.

3 Uranium miners cohort: 3347 uranium miners from the Colorado Plateau are included in the study cohort and followed up for lung cancer deaths until the end of the study period. We are interested in the effect of radon and smoking exposure on the risk of lung cancer death. Exposure information is available for the full cohort; we sample from the risk sets for illustration (e.g. Langholz & Goldstein, 1996).

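4 Relative risk regression models: The hazard rate for individual $i$ is the product of a relative risk and a baseline hazard. The relative risk for individual $i$ depends on covariates $x_{i1}, x_{i2}, \ldots, x_{ip}$ (possibly time-dependent). Two choices of relative risk function are considered: the Cox form and the excess relative risk form.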
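
In standard relative risk regression notation, the model on this slide can be sketched as follows. The symbols $\alpha_i$, $\alpha_0$, $r$ and $\beta$ are assumptions made here for illustration, and the excess relative risk form shown is one common multiplicative version rather than a formula confirmed by the slide.

```latex
% Hazard rate of individual i = relative risk times baseline hazard
\alpha_i(t) = r\bigl(\beta, x_i(t)\bigr)\,\alpha_0(t)

% Cox (exponential) relative risk
r\bigl(\beta, x_i(t)\bigr) = \exp\bigl\{\beta_1 x_{i1}(t) + \cdots + \beta_p x_{ip}(t)\bigr\}

% Excess relative risk (assumed multiplicative form)
r\bigl(\beta, x_i(t)\bigr) = \prod_{k=1}^{p}\bigl\{1 + \beta_k x_{ik}(t)\bigr\}
```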

5 Cohort data: [Figure: individuals at risk plotted against study time; arrows are censored observations]

6 $t_1 < t_2 < t_3 < \cdots$ are the times of failures, and $i_j$ is the individual failing at $t_j$ (the "case"). Each individual $i$ has a counting process that registers its observed failures, with intensity process $\lambda_i(t)$.
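
A sketch of the two objects named on this slide, in standard counting process notation. The symbols $N_i$, $Y_i$ (at-risk indicator) and $\alpha_i$ (hazard rate of individual $i$) are assumed here; only $t_j$, $i_j$ and $\lambda_i(t)$ appear in the slide text.

```latex
% Counting process: number of failures observed for individual i up to time t
N_i(t) = \sum_{j \ge 1} I\{\, t_j \le t,\ i_j = i \,\}

% Intensity process: at-risk indicator times hazard rate
\lambda_i(t) = Y_i(t)\,\alpha_i(t)
```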

7 From the intensity processes, each the product of an at-risk indicator and a hazard rate, one obtains the cumulative intensity processes, the martingales and the martingale residual processes.
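
As a rough illustration of how cohort martingale residuals can be computed, the sketch below evaluates "observed minus estimated cumulative intensity" at the end of follow-up. It assumes the Cox form of the relative risk, fixed covariates, a Breslow-type estimate of the baseline hazard, and coefficients that have already been estimated; the function name and argument layout are hypothetical.

```python
import numpy as np

def cohort_martingale_residuals(time, event, x, beta):
    """Martingale residuals at end of follow-up (illustrative sketch).

    time  : exit times (failure or censoring), shape (n,)
    event : 1 if the exit is a failure, 0 if censored, shape (n,)
    x     : fixed covariates, shape (n, p)
    beta  : regression coefficients, shape (p,), assumed already estimated
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    # Assumed Cox form of the relative risk: exp(beta' x_i)
    risk = np.exp(np.asarray(x, dtype=float) @ np.asarray(beta, dtype=float))

    expected = np.zeros_like(time)
    for t_j in time[event == 1]:      # one term per failure
        at_risk = time >= t_j         # at-risk indicator at t_j
        # Breslow-type baseline increment times each individual's relative risk
        expected += at_risk * risk / risk[at_risk].sum()
    return event - expected           # observed minus estimated cumulative intensity
```

By construction the residuals sum to zero over the cohort, which is a quick sanity check on any implementation.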

8 Martingale residual processes may be used to assess goodness of fit: plot individual martingale residuals versus covariates (Therneau, Grambsch & Fleming, 1990), or plot grouped martingale residual processes versus time (Aalen, 1993; Grønnesby & Borgan, 1996). The latter may be extended to sampled risk set data.

9 Risk set sampling: Cohort studies need information on covariates for all individuals at risk. It is expensive to collect and check (!) this information for all individuals in large cohorts. For risk set sampling designs, one only needs to collect covariate information for the cases and a few controls sampled at the failure times.

10 If a case occurs at time $t$, select $m - 1$ controls among the $n(t) - 1$ non-failures at risk, i.e. match on study time. [Figure: illustration for $m = 2$, showing cases and controls]
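
The selection step can be sketched in a few lines. This is a minimal illustration, assuming simple random sampling without replacement among the non-failures at risk; the function name `sample_risk_set` and its arguments are hypothetical.

```python
import random

def sample_risk_set(case, at_risk, m, rng=random):
    """Return the sampled risk set: the case plus m - 1 controls drawn
    without replacement from the other individuals at risk."""
    non_failures = [i for i in at_risk if i != case]
    if len(non_failures) < m - 1:
        raise ValueError("fewer than m - 1 non-failures at risk")
    controls = rng.sample(non_failures, m - 1)
    return [case] + controls

# Example: individual 3 fails while individuals 0..9 are at risk; m = 2
sampled = sample_risk_set(3, range(10), 2, random.Random(0))
```

Passing a seeded `random.Random` instance makes the sampling reproducible.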

11 A sampled risk set consists of the case $i_j$ and its controls. A number of sampling designs are available. A sampling design for the controls is described by its sampling distribution: if individual $i$ fails at time $t$, it gives the probability of selecting the set $r$ as the sampled risk set. In the classical nested case-control design the controls are selected by simple random sampling, so every admissible set $r$ is equally likely (we assume that $r$ is a subset of the risk set, that $r$ is of size $m$ and that $i$ is in $r$).
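
For the classical nested case-control design, the sampling distribution can be written down directly from the selection rule of $m - 1$ controls among $n(t) - 1$ non-failures. The symbol $\pi(r \mid t, i)$ is an assumed name for the sampling distribution.

```latex
% Probability of sampling the set r when individual i fails at time t,
% for r a subset of the risk set with |r| = m and i in r
\pi(r \mid t, i) = \binom{n(t) - 1}{m - 1}^{-1}
```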

12 Inference on the regression coefficients can be based on the partial likelihood. The partial likelihood enjoys the usual likelihood properties (Borgan, Goldstein & Langholz, 1995). For the classical nested case-control design, the partial likelihood simplifies.
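
A minimal sketch of the simplified partial likelihood for the classical nested case-control design, in which each failure contributes the case's relative risk divided by the sum of relative risks over its sampled risk set. The Cox form of the relative risk and the data layout (row 0 of each array is the case) are assumptions made for this example.

```python
import numpy as np

def log_partial_likelihood(beta, sampled_sets):
    """Log partial likelihood for classical nested case-control data.

    sampled_sets : list with one (m, p) covariate array per failure time;
                   row 0 is the case, the remaining rows are its controls.
    """
    beta = np.asarray(beta, dtype=float)
    total = 0.0
    for x_set in sampled_sets:
        # Assumed Cox form of the relative risk for each member of the set
        r = np.exp(np.asarray(x_set, dtype=float) @ beta)
        total += np.log(r[0]) - np.log(r.sum())
    return total
```

With all coefficients equal to zero, every sampled risk set of size $m$ contributes $-\log m$, which gives an easy check.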

13 Martingale residuals and goodness-of-fit tests for sampled risk set data: Introduce counting processes for the sampled risk set data, together with their intensity processes.
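
A sketch of these processes, following the counting process formulation of risk set sampling in Borgan, Goldstein & Langholz (1995). The notation is assumed here: $\tilde R_j$ is the sampled risk set at $t_j$, $Y_i$ the at-risk indicator, $\alpha_i$ the hazard rate of individual $i$, and $\pi(r \mid t, i)$ the sampling distribution.

```latex
% Counts failures of individual i for which r was the sampled risk set
N_{(i,r)}(t) = \sum_{j \ge 1} I\{\, t_j \le t,\ (i_j, \tilde R_j) = (i, r) \,\}

% Intensity: cohort intensity of i times the probability of sampling r
\lambda_{(i,r)}(t) = Y_i(t)\,\alpha_i(t)\,\pi(r \mid t, i)
```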

14 The corresponding martingales lead to martingale residual processes. These are of little practical use on their own, but they may be aggregated over groups of individuals to produce useful plots.

15 For group $g$, the aggregated martingale residual process may be interpreted as the "observed − expected" number of failures in group $g$. Its asymptotic distribution may be derived using counting process methods. The expression simplifies for the classical nested case-control design.
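
The "observed − expected" reading can be illustrated for classical nested case-control data, where the sampling weights cancel. In this sketch the expected contribution of a group at a failure time is the share of the sampled risk set's total relative risk carried by the group's members. The Cox form of the relative risk, the data layout and the function name are assumptions.

```python
import numpy as np

def grouped_observed_expected(sampled_sets, beta, n_groups):
    """Observed and expected numbers of failures per group.

    sampled_sets : list of (x, groups) pairs, one per failure time;
                   x is an (m, p) covariate array with the case in row 0,
                   groups holds the group label (0..n_groups-1) of each row.
    """
    beta = np.asarray(beta, dtype=float)
    observed = np.zeros(n_groups)
    expected = np.zeros(n_groups)
    for x_set, groups in sampled_sets:
        r = np.exp(np.asarray(x_set, dtype=float) @ beta)  # assumed Cox form
        share = r / r.sum()                # relative risk shares within the set
        groups = np.asarray(groups)
        observed[groups[0]] += 1           # the case's group gets the failure
        np.add.at(expected, groups, share) # accumulate shares by group
    return observed, expected
```

Summed over all groups, observed minus expected is zero; the grouped residual process versus time is obtained by accumulating the same differences only over failure times up to $t$.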

16 Illustration: uranium miners cohort. Fit an excess relative risk model with $x_{i1}$ = cumulative radon (100 WLMs) and $x_{i2}$ = cumulative smoking (1000 packs), using classical nested case-control sampling with three controls per case.

17 Aggregate the martingale residual processes in three groups (I, II and III) according to cumulative radon exposure, with group III above 1500 WLMs. There are indications of an interaction between cumulative radon exposure and age.

18 Observed and expected number of failures in the groups for ages below and above 60 years (table columns: age and group, observed, expected; rows: below 60 years & groups I, II, III; above 60 years & groups I, II, III). The chi-squared statistic with 2(3 − 1) = 4 df takes the value 10.5 (P-value 3.2%).
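
The reported P-value can be checked from the upper tail of the chi-squared distribution. The sketch below uses the closed form that holds for even degrees of freedom, so it needs only the standard library; the function name is hypothetical.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper tail probability P(X > x) for a chi-squared variable with an
    even number of degrees of freedom, using the closed-form series."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))

p_value = chi2_sf_even_df(10.5, 4)   # about 0.033
```

For the rounded statistic 10.5 this gives roughly 3.3%, in line with the 3.2% on the slide if that figure was computed from the unrounded statistic.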

19 Concluding remarks: The counting process formulation of nested case-control studies introduces a time aspect that is usually disregarded for sampled risk set data. It gives a model formulation similar to that for cohort data, and thereby opens up for methodological developments similar to those for cohort studies. Grouped martingale residual processes are one example of this. They make it possible to check for time-dependent effects and other deviations from the model.

20 Questions and further developments of grouped martingale residual plots and related goodness-of-fit methods: How should the grouping be performed? How do specific deviations from the model turn up in the plots? Kolmogorov-Smirnov and Cramér-von Mises type tests? (Durbin's approximation, Lin et al.'s simulation trick)