Nonparametric Estimation with Recurrent Event Data Edsel A. Pena Department of Statistics University of South Carolina Research.

Slides:



Advertisements
Similar presentations
Random Processes Introduction (2)
Advertisements

Lecture 11 (Chapter 9).
Introduction to Regression with Measurement Error STA431: Spring 2015.
1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [ Research.
Bayesian Estimation in MARK
Week11 Parameter, Statistic and Random Samples A parameter is a number that describes the population. It is a fixed number, but in practice we do not know.
Error Component models Ric Scarpa Prepared for the Choice Modelling Workshop 1st and 2nd of May Brisbane Powerhouse, New Farm Brisbane.
STAT 497 APPLIED TIME SERIES ANALYSIS
Cox Model With Intermitten and Error-Prone Covariate Observation Yury Gubman PhD thesis in Statistics Supervisors: Prof. David Zucker, Prof. Orly Manor.
Goodness of Fit of a Joint Model for Event Time and Nonignorable Missing Longitudinal Quality of Life Data – A Study by Sneh Gulati* *with Jean-Francois.
Nonparametric Estimation with Recurrent Event Data Edsel A. Pena Department of Statistics University of South Carolina Research.
1 A General Class of Models for Recurrent Events Edsel A. Pena University of South Carolina at Columbia [ Research support from.
Nonparametric Estimation with Recurrent Event Data Edsel A. Pena Department of Statistics University of South Carolina Research.
Nonparametric Estimation with Recurrent Event Data Edsel A. Pena Department of Statistics, USC Research supported by NIH and NSF Grants Based on joint.
From PET to SPLIT Yuri Kifer. PET: Polynomial Ergodic Theorem (Bergelson) preserving and weakly mixing is bounded measurable functions polynomials,integer.
SUMS OF RANDOM VARIABLES Changfei Chen. Sums of Random Variables Let be a sequence of random variables, and let be their sum:
3.3 Brownian Motion 報告者:陳政岳.
Probability By Zhichun Li.
Presenting: Assaf Tzabari
Parametric Inference.
Modeling clustered survival data The different approaches.
On a dynamic approach to the analysis of multivariate failure time data Odd Aalen Section of Medical Statistics, University of Oslo, Norway.
Introduction to Regression with Measurement Error STA431: Spring 2013.
Lecture II-2: Probability Review
Analysis of Complex Survey Data
1 A General Class of Models for Recurrent Events Edsel A. Pena University of South Carolina at Columbia [ Research support from.
Simulation Output Analysis
Statistical approaches to analyse interval-censored data in a confirmatory trial Margareta Puu, AstraZeneca Mölndal 26 April 2006.
Random Sampling, Point Estimation and Maximum Likelihood.
Estimating parameters in a statistical model Likelihood and Maximum likelihood estimation Bayesian point estimates Maximum a posteriori point.
1 Sampling Distributions Lecture 9. 2 Background  We want to learn about the feature of a population (parameter)  In many situations, it is impossible.
0 K. Salah 2. Review of Probability and Statistics Refs: Law & Kelton, Chapter 4.
Random Numbers and Simulation  Generating truly random numbers is not possible Programs have been developed to generate pseudo-random numbers Programs.
Borgan and Henderson:. Event History Methodology
1 A Bayes method of a Monotone Hazard Rate via S-paths Man-Wai Ho National University of Singapore Cambridge, 9 th August 2007.
Week11 Parameter, Statistic and Random Samples A parameter is a number that describes the population. It is a fixed number, but in practice we do not know.
Pro gradu –thesis Tuija Hevonkorpi.  Basic of survival analysis  Weibull model  Frailty models  Accelerated failure time model  Case study.
Limits to Statistical Theory Bootstrap analysis ESM April 2006.
Week 21 Stochastic Process - Introduction Stochastic processes are processes that proceed randomly in time. Rather than consider fixed random variables.
Introduction Sample Size Calculation for Comparing Strategies in Two-Stage Randomizations with Censored Data Zhiguo Li and Susan Murphy Institute for Social.
Consistency An estimator is a consistent estimator of θ, if , i.e., if
ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition LECTURE 07: BAYESIAN ESTIMATION (Cont.) Objectives:
Empirical Likelihood for Right Censored and Left Truncated data Jingyu (Julia) Luan University of Kentucky, Johns Hopkins University March 30, 2004.
The generalization of Bayes for continuous densities is that we have some density f(y|  ) where y and  are vectors of data and parameters with  being.
Simulation Study for Longitudinal Data with Nonignorable Missing Data Rong Liu, PhD Candidate Dr. Ramakrishnan, Advisor Department of Biostatistics Virginia.
Chapter 20 Classification and Estimation Classification – Feature selection Good feature have four characteristics: –Discrimination. Features.
Lecture 4: Likelihoods and Inference Likelihood function for censored data.
1 Using dynamic path analysis to estimate direct and indirect effects of treatment and other fixed covariates in the presence of an internal time-dependent.
1 Chapter 8: Model Inference and Averaging Presented by Hui Fang.
The Unscented Particle Filter 2000/09/29 이 시은. Introduction Filtering –estimate the states(parameters or hidden variable) as a set of observations becomes.
1 Borgan and Henderson: Event History Methodology Lancaster, September 2006 Session 6.1: Recurrent event data Intensity processes and rate functions Robust.
CS Statistical Machine learning Lecture 25 Yuan (Alan) Qi Purdue CS Nov
1 Double the confidence region S. Eguchi, ISM & GUAS This talk is a part of co-work with J. Copas, University of Warwick.
Week 21 Order Statistics The order statistics of a set of random variables X 1, X 2,…, X n are the same random variables arranged in increasing order.
G. Cowan Lectures on Statistical Data Analysis Lecture 9 page 1 Statistical Data Analysis: Lecture 9 1Probability, Bayes’ theorem 2Random variables and.
1 Ka-fu Wong University of Hong Kong A Brief Review of Probability, Statistics, and Regression for Forecasting.
Week 21 Statistical Model A statistical model for some data is a set of distributions, one of which corresponds to the true unknown distribution that produced.
STA302/1001 week 11 Regression Models - Introduction In regression models, two types of variables that are studied:  A dependent variable, Y, also called.
Multiple Random Variables and Joint Distributions
Probability Theory and Parameter Estimation I
Appendix A: Probability Theory
Ch3: Model Building through Regression
Where did we stop? The Bayes decision rule guarantees an optimal classification… … But it requires the knowledge of P(ci|x) (or p(x|ci) and P(ci)) We.
The Empirical FT. What is the large sample distribution of the EFT?
STOCHASTIC HYDROLOGY Random Processes
EM for Inference in MV Data
Econometrics Chengyuan Yin School of Mathematics.
EM for Inference in MV Data
Pattern Recognition and Machine Learning Chapter 2: Probability Distributions July chonbuk national university.
Presentation transcript:

Nonparametric Estimation with Recurrent Event Data Edsel A. Pena Department of Statistics University of South Carolina Research supported by NIH and NSF Grants Based on joint works with R. Strawderman (Cornell) and M. Hollander (Florida State) Talk at Mississippi State Univ, 10/19/01

2 A Real Recurrent Event Data (Source: Aalen and Husebye (‘91), Statistics in Medicine) Variable: Migrating motor complex (MMC) periods, in minutes, for 19 individuals in a gastroenterology study concerning small bowel motility during fasting state.

3 Pictorial Representation of Data for a Unit or Subject Consider unit/subject #3. K = 3 Gap Times, T j : 284, 59, 186 Censored Time,  -S K : 4 Calendar Times, S j : 284, 343, 529 Limit of Obs. Period:  0  =533 S 1 =284 S 2 =343 S 3 =529 T 1 =284 T 2 =59 T 3 =186  -S 3 =4 Calendar Scale

4 Features of Data Set Random observation period per subject (administrative constraints). Length of period:  Event of interest is recurrent. A subject may have more than one event during observation period. # of events (K) informative about MMC period distribution (F). Last MMC period right-censored by a variable informative about F. Calendar times: S 1, S 2, …, S K. Right-censoring variable:  -S K.

5 Assumptions and Problem Aalen and Husebye: “Consecutive MMC periods for each individual appear (to be) approximate renewal processes.” Translation: The inter-event times T ij ’s are assumed stochastically independent. Problem: Under this IID assumption, and taking into account the informativeness of K and the right- censoring mechanism, to estimate the inter-event distribution, F.

6 General Form of Data Accrual Calendar Times of Event Occurrences S i0 =0 and S ij = T i1 + T i2 + … + T ij Number of Events in Observation Period K i = max{j: S ij <  i } Upper limit of observation periods,  ’s, could be fixed, or assumed to be IID with unknown distribution G.

7 Observables Theoretical Problem To obtain an estimator of the gap- time or inter-event time distribution, F; and to determine its properties.

8 Relevance and Applicability Recurrent phenomena occur in a variety of settings. –Outbreak of a disease. –Terrorist attacks. –Labor strikes. –Hospitalization of a patient. –Tumor occurrence. –Epileptic seizures. –Non-life insurance claims. –When stock index (e.g., Dow Jones) decreases by at least 6% in one day.

9 Limitations of Existing Estimation Methods Consider only the first, possibly right-censored, observation per unit and use the product-limit estimator (PLE). –Loss of information –Inefficient Ignore the right-censored last observation, and use empirical distribution function (EDF). –Leads to bias (“biased sampling”). –Estimator actually inconsistent.

10 Review: Prior Results T 1, T 2, …, T n IID F(t) = P(T < t) Empirical Survivor Function (EDF) Asymptotics of EDF where W 1 is a zero-mean Gaussian process with covariance function Single-Event Complete Data

11 In Hazards View Hazard rate function Cumulative hazard function Equivalences Another representation of the variance

12 Single-Event Right-Censored Data Failure times: T 1, T 2, …, T n IID F Censoring times: C 1, C 2, …, C n IID G Right-censored data (Z 1,  1 ), (Z 2,  2 ), …, (Z n,  n ) Z i = min(T i, C i )  i = I{T i < C i } with Product-limit or Kaplan-Meier Estimator

13 PLE Properties Asymptotics of PLE where W 2 is a zero-mean Gaussian process with covariance function If G(w) = 0 for all w, so no censoring,

14 Relevant Stochastic Processes for Recurrent Event Setting Calendar-Time Processes for ith unit is a vector of square-integrable zero-mean martingales. Then,

15 is the length since last event at calendar time v Needed: Calendar-Gaptime Space 0 Difficulty: arises because interest is on (.) or  (.), but these appear in the compensator process in form GapTime Calendar Time s t For Unit 3 in MMC Data

16 Processes in Calendar-Gaptime Space N i (s,t) = # of events in calendar time [0,s] for ith unit whose gaptimes are at most t Y i (s,t) = number of events in [0,s] for ith unit whose gaptimes are at least t: “at-risk” process

17 GapTime Calendar Time s t s=400 t=100 MMC Unit #3 N 3 (s=400,t=100) = 1 Y 3 (s=400,t=100) = 1 K 3 (s=400) = 2

18 “Change-of-Variable” Formulas

19 Estimators of  and F for the Recurrent Event Setting By “change-of-variable” formula, RHS is a sq-int. zero-mean martingale, so Estimator of  (t)

20 Estimator of F Since by substitution principle, a generalized product-limit estimator (GPLE). GPLE extends the EDF for complete data, and the PLE or KME for single- event right-censored data.

21 Asymptotic Properties of GPLE Special Case: If F = EXP(  ) and G = EXP(  )

22 Weak Convergence Theorem: ; Proof relied on weak convergence theorem for recurrent and renewal settings in Pena, Strawderman and Hollander (2000), which utilized ideas in Sellke (1988) and Gill (1980).

23 Comparison of Limiting Variance Functions PLE: GPLE (recurrent event): For large s, For large t or if in stationary state, R(t) = t/  F, so approximately, with  G (w) being the mean residual life of  given  > w. EDF:

24 Wang-Chang Estimator (JASA, ‘99) Beware! Wang and Chang developed this estimator to be able to handle correlated inter- event times, so comparison with GPLE is not completely fair to their estimator!

25 Frailty-Induced Correlated Model Correlation induced according to a frailty model: U 1, U 2, …, U n are IID unobserved Gamma( ,  ) random variables, called frailties. Given U i = u, (T i1, T i2, T i3, …) are independent inter-event times with Marginal survivor function of T ij:

26 Frailty-Model Estimator Frailty parameter, , determines dependence among inter-event times. Small (Large)  : Strong (Weak) dependence. EM algorithm needed to obtain estimator (FRMLE). Unobserved frailties viewed as missing. Parallels Nielsen, Gill, Andersen, and Sorensen (1992). GPLE needed in EM algorithm. GPLE is not consistent when frailty parameter is finite, that is, when IID model does not hold.

27 Monte Carlo Studies Under gamma frailty model. F = EXP(  ):  = 6 G = EXP(  ):  = 1 n = 50 # of Replications = 1000 Frailty parameter  took values in {Infty (IID), 6, 2} Computer programs: S-Plus and Fortran routines.

28 IID Simulated Comparison of the Three Estimators for Varying Frailty Parameter Black=GPLE; Blue=WCPLE; Red=FRMLE

29 Effect of the Frailty Parameter (  ) for Each of the Three Estimators (Black=Infty; Blue=6; Red=2)

30 The Three Estimates of Inter- Event Survivor Function for the MMC Data Set IID assumption seems acceptable. Estimate of  is 10.2.