UW Winter 07

IMPROVING ON FIRST-ORDER INFERENCE FOR COX REGRESSION

Donald A. Pierce, Oregon Health Sciences Univ
Ruggero Bellio, Udine, Italy

These slides are at www.science.oregonstate.edu/~piercedo/
Nearly all survival analysis uses first-order asymptotics: limiting distributions of the MLE, LR, or score statistics. Interest here is only in Cox regression and the partial likelihood.

Usually these approximations are quite good, but it is of interest to verify this or improve on them (Samuelsen, Lifetime Data Analysis, 2003).

We consider both higher-order asymptotics and more direct simulation of P-values.

Primary issue: inference beyond first order requires more than the likelihood function. This may lead to unreasonable dependence of methods on the censoring model and baseline hazard.

Our approach involves forms of conditioning on censoring.
Consider direct simulation of P-values without this conditioning (the same issues arise in higher-order asymptotics): one must estimate the baseline hazard, sample failure times according to it, then apply the censoring model, which may involve estimating a censoring distribution. Quite unattractive in view of the essentially rank-based nature of Cox regression.

With suitable conditioning, and some further conventions regarding the censoring model, this can be avoided. The aim is to maintain the rank-based nature of inference in the presence of censoring (simulation: sample failures from an exponential distribution, apply censoring to the ranks).

We provide convenient Stata and R routines for carrying out both the simulation and the higher-order asymptotics.
COX REGRESSION: hazards of the form lambda(t; x) = lambda_0(t) exp(x beta), with lambda_0(t) unspecified. The interest parameter is a scalar function of beta, with the remaining coordinates as nuisance parameters.

Timeline of events (X = failure, O = censoring): X O X O O X

Risk set R_i: those alive and under observation at failure time t_i. Multinomial likelihood contribution: the probability that it is individual i among these that fails,

  exp(x_i beta) / sum_{j in R_i} exp(x_j beta).

The partial likelihood is the product of these contributions over the observed failures.
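The partial likelihood above can be sketched in a few lines of Python. This is a minimal illustration only, not the routines mentioned in these slides; it assumes a single scalar covariate and no tied failure times, and the function name is our own.

```python
import math

def cox_partial_loglik(beta, times, events, x):
    """Cox partial log-likelihood for a single scalar covariate.

    times  : failure/censoring times (no tied failures assumed)
    events : 1 if the time is a failure, 0 if censored
    x      : covariate values
    """
    ll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored individuals enter only through risk sets
        # risk set: everyone still under observation just before t_i
        denom = sum(math.exp(beta * x[j])
                    for j in range(len(times)) if times[j] >= t_i)
        ll += beta * x[i] - math.log(denom)
    return ll
```

At beta = 0 each failure contributes minus the log of its risk-set size, which gives a quick sanity check of the implementation.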
Useful reference sets for interpreting a given dataset:
(i) the data-production frame of reference
(ii) conditional on the "censoring configuration"
(iii) treating all risk sets as fixed

Using (i) involves the censoring model and estimation of the baseline hazard and censoring distribution (see Dawid 1991 JRSS-A regarding data-production and inferential reference sets).

(ii) requires some development/explanation. By "censoring configuration" we mean the numbers of censorings between successive ordered failures.

Approach (iii) is not really "conditional", but many may feel this is the most appropriate reference set --- things are certainly simple from this viewpoint. It applies when risk sets arise in complicated ways, and to time-dependent covariables.
AN EXTREME* EXAMPLE TO SHOW HOW THINGS ARE WORKING: n = 40 with 30% random censoring; log(RR) interest parameter 1.0 with a binary covariate; 5 nuisance parameters in the RR involving exponential covariables. Hypotheses where the one-sided Wald P-value is 0.05.

                                          Lower   Upper
  LR, first order                         0.046   0.062
  Data production, exact (simulation)     0.090   0.020
  Conditional, exact (simulation)         0.103   0.024
  Conditional, 2nd-order asymptotics      0.096   0.025
  Fixed risk sets, exact (simulation)     0.054   0.051
  Fixed risk sets, 2nd-order asymptotics  0.052   0.052

* 6 covariables with < 30 failures; results typical for such datasets
With fewer failures, and fewer nuisance parameters, adjustments are smaller and thus harder to summarize. However, the following results for a typical dataset show their essential nature. This is for n = 20 with 25% censoring, the interest parameter as before, and only 1 nuisance parameter.

                                          Lower   Upper
  LR, first order                         0.042   0.065
  Data production, exact (simulation)     0.053   0.040
  Conditional, exact (simulation)         0.054   0.037
  Conditional, 2nd-order asymptotics      0.060   0.043
  Fixed risk sets, 2nd-order asymptotics  0.047   0.051

Samuelsen's conclusion, that in small samples the Wald and LR confidence intervals are conservative, does not seem to hold up with any useful generality.
CONDITIONING ON THE "CENSORING CONFIGURATION": that is, on the vector (c_1, ..., c_k), where c_j is the number censored following the j-th ordered failure.

It seems easy to accept that this is "ancillary" information for inference about relative risk when using partial likelihood. It could be that "ancillary" is not the best term for this (comments please!!).

The further convention involved in making this useful pertains to which individuals are censored. Our convention: in martingale fashion, sample from the risk sets the c_j individuals to be censored, with probabilities possibly depending on covariables (comments please!!).

Unless these probabilities depend on covariables --- a quite exceptional assumption --- the results of Kalbfleisch & Prentice (1973, Biometrika) apply: the partial likelihood is the likelihood of the "reduced ranks".
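The convention can be sketched as follows. This is a hypothetical helper of our own, restricted to the covariate-free case where the individuals to censor are drawn uniformly from the risk set; under beta = 0 the at-risk individuals are exchangeable, so the failing individual is drawn uniformly as well.

```python
import random

def apply_censoring_configuration(n, config, rng=None):
    """Apply a censoring configuration (c_1, ..., c_k) to n individuals.

    After the j-th ordered failure, censor c_j individuals drawn
    uniformly from those still at risk.  Returns (failures, censored)
    as lists of individual indices 0..n-1.
    """
    rng = rng or random.Random()
    at_risk = list(range(n))
    failures, censored = [], []
    for c_j in config:
        f = rng.choice(at_risk)          # the j-th failure
        at_risk.remove(f)
        failures.append(f)
        for z in rng.sample(at_risk, c_j):  # then c_j censorings at random
            at_risk.remove(z)
            censored.append(z)
    return failures, censored
```

Note that n must equal the number of failures plus the total number of censorings for the configuration to exhaust the sample.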
Recall that a probability model for censoring is often (but with notable exceptions) something of a "fiction" concocted by the statistician.

A common model is that for each individual there is a fixed, or random, latent censoring time, and what is observed is the minimum of the failure and censoring times. This leads to the usual likelihood function: the product over individuals of f(t_i)^{d_i} S(t_i)^{1 - d_i}, where d_i indicates failure rather than censoring.

The use of censoring models is usually only to consider whether this likelihood is valid (censoring is "uninformative") --- the model is not used beyond this.

But the usual models as above render the problem not one involving only ranks, whereas our conditioning and convention maintain the rank-based inferential structure.
"Reduced ranks", or marginal distribution of ranks, concept --- here individual 3 is censored.

Observed sequence on the time axis: individual 2 fails, individual 3 is censored, then individuals 4 and 1 fail.

Compatible failure orderings for the uncensored data --- the single "reduced ranks" outcome:
  2, 3, 4, 1
  2, 4, 3, 1
  2, 4, 1, 3

The partial likelihood, as a function of the data, provides the distribution of these reduced ranks.
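The enumeration in this example can be reproduced by a brute-force check over permutations (a hypothetical helper of our own, fine for toy examples like this one): a complete failure ordering is compatible if the observed failures keep their relative order and each censored individual fails after every failure observed before its censoring time.

```python
from itertools import permutations

def compatible_orderings(observed, censored):
    """observed: individuals in the order their events occur on the
    time axis; censored: set of individuals whose event is a censoring.
    Returns all complete failure orderings consistent with the data."""
    fails = [i for i in observed if i not in censored]
    out = []
    for perm in permutations(observed):
        # observed failures must keep their relative order
        if [i for i in perm if i not in censored] != fails:
            continue
        ok = True
        for c in censored:
            pos = observed.index(c)
            # failures observed before c's censoring must precede c's failure
            before = [i for i in observed[:pos] if i not in censored]
            if any(perm.index(c) < perm.index(f) for f in before):
                ok = False
                break
        if ok:
            out.append(perm)
    return out
```

For the slide's example, compatible_orderings([2, 3, 4, 1], {3}) recovers exactly the three orderings listed above.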
Thus with our conditioning and convention, and no direct dependence of censoring on covariates, the K&P result yields that the partial likelihood is the actual likelihood for the "reduced rank" data.

This means that all the theory of higher-order likelihood inference applies to partial likelihood (subject to minor issues of discreteness) --- a more general argument exists for the data-production reference set.

Higher-order asymptotics depend only on certain covariances of scores and loglikelihoods. Either exact or asymptotic results can in principle be computed from the K&P result, but simulation is both simpler and more computationally efficient.

Simulation for the asymptotics is considerably simpler than for exact results (no need to fit models for each trial), but many will prefer the latter when it is not problematic.
SIMULATION OF P-VALUES: with the conditioning, one may (i) simulate failure times using a constant baseline hazard, since only the ranks matter; (ii) apply the censoring process to the rank data; and (iii) fit the two models. Our primary aim is to lay out assumptions justifying (i) and (ii). (comments please!!)

This is highly tractable, except that the null and alternative models must be fitted for each trial. Quite often one must allow for "infinite" MLEs, and even with this it can be problematic for small samples.

The primary advantage over asymptotics is transparency. The Stata procedure uses the same syntax as the ordinary fitting routine, and takes about a minute for 5,000 trials.
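Steps (i) and (ii) can be sketched as below. To keep the sketch self-contained it departs from the routines described here in two labeled ways: it tests the null with the partial-likelihood score statistic at beta_0 rather than refitting both models each trial, and it assumes the covariate-free censoring convention (individuals to censor drawn uniformly). All function names are our own.

```python
import math
import random

def score_stat(failures, risksets, x, beta0):
    """Partial-likelihood score at beta0 for a scalar covariate."""
    u = 0.0
    for i, R in zip(failures, risksets):
        w = [math.exp(beta0 * x[j]) for j in R]
        u += x[i] - sum(wj * x[j] for wj, j in zip(w, R)) / sum(w)
    return u

def one_trial(x, config, beta0, rng):
    """One conditional trial: exponential failure times (only the ranks
    matter), then the observed censoring configuration applied to the
    risk sets, individuals to censor drawn uniformly."""
    n = len(x)
    t = [rng.expovariate(math.exp(beta0 * x[i])) for i in range(n)]
    at_risk = set(range(n))
    failures, risksets = [], []
    for c_j in config:
        f = min(at_risk, key=lambda i: t[i])   # next failure
        risksets.append(sorted(at_risk))
        failures.append(f)
        at_risk.remove(f)
        for z in rng.sample(sorted(at_risk), c_j):
            at_risk.remove(z)
    return failures, risksets

def conditional_pvalue(x, config, beta0, u_obs, n_trials=1000, seed=1):
    """Two-sided Monte Carlo P-value for H0: beta = beta0."""
    rng = random.Random(seed)
    hits = sum(
        abs(score_stat(*one_trial(x, config, beta0, rng), x, beta0)) >= abs(u_obs)
        for _ in range(n_trials))
    return hits / n_trials
```

The baseline hazard never appears: unit-rate exponentials scaled by exp(x beta_0) give the right rank distribution, which is exactly the point of the conditioning.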
SECOND-ORDER METHODS: This is for inference about scalar functions of the RR. It involves the quantity proposed by Barndorff-Nielsen,

  r* = r + adj / r ,

where r is the signed root of the maximum LR statistic, the adjustment adj involves more than the likelihood function, and r* is treated as standard normal.

Insight into the limitations of first-order methods derives from decomposing this adjustment as

  adj = NP + INF ,

where NP allows for fitting nuisance parameters and INF basically allows for moving from likelihood to frequency inference. Generally, INF is important only for fairly small samples, but NP can be important for reasonable amounts of data when there are several nuisance parameters.
COMPUTATION OF THIS: We will not give the (fairly simple) formulas here, but they involve computing certain covariances of loglikelihoods and scores, with the parameter then evaluated at the constrained and full MLEs (formulas: Pierce & Bellio, Biometrika 2006, 425).

These must be computed by simulation, raising the same issues about reference sets, but this is far easier than the simulation of likelihood ratios.

The quantities above pertain to statistical curvature, and at least in our setting the magnitude and direction of the NP adjustment relate to the extent and direction of the curvature introduced by variation in the composition of risk sets.
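The kind of simulation involved can be illustrated with a generic Monte Carlo covariance estimator. This is only a sketch: sample, f, and g are hypothetical stand-ins for a data simulator and for the score/loglikelihood quantities of Pierce & Bellio (2006), not their actual formulas.

```python
import random

def mc_cov(sample, f, g, n_trials=4000, seed=7):
    """Monte Carlo estimate of Cov(f(D), g(D)) over simulated datasets D.

    sample(rng) draws one dataset; f and g stand in for, e.g., a score
    component and a loglikelihood difference evaluated at the constrained
    and full MLEs.
    """
    rng = random.Random(seed)
    fs, gs = [], []
    for _ in range(n_trials):
        d = sample(rng)
        fs.append(f(d))
        gs.append(g(d))
    fbar = sum(fs) / n_trials
    gbar = sum(gs) / n_trials
    return sum((a - fbar) * (b - gbar) for a, b in zip(fs, gs)) / (n_trials - 1)
```

Unlike the simulation of likelihood ratios, nothing here requires fitting a model per trial, which is why this route is far cheaper.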
RISK SETS AS FIXED: Things simplify considerably for the inferential reference set in which the risk sets are taken as fixed (and the experiments on them as independent).

Use of this reference set often seems necessary when the risk sets arise in complex ways; it is mainly useful for inference about relative risk beyond the analysis of simple response-time data.

It is also quite adequate for all needs when the numbers at risk are large in relation to the number of failures (rare events).
FORMULAS FOR FIXED RISK SETS: In this case the setting is one of independent multinomial experiments defined on the risk sets. The following is for loglinear RR.

The formulas of Pierce & Peters (1992, JRSS-B) apply, yielding an adjustment in terms of the Wald statistic w and the ratio of determinants of the nuisance-parameter information at the full and constrained MLEs.

This may be useful in exploring in which settings the NP adjustment is important: the nuisance-parameter information must "vary rapidly" with the value of the interest parameter. However, these adjustments are smaller than for our other reference sets.
SAME AS THE FIRST EXAMPLE (5 nuisance parameters) BUT WITH: n = 500 with 97% random censoring (fewer failures than before, namely about 15) --- the rare-disease case. The remainder of the model specification is as in the first example; results when the Wald P-value is 0.05. Typical results for a single dataset, lower limits:

  LR, first order                         0.057
  Data-production reference set           0.059
  Conditional, exact (direct simulation)  0.054
  Conditional, second-order               0.054
  Fixed risk sets, exact (simulation)     0.055
  Fixed risk sets, second-order           0.052
OVERALL RECOMMENDATIONS:
1. It seems that adjustments will usually be small, but it may be worthwhile to verify this in many instances, if convenient enough.
2. We will provide routines in Stata and R. The Stata one largely uses the same syntax as the basic fitting command.
3. When failures are a substantial fraction of those at risk, use conditional simulation of P-values unless problems with fitting are encountered.
4. If those problems are likely or encountered, use the 2nd-order methods. These also provide more insight.
5. When failures are a small fraction of those at risk, or when risk sets arise in some special way, use the asymptotic fixed-risk-set calculations.