1 A Bayes method of a Monotone Hazard Rate via S-paths
Man-Wai Ho, National University of Singapore
Cambridge, 9th August 2007

2 Agenda
Overview of estimation of monotone hazard rates
What is an S-path?
A class of random monotone (non-decreasing or non-increasing) hazard rates
– Posterior analysis via S-paths
A Markov chain Monte Carlo (MCMC) method
Numerical examples

3 Hazard Rate / Hazard Function
The hazard rate at time t of a lifetime T,
\lambda(t) = \lim_{\delta \to 0} \frac{\Pr(t \le T < t + \delta \mid T \ge t)}{\delta},
is interpreted as the instantaneous probability of failure of an object.
Hazard rates come in a wide variety of shapes:
– The simplest case of a constant hazard rate corresponds to an exponential lifetime distribution.
– Increasing or decreasing hazard rates correspond to lifetime distributions with a lighter or heavier tail, respectively, compared with an exponential distribution.

4 Estimation of Monotone Hazard Rates – Bayesian Approach
A non-increasing (or "decreasing") hazard rate on the positive line R = (0, \infty) may be written in the mixture form
\lambda(t \mid \mu) = \int_R I(t < u)\, \mu(du),
where I(A) is the indicator function of a set A. The unknown measure \mu is modeled as a random measure / random process. Replacing the kernel I(t < u) by I(t > u) gives a non-decreasing hazard rate.

5 Motivation of this Work
The Bayes estimate (posterior mean) of an increasing hazard rate can be expressed as a sum over e-vectors [Dykstra & Laud (1981)] or m-vectors [Lo & Weng (1989)] based on extended / weighted Gamma processes.
Lo & Weng (1989) characterized the posterior distribution of any random hazard rate
\lambda(t \mid \mu) = \int_R k(t, u)\, \mu(du),
with \mu a weighted Gamma process, in terms of random partitions p.

6 Motivation of this Work
Recently, James (2002, 2005) generalized the result of Lo & Weng (1989) to general hazard rates
\lambda(t \mid \mu) = \int_R k(t, u)\, \mu(du)
by modeling \mu as a completely random measure.
– He analogously obtained a characterization of the posterior distribution in terms of partitions p.
– The class of completely random measures [Kingman (1967, 1993)] includes the Gamma process, weighted Gamma process, generalized gamma process [Brix (1999)], stable process and many other random measures as special cases.

7 Motivation of this Work
– Monotone hazard rates: the kernel is I(t < u); with \mu a weighted Gamma random measure, the posterior has an S-paths structure.
– General hazard rates: the kernel is k(t, u); with \mu a completely random measure, the posterior has a partitions structure.
– ? What happens for monotone hazard rates when \mu is a general completely random measure?

8 What is an S-path?
An S-path is an integer-valued vector S = (S_0, S_1, \ldots, S_{n-1}, S_n) satisfying
(i) S_0 = 0, (ii) S_n = n, (iii) S_j \le j, (iv) S_j \le S_{j+1}.
Denote the increments by m_j = S_j - S_{j-1} for j = 1, \ldots, n, so that S determines (m_1, \ldots, m_n).

9 What is an S-path?
An S-path serves as an alternative to a partition for clustering integers when only (i) the number of elements and (ii) the maximum index in each cluster matter.
– Suppose n = 4. The path S = (0, 0, 0, 2, 4) conveys one cluster of 2 integers with 3 as the maximum index and another cluster of 2 integers with 4 as the maximum index.
– It is a combinatorial reduction of p: for example, with n = 4, both partitions p_1 = {(1, 3), (2, 4)} and p_2 = {(2, 3), (1, 4)} lie in C_S, the set of partitions corresponding to S = (0, 0, 0, 2, 4).
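As a concrete illustration of this reduction, here is a minimal Python sketch (function names are illustrative, not from the paper) that maps a partition of {1, ..., n} to its S-path by recording, for each j, the size m_j of the cluster whose maximum index is j, and checks conditions (i)–(iv) of the previous slide.

```python
def path_from_partition(clusters, n):
    """Map a partition of {1,...,n} to its S-path.

    m_j = size of the cluster whose maximum element is j (0 otherwise);
    S_j = sum_{i<=j} m_i, i.e. the number of integers lying in clusters
    whose maximum index is at most j.
    """
    m = [0] * (n + 1)                     # m[0] unused
    for cluster in clusters:
        m[max(cluster)] = len(cluster)
    S = [0] * (n + 1)
    for j in range(1, n + 1):
        S[j] = S[j - 1] + m[j]
    return tuple(S)

def is_s_path(S):
    """Check conditions (i)-(iv) of an S-path."""
    n = len(S) - 1
    return (S[0] == 0 and S[n] == n
            and all(S[j] <= j for j in range(n + 1))
            and all(S[j] <= S[j + 1] for j in range(n)))

# Both partitions on this slide reduce to the same path (0, 0, 0, 2, 4).
p1 = [{1, 3}, {2, 4}]
p2 = [{2, 3}, {1, 4}]
assert path_from_partition(p1, 4) == path_from_partition(p2, 4) == (0, 0, 0, 2, 4)
assert is_s_path((0, 0, 0, 2, 4))
```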

10 What is an S-path?
One advantage of using S-paths over partitions: for a fixed n, the space of all S-paths is much smaller than the space of all partitions p (which grows roughly like n!).

11 A Class of Random Decreasing Hazard Rates
Consider a class of random "decreasing" hazard rates on R = (0, \infty) defined by
\lambda(t \mid \mu) = \int_R I(t < u)\, \mu(du), \quad t \in R,
where \mu is a completely random measure on R. This class contains the decreasing counterpart of the models considered by Dykstra & Laud (1981) based on extended / weighted Gamma processes.

12 The Completely Random Measure
\mu is an "independent increment" process uniquely characterized by the Laplace functional
L_\mu(g \mid \rho, \eta) = \exp\left[ -\int_R \int_R \left(1 - e^{-g(u) z}\right) \rho(dz \mid u)\, \eta(du) \right].
Alternatively, \mu can be represented in a distributional sense as
\mu(du) = \int_R z\, N(dz, du),
where N(dz, du) is a Poisson random measure on R \times R with intensity measure E[N(dz, du)] = \rho(dz \mid u)\, \eta(du).
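The Poisson representation above suggests a simple way to draw (approximately) from such a completely random measure. The following Python sketch discretizes the intensity \rho(dz \mid u) \eta(du) on a grid and draws Poisson counts cell by cell; the truncation of small jump sizes and the grid are numerical approximations, and all function and variable names are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_crm(eta_grid, eta_weights, rho=lambda z: np.exp(-z) / z,
               z_min=1e-3, z_max=20.0, n_z=2000):
    """Approximate draw of mu(du) = int_R z N(dz, du) by discretising the
    Poisson intensity rho(dz | u) eta(du) on a (z, u) grid.  Jumps below
    z_min are truncated (the Gamma-process intensity z^{-1} e^{-z} has
    infinite mass near 0)."""
    z = np.linspace(z_min, z_max, n_z)
    dz = z[1] - z[0]
    locations, jumps = [], []
    for u, w in zip(eta_grid, eta_weights):
        lam = rho(z) * dz * w                 # expected Poisson counts per z-cell
        counts = rng.poisson(lam)
        for zi, c in zip(z, counts):
            locations += [u] * c
            jumps += [zi] * c
    # mu is the discrete random measure sum_k jumps[k] * delta_{locations[k]}
    return np.array(locations), np.array(jumps)

# Example: eta(du) = (1/6) I(0 < u < 6) du, discretised into 60 cells.
u_grid = np.linspace(0.05, 5.95, 60)
u_mass = np.full(60, 0.1 / 6.0)
locs, sizes = sample_crm(u_grid, u_mass)
```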

13 The Data
The data are observed under a right-censorship scheme. Suppose we collect observations, denoted by T = (T_1, \ldots, T_n, \ldots, T_N), from N items with monotone hazard rates until time \tau.
– T_1 < \cdots < T_n < \tau are completely observed failure times, and
– T_{n+1} = \cdots = T_N \equiv \tau are the right-censored times.

14 Posterior Analysis
The likelihood is given by [Aalen (1975, 1978)]
\frac{N!}{(N-n)!} \left[ \prod_{i=1}^{n} \int_R I(T_i < u_i)\, \mu(du_i) \right] \exp[-\mu(g_N)],
where
\mu(g_N) = \int_0^\tau \left[ \sum_{i=1}^{N} I(T_i \ge t) \right] \left[ \int_R I(t < w)\, \mu(dw) \right] dt.
The posterior distribution of \mu is proportional to the product of the likelihood and the prior.

15 Posterior Analysis
A streamlined proof:
– Note that \mu(du_i) = \int_R z_i\, N(dz_i, du_i), i = 1, \ldots, n.
– Augment the latent variables (z_i, u_i), i = 1, \ldots, n, and work with the joint distribution of (z, u, N, T).
– Apply Proposition 2.3 in James (2005) to get a nice product form.
– Recognize from the posterior distribution that the information carried by a partition about the members other than the maximal element in any cell is irrelevant: when p \in C_S,
\prod_{i=1}^{n(p)} \prod_{j \in C_i} I(T_j < v_i) = \prod_{i=1}^{n(p)} I\big(\max_{j \in C_i} T_j < v_i\big) = \prod_{\{j : m_j > 0\}} I(T_j < y_j).

16 Posterior Analysis
Theorem 1: The posterior law of \mu given the data T can be described by a three-step experiment:
1) An S-path S = (0, S_1, \ldots, S_{n-1}, n) has distribution Z(S) = \phi(S) / \sum_S \phi(S), where
\phi(S) = \prod_{\{j : m_j > 0\}} \binom{j - 1 - S_{j-1}}{j - S_j} \int_{T_j}^{\infty} \kappa_{m_j}(e^{-f_N} \rho \mid y)\, \eta(dy),
\kappa_i(e^{-f_N} \rho \mid u) = \int_R z^i e^{-g_N(u) z}\, \rho(dz \mid u) < \infty for any integer i > 0,
f_N(z, u) = g_N(u)\, z, and g_N(u) = \int_0^\tau \left[ \sum_{i=1}^N I(T_i \ge t) \right] I(t < u)\, dt.
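A minimal sketch of how \phi(S) could be evaluated, assuming a user-supplied function log_J(m, T_j) that returns the logarithm of \int_{T_j}^{\infty} \kappa_m(e^{-f_N} \rho \mid y)\, \eta(dy) (all names here are illustrative, not from the paper). Note that g_N simplifies to a finite sum because every T_i, censored or not, is at most \tau.

```python
import numpy as np
from math import comb

def g_N(u, T):
    """g_N(u) = int_0^tau [sum_i I(T_i >= t)] I(t < u) dt = sum_i min(T_i, u),
    since every T_i (failure or censored) is at most tau."""
    return float(np.minimum(np.asarray(T), u).sum())

def log_phi(S, T_obs, log_J):
    """log phi(S) for a path S = (0, S_1, ..., S_{n-1}, n); T_obs = (T_1, ..., T_n).

    log_J(m, Tj) must return log of int_{Tj}^infty kappa_m(e^{-f_N} rho | y) eta(dy),
    which encodes rho, eta and the data through g_N."""
    n = len(S) - 1
    total = 0.0
    for j in range(1, n + 1):
        m_j = S[j] - S[j - 1]
        if m_j > 0:
            total += np.log(comb(j - 1 - S[j - 1], j - S[j]))
            total += log_J(m_j, T_obs[j - 1])
    return total
```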

17 Posterior Analysis
2) Given S, there exist \sum_{j=1}^{n} I(m_j > 0) independent pairs, denoted by (y, Q) = \{(y_j, Q_j) : m_j > 0, j = 1, \ldots, n\}, where y_j \mid S, T is distributed as
\eta_j(dy_j \mid S, T) \propto I(T_j < y_j)\, \kappa_{m_j}(e^{-f_N} \rho \mid y_j)\, \eta(dy_j),
and
\Pr\{Q_j \in dz \mid S, y_j, T\} \propto z^{m_j} e^{-g_N(y_j) z}\, \rho(dz \mid y_j).
3) Given (S, y, Q), \mu has a distribution as
\mu^*_N + \sum_{\{j : m_j > 0\}} Q_j\, \delta_{y_j},
where \mu^*_N is a completely random measure characterized by e^{-g_N(u) z}\, \rho(dz \mid u)\, \eta(du).

18 The Bayes Estimate
The Bayes estimate (posterior mean) of a decreasing hazard rate given T is
E[\lambda(t \mid \mu) \mid T] = \lambda_0(t) + \sum_S Z(S) \sum_{j=1}^{n} \lambda_j(t \mid S),
where \lambda_0(t) = \int_t^{\infty} \kappa_1(e^{-f_N} \rho \mid y)\, \eta(dy) and, if m_j > 0,
\lambda_j(t \mid S) = \frac{\int_{\max(t, T_j)}^{\infty} \kappa_{m_j + 1}(e^{-f_N} \rho \mid y)\, \eta(dy)}{\int_{T_j}^{\infty} \kappa_{m_j}(e^{-f_N} \rho \mid y)\, \eta(dy)};
otherwise \lambda_j(t \mid S) = 0.
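Continuing the sketch, \lambda_0(t) and \lambda_j(t \mid S) only require the same integral J(m, a) = \int_a^{\infty} \kappa_m(e^{-f_N} \rho \mid y)\, \eta(dy) that appears in \phi(S); a hypothetical implementation (names illustrative):

```python
def lambda_0(t, J):
    """lambda_0(t) = int_t^infty kappa_1(e^{-f_N} rho | y) eta(dy)."""
    return J(1, t)

def lambda_j(t, j, S, T_obs, J):
    """lambda_j(t | S): a ratio of two J-integrals when m_j > 0, else 0."""
    m_j = S[j] - S[j - 1]
    if m_j == 0:
        return 0.0
    return J(m_j + 1, max(t, T_obs[j - 1])) / J(m_j, T_obs[j - 1])
```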

19 The Bayes Estimate
The posterior mean is a sum over all S-paths with n + 1 coordinates. The computation is formidable even when n = 50, even though the total number of S-paths is smaller than that of partitions p.
There is NO exact simulation method for S! Possible strategies:
– Develop Markov chain Monte Carlo (MCMC) algorithms or sequential importance sampling (SIS) methods for sampling S (not straightforward!)
– Use algorithms for sampling p or latent variables?

20 The Bayes Estimate
The estimate is consistent, due to the weak consistency of the posterior shown by Drăghici & Ramamoorthi (2003).
It is always less variable than the partition-based estimate \lambda_0(t) + \sum_p W(p)\, h(p) obtained as a specialization of James (2002, 2005), according to a Rao-Blackwellization argument:
\sum_{j=1}^{n} \lambda_j(t \mid S) = E[h(p) \mid S, T] = \sum_{p \in C_S} h(p)\, \pi(p \mid S, T),
by the discrete uniform conditional distribution of p \mid S, T and the constancy of h(p) for all p \in C_S.

21 An MCMC Method
Goal: draw a Markov chain of S-paths with unique stationary distribution Z(S) = \phi(S) / \sum_S \phi(S).
Generalize the accelerated path (AP) sampler [Ho (2002)], an efficient MCMC method for sampling S-paths in Bayesian nonparametric models based on the Dirichlet process and Gamma process.
– An improvement over a naïve Gibbs sampler [Ho (2002)] for S-paths.

22 The AP Sampler
A transition cycle contains n - 1 steps. At step r (= 1, \ldots, n - 1):
– Obtain q = \min\{i > r : m_i > 0\} from the current path.

23 The AP Sampler
– Note that the current S is given by (0, S_1, \ldots, S_{r-1}, c, \ldots, c, S_q, \ldots, S_{n-1}, n), where S_{r-1} \le c \le \min(r, S_q - 1).
– Then the new S will be (0, S_1, \ldots, S_{r-1}, j, \ldots, j, S_q, \ldots, S_{n-1}, n) for j = S_{r-1}, S_{r-1} + 1, \ldots, \min(r, S_q - 1), with (transition) probability proportional to \phi((0, S_1, \ldots, S_{r-1}, j, \ldots, j, S_q, \ldots, S_{n-1}, n)).
– Repeat for r = 1, \ldots, n - 1 steps to finish one cycle.

24 [Figure: an S-path diagram of one AP-sampler step, with positions 0, \ldots, r-1, r, \ldots, q on the horizontal axis and S_j on the vertical axis; the candidate values of the updated block range from S_r = S_{r-1} up to S_r = r, bounded above by S_q.]

25 The AP Sampler – the Transition Probabilities
If j = S_{r-1}, the probability is proportional to
\frac{r - S_{r-1}}{S_q - 1 - S_{r-1}} \int_{T_q}^{\infty} \kappa_{S_q - S_{r-1}}(e^{-f_N} \rho \mid y)\, \eta(dy).
Otherwise, if j \in \{S_{r-1} + 1, S_{r-1} + 2, \ldots, \min(r, S_q - 1)\}, the probability is proportional to
\binom{S_q - S_{r-1} - 2}{S_q - j - 1} \prod_{i=r+1}^{q-1} \frac{i - j}{i - S_{r-1}} \times \int_{T_r}^{\infty} \kappa_{j - S_{r-1}}(e^{-f_N} \rho \mid y)\, \eta(dy) \times \int_{T_q}^{\infty} \kappa_{S_q - j}(e^{-f_N} \rho \mid z)\, \eta(dz).
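Putting the cycle together: the sketch below implements one AP transition cycle by directly evaluating \phi (via the log_phi helper sketched under Theorem 1) for every candidate value j of the block S_r = \cdots = S_{q-1}. This is equivalent to, though less optimized than, the closed-form transition probabilities above; function names are illustrative.

```python
import numpy as np

def ap_cycle(S, T_obs, log_J, rng):
    """One transition cycle of the AP sampler over steps r = 1, ..., n-1."""
    S = list(S)
    n = len(S) - 1
    for r in range(1, n):
        q = next(i for i in range(r + 1, n + 1) if S[i] > S[i - 1])   # first jump after r
        candidates = list(range(S[r - 1], min(r, S[q] - 1) + 1))
        log_w = [log_phi(S[:r] + [j] * (q - r) + S[q:], T_obs, log_J)
                 for j in candidates]
        w = np.exp(np.array(log_w) - max(log_w))                      # stabilised weights
        j_new = candidates[rng.choice(len(w), p=w / w.sum())]
        S[r:q] = [j_new] * (q - r)                                    # update the block
    return tuple(S)
```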

26 Evaluation of the Bayes Estimate
Start with an arbitrary path S^{(0)}. Repeat M cycles to yield a Markov chain S^{(0)}, S^{(1)}, \ldots, S^{(M)}. Compute the ergodic average
\frac{1}{M} \sum_{i=1}^{M} \sum_{j=1}^{n} \lambda_j(t \mid S^{(i)})
to approximate the sum \sum_S Z(S) \sum_{j=1}^{n} \lambda_j(t \mid S) in the posterior mean of the monotone hazard rate.
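A sketch of this ergodic-average evaluation, chaining the helpers from the previous slides (names illustrative; J(m, a) = exp(log_J(m, a)) is the same integral in levels rather than logs):

```python
import numpy as np

def posterior_mean_hazard(t, T_obs, log_J, J, M=1000, seed=0):
    """Approximate E[lambda(t | mu) | T] = lambda_0(t) + sum_S Z(S) sum_j lambda_j(t | S)
    by averaging over an AP-sampler chain S^(1), ..., S^(M)."""
    rng = np.random.default_rng(seed)
    n = len(T_obs)
    S = tuple(range(n + 1))                       # arbitrary initial path (0, 1, ..., n)
    running = 0.0
    for _ in range(M):
        S = ap_cycle(S, T_obs, log_J, rng)
        running += sum(lambda_j(t, j, S, T_obs, J) for j in range(1, n + 1))
    return lambda_0(t, J) + running / M
```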

27 Rationale behind the AP Sampler
A Markov chain is defined by a transition kernel; the chain has a unique stationary distribution if the kernel is irreducible.
– Irreducible transition kernel: all states in the space communicate with each other within one cycle.
Construct n - 1 reducible kernels such that each preserves the target distribution as its stationary distribution and their product is irreducible [Hastings (1970); Tierney (1994)].

28 Numerical Examples
Gamma process for \mu (i.e., \rho(dz \mid u) = z^{-1} e^{-z}\, dz) with shape measure \eta(du) = \frac{1}{6} I(0 < u < 6)\, du.
Lifetime data of an item with hazard rate
\lambda(t) = 1 for 0 \le t < 1, and \lambda(t) = 0.5 for t \ge 1.
Data are generated subject to \tau = 3; the censoring rate is about 15%.
Monte Carlo size is M = 1000. Initial path: S^{(0)} = (0, 1, \ldots, n - 1, n).
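Under this Gamma-process specification the inner integral has a closed form, \kappa_m(e^{-f_N} \rho \mid u) = \int_0^{\infty} z^{m-1} e^{-(1 + g_N(u)) z}\, dz = (m-1)! / (1 + g_N(u))^m, so J(m, a) = \frac{1}{6} \int_{\max(a, 0)}^{6} (m-1)! / (1 + g_N(y))^m\, dy only needs one-dimensional quadrature. A hedged sketch wiring this into the earlier helpers (grid size and function names are my choices, not the paper's):

```python
from math import lgamma, log
import numpy as np

def make_log_J(T, n_grid=600):
    """log J(m, a) for rho(dz|u) = z^{-1} e^{-z} dz and eta(du) = (1/6) I(0<u<6) du."""
    T = np.asarray(T, dtype=float)
    def log_J(m, a):
        if a >= 6.0:
            return -np.inf
        y = np.linspace(max(a, 0.0), 6.0, n_grid)
        gN = np.minimum.outer(y, T).sum(axis=1)          # g_N(y) = sum_i min(T_i, y)
        vals = np.exp(lgamma(m) - m * np.log1p(gN)) / 6.0
        dy = y[1] - y[0]
        return log(np.sum((vals[1:] + vals[:-1]) / 2.0) * dy)   # trapezoidal rule
    return log_J

# Hypothetical wiring with the earlier sketches:
#   log_J = make_log_J(T)              # T = all N observations (failures + censored)
#   J = lambda m, a: np.exp(log_J(m, a))
#   estimate = posterior_mean_hazard(t=0.5, T_obs=T[:n], log_J=log_J, J=J, M=1000)
```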

29

30 Comparisons between Different Methods
There are many commonly-used partition-based methods [Lo, Brunner & Chan (1996); Ishwaran & James (2003); James (2005)].
We replicate 1000 independent hazard estimates by each of the three available methods: (i) the AP sampler, (ii) the naïve Gibbs path (gP) sampler, and (iii) the gWCR sampler of Lo, Brunner & Chan (1996).

31

32 Comparisons between Different Methods
At different time points, the three averages are close to each other, yet the standard errors vary substantially.
– The standard error of the hazard rate estimates produced by the AP sampler is the smallest among the three methods.
– The AP sampler clearly outperforms the naïve Gibbs path sampler and beats its closest competitor, the gWCR sampler, by a comfortable margin.

33 Conclusions
– A tractable posterior distribution and Bayes estimate in terms of S-paths.
– A Rao-Blackwellization result for S over p.
– An efficient numerical method for sampling S-paths.
– These results accentuate the importance of the study and use of S-paths in models with monotonicity constraints (e.g., symmetric unimodal densities, monotone densities, unimodal densities, bathtub-shaped hazard rates, ...).

34 THANK YOU!