Slice Sampling
Radford M. Neal, The Annals of Statistics, Vol. 31, No. 3 (2003)

Introduction
– Sampling from a non-standard distribution: the Metropolis algorithm is sensitive to the choice of proposal distribution
– Proposing changes that are too small leads to an inefficient random walk
– Proposing changes that are too large leads to frequent rejections
– This sensitivity inhibits the development of software that automatically constructs Markov chain samplers from model specifications

Alternative to Metropolis: slice sampling
– Requires knowledge of a function proportional to the target density
– May not sample more efficiently than a well-designed Metropolis scheme, but often requires less effort to implement and tune
– For some distributions, slice sampling can be more efficient, because it can adaptively choose a scale for changes appropriate to the region of the distribution being sampled

The idea of slice sampling
– Sample from a distribution for a variable x whose density is proportional to some function f(x)
– Introduce an auxiliary variable y
– The joint density for (x, y) is
    p(x, y) = 1/Z   if 0 < y < f(x),   0 otherwise,   where Z = ∫ f(x) dx

The idea of slice sampling
– Gibbs sampling to sample from p(x, y):
– p(y | x) is uniform over (0, f(x))
– p(x | y) is uniform over S = {x : y < f(x)}, the "slice" defined by y
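
For a target whose slice can be found in closed form, the two conditional draws above can be implemented directly. Below is a minimal Python sketch (an illustration, not code from the paper) for f(x) = exp(-x^2/2), where the slice at level y is the interval (-sqrt(-2 ln y), sqrt(-2 ln y)).

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Unnormalized target density: a standard Gaussian up to a constant.
    return np.exp(-0.5 * x**2)

def slice_gibbs_step(x):
    # Draw the auxiliary variable y uniformly from (0, f(x)).
    y = rng.uniform(0.0, f(x))
    # Draw x uniformly from the slice S = {x : y < f(x)}.
    # For this f, S is the interval (-r, r) with r = sqrt(-2 ln y).
    r = np.sqrt(-2.0 * np.log(y))
    return rng.uniform(-r, r)

x, samples = 0.0, []
for _ in range(10000):
    x = slice_gibbs_step(x)
    samples.append(x)
print(np.mean(samples), np.std(samples))  # should be near 0 and 1
```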

The idea of slice sampling
– Generating an independent point drawn uniformly from S may still be difficult, in which case we can substitute some update for x that leaves the uniform distribution over S invariant

Single-variable slice sampling
– Slice sampling is simplest when only one (real-valued) variable is being updated, i.e., the univariate case
– More typically, the single-variable slice sampling methods of this section will be used to sample from a multivariate distribution for x = (x1, ..., xn) by sampling repeatedly for each variable in turn

Finding an appropriate interval
– After a value for the auxiliary variable has been drawn, defining the slice S, the next task is to find an interval I = (L, R), containing the current point x0, from which the new point x1 will be drawn
– We would like this interval to contain as much of the slice as is feasible, so as to allow the new point to differ as much as possible from the old point
– We would also like to avoid intervals that are much larger than the slice, as this will make the subsequent sampling step less efficient

Finding an appropriate interval

Stepping-out and shrinkage procedure
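
Below is a Python sketch of the stepping-out and shrinkage procedures described in the paper: the interval is first positioned randomly around x0 and expanded in steps of width w (up to a limit of m steps in total), and a point is then drawn from it by rejection with shrinkage. The width w, the limit m, and the demo density are illustrative choices; the paper recommends working with log f(x) for numerical stability, which this sketch omits for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)

def step_out(f, x0, y, w, m):
    # Stepping-out: place an interval of width w at a random position around x0,
    # then expand each end in steps of w until the end lies outside the slice
    # (y >= f(end)) or the total budget of m steps is used up.
    L = x0 - w * rng.uniform()
    R = L + w
    J = int(np.floor(m * rng.uniform()))  # steps allowed to the left
    K = (m - 1) - J                       # steps allowed to the right
    while J > 0 and y < f(L):
        L -= w
        J -= 1
    while K > 0 and y < f(R):
        R += w
        K -= 1
    return L, R

def shrink(f, x0, y, L, R):
    # Shrinkage: draw uniformly from (L, R); a draw outside the slice is used
    # to shrink the interval toward the current point x0, so that a point
    # inside the slice is eventually found.
    while True:
        x1 = L + rng.uniform() * (R - L)
        if y < f(x1):
            return x1
        if x1 < x0:
            L = x1
        else:
            R = x1

def slice_sample_1d(f, x0, w=1.0, m=50):
    # One single-variable slice sampling update: draw the auxiliary level y,
    # find an interval by stepping out, then sample from it with shrinkage.
    y = rng.uniform(0.0, f(x0))
    L, R = step_out(f, x0, y, w, m)
    return shrink(f, x0, y, L, R)

# Small demo on an unnormalized two-component Gaussian mixture.
f = lambda x: np.exp(-0.5 * (x - 2.0) ** 2) + 0.5 * np.exp(-0.5 * (x + 2.0) ** 2)
x, xs = 0.0, []
for _ in range(5000):
    x = slice_sample_1d(f, x)
    xs.append(x)
print(np.mean(xs))
```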

Sampling from the interval

Correctness of single-variable slice sampling
– We need to show that the selection of x1 to follow x0 in steps (b) and (c) of the single-variable slice sampling procedure leaves the joint distribution of x and y invariant
– Since these steps do not change y, this is the same as leaving the conditional distribution for x given y invariant; that conditional distribution is uniform over S = {x : y < f(x)}, the slice defined by y
– We can show invariance of this distribution by showing that the updates satisfy detailed balance, which for a uniform distribution reduces to showing that the probability density for x1 to be selected as the next state, given that the current state is x0, equals the probability density for x0 to be the next state, given that x1 is the current state, for any states x0 and x1 within S
– Recall detailed balance: P(x1 | x0) P(x0) = P(x0 | x1) P(x1); when the distribution is uniform over S, P(x0) = P(x1), so this reduces to P(x1 | x0) = P(x0 | x1)

Overrelaxed slice sampling
– When dependencies between variables are strong, the conditional distributions will be much narrower than the corresponding marginal distributions p(xi), and many iterations of the Markov chain will be necessary for the state to visit the full range of the distribution defined by p(x)
– In a typical Metropolis-Hastings scheme, the distribution is explored by taking small steps, with the direction of these steps randomized in each iteration
– Sampling efficiency can be improved in this context by suppressing the random walk behavior characteristic of simple schemes such as Gibbs sampling
– One way of achieving this is by using "overrelaxed" updates: like Gibbs sampling, overrelaxation methods update each variable in turn, but rather than drawing a new value for a variable from its conditional distribution independently of the current value, the new value is instead chosen to be on the opposite side of the mode from the current value
– In Adler's (1981) scheme, applicable when the conditional distributions are Gaussian, the new value for variable i is
    xi' = μi + α (xi − μi) + σi sqrt(1 − α²) n,   n ~ N(0, 1),
  where μi and σi are the conditional mean and standard deviation of xi, and −1 < α < 1 (negative α gives overrelaxation; α = −1 reflects deterministically through the conditional mean)

Overrelaxed Gibbs sampling
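
A minimal Python sketch of Adler's overrelaxed update for a Gaussian conditional, applied to a bivariate Gaussian with correlation rho; the example, and the choice alpha = -0.98, are illustrative rather than taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

def adler_update(x, mu, sigma, alpha=-0.98):
    # Adler (1981) overrelaxation for a Gaussian conditional with mean mu and
    # standard deviation sigma: the new value lands (mostly) on the opposite
    # side of the conditional mean from the current value x.
    return mu + alpha * (x - mu) + sigma * np.sqrt(1.0 - alpha**2) * rng.normal()

# Demo: bivariate Gaussian with unit variances and correlation rho.
rho = 0.95
x1, x2 = 0.0, 0.0
xs = []
for _ in range(5000):
    # Conditional of x1 given x2 is N(rho*x2, 1 - rho^2), and likewise for x2.
    x1 = adler_update(x1, rho * x2, np.sqrt(1.0 - rho**2))
    x2 = adler_update(x2, rho * x1, np.sqrt(1.0 - rho**2))
    xs.append((x1, x2))
xs = np.array(xs)
print(np.corrcoef(xs.T))  # should be close to [[1, rho], [rho, 1]]
```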

Overrelaxed slice sampling

Experimental results
– The task is to sample from a distribution for ten real-valued variables, v and x1 to x9
– The marginal distribution of v is Gaussian with mean zero and standard deviation 3
– Conditional on a given value of v, the variables x1 to x9 are independent, with the conditional distribution for each being Gaussian with mean zero and variance exp(v)
– The resulting shape resembles a ten-dimensional funnel, with small values for v at its narrow end and large values for v at its wide end
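
A short Python sketch of this funnel distribution, with exact draws from its constructive definition, which can serve as a ground-truth reference when checking a Markov chain sampler; the function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

def funnel_log_density(v, x):
    # v ~ N(0, 3^2); x1..x9 given v are independent N(0, exp(v)).
    x = np.asarray(x, dtype=float)
    log_p_v = -0.5 * (v / 3.0) ** 2 - np.log(3.0) - 0.5 * np.log(2 * np.pi)
    log_p_x = np.sum(-0.5 * x**2 / np.exp(v) - 0.5 * (v + np.log(2 * np.pi)))
    return log_p_v + log_p_x

def funnel_direct_draw(n_x=9):
    # Exact draw, following the constructive definition above.
    v = rng.normal(0.0, 3.0)
    x = rng.normal(0.0, np.exp(v / 2.0), size=n_x)
    return v, x

v, x = funnel_direct_draw()
print(v, funnel_log_density(v, x))
```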

Multivariate slice sampling methods

– Although this simple multivariate slice sampling method is easily implemented, in one respect it works less well than applying single-variable slice sampling to each variable in turn
– When each variable is updated separately, the interval for that variable will be shrunk only as far as needed to obtain a new value within the slice; the amount of shrinkage can be different for different variables
– In contrast, the procedure of Figure 8 shrinks all dimensions of the hyperrectangle until a point inside the slice is found, even though the probability density may not vary rapidly in some of these dimensions, making shrinkage in those directions unnecessary
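
Below is a Python sketch of the hyperrectangle method discussed here (the procedure of Figure 8 in the paper): a hyperrectangle with edge lengths w is positioned uniformly at random around the current point, a candidate is drawn uniformly from it, and every dimension is shrunk toward the current point whenever the candidate falls outside the slice. The function names and the demo density are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def slice_sample_hyperrect(f, x0, w):
    # One multivariate slice sampling update using a hyperrectangle
    # (no stepping out): x0 is the current point, w the vector of edge lengths.
    x0 = np.asarray(x0, dtype=float)
    w = np.asarray(w, dtype=float)
    y = rng.uniform(0.0, f(x0))              # auxiliary slice level
    L = x0 - w * rng.uniform(size=x0.shape)  # random placement around x0
    R = L + w
    while True:
        x1 = L + rng.uniform(size=x0.shape) * (R - L)
        if y < f(x1):
            return x1
        # Shrink every dimension toward the current point, whether or not the
        # density varies rapidly in that direction (the drawback noted above).
        lower = x1 < x0
        L = np.where(lower, x1, L)
        R = np.where(lower, R, x1)

# Small demo: unnormalized bivariate Gaussian with correlation 0.9.
prec = np.linalg.inv(np.array([[1.0, 0.9], [0.9, 1.0]]))
f = lambda x: np.exp(-0.5 * x @ prec @ x)
x, xs = np.zeros(2), []
for _ in range(5000):
    x = slice_sample_hyperrect(f, x, w=[2.0, 2.0])
    xs.append(x)
print(np.cov(np.array(xs).T))
```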

Multivariate slice sampling methods
– Multivariate slice sampling using hyperrectangles will usually not offer much advantage over single-variable slice sampling (as is also the case with multivariate versus single-variable Metropolis methods)