
MCMC in practice
Start collecting samples after the Markov chain has "mixed". How do you know whether a chain has mixed or not? In general, you can never prove that a chain has mixed, but in many cases you can show that it has NOT. (If you try to show non-mixing using several different methods and all of them fail, you have probably convinced yourself that it has mixed.) So the question becomes: how do you know a chain has NOT mixed?
- Log-likelihood plot
- Marginal plot (for several chains)
- Marginal scatter plot (for two chains)
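The multi-chain idea behind these diagnostics can also be quantified rather than eyeballed. Below is a minimal sketch (not part of the slides or the course code) of a Gelman-Rubin style potential scale reduction factor: if independent chains disagree, the between-chain variance inflates the statistic well above 1. The function name and the toy chains are illustrative.

```python
import random
import statistics

def potential_scale_reduction(chains):
    """Gelman-Rubin style R-hat for several chains of scalar samples.

    If the chains have not mixed, the between-chain variance B dominates
    the within-chain variance W and R-hat is noticeably above 1.
    """
    m, n = len(chains), len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    grand_mean = statistics.fmean(chain_means)
    b = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in chain_means)
    w = statistics.fmean(statistics.variance(c) for c in chains)
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5

random.seed(0)
# Two chains that sample the same distribution: R-hat close to 1.
mixed = [[random.gauss(0, 1) for _ in range(1000)] for _ in range(2)]
# Two chains stuck in different modes: R-hat far above 1.
stuck = [[random.gauss(-3, 1) for _ in range(1000)],
         [random.gauss(+3, 1) for _ in range(1000)]]
print(potential_scale_reduction(mixed))
print(potential_scale_reduction(stuck))
```

Like the plots on the following slides, this can only flag non-mixing; a value near 1 does not prove the chains have mixed.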

Mixing? [Figure: log-likelihood vs. iterations trace plots for two chains, one initialized from a bad (non-informative) configuration, the other from a "good" configuration (found by other methods); one trace is labeled "Probably", the other "NO".]

Each dot is a statistic (e.g., P(X_1 = x_10)); its x-position is the estimate from chain 1 and its y-position is the estimate from chain 2. [Figure: two scatter plots, labeled "Mixing? NO" and "Mixing? Probably" — dots far from the diagonal mean the two chains disagree and have not mixed.]
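The scatter-plot diagnostic can be sketched numerically as well. In this illustrative example (the discrete target, sample sizes, and helper name are my own, not the slide's model), each statistic is an indicator probability estimated from two independent chains; if both chains have mixed, each (chain-1 estimate, chain-2 estimate) dot lies near the diagonal y = x.

```python
import random

def indicator_estimates(chain, values):
    """Estimate P(X = v) for each value v as the fraction of samples equal to v."""
    n = len(chain)
    return {v: sum(1 for x in chain if x == v) / n for v in values}

random.seed(1)
values = [0, 1, 2, 3]
weights = [0.1, 0.2, 0.3, 0.4]
# Stand-in for two chains that both sample the target well:
chain1 = random.choices(values, weights, k=5000)
chain2 = random.choices(values, weights, k=5000)
est1 = indicator_estimates(chain1, values)
est2 = indicator_estimates(chain2, values)
# Each (est1[v], est2[v]) pair is one dot of the scatter plot; mixing
# means every dot lies near the diagonal y = x.
max_gap = max(abs(est1[v] - est2[v]) for v in values)
print(max_gap)
```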

Toy Model for Data Association
Blue dots: variables x_i (i = 1, 2, 3, 4). Red dots: observations (values that we assign to variables). What does the distribution look like? [Figure: panels (A), (B), (C), with the variable-to-observation distance annotated.]

How do we sample from it? Add one observation (so that Gibbs would work). The distribution has two modes.
- Gibbs: how does it traverse between the two modes?
- Block Gibbs (block size = 2): how do we sample?
- Metropolis-Hastings: take larger steps using a proposal distribution. (We will come to the details of this later.)
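The effect of larger Metropolis-Hastings steps on a two-mode distribution can be seen on a 1-D stand-in (my own toy example, not the slide's data-association model): random-walk Metropolis with a small proposal step rarely crosses the low-probability valley between the modes, while a larger step traverses it regularly.

```python
import math
import random

def target(x):
    """Unnormalized two-mode density: equal Gaussians at -3 and +3."""
    return math.exp(-0.5 * (x + 3) ** 2) + math.exp(-0.5 * (x - 3) ** 2)

def metropolis(n_samples, step, seed=0):
    """Random-walk Metropolis with a Gaussian proposal of scale `step`."""
    rng = random.Random(seed)
    x = -3.0  # start in the left mode
    samples = []
    for _ in range(n_samples):
        y = x + rng.gauss(0, step)
        # Symmetric proposal, so accept with probability min(1, p(y)/p(x)).
        if rng.random() < target(y) / target(x):
            x = y
        samples.append(x)
    return samples

small = metropolis(20000, step=0.5)
large = metropolis(20000, step=3.0)
# Fraction of time spent in the right-hand mode; a well-mixed chain on
# this symmetric target should split its time roughly evenly.
frac_small = sum(s > 0 for s in small) / len(small)
frac_large = sum(s > 0 for s in large) / len(large)
print(frac_small, frac_large)
```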

Try it yourself
Connect to clusters with graphics:
  Windows: https://itservices.stanford.edu/service/unixcomputing/unix/moreX
  MacOS or Linux: ssh -X user@corn.stanford.edu
Copy the code to your own directory:
  cp -r /afs/ir/class/cs228/mcmc ./
  cd mcmc
Run Matlab and execute the following scripts:
  VisualMCMC1(10000, 0.1, 0.05);  % live animation of sampling
  % parameters: num of samples, sigma, pause time after each sample
  Plot1;  % the first few lines of Plot1.m contain the parameters you may want to play around with

Proposal distributions for M-H
Back to the model with 4 observations. What will Gibbs do on this?
Proposal distribution 1 ("flip two"): randomly pick two variables and flip (swap) their assignments. What is the acceptance probability?
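Since the "flip two" proposal is symmetric (the reverse swap is chosen with the same probability), the proposal ratio cancels and the M-H acceptance probability reduces to min(1, p(x')/p(x)). Here is a hedged sketch; the 4x4 distance matrix, the `score` function, and the exp(-distance) assignment model are illustrative assumptions, not the slide's exact model.

```python
import math
import random

def score(assign, dist):
    """Unnormalized probability of an assignment: exp(-total assigned distance)."""
    return math.exp(-sum(dist[i][assign[i]] for i in range(len(assign))))

def flip_two(assign, dist, rng):
    """'Flip two' proposal: pick two variables and swap their observations.

    The proposal is symmetric, so the M-H acceptance probability is
    min(1, p(x') / p(x)).
    """
    i, j = rng.sample(range(len(assign)), 2)
    proposed = list(assign)
    proposed[i], proposed[j] = proposed[j], proposed[i]
    if rng.random() < min(1.0, score(proposed, dist) / score(assign, dist)):
        return proposed
    return assign

# Hypothetical 4x4 variable-to-observation distance matrix.
dist = [[0.0, 1.0, 2.0, 3.0],
        [1.0, 0.0, 1.0, 2.0],
        [2.0, 1.0, 0.0, 1.0],
        [3.0, 2.0, 1.0, 0.0]]
rng = random.Random(0)
assign = [3, 2, 1, 0]  # start far from the best assignment
for _ in range(2000):
    assign = flip_two(assign, dist, rng)
print(assign)  # every state visited is a valid permutation
```

Note that swaps keep the chain inside the space of valid assignments (permutations), which is exactly why this move is attractive for data association.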

Proposal distributions for M-H
Proposal distribution 2 (augmented path):
1. Randomly pick one variable.
2. Sample it, pretending that all observations are available.
3. Pick the variable whose assignment was taken (conflict), go to step 2.
4. Loop until step 2 creates no conflict.
What is the acceptance probability?

Proposal distributions for M-H
Proposal distribution 3 ("smart" augmented path): same as the previous one except for the highlighted step.
1. Randomly pick one variable.
2. Sample it, pretending that all observations are available (excluding the current one).
3. Pick the variable whose assignment was taken (conflict), go to step 2.
4. Loop until step 2 creates no conflict.
What is the acceptance probability?
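Unlike "flip two", the augmented-path proposals are not symmetric, so the general M-H acceptance probability min(1, p(y) q(x|y) / (p(x) q(y|x))) applies and the proposal-ratio correction cannot be dropped. A minimal generic sketch (the function and its toy check are illustrative, not the course code):

```python
import random

def mh_accept(p_current, p_proposed, q_forward, q_reverse, rng):
    """Metropolis-Hastings acceptance test for an asymmetric proposal.

    p_*       : unnormalized target probabilities of the two states
    q_forward : proposal probability of moving current -> proposed
    q_reverse : proposal probability of moving proposed -> current
    """
    ratio = (p_proposed * q_reverse) / (p_current * q_forward)
    return rng.random() < min(1.0, ratio)

rng = random.Random(0)
# A proposal that strongly favors the forward move must be penalized by
# the q_reverse / q_forward term to preserve detailed balance.
accepts = sum(
    mh_accept(1.0, 1.0, q_forward=0.9, q_reverse=0.3, rng=rng)
    for _ in range(10000)
)
print(accepts / 10000)  # close to 0.3 / 0.9 = 1/3
```

For an augmented-path move, q_forward and q_reverse would each be the product of the sampling probabilities along the whole path of reassignments.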

Try it yourself
Run the following Matlab scripts:
  VisualMCMC2(10000, 0.7, 0.05);  % live animation of sampling
  % parameters: num of samples, sigma, pause time after each sample
  Plot2;  % the first few lines of Plot2.m contain the parameters you may want to play around with

Try to fix Gibbs with annealing
A skewed distribution: the right blue dot is moved up a little, so that a < b. What does the distribution look like? What would you use for annealing: (A) multiple shorter chains, or (B) one longer chain? (Suppose that our computational resources cannot afford multiple long chains.)
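Option (A), multiple shorter annealed chains, can be sketched on a 1-D stand-in for the skewed two-mode distribution. Everything below (the two-Gaussian target with weights 0.3 and 0.7 playing the roles of a and b, the linear temperature schedule, and all the numbers) is an illustrative assumption, not the course's VisualMCMC3 code.

```python
import math
import random

def log_target(x):
    """Unnormalized skewed two-mode log-density (modes near -3 and +3, a < b)."""
    return math.log(0.3 * math.exp(-0.5 * (x + 3) ** 2)
                    + 0.7 * math.exp(-0.5 * (x - 3) ** 2))

def annealed_chain(n_steps, rng):
    """One short chain whose temperature decays from hot (T=10) to cold (T=1).

    At high temperature the tempered density exp(log p / T) is nearly flat,
    so small Metropolis steps can cross between the modes; only the final
    (T = 1) state is kept as a sample.
    """
    x = rng.gauss(0, 3)
    for t in range(n_steps):
        temp = 1.0 + 9.0 * (1.0 - t / (n_steps - 1))  # schedule: 10 -> 1
        y = x + rng.gauss(0, 1)
        if math.log(rng.random()) < (log_target(y) - log_target(x)) / temp:
            x = y
    return x

rng = random.Random(0)
# Many shorter chains, one final sample each:
finals = [annealed_chain(200, rng) for _ in range(500)]
p_right = sum(x > 0 for x in finals) / len(finals)
print(p_right)  # skewed toward the heavier mode (weight b = 0.7)
```

This only approximates the target (annealed samples carry some schedule-dependent bias), but unlike plain Gibbs started in one mode, both modes get visited.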

Try it yourself
Run the following Matlab scripts:
  VisualMCMC3(200, 0.06, 0.05, 50);
  % parameters: num of samples, sigma, pause time after each sample, num of chains to run
What you will see using the default parameters: a live animation of sampling for 5 chains with annealing and 5 chains without. At the end of 200 samples:
  Estimate P(x1=1) using Gibbs without annealing: 0.664000
  Estimate P(x1=1) using Gibbs with annealing: 0.876000
  Estimate P(x1=1) using Metropolis Hasting: 0.909100
(Numbers may vary between runs; the estimates should be close to the true value.)