Intro to Sampling Methods


Intro to Sampling Methods CSE586 Computer Vision II Spring 2010, Penn State Univ

A Brief Overview of Sampling
- Monte Carlo Integration
- Sampling and Expected Values
- Inverse Transform Sampling (CDF)
- Rejection Sampling
- Importance Sampling

Integration and Area (figures from http://videolectures.net/mlss08au_freitas_asm/: approximating the area of a region by counting the grid boxes that fall inside it, relative to the total number of boxes)

Integration and Area. As we use more samples, our answer should get more and more accurate. It doesn't matter what the shape looks like: an arbitrary region in the unit square (even a disconnected one), or the area under a curve, which is exactly integration!

Monte Carlo Integration. Goal: compute the definite integral of a function f(x) from a to b. Choose an upper bound c so that the box [a,b] x [0,c] contains the curve. Generate N uniform random samples in this upper-bound volume, and count the K samples that fall below the f(x) curve. Then: Answer ≈ (K/N) * area of the upper-bound volume.
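A minimal sketch of this hit-or-miss scheme in Python (the function f(x) = x^2, the bounds, and the sample count are chosen purely for illustration):

```python
import random

def mc_integrate(f, a, b, c, n=100_000, seed=0):
    """Hit-or-miss Monte Carlo: throw uniform points into the bounding
    box [a, b] x [0, c] and count how many land under the curve."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = rng.uniform(a, b)
        y = rng.uniform(0.0, c)
        if y <= f(x):
            hits += 1
    # (K / N) times the area of the bounding box
    return (hits / n) * (b - a) * c

# Integrate f(x) = x^2 on [0, 1]; the exact answer is 1/3
est = mc_integrate(lambda x: x * x, 0.0, 1.0, 1.0)
```

With more samples the estimate tightens around 1/3, matching the slide's point that accuracy improves with N.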

Monte Carlo Integration. Sampling-based integration is useful for computing the normalizing constant Z that turns an arbitrary non-negative function f(x) into a probability density function p(x): Z = ∫ f(x) dx, so that p(x) = (1/Z) f(x). Compute Z via sampling (Monte Carlo integration). Note: for complicated, multidimensional functions, this is often the only practical way to compute the normalizing constant.
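As a sketch of computing such a Z by sampling, here is the mean-value variant of Monte Carlo integration; the unnormalized Gaussian target and the interval [-5, 5] are illustrative choices, not from the slides:

```python
import math
import random

def estimate_Z(f, a, b, n=200_000, seed=1):
    """Estimate Z = integral of f over [a, b] by averaging f at
    uniformly drawn points: Z ~ (b - a) * mean(f(x_i))."""
    rng = random.Random(seed)
    total = sum(f(rng.uniform(a, b)) for _ in range(n))
    return (b - a) * total / n

# Unnormalized Gaussian exp(-x^2/2): true Z = sqrt(2*pi) ~ 2.5066
Z = estimate_Z(lambda x: math.exp(-x * x / 2.0), -5.0, 5.0)
```

Dividing exp(-x^2/2) by this Z recovers (approximately) the standard normal density.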

Sampling: A Motivation. If we can generate random samples x_i from a given distribution P(x), then we can estimate expected values of functions under this distribution by summation, rather than integration. That is, we can approximate E_P[f(x)] = ∫ f(x) P(x) dx by first generating N i.i.d. samples x_i from P(x) and then forming the empirical estimate E_P[f(x)] ≈ (1/N) Σ_{i=1}^{N} f(x_i).
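A quick sketch of this idea, estimating E_P[x^2] under a standard normal P, where the true value is 1 (the choice of P and f is illustrative):

```python
import random

rng = random.Random(2)
N = 100_000
# N i.i.d. samples from P(x) = standard normal
samples = [rng.gauss(0.0, 1.0) for _ in range(N)]
# Empirical estimate of E_P[f(x)] for f(x) = x^2 (true value: 1)
estimate = sum(x * x for x in samples) / N
```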

Inverse Transform Sampling. It is easy to sample from a discrete 1D distribution using the cumulative distribution function. (Figure: bars of weights w_k for k = 1...N, and the corresponding cumulative distribution rising to 1.)

Inverse Transform Sampling. It is easy to sample from a discrete 1D distribution using the cumulative distribution function: 1) Generate uniform u in the range [0,1]. 2) Visualize a horizontal line at height u intersecting the bars of the cumulative distribution. 3) If the index of the intersected bar is j, output the new sample x_k = j.
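The three steps above, sketched in Python for an arbitrary set of weights (the weights here are made up):

```python
import random
from bisect import bisect_left
from itertools import accumulate

def sample_discrete(weights, u):
    """Inverse transform sampling: find the bar of the CDF that a
    horizontal line at height u (scaled by the total) intersects."""
    cdf = list(accumulate(weights))        # cumulative sums
    return bisect_left(cdf, u * cdf[-1])   # index of intersected bar

rng = random.Random(3)
weights = [0.2, 0.5, 0.3]
counts = [0, 0, 0]
for _ in range(100_000):
    counts[sample_discrete(weights, rng.random())] += 1
freqs = [c / 100_000 for c in counts]      # should approach the weights
```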

Inverse Transform Sampling. Why it works: pushing a uniform sample through the inverse cumulative distribution function yields a sample whose cumulative distribution function is exactly the desired one. (Figure: the cumulative distribution function and its inverse.)

Efficiently Generating Many Samples. Basic idea: choose one initial small random number; deterministically sample the rest by "crawling" up the CDF function. This is O(N). Odd property: you generate the "random" numbers in sorted order... Due to Arulampalam.
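A sketch of the O(N) "crawling" idea, in the style of systematic resampling (the weights are illustrative; note that the outputs really do come back in sorted order):

```python
import random

def ordered_samples(weights, N, seed=4):
    """Draw N samples using one initial random number, then crawl up
    the CDF deterministically; O(N) total, output indices sorted."""
    rng = random.Random(seed)
    total = sum(weights)
    u = rng.uniform(0.0, total / N)          # the single random draw
    out, cum, j = [], weights[0], 0
    for _ in range(N):
        while u > cum and j < len(weights) - 1:
            j += 1                           # crawl to the next CDF bar
            cum += weights[j]
        out.append(j)
        u += total / N                       # deterministic step upward
    return out

idx = ordered_samples([0.1, 0.6, 0.3], 10_000)
frac1 = idx.count(1) / 10_000                # should approach 0.6
```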

A Brief Overview of Sampling
- Inverse Transform Sampling (CDF)
- Rejection Sampling
- Importance Sampling
For the latter two, we can sample from continuous distributions, and they do not even need to be normalized. That is, to sample from distribution P, we only need to know a function P*, where P = P* / c, for some normalization constant c.

Rejection Sampling. Need a proposal density Q(x) [e.g. uniform or Gaussian], and a constant c such that cQ(x) is an upper bound for P*(x). Example with Q(x) uniform: generate uniform random samples in the upper-bound volume under cQ(x), and accept the samples that fall below the P*(x) curve. The marginal density of the x coordinates of the accepted points is then proportional to P*(x). Note the relationship to Monte Carlo integration.

Rejection Sampling. More generally: 1) generate a sample x_i from the proposal density Q(x); 2) generate a sample u from uniform [0, cQ(x_i)]; 3) if u <= P*(x_i), accept x_i; else reject.
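That loop, sketched with an unnormalized Gaussian P* and a uniform proposal Q; the envelope constant c = 10 makes cQ(x) = 1 >= P*(x) on [-5, 5], and all of these choices are illustrative:

```python
import math
import random

def rejection_sample(p_star, q_sample, q_density, c, n_accept, seed=5):
    """Accept x ~ Q whenever u ~ Uniform[0, c*Q(x)] falls under P*(x)."""
    rng = random.Random(seed)
    out = []
    while len(out) < n_accept:
        x = q_sample(rng)                        # sample from proposal Q
        u = rng.uniform(0.0, c * q_density(x))   # uniform under envelope
        if u <= p_star(x):
            out.append(x)                        # accept
    return out

p_star = lambda x: math.exp(-x * x / 2.0)        # unnormalized N(0, 1)
xs = rejection_sample(p_star,
                      lambda rng: rng.uniform(-5.0, 5.0),
                      lambda x: 0.1,             # uniform density on [-5, 5]
                      c=10.0, n_accept=50_000)
mean = sum(xs) / len(xs)
var = sum(x * x for x in xs) / len(xs) - mean * mean
```

The accepted samples have (approximately) zero mean and unit variance, as a standard normal should.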

Importance "Sampling". Not actually for generating samples: it is a method to estimate the expected value of a function f(x) directly. Generate x_i from Q(x); an empirical estimate of E_Q[f(x)], the expected value of f(x) under distribution Q(x), is then (1/N) Σ f(x_i). However, we want E_P[f(x)], the expected value of f(x) under distribution P(x) = P*(x)/Z.

Importance Sampling. When we generate from Q(x), values of x where Q(x) is greater than P*(x) are overrepresented, and values where Q(x) is less than P*(x) are underrepresented. To mitigate this effect, introduce a weighting term w_i = P*(x_i) / Q(x_i).

Importance Sampling. New procedure to estimate E_P[f(x)]: 1) generate N samples x_i from Q(x); 2) form the importance weights w_i = P*(x_i) / Q(x_i); 3) compute the empirical estimate of E_P[f(x)], the expected value of f(x) under distribution P(x), as E_P[f(x)] ≈ Σ_i w_i f(x_i) / Σ_i w_i.
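The three steps, sketched with P* an unnormalized standard normal, Q a wider normal, and f(x) = x^2 (true E_P[f] = 1); all choices are illustrative. Dividing by the sum of weights makes the estimate self-normalizing, so Z is never needed:

```python
import math
import random

rng = random.Random(6)
N = 100_000
sigma_q = 2.0                                     # proposal Q = Normal(0, 2)
q = lambda x: (math.exp(-x * x / (2.0 * sigma_q ** 2))
               / (sigma_q * math.sqrt(2.0 * math.pi)))
p_star = lambda x: math.exp(-x * x / 2.0)         # unnormalized N(0, 1)

xs = [rng.gauss(0.0, sigma_q) for _ in range(N)]  # 1) samples from Q
ws = [p_star(x) / q(x) for x in xs]               # 2) importance weights
f = lambda x: x * x
# 3) self-normalized estimate of E_P[f(x)] (true value: 1)
est = sum(w * f(x) for w, x in zip(ws, xs)) / sum(ws)
```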

Resampling. Note: we thus have a set of weighted samples (x_i, w_i | i = 1, ..., N). If we really need random samples from P, we can generate them by resampling such that the likelihood of choosing value x_i is proportional to its weight w_i. This now involves sampling from a discrete distribution of N possible values (the N values of x_i). Therefore, regardless of the dimensionality of the vector x, we are resampling from a 1D distribution (we are essentially sampling from the indices 1...N, in proportion to the importance weights w_i), so we can use the inverse transform sampling method we discussed earlier.
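A sketch of this resampling step, drawing unweighted samples by inverse transform over the indices (the values and weights below are made up):

```python
import random
from bisect import bisect_left
from itertools import accumulate

def resample(values, weights, n, seed=7):
    """Pick value i with probability proportional to weights[i], n times,
    via inverse transform sampling on the index distribution."""
    rng = random.Random(seed)
    cdf = list(accumulate(weights))
    return [values[bisect_left(cdf, rng.random() * cdf[-1])]
            for _ in range(n)]

values = [-1.0, 0.0, 2.0]
weights = [0.5, 0.3, 0.2]
new = resample(values, weights, 100_000)
frac = new.count(-1.0) / 100_000   # should approach the weight 0.5
```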

Note on Proposal Functions. Computational efficiency is best if the proposal distribution looks a lot like the desired distribution (the area between the curves is small). These methods can fail badly when the proposal distribution has zero density in a region where the desired distribution has non-negligible density. For this last reason, it is said that the proposal distribution should have heavy tails.

Sequential Monte Carlo Methods. Sequential Importance Sampling (SIS) and the closely related algorithm Sampling Importance Resampling (SIR) are known by various names in the literature:
- bootstrap filtering
- particle filtering
- Condensation algorithm
- survival of the fittest
General idea: importance sampling on time-series data, with samples and weights updated as each new data term is observed. Well-suited for simulating Markov chains and HMMs!
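As a rough sketch of the general idea, here is a minimal bootstrap particle filter for a hypothetical 1D model (random-walk state, Gaussian observation noise); every model choice and constant below is invented for illustration:

```python
import math
import random

def bootstrap_filter(observations, n_particles=2000, obs_std=1.0,
                     proc_std=0.5, seed=8):
    """Per observation: propagate particles through the (hypothetical)
    dynamics, reweight by the observation likelihood, then resample
    in proportion to the weights ('survival of the fittest')."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 3.0) for _ in range(n_particles)]
    means = []
    for y in observations:
        particles = [x + rng.gauss(0.0, proc_std) for x in particles]
        ws = [math.exp(-(y - x) ** 2 / (2.0 * obs_std ** 2))
              for x in particles]
        total = sum(ws)
        means.append(sum(w * x for w, x in zip(ws, particles)) / total)
        particles = rng.choices(particles, weights=ws, k=n_particles)
    return means

# Noisy observations of a state sitting near 3.0
est = bootstrap_filter([3.1, 2.9, 3.0, 3.2, 2.8, 3.0])
```

The posterior mean estimates settle near the true state as observations accumulate.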

Markov-Chain Monte Carlo


Problem Intuition: in high-dimensional problems, the "Typical Set" (the volume of non-negligible probability in the state space) is a small fraction of the total space.

High Dimensional Spaces. Example: a binary image, where each pixel has two states, on and off, so the number of possible images is exponential in the number of pixels. Treating all of these states as equally likely, however, ignores the neighborhood structure of the pixel lattice and the empirical evidence that images are "smooth".

Recall: Markov Chains. A Markov chain is a sequence of random variables X1, X2, X3, ... Each variable is a distribution over a set of states (a, b, c, ...). The transition probability of going to the next state depends only on the current state, e.g. P(X_{n+1} = a | X_n = b). The transition probabilities can be arranged in an N x N table of elements k_ij = P(X_{n+1} = j | X_n = i), where the rows sum to one.

K = transpose of the transition probability table {k_ij} (so the columns sum to one). We do this for computational convenience (next slide).

Question: Assume you start in some state, and then run the simulation for a large number of time steps. What percentage of time do you spend at X1, X2 and X3?

Take four possible initial distributions, e.g. [.33 .33 .33]. Tracking each initial distribution, and the distribution after each time step, all eventually end up with the same distribution: this is the stationary distribution!
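A sketch of this experiment, using the row-stochastic table k_ij from the earlier slide (so one step is p_next[j] = Σ_i p[i] k_ij); the particular transition values are invented for illustration:

```python
# Hypothetical 3-state transition table; rows sum to one
K = [[0.5, 0.25, 0.25],
     [0.2, 0.5,  0.3],
     [0.3, 0.3,  0.4]]

def step(p, K):
    """One time step: p_next[j] = sum_i p[i] * K[i][j]."""
    n = len(p)
    return [sum(p[i] * K[i][j] for i in range(n)) for j in range(n)]

p_a = [1.0, 0.0, 0.0]          # start surely in state 1
p_b = [1 / 3, 1 / 3, 1 / 3]    # start uniform
for _ in range(100):
    p_a = step(p_a, K)
    p_b = step(p_b, K)
# both have converged to the same stationary distribution
```

Applying `step` once more leaves the converged distribution unchanged, which is exactly what "stationary" means.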

The stationary distribution is the eigenvector of K with eigenvalue 1; in matlab: [E,D] = eigs(K)

The PageRank of a webpage as used by Google is defined by a Markov chain. It is the probability of being at page i in the stationary distribution of the following Markov chain on all (known) webpages: if N is the number of known webpages and page i has k_i links, then page i has transition probability (1-q)/k_i + q/N to each page it links to, and q/N to each page it does not link to. The parameter q is taken to be about 0.15.
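A sketch of that chain on a tiny invented four-page web, finding the stationary distribution by repeatedly applying the transition rule (the link structure is hypothetical):

```python
def pagerank(links, q=0.15, iters=100):
    """Stationary distribution of the PageRank chain: from page i,
    move with prob (1-q)/k_i + q/N to each linked page, q/N otherwise."""
    N = len(links)
    p = [1.0 / N] * N
    for _ in range(iters):
        nxt = [0.0] * N
        for i in range(N):
            k = len(links[i])
            for j in range(N):
                prob = q / N + ((1.0 - q) / k if j in links[i] else 0.0)
                nxt[j] += p[i] * prob
        p = nxt
    return p

# Hypothetical web: every page links to page 3, which links back to all
links = {0: [3], 1: [0, 3], 2: [3], 3: [0, 1, 2]}
ranks = pagerank(links)
```

As expected, the page everyone links to (page 3) receives the highest PageRank.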

Another Question: Assume you want to spend a particular percentage of time at X1, X2 and X3, say P(x1) = .2, P(x2) = .3, P(x3) = .5. What should the transition probabilities be? (K is now a 3 x 3 table of unknowns.) We will discuss this next time.