CS498-EA Reasoning in AI Lecture #19 Professor: Eyal Amir Fall Semester 2011
Last Time
1. Random Sampling
   1. Logic sampling
   2. Rejection sampling
   3. Likelihood weighting
   4. Importance sampling
2. Markov Chain Monte Carlo (MCMC)
   1. Gibbs sampling
   2. Metropolis-Hastings
Example
Estimate the probability of heads for an unbiased coin:
– Generate samples x[1],…,x[N] from P(coin) = (0.5, 0.5), like tossing a coin N times
– Finally, estimate by the fraction of samples that came up heads:
  P̂(heads) = (1/N) Σ_m 1{x[m] = heads}
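A minimal sketch of this estimator, assuming Python; the function name and sample count are illustrative, not from the slides:

```python
import random

def estimate_p_heads(num_samples: int) -> float:
    """Estimate P(heads) for a fair coin by direct sampling."""
    heads = sum(random.random() < 0.5 for _ in range(num_samples))
    return heads / num_samples

# By the law of large numbers, the estimate converges to 0.5.
print(estimate_p_heads(100_000))  # e.g. 0.5013
```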
Search Algorithms
Idea: search for high-probability instances
Suppose x[1],…,x[N] are instances with high mass. We can approximate (Bayes rule):
  P(y | e) ≈ Σ_m P(y | x[m], e) P(x[m] | e)
If x[m] is a complete instantiation, then P(y | x[m], e) is either 0 or 1
Search Algorithms (cont)
Instances that do not satisfy e play no role in the approximation
We need to focus the search on instances that do satisfy e
Clearly, in some cases this is hard (NP-hardness result)
Stochastic Sampling
Intuition: given a sufficient number of samples x[1],…,x[N], we can estimate
  P̂(y) = (1/N) Σ_m 1{x[m] = y}
The law of large numbers implies that as N grows, our estimate converges to p
The number of samples we need is potentially exponential in the dimension of the problem
Logic Sampling
Samples are joint assignments to (B, E, A, C, R) in the burglary network: Burglary → Alarm ← Earthquake, Earthquake → Radio, Alarm → Call, with CPTs
  P(b) = 0.03          P(e) = 0.001
  P(a | b, e) = 0.98   P(a | ¬b, e) = 0.4
  P(a | b, ¬e) = 0.7   P(a | ¬b, ¬e) = 0.01
  P(c | a) = 0.8       P(c | ¬a) = 0.05
  P(r | e) = 0.3       P(r | ¬e) = 0.001
To draw one sample, walk the network in topological order, sampling each variable from its CPT given the values already drawn for its parents (see the sketch below):
1. Sample B from P(b) = 0.03
2. Sample E from P(e) = 0.001
3. Sample A from P(a | B, E), e.g. P(a | ¬b, e) = 0.4
4. Sample C from P(c | A), e.g. P(c | a) = 0.8
5. Sample R from P(r | E), e.g. P(r | e) = 0.3
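A minimal sketch of logic (forward) sampling on this network, assuming Python; the orientation of the 2×2 CPT for A is our reading of the flattened slide table, and the query P(C = true) is an illustrative choice:

```python
import random

# CPTs read off the slide.
P_B = 0.03
P_E = 0.001
P_A = {(True, True): 0.98, (False, True): 0.4,
       (True, False): 0.7, (False, False): 0.01}  # P(a | B, E)
P_C = {True: 0.8, False: 0.05}                    # P(c | A)
P_R = {True: 0.3, False: 0.001}                   # P(r | E)

def bern(p: float) -> bool:
    return random.random() < p

def logic_sample():
    """Draw one joint sample in topological order (forward sampling)."""
    b = bern(P_B)
    e = bern(P_E)
    a = bern(P_A[(b, e)])
    c = bern(P_C[a])
    r = bern(P_R[e])
    return {"B": b, "E": e, "A": a, "C": c, "R": r}

# Estimate P(C = true) as the fraction of samples with C = true.
N = 100_000
samples = [logic_sample() for _ in range(N)]
print(sum(s["C"] for s in samples) / N)
```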
Likelihood Weighting
Same network and CPTs as above, but now some variables are observed as evidence (in the slides' trace, A and R). Evidence nodes are not sampled; instead, each sample carries a weight that accumulates the CPT probability of the observed values:
1. Sample B from P(b) = 0.03
2. Sample E from P(e) = 0.001
3. A is evidence: multiply the weight by P(A = ā | B, E), e.g. 1 − 0.4 = 0.6
4. Sample C from P(c | A), e.g. P(c | ¬a) = 0.05
5. R is evidence: multiply the weight by P(r | E), e.g. 0.3
Final weight for this sample: 0.6 × 0.3 = 0.18
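A matching sketch of likelihood weighting, under the same assumptions; the evidence {A = false, R = true} is our reading of the slide's weight factors (0.6 and 0.3), and the query P(B | A = ā, r) is an illustrative choice:

```python
import random

# Same CPTs as the logic-sampling sketch above.
P_B = 0.03
P_E = 0.001
P_A = {(True, True): 0.98, (False, True): 0.4,
       (True, False): 0.7, (False, False): 0.01}
P_C = {True: 0.8, False: 0.05}
P_R = {True: 0.3, False: 0.001}

def weighted_sample(evidence):
    """One pass in topological order: sample the free nodes, fix the
    evidence nodes, and multiply their CPT entries into the weight."""
    x, w = {}, 1.0
    # (node, probability of True given the values sampled so far)
    order = [("B", lambda x: P_B),
             ("E", lambda x: P_E),
             ("A", lambda x: P_A[(x["B"], x["E"])]),
             ("C", lambda x: P_C[x["A"]]),
             ("R", lambda x: P_R[x["E"]])]
    for node, prob in order:
        p = prob(x)
        if node in evidence:
            x[node] = evidence[node]
            w *= p if evidence[node] else 1 - p
        else:
            x[node] = random.random() < p
    return x, w

# Estimate P(B = true | A = false, R = true) from weighted samples.
evidence = {"A": False, "R": True}
num, den = 0.0, 0.0
for _ in range(100_000):
    x, w = weighted_sample(evidence)
    den += w
    num += w * x["B"]
print(num / den)
```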
Importance Sampling
A method for evaluating the expectation of f under P(X):
  Discrete:   E_P[f] = Σ_x f(x) P(x)
  Continuous: E_P[f] = ∫ f(x) P(x) dx
If we could sample from P, we could estimate E_P[f] ≈ (1/M) Σ_m f(x[m])
Importance Sampling
A general method for evaluating expectations under P(X) when we cannot sample from P(X).
Idea: choose an approximating distribution Q(X) and sample from it instead:
  E_P[f] = Σ_x f(x) P(x) = Σ_x f(x) (P(x)/Q(x)) Q(x) = E_Q[f(X) w(X)]
where w(X) = P(X)/Q(X) is the importance weight.
If we could generate samples from P(X):  E_P[f] ≈ (1/M) Σ_m f(x[m])
Now that we generate the samples from Q(X):  E_P[f] ≈ (1/M) Σ_m f(x[m]) w(x[m])
(Unnormalized) Importance Sampling
1. For m = 1:M
   Sample x[m] from Q(X)
   Calculate w[m] = P(x[m]) / Q(x[m])
2. Estimate the expectation of f(X) using
   Ê[f] = (1/M) Σ_m f(x[m]) w[m]
Requirements (see the sketch below):
– Wherever P(x) > 0, also Q(x) > 0 (don't ignore possible scenarios)
– It is possible to calculate P(x) and Q(x) for a specific x
– It is possible to sample from Q(X)
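A minimal sketch of unnormalized importance sampling, assuming Python; the target P (standard normal), proposal Q (a wider normal), and test function f(x) = x² are illustrative choices, not from the slides:

```python
import math, random

# Target P: standard normal. Proposal Q: N(1, 2^2), wider than P so its
# tails cover the target (Q > 0 wherever P > 0).
def p(x): return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
def q(x): return math.exp(-(x - 1) ** 2 / 8) / (2 * math.sqrt(2 * math.pi))
def sample_q(): return random.gauss(1.0, 2.0)

def f(x): return x * x  # estimate E_P[X^2], which equals 1

M = 200_000
est = sum(f(x) * p(x) / q(x) for x in (sample_q() for _ in range(M))) / M
print(est)  # should be close to 1
```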
Normalized Importance Sampling
Assume that we cannot evaluate P(X = x) but can evaluate P'(X = x) = α P(X = x) for some unknown constant α (for example, in a Bayesian network we can evaluate the joint P(X, e) but not P(X | e); there α = P(e))
We define w'(X) = P'(X)/Q(X). We can then evaluate α:
  E_Q[w'(X)] = Σ_x Q(x) P'(x)/Q(x) = Σ_x P'(x) = α
and then:
  E_P[f] = (1/α) E_Q[f(X) w'(X)] = E_Q[f(X) w'(X)] / E_Q[w'(X)]
In the last step we simply replaced α with the expectation above
Normalized Importance Sampling
We can now estimate the expectation of f(X) similarly to unnormalized importance sampling, by sampling x[m] from Q(X) and then
  Ê[f] = Σ_m f(x[m]) w'(x[m]) / Σ_m w'(x[m])
(hence the name "normalized": the weights are normalized to sum to one, so the unknown constant α cancels)
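A matching sketch of the self-normalized estimator, where only an unnormalized P' is available; the uniform proposal and f(x) = x² are illustrative choices:

```python
import math, random

# Unnormalized target P'(x) = exp(-x^2/2): the standard normal without
# its normalizing constant. Proposal Q: Uniform(-5, 5); the mass of P
# outside [-5, 5] is negligible for this sketch.
def p_unnorm(x): return math.exp(-x * x / 2)
def q(x): return 0.1  # density of Uniform(-5, 5)
def sample_q(): return random.uniform(-5, 5)

def f(x): return x * x  # E_P[X^2] = 1

M = 200_000
xs = [sample_q() for _ in range(M)]
ws = [p_unnorm(x) / q(x) for x in xs]
# Self-normalized estimator: the unknown constant cancels in the ratio.
print(sum(w * f(x) for w, x in zip(ws, xs)) / sum(ws))
```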
Importance Sampling Weaknesses
It is important to choose a sampling distribution with heavy tails
– so as not to "miss" large values of f
High-dimensional importance sampling:
– the "typical set" of P may take a long time to find, unless Q is a good approximation to P
– weights vary by factors exponential in the dimension N
Similar issues arise for likelihood weighting
Today's Agenda
1. The Normal Distribution: pdf, cdf, estimation, sampling
2. Markov Chain Monte Carlo (MCMC)
   1. Gibbs sampling
   2. Metropolis-Hastings
Sampling from a Normal Distribution
[Figure: a Gaussian density curve ("Gaussian Distribution") alongside a histogram of samples drawn from it ("Samples")]
Normal Distribution
The pdf of N(µ, σ²):
  f(x) = (1 / (σ √(2π))) exp(−(x − µ)² / (2σ²))
The cdf has no closed form and is evaluated numerically
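A minimal sketch of sampling from a normal distribution via the Box-Muller transform, assuming Python; this is one standard construction, and the slides do not specify which sampler was shown:

```python
import math, random

def box_muller(mu=0.0, sigma=1.0):
    """Draw one N(mu, sigma^2) sample from two uniforms (Box-Muller)."""
    u1 = 1.0 - random.random()  # in (0, 1], so log(u1) is defined
    u2 = random.random()
    z = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
    return mu + sigma * z

samples = [box_muller() for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)  # close to 0 and 1
```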
Stochastic Sampling
Previously: independent samples to estimate P(X = x | e)
Problem: it is difficult to sample from P(X1, …, Xn | e)
We had to use likelihood weighting
– Introduces bias into the estimate
In some cases, such as when the evidence is on leaves, these methods are inefficient
– Very low weights if e has low prior probability
– Very few samples carry high mass (high weight)
MCMC Methods
Sampling methods that are based on a Markov chain
– Markov Chain Monte Carlo (MCMC) methods
Key ideas:
– The sampling process is a Markov chain: the next sample depends on the previous one
– Can approximate any posterior distribution
Next: review the theory of Markov chains
MCMC Methods
Notes:
The Markov chain variable Y takes as values assignments to all variables that are consistent with the evidence
For simplicity, we will denote such a state using the vector of variables x1, …, xn
Gibbs Sampler
One of the simplest MCMC methods
Each transition changes the state of only one Xi
The transition probability is defined by P itself, as a stochastic procedure (see the sketch below):
– Input: a state x1, …, xn
– Choose i at random (uniform probability)
– Sample x'i from P(Xi | x1, …, xi−1, xi+1, …, xn, e)
– Let x'j = xj for all j ≠ i
– Return x'1, …, x'n
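A minimal sketch of this Gibbs procedure on a toy two-variable joint distribution, assuming Python; the joint table values are illustrative, not from the slides:

```python
import random

# Toy joint P(X, Y) over two binary variables, as an explicit table.
P = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def sample_conditional(var, state):
    """Resample variable `var` (0 for X, 1 for Y) from its full
    conditional given the other variable's current value."""
    s0, s1 = list(state), list(state)
    s0[var], s1[var] = 0, 1
    p0, p1 = P[tuple(s0)], P[tuple(s1)]
    return 1 if random.random() < p1 / (p0 + p1) else 0

def gibbs(steps):
    state = [0, 0]
    for _ in range(steps):
        i = random.randrange(2)          # choose a variable at random
        state[i] = sample_conditional(i, state)
        yield tuple(state)

# After discarding a burn-in prefix, samples approximate draws from P.
samples = list(gibbs(100_000))[10_000:]
print(sum(1 for s in samples if s == (1, 1)) / len(samples))  # ~0.4
```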
Sampling Strategy
How do we collect the samples?
Strategy I: run the chain M times, each for N steps
– Each run starts from a different initial state
– Return the last state in each of the M chains
Sampling Strategy
Strategy II: run one chain for a long time
After some "burn-in" period, collect a sample every fixed number of steps, yielding M samples from one chain
Comparing Strategies
Strategy I:
– Better chance of "covering" the space of points, especially if the chain is slow to reach stationarity
– Have to perform "burn-in" steps for each chain
Strategy II:
– Perform "burn-in" only once
– Samples might be correlated (although only weakly)
Hybrid strategy (see the sketch below):
– Run several chains, sampling a few times from each
– Combines the benefits of both strategies
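A minimal sketch of the two collection strategies around a generic one-step transition, assuming Python; the toy Metropolis chain and its target weights are illustrative, not from the slides:

```python
import random

# Toy random-walk Metropolis chain targeting an unnormalized
# distribution on the states 0..9.
weights = [1, 2, 3, 4, 5, 5, 4, 3, 2, 1]

def step(x):
    y = max(0, min(9, x + random.choice([-1, 1])))  # propose a neighbor
    return y if random.random() < weights[y] / weights[x] else x

def strategy_one(M, N):
    """Strategy I: M independent chains, keep only each chain's last state."""
    out = []
    for _ in range(M):
        x = random.randrange(10)
        for _ in range(N):
            x = step(x)
        out.append(x)
    return out

def strategy_two(M, burn_in=1000, thin=10):
    """Strategy II: one long chain; discard a burn-in prefix, then
    keep every thin-th state until M samples are collected."""
    x, out = random.randrange(10), []
    for _ in range(burn_in):
        x = step(x)
    while len(out) < M:
        for _ in range(thin):
            x = step(x)
        out.append(x)
    return out

print(strategy_one(1000, 500)[:10])
print(strategy_two(1000)[:10])
```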
Summary: Sampling Inference
Monte Carlo methods (sampling; the estimate can err in either direction):
– Pro: simplicity of implementation and theoretical guarantees of convergence
– Con: can be slow to converge, and their convergence can be hard to diagnose
THE END