Approximate Inference Methods


1 Approximate Inference Methods
- Loopy belief propagation
- Forward sampling
- Likelihood weighting
- Gibbs sampling
- Other MCMC sampling methods
- Variational methods

Many slides by Daniel Rembiszewski and Avishay Livne. Based on Chapter 12 of the book "Probabilistic Graphical Models: Principles and Techniques" by Koller and Friedman.

2 Course Grade Example
Variables (all binary, {0, 1}):
S – student is smart.
R – student has a reference.
H – homework grade.
E – exam grade.
G – final course grade.

[Network structure: S → H, R → H, S → E, H → G, E → G]

CPTs:
P(S=1) = 0.3   (so P(S=0) = 0.7)
P(R=1) = 0.99  (so P(R=0) = 0.01)

P(E=1 | S):
  S=0: 0.4    S=1: 0.7

P(H=1 | S,R):
  (S,R)=(0,0): 0.2    (0,1): 0.5    (1,0): 0.6    (1,1): 0.9

P(G=1 | H,E):
  (H,E)=(0,0): 0.001    (0,1): 0.4    (1,0): 0.2    (1,1): 0.9
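The network is small enough to encode directly. Below is a minimal Python sketch of these CPTs, reused by the sampling examples on the later slides; the dict layout and the bern() helper are our own naming, not from the slides.

```python
import random

# CPTs from the slide; each entry gives P(X=1 | parents).
P_S1 = 0.3                                                      # P(S=1)
P_R1 = 0.99                                                     # P(R=1)
P_E1 = {0: 0.4, 1: 0.7}                                         # P(E=1 | S)
P_H1 = {(0, 0): 0.2, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}     # P(H=1 | S,R)
P_G1 = {(0, 0): 0.001, (0, 1): 0.4, (1, 0): 0.2, (1, 1): 0.9}   # P(G=1 | H,E)

def bern(p):
    """Draw a {0,1} sample with P(1) = p."""
    return 1 if random.random() < p else 0
```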

3 Forward sampling
Query: P(G=1)

counter = 0
for i in [1, M]:
    s = sample from P(S)
    r = sample from P(R)
    h = sample from P(H | S=s, R=r)
    e = sample from P(E | S=s)
    g = sample from P(G | H=h, E=e)
    if the condition holds (g = 1): counter++

[The slide tabulates samples 1..M with their (S, R, H, E, G) values and the running counter.]
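A runnable version of this loop, reusing the CPT dicts and bern() helper from the sketch after slide 2 (the function names are ours):

```python
def forward_sample():
    # Sample every variable in topological order, parents before children.
    s = bern(P_S1)
    r = bern(P_R1)
    h = bern(P_H1[(s, r)])
    e = bern(P_E1[s])
    g = bern(P_G1[(h, e)])
    return s, r, h, e, g

def estimate_p_g1(M=100_000):
    # The fraction of samples with G=1 estimates P(G=1).
    counter = sum(forward_sample()[4] for _ in range(M))
    return counter / M
```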

4 Forward sampling
The estimate from M samples (M – number of samples):

    \hat{P}(x) = \frac{1}{M} \sum_{m=1}^{M} \mathbf{1}\{ x[m] = x \}

5 Forward sampling
    \hat{P}(x) = \frac{1}{M} \sum_{m=1}^{M} \mathbf{1}\{ x[m] = x \}

x[m] – sample m.
x – the query value.
\mathbf{1}\{\cdot\} – indicator: 1 if the condition is satisfied (e.g. G=1), else 0.
M – number of samples.
In words: count the number of samples that satisfy the condition and divide by the number of samples.

6 Conditional probability queries
Query: P(G=1 | E=1)
Solution 1: estimate P(G=1, E=1) and P(E=1) separately (each by forward sampling) and take their ratio.
Solution 2: estimate directly using rejection sampling: proceed as in forward sampling, but reject any sample that is not compatible with the evidence E=1.
Problem (with both solutions): when the probability of the event being estimated is low, the required sample size M becomes very large.
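A sketch of solution 2 (rejection sampling), reusing forward_sample() from above; note that the denominator counts only the accepted samples, which is why rare evidence makes the effective sample size small:

```python
def reject_estimate_p_g1_given_e1(M=100_000):
    accepted = hits = 0
    for _ in range(M):
        s, r, h, e, g = forward_sample()
        if e != 1:
            continue              # reject: sample is incompatible with E=1
        accepted += 1
        hits += g
    return hits / accepted if accepted else float("nan")
```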

7 Analysis of error
Just how many samples do we need (say, to validate a fair coin)? Naturally it depends on the accuracy we wish to achieve.
Absolute error, Hoeffding bound:

    P(S \notin [R - \epsilon, R + \epsilon]) \le 2 e^{-2 M \epsilon^2}

R – true value, e.g. R = 0.5 for a fair coin.
ε > 0 – half-width of the error interval, e.g. ε = 0.1 gives the interval [0.4, 0.6].
M – number of samples.
S – estimated value, e.g. S = 0.52 for a fair coin.
Choose M so that the probability of the estimator falling outside the interval is at most δ, the desired probability of error:

    2 e^{-2 M \epsilon^2} \le \delta  \Rightarrow  M \ge \frac{\ln(2/\delta)}{2 \epsilon^2}

E.g. δ = 10⁻³ and ε = 10⁻¹ give M ≥ 381.
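Solving 2·exp(−2Mε²) ≤ δ for M gives a one-line sample-size calculator (a sketch; the function name is ours):

```python
import math

def hoeffding_samples(eps, delta):
    # Smallest M guaranteeing P(estimate off by more than eps) <= delta.
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))

# hoeffding_samples(0.1, 1e-3) -> 381
```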

8 Analysis of error
Relative error, Chernoff bound:

    P(S \notin [R(1-\epsilon), R(1+\epsilon)]) \le 2 e^{-M R \epsilon^2 / 3}

R – true value.
M – number of samples.
ε > 0 – error as a fraction of the true value.
S – estimated value.
Choose M so that the probability of the estimator falling outside the interval is at most δ, the desired probability of error:

    2 e^{-M R \epsilon^2 / 3} \le \delta  \Rightarrow  M \ge \frac{3 \ln(2/\delta)}{R \epsilon^2}

Note: for the relative-error bound, unlike the absolute-error bound, the required sample size M also depends on the true value R. Small R implies large M.
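The same calculation for the relative-error bound; unlike the absolute-error case it needs (a lower bound on) the true value R (again a sketch with our own naming):

```python
import math

def chernoff_samples(eps, delta, R):
    # Smallest M guaranteeing P(relative error exceeds eps) <= delta.
    return math.ceil(3 * math.log(2 / delta) / (R * eps ** 2))

# chernoff_samples(0.1, 1e-3, R=0.5)   -> 4561
# chernoff_samples(0.1, 1e-3, R=0.005) -> 456055   (small R implies large M)
```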

9 Likelihood Weighting
Attempt 1: force the evidence (E=1) by setting it instead of sampling it.

counter = 0
for i in [1, M]:
    s = sample from P(S)
    r = sample from P(R)
    h = sample from P(H | S=s, R=r)
    e = 1  (set, not sampled)
    g = sample from P(G | H=h, E=e)
    if the condition holds (g = 1): counter++

10 Likelihood Weighting
With the evidence forced, counter / M computes

    \hat{P}(G=1 \mid E=1) = \frac{1}{M} \sum_{m=1}^{M} \mathbf{1}\{ g[m] = 1 \}

M – number of samples. This is a biased sample! The samples are not drawn from the posterior P(S, R, H, G | E=1).

11 Likelihood Weighting
Problem: by setting values to variables we sample from only a limited part of the sample space. As a result the estimate is biased: in our example it converges to P(G=1|E=1) = 0.71 instead of the true 0.73.
Solution: weight the samples. In the example E is set to 1, so the sample's weight is set to w = 0.4 or w = 0.7 depending on the value of S (using E's CPT):

P(E=1 | S):
  S=0: 0.4    S=1: 0.7

12 Likelihood Weighting
Our algorithm:

counter = 0
for i in [1, M]:
    w = 1
    s = sample from P(S)
    r = sample from P(R)
    h = sample from P(H | S=s, R=r)
    e = 1 (set to the evidence); w *= P(E=1 | S=s)
    g = sample from P(G | H=h, E=e)
    if the condition holds (g = 1): counter += w
output = counter / M

Note: counter / M estimates the joint P(G=1, E=1); dividing by the total weight of all M samples instead of by M yields the conditional P(G=1 | E=1), as in the estimator on the next slide and the sketch below.
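A runnable sketch of this loop, reusing the CPT dicts and bern() from the earlier sketch; it also accumulates the total weight so it can return the conditional estimate mentioned in the note above:

```python
def lw_estimate_p_g1_given_e1(M=100_000):
    weighted_hits = total_weight = 0.0
    for _ in range(M):
        s = bern(P_S1)
        r = bern(P_R1)
        h = bern(P_H1[(s, r)])
        e = 1                     # evidence: set, not sampled
        w = P_E1[s]               # weight w *= P(E=1 | S=s)
        g = bern(P_G1[(h, e)])
        weighted_hits += w * g
        total_weight += w
    # weighted_hits / M estimates the joint P(G=1, E=1); normalizing by
    # the total weight yields the conditional P(G=1 | E=1) (about 0.73).
    return weighted_hits / total_weight
```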

13 Likelihood Weighting
The weighted estimate from M samples (M – number of samples, w[m] – weight of sample m):

    \hat{P}(y \mid e) = \frac{\sum_{m=1}^{M} w[m] \, \mathbf{1}\{ y[m] = y \}}{\sum_{m=1}^{M} w[m]}

14 Likelihood Weighting
General case algorithm:

counter = 0
for i in [1, M]:
    w = 1
    for each variable X (in topological order):
        u = the values already assigned to X's parents
        if X is in the evidence:
            x = value in evidence; w *= P(X=x | u)
        else:
            x = sample from P(X | u)
    if the condition holds: counter += w
output = counter / M (normalize by the total weight, as on the previous slide, to estimate a conditional)

u – assignment to X's parents in the Bayesian network; P(X | u) – the entry of X's CPT given its parents' values.
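A generic version for binary networks, a sketch under our own conventions (and reusing bern(), P_H1, and P_G1 from the earlier sketch): variables are listed in topological order, each CPT maps a tuple of parent values to P(X=1 | parents), and the query is a function of the completed assignment.

```python
def likelihood_weighting(nodes, parents, cpt, evidence, query, M=100_000):
    weighted_hits = total_weight = 0.0
    for _ in range(M):
        x, w = {}, 1.0
        for v in nodes:                              # topological order
            u = tuple(x[p] for p in parents[v])      # parents already assigned
            p1 = cpt[v][u]                           # P(v=1 | u)
            if v in evidence:
                x[v] = evidence[v]                   # set, don't sample
                w *= p1 if x[v] == 1 else 1 - p1     # w *= P(v = x_v | u)
            else:
                x[v] = bern(p1)
        weighted_hits += w * query(x)
        total_weight += w
    return weighted_hits / total_weight

# Usage on the course-grade network:
nodes = ["S", "R", "H", "E", "G"]
parents = {"S": (), "R": (), "H": ("S", "R"), "E": ("S",), "G": ("H", "E")}
cpt = {"S": {(): 0.3}, "R": {(): 0.99},
       "H": P_H1, "E": {(0,): 0.4, (1,): 0.7}, "G": P_G1}
print(likelihood_weighting(nodes, parents, cpt, {"E": 1}, lambda x: x["G"]))
```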

15 Gibbs sampling
Query: P(G=1 | E=1)
Start with a random assignment, setting each evidence variable to its evidence value, e.g. (S=1, R=0, H=0, E=1, G=0).
In each step, produce a new sample by resampling each variable from its conditional distribution given the current values of all the others:
s = sample from P(S | R=0, H=0, E=1, G=0); e.g. 0
r = sample from P(R | S=0, H=0, E=1, G=0); e.g. 1
h = sample from P(H | S=0, R=1, E=1, G=0); e.g. 0
e = 1 (evidence, never resampled)
g = sample from P(G | S=0, R=1, H=0, E=1); e.g. 0
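A runnable sketch of this chain, reusing the CPT dicts and bern() from earlier. The full conditional of each variable is computed by evaluating the joint at both of its values; the burn-in period is our addition (standard MCMC practice, not on the slide):

```python
def joint(s, r, h, e, g):
    # Exact joint probability of a full assignment under the BN.
    def p(x, p1):
        return p1 if x == 1 else 1 - p1
    return (p(s, P_S1) * p(r, P_R1) * p(h, P_H1[(s, r)])
            * p(e, P_E1[s]) * p(g, P_G1[(h, e)]))

def gibbs_estimate_p_g1_given_e1(M=100_000, burn_in=1_000):
    x = {"S": 1, "R": 0, "H": 0, "E": 1, "G": 0}   # E clamped to the evidence
    hits = 0
    for i in range(M + burn_in):
        for v in ("S", "R", "H", "G"):             # E is never resampled
            x[v] = 1
            p1 = joint(x["S"], x["R"], x["H"], x["E"], x["G"])
            x[v] = 0
            p0 = joint(x["S"], x["R"], x["H"], x["E"], x["G"])
            x[v] = bern(p1 / (p0 + p1))            # sample the full conditional
        if i >= burn_in:
            hits += x["G"]
    return hits / M
```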

16 Gibbs sampling - generalization
Intuition: iteratively resample each variable conditioned on the current values of all the others.
Initialization: randomly assign a value to each variable without evidence; set the known value for each variable with evidence.

count = 0
for i in [1, M]:
    for j in [1, N] (N = number of vertices):
        x_j^{(i+1)} = sample from P(x_j | x_1^{(i+1)}, ..., x_{j-1}^{(i+1)}, x_{j+1}^{(i)}, ..., x_N^{(i)})
    if the condition is true: count++
output = count / M, which converges to P(x_j | evidence) when the condition is on x_j.

x_j^{(i)} – value of variable j at step i.

17 Gibbs sampling
Calculating P(x_j | x_1^{(i+1)}, ..., x_{j-1}^{(i+1)}, x_{j+1}^{(i)}, ..., x_N^{(i)}): independencies in the model ease the computation. In our example,

    P(R | S^{(i+1)}, H^{(i)}, E^{(i)}, G^{(i)}) = P(R | S^{(i+1)}, H^{(i)})

since only the factors that mention R (its prior and H's CPT) survive; see the sketch below.
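Concretely, resampling R needs just the current values of S and H (a sketch reusing the dicts and bern() from above):

```python
def resample_R(s, h):
    # P(R | S=s, H=h, E, G) is proportional to P(R) * P(H=h | S=s, R):
    # R's Markov blanket is its child H plus H's other parent S;
    # the factors for E and G do not mention R and drop out.
    def p(x, p1):
        return p1 if x == 1 else 1 - p1
    w1 = P_R1 * p(h, P_H1[(s, 1)])         # unnormalized weight for R=1
    w0 = (1 - P_R1) * p(h, P_H1[(s, 0)])   # unnormalized weight for R=0
    return bern(w1 / (w0 + w1))
```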

