For inference in Bayesian Networks
Presented by Daniel Rembiszewski and Avishay Livne
Based on Probabilistic Graphical Models: Principles and Techniques by Daphne Koller and Nir Friedman



Presentation transcript:

1 For inference in Bayesian Networks. Presented by Daniel Rembiszewski and Avishay Livne. Based on Probabilistic Graphical Models: Principles and Techniques by Daphne Koller and Nir Friedman.

2 Introduction. Forward sampling. Conditional probability queries. Likelihood weighting. Gibbs sampling. Analysis of error [if time permits].

3 The task: Inference in a complex Bayesian Network. The problem: Exact inference is NP-hard. The solution: Approximate Inference. Our focus: Stochastic Sampling.

4 The example network: S – student is smart {0, 1}. R – student has a reference {0, 1}. H – homework grade {0, 1}. E – exam grade {0, 1}. G – final course grade {0, 1}.

[Network structure: S and R are the parents of H; S is the parent of E; H and E are the parents of G.]

P(S):  S=0: 0.01   S=1: 0.99
P(R):  R=0: 0.7    R=1: 0.3

P(H | S, R):
  S,R    H=0    H=1
  0,0    0.8    0.2
  0,1    0.5    0.5
  1,0    0.4    0.6
  1,1    0.1    0.9

P(E | S):
  S      E=0    E=1
  0      0.6    0.4
  1      0.3    0.7

P(G | H, E):
  H,E    G=0    G=1
  0,0    0.999  0.001
  0,1    0.6    0.4
  1,0    0.8    0.2
  1,1    0.1    0.9
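The CPTs above translate directly into code. Below is a minimal Python sketch (the names P_S, P_R, P_H, P_E, P_G and the helper sample_from are our own, not from the slides) that the sampling sketches on the following slides build on:

```python
import random

# CPTs from slide 4, as plain dicts. Conditional tables are keyed
# by the parent assignment; values map outcome -> probability.
P_S = {0: 0.01, 1: 0.99}                      # P(S)
P_R = {0: 0.7,  1: 0.3}                       # P(R)
P_H = {(0, 0): {0: 0.8,   1: 0.2},            # P(H | S, R)
       (0, 1): {0: 0.5,   1: 0.5},
       (1, 0): {0: 0.4,   1: 0.6},
       (1, 1): {0: 0.1,   1: 0.9}}
P_E = {0: {0: 0.6, 1: 0.4},                   # P(E | S)
       1: {0: 0.3, 1: 0.7}}
P_G = {(0, 0): {0: 0.999, 1: 0.001},          # P(G | H, E)
       (0, 1): {0: 0.6,   1: 0.4},
       (1, 0): {0: 0.8,   1: 0.2},
       (1, 1): {0: 0.1,   1: 0.9}}

def sample_from(dist):
    """Draw a value from a {value: probability} dict."""
    u = random.random()
    cum = 0.0
    for value, p in dist.items():
        cum += p
        if u < cum:
            return value
    return value  # guard against floating-point round-off
```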

5 Query: P(G=1). Forward sampling:

counter = 0
for i = [1, M]:
    s = sample from P(S)
    r = sample from P(R)
    h = sample from P(H | S=s, R=r)
    e = sample from P(E | S=s)
    g = sample from P(G | H=h, E=e)
    if condition holds (g=1): counter++

Example run:
  Sample   S R H E G   Counter
  1        0 1 1 0 0   0
  2        1 0 1 1 1   1
  3        0 1 0 0 1   2
  4        0 1 0 1 0   2
  5        1 1 1 0 0   2
  6        0 1 1 1 0   2
  7        0 0 0 0 1   3
  …
  M
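As an illustration, the pseudocode above can be realized in Python like this, reusing the CPT dicts and sample_from helper from the slide-4 sketch (forward_sample and estimate_p_g1 are our own names):

```python
def forward_sample():
    """Draw one full assignment in topological order (as on this slide)."""
    s = sample_from(P_S)
    r = sample_from(P_R)
    h = sample_from(P_H[(s, r)])
    e = sample_from(P_E[s])
    g = sample_from(P_G[(h, e)])
    return s, r, h, e, g

def estimate_p_g1(M=100_000):
    """Estimate P(G=1): count samples where G=1, divide by M."""
    return sum(forward_sample()[4] == 1 for _ in range(M)) / M
```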

6 $\hat{P}(x) = \frac{1}{M} \sum_{m=1}^{M} \mathbb{1}\{x[m] = x\}$, where M is the number of samples.

7 x[m] – sample m. x – the query. $\mathbb{1}\{x[m]=x\}$ – 1 if the condition is satisfied (e.g. G=1), else 0. M – number of samples. In words: count the number of samples that satisfy the condition and divide by the number of samples.

8 Query: P(G=1 | E=1). Solution 1: calculate P(G=1, E=1) and P(E=1) separately and take their ratio. Solution 2: calculate directly using rejection sampling. Similar to forward sampling, but reject any sample that is not compatible with E=1. Problem (both solutions): when few samples survive rejection (i.e. the evidence is unlikely), M must become big*. * How big? If time permits – "Analysis of error".
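A sketch of Solution 2, building on the earlier forward_sample helper (rejection_estimate is our own name); note that only the accepted samples contribute to the estimate:

```python
def rejection_estimate(M=100_000):
    """Estimate P(G=1 | E=1): throw away samples incompatible with E=1."""
    accepted = hits = 0
    for _ in range(M):
        s, r, h, e, g = forward_sample()
        if e != 1:
            continue                    # reject: contradicts the evidence
        accepted += 1
        hits += (g == 1)
    return hits / accepted if accepted else float("nan")
```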

9 Attempt 1: force the evidence (E=1) by setting it instead of sampling it.

counter = 0
for i = [1, M]:
    s = sample from P(S)
    r = sample from P(R)
    h = sample from P(H | S=s, R=r)
    e = 1    (set, not sampled from P(E | S=s))
    g = sample from P(G | H=h, E=e)
    if condition holds (g=1): counter++

Note that we introduced a bias: we force E=1 in every sample, but ignore how likely E=1 actually was given the sampled values of its parents.

10 Indeed biased: the same estimator $\hat{P}(x) = \frac{1}{M} \sum_{m=1}^{M} \mathbb{1}\{x[m] = x\}$ (M – number of samples) now converges to the probability under the modified network in which E is clamped to 1, not to P(x | E=1).

11 Problem: by setting values to variables we sample only from a limited part of the sample space. Solution: weight the samples. In our example we set E=1, so we multiply the sample's weight w by P(E=1 | S=s) – either 0.4 or 0.7, taken from E's CPT:

P(E | S):
  S   E=0   E=1
  0   0.6   0.4
  1   0.3   0.7

Without the weights the result is biased: in our example it gives P(G=1|E=1)=0.71 instead of 0.73.

12 Our algorithm:

counter = 0
for i = [1, M]:
    w = 1
    s = sample from P(S)
    r = sample from P(R)
    h = sample from P(H | S=s, R=r)
    e = 1 (set to the evidence); w *= P(E=1 | S=s)
    g = sample from P(G | H=h, E=e)
    if condition holds (g=1): counter += w
output = counter / (total weight of all samples)

General case algorithm:

counter = 0
for i = [1, M]:
    w = 1
    for each variable X (in topological order):
        if X is in the evidence:
            x = value in evidence; w *= P(X=x | u)
        else:
            x = sample from P(X | u)
    if condition holds: counter += w
output = counter / (total weight of all samples)

u – the current assignment to X's parents in the BN; P(X=x | u) – the probability of x given its parents.
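The slide's algorithm, sketched in Python on top of the earlier CPT dicts (likelihood_weighting is our own name); the output is normalized by the total weight, matching the estimator on the next slide:

```python
def likelihood_weighting(M=100_000):
    """Estimate P(G=1 | E=1): clamp E=1 and weight each sample by
    how likely that evidence was given its sampled parents."""
    hits = total_w = 0.0
    for _ in range(M):
        s = sample_from(P_S)
        r = sample_from(P_R)
        h = sample_from(P_H[(s, r)])
        e = 1                       # evidence: set, not sampled
        w = P_E[s][e]               # w *= P(E=1 | S=s)
        g = sample_from(P_G[(h, e)])
        if g == 1:
            hits += w
        total_w += w
    return hits / total_w           # normalize by the total weight
```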

13 $\hat{P}(x \mid e) = \frac{\sum_{m=1}^{M} w[m]\, \mathbb{1}\{x[m] = x\}}{\sum_{m=1}^{M} w[m]}$, where M is the number of samples and w[m] is the weight of sample m.

14 Query: P(G=1 | E=1). Start with a random assignment, clamping each evidence variable to its observed value, e.g. (S=1, R=0, H=0, E=1, G=0). In each step, produce a new sample by resampling each variable from its conditional distribution given the current values of all the other variables:

S = sample from P(S | R=0, H=0, E=1, G=0); e.g. 0
R = sample from P(R | S=0, H=0, E=1, G=0); e.g. 1
H = sample from P(H | S=0, R=1, E=1, G=0); e.g. 0
E = 1 (evidence, never resampled)
G = sample from P(G | S=0, R=1, H=0, E=1); e.g. 0

15 Intuition: instead of drawing each sample independently from scratch, walk through the sample space, changing one variable at a time; the resulting chain of samples converges to the target distribution.

Initialization: randomly assign a value to each variable without evidence; set the known value for each variable with evidence; count = 0.

for i = [1, M]:
    for j = [1, N] (N = number of variables):
        $x_j^{(i+1)} \sim P(x_j \mid x_1^{(i+1)}, \dots, x_{j-1}^{(i+1)}, x_{j+1}^{(i)}, \dots, x_N^{(i)})$
    if condition is true: count++

$x_j^{(i)}$ – the value of variable j at step i. Output = count / M, which converges to P(x | evidence).

16 Calculating $P(x_j \mid x_1^{(i+1)}, \dots, x_{j-1}^{(i+1)}, x_{j+1}^{(i)}, \dots, x_N^{(i)})$: independences in the model can make the computation easy, since only the variables in $x_j$'s Markov blanket matter. In our example, $P(R \mid S^{(i+1)}, H^{(i)}, E^{(i)}, G^{(i)}) = P(R \mid S^{(i+1)}, H^{(i)})$.
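A compact Gibbs sketch for this network, building on the earlier CPT dicts and sample_from helper (joint and gibbs_estimate are our own names). For brevity it computes each conditional from the full joint rather than from the Markov blanket alone, and it ignores burn-in:

```python
def joint(s, r, h, e, g):
    """Full joint probability P(S,R,H,E,G) of one assignment."""
    return P_S[s] * P_R[r] * P_H[(s, r)][h] * P_E[s][e] * P_G[(h, e)][g]

def gibbs_estimate(M=100_000):
    """Estimate P(G=1 | E=1): E stays clamped to 1; S, R, H, G are
    resampled in turn from their conditional given all the rest."""
    assign = {"s": 1, "r": 0, "h": 0, "e": 1, "g": 0}  # slide's start
    count = 0
    for _ in range(M):
        for var in ("s", "r", "h", "g"):   # never resample the evidence
            p = []
            for v in (0, 1):
                assign[var] = v
                p.append(joint(**assign))  # unnormalized conditional
            total = p[0] + p[1]
            assign[var] = sample_from({0: p[0] / total, 1: p[1] / total})
        count += (assign["g"] == 1)
    return count / M
```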

17 Absolute Error. Just how many samples do we need? Naturally it depends on the accuracy we wish to achieve. Hoeffding bound: S – the approximated value, R – the real value, M – number of samples, ε > 0 – the error interval, δ – bound on the probability of error:

$P(|S - R| \ge \epsilon) \le 2 e^{-2 M \epsilon^2} \le \delta \;\Rightarrow\; M \ge \frac{\ln(2/\delta)}{2 \epsilon^2}$

For $\delta = 10^{-3}$ and $\epsilon = 10^{-4}$ this gives $M = 3.8 \cdot 10^{8}$.
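A one-line check of the slide's arithmetic (hoeffding_samples is our own name):

```python
import math

def hoeffding_samples(eps, delta):
    """Smallest M with 2*exp(-2*M*eps**2) <= delta (absolute error)."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))

print(hoeffding_samples(eps=1e-4, delta=1e-3))  # ~3.8e8, as on the slide
```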

18 Relative Error. Chernoff bound: S – the approximated value, R – the real value, M – number of samples, ε > 0 – the error interval:

$P\big(S \notin [R(1-\epsilon),\, R(1+\epsilon)]\big) \le 2 e^{-M R \epsilon^2 / 3} \le \delta \;\Rightarrow\; M \ge \frac{3 \ln(2/\delta)}{R \epsilon^2}$

Note that unlike in the absolute error, here M also depends on the real value itself.
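The analogous calculation for the relative-error bound (chernoff_samples is our own name; the values passed in are arbitrary examples):

```python
import math

def chernoff_samples(eps, delta, r):
    """Smallest M with 2*exp(-M*r*eps**2/3) <= delta (relative error).
    Unlike the absolute-error bound, M grows as the true value r shrinks."""
    return math.ceil(3 * math.log(2 / delta) / (r * eps ** 2))

print(chernoff_samples(eps=0.01, delta=1e-3, r=0.7))    # common event
print(chernoff_samples(eps=0.01, delta=1e-3, r=0.001))  # rare event: 700x more
```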




