1
QUIZ!!
T/F: You can always (theoretically) do BN inference by enumeration. TRUE
T/F: In VE, always first marginalize, then join. FALSE
T/F: VE is a lot faster than enumeration, but not always exact. FALSE
T/F: The running time of VE is independent of the order you pick the r.v.s. FALSE
T/F: The more evidence, the faster VE runs. TRUE
P(X|Y) sums to... |Y|
P(x|Y) sums to... ??
P(X|y) sums to... 1
P(x|y) sums to... P(x|y)
P(x,Y) sums to... P(x)
P(X,Y) sums to... 1
2
CSE 511a: Artificial Intelligence, Spring 2013
Lecture 17: Bayes' Nets IV – Approximate Inference (Sampling)
04/01/2013 Robert Pless, via Kilian Weinberger, via Dan Klein – UC Berkeley
3
Announcements Project 4 out soon. Project 3 due at midnight. 3
4
Exact Inference Variable Elimination 4
5
General Variable Elimination
Query: P(Q | E_1 = e_1, ..., E_k = e_k)
Start with initial factors: local CPTs (but instantiated by evidence)
While there are still hidden variables (not Q or evidence):
  Pick a hidden variable H
  Join all factors mentioning H
  Eliminate (sum out) H
Join all remaining factors and normalize
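As a concrete reference, here is a minimal Python sketch of that loop. It is not from the slides: factors are represented as (variables, table) pairs over Boolean variables, evidence is assumed to already be instantiated into the CPT factors as the slide says, and the names join, sum_out, and variable_elimination are just illustrative.

```python
from itertools import product

# A factor is a pair (vars, table): vars is a tuple of Boolean variable names,
# table maps each assignment tuple (ordered like vars) to a number.

def join(f, g):
    """Pointwise product of two factors over the union of their variables."""
    fv, ft = f
    gv, gt = g
    out_vars = fv + tuple(v for v in gv if v not in fv)
    table = {}
    for assign in product([True, False], repeat=len(out_vars)):
        a = dict(zip(out_vars, assign))
        table[assign] = ft[tuple(a[v] for v in fv)] * gt[tuple(a[v] for v in gv)]
    return out_vars, table

def sum_out(var, f):
    """Eliminate (sum out) one variable from a factor."""
    fv, ft = f
    idx = fv.index(var)
    out_vars = fv[:idx] + fv[idx + 1:]
    table = {}
    for assign, p in ft.items():
        key = assign[:idx] + assign[idx + 1:]
        table[key] = table.get(key, 0.0) + p
    return out_vars, table

def variable_elimination(factors, hidden):
    """Join/eliminate each hidden variable, then join the rest and normalize."""
    factors = list(factors)
    for h in hidden:
        mentioning = [f for f in factors if h in f[0]]
        if not mentioning:
            continue
        factors = [f for f in factors if h not in f[0]]
        joined = mentioning[0]
        for f in mentioning[1:]:
            joined = join(joined, f)
        factors.append(sum_out(h, joined))
    result = factors[0]
    for f in factors[1:]:
        result = join(result, f)
    vars_, table = result
    z = sum(table.values())
    return vars_, {a: p / z for a, p in table.items()}
```

A query would be run by first restricting the CPT factors by the evidence and then passing the remaining hidden variables in some elimination order.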
6
6
7
Example: Variable elimination
Query: What is the probability that a student attends class, given that they pass the exam? [based on slides taken from UMBC CMSC 671, 2005]
Nodes: attend, study, prepared, fair, pass (graph shown on slide)
Priors: P(at) = 0.8, P(st) = 0.6, P(fa) = 0.9

P(pr | at, st):
  at st : P
  T  T  : 0.9
  T  F  : 0.5
  F  T  : 0.7
  F  F  : 0.1

P(pa | at, pre, fa):
  pr at fa : P
  T  T  T  : 0.9
  T  T  F  : 0.1
  T  F  T  : 0.7
  T  F  F  : 0.1
  F  T  T  : 0.7
  F  T  F  : 0.1
  F  F  T  : 0.2
  F  F  F  : 0.1
8
Join study factors
Priors: P(at) = 0.8, P(st) = 0.6, P(fa) = 0.9

  pr st at : P(pr|at,st)  P(st)  P(pr,st|at)  P(pr|at)
  T  T  T  : 0.9          0.6    0.54         0.74
  T  F  T  : 0.5          0.4    0.20
  T  T  F  : 0.7          0.6    0.42         0.46
  T  F  F  : 0.1          0.4    0.04
  F  T  T  : 0.1          0.6    0.06         0.26
  F  F  T  : 0.5          0.4    0.20
  F  T  F  : 0.3          0.6    0.18         0.54
  F  F  F  : 0.9          0.4    0.36

(The P(pa | at, pre, fa) CPT is unchanged.)
9
Marginalize out study
Priors: P(at) = 0.8, P(fa) = 0.9
The joint factor P(pr, st | at) from the previous slide is summed over study, giving P(pr | at) (the Marginal column):
  pr at : P(pr|at)
  T  T  : 0.74
  T  F  : 0.46
  F  T  : 0.26
  F  F  : 0.54
(The P(pa | at, pre, fa) CPT is unchanged.)
10
Remove "study"
Priors: P(at) = 0.8, P(fa) = 0.9
P(pr | at):
  pr at : P
  T  T  : 0.74
  T  F  : 0.46
  F  T  : 0.26
  F  F  : 0.54
(The P(pa | at, pre, fa) CPT is unchanged.)
11
Join factors "fair"
Priors: P(at) = 0.8, P(fa) = 0.9; P(pr | at) as above.

  pa pr at fa : P(pa|at,pre,fa)  P(fa)  P(pa,fa|at,pre)  P(pa|at,pre)
  t  T  T  T  : 0.9              0.9    0.81             0.82
  t  T  T  F  : 0.1              0.1    0.01
  t  T  F  T  : 0.7              0.9    0.63             0.64
  t  T  F  F  : 0.1              0.1    0.01
  t  F  T  T  : 0.7              0.9    0.63             0.64
  t  F  T  F  : 0.1              0.1    0.01
  t  F  F  T  : 0.2              0.9    0.18             0.19
  t  F  F  F  : 0.1              0.1    0.01
12
Marginalize out "fair"
Prior: P(at) = 0.8; P(pr | at) as above.
The joint factor P(pa, fa | at, pre) from the previous slide is summed over fair, giving P(pa | at, pre) (the Marginal column):
  pa pr at : P(pa|at,pre)
  t  T  T  : 0.82
  t  T  F  : 0.64
  t  F  T  : 0.64
  t  F  F  : 0.19
13
Marginalize out "fair"
Prior: P(at) = 0.8
P(pr | at):
  pr at : P
  T  T  : 0.74
  T  F  : 0.46
  F  T  : 0.26
  F  F  : 0.54
P(pa | at, pre):
  pa pr at : P
  t  T  T  : 0.82
  t  T  F  : 0.64
  t  F  T  : 0.64
  t  F  F  : 0.19
14
Join factors "prepared"
Prior: P(at) = 0.8

  pa pr at : P(pa|at,pr)  P(pr|at)  P(pa,pr|at)  P(pa|at)
  t  T  T  : 0.82         0.74      0.6068       0.7732
  t  T  F  : 0.64         0.46      0.2944       0.397
  t  F  T  : 0.64         0.26      0.1664
  t  F  F  : 0.19         0.54      0.1026
15
Join factors "prepared" (continued)
Prior: P(at) = 0.8
The joint factor P(pa, pr | at) is the table above; summing it over prepared gives P(pa | at) (the Marginal column: 0.7732 for at = T, 0.397 for at = F).
16
Join factors "prepared" (result)
Prior: P(at) = 0.8
P(pa | at):
  pa at : P
  t  T  : 0.7732
  t  F  : 0.397
17
Join factors
Prior: P(at) = 0.8
  pa at : P(pa|at)  P(at)  P(pa,at)  P(at|pa) (normalized)
  T  T  : 0.7732    0.8    0.61856   0.89
  T  F  : 0.397     0.2    0.0794    0.11
18
Join factors (result)
The final factor over (pass, attend), normalized:
  pa at : P(pa|at)  P(at)  P(pa,at)  P(at|pa)
  T  T  : 0.7732    0.8    0.61856   0.89
  T  F  : 0.397     0.2    0.0794    0.11
So P(at | pa = T) ≈ 0.89: a student who passed the exam attended class with probability about 0.89.
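For comparison, the same query can be checked by brute-force enumeration over the joint (the quiz at the top of the lecture says this is always possible in principle). This is a small sketch, not from the slides, using the CPT numbers above; the variable names are the slide abbreviations.

```python
from itertools import product

# CPTs from the example slides (True/False for T/F)
P_at = {True: 0.8, False: 0.2}
P_st = {True: 0.6, False: 0.4}
P_fa = {True: 0.9, False: 0.1}
P_pr = {  # P(prepared = T | attend, study)
    (True, True): 0.9, (True, False): 0.5,
    (False, True): 0.7, (False, False): 0.1,
}
P_pa = {  # P(pass = T | prepared, attend, fair)
    (True, True, True): 0.9,  (True, True, False): 0.1,
    (True, False, True): 0.7, (True, False, False): 0.1,
    (False, True, True): 0.7, (False, True, False): 0.1,
    (False, False, True): 0.2, (False, False, False): 0.1,
}

def joint(at, st, fa, pr, pa):
    """Joint probability of one full assignment."""
    p = P_at[at] * P_st[st] * P_fa[fa]
    p *= P_pr[(at, st)] if pr else (1 - P_pr[(at, st)])
    q = P_pa[(pr, at, fa)]
    p *= q if pa else (1 - q)
    return p

# P(attend | pass = T) by summing the joint over the hidden variables
num = {True: 0.0, False: 0.0}
for at, st, fa, pr in product([True, False], repeat=4):
    num[at] += joint(at, st, fa, pr, True)
z = num[True] + num[False]
print(num[True] / z, num[False] / z)   # about 0.89 and 0.11, as on the slide
```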
19
Approximate Inference Sampling (particle based method) 19
20
Approximate Inference 20
21
Sampling – the basics... Scrooge McDuck gives you an ancient coin. He wants to know P(H). You have no homework, and nothing good is on television – so you toss it 1 million times. You obtain 700,000 heads and 300,000 tails. What is P(H)?
22
Sampling – the basics... Exactly, P(H)=0.7 Why? 22
23
Monte Carlo Method
Who is more likely to win? Green or Purple?
What is the probability that green wins, P(G)?
Two ways to solve this:
1. Compute the exact probability.
2. Play 100,000 games and see how many times green wins.
24
Approximate Inference
Simulation has a name: sampling.
Sampling is a hot topic in machine learning, and it's really simple.
Basic idea:
  Draw N samples from a sampling distribution S
  Compute an approximate posterior probability
  Show this converges to the true probability P
Why sample?
  Learning: get samples from a distribution you don't know
  Inference: getting a sample is faster than computing the right answer (e.g. with variable elimination)
25
Forward Sampling
Network: Cloudy → Sprinkler, Cloudy → Rain, Sprinkler and Rain → WetGrass

P(C):        +c 0.5    -c 0.5
P(S | C):    +c: +s 0.1, -s 0.9    -c: +s 0.5, -s 0.5
P(R | C):    +c: +r 0.8, -r 0.2    -c: +r 0.2, -r 0.8
P(W | S,R):  +s,+r: +w 0.99, -w 0.01
             +s,-r: +w 0.90, -w 0.10
             -s,+r: +w 0.90, -w 0.10
             -s,-r: +w 0.01, -w 0.99

Samples:
  +c, -s, +r, +w
  -c, +s, -r, +w
  …
[Excel Demo]
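A minimal sketch of forward (prior) sampling for this network, using the CPT numbers above; the function name forward_sample is illustrative, not from the slides.

```python
import random

def forward_sample():
    """Sample (c, s, r, w) in topological order from the CPTs above."""
    c = random.random() < 0.5                      # P(+c) = 0.5
    s = random.random() < (0.1 if c else 0.5)      # P(+s | c)
    r = random.random() < (0.8 if c else 0.2)      # P(+r | c)
    if s and r:
        p_w = 0.99
    elif s or r:
        p_w = 0.90
    else:
        p_w = 0.01
    w = random.random() < p_w                      # P(+w | s, r)
    return c, s, r, w

# Estimate P(+w) by counting, as in the "Example" slide below
N = 100_000
print(sum(forward_sample()[3] for _ in range(N)) / N)
```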
26
Forward Sampling
This process generates samples with probability
  $S_{PS}(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{Parents}(X_i)) = P(x_1, \ldots, x_n)$
…i.e. the BN's joint probability.
Let the number of samples of an event be $N_{PS}(x_1, \ldots, x_n)$. Then
  $\lim_{N \to \infty} \hat{P}(x_1, \ldots, x_n) = \lim_{N \to \infty} N_{PS}(x_1, \ldots, x_n)/N = S_{PS}(x_1, \ldots, x_n) = P(x_1, \ldots, x_n)$
I.e., the sampling procedure is consistent.
27
Example
We'll get a bunch of samples from the BN:
  +c, -s, +r, +w
  +c, +s, +r, +w
  -c, +s, +r, -w
  +c, -s, +r, +w
  -c, -s, -r, +w
If we want to know P(W):
  We have counts <+w: 4, -w: 1>
  Normalize to get P(W) ≈ <+w: 0.8, -w: 0.2>
  This will get closer to the true distribution with more samples
  Can estimate anything else, too
  What about P(C | +w)? P(C | +r, +w)? P(C | -r, -w)?
  Fast: can use fewer samples if less time (what's the drawback?)
28
Rejection Sampling
Let's say we want P(C):
  No point keeping all samples around
  Just tally counts of C as we go
Let's say we want P(C | +s):
  Same thing: tally C outcomes, but ignore (reject) samples which don't have S = +s
  This is called rejection sampling
  It is also consistent for conditional probabilities (i.e., correct in the limit)
Samples:
  +c, -s, +r, +w
  +c, +s, +r, +w
  -c, +s, +r, -w
  +c, -s, +r, +w
  -c, -s, -r, +w
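Rejection sampling for P(C | +s) then only adds a filter on top of the forward sampler; a sketch, assuming the forward_sample() function from the earlier sketch is available.

```python
def estimate_C_given_s(n):
    """Estimate P(+c | +s): reject samples with S = -s, tally C on the rest."""
    kept = c_true = 0
    for _ in range(n):
        c, s, r, w = forward_sample()
        if not s:          # evidence mismatch: reject this sample
            continue
        kept += 1
        c_true += c
    return c_true / kept if kept else float("nan")

print(estimate_C_given_s(100_000))
```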
29
Sampling Example
There are 2 cups:
  The first contains 1 penny and 1 quarter
  The second contains 2 quarters
Say I pick a cup uniformly at random, then pick a coin randomly from that cup. It's a quarter (yes!). What is the probability that the other coin in that cup is also a quarter?
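The puzzle can itself be answered by sampling. A quick sketch, assuming the cup and then the coin are chosen uniformly at random; the estimate converges to about 2/3.

```python
import random

def draw():
    """Pick a cup uniformly, then a coin uniformly from that cup."""
    cup = random.choice([("penny", "quarter"), ("quarter", "quarter")])
    coins = list(cup)
    random.shuffle(coins)
    return coins[0], coins[1]   # (drawn coin, other coin)

hits = trials = 0
for _ in range(100_000):
    drawn, other = draw()
    if drawn != "quarter":
        continue                 # condition on having drawn a quarter
    trials += 1
    hits += (other == "quarter")
print(hits / trials)             # converges to about 2/3
```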
30
Likelihood Weighting
Problem with rejection sampling:
  If evidence is unlikely, you reject a lot of samples
  You don't exploit your evidence as you sample
  Consider P(B | +a): samples like (-b, -a) get thrown away; only the rare samples with +a, such as (+b, +a) and (-b, +a), are kept
Idea: fix evidence variables and sample the rest
  Problem: sample distribution not consistent!
  Solution: weight by probability of evidence given parents
31
Likelihood Weighting
Sampling distribution if z is sampled and e is fixed evidence:
  $S_{WS}(z, e) = \prod_{i} P(z_i \mid \mathrm{Parents}(Z_i))$
Now, samples have weights:
  $w(z, e) = \prod_{j} P(e_j \mid \mathrm{Parents}(E_j))$
Together, the weighted sampling distribution is consistent:
  $S_{WS}(z, e) \cdot w(z, e) = \prod_{i} P(z_i \mid \mathrm{Parents}(Z_i)) \prod_{j} P(e_j \mid \mathrm{Parents}(E_j)) = P(z, e)$
32
Likelihood Weighting
(Same sprinkler network and CPTs as on the Forward Sampling slide; here the evidence variables Sprinkler and WetGrass are fixed and the other variables are sampled.)
Samples:
  +c, +s, +r, +w
  …
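A sketch of likelihood weighting for this network, assuming the evidence is S = +s and W = +w (which is what the weighted samples on the next slide correspond to); the CPT numbers are the ones above and the function name is illustrative.

```python
import random

def weighted_sample():
    """Fix S = +s and W = +w; sample C and R; weight by P(evidence | parents)."""
    weight = 1.0
    c = random.random() < 0.5                      # sample C from P(C)
    weight *= 0.1 if c else 0.5                    # evidence S = +s: multiply by P(+s | c)
    r = random.random() < (0.8 if c else 0.2)      # sample R from P(R | c)
    weight *= 0.99 if r else 0.90                  # evidence W = +w: multiply by P(+w | +s, r)
    return (c, True, r, True), weight

# Estimate P(+c | +s, +w) as a weighted relative frequency
num = den = 0.0
for _ in range(100_000):
    (c, _, _, _), wt = weighted_sample()
    den += wt
    num += wt * c
print(num / den)
```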
33
Likelihood Weighting Example
Inference: sum over the weights of samples that match the query value, then divide by the total sample weight.
What is P(C | +s, +w)?
  Cloudy  Rainy  Sprinkler  WetGrass  Weight
  0       1      1          1         0.495
  0       0      1          1         0.45
  0       0      1          1         0.45
  0       0      1          1         0.45
  1       0      1          1         0.09
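Assuming the two originally blank Weight cells are repeats of the 0.45 above them (they read like merged table cells), the answer works out to:

```latex
P(+c \mid +s, +w) \approx \frac{0.09}{0.495 + 0.45 + 0.45 + 0.45 + 0.09}
                        = \frac{0.09}{1.935} \approx 0.047,
\qquad P(-c \mid +s, +w) \approx 0.953
```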
34
Likelihood Weighting
Likelihood weighting is good:
  We have taken evidence into account as we generate the sample
  E.g. here, W's value will get picked based on the evidence values of S, R
  More of our samples will reflect the state of the world suggested by the evidence
Likelihood weighting doesn't solve all our problems:
  Evidence influences the choice of downstream variables, but not upstream ones (C isn't more likely to get a value matching the evidence)
  We would like to consider evidence when we sample every variable
35
Markov Chain Monte Carlo*
Idea: instead of sampling from scratch, create samples that are each like the last one.
Procedure: resample one variable at a time, conditioned on all the rest, but keep evidence fixed.
E.g., for P(b | c): +a, +c, +b → +a, +c, -b → …
Properties: now samples are not independent (in fact they're nearly identical), but sample averages are still consistent estimators!
What's the point: both upstream and downstream variables condition on evidence.
36
Random Walks [Explain on Blackboard] 36
37
Gibbs Sampling
1. Set all evidence E to e
2. Do forward sampling to obtain x_1, ..., x_n
3. Repeat:
   1. Pick any variable X_i uniformly at random.
   2. Resample x_i' from p(X_i | x_1, ..., x_{i-1}, x_{i+1}, ..., x_n)
   3. Set all other x_j' = x_j
   4. The new sample is x_1', ..., x_n'
38
Markov Blanket
Markov blanket of X:
1. All parents of X
2. All children of X
3. All parents of children of X (except X itself)
X is conditionally independent of all other variables in the BN, given the variables in its Markov blanket.
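Written out (this identity is not spelled out on the slides, but it is the standard fact the resampling step on the next slide relies on): the conditional of X_i given everything else reduces to its local factors,

```latex
P(x_i \mid \mathrm{mb}(X_i)) \;\propto\;
P(x_i \mid \mathrm{Parents}(X_i))
\prod_{Y_j \in \mathrm{Children}(X_i)} P(y_j \mid \mathrm{Parents}(Y_j))
```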
39
Gibbs Sampling
1. Set all evidence E to e
2. Do forward sampling to obtain x_1, ..., x_n
3. Repeat:
   1. Pick any variable X_i uniformly at random.
   2. Resample x_i' from p(X_i | MarkovBlanket(X_i))
   3. Set all other x_j' = x_j
   4. The new sample is x_1', ..., x_n'
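A sketch of Gibbs sampling for the sprinkler network with evidence W = +w. It is illustrative rather than taken from the slides: instead of forward sampling for initialization it starts from a fixed state, and it computes each conditional by evaluating the joint at both values of the picked variable, which is equivalent to using the Markov-blanket factors above.

```python
import random

def joint_prob(c, s, r, w):
    """Joint probability of one full assignment, from the sprinkler CPTs."""
    p = 0.5
    p *= (0.1 if c else 0.5) if s else (0.9 if c else 0.5)   # P(s | c)
    p *= (0.8 if c else 0.2) if r else (0.2 if c else 0.8)   # P(r | c)
    if s and r:
        pw = 0.99
    elif s or r:
        pw = 0.90
    else:
        pw = 0.01
    p *= pw if w else (1 - pw)                               # P(w | s, r)
    return p

def gibbs(num_samples, evidence):
    """Resample one non-evidence variable at a time from P(X_i | everything else)."""
    state = {"c": True, "s": True, "r": True, **evidence}    # fixed initial state
    hidden = [v for v in ("c", "s", "r", "w") if v not in evidence]
    samples = []
    for _ in range(num_samples):
        v = random.choice(hidden)
        p_on = joint_prob(**dict(state, **{v: True}))
        p_off = joint_prob(**dict(state, **{v: False}))
        state[v] = random.random() < p_on / (p_on + p_off)
        samples.append(dict(state))
    return samples

samples = gibbs(50_000, {"w": True})
print(sum(s["c"] for s in samples) / len(samples))   # estimate of P(+c | +w)
```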
40
Summary
Sampling can be your salvation: the dominant approach to inference in BNs.
Approaches:
  Forward (/Prior) Sampling
  Rejection Sampling
  Likelihood Weighted Sampling
  Gibbs Sampling
41
Learning in Bayes Nets
Task 1: Given the network structure and given data, where a data point is an observed setting for the variables, learn the CPTs for the Bayes Net. Might also start with priors for CPT probabilities.
Task 2: Given only the data (and possibly a prior over Bayes Nets), learn the entire Bayes Net (both Net structure and CPTs).
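For Task 1, with fully observed data each CPT entry is just a (possibly smoothed) relative frequency. A tiny sketch for a hypothetical binary node X with one binary parent U, neither of which is from the slides; the Laplace prior of 1 plays the role of the "priors for CPT probabilities" mentioned above.

```python
from collections import Counter

def learn_cpt(data, prior=1.0):
    """Estimate P(X = 1 | U = u) from observed (u, x) pairs by smoothed counting."""
    ones, totals = Counter(), Counter()
    for u, x in data:
        totals[u] += 1
        ones[u] += x
    return {u: (ones[u] + prior) / (totals[u] + 2 * prior) for u in totals}

data = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 1), (0, 0)]   # toy observations
print(learn_cpt(data))   # {1: 0.6, 0: 0.4} with a Laplace prior of 1
```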
42
Turing Award for Bayes Nets 42
43
Recap: Inference Example
Find P(W | F = bad)
Network: Weather → Forecast
  P(W):        sun 0.7   rain 0.3
  P(F | sun):  good 0.8  bad 0.2
  P(F | rain): good 0.1  bad 0.9
Restrict all factors:
  P(W):            sun 0.7   rain 0.3
  P(F = bad | W):  sun 0.2   rain 0.9
No hidden vars to eliminate (this time!) – just join and normalize:
  P(W, F = bad):   sun 0.14  rain 0.27
  P(W | F = bad):  sun 0.34  rain 0.66
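Spelled out, the join and normalize steps behind the last two tables are:

```latex
P(\mathrm{sun}, F{=}\mathrm{bad}) = 0.7 \cdot 0.2 = 0.14,
\qquad P(\mathrm{rain}, F{=}\mathrm{bad}) = 0.3 \cdot 0.9 = 0.27
```
```latex
P(\mathrm{sun} \mid F{=}\mathrm{bad}) = \tfrac{0.14}{0.14 + 0.27} \approx 0.34,
\qquad P(\mathrm{rain} \mid F{=}\mathrm{bad}) \approx 0.66
```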