QUIZ!!  T/F: You can always (theoretically) do BNs inference by enumeration. TRUE  T/F: In VE, always first marginalize, then join. FALSE  T/F: VE is.

Slides:



Advertisements
Similar presentations
CPSC 422, Lecture 11Slide 1 Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 11 Jan, 29, 2014.
Advertisements

Reasoning under Uncertainty: Marginalization, Conditional Prob., and Bayes Computer Science cpsc322, Lecture 25 (Textbook Chpt ) Nov, 6, 2013.
Belief networks Conditional independence Syntax and semantics Exact inference Approximate inference CS 460, Belief Networks1 Mundhenk and Itti Based.
Lirong Xia Hidden Markov Models Tue, March 28, 2014.
Lirong Xia Approximate inference: Particle filter Tue, April 1, 2014.
Probabilistic Reasoning (2)
Advanced Artificial Intelligence
QUIZ!!  T/F: Rejection Sampling without weighting is not consistent. FALSE  T/F: Rejection Sampling (often) converges faster than Forward Sampling. FALSE.
Reasoning under Uncertainty: Conditional Prob., Bayes and Independence Computer Science cpsc322, Lecture 25 (Textbook Chpt ) March, 17, 2010.
Bayesian network inference
10/28 Temporal Probabilistic Models. Temporal (Sequential) Process A temporal process is the evolution of system state over time Often the system state.
CS 188: Artificial Intelligence Fall 2009 Lecture 20: Particle Filtering 11/5/2009 Dan Klein – UC Berkeley TexPoint fonts used in EMF. Read the TexPoint.
Bayesian Networks. Graphical Models Bayesian networks Conditional random fields etc.
CS 188: Artificial Intelligence Fall 2009 Lecture 17: Bayes Nets IV 10/27/2009 Dan Klein – UC Berkeley.
CS 188: Artificial Intelligence Spring 2007 Lecture 14: Bayes Nets III 3/1/2007 Srini Narayanan – ICSI and UC Berkeley.
CS 188: Artificial Intelligence Fall 2006 Lecture 17: Bayes Nets III 10/26/2006 Dan Klein – UC Berkeley.
10/22  Homework 3 returned; solutions posted  Homework 4 socket opened  Project 3 assigned  Mid-term on Wednesday  (Optional) Review session Tuesday.
Announcements Homework 8 is out Final Contest (Optional)
CS 188: Artificial Intelligence Fall 2009 Lecture 19: Hidden Markov Models 11/3/2009 Dan Klein – UC Berkeley.
1 Bayesian Networks Chapter ; 14.4 CS 63 Adapted from slides by Tim Finin and Marie desJardins. Some material borrowed from Lise Getoor.
1 Midterm Exam Mean: 72.7% Max: % Kernel Density Estimation.
Bayesian Networks Tamara Berg CS Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell,
CS 188: Artificial Intelligence Fall 2008 Lecture 18: Decision Diagrams 10/30/2008 Dan Klein – UC Berkeley 1.
QUIZ!!  T/F: The forward algorithm is really variable elimination, over time. TRUE  T/F: Particle Filtering is really sampling, over time. TRUE  T/F:
Machine Learning Lecture 23: Statistical Estimation with Sampling Iain Murray’s MLSS lecture on videolectures.net:
Recap: Reasoning Over Time  Stationary Markov models  Hidden Markov models X2X2 X1X1 X3X3 X4X4 rainsun X5X5 X2X2 E1E1 X1X1 X3X3 X4X4 E2E2 E3E3.
Reasoning Under Uncertainty: Independence and Inference Jim Little Uncertainty 5 Nov 10, 2014 Textbook §6.3.1, 6.5, 6.5.1,
CS 188: Artificial Intelligence Fall 2006 Lecture 18: Decision Diagrams 10/31/2006 Dan Klein – UC Berkeley.
QUIZ!!  T/F: Forward sampling is consistent. True  T/F: Rejection sampling is faster, but inconsistent. False  T/F: Rejection sampling requires less.
CPSC 322, Lecture 28Slide 1 More on Construction and Compactness: Compact Conditional Distributions Once we have established the topology of a Bnet, we.
UBC Department of Computer Science Undergraduate Events More
Bayes’ Nets: Sampling [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available.
Announcements Project 4: Ghostbusters Homework 7
CS B 553: A LGORITHMS FOR O PTIMIZATION AND L EARNING Monte Carlo Methods for Probabilistic Inference.
Reasoning under Uncertainty: Conditional Prob., Bayes and Independence Computer Science cpsc322, Lecture 25 (Textbook Chpt ) Nov, 5, 2012.
The famous “sprinkler” example (J. Pearl, Probabilistic Reasoning in Intelligent Systems, 1988)
CPSC 422, Lecture 11Slide 1 Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 11 Oct, 2, 2015.
Bayesian Networks Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell,
QUIZ!!  In HMMs...  T/F:... the emissions are hidden. FALSE  T/F:... observations are independent given no evidence. FALSE  T/F:... each variable X.
CS 188: Artificial Intelligence Bayes Nets: Approximate Inference Instructor: Stuart Russell--- University of California, Berkeley.
Advanced Artificial Intelligence Lecture 5: Probabilistic Inference.
Inference Algorithms for Bayes Networks
Quick Warm-Up Suppose we have a biased coin that comes up heads with some unknown probability p; how can we use it to produce random bits with probabilities.
CPSC 7373: Artificial Intelligence Lecture 5: Probabilistic Inference Jiang Bian, Fall 2012 University of Arkansas at Little Rock.
Today’s Topics Bayesian Networks (BNs) used a lot in medical diagnosis M-estimates Searching for Good BNs Markov Blanket what is conditionally independent.
CSE 473: Artificial Intelligence Autumn 2011 Bayesian Networks: Inference Luke Zettlemoyer Many slides over the course adapted from either Dan Klein, Stuart.
CS 416 Artificial Intelligence Lecture 15 Uncertainty Chapter 14 Lecture 15 Uncertainty Chapter 14.
Bayes network inference  A general scenario:  Query variables: X  Evidence (observed) variables and their values: E = e  Unobserved variables: Y 
CS 188: Artificial Intelligence Spring 2009 Lecture 20: Decision Networks 4/2/2009 John DeNero – UC Berkeley Slides adapted from Dan Klein.
CS 541: Artificial Intelligence Lecture VII: Inference in Bayesian Networks.
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 12
CS b553: Algorithms for Optimization and Learning
Artificial Intelligence
Quizzz Rihanna’s car engine does not start (E).
CS 4/527: Artificial Intelligence
CAP 5636 – Advanced Artificial Intelligence
Advanced Artificial Intelligence
Inference Inference: calculating some useful quantity from a joint probability distribution Examples: Posterior probability: Most likely explanation: B.
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 12
Instructors: Fei Fang (This Lecture) and Dave Touretzky
CS 188: Artificial Intelligence
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 11
Class #19 – Tuesday, November 3
CS 188: Artificial Intelligence Fall 2008
Class #16 – Tuesday, October 26
Hidden Markov Models Lirong Xia.
Approximate Inference by Sampling
CS 188: Artificial Intelligence Fall 2007
Quick Warm-Up Suppose we have a biased coin that comes up heads with some unknown probability p; how can we use it to produce random bits with probabilities.
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 12
Presentation transcript:

QUIZ!!  T/F: You can always (theoretically) do BNs inference by enumeration. TRUE  T/F: In VE, always first marginalize, then join. FALSE  T/F: VE is a lot faster than enumeration, but not always exact. FALSE  T/F: The running time of VE is independent of the order you pick the r.v.s. FALSE  T/F: The more evidence, the faster VE runs. TRUE  P(X|Y) sums to... |Y|  P(x|Y) sums to... ??  P(X|y) sums to... 1  P(x|y) sums to... P(x|y)  P(x,Y) sums to.. P(x)  P(X,Y) sums to

CSE 511a: Artificial Intelligence, Spring 2013. Lecture 17: Bayes’ Nets IV – Approximate Inference (Sampling). 04/01/2013. Robert Pless, via Kilian Weinberger, via Dan Klein – UC Berkeley

Announcements: Project 4 out soon. Project 3 due at midnight.

Exact Inference: Variable Elimination

General Variable Elimination
Query: P(Q | e_1, ..., e_k)
- Start with initial factors: the local CPTs (but instantiated by evidence)
- While there are still hidden variables (not Q or evidence):
  - Pick a hidden variable H
  - Join all factors mentioning H
  - Eliminate (sum out) H
- Join all remaining factors and normalize
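
This loop can be sketched in a few dozen lines of Python. This is a minimal illustration, not the course's reference implementation: a factor is assumed to be a dict from assignment tuples to numbers, and evidence is assumed to have already been applied by restricting the initial CPT tables.

```python
from itertools import product

# A factor is a pair (vars, table): vars is a tuple of variable names, and table
# maps an assignment tuple (one value per variable, in vars order) to a number.

def join(f1, f2):
    """Pointwise product of two factors, defined over the union of their variables."""
    vars1, t1 = f1
    vars2, t2 = f2
    out_vars = tuple(dict.fromkeys(vars1 + vars2))         # union, order preserving
    domains = {}
    for vs, t in ((vars1, t1), (vars2, t2)):
        for row in t:
            for v, val in zip(vs, row):
                domains.setdefault(v, set()).add(val)
    table = {}
    for row in product(*(sorted(domains[v]) for v in out_vars)):
        assign = dict(zip(out_vars, row))
        r1 = tuple(assign[v] for v in vars1)
        r2 = tuple(assign[v] for v in vars2)
        if r1 in t1 and r2 in t2:                          # skip rows pruned by evidence
            table[row] = t1[r1] * t2[r2]
    return out_vars, table

def sum_out(var, factor):
    """Eliminate (marginalize out) one variable from a factor."""
    vars_, t = factor
    keep = tuple(v for v in vars_ if v != var)
    table = {}
    for row, p in t.items():
        key = tuple(val for v, val in zip(vars_, row) if v != var)
        table[key] = table.get(key, 0.0) + p
    return keep, table

def variable_elimination(factors, hidden_vars):
    """Join-and-eliminate each hidden variable, then join the rest and normalize."""
    factors = list(factors)
    for h in hidden_vars:              # the elimination order affects speed, not correctness
        touching = [f for f in factors if h in f[0]]
        if not touching:
            continue
        rest = [f for f in factors if h not in f[0]]
        joined = touching[0]
        for f in touching[1:]:
            joined = join(joined, f)
        factors = rest + [sum_out(h, joined)]
    result = factors[0]
    for f in factors[1:]:
        result = join(result, f)
    total = sum(result[1].values())
    return result[0], {row: p / total for row, p in result[1].items()}
```

Feeding in the CPTs of the student example on the following slides as such factors should reproduce the numbers derived there step by step.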


Example: Variable elimination
Query: What is the probability that a student attends class, given that they pass the exam? [based on slides taken from UMBC CMSC 671, 2005]
Network: attend -> prepared, study -> prepared, prepared -> pass, attend -> pass, fair -> pass
Priors: P(at) = .8, P(st) = .6, P(fa) = .9
P(pr | at, st) (probability that prepared is true): at=T, st=T: 0.9;  at=T, st=F: 0.5;  at=F, st=T: 0.7;  at=F, st=F: 0.1
P(pa | pr, at, fa) (probability that pass is true; rows ordered pr, at, fa): TTT: 0.9;  TTF: 0.1;  TFT: 0.7;  TFF: 0.1;  FTT: 0.7;  FTF: 0.1;  FFT: 0.2;  FFF: 0.1

Join "study" factors
Join P(pr | at, st) with P(st) to form the joint factor P(pr, st | at) (rows ordered pr, st, at):
TTT: 0.9 x 0.6 = 0.54;  TFT: 0.5 x 0.4 = 0.20;  TTF: 0.7 x 0.6 = 0.42;  TFF: 0.1 x 0.4 = 0.04;  FTT: 0.1 x 0.6 = 0.06;  FFT: 0.5 x 0.4 = 0.20;  FTF: 0.3 x 0.6 = 0.18;  FFF: 0.9 x 0.4 = 0.36
Unchanged: P(at) = .8, P(fa) = .9, P(pa | pr, at, fa)

Marginalize out "study"
Sum the joint factor P(pr, st | at) over study to get the marginal P(pr | at):
pr=T, at=T: 0.54 + 0.20 = 0.74;  pr=T, at=F: 0.42 + 0.04 = 0.46;  pr=F, at=T: 0.06 + 0.20 = 0.26;  pr=F, at=F: 0.18 + 0.36 = 0.54
Unchanged: P(at) = .8, P(fa) = .9, P(pa | pr, at, fa)

Remove "study"
Remaining factors: P(at) = .8, P(fa) = .9, P(pa | pr, at, fa), and the new factor
P(pr | at): pr=T, at=T: 0.74;  pr=T, at=F: 0.46;  pr=F, at=T: 0.26;  pr=F, at=F: 0.54

Join factors "fair"
Join P(pa | pr, at, fa) with P(fa) to form the joint factor P(pa, fa | at, pr) (rows for pa = t, ordered pr, at, fa):
TTT: 0.9 x 0.9 = 0.81;  TTF: 0.1 x 0.1 = 0.01;  TFT: 0.7 x 0.9 = 0.63;  TFF: 0.1 x 0.1 = 0.01;  FTT: 0.7 x 0.9 = 0.63;  FTF: 0.1 x 0.1 = 0.01;  FFT: 0.2 x 0.9 = 0.18;  FFF: 0.1 x 0.1 = 0.01
Unchanged: P(at) = .8 and P(pr | at)

Marginalize out "fair"
Sum the joint factor P(pa, fa | at, pr) over fair to get P(pa | at, pr) (rows for pa = t, ordered pr, at):
TT: 0.81 + 0.01 = 0.82;  TF: 0.63 + 0.01 = 0.64;  FT: 0.63 + 0.01 = 0.64;  FF: 0.18 + 0.01 = 0.19
Unchanged: P(at) = .8 and P(pr | at)

Marginalize out "fair" (result)
Remaining factors: P(at) = .8;  P(pr | at): TT: 0.74, TF: 0.46, FT: 0.26, FF: 0.54;  P(pa = t | pr, at): TT: 0.82, TF: 0.64, FT: 0.64, FF: 0.19

Join factors "prepared"
Join P(pa | pr, at) with P(pr | at) to form the joint factor P(pa, pr | at) (rows for pa = t, ordered pr, at):
TT: 0.82 x 0.74 = 0.6068;  TF: 0.64 x 0.46 = 0.2944;  FT: 0.64 x 0.26 = 0.1664;  FF: 0.19 x 0.54 = 0.1026
Unchanged: P(at) = .8

Join factors "prepared" (continued)
The pass and prepared nodes collapse into the single factor P(pa, pr | at); summing out prepared will give P(pa | at).
Unchanged: P(at) = .8

Join factors "prepared" (result)
After summing out prepared: P(pa = t | at): at=T: 0.6068 + 0.1664 = 0.7732;  at=F: 0.2944 + 0.1026 = 0.397
Unchanged: P(at) = .8

Join factors (attend and pass)
Join P(pa = t | at) with P(at) to get the joint P(pa = t, at): at=T: 0.7732 x 0.8 = 0.6186;  at=F: 0.397 x 0.2 = 0.0794

Join factors (result) and normalize
Normalize P(pa = t, at) to get P(at | pa = t): at=T: 0.6186 / 0.698 = 0.886;  at=F: 0.0794 / 0.698 = 0.114
So the probability that a student attends class, given that they pass the exam, is about 0.89.
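
As a sanity check on the walkthrough above, here is a brute-force inference-by-enumeration sketch for the same student network (the dict encoding and abbreviated names are mine, not the slides'):

```python
from itertools import product

# CPTs from the example (each entry is the probability that the child is True).
P_at, P_st, P_fa = 0.8, 0.6, 0.9
P_pr = {(True, True): 0.9, (True, False): 0.5,                # key: (at, st)
        (False, True): 0.7, (False, False): 0.1}
P_pa = {(True, True, True): 0.9, (True, True, False): 0.1,    # key: (pr, at, fa)
        (True, False, True): 0.7, (True, False, False): 0.1,
        (False, True, True): 0.7, (False, True, False): 0.1,
        (False, False, True): 0.2, (False, False, False): 0.1}

def bern(p, x):
    """P(X = x) for a Boolean variable with P(X = True) = p."""
    return p if x else 1.0 - p

def joint(at, st, pr, fa, pa):
    return (bern(P_at, at) * bern(P_st, st) * bern(P_fa, fa)
            * bern(P_pr[(at, st)], pr) * bern(P_pa[(pr, at, fa)], pa))

# P(attend | pass = True): sum the joint over the hidden variables, then normalize.
num = {True: 0.0, False: 0.0}
for at, st, pr, fa in product([True, False], repeat=4):
    num[at] += joint(at, st, pr, fa, True)
z = sum(num.values())
print({at: round(p / z, 3) for at, p in num.items()})   # roughly {True: 0.886, False: 0.114}
```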

Approximate Inference: Sampling (a particle-based method)

Approximate Inference

Sampling – the basics...
- Scrooge McDuck gives you an ancient coin.
- He wants to know what P(H) is.
- You have no homework, and nothing good is on television – so you toss it 1 million times.
- You obtain 700,000 heads and 300,000 tails.
- What is P(H)?

Sampling – the basics...
- Exactly, P(H) = 0.7.
- Why?

Monte Carlo Method
Who is more likely to win: Green or Purple? What is the probability that Green wins, P(G)?
Two ways to solve this:
1. Compute the exact probability.
2. Play 100,000 games and see how many times Green wins.
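
Option 2 is the Monte Carlo method in one loop. A sketch, with a hypothetical play_game() standing in for an actual simulation of the board game pictured on the slide (the 0.58 inside it is made up purely so the code runs):

```python
import random

def play_game():
    """Hypothetical stand-in for simulating one full game; returns the winner."""
    # Placeholder dynamics only: pretend Green wins any single game with probability 0.58.
    return "green" if random.random() < 0.58 else "purple"

N = 100_000
green_wins = sum(play_game() == "green" for _ in range(N))
print("estimated P(G) =", green_wins / N)    # Monte Carlo estimate of P(Green wins)
```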

Approximate Inference
- Simulation has a name: sampling
- Sampling is a hot topic in machine learning, and it’s really simple
- Basic idea:
  - Draw N samples from a sampling distribution S
  - Compute an approximate posterior probability
  - Show this converges to the true probability P
- Why sample?
  - Learning: get samples from a distribution you don’t know
  - Inference: getting a sample is faster than computing the right answer (e.g. with variable elimination)

Forward Sampling
Network: Cloudy -> Sprinkler, Cloudy -> Rain, Sprinkler -> WetGrass, Rain -> WetGrass
P(C): +c 0.5, -c 0.5
P(S | C): +c: +s 0.1, -s 0.9;  -c: +s 0.5, -s 0.5
P(R | C): +c: +r 0.8, -r 0.2;  -c: +r 0.2, -r 0.8
P(W | S, R): +s,+r: +w 0.99, -w 0.01;  +s,-r: +w 0.90, -w 0.10;  -s,+r: +w 0.90, -w 0.10;  -s,-r: +w 0.01, -w 0.99
Samples: +c, -s, +r, +w;  -c, +s, -r, +w;  …  [Excel Demo]
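
A forward-sampling sketch of this network in Python, using the CPT numbers above (the encoding and variable names are mine, not the slides'):

```python
import random

def bern(p):
    """Flip a biased coin: True with probability p."""
    return random.random() < p

def forward_sample():
    """Sample (C, S, R, W) in topological order, each variable given its sampled parents."""
    c = bern(0.5)                                     # P(+c) = 0.5
    s = bern(0.1 if c else 0.5)                       # P(+s | c)
    r = bern(0.8 if c else 0.2)                       # P(+r | c)
    p_w = {(True, True): 0.99, (True, False): 0.90,   # P(+w | s, r)
           (False, True): 0.90, (False, False): 0.01}[(s, r)]
    w = bern(p_w)
    return c, s, r, w

samples = [forward_sample() for _ in range(10_000)]
print("P(+w) is roughly", sum(w for _, _, _, w in samples) / len(samples))
```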

Forward Sampling
- This process generates samples with probability S_PS(x_1, ..., x_n) = prod_i P(x_i | Parents(X_i)) = P(x_1, ..., x_n), i.e. the BN’s joint probability
- Let the number of samples of an event be N_PS(x_1, ..., x_n)
- Then lim_{N -> inf} P_hat(x_1, ..., x_n) = lim_{N -> inf} N_PS(x_1, ..., x_n) / N = S_PS(x_1, ..., x_n) = P(x_1, ..., x_n)
- I.e., the sampling procedure is consistent

Example
- We’ll get a bunch of samples from the BN:
  +c, -s, +r, +w
  +c, +s, +r, +w
  -c, +s, +r, -w
  +c, -s, +r, +w
  -c, -s, -r, +w
- If we want to know P(W):
  - We have counts <+w: 4, -w: 1>
  - Normalize to get P(W) = <+w: 0.8, -w: 0.2>
  - This will get closer to the true distribution with more samples
  - Can estimate anything else, too
  - What about P(C | +w)? P(C | +r, +w)? P(C | -r, -w)?
  - Fast: can use fewer samples if less time (what’s the drawback?)

Rejection Sampling
- Let’s say we want P(C)
  - No point keeping all samples around
  - Just tally counts of C as we go
- Let’s say we want P(C | +s)
  - Same thing: tally C outcomes, but ignore (reject) samples which don’t have S = +s
  - This is called rejection sampling
  - It is also consistent for conditional probabilities (i.e., correct in the limit)
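
Continuing the same sketch (this reuses forward_sample from the forward-sampling block above), rejection sampling for P(C | +s) simply throws away every sample with S = -s:

```python
def rejection_sample(n_kept):
    """Estimate P(+c | +s) from n_kept samples that survive rejection."""
    kept = rejected = c_true = 0
    while kept < n_kept:
        c, s, r, w = forward_sample()
        if not s:                      # evidence S = +s not matched: reject the sample
            rejected += 1
            continue
        kept += 1
        c_true += c
    return c_true / n_kept, rejected

estimate, n_rejected = rejection_sample(10_000)
print("P(+c | +s) is roughly", estimate, "; samples rejected:", n_rejected)
```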

Sampling Example
- There are 2 cups.
  - The first contains 1 penny and 1 quarter.
  - The second contains 2 quarters.
- Say I pick a cup uniformly at random, then pick a coin randomly from that cup. It’s a quarter (yes!). What is the probability that the other coin in that cup is also a quarter?
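
In the spirit of the lecture, this puzzle can also be answered by sampling instead of algebra. A small simulation sketch (the list encoding of the cups is mine):

```python
import random

cups = [["penny", "quarter"], ["quarter", "quarter"]]

hits = trials = 0
for _ in range(100_000):
    cup = random.choice(cups)              # pick a cup uniformly at random
    i = random.randrange(2)                # pick one of its two coins at random
    if cup[i] != "quarter":
        continue                           # condition on the drawn coin being a quarter
    trials += 1
    hits += (cup[1 - i] == "quarter")      # is the other coin in that cup also a quarter?

print("P(other coin is a quarter | drew a quarter) is roughly", hits / trials)   # converges to 2/3
```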

Likelihood Weighting
- Problem with rejection sampling:
  - If evidence is unlikely, you reject a lot of samples
  - You don’t exploit your evidence as you sample
  - Consider P(B | +a) in a Burglary -> Alarm network
- Idea: fix evidence variables and sample the rest
  - Problem: sample distribution not consistent!
  - Solution: weight by probability of evidence given parents

Likelihood Weighting
- Sampling distribution if z sampled and e fixed evidence: S_WS(z, e) = prod_i P(z_i | Parents(Z_i))
- Now, samples have weights: w(z, e) = prod_i P(e_i | Parents(E_i))
- Together, the weighted sampling distribution is consistent: S_WS(z, e) * w(z, e) = prod_i P(z_i | Parents(Z_i)) * prod_i P(e_i | Parents(E_i)) = P(z, e)

Likelihood Weighting
Same network and CPTs as before: P(C): +c 0.5, -c 0.5;  P(S | C): +c: +s 0.1, -s 0.9; -c: +s 0.5, -s 0.5;  P(R | C): +c: +r 0.8, -r 0.2; -c: +r 0.2, -r 0.8;  P(W | S, R): +s,+r: +w 0.99, -w 0.01; +s,-r: +w 0.90, -w 0.10; -s,+r: +w 0.90, -w 0.10; -s,-r: +w 0.01, -w 0.99
Samples: +c, +s, +r, +w  …
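
A likelihood-weighting sketch for this network: evidence variables are clamped to their observed values and multiply in a weight P(evidence value | parents), while the other variables are sampled from their CPTs. The query P(+c | +s, +w) and all names here are illustrative; the CPT numbers are the ones above.

```python
import random

def bern(p):
    return random.random() < p

def weighted_sample(evidence):
    """One likelihood-weighted sample; evidence maps 'C'/'S'/'R'/'W' to a clamped value."""
    x, w = {}, 1.0

    def set_var(name, p_true):
        nonlocal w
        if name in evidence:
            x[name] = evidence[name]
            w *= p_true if evidence[name] else 1.0 - p_true   # weight by P(e_i | parents)
        else:
            x[name] = bern(p_true)                            # sample non-evidence variables

    set_var("C", 0.5)
    set_var("S", 0.1 if x["C"] else 0.5)
    set_var("R", 0.8 if x["C"] else 0.2)
    set_var("W", {(True, True): 0.99, (True, False): 0.90,
                  (False, True): 0.90, (False, False): 0.01}[(x["S"], x["R"])])
    return x, w

# Estimate P(+c | +s, +w) as a weighted average.
num = den = 0.0
for _ in range(50_000):
    x, w = weighted_sample({"S": True, "W": True})
    den += w
    num += w * x["C"]
print("P(+c | +s, +w) is roughly", num / den)
```

For the sample +c, +s, +r, +w shown on the slide, the weight would be P(+s | +c) x P(+w | +s, +r) = 0.1 x 0.99 = 0.099, assuming S and W are the evidence in that walk-through.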

Likelihood Weighting Example
- Inference:
  - Sum over weights that match the query value
  - Divide by total sample weight
- What is P(C | +w, +r)?
- [Worksheet table with columns: Cloudy, Rainy, Sprinkler, Wet Grass, Weight]

Likelihood Weighting
- Likelihood weighting is good
  - We have taken evidence into account as we generate the sample
  - E.g. here, W’s value will get picked based on the evidence values of S, R
  - More of our samples will reflect the state of the world suggested by the evidence
- Likelihood weighting doesn’t solve all our problems
  - Evidence influences the choice of downstream variables, but not upstream ones (C isn’t more likely to get a value matching the evidence)
  - We would like to consider evidence when we sample every variable

Markov Chain Monte Carlo*
- Idea: instead of sampling from scratch, create samples that are each like the last one.
- Procedure: resample one variable at a time, conditioned on all the rest, but keep evidence fixed. E.g., for P(b | c):  (+b, +a, +c) -> (-b, +a, +c) -> (-b, -a, +c) -> ...
- Properties: Now samples are not independent (in fact they’re nearly identical), but sample averages are still consistent estimators!
- What’s the point: both upstream and downstream variables condition on evidence.

Random Walks  [Explained on blackboard]

Gibbs Sampling
1. Set all evidence E to e
2. Do forward sampling to obtain x_1, ..., x_n
3. Repeat:
   1. Pick any variable X_i uniformly at random.
   2. Resample x_i' from P(X_i | x_1, ..., x_{i-1}, x_{i+1}, ..., x_n)
   3. Set all other x_j' = x_j
   4. The new sample is x_1', ..., x_n'

Markov Blanket
Markov blanket of X:
1. All parents of X
2. All children of X
3. All parents of children of X (except X itself)
X is conditionally independent of all other variables in the BN, given all variables in the Markov blanket (besides X).
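
A tiny helper that reads off a node's Markov blanket from the parent lists of a network, following the three-part definition above (the dict encoding of the graph is just for illustration):

```python
def markov_blanket(node, parents):
    """parents maps each variable to the list of its parents in the BN."""
    children = [v for v, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)   # 1. parents and 2. children of the node
    for child in children:
        blanket |= set(parents[child])             # 3. the children's other parents
    blanket.discard(node)
    return blanket

# Sprinkler network: C -> S, C -> R, S -> W, R -> W
parents = {"C": [], "S": ["C"], "R": ["C"], "W": ["S", "R"]}
print(markov_blanket("S", parents))   # parent C, child W, co-parent R
```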

Gibbs Sampling (using the Markov blanket)
1. Set all evidence E to e
2. Do forward sampling to obtain x_1, ..., x_n
3. Repeat:
   1. Pick any variable X_i uniformly at random.
   2. Resample x_i' from P(X_i | MarkovBlanket(X_i))
   3. Set all other x_j' = x_j
   4. The new sample is x_1', ..., x_n'
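
A Gibbs-sampling sketch for the sprinkler network with evidence W = +w. Each step resamples one non-evidence variable from its conditional given everything else, which by the previous slide only depends on its Markov blanket; here that conditional is obtained by renormalizing the full joint with the variable flipped, a shortcut the slide does not spell out. Initialization, burn-in length, and the query are all illustrative.

```python
import random

def joint(x):
    """Full joint of the sprinkler network, built from the CPTs used earlier."""
    def bern(p, val):
        return p if val else 1.0 - p
    c, s, r, w = x["C"], x["S"], x["R"], x["W"]
    p_w = {(True, True): 0.99, (True, False): 0.90,
           (False, True): 0.90, (False, False): 0.01}[(s, r)]
    return (bern(0.5, c) * bern(0.1 if c else 0.5, s)
            * bern(0.8 if c else 0.2, r) * bern(p_w, w))

def gibbs(evidence, n_steps=50_000, burn_in=1_000):
    x = {"C": False, "S": False, "R": False, "W": False}   # any initial state works after burn-in
    x.update(evidence)                                     # 1. clamp the evidence
    hidden = [v for v in x if v not in evidence]
    counts = {v: 0 for v in hidden}
    for t in range(n_steps):
        v = random.choice(hidden)                          # 2. pick a non-evidence variable
        p = {}
        for val in (True, False):                          # 3. resample it given its Markov blanket:
            x[v] = val                                     #    the joint, renormalized over this variable
            p[val] = joint(x)
        x[v] = random.random() < p[True] / (p[True] + p[False])
        if t >= burn_in:
            for h in hidden:
                counts[h] += x[h]
    return {h: c / (n_steps - burn_in) for h, c in counts.items()}

print(gibbs({"W": True}))   # estimates of P(+c | +w), P(+s | +w), P(+r | +w)
```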

Summary
- Sampling can be your salvation
- The dominant approach to inference in BNs
- Approaches:
  - Forward (/Prior) Sampling
  - Rejection Sampling
  - Likelihood Weighted Sampling
  - Gibbs Sampling

Learning in Bayes Nets
- Task 1: Given the network structure and given data, where a data point is an observed setting for the variables, learn the CPTs for the Bayes Net. Might also start with priors for CPT probabilities.
- Task 2: Given only the data (and possibly a prior over Bayes Nets), learn the entire Bayes Net (both net structure and CPTs).
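
For Task 1, the maximum-likelihood CPT entries are just conditional frequencies, optionally smoothed with pseudo-counts to encode a prior. A sketch for Boolean variables, assuming each data point is a dict of observed values (the names and data below are made up):

```python
from collections import Counter

def learn_cpt(data, child, parents, pseudo=1.0):
    """Estimate P(child = True | parent values) by counting, with Laplace-style smoothing."""
    hits, totals = Counter(), Counter()
    for row in data:
        key = tuple(row[p] for p in parents)
        totals[key] += 1
        hits[key] += row[child]
    return {key: (hits[key] + pseudo) / (totals[key] + 2 * pseudo) for key in totals}

data = [{"C": True,  "S": False, "R": True,  "W": True},
        {"C": False, "S": True,  "R": False, "W": True},
        {"C": True,  "S": False, "R": True,  "W": False}]
print(learn_cpt(data, "W", ["S", "R"]))   # P(+w | S, R) for the parent settings seen in the data
```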

Turing Award for Bayes Nets: Judea Pearl (2011)

Recap: Inference Example
- Find P(W | F=bad)
- Restrict all factors
- No hidden vars to eliminate (this time!)
- Just join and normalize
Network: Weather -> Forecast
P(W): sun 0.7, rain 0.3
P(F | W=sun): good 0.8, bad 0.2;  P(F | W=rain): good 0.1, bad 0.9
Restricted factor P(F=bad | W): sun 0.2, rain 0.9
Join: P(W, F=bad): sun 0.7 x 0.2 = 0.14, rain 0.3 x 0.9 = 0.27
Normalize: P(W | F=bad): sun 0.14 / 0.41 = 0.34, rain 0.27 / 0.41 = 0.66
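
The same restrict–join–normalize steps in a few lines of Python, using the numbers on the slide (the encoding is mine):

```python
P_W = {"sun": 0.7, "rain": 0.3}
P_bad_given_W = {"sun": 0.2, "rain": 0.9}              # restricted factor P(F=bad | W)

joint = {w: P_W[w] * P_bad_given_W[w] for w in P_W}    # join: P(W, F=bad) = {sun: 0.14, rain: 0.27}
z = sum(joint.values())                                # P(F=bad) = 0.41
posterior = {w: p / z for w, p in joint.items()}       # normalize: P(W | F=bad)
print(posterior)                                       # {'sun': ~0.34, 'rain': ~0.66}
```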