
1 CS 188: Artificial Intelligence Fall 2008
Lecture 17: Bayes Nets IV 10/28/2008 Dan Klein – UC Berkeley

2 Inference
Inference: calculating some statistic from a joint probability distribution. Examples:
Posterior probability: P(Q | E1 = e1, ..., Ek = ek)
Most likely explanation: argmax_q P(Q = q | E1 = e1, ..., Ek = ek)
[Diagram: a small example network over variables L, R, B, D, T, T']

3 Mini-Examples: VE
Mini-Alarm Network: B → A → C
B: there is a burglary
A: the alarm sounds
C: the neighbor calls
P(B):       +b 0.1    ¬b 0.9
P(A | B):   +b +a 0.8    +b ¬a 0.2    ¬b +a 0.1    ¬b ¬a 0.9
P(C | A):   +a +c 0.7    +a ¬c 0.3    ¬a +c 0.5    ¬a ¬c 0.5
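To make the factor manipulations on the next slides concrete, here is one possible encoding of these CPTs in Python. The representation (a factor as a tuple of variable names plus a table keyed by value tuples, with True for +x and False for ¬x) is an illustrative choice, not the course's code.

```python
# A factor = (tuple of variable names, table mapping value tuples to numbers).
# True encodes the positive value (+b), False the negated one (~b).

P_B = (("B",), {(True,): 0.1, (False,): 0.9})

P_A_given_B = (("B", "A"), {
    (True, True): 0.8, (True, False): 0.2,    # P(+a | +b), P(~a | +b)
    (False, True): 0.1, (False, False): 0.9,  # P(+a | ~b), P(~a | ~b)
})

P_C_given_A = (("A", "C"), {
    (True, True): 0.7, (True, False): 0.3,    # P(+c | +a), P(~c | +a)
    (False, True): 0.5, (False, False): 0.5,  # P(+c | ~a), P(~c | ~a)
})
```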

4 Basic Operation: Join
First basic operation: joining factors.
Combining two factors is just like a database join: build a new factor over the union of the variables involved.
Example: P(B) × P(A | B) → P(B, A)
Computation for each entry: pointwise products, e.g. the (+b, +a) entry is P(+b) · P(+a | +b).
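A minimal sketch of the join operation on the toy factor representation above (a hypothetical helper, not the lecture's implementation): it builds a factor over the union of the variables and fills each entry with a pointwise product.

```python
from itertools import product

def join(f, g):
    """Pointwise product of two factors over the union of their variables."""
    (f_vars, f_tab), (g_vars, g_tab) = f, g
    out_vars = tuple(f_vars) + tuple(v for v in g_vars if v not in f_vars)
    out_tab = {}
    for vals in product([True, False], repeat=len(out_vars)):
        row = dict(zip(out_vars, vals))
        f_key = tuple(row[v] for v in f_vars)
        g_key = tuple(row[v] for v in g_vars)
        # Rows missing from a factor (e.g. pruned by evidence) contribute zero.
        out_tab[vals] = f_tab.get(f_key, 0.0) * g_tab.get(g_key, 0.0)
    return (out_vars, out_tab)

# join(P_B, P_A_given_B) gives the factor over (B, A) used on the next slides.
```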

5 Basic Operation: Eliminate
Second basic operation: marginalization.
Take a factor and sum out a variable: this shrinks the factor to a smaller one. It is a projection operation.
Example: P(B, A) → P(A)
Definition: P(A) = Σ_b P(b, A)
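A matching sketch of the sum-out step, under the same made-up factor representation:

```python
def sum_out(var, factor):
    """Marginalize a variable out of a factor, shrinking it by one dimension."""
    f_vars, f_tab = factor
    idx = f_vars.index(var)
    out_vars = f_vars[:idx] + f_vars[idx + 1:]
    out_tab = {}
    for vals, p in f_tab.items():
        key = vals[:idx] + vals[idx + 1:]
        out_tab[key] = out_tab.get(key, 0.0) + p
    return (out_vars, out_tab)
```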

6 Example: P(A)
Start with P(B) and P(A | B). Join on B, then sum out B:
P(B):          +b 0.1    ¬b 0.9
P(A | B):      +b +a 0.8    +b ¬a 0.2    ¬b +a 0.1    ¬b ¬a 0.9
Join on B → P(B, A):   +b +a 0.08    +b ¬a 0.02    ¬b +a 0.09    ¬b ¬a 0.81
Sum out B → P(A):      +a 0.17    ¬a 0.83
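Using the sketched helpers, the slide's computation of P(A) looks like this (same numbers, up to floating-point rounding):

```python
BA = join(P_B, P_A_given_B)
# BA table: {(+b,+a): 0.08, (+b,~a): 0.02, (~b,+a): 0.09, (~b,~a): 0.81}
P_A = sum_out("B", BA)
# P_A table: {(+a,): 0.17, (~a,): 0.83}
```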

7 Example: Multiple Joins
Factors: P(B), P(A | B), P(C | A). Join on B first:
P(B):          +b 0.1    ¬b 0.9
P(A | B):      +b +a 0.8    +b ¬a 0.2    ¬b +a 0.1    ¬b ¬a 0.9
Join on B → P(B, A):   +b +a 0.08    +b ¬a 0.02    ¬b +a 0.09    ¬b ¬a 0.81
P(C | A) is unchanged: +a +c 0.7    +a ¬c 0.3    ¬a +c 0.5    ¬a ¬c 0.5

8 Example: Multiple Joins (continued)
Now join on A:
P(B, A):       +b +a 0.08    +b ¬a 0.02    ¬b +a 0.09    ¬b ¬a 0.81
P(C | A):      +a +c 0.7    +a ¬c 0.3    ¬a +c 0.5    ¬a ¬c 0.5
Join on A → P(B, A, C):
  +b +a +c 0.056    +b +a ¬c 0.024    +b ¬a +c 0.010    +b ¬a ¬c 0.010
  ¬b +a +c 0.063    ¬b +a ¬c 0.027    ¬b ¬a +c 0.405    ¬b ¬a ¬c 0.405

9 Query: P(C)
Starting from P(B, A, C) on the previous slide, sum out B, then sum out A:
Sum out B → P(A, C):   +a +c 0.119    +a ¬c 0.051    ¬a +c 0.415    ¬a ¬c 0.415
Sum out A → P(C):      +c 0.534    ¬c 0.466
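Continuing with the same sketch, the multiple-joins route of slides 7 through 9 (join everything, then sum out) would be:

```python
BAC = join(join(P_B, P_A_given_B), P_C_given_A)   # full joint over B, A, C
P_C = sum_out("A", sum_out("B", BAC))
# P_C table: {(+c,): 0.534, (~c,): 0.466}
```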

10 Variable Elimination
Why is inference by enumeration so slow?
You join up the whole joint distribution before you sum out the hidden variables.
You do lots of computation irrelevant to the evidence.
You end up repeating a lot of work.
Idea: interleave joining and marginalizing! Marginalize variables as soon as possible.
This is called "Variable Elimination". It is still NP-hard in general, but usually much faster than inference by enumeration.

11 P(C): Marginalizing Early
Sum out B right after the first join:
P(B, A) (from joining P(B) and P(A | B)):   +b +a 0.08    +b ¬a 0.02    ¬b +a 0.09    ¬b ¬a 0.81
Sum out B → P(A):      +a 0.17    ¬a 0.83
P(C | A) is unchanged: +a +c 0.7    +a ¬c 0.3    ¬a +c 0.5    ¬a ¬c 0.5

12 Marginalizing Early (continued)
Join on A, then sum out A:
P(A):          +a 0.17    ¬a 0.83
P(C | A):      +a +c 0.7    +a ¬c 0.3    ¬a +c 0.5    ¬a ¬c 0.5
Join on A → P(A, C):   +a +c 0.119    +a ¬c 0.051    ¬a +c 0.415    ¬a ¬c 0.415
Sum out A → P(C):      +c 0.534    ¬c 0.466
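The marginalize-early order of slides 11 and 12, written with the same sketched helpers, never builds the full joint and reaches the same answer:

```python
P_A = sum_out("B", join(P_B, P_A_given_B))        # {(+a,): 0.17, (~a,): 0.83}
P_C = sum_out("A", join(P_A, P_C_given_A))        # {(+c,): 0.534, (~c,): 0.466}
```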

13 General Variable Elimination
Query: P(Q | E1 = e1, ..., Ek = ek)
Start with the initial factors: the local CPTs, instantiated by the evidence.
While there are still hidden variables (neither Q nor evidence):
  Pick a hidden variable H.
  Join all factors mentioning H.
  Sum out H.
Join all remaining factors and normalize.
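Below is a minimal sketch of this loop on the toy factor representation used above. The helper names (instantiate, normalize, variable_elimination) and the caller-supplied elimination order are assumptions for illustration, not the course's API.

```python
def instantiate(factor, evidence):
    """Fix observed variables to their evidence values and drop them from the factor."""
    f_vars, f_tab = factor
    out_vars = tuple(v for v in f_vars if v not in evidence)
    out_tab = {}
    for vals, p in f_tab.items():
        row = dict(zip(f_vars, vals))
        if all(row[v] == evidence[v] for v in evidence if v in row):
            out_tab[tuple(row[v] for v in out_vars)] = p
    return (out_vars, out_tab)

def normalize(factor):
    """Scale a factor so that its entries sum to one."""
    f_vars, f_tab = factor
    total = sum(f_tab.values())
    return (f_vars, {k: p / total for k, p in f_tab.items()})

def variable_elimination(evidence, cpts, hidden_order):
    """Join/sum-out loop from the slide; hidden_order lists the variables that are
    neither query nor evidence, in the order they should be eliminated."""
    factors = [instantiate(f, evidence) for f in cpts]
    for h in hidden_order:
        mentioning = [f for f in factors if h in f[0]]
        if not mentioning:
            continue
        factors = [f for f in factors if h not in f[0]]
        joined = mentioning[0]
        for f in mentioning[1:]:
            joined = join(joined, f)
        factors.append(sum_out(h, joined))
    answer = factors[0]
    for f in factors[1:]:
        answer = join(answer, f)
    return normalize(answer)
```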

14 Example: P(B | +a)
Start with P(B) and P(A | B), and select the evidence +a:
P(B):          +b 0.1    ¬b 0.9
P(A | B):      +b +a 0.8    +b ¬a 0.2    ¬b +a 0.1    ¬b ¬a 0.9
Join on B → P(+a, B):   +b 0.08    ¬b 0.09
Normalize → P(B | +a):  +b 8/17    ¬b 9/17
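With the sketched helpers, this query is two lines (8/17 and 9/17 show up as roughly 0.47 and 0.53):

```python
joint = join(P_B, instantiate(P_A_given_B, {"A": True}))  # {(+b,): 0.08, (~b,): 0.09}
print(normalize(joint))                                   # {(+b,): 8/17, (~b,): 9/17}
```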

15 Example: P(B|c) Join on A B B A, c A c B P b 0.1 b 0.9 B P b 0.1
0.8 a 0.2 b 0.1 0.9 B A, c A B A C P b a c 0.24 a 0.10 b 0.03 0.45 c A C P a c 0.7 c 0.3 a 0.5

16 Example: P(B|c) Sum out A B B A, c c B P b 0.1 b 0.9 B P b 0.1 b
0.24 a 0.10 b 0.03 0.45 B C P b c 0.34 b 0.48

17 Example: P(B|c) Join on B Normalize B, c B c B P b 0.1 b 0.9 B C P
0.034 b 0.423 c Normalize B C P b c 0.34 b 0.48 B C P b c 0.073 b 0.927
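The same answer falls out of the general variable_elimination sketch given after slide 13, with A as the only hidden variable:

```python
cpts = [P_B, P_A_given_B, P_C_given_A]
print(variable_elimination({"C": False}, cpts, hidden_order=["A"]))
# (("B",), {(+b,): ~0.073, (~b,): ~0.927})
```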

18 Variable Elimination
What you need to know:
You should be able to run it on small examples and understand the factor creation / reduction flow.
It is better than enumeration: VE caches intermediate computations, saving time by marginalizing variables as soon as possible rather than at the end.
It takes polynomial time for tree-structured graphs (sound familiar?).
We will see special cases of VE later, and you'll have to implement the special cases.
Approximations: exact inference is slow, especially with a lot of hidden nodes; approximate methods give you a (close, wrong?) answer, faster.

19 Sampling
Basic idea:
Draw N samples from a sampling distribution S.
Compute an approximate posterior probability.
Show this converges to the true probability P.
Outline:
Sampling from an empty network (prior sampling).
Rejection sampling: reject samples disagreeing with the evidence.
Likelihood weighting: use the evidence to weight samples.

20 Prior Sampling
[Diagram: the Cloudy / Sprinkler / Rain / WetGrass network with its CPTs, and a sample being drawn one variable at a time in topological order]
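A sketch of prior sampling over the Cloudy / Sprinkler / Rain / WetGrass network. The encoding and the CPT numbers below are the usual textbook values, filled in here only for illustration because the slide's tables did not survive the transcript.

```python
import random

# (variable, parents, table mapping parent-value tuples to P(var = True | parents))
sprinkler_bn = [
    ("Cloudy",    [],                    {(): 0.5}),
    ("Sprinkler", ["Cloudy"],            {(True,): 0.1, (False,): 0.5}),
    ("Rain",      ["Cloudy"],            {(True,): 0.8, (False,): 0.2}),
    ("WetGrass",  ["Sprinkler", "Rain"], {(True, True): 0.99, (True, False): 0.90,
                                          (False, True): 0.90, (False, False): 0.0}),
]

def prior_sample(bn):
    """Sample every variable in topological order, given its already-sampled parents."""
    sample = {}
    for var, parents, table in bn:
        p_true = table[tuple(sample[p] for p in parents)]
        sample[var] = random.random() < p_true
    return sample
```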

21 Prior Sampling
This process generates samples with probability
  S_PS(x_1, ..., x_n) = ∏_i P(x_i | Parents(X_i)) = P(x_1, ..., x_n)
i.e. the BN's joint probability.
Let N_PS(x_1, ..., x_n) be the number of samples of an event. Then
  lim_{N→∞} N_PS(x_1, ..., x_n) / N = S_PS(x_1, ..., x_n) = P(x_1, ..., x_n)
i.e., the sampling procedure is consistent.

22 Example We’ll get a bunch of samples from the BN:
c, s, r, w c, s, r, w c, s, r, w c, s, r, w If we want to know P(W) We have counts <w:4, w:1> Normalize to get P(W) = <w:0.8, w:0.2> This will get closer to the true distribution with more samples Can estimate anything else, too What about P(C| r)? P(C| r, w)? Cloudy Sprinkler Rain WetGrass C S R W

23 Rejection Sampling Let’s say we want P(C) Let’s say we want P(C| s)
No point keeping all samples around Just tally counts of C outcomes Let’s say we want P(C| s) Same thing: tally C outcomes, but ignore (reject) samples which don’t have S=s This is rejection sampling It is also consistent for conditional probabilities (i.e., correct in the limit) Cloudy Sprinkler Rain WetGrass C S R W c, s, r, w c, s, r, w c, s, r, w c, s, r, w
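A sketch of rejection sampling built on the prior_sample helper above (the names and network numbers are still the illustrative ones):

```python
def rejection_sample(bn, query_var, evidence, n):
    """Tally query outcomes, but throw away samples that contradict the evidence."""
    counts = {True: 0, False: 0}
    for _ in range(n):
        s = prior_sample(bn)
        if all(s[var] == val for var, val in evidence.items()):
            counts[s[query_var]] += 1   # keep: consistent with the evidence
        # else: reject the sample entirely
    total = counts[True] + counts[False]
    return {v: c / total for v, c in counts.items()} if total else None

# e.g. rejection_sample(sprinkler_bn, "Cloudy", {"Sprinkler": True}, 10000)
```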

24 Likelihood Weighting
Problem with rejection sampling:
If the evidence is unlikely, you reject a lot of samples, and you don't exploit your evidence as you sample.
Consider P(B | a): since alarms are rare, most prior samples get rejected.
Idea: fix the evidence variables and sample the rest.
Problem: the sample distribution is not consistent!
Solution: weight each sample by the probability of the evidence given its parents.
[Diagram: Burglary → Alarm network, shown for rejection sampling and again with the evidence fixed]

25 Likelihood Sampling
[Diagram: the Cloudy / Sprinkler / Rain / WetGrass network again, with the evidence variables fixed and the remaining variables sampled]

26 Likelihood Weighting
Sampling distribution if z is sampled and e is fixed evidence:
  S_WS(z, e) = ∏_i P(z_i | Parents(Z_i))
Now the samples have weights:
  w(z, e) = ∏_j P(e_j | Parents(E_j))
Together, the weighted sampling distribution is consistent:
  S_WS(z, e) · w(z, e) = ∏_i P(z_i | Parents(Z_i)) · ∏_j P(e_j | Parents(E_j)) = P(z, e)
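A sketch of likelihood weighting that matches these formulas: evidence variables are fixed rather than sampled, and each sample carries the product of P(e_j | parents) as its weight. It reuses the illustrative sprinkler_bn and random setup from the prior-sampling sketch.

```python
def weighted_sample(bn, evidence):
    """Sample non-evidence variables; multiply in P(evidence value | parents) as a weight."""
    sample, weight = {}, 1.0
    for var, parents, table in bn:
        p_true = table[tuple(sample[p] for p in parents)]
        if var in evidence:
            sample[var] = evidence[var]
            weight *= p_true if evidence[var] else (1.0 - p_true)
        else:
            sample[var] = random.random() < p_true
    return sample, weight

def likelihood_weighting(bn, query_var, evidence, n):
    """Accumulate weights per query outcome; consistent in the limit."""
    totals = {True: 0.0, False: 0.0}
    for _ in range(n):
        s, w = weighted_sample(bn, evidence)
        totals[s[query_var]] += w
    z = totals[True] + totals[False]
    return {v: t / z for v, t in totals.items()}

# e.g. likelihood_weighting(sprinkler_bn, "Rain", {"Sprinkler": True, "WetGrass": True}, 10000)
```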

27 Likelihood Weighting
Note that likelihood weighting doesn't solve all our problems: rare evidence is taken into account for downstream variables, but not for upstream ones.
A better solution is Markov chain Monte Carlo (MCMC), which is more advanced.
We'll return to sampling for robot localization and tracking in dynamic BNs.

