CS 188: Artificial Intelligence Fall 2009 Lecture 16: Bayes’ Nets III – Inference 10/20/2009 Dan Klein – UC Berkeley
Announcements
- Midterm: THURSDAY EVENING, 6-9pm, in 155 Dwinelle. One page (2 sides) of notes and a basic calculator OK.
- Review session tonight, 6pm, 145 Dwinelle.
- No section this week; GSI office hours listed on the midterm prep page.
- Assignments: P2 is in glookup (let us know if there are problems); W3 and P4 not until after the midterm.
Project 2 Mini-Contest!
- Honorable mentions: Arpad Kovacs, Alan Young (8 wins, average 2171); Eui Shin (7 wins, average 2053); Katie Bedrosian, Edward Lin (8 wins, average 2002).
- 2nd runner-up (5 wins, average 2445): Eric Chang, Peter Currier. Depth-3 alpha-beta with a complex evaluation function that tried hard to eat ghosts; on the games they won, their score was very good.
- 1st runner-up (8 wins, average 2471): Nils Reimers. Expectimax with adjustable depth and an extremely conservative evaluation function (losing had -inf utility, winning +inf), so the agent was very good at consistently winning.
- Winner (7 wins, average 3044): James Ide, Xingyan Liu. Proper treatment of the ghosts' chance nodes, plus an elaborate evaluation function to detect whether Pacman was trapped by ghosts and whether to grab a power pellet or run.
[DEMO]
Inference
Inference: calculating some useful quantity from a joint probability distribution.
Examples:
- Posterior probability: P(Q | E1=e1, ..., Ek=ek)
- Most likely explanation: argmax_q P(Q=q | E1=e1, ...)
[Figure: the burglary network, with nodes B, E, A, J, M]
Inference by Enumeration
Given unlimited time, inference in BNs is easy. Recipe:
- State the marginal probabilities you need
- Figure out ALL the atomic probabilities you need
- Calculate and combine them
Example: [Figure: the burglary network, with nodes B, E, A, J, M]
Example: Enumeration
In this simple method, we only need the BN to synthesize the joint entries.
Inference by Enumeration?
Variable Elimination
Why is inference by enumeration so slow?
- You join up the whole joint distribution before you sum out the hidden variables
- You end up repeating a lot of work!
Idea: interleave joining and marginalizing!
- Called "Variable Elimination"
- Still NP-hard, but usually much faster than inference by enumeration
We'll need some new notation to define VE.
Factor Zoo I
Joint distribution: P(X, Y)
- Entries P(x, y) for all x, y
- Sums to 1

    T     W     P
    hot   sun   0.4
    hot   rain  0.1
    cold  sun   0.2
    cold  rain  0.3

Selected joint: P(x, Y)
- A slice of the joint distribution
- Entries P(x, y) for fixed x, all y
- Sums to P(x)

    T     W     P
    cold  sun   0.2
    cold  rain  0.3
Factor Zoo II
Family of conditionals: P(X | Y)
- Multiple conditionals
- Entries P(x | y) for all x, y
- Sums to |Y|

    T     W     P
    hot   sun   0.8
    hot   rain  0.2
    cold  sun   0.4
    cold  rain  0.6

Single conditional: P(Y | x)
- Entries P(y | x) for fixed x, all y
- Sums to 1

    T     W     P
    cold  sun   0.4
    cold  rain  0.6
Factor Zoo III
Specified family: P(y | X)
- Entries P(y | x) for fixed y, but for all x
- Sums to ... who knows!

    T     W     P
    hot   rain  0.2
    cold  rain  0.6

In general, when we write P(Y1 ... YN | X1 ... XM):
- It is a "factor," a multi-dimensional array
- Its values are all P(y1 ... yN | x1 ... xM)
- Any assigned X or Y is a dimension missing (selected) from the array
Example: Traffic Domain
Random variables:
- R: Raining
- T: Traffic
- L: Late for class!
First query: P(L)

    P(R):      +r 0.1, -r 0.9
    P(T | R):  +r+t 0.8, +r-t 0.2, -r+t 0.1, -r-t 0.9
    P(L | T):  +t+l 0.3, +t-l 0.7, -t+l 0.1, -t-l 0.9
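As a warm-up, the traffic network's CPTs can be written down as plain Python dicts and the query P(L) answered by brute-force enumeration of the joint. This is a sketch; the dict names (p_r, p_t_r, p_l_t) are illustrative, not from the lecture.

```python
# CPTs of the traffic network R -> T -> L, keyed by +/- value tuples.
p_r   = {'+r': 0.1, '-r': 0.9}                     # P(R)
p_t_r = {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,     # P(T | R)
         ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}
p_l_t = {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,     # P(L | T)
         ('-t', '+l'): 0.1, ('-t', '-l'): 0.9}

# Inference by enumeration: synthesize every joint entry
# P(r, t, l) = P(r) P(t|r) P(l|t), then sum out R and T.
p_l = {'+l': 0.0, '-l': 0.0}
for r in ('+r', '-r'):
    for t in ('+t', '-t'):
        for l in ('+l', '-l'):
            p_l[l] += p_r[r] * p_t_r[(r, t)] * p_l_t[(t, l)]

print(p_l)  # +l: 0.134, -l: 0.866 (up to float rounding)
```

Note the triple loop touches all 8 joint entries; VE will get the same answer while building only factors over one or two variables at a time.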
Variable Elimination Outline
Track objects called factors. The initial factors are the local CPTs (one per node):

    P(R):      +r 0.1, -r 0.9
    P(T | R):  +r+t 0.8, +r-t 0.2, -r+t 0.1, -r-t 0.9
    P(L | T):  +t+l 0.3, +t-l 0.7, -t+l 0.1, -t-l 0.9

Any known values are selected. E.g., if we know L = +l, the initial factors become:

    P(R):       +r 0.1, -r 0.9
    P(T | R):   +r+t 0.8, +r-t 0.2, -r+t 0.1, -r-t 0.9
    P(+l | T):  +t+l 0.3, -t+l 0.1

VE: alternately join factors and eliminate variables.
Operation 1: Join Factors
First basic operation: joining factors.
- Combining factors: just like a database join
- Get all factors over the joining variable
- Build a new factor over the union of the variables involved
Example: Join on R. The computation for each entry is a pointwise product:

    P(R):      +r 0.1, -r 0.9
    P(T | R):  +r+t 0.8, +r-t 0.2, -r+t 0.1, -r-t 0.9
    P(R, T):   +r+t 0.08, +r-t 0.02, -r+t 0.09, -r-t 0.81
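The join-on-R step above can be sketched as code. The factor representation here, a (variables, table) pair, and the DOMAINS dict are assumptions of this example, not notation from the lecture:

```python
from itertools import product

# Domains of the traffic-network variables (assumed for this sketch).
DOMAINS = {'R': ('+r', '-r'), 'T': ('+t', '-t'), 'L': ('+l', '-l')}

def join(f1, f2):
    """Pointwise product of two factors over the union of their variables."""
    v1, t1 = f1
    v2, t2 = f2
    out_vars = v1 + tuple(v for v in v2 if v not in v1)
    table = {}
    for vals in product(*(DOMAINS[v] for v in out_vars)):
        asst = dict(zip(out_vars, vals))
        # Look up each input factor at the sub-assignment of its variables.
        table[vals] = (t1[tuple(asst[v] for v in v1)] *
                       t2[tuple(asst[v] for v in v2)])
    return out_vars, table

P_R = (('R',), {('+r',): 0.1, ('-r',): 0.9})
P_T_given_R = (('R', 'T'), {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
                            ('-r', '+t'): 0.1, ('-r', '-t'): 0.9})
_, P_RT = join(P_R, P_T_given_R)
print(P_RT)  # +r+t 0.08, +r-t 0.02, -r+t 0.09, -r-t 0.81 (up to rounding)
```

Exactly as in a database join, each output row is matched to the rows of the inputs that agree with it on the shared variable R.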
Operation 1: Join Factors
In general, we join on a variable:
- Take all factors mentioning that variable
- Join them all together with pointwise products
- Result is P(all LHS vars | all non-LHS vars)
- Leave other factors alone
Example: Join on T:

    P(T | R):     +r+t 0.8, +r-t 0.2, -r+t 0.1, -r-t 0.9
    P(L | T):     +t+l 0.3, +t-l 0.7, -t+l 0.1, -t-l 0.9
    P(T, L | R):  +r+t+l 0.24, +r+t-l 0.56, +r-t+l 0.02, +r-t-l 0.18,
                  -r+t+l 0.03, -r+t-l 0.07, -r-t+l 0.09, -r-t-l 0.81
Example: Multiple Joins
Join R: P(R) × P(T | R) → P(R, T)

    P(R):      +r 0.1, -r 0.9
    P(T | R):  +r+t 0.8, +r-t 0.2, -r+t 0.1, -r-t 0.9
    P(R, T):   +r+t 0.08, +r-t 0.02, -r+t 0.09, -r-t 0.81
    (P(L | T) unchanged: +t+l 0.3, +t-l 0.7, -t+l 0.1, -t-l 0.9)
Example: Multiple Joins
Join T: P(R, T) × P(L | T) → P(R, T, L)

    P(R, T):     +r+t 0.08, +r-t 0.02, -r+t 0.09, -r-t 0.81
    P(L | T):    +t+l 0.3, +t-l 0.7, -t+l 0.1, -t-l 0.9
    P(R, T, L):  +r+t+l 0.024, +r+t-l 0.056, +r-t+l 0.002, +r-t-l 0.018,
                 -r+t+l 0.027, -r+t-l 0.063, -r-t+l 0.081, -r-t-l 0.729
Operation 2: Eliminate
Second basic operation: marginalization.
- Take a factor and sum out a variable
- Shrinks a factor to a smaller one
- A projection operation
Example: sum R out of P(R, T) to get P(T):

    P(R, T):  +r+t 0.08, +r-t 0.02, -r+t 0.09, -r-t 0.81
    P(T):     +t 0.17, -t 0.83
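The elimination step can be sketched in the same (variables, table) factor representation used for the join sketch; that representation is an assumption of these examples, not lecture notation:

```python
def eliminate(factor, var):
    """Sum a variable out of a factor (a projection onto the other vars)."""
    vars_, table = factor
    i = vars_.index(var)
    out_vars = vars_[:i] + vars_[i + 1:]
    out = {}
    for vals, p in table.items():
        key = vals[:i] + vals[i + 1:]        # drop var's coordinate
        out[key] = out.get(key, 0.0) + p     # accumulate over its values
    return out_vars, out

# Sum R out of the joint P(R, T) from the join example:
P_RT = (('R', 'T'), {('+r', '+t'): 0.08, ('+r', '-t'): 0.02,
                     ('-r', '+t'): 0.09, ('-r', '-t'): 0.81})
_, P_T = eliminate(P_RT, 'R')
print(P_T)  # +t 0.17, -t 0.83 (up to float rounding)
```

The output factor is strictly smaller: it has one fewer dimension than the input, which is why eliminating early keeps intermediate factors small.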
Multiple Elimination
Sum out R, then sum out T:

    P(R, T, L):  +r+t+l 0.024, +r+t-l 0.056, +r-t+l 0.002, +r-t-l 0.018,
                 -r+t+l 0.027, -r+t-l 0.063, -r-t+l 0.081, -r-t-l 0.729
    P(T, L):     +t+l 0.051, +t-l 0.119, -t+l 0.083, -t-l 0.747
    P(L):        +l 0.134, -l 0.866
P(L): Marginalizing Early!
Initial factors:

    P(R):      +r 0.1, -r 0.9
    P(T | R):  +r+t 0.8, +r-t 0.2, -r+t 0.1, -r-t 0.9
    P(L | T):  +t+l 0.3, +t-l 0.7, -t+l 0.1, -t-l 0.9

Join R → P(R, T):  +r+t 0.08, +r-t 0.02, -r+t 0.09, -r-t 0.81  (P(L | T) unchanged)
Sum out R → P(T):  +t 0.17, -t 0.83  (P(L | T) unchanged)
Marginalizing Early (aka VE*)
Join T, then sum out T:

    P(T):      +t 0.17, -t 0.83
    P(L | T):  +t+l 0.3, +t-l 0.7, -t+l 0.1, -t-l 0.9

Join T → P(T, L):  +t+l 0.051, +t-l 0.119, -t+l 0.083, -t-l 0.747
Sum out T → P(L):  +l 0.134, -l 0.866

* VE is variable elimination
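The whole early-marginalization trace above fits in a few lines of Python. A sketch, with illustrative variable names; the numbers are the slide's:

```python
# After joining on R and summing R out, we hold P(T) and the untouched P(L|T).
P_T  = {'+t': 0.17, '-t': 0.83}
P_LT = {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,
        ('-t', '+l'): 0.1, ('-t', '-l'): 0.9}

# Join on T: pointwise product gives the two-variable factor P(T, L).
P_TL = {(t, l): P_T[t] * P_LT[(t, l)]
        for t in ('+t', '-t') for l in ('+l', '-l')}

# Sum out T: the query marginal P(L).
P_L = {l: P_TL[('+t', l)] + P_TL[('-t', l)] for l in ('+l', '-l')}
print(P_L)  # +l: 0.134, -l: 0.866 (up to float rounding)
```

Note the largest factor ever built here has 4 entries, versus the 8-entry joint P(R, T, L) that plain enumeration constructs, and the answer is identical.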
Evidence
If there is evidence, start with factors that select that evidence.
No evidence uses these initial factors:

    P(R):      +r 0.1, -r 0.9
    P(T | R):  +r+t 0.8, +r-t 0.2, -r+t 0.1, -r-t 0.9
    P(L | T):  +t+l 0.3, +t-l 0.7, -t+l 0.1, -t-l 0.9

Computing P(L | +r), the initial factors become:

    P(+r):      +r 0.1
    P(T | +r):  +r+t 0.8, +r-t 0.2
    P(L | T):   +t+l 0.3, +t-l 0.7, -t+l 0.1, -t-l 0.9

We eliminate all vars other than query + evidence.
Evidence II
The result will be a selected joint of query and evidence. E.g., for P(L | +r), we'd end up with:

    P(+r, L):   +r+l 0.026, +r-l 0.074
    Normalize → P(L | +r):  +l 0.26, -l 0.74

To get our answer, just normalize this! That's it!
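The final normalization step is just a division by the total mass, which equals the probability of the evidence. A sketch using the slide's numbers (variable names illustrative):

```python
# Selected joint P(+r, L) left over after eliminating T.
selected = {'+l': 0.026, '-l': 0.074}

# The normalizer is the evidence probability P(+r).
z = sum(selected.values())               # = 0.1
posterior = {l: p / z for l, p in selected.items()}
print(posterior)  # +l: 0.26, -l: 0.74 (up to float rounding)
```

A useful side effect: the normalization constant z is itself the answer to the query P(+r), so VE with evidence computes the evidence likelihood for free.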
General Variable Elimination
Query: P(Q | E1=e1, ..., Ek=ek)
Start with initial factors:
- Local CPTs (but instantiated by evidence)
While there are still hidden variables (not Q or evidence):
- Pick a hidden variable H
- Join all factors mentioning H
- Eliminate (sum out) H
Join all remaining factors and normalize.
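The general loop above can be sketched end to end. This is a minimal self-contained implementation under the assumptions of the earlier sketches (a (variables, table) factor representation and a DOMAINS dict); it is not the staff code, and it omits evidence selection, which would just pre-filter the initial tables:

```python
from itertools import product

DOMAINS = {'R': ('+r', '-r'), 'T': ('+t', '-t'), 'L': ('+l', '-l')}

def join(f1, f2):
    v1, t1 = f1
    v2, t2 = f2
    out_vars = v1 + tuple(v for v in v2 if v not in v1)
    table = {}
    for vals in product(*(DOMAINS[v] for v in out_vars)):
        a = dict(zip(out_vars, vals))
        table[vals] = (t1[tuple(a[v] for v in v1)] *
                       t2[tuple(a[v] for v in v2)])
    return out_vars, table

def eliminate(factor, var):
    vars_, table = factor
    i = vars_.index(var)
    out = {}
    for vals, p in table.items():
        key = vals[:i] + vals[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return vars_[:i] + vars_[i + 1:], out

def variable_elimination(factors, hidden):
    for h in hidden:                                  # pick a hidden variable
        touching = [f for f in factors if h in f[0]]  # join all factors on h
        rest = [f for f in factors if h not in f[0]]
        joined = touching[0]
        for f in touching[1:]:
            joined = join(joined, f)
        factors = rest + [eliminate(joined, h)]       # sum out h
    result = factors[0]
    for f in factors[1:]:                             # join what remains
        result = join(result, f)
    z = sum(result[1].values())                       # normalize
    return {k: v / z for k, v in result[1].items()}

factors = [(('R',), {('+r',): 0.1, ('-r',): 0.9}),
           (('R', 'T'), {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
                         ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}),
           (('T', 'L'), {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,
                         ('-t', '+l'): 0.1, ('-t', '-l'): 0.9})]
posterior = variable_elimination(factors, hidden=['R', 'T'])
print(posterior)  # +l: 0.134, -l: 0.866 (up to float rounding)
```

The elimination order is taken as given here; as the slides note, a bad order can still build exponentially large intermediate factors, which is why VE remains NP-hard in general.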
Variable Elimination is Bayes' Rule
Start / select:

    P(B):       +b 0.1, -b 0.9
    P(+a | B):  +b 0.8, -b 0.1

Join on B:

    P(+a, B):   +a+b 0.08, +a-b 0.09

Normalize:

    P(B | +a):  +b 8/17, -b 9/17
Example
Choose A.
Example
Choose E. Finish with B. Normalize.
Variable Elimination
What you need to know:
- Should be able to run it on small examples; understand the factor creation / reduction flow
- Better than enumeration: saves time by marginalizing variables as soon as possible rather than at the end
- We will see special cases of VE later
- On tree-structured graphs, variable elimination runs in polynomial time, like tree-structured CSPs
- You'll have to implement a tree-structured special case to track invisible ghosts (Project 4)