Announcements
- Office hours this week: Tuesday 3:30-5 (as usual) and Wednesday 10:30-12
- HW6 posted, due Monday 10/20

CS 188: Artificial Intelligence
Bayes Nets: Exact Inference
Instructor: Stuart Russell, University of California, Berkeley

Bayes Nets
Part I: Representation
Part II: Exact inference
- Enumeration (always exponential complexity)
- Variable elimination (worst-case exponential complexity, often better)
- Inference is NP-hard in general
Part III: Approximate Inference
Later: Learning Bayes nets from data

Inference
- Inference: calculating some useful quantity from a probability model (joint probability distribution)
- Examples:
  - Posterior marginal probability: P(Q | e_1, ..., e_k)
    - E.g., what disease might I have?
  - Most likely explanation: argmax_{q,r,s} P(Q=q, R=r, S=s | e_1, ..., e_k)
    - E.g., what did he say?

Inference by Enumeration in Bayes Net
- Reminder of inference by enumeration:
  - Any probability of interest can be computed by summing entries from the joint distribution
  - Entries from the joint distribution can be obtained from a BN by multiplying the corresponding conditional probabilities
- P(B | j, m) = α P(B, j, m)
             = α Σ_{e,a} P(B, e, a, j, m)
             = α Σ_{e,a} P(B) P(e) P(a|B,e) P(j|a) P(m|a)
- So inference in Bayes nets means computing sums of products of numbers: sounds easy!
- Problem: sums of exponentially many products!
[Network on slide: B and E are parents of A; A is the parent of J and M]
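To make the sums-of-products concrete, here is a minimal Python sketch of enumeration for P(B | j, m) in the alarm network. The CPT numbers are illustrative stand-ins (standard textbook-style values), not values given on these slides.

```python
# Inference by enumeration for P(B | j=true, m=true) in the alarm network.
# CPT values below are illustrative placeholders, not taken from the slides.
from itertools import product

P_B = {True: 0.001, False: 0.999}                              # P(B)
P_E = {True: 0.002, False: 0.998}                              # P(E)
P_A_given_BE = {(True, True): 0.95, (True, False): 0.94,
                (False, True): 0.29, (False, False): 0.001}    # P(a=true | B, E)
P_J_given_A = {True: 0.90, False: 0.05}                        # P(j=true | A)
P_M_given_A = {True: 0.70, False: 0.01}                        # P(m=true | A)

def joint(b, e, a, j, m):
    """P(b, e, a, j, m) as a product of the local conditional probabilities."""
    p_a = P_A_given_BE[(b, e)] if a else 1.0 - P_A_given_BE[(b, e)]
    p_j = P_J_given_A[a] if j else 1.0 - P_J_given_A[a]
    p_m = P_M_given_A[a] if m else 1.0 - P_M_given_A[a]
    return P_B[b] * P_E[e] * p_a * p_j * p_m

# Sum out the hidden variables E and A for each value of the query variable B,
# then normalize by alpha = 1 / P(j, m).
unnormalized = {b: sum(joint(b, e, a, True, True)
                       for e, a in product([True, False], repeat=2))
                for b in [True, False]}
alpha = 1.0 / sum(unnormalized.values())
print({b: alpha * p for b, p in unnormalized.items()})   # P(B | j, m)
```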

Can we do better?
- Consider uwy + uwz + uxy + uxz + vwy + vwz + vxy + vxz
  - 16 multiplies, 7 adds
  - Lots of repeated subexpressions!
- Rewrite as (u+v)(w+x)(y+z)
  - 2 multiplies, 3 adds
- Σ_{e,a} P(B) P(e) P(a|B,e) P(j|a) P(m|a)
  = P(B)P(e)P(a|B,e)P(j|a)P(m|a) + P(B)P(¬e)P(a|B,¬e)P(j|a)P(m|a)
    + P(B)P(e)P(¬a|B,e)P(j|¬a)P(m|¬a) + P(B)P(¬e)P(¬a|B,¬e)P(j|¬a)P(m|¬a)
  - Lots of repeated subexpressions!

Variable elimination: The basic ideas
- Move summations inwards as far as possible
  P(B | j, m) = α Σ_{e,a} P(B) P(e) P(a|B,e) P(j|a) P(m|a)
              = α P(B) Σ_e P(e) Σ_a P(a|B,e) P(j|a) P(m|a)
- Do the calculation from the inside out
  - I.e., sum over a first, then sum over e
- Problem: P(a|B,e) isn't a single number, it's a bunch of different numbers depending on the values of B and e
- Solution: use arrays of numbers (of various dimensions) with appropriate operations on them; these are called factors
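One concrete way to picture a factor (a sketch only, with placeholder probabilities): store the variable names alongside a table that maps each assignment to a number.

```python
# One simple representation of a factor: the names of the variables it ranges over,
# plus a table mapping each assignment (a tuple of values) to a number.
# The probabilities below are illustrative placeholders, not values from the slides.
p_a_true = {(True, True): 0.95, (True, False): 0.94,
            (False, True): 0.29, (False, False): 0.001}    # P(a=true | B, E)

factor_vars = ("A", "B", "E")
factor_table = {(a, b, e): (p if a else 1.0 - p)
                for (b, e), p in p_a_true.items()
                for a in (True, False)}
# factor_table[(True, True, False)] is P(a=true | b=true, e=false) == 0.94
```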

Factor Zoo

Factor Zoo I
- Joint distribution: P(X,Y)
  - Entries P(x,y) for all x, y
  - |X| x |Y| matrix
  - Sums to 1
- Projected joint: P(x,Y)
  - A slice of the joint distribution
  - Entries P(x,y) for one x, all y
  - |Y|-element vector
  - Sums to P(x)
- Number of variables (capitals) = dimensionality of the table
[Tables on slide: P(A,J) as a 2x2 table over A and J, and P(a,J) as its A=true row]

Factor Zoo II
- Single conditional: P(Y | x)
  - Entries P(y | x) for fixed x, all y
  - Sums to 1
- Family of conditionals: P(X | Y)
  - Multiple conditionals
  - Entries P(x | y) for all x, y
  - Sums to |Y|
[Tables on slide: P(J|a) as a single row over J, and P(J|A) as a 2x2 table whose rows are P(J|a) and P(J|¬a)]

Operation 1: Pointwise product
- First basic operation: pointwise product of factors (similar to a database join, not matrix multiplication!)
- New factor has the union of the variables of the two original factors
- Each entry is the product of the corresponding entries from the original factors
- Example: P(J|A) x P(A) = P(A,J)
[Tables on slide: the 2x2 table P(J|A) multiplied by P(A) = (0.1, 0.9) gives the 2x2 joint P(A,J)]
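A hedged sketch of the pointwise product for factors stored as (variable names, table) pairs; the P(A) entries come from the slide, while the P(J|A) entries are illustrative.

```python
from itertools import product

# A factor = (tuple of variable names, dict from value-tuples to numbers).
# Boolean variables only, for brevity. P(A) values are from the slide;
# the P(J|A) values are illustrative placeholders.
P_A = (("A",), {(True,): 0.1, (False,): 0.9})
P_J_given_A = (("A", "J"), {(True, True): 0.9,  (True, False): 0.1,
                            (False, True): 0.05, (False, False): 0.95})

def pointwise_product(f, g):
    """Join two factors: variables are the union, entries are products of matching entries."""
    f_vars, f_tab = f
    g_vars, g_tab = g
    out_vars = tuple(f_vars) + tuple(v for v in g_vars if v not in f_vars)
    out_tab = {}
    for assignment in product([True, False], repeat=len(out_vars)):
        env = dict(zip(out_vars, assignment))
        out_tab[assignment] = (f_tab[tuple(env[v] for v in f_vars)] *
                               g_tab[tuple(env[v] for v in g_vars)])
    return out_vars, out_tab

P_AJ = pointwise_product(P_J_given_A, P_A)   # a factor over (A, J): the joint P(A, J)
print(P_AJ)
```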

Example: Making larger factors
- Example: P(A,J) x P(A,M) = P(A,J,M)
[Tables on slide: the 2x2 factors P(A,J) and P(A,M) combine into a 2x2x2 factor P(A,J,M), shown as one 2x2 slice for A=true and one for A=false]

Example: Making larger factors
- Example: P(U,V) x P(V,W) x P(W,X) = P(U,V,W,X)
- Sizes: [10,10] x [10,10] x [10,10] = [10,10,10,10]
  - I.e., 300 numbers blows up to 10,000 numbers!
- Factor blowup can make VE very expensive

Operation 2: Summing out a variable
- Second basic operation: summing out (or eliminating) a variable from a factor
- Shrinks a factor to a smaller one
- Example: Σ_j P(A,J) = P(A,j) + P(A,¬j) = P(A)
[Tables on slide: summing J out of the 2x2 table P(A,J) gives P(A) = (0.1, 0.9)]
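Summing out fits the same (variable names, table) representation; in this sketch the individual P(A,J) entries are illustrative, chosen to be consistent with P(A) = (0.1, 0.9) from the slide.

```python
def sum_out(var, factor):
    """Eliminate `var` from a factor: add up the entries that agree on all other variables."""
    f_vars, f_tab = factor
    keep = tuple(v for v in f_vars if v != var)
    out = {}
    for key, value in f_tab.items():
        reduced = tuple(val for v, val in zip(f_vars, key) if v != var)
        out[reduced] = out.get(reduced, 0.0) + value
    return keep, out

# Example from the slide: summing J out of P(A, J) recovers P(A) = (0.1, 0.9).
# The individual entries here are illustrative but consistent with that marginal.
P_AJ = (("A", "J"), {(True, True): 0.09, (True, False): 0.01,
                     (False, True): 0.045, (False, False): 0.855})
print(sum_out("J", P_AJ))   # (('A',), {(True,): 0.1, (False,): 0.9}), up to rounding
```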

Summing out from a product of factors
- Project the factors each way first, then sum the products
- Example: Σ_a P(a|B,e) x P(j|a) x P(m|a)
  = P(a|B,e) x P(j|a) x P(m|a) + P(¬a|B,e) x P(j|¬a) x P(m|¬a)

Variable Elimination

- Query: P(Q | e_1, ..., e_k)
- Start with initial factors:
  - Local CPTs (but instantiated by evidence)
- While there are still hidden variables (not Q or evidence):
  - Pick a hidden variable H
  - Join all factors mentioning H
  - Eliminate (sum out) H
- Join all remaining factors and normalize

Variable Elimination
function VariableElimination(Q, e, bn) returns a distribution over Q
  factors ← []
  for each var in ORDER(bn.vars) do
    factors ← [MAKE-FACTOR(var, e) | factors]
    if var is a hidden variable then
      factors ← SUM-OUT(var, factors)
  return NORMALIZE(POINTWISE-PRODUCT(factors))
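For comparison with the pseudocode, here is a compact, unoptimized variable-elimination sketch for Boolean-valued networks. The factor representation and helper names are assumptions made for illustration, not the algorithm's official interface.

```python
from itertools import product

# A factor = (tuple of variable names, dict mapping value-tuples to numbers).
# Boolean-valued variables only; a teaching sketch, not production code.

def pointwise_product(f, g):
    f_vars, f_tab = f
    g_vars, g_tab = g
    out_vars = tuple(f_vars) + tuple(v for v in g_vars if v not in f_vars)
    out_tab = {}
    for vals in product([True, False], repeat=len(out_vars)):
        env = dict(zip(out_vars, vals))
        out_tab[vals] = (f_tab[tuple(env[v] for v in f_vars)] *
                         g_tab[tuple(env[v] for v in g_vars)])
    return out_vars, out_tab

def eliminate(var, factors):
    """Join all factors mentioning var (assumes there is at least one), sum var out."""
    touching = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    joined = touching[0]
    for f in touching[1:]:
        joined = pointwise_product(joined, f)
    j_vars, j_tab = joined
    keep = tuple(v for v in j_vars if v != var)
    out = {}
    for key, p in j_tab.items():
        reduced = tuple(val for v, val in zip(j_vars, key) if v != var)
        out[reduced] = out.get(reduced, 0.0) + p
    return rest + [(keep, out)]

def restrict(factor, evidence):
    """Fix evidence variables to their observed values and drop them from the factor."""
    f_vars, f_tab = factor
    keep = tuple(v for v in f_vars if v not in evidence)
    tab = {}
    for key, p in f_tab.items():
        env = dict(zip(f_vars, key))
        if all(env[v] == val for v, val in evidence.items() if v in env):
            tab[tuple(env[v] for v in keep)] = p
    return keep, tab

def variable_elimination(query, evidence, factors, order):
    """Return P(query | evidence); `order` lists variables in elimination order."""
    factors = [restrict(f, evidence) for f in factors]
    for var in order:
        if var != query and var not in evidence:   # only hidden variables get eliminated
            factors = eliminate(var, factors)
    result = factors[0]
    for f in factors[1:]:
        result = pointwise_product(result, f)
    _, tab = result                                 # a factor over the query variable only
    total = sum(tab.values())
    return {key[0]: p / total for key, p in tab.items()}

# Tiny chain B -> A -> J with made-up CPTs; query P(B | J=true).
P_B = (("B",), {(True,): 0.1, (False,): 0.9})
P_A = (("B", "A"), {(True, True): 0.8, (True, False): 0.2,
                    (False, True): 0.3, (False, False): 0.7})
P_J = (("A", "J"), {(True, True): 0.9, (True, False): 0.1,
                    (False, True): 0.2, (False, False): 0.8})
print(variable_elimination("B", {"J": True}, [P_B, P_A, P_J], order=["A"]))
# roughly {True: 0.171, False: 0.829} for these placeholder CPTs
```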

Example
- Choose A

Example
- Choose E
- Finish with B
- Normalize

Order matters
- Order the terms Z, A, B, C, D
  P(D) = α Σ_{z,a,b,c} P(z) P(a|z) P(b|z) P(c|z) P(D|z)
       = α Σ_z P(z) Σ_a P(a|z) Σ_b P(b|z) Σ_c P(c|z) P(D|z)
  - Largest factor has 2 variables (D, Z)
- Order the terms A, B, C, D, Z
  P(D) = α Σ_{a,b,c,z} P(a|z) P(b|z) P(c|z) P(D|z) P(z)
       = α Σ_a Σ_b Σ_c Σ_z P(a|z) P(b|z) P(c|z) P(D|z) P(z)
  - Largest factor has 4 variables (A, B, C, D)
- In general, with n leaves, factor of size 2^n
[Network on slide: Z is the parent of A, B, C, and D]

VE: Computational and Space Complexity
- The computational and space complexity of variable elimination is determined by the largest factor (and it's space that kills you)
- The elimination ordering can greatly affect the size of the largest factor
  - E.g., previous slide's example: 2^n vs. 2
- Does there always exist an ordering that only results in small factors?
  - No!

Worst Case Complexity? Reduction from SAT
- CNF clauses:
  1. A ∨ B ∨ C
  2. C ∨ D ∨ ¬A
  3. B ∨ C ∨ ¬D
- P(AND) > 0 iff the clauses are satisfiable  =>  NP-hard
- P(AND) = S · 0.5^n, where S is the number of satisfying assignments of the clauses  =>  #P-hard
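The counting claim can be checked by brute force for the three clauses above; this is only a sanity check of P(AND) = S · 0.5^n, assuming each of the n = 4 variables is an independent fair coin.

```python
# Brute-force check of P(AND) = S * 0.5^n for the three CNF clauses on the slide,
# with A, B, C, D modeled as independent fair coin flips (n = 4).
from itertools import product

def clauses_satisfied(a, b, c, d):
    return (a or b or c) and (c or d or not a) and (b or c or not d)

assignments = list(product([True, False], repeat=4))
S = sum(clauses_satisfied(*vals) for vals in assignments)   # number of satisfying assignments
p_and = S * 0.5 ** 4                                        # probability that all clauses hold
print(S, p_and)   # the clauses are satisfiable iff S > 0, i.e. iff p_and > 0
```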

Polytrees
- A polytree is a directed graph with no undirected cycles
- For polytrees, the complexity of variable elimination is linear in the network size if you eliminate from the leaves towards the roots
- This is essentially the same theorem as for tree-structured CSPs

Bayes Nets
Part I: Representation
Part II: Exact inference
- Enumeration (always exponential complexity)
- Variable elimination (worst-case exponential complexity, often better)
- Inference is NP-hard in general
Part III: Approximate Inference
Later: Learning Bayes nets from data