
Announcements
Midterm: Wednesday 7pm-9pm. See the midterm prep page (posted on Piazza and the inst.eecs page).
Four rooms; your room is determined by the last two digits of your SID:
00-32: Dwinelle 155
33-45: Genetics and Plant Biology 100
46-62: Hearst Annex A1
63-99: Pimentel 1
Discussions this week are organized by topic.
Survey: complete it before the midterm; 80% participation = +1 pt.

Bayes net global semantics
Bayes nets encode joint distributions as a product of conditional distributions, one per variable:
P(X1, …, Xn) = ∏_i P(Xi | Parents(Xi))
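To make the global semantics concrete, here is a minimal Python sketch (not from the course materials) that computes the probability of a full assignment as the product of one CPT entry per variable. The toy A → J network reuses the numbers that appear in the factor-zoo slides later in this lecture; the function and data-structure names are illustrative assumptions.

```python
# Global semantics: P(x1, ..., xn) = prod_i P(xi | parents(Xi)).
# CPTs are stored as dicts keyed by (value, parent_values); this layout is an
# illustrative choice, not the course's representation.

def joint_probability(assignment, parents, cpt):
    p = 1.0
    for var, value in assignment.items():
        parent_values = tuple(assignment[q] for q in parents[var])
        p *= cpt[var][(value, parent_values)]
    return p

# Toy net A -> J with the numbers used in the factor-zoo slides below.
parents = {"A": (), "J": ("A",)}
cpt = {
    "A": {(True, ()): 0.1, (False, ()): 0.9},
    "J": {(True, (True,)): 0.9, (False, (True,)): 0.1,
          (True, (False,)): 0.05, (False, (False,)): 0.95},
}
print(joint_probability({"A": True, "J": True}, parents, cpt))  # 0.1 * 0.9 = 0.09
```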

Conditional independence semantics Every variable is conditionally independent of its non-descendants given its parents Conditional independence semantics <=> global semantics

Example: V-structure (burglary network: Burglary, Earthquake → Alarm → JohnCalls, MaryCalls)
JohnCalls independent of Burglary given Alarm? Yes.
JohnCalls independent of MaryCalls given Alarm? Yes.
Burglary independent of Earthquake? Yes.
Burglary independent of Earthquake given Alarm? NO! Given that the alarm has sounded, both burglary and earthquake become more likely. But if we then learn that a burglary has happened, the alarm is explained away and the probability of earthquake drops back.

Markov blanket
A variable's Markov blanket consists of its parents, its children, and its children's other parents.
Every variable is conditionally independent of all other variables given its Markov blanket.
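As a small illustration (not from the course materials), the Markov blanket can be read directly off the graph structure; the sketch below assumes the graph is given as a dict mapping each node to a tuple of its parents, and uses the burglary network from the previous slide.

```python
# Markov blanket of a variable: its parents, its children, and its children's
# other parents.

def markov_blanket(var, parents):
    children = [v for v, ps in parents.items() if var in ps]
    blanket = set(parents[var])                   # parents
    blanket.update(children)                      # children
    for child in children:                        # children's other parents
        blanket.update(q for q in parents[child] if q != var)
    return blanket

parents = {
    "Burglary": (), "Earthquake": (),
    "Alarm": ("Burglary", "Earthquake"),
    "JohnCalls": ("Alarm",), "MaryCalls": ("Alarm",),
}
print(markov_blanket("Alarm", parents))
# {'Burglary', 'Earthquake', 'JohnCalls', 'MaryCalls'}
```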

CS 188: Artificial Intelligence
Bayes Nets: Exact Inference
Please retain proper attribution, including the reference to ai.berkeley.edu. Thanks!
Instructors: Sergey Levine and Stuart Russell, University of California, Berkeley

Bayes Nets Part I: Representation Part II: Exact inference Enumeration (always exponential complexity) Variable elimination (worst-case exponential complexity, often better) Inference is NP-hard in general Part III: Approximate Inference Later: Learning Bayes nets from data

Inference
Inference: calculating some useful quantity from a probability model (a joint probability distribution).
Examples:
Posterior marginal probability: P(Q | e1, …, ek), e.g., what disease might I have?
Most likely explanation: argmax_{q,r,s} P(Q=q, R=r, S=s | e1, …, ek), e.g., what did he say?

Inference by Enumeration in Bayes Nets
Reminder of inference by enumeration: any probability of interest can be computed by summing entries from the joint distribution, and entries of the joint distribution can be obtained from a BN by multiplying the corresponding conditional probabilities. For the burglary network (B, E → A → J, M):
P(B | j, m) = α P(B, j, m) = α Σ_{e,a} P(B, e, a, j, m) = α Σ_{e,a} P(B) P(e) P(a|B,e) P(j|a) P(m|a)
So inference in Bayes nets means computing sums of products of numbers: sounds easy!
Problem: sums of exponentially many products!
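A minimal sketch of this computation in Python, for the query P(B | j, m). The CPT values are the standard textbook numbers for the burglary network (an assumption, since the slides do not list them), and all names are illustrative.

```python
# Inference by enumeration: sum products of CPT entries over the hidden
# variables E and A, then normalize over B.
import itertools

P_B, P_E = 0.001, 0.002                              # priors P(B=true), P(E=true)
P_A = {(True, True): 0.95, (True, False): 0.94,      # P(A=true | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                      # P(J=true | A)
P_M = {True: 0.70, False: 0.01}                      # P(M=true | A)

def p(prob_true, value):
    return prob_true if value else 1.0 - prob_true

def unnormalized(b):
    """P(B=b, j, m) up to the constant alpha: sum over e and a of the products."""
    total = 0.0
    for e, a in itertools.product([True, False], repeat=2):
        total += p(P_B, b) * p(P_E, e) * p(P_A[(b, e)], a) * P_J[a] * P_M[a]
    return total

scores = {b: unnormalized(b) for b in (True, False)}
alpha = 1.0 / sum(scores.values())
print({b: alpha * s for b, s in scores.items()})
# roughly {True: 0.284, False: 0.716}, i.e. P(B=true | j, m) is about 0.284
```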

Can we do better?
Consider uwy + uwz + uxy + uxz + vwy + vwz + vxy + vxz: 16 multiplies, 7 adds, and lots of repeated subexpressions! Rewrite as (u+v)(w+x)(y+z): 2 multiplies, 3 adds.
The same trick applies to the enumeration sum:
Σ_{e,a} P(B) P(e) P(a|B,e) P(j|a) P(m|a)
= P(B) P(e) P(a|B,e) P(j|a) P(m|a)
+ P(B) P(e) P(¬a|B,e) P(j|¬a) P(m|¬a)
+ P(B) P(¬e) P(a|B,¬e) P(j|a) P(m|a)
+ P(B) P(¬e) P(¬a|B,¬e) P(j|¬a) P(m|¬a)

Variable elimination: the basic ideas
Move summations inwards as far as possible:
P(B | j, m) = α Σ_{e,a} P(B) P(e) P(a|B,e) P(j|a) P(m|a)
= α P(B) Σ_e P(e) Σ_a P(a|B,e) P(j|a) P(m|a)
Do the calculation from the inside out, i.e., sum over a first, then sum over e.
Problem: P(a|B,e) isn't a single number; it's a bunch of different numbers depending on the values of B and e.
Solution: use arrays of numbers (of various dimensions) with appropriate operations on them; these are called factors.

Factor Zoo

Factor Zoo I
Joint distribution P(X,Y): entries P(x,y) for all x, y; an |X|×|Y| matrix; sums to 1.
Projected joint P(x,Y): a slice of the joint distribution; entries P(x,y) for one x and all y; a |Y|-element vector; sums to P(x).
Example P(A,J):
A=true:  J=true 0.09,  J=false 0.01
A=false: J=true 0.045, J=false 0.855
Example slice P(a,J) for A=true: J=true 0.09, J=false 0.01
Number of variables (capitals) = dimensionality of the table.

Factor Zoo II
Single conditional P(Y | x): entries P(y | x) for fixed x and all y; sums to 1.
Example P(J | a) for A=true: J=true 0.9, J=false 0.1
Family of conditionals P(X | Y): multiple conditionals; entries P(x | y) for all x, y; sums to |Y|.
Example P(J | A):
A=true:  J=true 0.9,  J=false 0.1
A=false: J=true 0.05, J=false 0.95

Operation 1: Pointwise product
First basic operation: the pointwise product of factors (similar to a database join, not matrix multiplication!).
The new factor has the union of the variables of the two original factors; each entry is the product of the corresponding entries from the original factors.
Example: P(J|A) × P(A) = P(A,J)
P(A): true 0.1, false 0.9
P(J|A): A=true: J=true 0.9, J=false 0.1; A=false: J=true 0.05, J=false 0.95
P(A,J): A=true: J=true 0.09, J=false 0.01; A=false: J=true 0.045, J=false 0.855
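A minimal sketch of the pointwise product (not the course's code): each factor is stored as a dict mapping an assignment, a frozenset of (variable, value) pairs, to a number, and entries of the two factors are multiplied whenever they agree on the shared variables. The numbers reproduce the P(J|A) × P(A) = P(A,J) example above.

```python
# Pointwise product of two factors: the result's variables are the union of the
# inputs' variables, and each entry multiplies the consistent input entries.

def agree(a1, a2):
    d1, d2 = dict(a1), dict(a2)
    return all(d2[v] == x for v, x in d1.items() if v in d2)

def pointwise_product(f1, f2):
    return {a1 | a2: p1 * p2
            for a1, p1 in f1.items()
            for a2, p2 in f2.items() if agree(a1, a2)}

P_A = {frozenset({("A", True)}): 0.1, frozenset({("A", False)}): 0.9}
P_J_given_A = {
    frozenset({("A", True),  ("J", True)}):  0.9,
    frozenset({("A", True),  ("J", False)}): 0.1,
    frozenset({("A", False), ("J", True)}):  0.05,
    frozenset({("A", False), ("J", False)}): 0.95,
}
print(pointwise_product(P_J_given_A, P_A))
# P(A,J): entries 0.09, 0.01, 0.045, 0.855, as in the table above
```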

Example: Making larger factors
Example: P(A,J) × P(A,M) = P(A,J,M)
P(A,J): A=true: J=true 0.09, J=false 0.01; A=false: J=true 0.045, J=false 0.855
P(A,M): A=true: M=true 0.07, M=false 0.03; A=false: M=true 0.009, M=false 0.891
The result is a table over all combinations of A, J, and M, with one 2×2 slice for A=true and one for A=false.

Example: Making larger factors
Example: P(U,V) × P(V,W) × P(W,X) = P(U,V,W,X)
Sizes: [10,10] × [10,10] × [10,10] = [10,10,10,10], i.e., 300 numbers blow up to 10,000 numbers!
Factor blowup can make VE very expensive.

Operation 2: Summing out a variable
Second basic operation: summing out (or eliminating) a variable from a factor; this shrinks the factor to a smaller one.
Example: Σ_j P(A,J) = P(A,j) + P(A,¬j) = P(A)
P(A,J): A=true: J=true 0.09, J=false 0.01; A=false: J=true 0.045, J=false 0.855
Sum out J to get P(A): true 0.1, false 0.9
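A matching sketch of summing out, using the same dict-of-assignments representation as the pointwise-product sketch above (again an illustrative layout, not the course's code): drop the eliminated variable from every assignment and add up the entries that collapse onto the same reduced assignment.

```python
# Summing out (eliminating) a variable from a factor.

def sum_out(var, factor):
    out = {}
    for assignment, p in factor.items():
        reduced = frozenset((v, x) for v, x in assignment if v != var)
        out[reduced] = out.get(reduced, 0.0) + p
    return out

P_AJ = {
    frozenset({("A", True),  ("J", True)}):  0.09,
    frozenset({("A", True),  ("J", False)}): 0.01,
    frozenset({("A", False), ("J", True)}):  0.045,
    frozenset({("A", False), ("J", False)}): 0.855,
}
print(sum_out("J", P_AJ))
# {frozenset({('A', True)}): 0.1, frozenset({('A', False)}): 0.9}, i.e. P(A)
```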

Summing out from a product of factors
Project the factors each way first, then sum the products.
Example: Σ_a P(a|B,e) × P(j|a) × P(m|a) = P(a|B,e) × P(j|a) × P(m|a) + P(¬a|B,e) × P(j|¬a) × P(m|¬a)

Variable Elimination

Variable Elimination Query: P(Q|E1=e1,.., Ek=ek) Start with initial factors: Local CPTs (but instantiated by evidence) While there are still hidden variables (not Q or evidence): Pick a hidden variable H Join all factors mentioning H Eliminate (sum out) H Join all remaining factors and normalize

Variable Elimination
function VariableElimination(Q, e, bn) returns a distribution over Q
  factors ← []
  for each var in ORDER(bn.vars) do
    factors ← [MAKE-FACTOR(var, e) | factors]
    if var is a hidden variable then
      factors ← SUM-OUT(var, factors)
  return NORMALIZE(POINTWISE-PRODUCT(factors))
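Below is a compact, runnable sketch of the join-and-eliminate loop from the previous slide, repeating the factor helpers from the earlier sketches so it stands alone. The burglary-network CPT values are the standard textbook numbers (an assumption), and every name here is illustrative rather than the course's reference implementation.

```python
# Variable elimination with dict-of-assignments factors: build one factor per CPT
# (restricted by the evidence), repeatedly join-and-sum-out each hidden variable,
# then join the remaining factors and normalize.
import itertools

def agree(a1, a2):
    d1, d2 = dict(a1), dict(a2)
    return all(d2[v] == x for v, x in d1.items() if v in d2)

def pointwise_product(f1, f2):
    return {a1 | a2: p1 * p2
            for a1, p1 in f1.items() for a2, p2 in f2.items() if agree(a1, a2)}

def sum_out(var, factor):
    out = {}
    for a, p in factor.items():
        reduced = frozenset((v, x) for v, x in a if v != var)
        out[reduced] = out.get(reduced, 0.0) + p
    return out

def make_factor(var, evidence, parents, cpt):
    """The CPT of var as a factor, keeping only rows consistent with the evidence."""
    factor, scope = {}, (var,) + parents[var]
    for values in itertools.product([True, False], repeat=len(scope)):
        assignment = dict(zip(scope, values))
        if any(assignment.get(v) not in (None, val) for v, val in evidence.items()):
            continue
        p_true = cpt[var][tuple(assignment[q] for q in parents[var])]
        factor[frozenset(assignment.items())] = p_true if assignment[var] else 1.0 - p_true
    return factor

def mentions(factor, var):
    return any(v == var for a in factor for v, _ in a)

def variable_elimination(query, evidence, parents, cpt, order):
    factors = [make_factor(v, evidence, parents, cpt) for v in parents]
    for var in order:
        if var == query or var in evidence:
            continue                                    # only eliminate hidden variables
        related = [f for f in factors if mentions(f, var)]
        joined = related[0]
        for f in related[1:]:
            joined = pointwise_product(joined, f)
        factors = [f for f in factors if all(f is not r for r in related)]
        factors.append(sum_out(var, joined))
    result = factors[0]
    for f in factors[1:]:
        result = pointwise_product(result, f)
    z = sum(result.values())
    return {a: p / z for a, p in result.items()}

# Burglary network; CPT entries give P(var=true | parent values).
parents = {"B": (), "E": (), "A": ("B", "E"), "J": ("A",), "M": ("A",)}
cpt = {
    "B": {(): 0.001}, "E": {(): 0.002},
    "A": {(True, True): 0.95, (True, False): 0.94,
          (False, True): 0.29, (False, False): 0.001},
    "J": {(True,): 0.90, (False,): 0.05},
    "M": {(True,): 0.70, (False,): 0.01},
}
posterior = variable_elimination("B", {"J": True, "M": True}, parents, cpt,
                                 order=["M", "J", "A", "E", "B"])
print(posterior)   # the entry with B=True has probability around 0.284
```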

Example
Query: P(B | j, m). Initial factors: P(B), P(E), P(A|B,E), P(j|A), P(m|A).
Choose A: join P(A|B,E), P(j|A), P(m|A) and sum out A, producing P(j,m|B,E).
Remaining factors: P(B), P(E), P(j,m|B,E).

Example (continued)
Factors: P(B), P(E), P(j,m|B,E).
Choose E: join P(E) and P(j,m|B,E) and sum out E, producing P(j,m|B). Remaining factors: P(B), P(j,m|B).
Finish with B: join P(B) and P(j,m|B), producing P(j,m,B). Normalize to obtain P(B | j,m).

Order matters
Order the terms Z, A, B, C, D:
P(D) = α Σ_{z,a,b,c} P(z) P(a|z) P(b|z) P(c|z) P(D|z) = α Σ_z P(z) Σ_a P(a|z) Σ_b P(b|z) Σ_c P(c|z) P(D|z)
Largest factor has 2 variables (D, Z).
Order the terms A, B, C, D, Z:
P(D) = α Σ_{a,b,c,z} P(a|z) P(b|z) P(c|z) P(D|z) P(z) = α Σ_a Σ_b Σ_c Σ_z P(a|z) P(b|z) P(c|z) P(D|z) P(z)
Largest factor has 4 variables (A, B, C, D).
In general, with n leaves, the factor has size 2^n.

VE: Computational and Space Complexity
The computational and space complexity of variable elimination is determined by the largest factor (and it's space that kills you).
The elimination ordering can greatly affect the size of the largest factor, e.g., 2^n vs. 2 in the previous slide's example.
Does there always exist an ordering that only results in small factors? No!

Worst Case Complexity?
Reduction from SAT. CNF clauses: A ∨ B ∨ C; C ∨ D ∨ A; B ∨ C ∨ D. Build a Bayes net in which each propositional variable is an independent fair coin flip, each clause node is the OR of its literals, and an AND node conjoins the clauses.
P(AND) > 0 iff the clauses are satisfiable ⇒ inference is NP-hard.
P(AND) = S × 0.5^n, where S is the number of satisfying assignments of the clauses ⇒ inference is #P-hard.
(The model could be made simpler; there is no reason for the fan-in constraint.)
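As a sanity check on the counting claim, here is a tiny brute-force sketch (illustrative only) that treats each propositional variable as a fair coin and verifies P(AND) = S × 0.5^n for the three clauses as written on the slide; if the original clauses contained negations lost in transcription, S would change but the identity would not.

```python
# Verify P(AND of clauses) = S * 0.5^n by enumerating all assignments.
import itertools

def clauses_satisfied(a, b, c, d):
    return (a or b or c) and (c or d or a) and (b or c or d)

n = 4
S = sum(clauses_satisfied(*vals)
        for vals in itertools.product([False, True], repeat=n))
print(S, S * 0.5 ** n)   # 12 satisfying assignments, so P(AND) = 12/16 = 0.75
```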

Polytrees
A polytree is a directed graph with no undirected cycles.
For polytrees, the complexity of variable elimination is linear in the network size if you eliminate from the leaves towards the roots.
This is essentially the same theorem as for tree-structured CSPs.

Bayes Nets Part I: Representation Part II: Exact inference Enumeration (always exponential complexity) Variable elimination (worst-case exponential complexity, often better) Inference is NP-hard in general Part III: Approximate Inference Later: Learning Bayes nets from data