Inference in Bayesian Nets

Inference in Bayesian Nets
Objective: calculate the posterior probability of a variable X conditioned on evidence Y, marginalizing over the unobserved variables Z.
Exact methods:
- Enumeration
- Factoring
- Variable elimination
- Factor graphs (read 8.4.2-8.4.4 in Bishop, pp. 398-411)
- Belief propagation
Approximate methods:
- Sampling (read Sec. 14.5)

from: Inference in Bayesian Networks (D’Ambrosio, 1999)

Factors
A factor is a multi-dimensional table, like a CPT.
Example: f_AJM(B,E) is a 2x2 table with a "number" for each combination of B and E:
- specific values of J and M were used (evidence)
- A has been summed out
f(J,A) = P(J|A) is 2x2:
  p(j|a)   p(j|¬a)
  p(¬j|a)  p(¬j|¬a)
f_J(A) = P(j|A) is 1x2: { p(j|a), p(j|¬a) }
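For concreteness, a minimal sketch (not from the slides) of a factor as a table keyed by truth assignments; the class name and the numeric values for P(J|A) are assumptions (the usual alarm-network numbers):

```python
class Factor:
    """A factor: a table mapping assignments of its variables to numbers."""
    def __init__(self, variables, table):
        self.variables = list(variables)   # e.g. ["J", "A"]
        self.table = dict(table)           # e.g. {(True, True): 0.90, ...}

# f(J,A) = P(J|A): a 2x2 table (assumed textbook numbers for the alarm network)
f_JA = Factor(["J", "A"], {
    (True, True): 0.90, (False, True): 0.10,    # p(j|a),  p(~j|a)
    (True, False): 0.05, (False, False): 0.95,  # p(j|~a), p(~j|~a)
})

# f_J(A) = P(j|A): fix J = true, leaving a 1x2 table over A
f_J = Factor(["A"], {(a,): f_JA.table[(True, a)] for a in (True, False)})
print(f_J.table)   # {(True,): 0.9, (False,): 0.05}
```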

Use of factors in variable elimination:
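For reference, a standard instance (assumed here: the textbook alarm-network query P(B | j, m)) written in terms of the factors named above:

$$
P(B \mid j, m) \;=\; \alpha\, f_B(B) \sum_{e} f_E(e) \sum_{a} f_A(a, B, e)\, f_J(a)\, f_M(a)
\;=\; \alpha\, f_B(B) \sum_{e} f_E(e)\, f_{\bar{A}JM}(B, e)
$$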

Pointwise product
Given two factors that share some variables, f1(X1..Xi, Y1..Yj) and f2(Y1..Yj, Z1..Zk), the resulting table has dimensions equal to the union of the variables:
  f1 * f2 = F(X1..Xi, Y1..Yj, Z1..Zk)
Each entry in F corresponds to one truth assignment over the variables and is computed by multiplying the matching entries from f1 and f2.

  A  B  f1(A,B)
  T  T  0.3
  T  F  0.7
  F  T  0.9
  F  F  0.1

  B  C  f2(B,C)
  T  T  0.2
  T  F  0.8
  F  T  0.6
  F  F  0.4

  A  B  C  F(A,B,C)
  T  T  T  0.3 x 0.2
  T  T  F  0.3 x 0.8
  T  F  T  0.7 x 0.6
  T  F  F  0.7 x 0.4
  F  T  T  0.9 x 0.2
  F  T  F  0.9 x 0.8
  F  F  T  0.1 x 0.6
  F  F  F  0.1 x 0.4
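A minimal Python sketch of the pointwise product (an illustration, not the course's code): factors are represented as dicts keyed by tuples of truth values, and the variable ordering of the result is an assumption.

```python
from itertools import product

def pointwise_product(vars1, f1, vars2, f2):
    """Multiply two factors given as (variable list, {assignment tuple: value})."""
    all_vars = list(vars1) + [v for v in vars2 if v not in vars1]
    result = {}
    for assignment in product([True, False], repeat=len(all_vars)):
        row = dict(zip(all_vars, assignment))
        key1 = tuple(row[v] for v in vars1)
        key2 = tuple(row[v] for v in vars2)
        result[assignment] = f1[key1] * f2[key2]
    return all_vars, result

# The two factors from the table above (T=True, F=False)
f1 = {(True, True): 0.3, (True, False): 0.7, (False, True): 0.9, (False, False): 0.1}
f2 = {(True, True): 0.2, (True, False): 0.8, (False, True): 0.6, (False, False): 0.4}

vars_F, F = pointwise_product(["A", "B"], f1, ["B", "C"], f2)
print(vars_F)                   # ['A', 'B', 'C']
print(F[(True, True, True)])    # f1(a,b) * f2(b,c) = 0.3 * 0.2
```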

Factor Graph
A bipartite graph with:
- variable nodes and factor nodes
- one factor node for each factor in the joint probability
- edges connecting each factor node to every variable contained in that factor

[Figure: factor graph for the alarm network. Factor nodes F(B), F(E), F(A,B,E), F(J,A), F(M,A); variable nodes B, E, A, J, M; each factor node is connected to the variables in its argument list.]
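The same bipartite structure as a small data sketch (the dict-of-lists representation is an assumption, not from the slides); the edges follow directly from each factor's argument list:

```python
# Factor graph for the alarm network: each factor node is named by the
# variables it touches; edges connect it to exactly those variable nodes.
factors = {
    "F(B)": ["B"],
    "F(E)": ["E"],
    "F(A,B,E)": ["A", "B", "E"],
    "F(J,A)": ["J", "A"],
    "F(M,A)": ["M", "A"],
}

# Adjacency list for the variable side of the bipartite graph.
variables = {}
for f_name, vs in factors.items():
    for v in vs:
        variables.setdefault(v, []).append(f_name)

print(variables["A"])   # ['F(A,B,E)', 'F(J,A)', 'F(M,A)']
```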

Message passing
- Choose a "root" node, e.g. a variable whose marginal probability you want, such as p(A).
- Assign values to the leaves:
  - leaf variable nodes pass m = 1
  - leaf factor nodes pass their prior: f(X) = p(X)
- Message from a variable node v to a factor node u: the product of the messages from v's other neighboring factors.
- Message from a factor node u to a variable node v: multiply the factor by the incoming messages and sum out u's other neighboring variables w.

- Terminate when the root has received messages from all of its neighbors, or continue to propagate messages all the way back to the leaves (which yields every marginal).
- Final marginal probability of a variable X: the product of the messages from each neighboring factor; each message marginalizes out all the variables in the part of the tree beyond that neighbor.
- Conditioning on evidence: remove the corresponding dimension from the factor (take the sub-table), e.g. F(J,A) -> F_J(A).
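In symbols, the standard sum-product updates (stated here for reference; N(·) denotes the neighbors of a node in the factor graph):

$$
\mu_{v \to u}(x_v) \;=\; \prod_{u' \in N(v) \setminus \{u\}} \mu_{u' \to v}(x_v)
$$

$$
\mu_{u \to v}(x_v) \;=\; \sum_{x_w :\, w \in N(u) \setminus \{v\}} f_u(x_{N(u)}) \prod_{w \in N(u) \setminus \{v\}} \mu_{w \to u}(x_w)
$$

$$
p(x_v) \;\propto\; \prod_{u \in N(v)} \mu_{u \to v}(x_v)
$$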

Belief Propagation
[Figure from http://www.pr-owl.org/basics/bn.php]
See also: Wikipedia and Ch. 8 of Bishop, Pattern Recognition and Machine Learning.

Computational Complexity
- Belief propagation is linear in the size of the BN for polytrees.
- Exact inference is NP-hard for networks whose underlying undirected graph contains cycles ("trees with cycles").

Inexact Inference: Sampling
Generate a (large) set of atomic events (joint variable assignments):
  <e,b,-a,-j,m>
  <e,-b,a,-j,-m>
  <-e,b,a,j,m>
  ...
Answer queries like P(J=t | A=f) by averaging how often events with J=t occur among those satisfying A=f.

Direct sampling
Create an independent atomic event; repeat many times.
For each variable in topological order, choose a value conditioned on its (already sampled) parents:
- sample from P(Cloudy) = <0.5, 0.5>; suppose T
- sample from P(Sprinkler | Cloudy=T) = <0.1, 0.9>; suppose F
- sample from P(Rain | Cloudy=T) = <0.8, 0.2>; suppose T
- sample from P(WetGrass | Sprinkler=F, Rain=T) = <0.9, 0.1>; suppose T
- event: <Cloudy=T, Sprinkler=F, Rain=T, WetGrass=T>
In the limit, each event occurs with frequency proportional to its joint probability:
  P(Cl,Sp,Ra,Wg) = P(Cl) * P(Sp|Cl) * P(Ra|Cl) * P(Wg|Sp,Ra)
Averaging: P(Ra=T, Cl=T) ≈ Num(Ra=T & Cl=T) / |Samples|
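A runnable sketch of direct (prior) sampling on this network. The CPT entries not given on the slide (Sprinkler and Rain for Cloudy=F, and the full WetGrass table) are filled in with the usual textbook values and should be read as assumptions:

```python
import random

# Sprinkler-network CPTs. Entries marked (*) are not on the slide; they are
# the usual textbook numbers, assumed here to make the example runnable.
P_CLOUDY = 0.5
P_SPRINKLER = {True: 0.1, False: 0.5}                    # P(Sp=T | Cloudy); False entry (*)
P_RAIN = {True: 0.8, False: 0.2}                         # P(Ra=T | Cloudy); False entry (*)
P_WETGRASS = {(True, True): 0.99, (True, False): 0.90,   # P(Wg=T | Sp, Ra)  (*)
              (False, True): 0.90, (False, False): 0.0}

def prior_sample():
    """Sample every variable in topological order, conditioned on its parents."""
    cl = random.random() < P_CLOUDY
    sp = random.random() < P_SPRINKLER[cl]
    ra = random.random() < P_RAIN[cl]
    wg = random.random() < P_WETGRASS[(sp, ra)]
    return cl, sp, ra, wg

samples = [prior_sample() for _ in range(100_000)]
# Averaging: P(Ra=T, Cl=T) ~ Num(Ra=T & Cl=T) / |Samples|  (expected 0.5 * 0.8 = 0.4)
print(sum(1 for cl, sp, ra, wg in samples if ra and cl) / len(samples))
```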

Rejection sampling
To condition on evidence variables e, average only over the samples that satisfy e.
Example: for P(j,m | e,b), keep only the samples in which E=T and B=T:
  <e,b,-a,-j,m>      keep
  <e,-b,a,-j,-m>     reject
  <-e,b,a,j,m>       reject
  <-e,-b,-a,-j,m>    reject
  <-e,-b,a,-j,-m>    reject
  <e,b,a,j,m>        keep
  <-e,-b,a,j,-m>     reject
  <e,-b,a,j,m>       reject
  ...
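A minimal sketch of the rejection step (illustrative only; the function name and the sample format, tuples ordered as <E, B, A, J, M>, are assumptions):

```python
def rejection_estimate(samples, query, evidence):
    """Estimate P(query | evidence): discard samples inconsistent with the
    evidence, then take the fraction of the remaining samples where the
    query holds."""
    consistent = [s for s in samples if evidence(s)]
    if not consistent:
        raise ValueError("no samples are consistent with the evidence")
    return sum(1 for s in consistent if query(s)) / len(consistent)

# With samples given as (E, B, A, J, M) tuples of booleans,
# P(J=T, M=T | E=T, B=T) would be estimated as:
#   rejection_estimate(samples,
#                      query=lambda s: s[3] and s[4],
#                      evidence=lambda s: s[0] and s[1])
```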

Likelihood weighting
Sampling can be inefficient if the evidence is rare: for P(j|e), earthquakes occur only 0.2% of the time, so only ~2 of every 1000 samples can be used to estimate the frequency of JohnCalls.
During sample generation, when an evidence variable e_i is reached, force it to its known value and accumulate the weight
  w = ∏_i P(e_i | parents(E_i))
Now every sample is useful ("consistent").
When calculating averages over samples x, weight them:
  P(J|e) = α Σ_{consistent x} w(x) = α < Σ_{x: J=T} w(x), Σ_{x: J=F} w(x) >
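A sketch of likelihood weighting (illustrative only): the alarm-network CPT values are the usual textbook numbers, assumed here, and the query P(B | J=T, M=T) is used instead of the slide's P(j|e) so that the accumulated weights actually vary across samples:

```python
import random

# Assumed textbook CPTs for the burglary/alarm network
# (only P(E)=0.002 appears on the slide).
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A=T | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(J=T | A)
P_M = {True: 0.70, False: 0.01}                      # P(M=T | A)

def weighted_sample():
    """One likelihood-weighted sample with evidence J=T, M=T: non-evidence
    variables are sampled given their already-sampled parents; evidence
    variables are fixed and contribute P(e_i | parents) to the weight."""
    b = random.random() < P_B
    e = random.random() < P_E
    a = random.random() < P_A[(b, e)]
    w = P_J[a] * P_M[a]          # evidence J=T and M=T
    return b, w

samples = [weighted_sample() for _ in range(500_000)]
posterior_b = sum(w for b, w in samples if b) / sum(w for _, w in samples)
print(posterior_b)   # weighted estimate of P(B=T | J=T, M=T); noisy, since B=T is rare
```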

Gibbs sampling (MCMC)
- Start with a random assignment to the variables; set the evidence variables to their observed values.
- Iterate many times:
  - pick a non-evidence variable X
  - take the Markov blanket of X, mb(X): its parents, children, and parents of its children
  - re-sample the value of X from the conditional distribution
      P(X | mb(X)) = α P(X | parents(X)) * ∏_{y ∈ children(X)} P(y | parents(y))
- This generates a long sequence of samples, each of which might "flip one bit" relative to the previous sample.
- In the limit, this converges to the joint probability distribution conditioned on the evidence (each assignment occurs with frequency proportional to its joint probability).
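A runnable Gibbs-sampling sketch on the sprinkler network (illustrative; the WetGrass and Cloudy=F CPT entries are assumed textbook values). For a binary variable, sampling from P(X | mb(X)) is done here by renormalizing the full joint over the two values of X, which is proportional to the Markov-blanket conditional:

```python
import random

# Sprinkler-network CPTs (entries not on the slides are assumed textbook values).
P_C = 0.5
P_S = {True: 0.1, False: 0.5}                        # P(Sp=T | Cloudy)
P_R = {True: 0.8, False: 0.2}                        # P(Ra=T | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.0}     # P(Wg=T | Sp, Ra)

def bern(p, value):
    """P(Var = value) for a Boolean variable with P(T) = p."""
    return p if value else 1.0 - p

def joint(c, s, r, w):
    """Full joint probability of one assignment (chain rule over the network)."""
    return bern(P_C, c) * bern(P_S[c], s) * bern(P_R[c], r) * bern(P_W[(s, r)], w)

def gibbs(n_steps=100_000, s=True, w=True):
    """Gibbs sampling with evidence Sprinkler=s, WetGrass=w clamped;
    returns an estimate of P(Rain=T | evidence)."""
    c, r = random.random() < 0.5, random.random() < 0.5   # random init of non-evidence vars
    rain_count = 0
    for _ in range(n_steps):
        # Resample Cloudy from P(C | everything else) by renormalizing the joint.
        p_t, p_f = joint(True, s, r, w), joint(False, s, r, w)
        c = random.random() < p_t / (p_t + p_f)
        # Resample Rain the same way.
        p_t, p_f = joint(c, s, True, w), joint(c, s, False, w)
        r = random.random() < p_t / (p_t + p_f)
        rain_count += r
    return rain_count / n_steps

print(gibbs())   # estimate of P(Rain=T | Sprinkler=T, WetGrass=T), roughly 0.32 here
```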

Other types of graphical models
- Hidden Markov models
- Gaussian-linear models
- Dynamic Bayesian networks
Learning Bayesian networks
- known topology: parameter estimation from data
- structure learning: finding the topology that best fits the data
Software
- BUGS
- Microsoft