Many-Pairs Mutual Information for Adding Structure to Belief Propagation Approximations Arthur Choi and Adnan Darwiche University of California, Los Angeles.

Presentation transcript:

Many-Pairs Mutual Information for Adding Structure to Belief Propagation Approximations Arthur Choi and Adnan Darwiche University of California, Los Angeles

Many-Pairs Mutual Information [figure: two variables X and Y in a network, with the mutual information between them highlighted]

d-Separation If X and Y are d-separated by Z, then X and Y are independent given Z. [figure: example network with Earthquake (E) and Burglary (B) as parents of Alarm (A), A as parent of Call (C), and E as parent of Radio (R)] Are R and B d-separated by E?

d-Separation Each path is a pipe. Each variable is a valve. A valve is either open or closed. [same example network] Are R and B d-separated by A?

d-Separation [figure: the example network with a sequential valve highlighted, a node W that the path enters and leaves head-to-tail, as in X → W → Y]

d-Separation [figure: the example network with a divergent valve highlighted, a node W with the path leaving it on both sides, as in X ← W → Y]

d-Separation [figure: the example network with a convergent valve highlighted, a node W with the path entering it from both sides, as in X → W ← Y]

d-Separation Are R and B d-separated by E? E is observed, so the divergent valve E is closed. A is unobserved (and so are its descendants), so the convergent valve A is closed. The only path R ← E → A ← B is blocked: R and B are d-separated.

d-Separation Are R and B d-separated by A? E is unobserved, so the divergent valve E is open. A is observed, so the convergent valve A is open. The path R ← E → A ← B is unblocked: R and B are not d-separated.

d-Separation What if E or A are “nearly” closed? Are R and B “nearly” independent?

Mutual Information and Entropy Mutual Information: non-negative; zero iff X and Y are independent given z.

d-Separation versus MI d-Separation: hard outcomes; a graphical test; no inference needed; efficient. Mutual Information: soft outcomes; non-graphical; requires inference (joint marginals on pairs of variables); many-pairs MI is difficult. Soft d-Separation (in polytrees): combines the advantages of d-Separation and MI, a graphical test with soft outcomes.

Mutual Information and Entropy Mutual Information: non-negative; zero iff X and Y are independent given z. Entropy: non-negative; zero iff X is fixed; maximized by the uniform distribution.
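The rendered formulas did not survive the transcript; for reference, the standard definitions of conditional mutual information and conditional entropy used here are

MI(X;Y \mid z) \;=\; \sum_{x,y} \Pr(x,y \mid z)\,\log\frac{\Pr(x,y \mid z)}{\Pr(x \mid z)\,\Pr(y \mid z)},
\qquad
ENT(W \mid z) \;=\; -\sum_{w} \Pr(w \mid z)\,\log \Pr(w \mid z).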

Soft d-Separation in Polytrees Sequential valve (X → W → Y). Theorem 1: MI(X;Y | z) ≤ ENT(W | z)

Soft d-Separation in Polytrees Divergent valve (X ← W → Y). Theorem 1: MI(X;Y | z) ≤ ENT(W | z)
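A brief note on why Theorem 1 is plausible, stated here as a standard argument rather than the paper's proof: for a sequential or divergent valve W on the unique path in a polytree, X and Y are independent given W and z, so the data-processing inequality and the entropy bound on mutual information give

MI(X;Y \mid z) \;\le\; MI(X;W \mid z) \;\le\; ENT(W \mid z).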

Soft d-Separation in Polytrees Convergent valve (X ... N1 → W ← N2 ... Y). Theorem 2: MI(X;Y | z) ≤ MI(N1;N2 | z)

Soft d-Separation in Polytrees [figure: a path X, W1, W2, ..., W6, Y in a polytree] Soft d-separation: sd-sep(X,z,Y) = 0 if X and Y are disconnected; = MI(X;Y|z) if X and Y are adjacent; = the smallest valve bound along the path, otherwise. In all cases, MI(X;Y|z) ≤ sd-sep(X,z,Y).
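Below is a minimal Python sketch of this definition (not code from the paper): it walks the unique path between X and Y in a polytree and returns the smallest valve bound, applying Theorem 1 to sequential and divergent valves and Theorem 2 to convergent valves. The helpers path_between, is_parent, pairwise_mi, and entropy are hypothetical placeholders for quantities available from the network structure and from a single run of belief propagation.

import math

def sd_sep(net, X, z, Y):
    # sd-sep(X, z, Y): 0 if disconnected, exact pairwise MI if adjacent,
    # otherwise the smallest valve bound along the unique X-Y path.
    path = path_between(net, X, Y)        # hypothetical: [X, W1, ..., Wk, Y] or None
    if path is None:
        return 0.0
    if len(path) == 2:
        return pairwise_mi(net, X, Y, z)  # hypothetical: MI(X;Y | z) from BP marginals
    bound = math.inf
    for i in range(1, len(path) - 1):     # internal nodes of the path are the valves
        prev, W, nxt = path[i - 1], path[i], path[i + 1]
        if is_parent(net, prev, W) and is_parent(net, nxt, W):
            # convergent valve: Theorem 2, MI of the two path neighbors (parents of W)
            valve = pairwise_mi(net, prev, nxt, z)
        else:
            # sequential or divergent valve: Theorem 1, entropy of W given evidence z
            valve = entropy(net, W, z)    # hypothetical: ENT(W | z) from node marginals
        bound = min(bound, valve)
    return bound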

d-Separation vs. MI vs. soft d-sep d-Separation: hard outcomes; a graphical test; no inference needed; efficient. MI: soft outcomes; non-graphical; requires inference (joint marginals on pairs of variables); many-pairs MI is difficult. Soft d-sep: soft outcomes; a graphical test; requires inference (only family and node marginals); efficient in polytrees.

Many-Pairs Mutual Information Mutual information can be expensive, even in polytrees. Consider a Bayesian network with n variables, at most w parents per node, and s states per variable; one run of BP takes O(n s^w) time. Single pair, MI: O(s) runs of BP (one per state y of Y, using Pr(X,Y|z) = Pr(X|y,z) Pr(y|z)), i.e., O(s · n s^w) time. Single pair, sd-sep: one run of BP, O(n + n s^w) time. k pairs, MI: O(ks) runs of BP, O(ks · n s^w) time. k pairs, sd-sep: one run of BP, O(kn + n s^w) time.
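As a purely illustrative calculation (the numbers below are assumed, not from the paper): with n = 1000 variables, s = 4 states, at most w = 5 parents, and k = 1000 pairs, one BP run costs on the order of n s^w ≈ 10^6 operations, and

\text{MI ranking:}\quad k s \cdot n s^{w} \;=\; 10^{3}\cdot 4 \cdot 10^{3}\cdot 4^{5} \;\approx\; 4\times 10^{9},
\qquad
\text{sd-sep ranking:}\quad k n + n s^{w} \;=\; 10^{6} + 10^{3}\cdot 4^{5} \;\approx\; 2\times 10^{6},

roughly a three-orders-of-magnitude difference.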

Application: ED-BP networks [CD06] [figure: a spectrum of ED-BP networks ranging from loopy BP marginals to exact inference; recovering edges, guided by mutual information, moves toward exact inference]

Empirical Analysis Soft d-separation versus true MI. Start with a polytree ED-BP approximation (equivalently, run loopy BP). Score the deleted edges by sd-sep and by true MI (efficiency is important here). Recover the highest-ranking edges (approximation accuracy is important here).
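A compact Python sketch of this loop, again not code from the paper: delete_to_polytree, true_mi, recover_edges, and kl_error are hypothetical placeholders, and sd_sep is the sketch given earlier.

def rank_and_recover(net, z, fraction=0.1):
    polytree, deleted = delete_to_polytree(net)       # hypothetical ED-BP polytree base
    scores = {
        "sd-sep": {(u, v): sd_sep(polytree, u, z, v) for (u, v) in deleted},
        "true-MI": {(u, v): true_mi(polytree, u, v, z) for (u, v) in deleted},
    }
    errors = {}
    for method, score in scores.items():
        ranked = sorted(deleted, key=lambda e: score[e], reverse=True)
        k = int(fraction * len(deleted))
        approx = recover_edges(polytree, ranked[:k])   # add back the top-ranked edges
        errors[method] = kl_error(approx, net, z)      # accuracy of the approximation
    return errors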

Empirical Analysis [plot: alarm network; x-axis: edge rank (by true MI); curves compare true-MI scores against sd-sep scores]

Empirical Analysis [plot: alarm network; x-axis: number of edges recovered; y-axis: average KL-error; curves: random, true-MI, sd-sep]

Empirical Analysis [plots: pigs network; edge rank (by true MI) comparing true-MI against sd-sep, and average KL-error versus number of edges recovered for random, true-MI, and sd-sep]

Empirical Analysis
network    method    time at 0% / 10% / 20% edges recovered    rank time
barley     random    115ms / 120ms / 141ms                     0ms
           MI        111ms / 93ms                              2999ms
           sd-sep    110ms / 125ms                             46ms (65.84x faster than MI)
diabetes   random    732ms / 1103ms / 1651ms                   0ms
           MI        550ms / 674ms                             84604ms
           sd-sep    957ms / 1639ms                            132ms (641.99x)
mildew     random    238ms / 241ms / 243ms                     0ms
           MI        233ms / 263ms                             6661ms
           sd-sep    245ms / 323ms                             42ms (157.26x)
munin1     random    13ms / 14ms / 22ms                        0ms
           MI        12ms / 10ms                               680ms
           sd-sep    10ms                                      35ms (19.57x)

Alternative Proposals & Extensions Extensions to general networks: convergent valves are problematic; look at node-disjoint paths. Extensions to undirected models: entropy bounds on nodes; find a separating set with minimum aggregate bound; optimal solution via network flows; easier to generalize, but the bounds are not as tight.
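One way the undirected-model extension could be realized, sketched here with a standard reduction rather than the paper's construction: treat each node's entropy bound as a capacity and find the separating set of minimum aggregate bound as a minimum vertex cut, computed by max-flow after node splitting. In the Python sketch below, entropy_bound is an assumed mapping from each node to ENT(node | z), e.g. read off the node marginals of one BP run.

import networkx as nx

def min_entropy_separator(model_edges, entropy_bound, X, Y):
    # Split each node V into (V, "in") -> (V, "out") with capacity ENT(V | z);
    # cutting that internal edge corresponds to placing V in the separating set.
    G = nx.DiGraph()
    nodes = {v for edge in model_edges for v in edge}
    for v in nodes:
        G.add_edge((v, "in"), (v, "out"), capacity=entropy_bound[v])
    for u, v in model_edges:
        # Model edges must never be cut; edges without a capacity attribute
        # are treated by networkx as having infinite capacity.
        G.add_edge((u, "out"), (v, "in"))
        G.add_edge((v, "out"), (u, "in"))
    # If X and Y are adjacent, no separating set exists (networkx raises
    # NetworkXUnbounded); otherwise the cut value is the minimum aggregate bound.
    bound, (S, T) = nx.minimum_cut(G, (X, "out"), (Y, "in"))
    separator = {v for v in nodes if (v, "in") in S and (v, "out") in T}
    return bound, separator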

Thanks!