Direct Message Passing for Hybrid Bayesian Networks Wei Sun, PhD Assistant Research Professor SFL, C4I Center, SEOR Dept. George Mason University, 2009.

Presentation transcript:

Direct Message Passing for Hybrid Bayesian Networks Wei Sun, PhD Assistant Research Professor SFL, C4I Center, SEOR Dept. George Mason University, 2009

2 Outline
– Inference for hybrid Bayesian networks
– Message passing algorithm
– Direct message passing between discrete and continuous variables
– Gaussian mixture reduction
– Issues

3 Hybrid Bayesian Networks
[Figure: an example hybrid model relating a discrete type variable (Type 1, Class 2, …) to continuous features (speed, frequency, …, location) and discrete features (category, …).]
Both DISCRETE and CONTINUOUS variables are involved in a hybrid model.

4 Hybrid Bayesian Networks – Cont.
The simplest hybrid BN model is the Conditional Linear Gaussian (CLG):
– no discrete child of a continuous parent;
– linear relationships between continuous variables;
– the Clique Tree algorithm provides an exact solution.
General hybrid BNs:
– arbitrary continuous densities and arbitrary functional relationships between continuous variables;
– no exact algorithm in general;
– approximate methods include discretization, simulation, conditional loopy propagation, etc.
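For concreteness, the CLG conditional distribution takes the following standard form (not shown on the slide; the notation is mine): for a continuous node X with continuous parents U and discrete parents D,

\[ p(x \mid u, D = i) = \mathcal{N}\!\left(x;\; b_i + w_i^{\top} u,\; \sigma_i^2\right), \]

i.e., a Gaussian whose mean is a linear function of the continuous parents, with a separate set of coefficients (b_i, w_i, sigma_i^2) for each discrete parent configuration i.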

5 Innovation
Message passing between purely discrete variables or between purely continuous variables is well defined, but exchanging messages between heterogeneous variables remains an open issue. In this paper, we unify the message passing framework to exchange information between arbitrary variables:
– Provides exact solutions for polytree CLG with full density estimation, whereas the Clique Tree algorithm provides only the first two moments; both have the same complexity.
– Integrates the unscented transformation to provide approximate solutions for nonlinear, non-Gaussian models.
– Uses Gaussian mixtures (GM) to represent continuous messages.
– May apply GM reduction techniques to make the algorithm scalable.

6 Why Message Passing Local, distributed, fewer computations.

7 Message Passing in Polytree In a polytree, any node d-separates the sub-network above it from the sub-network below it (a multiply-connected network may not be partitioned into two separate sub-networks by a single node). For a typical node X in a polytree, the evidence can therefore be divided into two exclusive sets, the evidence above X and the evidence below X, and processed separately; the π and λ messages are defined with respect to these two sets, and the belief of node X is the normalized product of its π and λ values.
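The defining equations on this slide are not in the transcript; the standard Pearl definitions they refer to are (with the evidence e split into e+_X above X and e-_X below X, notation assumed here):

\[ \pi(x) = P\big(x \mid e^{+}_{X}\big), \qquad \lambda(x) = P\big(e^{-}_{X} \mid x\big), \qquad \mathrm{BEL}(x) = P(x \mid e) = \alpha\, \pi(x)\, \lambda(x), \]

where α is a normalizing constant.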

8 Message Passing in Polytree – Cont. In the message passing algorithm, each node maintains a λ value and a π value for itself; it also sends λ messages to its parents and π messages to its children. After a finite number of message-passing iterations, every node obtains its belief: for a polytree, MP returns the exact belief; for networks with loops, MP is called loopy propagation, which often gives a good approximation to the posterior distributions.

9 Message Passing in Hybrid Networks For a continuous variable, messages are represented by a Gaussian mixture (GM); each state of a discrete parent introduces a Gaussian component into the continuous message. The unscented transformation is used to compute continuous messages when the functional relationship defined in the CPD (Conditional Probability Distribution) is nonlinear. As messages propagate, the size of the GM increases exponentially; an error-bounded GM reduction technique maintains the scalability of the algorithm.
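Concretely, every continuous message in this scheme is a finite Gaussian mixture (restating the slide; the notation is assumed):

\[ m(x) = \sum_{i=1}^{k} w_i\, \mathcal{N}\big(x;\; \mu_i,\; \Sigma_i\big), \qquad w_i \ge 0, \]

with roughly one component per configuration of the relevant discrete parents; products and propagation of such messages multiply the component counts, which is what the GM reduction step is meant to control.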

10 Direct Passing between Disc. & Cont.
[Figure: a continuous node X with a discrete parent D and a continuous parent U.]
Messages are exchanged directly between discrete and continuous nodes: the λ message sent to the discrete parent is a non-negative constant per state; the λ message sent to the continuous parent is a Gaussian mixture with the discrete π message as mixing prior, built using the inverse of the function defined in the CPD of X; and the π value of X is a Gaussian mixture with the discrete π message as mixing prior, built using the function specified in the CPD of X. The size of the GM increases as messages propagate, so a GM reduction technique is needed to maintain scalability.

11 Complexity Exploding??
[Figure: example network with nodes A, B, T, U, W, X, Y, Z.]

12 Scalability - Gaussian Mixture Reduction
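The reduction itself is only shown graphically on the slides. As an illustrative sketch (not the authors' implementation; all names are mine), the following 1-D Python code performs a common greedy moment-matching reduction: repeatedly merge the pair of components with the lowest merge cost until the mixture is small enough.

    import math

    def merge_two(w1, m1, v1, w2, m2, v2):
        """Moment-preserving merge of two weighted 1-D Gaussian components."""
        w = w1 + w2
        m = (w1 * m1 + w2 * m2) / w
        # Merged variance = weighted within-component variances + spread of the means.
        v = (w1 * (v1 + (m1 - m) ** 2) + w2 * (v2 + (m2 - m) ** 2)) / w
        return w, m, v

    def merge_cost(w1, m1, v1, w2, m2, v2):
        """Dissimilarity of a candidate merge (a Runnalls-style KL upper bound)."""
        w, _, v = merge_two(w1, m1, v1, w2, m2, v2)
        return 0.5 * (w * math.log(v) - w1 * math.log(v1) - w2 * math.log(v2))

    def reduce_gm(components, max_components):
        """Greedily reduce a 1-D GM, given as [(weight, mean, variance), ...]."""
        comps = list(components)
        while len(comps) > max_components:
            # Find the cheapest pair to merge.
            best = None
            for i in range(len(comps)):
                for j in range(i + 1, len(comps)):
                    cost = merge_cost(*comps[i], *comps[j])
                    if best is None or cost < best[0]:
                        best = (cost, i, j)
            _, i, j = best
            merged = merge_two(*comps[i], *comps[j])
            comps = [c for k, c in enumerate(comps) if k not in (i, j)] + [merged]
        return comps

    # Example: reduce a 5-component mixture to 2 components.
    gm = [(0.2, -3.0, 1.0), (0.2, -2.5, 1.2), (0.2, 0.0, 0.5), (0.2, 2.8, 1.0), (0.2, 3.1, 0.9)]
    print(reduce_gm(gm, 2))

Pairwise moment matching keeps the reduced mixture's overall mean and variance exact; error-bounded variants (as in the experiments later in the talk) stop merging before the chosen error measure exceeds a threshold.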

13 Gaussian Mixture Reduction – Cont. Normalized integrated square error = 0.45%
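The error measure quoted here is not defined in the transcript; one common definition of the normalized integrated squared error between the original density f and its reduced approximation g (assumed here) is

\[ \mathrm{NISE}(f, g) = \frac{\int \big(f(x) - g(x)\big)^{2}\, dx}{\int f(x)^{2}\, dx + \int g(x)^{2}\, dx}, \]

which lies in [0, 1] and has a closed form when f and g are Gaussian mixtures, since the integral of a product of two Gaussian densities is itself a Gaussian density evaluated at the difference of the means.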

14 Example – 4-comp. GM approximation to a 20-comp. GM, NISE < 1%

15 Scalability - Error Propagation Approximate messages propagate, and so do their errors. We can bound each individual approximation; however, the total error after propagation is very difficult to estimate. Ongoing research: with each GM reduction bounded by a small error, we aim to show that the total approximation error remains bounded, at least empirically.

16 Numerical Experiments – Polytree CLG Poly12CLG – a polytree BN model; DMP vs. Clique Tree. Both have the same complexity, and both provide exact solutions for a polytree. DMP provides full density estimation, while CT provides only the first two moments for continuous variables.

17 Numerical Experiments – Polytree CLG, with GM Reduction Poly12CLG – a polytree BN model; GM pi value -> single Gaussian approx.

18 Numerical Experiments – Polytree CLG, with GM Reduction Poly12CLG – a polytree BN model; GM lambda message -> single Gaussian approx.

19 Numerical Experiments – Polytree CLG, with GM Reduction Poly12CLG – a polytree BN model; GM pi and lambda messages -> single Gaussian approx.

20 Reduce GM under Bounded Error When each GM reduction is bounded by an error of less than 5%, the inference performance improves significantly.

21 Numerical Experiments – Network with Loops Loop13CLG – a BN model with loops. Errors range from 1% to 5% due to loopy propagation.

22 Empirical Insights Combining π components does not affect the network ‘above’; combining λ components does not affect the network ‘below’. Approximation errors due to GM reduction diminish for discrete nodes farther away from the discrete parent nodes. Loopy propagation usually provides accurate estimates.

23 Summary & Future Research
DMP provides an alternative algorithm for efficient inference in hybrid BNs:
– Exact for polytree models
– Full density estimation
– Same complexity as Clique Tree
– Scalable, trading off accuracy against computational complexity
– Distributed algorithm, local computations only


25 [Figure: example network with nodes A1, A2, A3, …, An, Y1, Y2, Y3, …, Yn, T, and E.]

26 Pi Value of a Cont. Node with both Disc. & Cont. Parents
[Figure: continuous node X with discrete parent D and continuous parent U.]
The π value of a continuous node is essentially a distribution transformed by the function defined in the node's CPD, with the π messages sent from all of its parents as input distributions. With both discrete and continuous parents, the π value of the continuous node can be represented by a Gaussian mixture, with the discrete π message as mixing prior and the function specified in the CPD of X.
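In Pearl-style notation (symbols assumed, matching the slide's description), for a continuous node X with discrete parent D and continuous parent U:

\[ \pi(x) = \sum_{i} \pi_X(D = i) \int p\big(x \mid u, D = i\big)\, \pi_X(u)\, du, \]

a Gaussian mixture whose mixing prior comes from the discrete π message and whose components come from pushing π_X(u) through the function in the CPD of X.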

27 Lambda Value of a Cont. Node
The λ value of a continuous node is the product of all λ messages sent from its children. A λ message sent to a continuous node is necessarily a continuous message in the form of a Gaussian mixture, because only continuous children are allowed for a continuous node. The product of Gaussian mixtures is again a Gaussian mixture, with exponentially increased size.
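The corresponding formula, plus the standard identity that explains the growth (notation assumed): the product of two Gaussian densities is an unnormalized Gaussian, so multiplying mixtures with k1 and k2 components yields k1·k2 components:

\[ \lambda(x) = \prod_{j} \lambda_{Y_j}(x), \qquad \mathcal{N}\big(x; \mu_1, \sigma_1^2\big)\, \mathcal{N}\big(x; \mu_2, \sigma_2^2\big) = c \cdot \mathcal{N}\!\left(x;\; \frac{\sigma_2^2 \mu_1 + \sigma_1^2 \mu_2}{\sigma_1^2 + \sigma_2^2},\; \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2}\right), \quad c = \mathcal{N}\big(\mu_1; \mu_2, \sigma_1^2 + \sigma_2^2\big). \]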

28 Pi Message Sent to a Cont. Node from a Disc. Parent
The π message sent to a continuous node X from its discrete parent is the product of the π value of the discrete parent and all λ messages sent to that parent from its children other than X. A λ message sent to a discrete node from a child is always a discrete vector, and the π value of a discrete node is always a discrete distribution; hence the π message sent to a continuous node from its discrete parent is a discrete vector, representing the discrete parent's state probabilities.
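In equation form (standard Pearl message, symbols assumed):

\[ \pi_X(D = i) \;\propto\; \pi(D = i) \prod_{Y_k \in \mathrm{ch}(D) \setminus \{X\}} \lambda_{Y_k}(D = i), \]

a discrete vector over the states of D.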

29 Pi Message Sent to a Cont. Node from a Cont. Parent
The π message sent to a continuous node X from its continuous parent is the product of the π value of the continuous parent and all λ messages sent to that parent from its children other than X. A λ message sent to a continuous node from a child is always a continuous message represented by a GM, and the π value of a continuous node is always a continuous distribution, also represented by a GM; hence the π message sent to a continuous node from its continuous parent is a continuous message, represented by a GM.
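The analogous message from the continuous parent U (same assumed notation):

\[ \pi_X(u) \;\propto\; \pi(u) \prod_{Y_k \in \mathrm{ch}(U) \setminus \{X\}} \lambda_{Y_k}(u), \]

a product of Gaussian mixtures and therefore itself a Gaussian mixture.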

30 Lambda Message Sent to a Disc. Parent from a Cont. Node
Given each state of the discrete parent, a function is defined between the continuous node and its continuous parent. For each state of the discrete parent, the λ message sent from the continuous node is an integration of two continuous distributions (both represented by GMs), resulting in a non-negative constant.
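Written out (Pearl-style, symbols assumed), the message component for state i is the constant

\[ \lambda_X(D = i) = \int\!\!\int \lambda(x)\, p\big(x \mid u, D = i\big)\, \pi_X(u)\, du\, dx, \]

the integral of the product of two continuous distributions, as the slide says; for linear Gaussian CPDs and GM messages it evaluates in closed form.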

31 Lambda Message Sent to a Cont. Parent from a Cont. Node
The λ message sent from a continuous node to its continuous parent is a Gaussian mixture that uses the π message sent to the node from its discrete parent as the mixing prior; that π message is a discrete vector, which serves as the mixing prior.
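In the same assumed notation:

\[ \lambda_X(u) = \sum_{i} \pi_X(D = i) \int \lambda(x)\, p\big(x \mid u, D = i\big)\, dx, \]

a mixture over the states of D, with the discrete π message supplying the mixing prior.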

32 Unscented Transformation
The unscented transformation (UT) is a deterministic sampling method:
– UT approximates the first two moments of a continuous random variable transformed via an arbitrary nonlinear function.
– UT is based on the principle that it is easier to approximate a probability distribution than a nonlinear function.
A set of 2n+1 deterministic sample points (sigma points) is chosen and propagated through the original function, where n is the dimension of X and κ is a scaling parameter. UT keeps the original function unchanged, and the results are exact for linear functions.
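The sigma-point formula referenced on the slide is not in the transcript; the standard Julier–Uhlmann formulation consistent with the description (n the dimension of X, κ the scaling parameter) is:

\[ \chi_0 = \bar{x}, \qquad \chi_i = \bar{x} + \Big(\sqrt{(n+\kappa)\, P_x}\Big)_i, \qquad \chi_{i+n} = \bar{x} - \Big(\sqrt{(n+\kappa)\, P_x}\Big)_i, \quad i = 1, \dots, n, \]
\[ W_0 = \frac{\kappa}{n+\kappa}, \qquad W_i = W_{i+n} = \frac{1}{2(n+\kappa)}, \]

where P_x is the covariance of X and (·)_i denotes the i-th column of the matrix square root; each sigma point is propagated as y_i = g(χ_i), and the weighted propagated points give the transformed mean and covariance.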

33 Why Message Passing
– Local
– Distributed
– Fewer computations