Two Approximate Algorithms for Belief Updating: Mini-Clustering (MC) and Iterative Join-Graph Propagation (IJGP)


Two Approximate Algorithms for Belief Updating. Mini-Clustering (MC): Robert Mateescu, Rina Dechter, Kalev Kask, "Tree Approximation for Belief Updating", AAAI-2002. Iterative Join-Graph Propagation (IJGP): Rina Dechter, Kalev Kask, Robert Mateescu, "Iterative Join-Graph Propagation", UAI-2002.

What is Mini-Clustering? Mini-Clustering (MC) is an approximate algorithm for belief updating in Bayesian networks. MC is an anytime version of join-tree clustering. MC applies message passing along a cluster tree. The complexity of MC is controlled by a user-adjustable parameter, the i-bound. Empirical evaluation shows that MC is a very effective algorithm, in many cases superior to other approximate schemes (IBP, Gibbs sampling).

Motivation. Probabilistic reasoning using belief networks is known to be NP-hard. Nevertheless, approximate inference can be a powerful tool for decision making under uncertainty. We propose an anytime version of Cluster Tree Elimination.

Outline. Preliminaries: belief networks, tree decompositions. The Tree Clustering algorithm. The Mini-Clustering algorithm. Experimental results.

Belief networks. The belief updating problem is the task of computing the posterior probability P(Y|e) of query nodes Y ⊆ X given evidence e. We focus on the basic case where Y is a single variable X_i. (Example network over variables A, B, C, D, E, F, G shown in the figure.)

Tree decompositions

Example: a belief network over variables A-G and a corresponding tree decomposition with clusters {A,B,C}: p(a), p(b|a), p(c|a,b); {B,C,D,F}: p(d|b), p(f|c,d); {B,E,F}: p(e|b,f); {E,F,G}: p(g|e,f); and separators BC, BF, EF (figure: belief network and tree decomposition).

Example: Join-tree. The same decomposition drawn as a join-tree: clusters {A,B,C}, {B,C,D,F}, {B,E,F}, {E,F,G} with separators BC, BF, EF and the same placement of functions as above (figure).

Cluster Tree Elimination. Cluster Tree Elimination (CTE) is an exact algorithm that works by passing messages along a tree decomposition. Basic idea: each node sends only one message to each of its neighbors; node u sends a message to its neighbor v only when u has received messages from all its other neighbors. Previous work on tree clustering: Lauritzen, Spiegelhalter - '88 (probabilities); Jensen, Lauritzen, Olesen - '90 (probabilities); Shenoy, Shafer - '90, Shenoy - '97 (general); Dechter, Pearl - '89 (constraints); Gottlob, Leone, Scarcello - '00 (constraints).
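
The sending rule above fully determines the order in which messages can be computed. Below is a minimal Python sketch of that scheduling rule only (my own illustration, not the authors' code: the function name cte_schedule and the edge-list input are assumptions, and the actual sum-product computation of each message h(u,v) is omitted):

from collections import defaultdict, deque

def cte_schedule(edges):
    # edges: undirected edges of the tree decomposition, e.g. [(1, 2), (2, 3), (3, 4)].
    # Returns a list of directed messages (u, v) such that u sends to v only
    # after u has received messages from all of its neighbors other than v.
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    received = defaultdict(set)   # received[u] = neighbors that have already sent to u
    sent = set()                  # directed messages already scheduled
    order = []
    # Leaf clusters can send their single message immediately.
    queue = deque((u, v) for u in nbrs if len(nbrs[u]) == 1 for v in nbrs[u])
    while queue:
        u, v = queue.popleft()
        if (u, v) in sent:
            continue
        if nbrs[u] - {v} <= received[u]:   # all other neighbors have reported to u
            order.append((u, v))
            sent.add((u, v))
            received[v].add(u)
            # v may now be able to send to its other neighbors; re-queue those messages.
            for w in nbrs[v]:
                if w != u and (v, w) not in sent:
                    queue.append((v, w))
    return order

# The chain 1(ABC) - 2(BCDF) - 3(BEF) - 4(EFG) from the running example:
print(cte_schedule([(1, 2), (2, 3), (3, 4)]))
# one valid order: [(1, 2), (4, 3), (2, 3), (3, 2), (3, 4), (2, 1)]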

Belief Propagation: a node u with neighbors x_1, x_2, ..., x_n computes and sends the message h(u,v) to its neighbor v (figure).

Cluster Tree Elimination - example: messages passed along the tree decomposition with clusters ABC, BCDF, BEF, EFG and separators BC, BF, EF (figure).

Cluster Tree Elimination - the messages. Cluster 1 = {A,B,C}: p(a), p(b|a), p(c|a,b). Cluster 2 = {B,C,D,F}: p(d|b), p(f|c,d), plus incoming message h_(1,2)(b,c). Cluster 3 = {B,E,F}: p(e|b,f), plus incoming message h_(2,3)(b,f). Cluster 4 = {E,F,G}: p(g|e,f). Separators: BC, BF, EF. For the edge (2,3): sep(2,3)={B,F}, elim(2,3)={C,D}.
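
Written out for the edge (2,3) above, the exact CTE message sums the product of cluster 2's functions and its other incoming message over the eliminator elim(2,3) = {C,D} (a reconstruction consistent with the cluster contents listed on the slide):

h_{(2,3)}(b,f) = \sum_{c,d} p(d \mid b) \, p(f \mid c,d) \, h_{(1,2)}(b,c)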

Cluster Tree Elimination - properties. Correctness and completeness: algorithm CTE is correct, i.e., it computes the exact joint probability of a single variable and the evidence. Time complexity: O(deg · (n+N) · d^(w*+1)). Space complexity: O(N · d^sep), where deg = the maximum degree of a node, n = number of variables (= number of CPTs), N = number of nodes in the tree decomposition, d = the maximum domain size of a variable, w* = the induced width, and sep = the maximum separator size.

Mini-Clustering - motivation. The time and space complexity of Cluster Tree Elimination is exponential in the induced width w* of the problem. When the induced width w* is large, the CTE algorithm becomes infeasible.

Mini-Clustering - the basic idea. Try to reduce the size of the cluster (the exponent): partition each cluster into mini-clusters with fewer variables. Accuracy parameter i = maximum number of variables in a mini-cluster. The same idea was explored for variable elimination (Mini-Buckets).

Mini-Clustering. Suppose cluster(u) is partitioned into p mini-clusters mc(1), ..., mc(p), each containing at most i variables. CTE computes the exact message h_{(u,v)} = \sum_{elim(u,v)} \prod_{k=1}^{p} \prod_{f \in mc(k)} f. We want to process each \prod_{f \in mc(k)} f separately.

Approximate each \prod_{f \in mc(k)} f, for k = 2, ..., p, and take it outside the summation. How to process the mini-clusters to obtain approximations or bounds: processing all mini-clusters by summation gives an upper bound on the joint probability; a tighter upper bound: process one mini-cluster by summation and the others by maximization; we can also use the mean operator (average), which gives an approximation of the joint probability.
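
To see why these give upper bounds, note that for nonnegative functions (my own spelled-out version of the mini-bucket argument; this inequality is not printed on the slide):

\sum_{elim(u,v)} \prod_{k=1}^{p} \prod_{f \in mc(k)} f \;\le\; \Big( \sum_{elim(u,v)} \prod_{f \in mc(1)} f \Big) \cdot \prod_{k=2}^{p} \Big( \max_{elim(u,v)} \prod_{f \in mc(k)} f \Big)

Processing every mini-cluster by summation gives the weaker bound obtained by replacing each max with a sum; replacing max with the mean operator yields an approximation rather than a bound.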

Idea of Mini-Clustering: split a cluster into mini-clusters => bound the complexity (figure).
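
A tiny numeric sanity check of this idea (my own illustration with random tables, not from the slides):

import numpy as np

# Two nonnegative "mini-cluster" tables over the same two variables (domain size 4 each).
rng = np.random.default_rng(0)
g1 = rng.random((4, 4))
g2 = rng.random((4, 4))

exact = np.sum(g1 * g2)            # what the full cluster (CTE) would compute
upper = np.sum(g1) * np.max(g2)    # mini-clusters: one processed by sum, one by max -> upper bound
approx = np.sum(g1) * np.mean(g2)  # mean operator -> approximation, not a bound

print(exact <= upper)              # True for any nonnegative tables
print(exact, upper, approx)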

Mini-Clustering - example: the same tree decomposition (clusters ABC, BCDF, BEF, EFG; separators BC, BF, EF), with clusters partitioned into mini-clusters (figure).

Mini-Clustering - the messages, i=3. Cluster 1 = {A,B,C}: p(a), p(b|a), p(c|a,b). Cluster 2 is split into mini-clusters {B,C,D}: p(d|b), h_(1,2)(b,c) and {C,D,F}: p(f|c,d). Cluster 3 = {B,E,F}: p(e|b,f), plus incoming messages h^1_(2,3)(b), h^2_(2,3)(f). Cluster 4 = {E,F,G}: p(g|e,f). Separators: BC, BF, EF. For the edge (2,3): sep(2,3)={B,F}, elim(2,3)={C,D}.
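
For this partition, the pair of mini-cluster messages that replaces the exact h_(2,3)(b,f) would be computed as follows (a reconstruction, assuming the first mini-cluster is processed by summation and the second by maximization, as in the upper-bound scheme above):

h^1_{(2,3)}(b) = \sum_{c,d} p(d \mid b) \, h_{(1,2)}(b,c)
h^2_{(2,3)}(f) = \max_{c,d} p(f \mid c,d)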

Cluster Tree Elimination vs. Mini-Clustering: the two message-passing schemes shown side by side on the same tree decomposition (clusters ABC, BCDF, BEF, EFG; separators BC, BF, EF) (figure).

Mini-Clustering. Correctness and completeness: algorithm MC(i) computes a bound (or an approximation) on the joint probability P(X_i, e) for each variable and each of its values. Time and space complexity: O(n · hw* · d^i), where hw* = max_u |{ f | scope(f) ∩ cluster(u) ≠ ∅ }|, the maximum number of functions whose scope intersects any single cluster.

Normalization. Algorithms for the belief updating problem compute, in general, the joint probability P(X_i, e). Computing the conditional probability P(X_i | e) = P(X_i, e) / P(e): is easy to do if exact algorithms can be applied; becomes an important issue for approximate algorithms.

Normalization. MC can compute an (upper) bound on the joint probability P(X_i, e). Deriving a bound on the conditional P(X_i | e) is not easy when the exact P(e) is not available. If a lower bound on P(e) were available, we could use the ratio of the upper bound on P(X_i, e) to the lower bound on P(e) as an upper bound on the posterior. In our experiments we normalized the results and regarded them as approximations of the posterior P(X_i | e).

Experimental results. We tested MC with the max and mean operators. Algorithms: Exact, IBP, Gibbs sampling (GS), MC with normalization (approximate). Networks (all variables are binary): coding networks; CPCS 54, 360, 422; grid networks (MxM); random noisy-OR networks; random networks.

Experimental results. Measures: Normalized Hamming Distance - pick the most likely value (for exact and for approximate), take the ratio between the number of disagreements and the total number of variables, and average over problems; BER (Bit Error Rate) - for coding networks; Absolute error - difference between exact and approximate, averaged over all values, all variables, all problems; Relative error - difference between exact and approximate, divided by the exact, averaged over all values, all variables, all problems; Time.
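
As an illustration of the first measure, here is a small Python sketch (the helper name normalized_hamming_distance and the array layout are my assumptions, not code from the paper; averaging over problems would be done outside this function):

import numpy as np

def normalized_hamming_distance(exact_marginals, approx_marginals):
    # Inputs: arrays of shape (n_vars, domain_size) holding per-variable marginals.
    # Pick the most likely value under the exact and the approximate marginals,
    # then return the fraction of variables on which the two picks disagree.
    exact_pick = np.argmax(exact_marginals, axis=1)
    approx_pick = np.argmax(approx_marginals, axis=1)
    return np.mean(exact_pick != approx_pick)

# Three binary variables; the two methods disagree only on the second one.
exact = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])
approx = np.array([[0.7, 0.3], [0.55, 0.45], [0.1, 0.9]])
print(normalized_hamming_distance(exact, approx))   # 0.333...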

Random networks - Absolute error (evidence=0 and evidence=10).

Coding networks - Bit Error Rate (sigma=0.22 and sigma=0.51).

Noisy-OR networks - Absolute error (evidence=10 and evidence=20).

CPCS422 - Absolute error (evidence=0 and evidence=10).

Grid 15x15 networks - results for several amounts of evidence (three result figures).

Coding Networks 1: N=100, P=3, w*=7.

Coding Networks 2: N=100, P=4, w*=11.

CPCS54: w*=15.

Noisy-OR Networks 1: N=50, P=2, w*=10.

Noisy-OR Networks 2: N=50, P=3, w*=16.

Random Networks 1: N=50, P=2, w*=10.

Random Networks 2: N=50, P=3, w*=16.

Conclusion. MC extends the partition-based approximation from mini-buckets to general tree decompositions for the problem of belief updating. Empirical evaluation demonstrates its effectiveness and superiority (for certain types of problems, with respect to the measures considered) relative to other existing algorithms.