Statistical Methods in AI/ML Bucket elimination Vibhav Gogate.



Bucket Elimination: Initialization
Variables: A, B, C, D, E, F
Functions: φ(A,C), φ(C,E), φ(D,F), φ(B,D), φ(C,D), φ(A,B), φ(E,F)
Order: A, E, D, F, B, C
You put each function in exactly one bucket
How? Along the order, find the first bucket such that one of the variables in the function's scope is the bucket variable
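The initialization rule can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function name `init_buckets` and the representation of a function by its scope tuple are assumptions made for the example.

```python
def init_buckets(order, scopes):
    """Place each function (given by its scope) in the bucket of the
    variable from its scope that comes first along `order`."""
    position = {var: i for i, var in enumerate(order)}
    buckets = {var: [] for var in order}
    for scope in scopes:
        first = min(scope, key=position.__getitem__)
        buckets[first].append(scope)
    return buckets


# Order and function scopes taken from the slide.
order = ["A", "E", "D", "F", "B", "C"]
scopes = [("A", "C"), ("C", "E"), ("D", "F"), ("B", "D"),
          ("C", "D"), ("A", "B"), ("E", "F")]

buckets = init_buckets(order, scopes)
for var in order:
    print(var, buckets[var])
```

With this order, buckets A, E and D receive all seven functions and buckets F, B and C start out empty.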

Bucket elimination: Processing Buckets
Process in order
– Multiply all the functions in the bucket
– Sum-out the bucket variable
– Put the new function in one of the buckets obeying the initialization constraint
Buckets along the order A, E, D, F, B, C:
Bucket A: φ(A,C), φ(A,B) → ψ(B,C)
Bucket E: φ(C,E), φ(E,F) → ψ(C,F)
Bucket D: φ(D,F), φ(B,D), φ(C,D) → ψ(B,C,F)
Bucket F: ψ(C,F), ψ(B,C,F) → ψ₂(B,C)
Bucket B: ψ(B,C), ψ₂(B,C) → ψ(C)
Bucket C: ψ(C) → Z
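The processing loop can be sketched as follows. This is a minimal, unoptimized illustration under stated assumptions: all variables are binary, a factor is a `(scope, table)` pair, and the helper names `make_factor`, `multiply_and_sum_out` and `bucket_elimination` as well as the example potentials are hypothetical, not part of the slides.

```python
from itertools import product


def make_factor(scope, fn, domain=(0, 1)):
    """Tabulate fn over every assignment to the variables in scope."""
    table = {vals: fn(*vals) for vals in product(domain, repeat=len(scope))}
    return tuple(scope), table


def multiply_and_sum_out(factors, var, domain=(0, 1)):
    """Multiply all factors of one bucket, then sum out the bucket variable."""
    joint = sorted({v for scope, _ in factors for v in scope})
    out_scope = tuple(v for v in joint if v != var)
    table = {}
    for vals in product(domain, repeat=len(joint)):
        assign = dict(zip(joint, vals))
        weight = 1.0
        for scope, tab in factors:
            weight *= tab[tuple(assign[v] for v in scope)]
        key = tuple(assign[v] for v in out_scope)
        table[key] = table.get(key, 0.0) + weight
    return out_scope, table


def bucket_elimination(order, factors, domain=(0, 1)):
    """Return Z, the sum over all assignments of the product of factors."""
    position = {v: i for i, v in enumerate(order)}
    buckets = {v: [] for v in order}
    for factor in factors:
        buckets[min(factor[0], key=position.__getitem__)].append(factor)
    z = 1.0
    for var in order:
        if not buckets[var]:
            z *= len(domain)  # variable appears in no function
            continue
        scope, table = multiply_and_sum_out(buckets[var], var, domain)
        if scope:  # new function goes to the first bucket in its scope
            buckets[min(scope, key=position.__getitem__)].append((scope, table))
        else:      # constant: fold it into Z
            z *= table[()]
    return z


order = ["A", "E", "D", "F", "B", "C"]
scopes = [("A", "C"), ("C", "E"), ("D", "F"), ("B", "D"),
          ("C", "D"), ("A", "B"), ("E", "F")]
# Hypothetical potentials: 2 if the two variables agree, 1 otherwise.
factors = [make_factor(s, lambda x, y: 2.0 if x == y else 1.0) for s in scopes]
z = bucket_elimination(order, factors)
print(z)
```

The dictionary-based tables keep the sketch short; a practical implementation would store factors as arrays.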

Bucket elimination: Why it works?
[Figure, repeated over five animation steps: the buckets along the order A, E, D, F, B, C, holding φ(A,C), φ(A,B), φ(C,E), φ(E,F), φ(D,F), φ(B,D), φ(C,D) and the generated functions ψ(B,C), ψ(C,F), ψ(B,C,F), ψ₂(B,C), ψ(C), ending in Z, and so on.]
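One way to see why the procedure is correct, written as a sketch for the order A, E, D, F, B, C: each bucket pushes one sum inward past the functions that do not mention the bucket variable. The grouping below is reconstructed from the bucket contents listed on the slides; the underbrace labels reuse the slides' names for the generated functions.

```latex
\begin{align*}
Z &= \sum_{A,B,C,D,E,F} \phi(A,C)\,\phi(A,B)\,\phi(C,E)\,\phi(E,F)\,\phi(D,F)\,\phi(B,D)\,\phi(C,D) \\
  &= \sum_{B,C,D,E,F} \phi(C,E)\,\phi(E,F)\,\phi(D,F)\,\phi(B,D)\,\phi(C,D)\,
     \underbrace{\sum_{A} \phi(A,C)\,\phi(A,B)}_{\psi(B,C)} \\
  &= \sum_{B,C,D,F} \psi(B,C)\,\phi(D,F)\,\phi(B,D)\,\phi(C,D)\,
     \underbrace{\sum_{E} \phi(C,E)\,\phi(E,F)}_{\psi(C,F)} \\
  &= \sum_{B,C,F} \psi(B,C)\,\psi(C,F)\,
     \underbrace{\sum_{D} \phi(D,F)\,\phi(B,D)\,\phi(C,D)}_{\psi(B,C,F)} \\
  &= \sum_{B,C} \psi(B,C)\,
     \underbrace{\sum_{F} \psi(C,F)\,\psi(B,C,F)}_{\psi_2(B,C)} \\
  &= \sum_{C} \underbrace{\sum_{B} \psi(B,C)\,\psi_2(B,C)}_{\psi(C)}
   = \sum_{C} \psi(C).
\end{align*}
```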

Bucket elimination: Complexity
Cost per bucket along the order A, E, D, F, B, C: exp(3), exp(3), exp(4), exp(3), exp(2), exp(1); total ≈ 6 exp(3)
Complexity: O(n exp(w))
– w: scope size of the largest function generated
– n: number of variables

Bucket elimination: Determining complexity graphically
Schematic operation on a graph
– Process nodes in order
– Connect all children of a node (its neighbors that have not been processed yet) to each other
[Figure: interaction graph over the nodes A, B, C, D, E, F]

Bucket elimination: Complexity
Complexity of processing a bucket “i”
– exp(children_i)
Complexity of bucket elimination
– n exp(max(children_i))
[Figure: graph over the nodes A, B, C, D, E, F]
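The schematic operation can be simulated without any probability tables. The sketch below assumes the interaction graph has one edge per function scope from the running example and reads "children" as the neighbors still present when a node is processed; the function name `schematic_elimination` is hypothetical.

```python
def schematic_elimination(edges, order):
    """Process nodes in order, connecting the children of each node.

    Returns a dict mapping each node to its children at processing time,
    and the list of fill-in edges that were added."""
    adj = {v: set() for v in order}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    children, fill = {}, []
    for node in order:
        nbrs = sorted(adj[node])
        children[node] = set(nbrs)
        # Connect every pair of children that is not already connected.
        for i, u in enumerate(nbrs):
            for v in nbrs[i + 1:]:
                if v not in adj[u]:
                    adj[u].add(v)
                    adj[v].add(u)
                    fill.append((u, v))
        # Remove the processed node from the graph.
        for u in nbrs:
            adj[u].discard(node)
        del adj[node]
    return children, fill


edges = [("A", "C"), ("C", "E"), ("D", "F"), ("B", "D"),
         ("C", "D"), ("A", "B"), ("E", "F")]
order = ["A", "E", "D", "F", "B", "C"]
children, fill = schematic_elimination(edges, order)

print([len(children[v]) for v in order])        # -> [2, 2, 3, 2, 1, 0]
print(max(len(c) for c in children.values()))   # -> 3
print(fill)  # -> [('B', 'C'), ('C', 'F'), ('B', 'F')]
```

Each bucket's table ranges over the node plus its children, so the largest bucket here involves four variables (D with B, C, F).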

Treewidth and Tree Decompositions
Running schematic bucket elimination yields a chordal graph
– Each cycle of length > 3 has a chord (an edge connecting two nodes that are not adjacent in the cycle)
Every chordal graph can be represented using a tree decomposition

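Tree Decomposition of Chordal graphs
[Figure: the chordal graph over A, B, C, D, E, F and its tree decomposition, with clusters ABC, EFC, DBCF, FBC, BC, C and separators FC, FBC, BC, C]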

Tree Decomposition and Treewidth: Definition
Given a network and its interaction graph, a tree decomposition is a set of subsets of variables connected by a tree such that:
– Each variable is present in at least one subset
– Each edge is present in at least one subset
– The subsets containing a variable “X” form a connected sub-tree (running intersection property)
Width of a tree decomposition: cardinality of the maximum subset minus 1
Treewidth: minimum width out of all possible tree decompositions
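The three conditions and the width can be checked mechanically. The sketch below uses the clusters of the running example; the tree edges are an assumption (they follow the order in which bucket elimination passes functions between buckets), and the names `is_connected`, `check_tree_decomposition` and `width` are hypothetical.

```python
def is_connected(nodes, edges):
    """True if `nodes` form one connected piece using only `edges` among them."""
    nodes = set(nodes)
    if not nodes:
        return True
    adj = {n: set() for n in nodes}
    for a, b in edges:
        if a in nodes and b in nodes:
            adj[a].add(b)
            adj[b].add(a)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for nb in adj[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return seen == nodes


def check_tree_decomposition(clusters, tree_edges, variables, graph_edges):
    is_tree = (len(tree_edges) == len(clusters) - 1
               and is_connected(clusters, tree_edges))
    covers_vars = all(any(v in c for c in clusters.values()) for v in variables)
    covers_edges = all(any(u in c and v in c for c in clusters.values())
                       for u, v in graph_edges)
    # Running intersection: clusters containing v form a connected sub-tree.
    running = all(
        is_connected([k for k, c in clusters.items() if v in c], tree_edges)
        for v in variables)
    return is_tree and covers_vars and covers_edges and running


def width(clusters):
    return max(len(c) for c in clusters.values()) - 1


variables = list("ABCDEF")
graph_edges = [("A", "C"), ("C", "E"), ("D", "F"), ("B", "D"),
               ("C", "D"), ("A", "B"), ("E", "F")]
clusters = {name: set(name) for name in ["ABC", "EFC", "DBCF", "FBC", "BC", "C"]}
# Assumed tree structure; the slide's figure is not recoverable from the text.
tree_edges = [("ABC", "BC"), ("EFC", "FBC"), ("DBCF", "FBC"),
              ("FBC", "BC"), ("BC", "C")]

print(check_tree_decomposition(clusters, tree_edges, variables, graph_edges))  # -> True
print(width(clusters))  # -> 3
```

Attaching EFC and DBCF directly to BC instead of FBC would break the third condition, because the clusters containing F would no longer be connected.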

Bucket elimination: Complexity
Best possible complexity: O(n exp(w+1)), where w is the treewidth of the graph
Thus, we have a graph-based algorithm for determining the complexity of bucket elimination
If w is small, we can solve the problem efficiently!

Generating Tree Decompositions
Computing treewidth is NP-hard
Branch and Bound algorithm (Gogate & Dechter, 2004)
Best-first search algorithm (Dow and Korf, 2009)
Heuristics in practice
– min-fill heuristic
– min-degree heuristic

Min-degree and min-fill
min-degree
– At each point, select a variable with minimum degree (ties broken arbitrarily)
– Connect the children of the variable to each other
min-fill
– At each point, select a variable that adds the minimum number of edges to the current graph
– Connect the children of the selected variable to each other
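Both heuristics fit the same greedy loop and differ only in the score used to pick the next variable. The sketch below is an illustration under assumptions: ties are broken alphabetically (the slides allow arbitrary tie-breaking), the graph is the running example, and the name `elimination_order` is hypothetical.

```python
def elimination_order(edges, heuristic="min-fill"):
    """Greedy elimination ordering; returns (order, width of that order)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def fill_count(node):
        # Number of edges that eliminating `node` would add.
        nbrs = sorted(adj[node])
        return sum(1 for i, u in enumerate(nbrs)
                   for v in nbrs[i + 1:] if v not in adj[u])

    def cost(node):
        return len(adj[node]) if heuristic == "min-degree" else fill_count(node)

    order, width = [], 0
    while adj:
        node = min(sorted(adj), key=cost)  # alphabetical tie-breaking
        nbrs = adj.pop(node)
        width = max(width, len(nbrs))
        for u in nbrs:
            adj[u].discard(node)
            adj[u] |= nbrs - {u}  # connect the children to each other
        order.append(node)
    return order, width


edges = [("A", "C"), ("C", "E"), ("D", "F"), ("B", "D"),
         ("C", "D"), ("A", "B"), ("E", "F")]
print(elimination_order(edges, "min-degree"))  # -> (['A', 'B', 'C', 'D', 'E', 'F'], 2)
print(elimination_order(edges, "min-fill"))    # -> (['A', 'B', 'C', 'D', 'E', 'F'], 2)
```

On this graph both heuristics find an order of width 2. Neither heuristic is guaranteed to find the optimal order in general.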

Computing all Marginals
Bucket elimination computes
– P(e) or Z
– P(X_i|e), where “X_i” is the last variable eliminated
To compute all marginals P(X_i|e) for all variables X_i
– Run bucket elimination “n” times
Efficient algorithm
– Junction tree algorithm or bucket tree propagation
– Requires only two passes to compute all marginals

Junction tree algorithm: An exact message passing algorithm
Construct a tree decomposition T
Initialize the tree decomposition as in bucket elimination
Select an arbitrary node of T as root
Pass messages from leaves to root (upward pass)
Pass messages from root to leaves (downward pass)

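Message passing Equations
Message from cluster S to its neighbor R:
– Multiply all received messages except the one from R
– Multiply all functions assigned to S
– Sum-out all variables except the separator
[Figure: cluster S sending a message to cluster R]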

Computing all marginals
[Figure: cluster S]
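The message and marginal computations can be written out as follows. The slides' own equations are not recoverable from the transcript, so this is the standard sum-product form under assumed notation: Φ_S is the set of functions assigned to cluster S, nb(S) is the set of its neighbors in the tree, m denotes a message and b_S the unnormalized belief at S.

```latex
\begin{align*}
m_{S \to R}(x_{S \cap R})
  &= \sum_{x_{S \setminus R}} \;
     \prod_{\phi \in \Phi_S} \phi(x_{\phi})
     \prod_{N \in \mathrm{nb}(S) \setminus \{R\}} m_{N \to S}(x_{N \cap S}) \\[4pt]
b_S(x_S)
  &= \prod_{\phi \in \Phi_S} \phi(x_{\phi})
     \prod_{N \in \mathrm{nb}(S)} m_{N \to S}(x_{N \cap S}) \\[4pt]
P(X_i \mid e)
  &= \frac{\sum_{x_S \setminus X_i} b_S(x_S)}{\sum_{x_S} b_S(x_S)}
     \qquad \text{for any cluster } S \text{ containing } X_i
\end{align*}
```

After both passes every cluster has received a message from each neighbor, so the last line can be evaluated at any cluster that contains the variable of interest.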

Message passing Equations
Select “EFC” as root
Pass messages from leaves to root
Pass messages from root to leaves
[Figure: tree decomposition with clusters ABC, EFC, DBCF, FBC, BC, C and separators FC, FBC, BC, C, holding the functions φ(C,E), φ(E,F), φ(D,F), φ(B,D), φ(C,D), φ(A,C), φ(A,B)]
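The two passes can be sketched end to end on the running example. Everything below is illustrative rather than the author's implementation: variables are binary, the tree edges and the assignment of functions to clusters are assumptions consistent with bucket initialization, the potentials are made up, and the names `combine_and_project`, `two_pass` and `agree` are hypothetical.

```python
from itertools import product

DOMAIN = (0, 1)


def combine_and_project(factors, keep):
    """Multiply the factors and sum out every variable not in `keep`."""
    joint = sorted({v for scope, _ in factors for v in scope})
    kept = tuple(v for v in joint if v in keep)
    table = {}
    for vals in product(DOMAIN, repeat=len(joint)):
        assign = dict(zip(joint, vals))
        weight = 1.0
        for scope, tab in factors:
            weight *= tab[tuple(assign[v] for v in scope)]
        key = tuple(assign[v] for v in kept)
        table[key] = table.get(key, 0.0) + weight
    return kept, table


def two_pass(clusters, tree, assigned, root):
    """Upward then downward pass; returns the unnormalized belief per cluster."""
    nbrs = {c: set() for c in clusters}
    for a, b in tree:
        nbrs[a].add(b)
        nbrs[b].add(a)
    parent, visit, stack = {root: None}, [], [root]
    while stack:  # depth-first order: parents before their descendants
        c = stack.pop()
        visit.append(c)
        for n in nbrs[c]:
            if n not in parent:
                parent[n] = c
                stack.append(n)
    msg = {}

    def send(src, dst):
        incoming = [msg[(n, src)] for n in nbrs[src] if n != dst]
        separator = clusters[src] & clusters[dst]
        msg[(src, dst)] = combine_and_project(assigned[src] + incoming, separator)

    for c in reversed(visit):  # leaves to root
        if parent[c] is not None:
            send(c, parent[c])
    for c in visit:            # root to leaves
        if parent[c] is not None:
            send(parent[c], c)
    return {c: combine_and_project(
                assigned[c] + [msg[(n, c)] for n in nbrs[c]], clusters[c])
            for c in clusters}


def agree(scope):
    """Hypothetical potential: 2 if the two variables agree, else 1."""
    return scope, {(x, y): 2.0 if x == y else 1.0 for x in DOMAIN for y in DOMAIN}


clusters = {name: set(name) for name in ["ABC", "EFC", "DBCF", "FBC", "BC", "C"]}
tree = [("ABC", "BC"), ("EFC", "FBC"), ("DBCF", "FBC"), ("FBC", "BC"), ("BC", "C")]
assigned = {
    "ABC": [agree(("A", "C")), agree(("A", "B"))],
    "EFC": [agree(("C", "E")), agree(("E", "F"))],
    "DBCF": [agree(("D", "F")), agree(("B", "D")), agree(("C", "D"))],
    "FBC": [], "BC": [], "C": [],
}
beliefs = two_pass(clusters, tree, assigned, root="EFC")
for name, (scope, table) in beliefs.items():
    print(name, scope, sum(table.values()))
```

Each printed total is the same normalization constant Z, and normalizing any cluster's belief gives the marginal over that cluster's variables.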

Architectures
Shenoy-Shafer architecture
Hugin architecture
– Associate one function with each cluster
– Requires division
– Smaller time complexity
– Higher space complexity