Bucket Renormalization for Approximate Inference


Bucket Renormalization for Approximate Inference. Sungsoo Ahn (KAIST), joint work with Michael Chertkov (Los Alamos National Laboratory), Adrian Weller (University of Cambridge), and Jinwoo Shin (KAIST). June 7th, 2018

Goal: approximate inference in GMs. A graphical model (GM) is a family of distributions whose factorization is specified by a graph, e.g., the Ising model [Ising, 1920] for the distribution of atomic spins. This talk is about undirected GMs with discrete variables. Figure: protein structure (A) modeled by a graphical model (B) [Kamisetty et al., 2008]

Goal: approximate inference in GMs. A graphical model (GM) is a family of distributions whose factorization is specified by a graph. A joint distribution over n binary variables requires O(2^n) space to specify; the GM factorization allows it to be stored in O(m 2^k) space for m factors over at most k variables each. The partition function Z = Σ_x Π_a f_a(x_a) is essential for inference and normalization. However, it is NP-hard to compute, so we need approximations, e.g., MCMC, variational inference, and approximate variable elimination.
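To make the intractability concrete, here is a minimal sketch (a hypothetical toy model, not an example from the talk) that computes Z for a small Ising chain by brute force; the 2^n enumeration is exactly what approximate inference tries to avoid:

```python
import itertools

import numpy as np

# Toy Ising chain x0 - x1 - x2 with pairwise factors
# f(x_i, x_j) = exp(beta * x_i * x_j), spins x_i in {-1, +1}.
# Z sums the product of all factors over every joint state,
# so the loop below runs 2^n times -- intractable for large n.
def brute_force_Z(n, edges, beta):
    Z = 0.0
    for x in itertools.product([-1, 1], repeat=n):
        weight = 1.0
        for i, j in edges:
            weight *= np.exp(beta * x[i] * x[j])
        Z += weight
    return Z

Z = brute_force_Z(3, [(0, 1), (1, 2)], beta=0.5)
```

For this chain, the transfer-matrix identity gives Z = 2(2 cosh β)^2, which the enumeration reproduces.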

Approximate variable elimination: sequentially summing out variables (approximately) one by one, e.g., mini bucket elimination for upper-bounding Z. It terminates in a fixed number of iterations; compared to other families of methods, it is much faster but less accurate. Bucket renormalization: a new approximate variable elimination scheme with superior performance. It is a variant of mini bucket elimination, but without the bounding property, and can also be seen as a low-rank approximation of GMs.
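The upper-bounding mechanism of mini bucket elimination can be checked numerically. In this sketch (hypothetical nonnegative factors f and g sharing one variable x, not the talk's notation), splitting the bucket {f, g} and replacing the sum in one mini bucket by a max gives Σ_x f(x) · max_x g(x) ≥ Σ_x f(x) g(x):

```python
import numpy as np

# Hypothetical nonnegative factors over the 4 states of a shared variable x.
rng = np.random.default_rng(0)
f = rng.random(4)
g = rng.random(4)

# Exact elimination of x multiplies the bucket and sums it out.
exact = np.sum(f * g)

# Mini-bucket split: sum x out of one mini bucket, max it out of the other.
# Since max_x g(x) >= g(x) pointwise, this can only overestimate the sum.
upper = np.sum(f) * np.max(g)
```

Because the bound holds pointwise, it holds for any nonnegative factor tables, which is why MBE always returns an upper bound on Z.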

Summary for the rest of the talk: Variable (bucket) elimination; Mini bucket renormalization (MBR); Global bucket renormalization (GBR).

Variable (bucket) elimination for exact Z. For each variable in the GM: collect the adjacent factors, i.e., the bucket, then generate a new factor by marginalizing the bucket over the variable. The time and memory complexity is determined by the size of the bucket (exponential in the number of variables it involves). Key idea of this work: replace the exact marginalization with an approximation.
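A minimal sketch of exact bucket elimination for Z, assuming binary variables and factors stored as (scope, table) pairs; the function names and representation are illustrative, not the talk's notation:

```python
import numpy as np

def multiply_factors(factors):
    """Multiply factors by broadcasting each table onto the joint scope."""
    scope = sorted({v for s, _ in factors for v in s})
    table = np.ones([2] * len(scope))
    for s, t in factors:
        idx = [scope.index(v) for v in s]
        # append singleton axes, then move each original axis into place
        expanded = t.reshape(t.shape + (1,) * (len(scope) - len(s)))
        expanded = np.moveaxis(expanded, list(range(len(s))), idx)
        table = table * expanded
    return scope, table

def bucket_elimination_Z(factors, order):
    """Exact partition function via sequential variable elimination."""
    factors = list(factors)
    for var in order:
        bucket = [f for f in factors if var in f[0]]   # adjacent factors
        factors = [f for f in factors if var not in f[0]]
        scope, table = multiply_factors(bucket)
        new_scope = tuple(v for v in scope if v != var)
        # marginalize the bucket over the variable -> new factor
        factors.append((new_scope, table.sum(axis=scope.index(var))))
    # every variable eliminated: only scalar factors remain
    return float(np.prod([t for _, t in factors]))

# Ising chain x0 - x1 - x2 with coupling beta = 0.5 (states {-1, +1}).
b = 0.5
pair = np.array([[np.exp(b), np.exp(-b)],
                 [np.exp(-b), np.exp(b)]])
Z = bucket_elimination_Z([((0, 1), pair), ((1, 2), pair)], order=[0, 1, 2])
```

The joint table built for each bucket is what blows up: its size is exponential in the bucket's scope, which is exactly the cost that approximate variable elimination cuts down.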

Summary for the rest of the talk: Variable (bucket) elimination; Mini bucket renormalization (MBR); Global bucket renormalization (GBR).

Mini bucket renormalization. Idea 1: splitting variables, then adding compensating factors. The number of splits is decided by the available resources (memory budget), and choosing a good compensating factor is important. Idea 2: comparing with the optimal compensation.

Algorithm description. Given a variable to marginalize: split the variable and generate mini buckets; add compensating factors for each of the split copies; generate new factors by summing out each mini bucket.

Mini bucket renormalization. Idea 2: comparing with the optimal compensation, i.e., minimizing the L2-difference. The resulting optimization is equivalent to a rank-1 truncated SVD.

Connection to rank-1 truncated SVD. Eventually, we are minimizing the error of a rank-1 projection.
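The optimality behind this step can be illustrated on a generic matrix (a sketch of the Eckart-Young fact the talk relies on; the matrix values are hypothetical, not the paper's actual factor tables): truncating the SVD to its leading singular component gives the best rank-1 approximation in the L2 (Frobenius) sense.

```python
import numpy as np

# Any matrix stands in for a bucket's factor table here (hypothetical values).
M = np.array([[4.0, 2.0],
              [2.0, 1.5]])

# Rank-1 truncated SVD: keep only the leading singular triplet.
U, S, Vt = np.linalg.svd(M)
M_rank1 = S[0] * np.outer(U[:, 0], Vt[0])

# Eckart-Young: the Frobenius error of the best rank-1 approximation
# equals the norm of the discarded singular values (here just S[1]).
err = np.linalg.norm(M - M_rank1)
```

No other rank-1 matrix achieves a smaller Frobenius error, which is why the compensating factors obtained this way are optimal in the L2 sense.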

Algorithm description. Given compensating factors to choose: sum out over each mini bucket, then compare with the optimal compensation.

Illustration of mini bucket renormalization: MBR with elimination order 1, 2, 3, 4, 5 under a given memory budget.

Why MBR is called a renormalization: splitting & compensation without variable elimination results in a renormalized GM, which can be interpreted as a tractable approximation to the original GM.

Summary for the rest of the talk: Variable (bucket) elimination; Mini bucket renormalization (MBR); Global bucket renormalization (GBR).

Global bucket renormalization (GBR). Recall from mini bucket renormalization: we minimize the L2-difference locally. GBR aims to find a better choice of compensation at the cost of additional computation.

Global bucket renormalization (GBR). Idea: increasing the scope of the comparison, i.e., minimizing the L2-difference over a larger portion of the model. However, the exact comparison is as hard as computing the partition function itself. As a heuristic, we perform the comparison in the renormalized GM.

Experiments. We measure the log-Z approximation ratio of our algorithms, mini bucket renormalization (MBR) and global bucket renormalization (GBR), and compare with 4 existing algorithms: mini bucket elimination (MBE), weighted mini bucket elimination (WMBE), belief propagation (BP), and mean field approximation (MF).

Ising GM experiments. Comparison over a varying interaction parameter (or temperature). Complete graph with 15 variables: GBR > MBR > MF > WMBE ≈ MBE > BP. Grid graph with 15x15 variables: GBR > MBR > BP > MF > WMBE > MBE.

UAI 2014 competition experiments (Promedus and Linkage datasets). Numbers in brackets denote how many of the other algorithms each method dominates. Overall: GBR ≈ MBR > BP > WMBE > MBE.

Conclusion. We proposed bucket renormalization, based on splitting & compensation. It is highly inspired by tensor network renormalization (TNR) in statistical physics and by tensor decomposition algorithms. arXiv version available at: https://arxiv.org/abs/1803.05104 Thank you for listening!

Ising GM experiments. Comparison over a varying order of available memory (ibound). Panels: complete graph with 15 variables; grid graph with 15x15 variables.