Variational Inference in Bayesian Submodular Models
Josip Djolonga, joint work with Andreas Krause
Motivation: inference with higher-order potentials. MAP computation ✓, but inference? ✘. We provide a method for inference in such models, as long as the factors are submodular.
Submodular functions: set functions with the diminishing returns property, F(A ∪ {e}) − F(A) ≥ F(B ∪ {e}) − F(B) for all A ⊆ B and e ∉ B. Many applications in machine learning: sensor placement, summarization, structured norms, dictionary learning, etc. Important implications for optimization.
From optimization to distributions: instead of optimizing, we take a Bayesian approach. Log-supermodular: P(A) ∝ exp(−F(A)); log-submodular: P(A) ∝ exp(+F(A)), for a submodular F. These are equivalent to distributions over binary vectors. Conjugacy: the families are closed under conditioning.
Our workhorse: modular functions. Additive set functions, defined as m(A) = Σ_{i∈A} m(i). They induce completely factorized distributions, with marginals given by the logistic sigmoid of the weights: P(i ∈ A) = σ(m(i)) = 1/(1 + exp(−m(i))) when P(A) ∝ exp(m(A)), and σ(−m(i)) in the log-supermodular case P(A) ∝ exp(−m(A)).
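As a minimal illustration (a NumPy sketch, not from the talk; the weights below are arbitrary example values), such a factorized distribution can be handled coordinate-wise:

```python
import numpy as np

def modular_marginals(m):
    """Marginals of the log-modular distribution P(A) ∝ exp(m(A)).
    The distribution factorizes, so element i is included independently
    with probability sigma(m_i) = 1 / (1 + exp(-m_i))."""
    m = np.asarray(m, dtype=float)
    return 1.0 / (1.0 + np.exp(-m))

def sample_modular(m, rng=None):
    """Draw one set (as a boolean indicator vector) from P(A) ∝ exp(m(A))."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.random(len(m)) < modular_marginals(m)

if __name__ == "__main__":
    m = np.array([2.0, 0.0, -1.5])   # example per-element weights m(i)
    print(modular_marginals(m))      # ≈ [0.881, 0.5, 0.182]
    print(sample_modular(m))
```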
Remark on variational approximations: they are most useful when conditioning on data, where the posterior ∝ prior × likelihood.
Sub- and superdifferentials: like convex functions, submodular functions have subdifferentials; but, unlike convex functions, they also have superdifferentials.
The idea: using elements of the sub- and superdifferentials we can bound F from below and above by modular functions, which in turn yields upper and lower bounds on the partition function Z, because the bounding sums factorize over the elements. We optimize over these upper and lower bounds, obtaining intervals for marginals, the model evidence, etc.
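To make the bounding step concrete: for a modular lower bound s(A) ≤ F(A) with s(∅) = 0 we get Z = Σ_A exp(−F(A)) ≤ Σ_A exp(−s(A)) = Π_i (1 + exp(−s_i)). A small self-contained Python check (an illustrative sketch, not the authors' code; the example function F(A) = √|A| and the uniform bound s_i = √n / n are assumptions chosen so that s(A) ≤ F(A) holds):

```python
import numpy as np
from itertools import product

def log_partition(F, n):
    """Brute-force log Z = log sum_A exp(-F(A)); only feasible for small n."""
    return np.log(sum(np.exp(-F(np.array(bits, dtype=bool)))
                      for bits in product([0, 1], repeat=n)))

def modular_upper_bound(s):
    """If s(A) = sum_{i in A} s_i lower-bounds F (and s(emptyset) = 0), then
    Z <= prod_i (1 + exp(-s_i)), i.e. log Z <= sum_i log(1 + exp(-s_i))."""
    return float(np.sum(np.log1p(np.exp(-np.asarray(s, dtype=float)))))

if __name__ == "__main__":
    n = 10
    F = lambda A: np.sqrt(np.count_nonzero(A))   # submodular example: sqrt(|A|)
    s = np.full(n, np.sqrt(n) / n)               # modular lower bound on F
    print(log_partition(F, n))                   # exact, exponential time (~4.80)
    print(modular_upper_bound(s))                # tractable upper bound (~5.47)
```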
Optimize over subgradients: the problem provably reduces to the convex problem min over s ∈ B(F) (the base polytope) of Σ_i log(1 + exp(−s_i)) [D., Krause ’14], which is equivalent to the min-norm problem min over s ∈ B(F) of ‖s‖² [D., Krause ’15, cf. Nagano ‘07]. Corollary: thresholding the solution at ½ gives you a MAP configuration (i.e., the approximation shares its mode with the exact distribution).
The min-norm problem in a more familiar (proximal) form: it is the proximal problem of the Lovász extension, so for a cut on a chain it becomes 1D total-variation denoising, and for cuts in general it becomes a graph total-variation problem.
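Below is a short Frank-Wolfe sketch of the min-norm problem min_{s ∈ B(F)} ‖s‖², with Edmonds' greedy algorithm as the linear oracle. It is only an illustration, not necessarily the solver the authors use (dedicated min-norm-point or proximal algorithms are typically faster); the chain-cut-plus-unaries example, its weights, and the σ(−s) read-out of approximate marginals are assumptions for the demo:

```python
import numpy as np

def chain_cut(A, weights, unaries):
    """Example submodular function (an assumption for this sketch):
    a cut on a chain plus a modular (unary) term. A is a boolean vector."""
    A = np.asarray(A, dtype=bool)
    return np.sum(weights * (A[:-1] != A[1:])) + np.sum(unaries[A])

def greedy_vertex(F, order, n):
    """Edmonds' greedy algorithm: the vertex of the base polytope B(F)
    associated with the given ordering of the ground set."""
    s, A = np.zeros(n), np.zeros(n, dtype=bool)
    prev = F(A)
    for i in order:
        A[i] = True
        cur = F(A)
        s[i] = cur - prev
        prev = cur
    return s

def min_norm_point_fw(F, n, iters=500):
    """Frank-Wolfe sketch of min_{s in B(F)} ||s||^2. The linear oracle over
    B(F) is the greedy algorithm with the coordinates of the current iterate
    sorted in increasing order."""
    x = greedy_vertex(F, np.arange(n), n)          # any vertex to start
    for _ in range(iters):
        s = greedy_vertex(F, np.argsort(x), n)     # argmin_{s in B(F)} <x, s>
        d = s - x
        denom = d @ d
        if denom < 1e-12:
            break
        gamma = np.clip(-(x @ d) / denom, 0.0, 1.0)  # exact line search
        x = x + gamma * d
    return x

if __name__ == "__main__":
    n = 6
    weights = np.full(n - 1, 1.0)                  # pairwise coupling strengths
    unaries = np.array([-2.0, -1.0, 0.5, 0.5, -1.0, 2.0])  # e.g. from conditioning
    F = lambda A: chain_cut(A, weights, unaries)
    s = min_norm_point_fw(F, n)
    marginals = 1.0 / (1.0 + np.exp(s))            # sigma(-s_i) for P(A) ∝ exp(-F(A))
    print(np.round(marginals, 3))
    print("MAP by thresholding at 1/2:", marginals > 0.5)
```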
Connection to mean-field: the mean-field variational problem is max over x ∈ [0,1]^n of H(x) − f_M(x), where f_M is the multilinear extension (expensive to evaluate, non-convex). In contrast, the Fenchel dual of our variational problem is max over x ∈ [0,1]^n of H(x) − f_L(x), where f_L is the Lovász extension / Choquet integral (convex, evaluated exactly with a single sort).
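Since the contrast hinges on the two extensions, here is a generic sketch of evaluating the Lovász extension / Choquet integral with one sort and n + 1 evaluations of F (assuming F is normalized, F(∅) = 0; the √|A| example is only for illustration), whereas the multilinear extension generally requires summing over, or sampling, exponentially many sets:

```python
import numpy as np

def lovasz_extension(F, x):
    """Evaluate the Lovász extension (Choquet integral) of a normalized set
    function F at x in [0,1]^n: sort the coordinates of x in decreasing order
    and accumulate the marginal gains of F along the induced chain of sets."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    order = np.argsort(-x)                 # decreasing order of x
    A = np.zeros(n, dtype=bool)
    value, prev = 0.0, F(A)
    for i in order:
        A[i] = True
        cur = F(A)
        value += x[i] * (cur - prev)
        prev = cur
    return value

if __name__ == "__main__":
    F = lambda A: np.sqrt(np.count_nonzero(A))   # submodular: concave of cardinality
    x = np.array([0.9, 0.3, 0.3, 0.0])
    print(lovasz_extension(F, x))
```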
Divergence minimization. Theorem: we optimize, over all completely factorized distributions Q, the Rényi divergence of infinite order D_∞(P ‖ Q) = log max_A P(A)/Q(A). The approximating distribution Q is therefore "inclusive": it must put mass wherever P does, or the divergence is infinite.
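On toy problems the divergence can be checked by brute force (a sketch; p and q below are explicit probability vectors over all subsets in some fixed order, chosen only as example values):

```python
import numpy as np

def renyi_infinity(p, q):
    """D_inf(P || Q) = log max_A P(A) / Q(A). It is finite only if Q puts mass
    on the whole support of P, which is why the optimal Q is 'inclusive'."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.log(np.max(p[mask] / q[mask]))   # inf (with a warning) if Q misses support

if __name__ == "__main__":
    p = np.array([0.5, 0.25, 0.25, 0.0])       # target distribution over 4 sets
    q = np.array([0.25, 0.25, 0.25, 0.25])     # an approximating distribution
    print(renyi_infinity(p, q))                # log 2 ≈ 0.693
```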
The big picture, how they all relate: divergence minimization (qualitative results) ≡ its dual, Lovász extension minus entropy ≡ minimizing the bound on Z ≡ convex minimization on B(F) ≡ the min-norm point (fast algorithms).
Message passing for decomposable functions: sometimes the functions have structure, so that F decomposes as a sum F = Σ_r F_r of simpler submodular functions, each depending only on a small subset of the elements.
Message passing: the messages [equations not recovered].
Message-passing convergence guarantees [based on Nishihara et al.]: the scheme is equivalent to block coordinate descent, and one can also easily compute duality gaps at every iteration.
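A rough, self-contained sketch of this block-coordinate-descent view (an illustration under simplifying assumptions, not the authors' message-passing implementation): since B(Σ_r F_r) is the Minkowski sum of the B(F_r), the min-norm problem becomes min ‖Σ_r x_r‖² over x_r ∈ B(F_r); each factor keeps a message x_r, and its block update is a projection onto B(F_r), here only approximated with a small Frank-Wolfe loop. The two toy factors in the example are assumptions.

```python
import numpy as np

def greedy_vertex(F, order, n):
    """Edmonds' greedy algorithm: the vertex of B(F) induced by an ordering."""
    s, A = np.zeros(n), np.zeros(n, dtype=bool)
    prev = F(A)
    for i in order:
        A[i] = True
        cur = F(A)
        s[i] = cur - prev
        prev = cur
    return s

def project_onto_base(F, target, n, iters=100):
    """Frank-Wolfe sketch of the projection min_{s in B(F)} ||s - target||^2."""
    x = greedy_vertex(F, np.arange(n), n)
    for _ in range(iters):
        grad = x - target
        s = greedy_vertex(F, np.argsort(grad), n)   # argmin_{s in B(F)} <grad, s>
        d = s - x
        denom = d @ d
        if denom < 1e-12:
            break
        gamma = np.clip(-(grad @ d) / denom, 0.0, 1.0)  # exact line search
        x = x + gamma * d
    return x

def block_coordinate_messages(factors, n, sweeps=50):
    """Block coordinate descent for min ||sum_r x_r||^2 over x_r in B(F_r):
    each factor keeps a 'message' x_r, and we cyclically re-solve its block,
    which is a projection onto B(F_r)."""
    xs = [greedy_vertex(F_r, np.arange(n), n) for F_r in factors]
    for _ in range(sweeps):
        for r, F_r in enumerate(factors):
            rest = np.sum(xs, axis=0) - xs[r]       # contribution of the other factors
            xs[r] = project_onto_base(F_r, -rest, n)
    return np.sum(xs, axis=0)                        # a point in B(sum_r F_r)

if __name__ == "__main__":
    n = 4
    F1 = lambda A: np.sqrt(np.count_nonzero(A[:3]))                # factor on {0, 1, 2}
    F2 = lambda A: np.sqrt(np.count_nonzero(A[1:])) - 2.0 * A[3]   # factor on {1, 2, 3}
    s = block_coordinate_messages([F1, F2], n)
    print(1.0 / (1.0 + np.exp(s)))   # approximate marginals sigma(-s_i)
```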
Qualitative results (data from Jegelka et al.): pairwise-only model, from a strong to a weak prior [image results omitted].
Experimental results over 36 finely labelled images (from Jegelka et al.): higher-order vs. pairwise-only models, evaluated over the full image and near the boundary [plots omitted].
Message passing example (427×640 image with 20,950 factors, approx. 1.5s / iteration).
Summary: algorithms for variational inference in submodular models; both lower and upper bounds on the partition function; but also completely factorized approximate distributions; convergent message passing for models with structure.
Thanks! Python code at http://people.inf.ethz.ch/josipd/