Parameter Expanded Variational Bayesian Methods
Yuan (Alan) Qi and Tommi S. Jaakkola, MIT, NIPS 2006
Presented by: John Paisley, Duke University, ECE, 3/13/2009

Outline
- Introduction
- PX-VB algorithm
- Applications
  - Bayesian Probit Regression
  - Automatic Relevance Determination
- Convergence Properties
- Conclusion

Introduction
Variational Bayes (VB) is a popular method for approximating the posterior distribution of a model, but it can be slow to converge when variables are strongly correlated. Parameter-expanded methods speed up convergence by adding auxiliary parameters that remove the strong coupling between the original parameters.

PX-VB algorithm
Auxiliary variables are added to the model and optimized at each iteration. The original parameterization is then recovered by applying the inverse mapping, i.e., by setting the auxiliary variables to the values that restore the original model.
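
As a rough structural sketch of one PX-VB iteration (assuming nothing about the specific model: q_init, vb_update, optimize_aux, and inverse_map are caller-supplied placeholders, not functions from the paper):

def px_vb(q_init, vb_update, optimize_aux, inverse_map, n_iters=100):
    """Generic PX-VB loop: VB updates, fit auxiliary parameters, map back."""
    q = q_init
    for _ in range(n_iters):
        q = vb_update(q)        # standard mean-field/VB coordinate updates
        a = optimize_aux(q)     # choose auxiliary parameters for the current q
        q = inverse_map(q, a)   # absorb a back into q, restoring the original model
    return q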

Bayesian Probit Regression
In the original model, the latent variables z_n follow a truncated Gaussian (TN) given the weights w. The parameter-expanded model introduces an auxiliary variable; q(z_n) and q(w) are updated under the expanded model, and each update is followed by the inverse mapping back to the original parameterization.
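
For concreteness, here is a minimal sketch of standard mean-field VB for the original probit model (not the parameter-expanded updates from the paper), assuming labels t_n in {0, 1}, latent z_n | w ~ N(w^T x_n, 1) with t_n = 1[z_n > 0], and a prior w ~ N(0, vI); the prior variance, function name, and toy data are illustrative choices.

import numpy as np
from scipy.stats import norm

def probit_vb(X, t, v=10.0, n_iters=50):
    """Mean-field VB for Bayesian probit regression with factorization q(w) q(z)."""
    N, D = X.shape
    s = 2 * t - 1                                 # signs: +1 for t_n = 1, -1 for t_n = 0
    S = np.linalg.inv(np.eye(D) / v + X.T @ X)    # covariance of q(w); fixed since var(z_n | w) = 1
    m = np.zeros(D)                               # mean of q(w)
    for _ in range(n_iters):
        mu = X @ m                                # location of each truncated Gaussian q(z_n)
        Ez = mu + s * norm.pdf(mu) / norm.cdf(s * mu)   # E[z_n] under q(z_n)
        m = S @ (X.T @ Ez)                        # update the mean of q(w)
    return m, S

# Toy usage on synthetic probit data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([2.0, -1.0, 0.5])
t = (X @ w_true + rng.normal(size=200) > 0).astype(int)
m, S = probit_vb(X, t)
print("E_q[w] =", np.round(m, 2))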

Bayesian Probit Regression: Results

Automatic Relevance Determination (RVM)
One expansion uses separate auxiliary variables (along with an auxiliary variable for \alpha, whose details are omitted); another uses a shared auxiliary variable c. Because no closed-form solution exists, c is optimized at each iteration with an iterative Newton method.
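
A minimal sketch of the kind of one-dimensional Newton iteration involved; the toy objective f(c) below is a hypothetical stand-in, not the actual variational objective from the paper.

def newton_1d(grad, hess, c0=1.0, n_iters=20, tol=1e-10):
    """Newton's method for a scalar variable: c <- c - f'(c) / f''(c)."""
    c = c0
    for _ in range(n_iters):
        step = grad(c) / hess(c)
        c = c - step
        if abs(step) < tol:
            break
    return c

# Toy objective f(c) = c^2 - 2*log(c), minimized at c = 1.
c_opt = newton_1d(grad=lambda c: 2 * c - 2 / c,
                  hess=lambda c: 2 + 2 / c ** 2,
                  c0=3.0)
print(c_opt)  # ~ 1.0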

Automatic Relevance Determination: Results

Convergence Properties
A general convergence theorem for PX-VB was presented and proven, relating the rate of convergence to the inverse mapping (see the conclusion below).

Conclusion
The theorem and its proof show that as long as the inverse mapping M_a has its largest eigenvalue smaller than 1, PX-VB is guaranteed to converge faster than VB, with the rate of convergence improving as this eigenvalue decreases. The approach is a general method for speeding up VB inference, and it was demonstrated on two popular Bayesian models.
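
To make the rate claim concrete, here is a generic fixed-point argument (not the paper's exact theorem): writing one full update as \theta_{t+1} = M(\theta_t) and linearizing around the fixed point \theta^* gives

\theta_{t+1} - \theta^* \approx \nabla M(\theta^*)\,(\theta_t - \theta^*),
\qquad \|\theta_t - \theta^*\| = O(\rho^t),
\quad \rho = \lambda_{\max}\!\big(\nabla M(\theta^*)\big).

The iteration converges when \rho < 1, and the error shrinks geometrically faster as \rho decreases, which is the sense in which the eigenvalue condition on M_a translates into a speedup over plain VB.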