Accelerated Sampling for the Indian Buffet Process

Accelerated Sampling for the Indian Buffet Process. Finale Doshi-Velez and Zoubin Ghahramani, ICML 2009. Presented by John Paisley, Duke University.

Introduction. The IBP is a nonparametric prior for inferring the number of underlying features in a dataset, as well as which subset of these features is used by a given observation, e.g., the number of underlying notes in a piece of music and which notes are used at what times. Gibbs sampling currently has some issues: the uncollapsed Gibbs sampler is slow to mix, while the collapsed Gibbs sampler is slow to complete an iteration when the number of observations is large. This paper presents an accelerated Gibbs sampling method for the linear-Gaussian model that is fast in both respects.

The IBP and Linear-Gaussian Model. The IBP story: the first customer walks into the buffet and samples a Poisson(α) number of dishes. The nth customer samples each previously tasted dish with probability m_k/n, where m_k is the number of previous customers who have sampled dish k, and then samples a Poisson(α/n) number of new dishes. This prior can be used in the linear-Gaussian model, X = ZA + noise, to infer the number of underlying feature vectors (rows of A) that are linearly combined, according to the binary matrix Z, to construct the data matrix X. (A sketch of the generative process appears below.)
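To make the restaurant construction concrete, here is a minimal sketch (not from the original slides; the function name and interface are illustrative) of drawing a binary feature matrix Z from the IBP prior:

```python
import numpy as np

def sample_ibp(num_customers, alpha, rng=None):
    """Illustrative sketch: draw a binary feature matrix Z from the IBP prior.

    Rows are customers (observations); columns are dishes (features).
    """
    rng = np.random.default_rng(rng)
    dish_counts = []          # m_k: how many customers have taken dish k so far
    rows = []                 # per-customer 0/1 indicators over dishes seen so far
    for n in range(1, num_customers + 1):
        # Take each existing dish k with probability m_k / n.
        row = [1 if rng.random() < m / n else 0 for m in dish_counts]
        for k, taken in enumerate(row):
            dish_counts[k] += taken
        # Then sample a Poisson(alpha / n) number of brand-new dishes.
        new_dishes = rng.poisson(alpha / n)
        row.extend([1] * new_dishes)
        dish_counts.extend([1] * new_dishes)
        rows.append(row)
    # Pad earlier rows with zeros so all rows have the same width K.
    K = len(dish_counts)
    Z = np.zeros((num_customers, K), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z

# Example: Z = sample_ibp(100, alpha=2.0)
```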

IBP Calculations for Z. Integrating out A, the posterior for an existing feature k is P(z_nk = 1 | Z_-nk, X) ∝ (m_-n,k / N) p(X | Z), where m_-n,k counts the other observations using feature k, and the number of new features for observation n is sampled with a Poisson(α/N) prior weighted by the likelihood. Collapsing out the loadings matrix A, the likelihood term p(X | Z) involves the inverse and determinant of the K×K matrix Z^T Z + (σ_X²/σ_A²)I, which significantly increases the amount of computation needed each time the likelihood is evaluated.
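The collapsed likelihood itself did not survive transcription; for reference, the standard form from the Griffiths–Ghahramani linear-Gaussian IBP model (presumably what the slide showed) is:

```latex
p(X \mid Z) =
\frac{\exp\!\left( -\frac{1}{2\sigma_X^2}\,
      \operatorname{tr}\!\left( X^\top \Big( I - Z \big( Z^\top Z + \tfrac{\sigma_X^2}{\sigma_A^2} I \big)^{-1} Z^\top \Big) X \right) \right)}
     {(2\pi)^{ND/2}\, \sigma_X^{(N-K)D}\, \sigma_A^{KD}\,
      \big| Z^\top Z + \tfrac{\sigma_X^2}{\sigma_A^2} I \big|^{D/2}}
```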

IBP Calculations for Z. When A is not integrated out, inference is faster because no matrix inversions are needed while resampling Z. Also, when finding the posterior for a value in the nth row of Z, we don't need to worry about the other rows of Z, since the likelihood factorizes across observations: p(X | Z, A) = ∏_n N(x_n ; z_n A, σ_X² I). For the linear-Gaussian model, the posterior of A given Z and X is Gaussian, with mean (Z^T Z + (σ_X²/σ_A²)I)^{-1} Z^T X and per-column covariance σ_X² (Z^T Z + (σ_X²/σ_A²)I)^{-1}.
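A minimal sketch of the uncollapsed step that draws A from this conditional posterior, assuming independent N(0, σ_A²) priors on the entries of A as stated above (the helper name is illustrative):

```python
import numpy as np

def sample_A_posterior(X, Z, sigma_x, sigma_a, rng=None):
    """Sketch: draw A from its Gaussian conditional posterior in X = Z A + noise.

    Returns a K x D matrix; each column has the same K x K posterior covariance.
    """
    rng = np.random.default_rng(rng)
    K, D = Z.shape[1], X.shape[1]
    precision = Z.T @ Z + (sigma_x**2 / sigma_a**2) * np.eye(K)
    mean = np.linalg.solve(precision, Z.T @ X)        # K x D posterior mean
    cov = sigma_x**2 * np.linalg.inv(precision)       # shared column covariance
    chol = np.linalg.cholesky(cov)
    return mean + chol @ rng.standard_normal((K, D))  # one correlated draw per column
```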

Accelerated Gibbs Sampling. Goal: develop a sampler that mixes like the collapsed sampler but has the per-iteration speed of the uncollapsed sampler. To do this, select a window of observations and split the data into two groups: the observations inside the window, X_W, and those outside it, X_-W.

Accelerated Gibbs Sampling. By splitting the data this way, we can write the probabilities for the entries of Z in the window so that A is collapsed using its posterior given only the data outside the window (see the form below). We therefore don't need to touch X_-W when calculating likelihoods; its influence enters only through the posterior over A.
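The equation on this slide did not come through in the transcript; a plausible reconstruction, with W denoting the set of observations in the window, is:

```latex
P(z_{nk} \mid X, Z_{-nk}) \;\propto\; P(z_{nk} \mid Z_{-nk})
\int p(X_W \mid Z_W, A)\, p(A \mid X_{-W}, Z_{-W})\, dA ,
\qquad n \in W .
```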

Accelerated Gibbs Sampling. We can efficiently compute the posterior mean and covariance of A with the window's data removed, and then efficiently restore the full-data mean and covariance using the sufficient statistics of the window. Rank-one updates can be used when updating these quantities after each change to Z, avoiding full matrix inversions inside the sweep.
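As an illustration of the kind of rank-one update involved (a sketch, not the authors' exact bookkeeping), a Sherman–Morrison step that maintains an inverse such as (Z^T Z + (σ_X²/σ_A²)I)^{-1} in O(K²) when a single row z_n is added or removed:

```python
import numpy as np

def rank_one_update_inverse(M_inv, z, sign=+1):
    """Sherman-Morrison update of M_inv = inv(M) after M <- M + sign * z z^T.

    Assumes M is symmetric (as Z^T Z + c*I is).  z is a length-K vector;
    sign=+1 adds a row's contribution, sign=-1 removes it (a "downdate").
    Cost is O(K^2) instead of the O(K^3) of a fresh inversion.
    """
    z = np.asarray(z, dtype=float).reshape(-1, 1)   # column vector
    Minv_z = M_inv @ z                              # K x 1
    denom = 1.0 + sign * (z.T @ Minv_z).item()
    return M_inv - sign * (Minv_z @ Minv_z.T) / denom

# Example usage (hypothetical sizes):
# K = 5
# M = np.random.randn(20, K); M = M.T @ M + np.eye(K)
# M_inv = np.linalg.inv(M)
# z = np.random.randn(K)
# M_inv_new = rank_one_update_inverse(M_inv, z, sign=+1)
# assert np.allclose(M_inv_new, np.linalg.inv(M + np.outer(z, z)))
```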

Experiments (Synthetic Data). Synthetic data were generated from the linear-Gaussian model with an IBP prior and dimensionality D = 10.

Experiments (Real Data). Experiments were run on several real datasets for up to 500 iterations or 150 hours, whichever came first; the slide tabulates the dataset statistics along with per-iteration run times and performance measures.

Discussion and Conclusion. The accelerated Gibbs sampler achieved performance similar to the other two sampling methods, but in much less time. Rank-one updates are less precise due to round-off error, so it is important to occasionally recompute the full matrix inverse from scratch. The method is also faster than slice sampling, and it does not rely on proposal distributions (as Metropolis-Hastings does) or on particle counts (as particle filters do). The efficiency comes from collapsing locally within a window, using posteriors computed from the data outside that window.